Add an opt-in --record flag that writes each chat session to disk as OpenAI-shaped JSONL, the same format the coding-agent session tools already read. This turns AFM runs into transcripts you can search, analyze, and replay with existing tooling. It covers AFM's own inference (Foundation and MLX) to start, is off by default, and never blocks a request.
(I am testing this in my repo already, will send a PR when I think it's ready and if you are supportive of approach)
@Keesan12 reviewed this with real depth and several of their points changed the design. The strict versioned first line and the per-line seq index both come from their feedback, as does the provenance-over-text framing (backend, turn index, status, schema version matter more than the message text once you're auditing or replaying). The deferred list below is mostly their suggestions.
Motivation
Logging agent calls is proving valuable for developers and end users. A healthy set of tools now read session logs and give you search, analytics, token accounting, and replay over them: agentsview (kenn-io/agentsview, one I use and contribute to), ccusage, and others. They all consume the JSONL transcripts.
AFM has no way to generate those logs outside of debugging, so users miss the benefit and there's no record to point these tools at. I propose AFM write the same JSONL shape those tools already read, so a local AFM session lands in a format that tooling already understands. (Format compatibility isn't the same as auto-ingestion: getting AFM sessions to show up in a given tool, with the right agent identity and per-message usage, is a small consumer-side step. That's mine to handle on the agentsview side, and it's separate from AFM's format.)
Note: I think the opportunity is much bigger in gateway mode, where AFM fronts the Foundation model, MLX, and proxied backends under one surface. That makes it the best place to capture local traffic in one consistent format. This proposal starts with AFM's own inference, Foundation and MLX. Recording the gatewayed backends is the natural next step.
Summary
--record is available on afm, afm serve, and afm mlx. When set, the server writes one <sessionId>.jsonl file per session to a transcript directory (--transcript-dir, default ~/.afm/sessions). Without the flag, no recorder exists and no directory is touched.
Each file is one JSON object per line:
- A
session_meta first line: schema_version, session id, model, timestamp, and the identification fields below.
- One line per request message (
system/user/assistant/tool), preserving tool_calls, tool_call_id, and name.
- One
assistant line per completed turn: content, reasoning when present, tool_calls, finish_reason, and usage.
Every line also carries a seq, a monotonically increasing index, so a consumer never has to infer turn order from timestamps. schema_version and seq are additive: lenient parsers ignore fields they don't know, so they don't break existing consumers.
Framework identification. A recorded session should mark itself AFM-produced, so a token-usage leaderboard or similar tool can tell an AFM run from another agent's. The meta line already has platform: "afm"; it should also carry afm_version and backend (foundation, mlx, or the gateway backend name), stamped on every assistant line too so the identity survives when a tool reads individual turns.
Session identity. Resolves in priority order: an X-Session-Id header, the OpenAI user body field, then a content-stable id from the first user message. Clients that want stable grouping set the header.
Non-blocking, best-effort to start. Logging must never slow down or fail a request: it runs after the response, errors are logged and swallowed, partial and cancelled streams are skipped. Durability can improve over time without changing that guarantee.
Scope
Record AFM's own inference, streaming and non-streaming: the Foundation model under afm/afm serve, and MLX under afm mlx.
Gateway-proxied backends (Ollama, LM Studio, Jan) are out of scope for now. They take a separate proxy path that returns before the recorder ever runs, and the proxied response is an opaque stream the recorder would have to parse and re-emit. Logging it safely without violating the non-blocking rule needs its own design. That's the next step once that design is worked out, not part of this feature.
Configurability
Recording should be controllable at the granularity of what serves a request. The server already resolves the model id and backend name before recording, so these are filters at that point, not new plumbing. (These flags are proposed; only --record/--transcript-dir are built.)
- Per-model filter.
--record-models <glob,...> / --record-exclude <glob,...> against the resolved model or backend name. Record only the model you're evaluating, or skip a noisy one. The server never writes the excluded bytes, which is the point for volume and privacy.
- Embeddings excluded by default. The main server also serves
/v1/embeddings, but embeddings are high-volume vector lookups, not sessions. The filter can name them back in.
- Per-request override. An
X-Record: off/on header overrides the server default for one call (matching X-AFM-Profile/X-Session-Id), to skip a throwaway probe or a sensitive prompt.
- Per-instance. Already possible: separate
afm instances with different --transcript-dir values give isolated stores.
Out of scope for now, each deferred for a specific reason:
- Per-endpoint toggles. The backend that serves a request, not the route, is the unit worth controlling, and the per-model filter already covers that.
- Redaction / sampling. Needs a content-rewriting design that shouldn't hold up basic recording, and gets safer once there are real transcripts to test against.
- Retention / rotation. A file-lifecycle concern that's orthogonal to capture. The files are plain JSONL an operator or a cron job can manage today.
Deferred
These came out of @Keesan12's review. I agree with the direction and want them in eventually, just not in the first version:
- Terminal event for cancelled/errored runs. Writing nothing is indistinguishable from recording being off. Recording it means hooking the cancel/error paths the recorder is deliberately kept out of, and a new record type is a change consumers have to handle. Worth doing once the mechanism is settled.
- Raw + normalized tool calls. Keep both the normalized shape and the raw provider payload. Caveat: AFM normalizes some formats internally before the recorder sees them, so "raw" means the payload as AFM received it, not the model's literal original.
- A dedicated logging facility and schema. If this gets adopted, a purpose-built logger with its own schema likely beats bolting fields onto the current shape. Better designed against real usage than up front.
- Gateway-proxied recording (see Scope) and the per-endpoint / redaction / retention items (see Configurability) remain deferred for the reasons noted there.
Why opt-in and off by default
Local inference is the privacy story, and silently writing every conversation to disk would break it. --record is off unless asked for and stays out of the request path when absent. Operators who want it always on could set a default via an AFM_RECORD=1 env var (matching AFM_DEBUG/AFM_PERF), but the shipped binary stays off by default.
Test plan
Add an opt-in
--recordflag that writes each chat session to disk as OpenAI-shaped JSONL, the same format the coding-agent session tools already read. This turns AFM runs into transcripts you can search, analyze, and replay with existing tooling. It covers AFM's own inference (Foundation and MLX) to start, is off by default, and never blocks a request.(I am testing this in my repo already, will send a PR when I think it's ready and if you are supportive of approach)
@Keesan12 reviewed this with real depth and several of their points changed the design. The strict versioned first line and the per-line
seqindex both come from their feedback, as does the provenance-over-text framing (backend, turn index, status, schema version matter more than the message text once you're auditing or replaying). The deferred list below is mostly their suggestions.Motivation
Logging agent calls is proving valuable for developers and end users. A healthy set of tools now read session logs and give you search, analytics, token accounting, and replay over them: agentsview (kenn-io/agentsview, one I use and contribute to), ccusage, and others. They all consume the JSONL transcripts.
AFM has no way to generate those logs outside of debugging, so users miss the benefit and there's no record to point these tools at. I propose AFM write the same JSONL shape those tools already read, so a local AFM session lands in a format that tooling already understands. (Format compatibility isn't the same as auto-ingestion: getting AFM sessions to show up in a given tool, with the right agent identity and per-message usage, is a small consumer-side step. That's mine to handle on the agentsview side, and it's separate from AFM's format.)
Note: I think the opportunity is much bigger in gateway mode, where AFM fronts the Foundation model, MLX, and proxied backends under one surface. That makes it the best place to capture local traffic in one consistent format. This proposal starts with AFM's own inference, Foundation and MLX. Recording the gatewayed backends is the natural next step.
Summary
--recordis available onafm,afm serve, andafm mlx. When set, the server writes one<sessionId>.jsonlfile per session to a transcript directory (--transcript-dir, default~/.afm/sessions). Without the flag, no recorder exists and no directory is touched.Each file is one JSON object per line:
session_metafirst line:schema_version, session id, model, timestamp, and the identification fields below.system/user/assistant/tool), preservingtool_calls,tool_call_id, andname.assistantline per completed turn:content,reasoningwhen present,tool_calls,finish_reason, andusage.Every line also carries a
seq, a monotonically increasing index, so a consumer never has to infer turn order from timestamps.schema_versionandseqare additive: lenient parsers ignore fields they don't know, so they don't break existing consumers.Framework identification. A recorded session should mark itself AFM-produced, so a token-usage leaderboard or similar tool can tell an AFM run from another agent's. The meta line already has
platform: "afm"; it should also carryafm_versionandbackend(foundation,mlx, or the gateway backend name), stamped on everyassistantline too so the identity survives when a tool reads individual turns.Session identity. Resolves in priority order: an
X-Session-Idheader, the OpenAIuserbody field, then a content-stable id from the first user message. Clients that want stable grouping set the header.Non-blocking, best-effort to start. Logging must never slow down or fail a request: it runs after the response, errors are logged and swallowed, partial and cancelled streams are skipped. Durability can improve over time without changing that guarantee.
Scope
Record AFM's own inference, streaming and non-streaming: the Foundation model under
afm/afm serve, and MLX underafm mlx.Gateway-proxied backends (Ollama, LM Studio, Jan) are out of scope for now. They take a separate proxy path that returns before the recorder ever runs, and the proxied response is an opaque stream the recorder would have to parse and re-emit. Logging it safely without violating the non-blocking rule needs its own design. That's the next step once that design is worked out, not part of this feature.
Configurability
Recording should be controllable at the granularity of what serves a request. The server already resolves the model id and backend name before recording, so these are filters at that point, not new plumbing. (These flags are proposed; only
--record/--transcript-dirare built.)--record-models <glob,...>/--record-exclude <glob,...>against the resolved model or backend name. Record only the model you're evaluating, or skip a noisy one. The server never writes the excluded bytes, which is the point for volume and privacy./v1/embeddings, but embeddings are high-volume vector lookups, not sessions. The filter can name them back in.X-Record: off/onheader overrides the server default for one call (matchingX-AFM-Profile/X-Session-Id), to skip a throwaway probe or a sensitive prompt.afminstances with different--transcript-dirvalues give isolated stores.Out of scope for now, each deferred for a specific reason:
Deferred
These came out of @Keesan12's review. I agree with the direction and want them in eventually, just not in the first version:
Why opt-in and off by default
Local inference is the privacy story, and silently writing every conversation to disk would break it.
--recordis off unless asked for and stays out of the request path when absent. Operators who want it always on could set a default via anAFM_RECORD=1env var (matchingAFM_DEBUG/AFM_PERF), but the shipped binary stays off by default.Test plan
--recordoff (default): no recorder, no~/.afm/sessions, request path unchanged.session_meta+ every request message + one assistant line.tool_callson the assistant line andtool_call_id/nameon the tool message.reasoningseparately fromcontent.session_metaand every assistant line carryplatform,afm_version, andbackend.