Agent Trace is a file-centric attribution format. The data model reads naturally as: for each file, here are the conversations that contributed to it, and for each conversation, here are the line ranges it produced.
TraceRecord -> files[] -> conversations[] -> ranges[]
The schema is flexible. A trace record can hold one file or many. A file can have one conversation or many. This means the spec can work at a unit level (one trace per edit, one file, one conversation) or at an aggregation level (one trace per session, many files, many conversations). The reference implementation uses the unit level. The schema structure suggests the aggregation level. Both are valid, and the spec doesn't need to pick one.
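As a rough illustration, a consumer can walk both shapes with the same loop. This is a minimal Python sketch, not part of the spec or the reference implementation. The helper name `iter_attributions` and both sample records are made up for this example; the field names follow the traces shown later in this issue.

```python
from __future__ import annotations

from typing import Iterator


def iter_attributions(record: dict) -> Iterator[tuple[str, dict, dict]]:
    """Yield (file path, conversation, range) for every range in a trace record."""
    for file in record.get("files", []):
        for conversation in file.get("conversations", []):
            for line_range in conversation.get("ranges", []):
                yield file["path"], conversation, line_range


# Unit level: one trace, one file, one conversation (illustrative data).
unit_record = {
    "files": [{
        "path": "src/auth.ts",
        "conversations": [{
            "contributor": {"type": "ai"},
            "ranges": [{"start_line": 10, "end_line": 25}],
        }],
    }],
}

# Aggregation level: one trace, many files, many conversations (illustrative data).
session_record = {
    "files": [
        {
            "path": "src/auth.ts",
            "conversations": [
                {"contributor": {"type": "ai"},
                 "ranges": [{"start_line": 10, "end_line": 25}]},
                {"contributor": {"type": "human"},
                 "ranges": [{"start_line": 30, "end_line": 50}]},
            ],
        },
        {
            "path": "src/auth.test.ts",
            "conversations": [
                {"contributor": {"type": "ai"},
                 "ranges": [{"start_line": 1, "end_line": 80}]},
            ],
        },
    ],
}

unit_rows = list(iter_attributions(unit_record))
session_rows = list(iter_attributions(session_record))
```

The same traversal covers both levels, which is why the spec does not have to pick one.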
But regardless of which level a producer uses, there are three things in the data model that need to be clearer for consumers to make sense of the data.
Conversation is not the same as a prompt
The schema ties ranges to conversations. A conversation has a url, a contributor, and ranges. But in practice, a conversation is a multi-turn dialogue. You open a chat, you send many prompts, the AI responds to each one.
Conversation #12345 (lasts 30 minutes)
  Prompt 1: "Add input validation" -> edits auth.ts lines 10-25
  Prompt 2: "Now add error handling" -> edits auth.ts lines 30-50
  Prompt 3: "Fix the type error" -> edits auth.ts line 12
  Prompt 4: "Write tests for this" -> creates auth.test.ts lines 1-80
Four prompts, one conversation. Each prompt caused a different change. But in Agent Trace, all four would share the same conversation URL. The schema groups ranges by conversation, not by prompt.
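To make the loss concrete, here is a small Python sketch of the example above. The `edits` list, the `to_files` helper, and the placeholder URL are assumptions made for illustration; only the output shape (path, conversations, url, ranges) comes from the schema.

```python
from collections import defaultdict

# Hypothetical log of the four prompts: (prompt, file, start_line, end_line).
edits = [
    ("Add input validation", "auth.ts", 10, 25),
    ("Now add error handling", "auth.ts", 30, 50),
    ("Fix the type error", "auth.ts", 12, 12),
    ("Write tests for this", "auth.test.ts", 1, 80),
]

CONVERSATION_URL = "https://example.com/conversations/12345"  # placeholder URL


def to_files(edits, url):
    """Group edits the way the schema does: per file, then per conversation."""
    ranges_by_path = defaultdict(list)
    for _prompt, path, start, end in edits:  # the prompt text is dropped here
        ranges_by_path[path].append({"start_line": start, "end_line": end})
    return [
        {"path": path, "conversations": [{"url": url, "ranges": ranges}]}
        for path, ranges in ranges_by_path.items()
    ]


files = to_files(edits, CONVERSATION_URL)
```

After grouping, auth.ts carries three ranges under one conversation URL, and nothing in the output says which prompt produced which range.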
This matters because the prompt is the actual unit of causation. When someone asks "why does this code look like this?", the answer is "because the developer asked the AI to add error handling" (prompt 2), not "because there was a conversation" (conversation #12345). The conversation is context. The prompt is the cause.
There's no prompt concept in the schema. The closest thing is Cursor's generation_id in metadata, which roughly maps to a single prompt-response cycle. But that's vendor-specific metadata, not a first-class field.
The reference implementation sidesteps this by creating one trace per tool call, which is even more granular than a prompt. One prompt might trigger three tool calls (edit file A, edit file B, run tests). So you end up with three levels of granularity that don't align:
Conversation (many prompts, one URL) -> what the schema models
Prompt (one user action, many tool calls) -> what actually caused the change
Tool call (one file edit) -> what the reference impl produces
The spec groups ranges at the conversation level, the reference implementation emits at the tool-call level, and neither captures the prompt level, which is the one that matters most for understanding why code was written.
Suggestion: Add an optional prompt_id to the conversation object, so consumers can identify which specific prompt within a conversation produced a set of ranges. This doesn't break anything. Producers that don't have prompt-level visibility can leave it empty. Producers that do (like Cursor with its generation_id) can populate it as a first-class field instead of burying it in metadata.
{
  "conversation": {
    "properties": {
      "url": { "type": "string", "format": "uri" },
      "prompt_id": {
        "type": "string",
        "description": "Identifies the specific prompt or generation within the conversation that produced these ranges"
      },
      "contributor": { "$ref": "#/$defs/contributor" },
      "ranges": { "type": "array" }
    }
  }
}
Conversations have no timestamp
The only timestamp in the schema is on the trace record:
TraceRecord <- has timestamp
  files[] <- no timestamp
    conversations[] <- no timestamp
      ranges[] <- no timestamp
When a trace record has one file and one conversation (the reference implementation pattern), this is fine. The trace-level timestamp covers it.
When a trace record has multiple conversations for the same file, the single timestamp is no longer enough. Consider this valid trace:
{
  "timestamp": "2026-01-23T14:30:00Z",
  "files": [{
    "path": "src/auth.ts",
    "conversations": [
      {
        "contributor": { "type": "ai", "model_id": "anthropic/claude-sonnet-4-20250514" },
        "ranges": [{ "start_line": 10, "end_line": 25 }]
      },
      {
        "contributor": { "type": "human" },
        "ranges": [{ "start_line": 10, "end_line": 25 }]
      }
    ]
  }]
}
Both conversations claim the same lines. One says AI, one says human. There's one timestamp covering the whole record. A consumer can't tell which conversation is current.
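A consumer can at least detect this situation, even if it can't resolve it. The following is a minimal Python sketch; `overlaps` and `conflicting_claims` are hypothetical helper names, and the file entry is copied from the trace above.

```python
from itertools import combinations


def overlaps(a, b):
    """True if two inclusive line ranges share at least one line."""
    return a["start_line"] <= b["end_line"] and b["start_line"] <= a["end_line"]


def conflicting_claims(file_entry):
    """Return index pairs of conversations in one file entry whose ranges overlap."""
    conflicts = []
    conversations = file_entry.get("conversations", [])
    for (i, first), (j, second) in combinations(enumerate(conversations), 2):
        if any(overlaps(a, b) for a in first["ranges"] for b in second["ranges"]):
            conflicts.append((i, j))
    return conflicts


file_entry = {
    "path": "src/auth.ts",
    "conversations": [
        {"contributor": {"type": "ai", "model_id": "anthropic/claude-sonnet-4-20250514"},
         "ranges": [{"start_line": 10, "end_line": 25}]},
        {"contributor": {"type": "human"},
         "ranges": [{"start_line": 10, "end_line": 25}]},
    ],
}

conflicts = conflicting_claims(file_entry)
```

Detection is as far as the current schema goes: the pair (0, 1) is flagged, but nothing in the record says which of the two claims is newer.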
You might think the conversation url could help here. If the URL points to something like https://api.cursor.com/v1/conversations/12345, maybe that resource has a timestamp you could use. But that doesn't work as a general solution:
- The url is optional. A minimal valid conversation is just ranges.
- For Claude Code, the URL is a local file path (file:///path/to/transcript.jsonl) that may not be accessible to other consumers.
- Resolving a URL requires a network call, which contradicts Goal #4 ("readable without special tooling").
- For human contributions, there is no URL at all. A human editing code doesn't produce a conversation endpoint or a transcript. The URL-based path to temporal information fails entirely for the human case.
Across separate trace records, this is not a problem. Each record has its own timestamp, and a consumer can use last-write-wins. The issue is specifically within a single trace record when multiple conversations touch the same file and the same lines.
Suggestion: Add a timestamp field to the conversation object.
{
  "conversation": {
    "properties": {
      "url": { "type": "string", "format": "uri" },
      "timestamp": {
        "type": "string",
        "format": "date-time",
        "description": "When this conversation's contribution was recorded"
      },
      "prompt_id": { "type": "string" },
      "contributor": { "$ref": "#/$defs/contributor" },
      "ranges": { "type": "array" }
    }
  }
}
For AI conversations, the hook already knows when the edit happened. This is straightforward.
For human conversations, the timestamp would come from whatever process creates the attribution (e.g., a commit-time tool). It would reflect when the attribution was recorded, not when the human actually typed the code. That's imperfect, but it's better than having no temporal information at all. And it makes the data model self-contained instead of depending on external URL resolution.
With timestamps on conversations, consumers get a natural resolution rule: when two conversations claim the same lines, the one with the later timestamp takes precedence.
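A consumer could apply that rule roughly as follows. This is a Python sketch under stated assumptions: every conversation carries the proposed timestamp field (a real consumer would need a fallback when it is absent), the helper names `parse_ts` and `attribute_lines` are made up, and the sample timestamps are illustrative.

```python
from datetime import datetime


def parse_ts(value):
    # fromisoformat() only accepts a trailing "Z" from Python 3.11 on, so normalize it.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def attribute_lines(file_entry):
    """Map each line number to the contributor type of the latest conversation claiming it."""
    winner = {}
    ordered = sorted(file_entry["conversations"], key=lambda c: parse_ts(c["timestamp"]))
    for conversation in ordered:  # later conversations overwrite earlier ones
        for r in conversation["ranges"]:
            for line in range(r["start_line"], r["end_line"] + 1):
                winner[line] = conversation["contributor"]["type"]
    return winner


file_entry = {
    "path": "src/auth.ts",
    "conversations": [
        {"timestamp": "2026-01-23T14:45:00Z",  # listed first, but recorded later
         "contributor": {"type": "human"},
         "ranges": [{"start_line": 12, "end_line": 12}]},
        {"timestamp": "2026-01-23T14:30:00Z",
         "contributor": {"type": "ai"},
         "ranges": [{"start_line": 10, "end_line": 25}]},
    ],
}

owners = attribute_lines(file_entry)
```

Here line 12 resolves to the human contribution because its timestamp is later, regardless of the order in which the conversations appear in the array.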
Correlating traces from the same prompt
The schema supports putting multiple files in a single trace record. If a producer does that, grouping is built in. One trace, multiple files, all from the same prompt. No extra fields needed.
But in practice, hook-based architectures make this hard. Hooks fire per tool call. When Claude Code edits a file, the PostToolUse hook fires, and the reference implementation creates a trace right there. It doesn't know if more edits are coming from the same prompt. It doesn't have visibility into prompt boundaries. So it emits one trace per file.
This is the pattern most producers will follow, because most AI coding tools expose hooks at the tool-call level, not the prompt level. Buffering edits until a prompt completes is possible but adds complexity and timing concerns (when is a prompt "done"?).
So in practice, a single prompt that edits 3 files produces 3 trace records:
{"id": "aaa-111", "timestamp": "2026-01-23T14:30:00Z", "files": [{"path": "src/auth.ts", ...}]}
{"id": "bbb-222", "timestamp": "2026-01-23T14:30:01Z", "files": [{"path": "src/api.ts", ...}]}
{"id": "ccc-333", "timestamp": "2026-01-23T14:30:02Z", "files": [{"path": "tests/auth.test.ts", ...}]}
Nothing connects them. A consumer sees three separate records and has no standard way to know they came from the same user action.
Cursor works around this by putting generation_id in metadata. Claude Code puts session_id in metadata. A third tool might use something else. These are all vendor-specific keys in an unstructured object. A consumer trying to group related traces has to know the conventions of every producer.
Suggestion: Add an optional correlation_id at the trace record level for producers that emit per-file traces.
{
  "properties": {
    "correlation_id": {
      "type": "string",
      "description": "Groups trace records that originated from the same user-initiated action"
    }
  }
}
Producers that use multi-file trace records don't need this. Producers that emit per tool call can set it to a shared ID across all traces from the same prompt. Consumers get a standard field to group on instead of parsing vendor-specific metadata.
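On the consumer side, grouping then becomes a single pass over the records. The Python sketch below assumes the proposed correlation_id field; the helper name `group_by_correlation`, the value "prompt-42", and the fourth record are made up for the example.

```python
from collections import defaultdict


def group_by_correlation(records):
    """Group trace record ids by correlation_id."""
    groups = defaultdict(list)
    for record in records:
        # The field is optional: records without it form a group of their own.
        key = record.get("correlation_id", record["id"])
        groups[key].append(record["id"])
    return dict(groups)


records = [
    {"id": "aaa-111", "correlation_id": "prompt-42", "files": [{"path": "src/auth.ts"}]},
    {"id": "bbb-222", "correlation_id": "prompt-42", "files": [{"path": "src/api.ts"}]},
    {"id": "ccc-333", "correlation_id": "prompt-42", "files": [{"path": "tests/auth.test.ts"}]},
    {"id": "ddd-444", "files": [{"path": "README.md"}]},  # no correlation_id
]

groups = group_by_correlation(records)
```

The three per-file traces land in one group without the consumer knowing which tool produced them.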
Summary
The schema's flexibility is a strength. It works at both the unit level and the aggregation level. These suggestions don't constrain that. They fill in three gaps that make the data hard to consume:
- prompt_id on conversations so consumers can identify the specific user action that caused a set of ranges, rather than just the broader conversation.
- timestamp on conversations so consumers can resolve overlapping ranges within a single trace record without depending on external URL resolution.
- correlation_id on trace records so consumers can group per-file traces that came from the same prompt, without relying on vendor-specific metadata conventions.
All three are optional fields. They don't break existing traces. They give producers a standard way to express information that's currently either lost or buried in vendor-specific metadata.