Proposal: range.original for post-processing attribution drift #5

@jwa91

Problem

In practice, nearly every AI coding tool runs formatters or linters after the agent writes code (Prettier on save, ruff format via pre-commit, gofmt, etc.). These tools change line numbers and sometimes relocate blocks entirely (e.g., import sorting).

This means start_line/end_line in a trace record often don't match the final committed file. Attribution queries against the actual file on disk return wrong results.

The spec already has content_hash for position-independent tracking, but it doesn't help when formatters rewrite content (quote style, line wrapping, trailing commas). The hash changes, and the original coordinates are lost.
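
To make the hash problem concrete, here is a minimal sketch of a one-line quote-style rewrite changing the hash. The content_hash helper, the "sha256:<hex>" layout, and the newline-joined input are assumptions for illustration; the spec may define the hash input differently.

```python
import hashlib


def content_hash(lines):
    """Hash a range's text. The 'sha256:<hex>' layout and the
    newline join are assumptions, not taken from the spec."""
    text = "\n".join(lines)
    return "sha256:" + hashlib.sha256(text.encode("utf-8")).hexdigest()


written = ["name = 'trace'"]    # what the agent wrote
formatted = ['name = "trace"']  # same statement after a quote-style rewrite

# The two hashes differ, so the pre-formatting hash cannot locate
# the formatted content in the committed file.
hashes_match = content_hash(written) == content_hash(formatted)
print(hashes_match)
```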

Proposal

Add one optional field to the range object: original.

{
  "start_line": 1,
  "end_line": 10,
  "content_hash": "sha256:...",
  "original": {
    "start_line": 10,
    "end_line": 18,
    "content_hash": "sha256:..."
  }
}

Semantics

  • start_line/end_line (existing) refer to the post-processed file at the recorded revision.
  • original.start_line/original.end_line refer to what the agent actually wrote, before any formatters/linters ran.
  • original.content_hash (if present) is computed over the pre-formatting content.

When original is absent, the existing behavior is unchanged: start_line/end_line are the only coordinates.
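
A consumer could apply these semantics roughly as follows. This is a sketch only: the coordinates function and its view parameter are hypothetical names, not part of the spec.

```python
def coordinates(range_obj, view="final"):
    """Return (start_line, end_line) for the requested view.

    view="final"    -> post-processed file at the recorded revision
    view="original" -> what the agent wrote, before formatters ran
    Falls back to the top-level coordinates when `original` is absent.
    """
    if view == "original" and "original" in range_obj:
        source = range_obj["original"]
    else:
        source = range_obj
    return source["start_line"], source["end_line"]


with_original = {
    "start_line": 1,
    "end_line": 10,
    "original": {"start_line": 10, "end_line": 18},
}
without_original = {"start_line": 1, "end_line": 10}

print(coordinates(with_original, "final"))
print(coordinates(with_original, "original"))
print(coordinates(without_original, "original"))
```

Falling back to the top-level coordinates keeps readers of older records working without a special case.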

Schema addition

{
  "$defs": {
    "range_original": {
      "type": "object",
      "required": ["start_line", "end_line"],
      "properties": {
        "start_line": { "type": "integer", "minimum": 1 },
        "end_line": { "type": "integer", "minimum": 1 },
        "content_hash": { "type": "string" }
      }
    }
  }
}

And add to the existing range definition:

"original": { "$ref": "#/$defs/range_original" }
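
For tools that do not run a JSON Schema validator, the same constraints can be checked by hand. The sketch below mirrors the range_original definition above; validate_range_original is a hypothetical helper, and it is slightly stricter than JSON Schema in that it rejects floats such as 1.0.

```python
def validate_range_original(obj):
    """Return a list of problems; an empty list means obj fits
    the range_original definition."""
    if not isinstance(obj, dict):
        return ["original must be an object"]
    problems = []
    for key in ("start_line", "end_line"):
        if key not in obj:
            problems.append(f"{key} is required")
            continue
        value = obj[key]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, int) or value < 1:
            problems.append(f"{key} must be an integer >= 1")
    if "content_hash" in obj and not isinstance(obj["content_hash"], str):
        problems.append("content_hash must be a string")
    return problems


print(validate_range_original({"start_line": 10, "end_line": 18}))
print(validate_range_original({"start_line": 0}))
```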

Example

The agent writes lines 10-18 in src/utils.py. A pre-commit hook then runs ruff, which reformats the code and sorts imports. The same content ends up at lines 1-10 of the committed file.

{
  "version": "0.1.0",
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "timestamp": "2026-01-31T12:00:00Z",
  "files": [
    {
      "path": "src/utils.py",
      "conversations": [
        {
          "url": "https://api.example.com/v1/conversations/abc",
          "contributor": {
            "type": "ai",
            "model_id": "anthropic/claude-opus-4-5-20251101"
          },
          "ranges": [
            {
              "start_line": 1,
              "end_line": 10,
              "content_hash": "sha256:a1b2c3d4",
              "original": {
                "start_line": 10,
                "end_line": 18,
                "content_hash": "sha256:e5f6a7b8"
              }
            }
          ]
        }
      ]
    }
  ]
}
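
An attribution query against a record like this could look like the sketch below. It matches on the top-level start_line/end_line, since those describe the committed file. The attribute_line function is a hypothetical helper, and the trace is trimmed to the fields the lookup needs.

```python
def attribute_line(trace, path, line):
    """Return the contributors whose post-processed range covers
    `line` in `path`."""
    hits = []
    for file_entry in trace.get("files", []):
        if file_entry.get("path") != path:
            continue
        for conversation in file_entry.get("conversations", []):
            for r in conversation.get("ranges", []):
                if r["start_line"] <= line <= r["end_line"]:
                    hits.append(conversation["contributor"])
    return hits


trace = {
    "files": [{
        "path": "src/utils.py",
        "conversations": [{
            "contributor": {
                "type": "ai",
                "model_id": "anthropic/claude-opus-4-5-20251101",
            },
            "ranges": [{
                "start_line": 1,
                "end_line": 10,
                "original": {"start_line": 10, "end_line": 18},
            }],
        }],
    }]
}

print(attribute_line(trace, "src/utils.py", 5))   # inside the committed range
print(attribute_line(trace, "src/utils.py", 15))  # only inside the original range
```

Line 15 falls inside original but outside the committed range, so it returns no contributor. A record that stored only the agent's pre-formatting coordinates would give the opposite, wrong answer for both lines.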
