fix: reduce memory overflow from checkpoint reads and writes (#344) #571

Open
svarlamov wants to merge 3 commits into main from devin/1771643297-memory-overflow-fixes

Conversation

@svarlamov (Member) commented Feb 21, 2026

fix: reduce memory overflow from checkpoint reads and writes (#344)

Summary

Addresses the runaway memory usage (30-60 GB) reported in #344 by fixing four high-impact patterns in checkpoint I/O:

1. Append-only checkpoint writes (repo_storage.rs): append_checkpoint previously read ALL checkpoints into memory, appended one, then wrote ALL back — O(N) memory and I/O per append. Now it opens the file in append mode and writes a single JSONL line. Char-level attribution pruning is deferred to write_all_checkpoints, which is called during post-commit anyway.

2. Eliminate redundant full reads (checkpoint.rs): A single checkpoint::run() call previously triggered 4+ independent read_all_checkpoints() deserializations of the entire JSONL file. Now checkpoints are read once at the top of run() and passed through to get_all_tracked_files via a new preloaded_checkpoints parameter.

3. Streaming reads (repo_storage.rs): read_all_checkpoints now reads line by line through BufReader instead of calling fs::read_to_string, so the full file string and the parsed structs are never held in memory at the same time.

4. BufWriter for writes (repo_storage.rs): write_all_checkpoints now streams serialization through BufWriter instead of building a full string in memory. An explicit flush() call ensures write errors are propagated rather than silently dropped on BufWriter::drop.
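
The append-only write and the streaming read described in items 1 and 3 can be sketched roughly as follows. This is a minimal illustration using only the standard library, not the code in repo_storage.rs: the function names `append_line` and `read_lines` are hypothetical, and records are kept as raw JSON strings where the real code would deserialize each line into a checkpoint struct.

```rust
use std::fs::{File, OpenOptions};
use std::io::{self, BufRead, BufReader, Write};
use std::path::Path;

/// Append one already-serialized JSON record as a single JSONL line.
/// (Hypothetical helper; the file is never read back during an append.)
fn append_line(path: &Path, json: &str) -> io::Result<()> {
    let mut file = OpenOptions::new().create(true).append(true).open(path)?;
    // Build the complete line first so it is handed to the OS in one buffer.
    let mut line = String::with_capacity(json.len() + 1);
    line.push_str(json);
    line.push('\n');
    file.write_all(line.as_bytes())
}

/// Read records one line at a time instead of loading the whole file
/// into a single string. Blank lines are skipped.
fn read_lines(path: &Path) -> io::Result<Vec<String>> {
    let reader = BufReader::new(File::open(path)?);
    let mut records = Vec::new();
    for line in reader.lines() {
        let line = line?;
        if !line.trim().is_empty() {
            // The real code would parse the line into a struct here.
            records.push(line);
        }
    }
    Ok(records)
}
```

With this shape, the cost of an append depends only on the size of the new record, and a read only ever holds one raw line alongside the records parsed so far.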

All 31 checkpoint-related unit tests pass. No new dependencies added.

Updates since last revision

  • Added explicit writer.flush()?; in write_all_checkpoints to address review feedback: without it, any I/O error during the implicit flush on BufWriter drop would be silently ignored, potentially causing truncated/corrupt JSONL files.
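
A minimal sketch of why the explicit flush matters, assuming a hypothetical `write_all_lines` helper that stands in for write_all_checkpoints and takes pre-serialized records. When a BufWriter is dropped it still tries to flush, but any error from that attempt is discarded because Drop cannot return a Result.

```rust
use std::fs::File;
use std::io::{self, BufWriter, Write};
use std::path::Path;

/// Rewrite the whole file, one record per line, through a buffer.
/// (Hypothetical stand-in; records are already-serialized JSON strings.)
fn write_all_lines(path: &Path, records: &[String]) -> io::Result<()> {
    let mut writer = BufWriter::new(File::create(path)?);
    for record in records {
        writer.write_all(record.as_bytes())?;
        writer.write_all(b"\n")?;
    }
    // Flush explicitly so a failure on the last buffered chunk is returned
    // to the caller instead of being swallowed when `writer` is dropped.
    writer.flush()?;
    Ok(())
}
```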

Review & Testing Checklist for Human

  • Deferred pruning safety: prune_old_char_attributions was moved from append_checkpoint to write_all_checkpoints. Between successive appends (before post-commit calls write_all_checkpoints), the JSONL file will contain un-pruned char attributions on older entries. Verify that no code path between append and write_all depends on the attributions on disk having already been pruned. Check callers of read_all_checkpoints that run between checkpoint appends and commit.
  • has_no_ai_edits logic equivalence: The early-exit check in checkpoint::run() was rewritten from all_ai_touched_files().is_empty() to checkpoints.iter().all(|cp| cp.entries.is_empty() || cp.kind != AiAgent/AiTab), where the last condition is shorthand for "the kind is neither AiAgent nor AiTab". These should be logically equivalent, but the negated condition inside all() is easy to get wrong, so it is worth a careful trace through both code paths.
  • Real-world validation: Test with a repo that has a large checkpoint file (>100MB) and multiple agent sessions. Verify memory usage stays reasonable during git commit and that attributions are correctly preserved end-to-end. Unit tests validate correctness but not the memory improvement.
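  • Concurrent append safety: append_checkpoint now uses OpenOptions::append(). If multiple processes append simultaneously (unlikely but possible with parallel agent runs), JSONL lines could interleave. Note that PIPE_BUF (typically 4096 bytes on Linux) bounds atomic writes to pipes and FIFOs, not regular files; for a regular file opened in append mode, the main risk is a line being emitted through more than one write() call, so verify that each line goes out in a single call.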
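
The has_no_ai_edits equivalence in the checklist can be checked against a small model. The sketch below is an assumption-heavy illustration: the `Kind` enum, the `Checkpoint` struct with a plain list of file names, and both function names are hypothetical, and the "old" version only approximates what all_ai_touched_files() is described as doing.

```rust
use std::collections::BTreeSet;

#[derive(Clone, Copy, PartialEq, Debug)]
enum Kind {
    Human,
    AiAgent,
    AiTab,
}

/// Hypothetical, simplified checkpoint: a kind plus the files it touched.
struct Checkpoint {
    kind: Kind,
    entries: Vec<String>,
}

fn is_ai(kind: Kind) -> bool {
    matches!(kind, Kind::AiAgent | Kind::AiTab)
}

/// Old style: collect every file touched by an AI checkpoint, then test emptiness.
fn no_ai_edits_via_set(checkpoints: &[Checkpoint]) -> bool {
    let touched: BTreeSet<&str> = checkpoints
        .iter()
        .filter(|cp| is_ai(cp.kind))
        .flat_map(|cp| cp.entries.iter().map(String::as_str))
        .collect();
    touched.is_empty()
}

/// New style: every checkpoint either has no entries or is not an AI checkpoint.
fn no_ai_edits_via_all(checkpoints: &[Checkpoint]) -> bool {
    checkpoints
        .iter()
        .all(|cp| cp.entries.is_empty() || !is_ai(cp.kind))
}
```

Under this model the two forms agree because "no AI checkpoint contributes a file" is the negation of "some checkpoint is AI and has at least one entry". The cases worth tracing in the real code are an empty checkpoint list, an AI checkpoint with no entries, and a human checkpoint with entries.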

Notes

  • The write_all_checkpoints signature changed from &[Checkpoint] to &mut [Checkpoint] to allow in-place pruning. All callers updated.
  • get_all_tracked_files gained an optional preloaded_checkpoints parameter. Existing callers that don't pass it will still work (reads from disk as before).
  • No changes to checkpoint format or serialization: this is purely an I/O optimization.
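
The optional preloaded_checkpoints parameter from the notes above might look something like the following. This is a sketch under stated assumptions, not the actual signature of get_all_tracked_files: `tracked_files`, the simplified `Checkpoint` struct, and the `load` callback (which stands in for the disk read) are all hypothetical.

```rust
/// Hypothetical, simplified checkpoint holding only file names.
struct Checkpoint {
    files: Vec<String>,
}

/// Return the sorted, de-duplicated set of tracked files.
/// If `preloaded` is given it is used directly; otherwise `load` is called,
/// which models falling back to reading the checkpoints from disk.
fn tracked_files(
    preloaded: Option<&[Checkpoint]>,
    load: impl FnOnce() -> Vec<Checkpoint>,
) -> Vec<String> {
    // `owned` is only initialized on the fallback path and must outlive the borrow.
    let owned;
    let checkpoints: &[Checkpoint] = match preloaded {
        Some(cps) => cps,
        None => {
            owned = load();
            owned.as_slice()
        }
    };
    let mut files: Vec<String> = checkpoints
        .iter()
        .flat_map(|cp| cp.files.iter().cloned())
        .collect();
    files.sort();
    files.dedup();
    files
}
```

A caller that has already read the checkpoints passes `Some(slice)` and avoids a second deserialization, while a caller passing `None` keeps the previous read-from-disk behaviour.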

Link to Devin run: https://app.devin.ai/sessions/2a46b6eaa71f4f46913488bef2ff52a1
Requested by: @svarlamov


- Make append_checkpoint truly append-only (O(1) instead of O(N) read-write-all)
- Use BufReader for streaming JSONL reads instead of read_to_string
- Eliminate 3 redundant read_all_checkpoints() calls in checkpoint::run()
- Pass pre-loaded checkpoints to get_all_tracked_files
- Defer char-level attribution pruning to write_all_checkpoints
- Use BufWriter for efficient checkpoint serialization

Addresses #344

Co-Authored-By: Sasha Varlamov <sasha@sashavarlamov.com>
@devin-ai-integration
Contributor

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

@git-ai-cloud-dev

No AI authorship found for these commits. Please install git-ai to start tracking AI generated code in your commits.

@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.
You have signed the CLA already but the status is still pending? Let us recheck it.

devin-ai-integration[bot]

This comment was marked as resolved.

Co-Authored-By: Sasha Varlamov <sasha@sashavarlamov.com>
@svarlamov
Member Author

devin review devin review's feedback

…ent data loss

Co-Authored-By: Sasha Varlamov <sasha@sashavarlamov.com>