fix: reduce memory overflow from checkpoint reads and writes (#344) #571
Open
- Make `append_checkpoint` truly append-only (O(1) instead of O(N) read-write-all)
- Use `BufReader` for streaming JSONL reads instead of `read_to_string`
- Eliminate 3 redundant `read_all_checkpoints()` calls in `checkpoint::run()`
- Pass pre-loaded checkpoints to `get_all_tracked_files`
- Defer char-level attribution pruning to `write_all_checkpoints`
- Use `BufWriter` for efficient checkpoint serialization

Addresses #344

Co-Authored-By: Sasha Varlamov <sasha@sashavarlamov.com>
fix: reduce memory overflow from checkpoint reads and writes (#344)
Summary
Addresses the runaway memory usage (30-60GB) reported in #344 by fixing the highest-impact patterns in checkpoint I/O:
1. Append-only checkpoint writes (`repo_storage.rs`): `append_checkpoint` previously read ALL checkpoints into memory, appended one, then wrote ALL back (O(N) memory and I/O per append). Now it opens the file in append mode and writes a single JSONL line. Char-level attribution pruning is deferred to `write_all_checkpoints`, which is called during post-commit anyway.
2. Eliminate redundant full reads (`checkpoint.rs`): a single `checkpoint::run()` call previously triggered 4+ independent `read_all_checkpoints()` deserializations of the entire JSONL file. Now checkpoints are read once at the top of `run()` and passed through to `get_all_tracked_files` via a new `preloaded_checkpoints` parameter.
3. Streaming reads (`repo_storage.rs`): `read_all_checkpoints` now uses `BufReader` line-by-line instead of `fs::read_to_string`, avoiding holding the full file string and the parsed structs in memory simultaneously.
4. BufWriter for writes (`repo_storage.rs`): `write_all_checkpoints` now streams serialization through `BufWriter` instead of building a full string in memory. An explicit `flush()` call ensures write errors are propagated rather than silently dropped on `BufWriter::drop`.

All 31 checkpoint-related unit tests pass. No new dependencies added.
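The three I/O patterns above (append-only writes, streaming reads, buffered rewrite with explicit flush) can be sketched roughly as follows. This is a minimal illustration, not the actual `repo_storage.rs` code: the hypothetical `Checkpoint` struct here carries a pre-serialized JSONL line, whereas the real one has typed fields, uses serde, and the real `write_all_checkpoints` takes `&mut [Checkpoint]` to prune in place.

```rust
use std::fs::{File, OpenOptions};
use std::io::{BufRead, BufReader, BufWriter, Write};
use std::path::Path;

// Hypothetical minimal record; the real Checkpoint struct in
// repo_storage.rs has typed fields and (de)serializes via serde_json.
#[derive(Debug, Clone, PartialEq)]
struct Checkpoint {
    json: String, // one pre-serialized JSONL line
}

// Item 1: O(1) append. Open in append mode and write one line, instead of
// reading every checkpoint, pushing one, and rewriting the whole file.
fn append_checkpoint(path: &Path, cp: &Checkpoint) -> std::io::Result<()> {
    let mut file = OpenOptions::new().create(true).append(true).open(path)?;
    writeln!(file, "{}", cp.json)
}

// Item 3: streaming read. BufReader yields one line at a time, so the
// whole file string and the parsed structs never coexist in memory.
fn read_all_checkpoints(path: &Path) -> std::io::Result<Vec<Checkpoint>> {
    let reader = BufReader::new(File::open(path)?);
    let mut checkpoints = Vec::new();
    for line in reader.lines() {
        let line = line?;
        if !line.trim().is_empty() {
            checkpoints.push(Checkpoint { json: line });
        }
    }
    Ok(checkpoints)
}

// Item 4: buffered rewrite with an explicit flush. Errors raised during
// BufWriter's implicit flush on drop are swallowed, so flushing here
// surfaces a failed write instead of leaving a truncated file.
fn write_all_checkpoints(path: &Path, checkpoints: &[Checkpoint]) -> std::io::Result<()> {
    let mut writer = BufWriter::new(File::create(path)?);
    for cp in checkpoints {
        writeln!(writer, "{}", cp.json)?;
    }
    writer.flush()
}

fn main() -> std::io::Result<()> {
    let path = std::env::temp_dir().join("checkpoints_sketch.jsonl");
    let _ = std::fs::remove_file(&path);
    append_checkpoint(&path, &Checkpoint { json: r#"{"id":1}"#.to_string() })?;
    append_checkpoint(&path, &Checkpoint { json: r#"{"id":2}"#.to_string() })?;
    let cps = read_all_checkpoints(&path)?;
    write_all_checkpoints(&path, &cps)?;
    println!("round-tripped {} checkpoints", cps.len());
    std::fs::remove_file(&path)
}
```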
Updates since last revision

Added `writer.flush()?;` in `write_all_checkpoints` to address review feedback: without it, any I/O error during the implicit flush on `BufWriter` drop would be silently ignored, potentially causing truncated/corrupt JSONL files.

Review & Testing Checklist for Human
- `prune_old_char_attributions` was moved from `append_checkpoint` to `write_all_checkpoints`. Between successive appends (before post-commit calls `write_all_checkpoints`), the JSONL file will contain un-pruned char attributions on older entries. Verify that no code path between append and write-all depends on pruned attributions being present on disk. Check callers of `read_all_checkpoints` that run between checkpoint appends and commit.
- `has_no_ai_edits` logic equivalence: the early-exit check in `checkpoint::run()` was rewritten from `all_ai_touched_files().is_empty()` to `checkpoints.iter().all(|cp| cp.entries.is_empty() || cp.kind != AiAgent/AiTab)`. These should be logically equivalent, but the double-negative is easy to get wrong; worth a careful trace through both code paths.
- Run a real `git commit` and verify that attributions are correctly preserved end-to-end. Unit tests validate correctness but not the memory improvement.
- `append_checkpoint` now uses `OpenOptions::append()`. If multiple processes append simultaneously (unlikely but possible with parallel agent runs), JSONL lines could interleave if they exceed PIPE_BUF (typically 4096 bytes).
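For reviewers tracing the `has_no_ai_edits` equivalence, the rewritten check can be sketched as below. The types are hypothetical stand-ins (the real `Checkpoint` and its kind enum live in the git-ai codebase); the equivalence holds under the assumption that AI-touched files come only from non-empty `AiAgent`/`AiTab` checkpoints.

```rust
// Hypothetical minimal types mirroring the checklist item; the real
// Checkpoint struct carries far more data.
#[allow(dead_code)]
#[derive(Clone, Copy, PartialEq, Debug)]
enum CheckpointKind {
    AiAgent,
    AiTab,
    Human,
}

struct Checkpoint {
    kind: CheckpointKind,
    entries: Vec<String>, // files touched by this checkpoint
}

// Rewritten early-exit: true when no checkpoint contributes AI edits,
// i.e. every checkpoint is either empty or not AI-authored. This is the
// contrapositive of "some non-empty AiAgent/AiTab checkpoint exists",
// which is what made all_ai_touched_files() non-empty before.
fn has_no_ai_edits(checkpoints: &[Checkpoint]) -> bool {
    checkpoints.iter().all(|cp| {
        cp.entries.is_empty()
            || !matches!(cp.kind, CheckpointKind::AiAgent | CheckpointKind::AiTab)
    })
}

fn main() {
    let empty: Vec<Checkpoint> = Vec::new();
    println!("no AI edits: {}", has_no_ai_edits(&empty));
}
```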
Notes

- `write_all_checkpoints` signature changed from `&[Checkpoint]` to `&mut [Checkpoint]` to allow in-place pruning. All callers updated.
- `get_all_tracked_files` gained an optional `preloaded_checkpoints` parameter. Existing callers that don't pass it will still work (reads from disk as before).

Link to Devin run: https://app.devin.ai/sessions/2a46b6eaa71f4f46913488bef2ff52a1

Requested by: @svarlamov
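The backward-compatible optional-parameter pattern mentioned in the notes might look roughly like this. Names and the disk loader are illustrative, not the actual git-ai API: callers that already hold the checkpoints pass `Some(...)` and skip the disk read; existing callers pass `None` and behave as before.

```rust
// Hypothetical minimal checkpoint; the real one has more fields.
struct Checkpoint {
    files: Vec<String>,
}

// Stand-in for the real on-disk read path.
fn read_all_checkpoints_from_disk() -> Vec<Checkpoint> {
    Vec::new()
}

// Borrow the preloaded slice when given; otherwise fall back to disk.
// Late initialization of `owned` keeps the fallback alive long enough
// to borrow without cloning the preloaded data.
fn get_all_tracked_files(preloaded_checkpoints: Option<&[Checkpoint]>) -> Vec<String> {
    let owned;
    let checkpoints: &[Checkpoint] = match preloaded_checkpoints {
        Some(cps) => cps,
        None => {
            owned = read_all_checkpoints_from_disk();
            &owned
        }
    };
    let mut files: Vec<String> = checkpoints
        .iter()
        .flat_map(|cp| cp.files.iter().cloned())
        .collect();
    files.sort();
    files.dedup();
    files
}

fn main() {
    let cps = vec![
        Checkpoint { files: vec!["src/b.rs".into(), "src/a.rs".into()] },
        Checkpoint { files: vec!["src/a.rs".into()] },
    ];
    println!("{:?}", get_all_tracked_files(Some(&cps)));
}
```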