fix(repo-manager): add file locking to prevent race conditions #528
base: main
Warning: Rate limit exceeded. @shanthanu47 has exceeded the limit for the number of commits or files that can be reviewed per hour. Please wait 21 minutes and 24 seconds before requesting another review.
Note: Other AI code review bot(s) detected. CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough
Added file locking around repository metadata writes using portalocker with a temp-file atomic replace, added portalocker to the dependencies, and updated exception wrapping in the Pydantic chat agent to produce distinct async/sync wrappers that log full stack traces and return descriptive error strings.
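The walkthrough describes the chat-agent change only in prose. A minimal sketch of the idea, assuming the wrapper chooses its form with `inspect.iscoroutinefunction`; the decorator name `wrap_tool_errors` and the error-string format are hypothetical and not taken from the PR:

```python
import asyncio
import functools
import inspect
import logging
from typing import Any, Callable

logger = logging.getLogger(__name__)


def wrap_tool_errors(func: Callable[..., Any]) -> Callable[..., Any]:
    """Hypothetical decorator: return a descriptive error string instead of raising.

    Coroutine functions get an async wrapper (so callers can still await them);
    plain functions get a sync wrapper.
    """
    if inspect.iscoroutinefunction(func):
        @functools.wraps(func)
        async def async_wrapper(*args: Any, **kwargs: Any) -> Any:
            try:
                return await func(*args, **kwargs)
            except Exception as exc:
                # logger.exception logs at ERROR level with the full traceback
                logger.exception("Tool %s failed", func.__name__)
                return f"Error in {func.__name__}: {type(exc).__name__}: {exc}"
        return async_wrapper

    @functools.wraps(func)
    def sync_wrapper(*args: Any, **kwargs: Any) -> Any:
        try:
            return func(*args, **kwargs)
        except Exception as exc:
            logger.exception("Tool %s failed", func.__name__)
            return f"Error in {func.__name__}: {type(exc).__name__}: {exc}"
    return sync_wrapper


@wrap_tool_errors
def divide(a: float, b: float) -> float:
    return a / b


@wrap_tool_errors
async def fetch(key: str) -> str:
    raise KeyError(key)
```

Keeping two separate wrappers matters because a single sync wrapper around a coroutine function would return the coroutine object unawaited, so exceptions raised inside it would escape the `try` block.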
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~20 minutes
Pre-merge checks and finishing touches
❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
Actionable comments posted: 1
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
app/modules/repo_manager/repo_manager.py (1)
8-13: Understand that `portalocker.Lock` uses advisory locks on Linux/Unix (cooperative) and mandatory locks on Windows.

`portalocker.Lock(lock_file_path, timeout=10)` uses:
- Windows: an OS-enforced mandatory file lock; exclusive behavior is guaranteed.
- Linux/Unix: a POSIX advisory lock (fcntl), which only excludes cooperating processes that also use portalocker. Non-cooperating writers bypass the lock.

The `timeout=10` (in seconds, with a 0.25 s check interval) is reasonable for most workloads; portalocker retries until it succeeds or the timeout expires, then raises an error if the lock cannot be acquired.

If your metadata writers (`register_repo`, `update_last_accessed` → `_write_metadata_entry`) do not all use portalocker, concurrent writes to the same `*.json.tmp` file remain possible on Linux, risking JSON corruption. Ensure all writers use the same locking mechanism.

Also fix the unused `locked_file` variable at line 326.
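The timeout-and-retry behavior described above can be pictured with a stdlib-only sketch. To stay runnable without extra dependencies it calls `fcntl.flock` directly (POSIX only) instead of portalocker, so it illustrates the same kind of advisory lock rather than portalocker's own code; `exclusive_lock` is a hypothetical helper name, and the 10 s timeout and 0.25 s interval simply mirror the values quoted in the comment:

```python
import fcntl
import os
import time
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def exclusive_lock(path: str, timeout: float = 10.0,
                   check_interval: float = 0.25) -> Iterator[None]:
    """Hold an exclusive advisory lock on `path`, polling until `timeout` expires.

    Advisory means it only excludes other code that also flocks this same file.
    """
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o644)
    try:
        deadline = time.monotonic() + timeout
        while True:
            try:
                # LOCK_NB makes flock fail immediately instead of blocking
                fcntl.flock(fd, fcntl.LOCK_EX | fcntl.LOCK_NB)
                break
            except BlockingIOError:
                if time.monotonic() >= deadline:
                    raise TimeoutError(f"could not lock {path} within {timeout}s")
                time.sleep(check_interval)  # wait, then retry
        try:
            yield
        finally:
            fcntl.flock(fd, fcntl.LOCK_UN)
    finally:
        os.close(fd)
```

Because the lock is advisory, it protects the metadata only if every writer goes through the same helper, which is the point the comment makes about `update_last_accessed`.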
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
- app/modules/repo_manager/repo_manager.py (2 hunks)
- requirements.txt (1 hunk)
🧰 Additional context used
🪛 GitHub Check: SonarCloud Code Analysis
app/modules/repo_manager/repo_manager.py
[warning] 326-326: Remove the unused local variable "locked_file".
🪛 Ruff (0.14.8)
app/modules/repo_manager/repo_manager.py
326-326: Local variable locked_file is assigned to but never used
Remove assignment to unused variable locked_file
(F841)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
- GitHub Check: Agent
🔇 Additional comments (1)
requirements.txt (1)
175-177: No actionable concerns found for `portalocker==3.1.0`.
The version is stable (supports Python 3.9+), has no direct security advisories, and is released under the permissive BSD-3-Clause license. The transitive CVE-2021-32559 (in the pywin32 dependency) does not affect portalocker itself.
Pull request overview
This PR addresses a Time-of-Check to Time-of-Use (TOCTOU) race condition in the RepoManager by implementing file locking for metadata writes during repository registration.
- Adds the `portalocker==3.1.0` dependency for cross-platform file locking
- Implements file locking in the `register_repo` method to serialize concurrent metadata writes
- Creates lock files at the repository directory level to prevent data corruption
Reviewed changes
Copilot reviewed 1 out of 2 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| requirements.txt | Adds portalocker 3.1.0 dependency for file locking functionality |
| app/modules/repo_manager/repo_manager.py | Implements file locking in register_repo method to prevent concurrent write conflicts |

    # Use portalocker for cross-platform file locking
    # Prevent concurrent registrations for the same repo key
    # if another process is currently registering it.
    with portalocker.Lock(lock_file_path, timeout=10) as locked_file:
Copilot AI · Dec 12, 2025
The unused variable 'locked_file' should be replaced with an underscore to indicate it's intentionally unused, following Python conventions. The portalocker.Lock context manager returns a file handle that is not being used in this implementation.
Suggested change:

    -    with portalocker.Lock(lock_file_path, timeout=10) as locked_file:
    +    with portalocker.Lock(lock_file_path, timeout=10) as _:

    ).parent
    os.makedirs(repo_dir, exist_ok=True)

    lock_file_path = repo_dir / ".lock"
Copilot AI · Dec 12, 2025
The lock_file_path construction mixes pathlib and string concatenation. The variable 'repo_dir' is a Path object, and dividing it by a string creates a new Path; still, the code should use Path consistently. More critically, the lock file is placed at the directory level (one lock per repo_name), so different branches or commits of the same repository share the same lock file. This can cause unnecessary serialization of independent operations on different branches or commits.
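To make the granularity trade-off concrete, here is a sketch comparing a directory-level lock with a per-file lock. The layout in `metadata_path` (one JSON file per branch or commit under a per-repository directory) is an assumption for illustration, not the real `_metadata_path`. Dividing a `Path` by a string, as in `repo_dir / ".lock"`, is ordinary pathlib usage and yields a `Path`:

```python
from pathlib import Path
from typing import Optional


def metadata_path(base: Path, repo_name: str,
                  branch: Optional[str], commit_id: Optional[str]) -> Path:
    """Hypothetical layout: <base>/<repo_name>/<ref>.json, one file per ref."""
    ref = commit_id or branch or "default"
    return base / repo_name / f"{ref}.json"


def repo_level_lock(meta: Path) -> Path:
    # One lock per repository directory: every branch/commit shares it.
    return meta.parent / ".lock"


def entry_level_lock(meta: Path) -> Path:
    # One lock per metadata file: independent refs no longer wait on each other.
    return meta.with_name(meta.name + ".lock")
```

A per-file lock only helps if nothing else in the repository directory is shared between refs; if something is, the coarser directory lock is the safer choice.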

    with portalocker.Lock(lock_file_path, timeout=10) as locked_file:
        self._write_metadata_entry(repo_name, branch, commit_id, data)
Copilot AI · Dec 12, 2025
The locking mechanism only protects the register_repo method, but update_last_accessed (lines 352-371) also performs a read-modify-write operation on the metadata file without locking. This leaves a race: concurrent calls to update_last_accessed, or interleaved calls to register_repo and update_last_accessed, can still corrupt data or lose updates. Consider applying the same locking mechanism to update_last_accessed and to any other method that modifies metadata.
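One way to cover both writers is to route every metadata mutation through a single locked read-modify-write helper. The sketch below assumes a single JSON metadata file and uses `fcntl.flock` (POSIX only) in place of portalocker; `MetadataStore` and its methods are hypothetical stand-ins that only mirror the names `register_repo` and `update_last_accessed`. Each write gets its own temporary file from `tempfile.mkstemp`, and `os.replace` swaps it in atomically:

```python
import fcntl
import json
import os
import tempfile
import time
from contextlib import contextmanager
from pathlib import Path
from typing import Any, Callable, Dict, Iterator


class MetadataStore:
    """Hypothetical stand-in for RepoManager's metadata handling (not the real class)."""

    def __init__(self, path: Path) -> None:
        self.path = path
        self.lock_path = path.with_name(path.name + ".lock")

    @contextmanager
    def _locked(self) -> Iterator[None]:
        self.path.parent.mkdir(parents=True, exist_ok=True)
        with open(self.lock_path, "a") as lock_file:
            fcntl.flock(lock_file, fcntl.LOCK_EX)  # blocks until exclusive
            try:
                yield
            finally:
                fcntl.flock(lock_file, fcntl.LOCK_UN)

    def _read_modify_write(self, mutate: Callable[[Dict[str, Any]], None]) -> None:
        # The read, the change and the write all happen under one lock.
        with self._locked():
            data = json.loads(self.path.read_text()) if self.path.exists() else {}
            mutate(data)
            fd, tmp = tempfile.mkstemp(dir=self.path.parent, suffix=".tmp")
            try:
                with os.fdopen(fd, "w") as fh:
                    json.dump(data, fh)
                os.replace(tmp, self.path)  # atomic swap on the same filesystem
            except BaseException:
                os.unlink(tmp)
                raise

    def register_repo(self, key: str, info: Dict[str, Any]) -> None:
        def put(data: Dict[str, Any]) -> None:
            data[key] = dict(info)
        self._read_modify_write(put)

    def update_last_accessed(self, key: str) -> None:
        def touch(data: Dict[str, Any]) -> None:
            data.setdefault(key, {})["last_accessed"] = time.time()
        self._read_modify_write(touch)
```

Since both public methods share `_read_modify_write`, neither can interleave with the other between reading and writing the file.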

    # Ensure the repository parent directory exists
    repo_dir = self._metadata_path(
        repo_name=repo_name, branch=branch, commit_id=commit_id
    ).parent
    os.makedirs(repo_dir, exist_ok=True)
Copilot AI · Dec 12, 2025
The directory creation here is redundant. The _write_metadata_entry method already creates parent directories at line 151 with 'path.parent.mkdir(parents=True, exist_ok=True)'. This duplication adds unnecessary complexity and could be removed.
Description
This PR addresses a Time-of-Check to Time-of-Use (TOCTOU) race condition in RepoManager.
The Issue:
Concurrent parsing requests for the same repository could overwrite the `metadata.json` file or result in corrupted JSON, leaving the repository in a broken state.

The Fix:
I added `portalocker` to the dependencies and implemented a cross-platform file locking mechanism in `RepoManager.register_repo`. This ensures that writes to the metadata file are serialized.

Test Plan:
I verified the fix using a custom script (reproduce_race.py) that simulated concurrent registration attempts. The script confirmed that the lock prevented data corruption.
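The PR does not include reproduce_race.py. A hypothetical harness of the same general shape might start several independent Python processes that each register a different commit in one shared metadata file, then check that the file still parses and contains every entry. The child script uses `fcntl.flock` rather than portalocker so that it needs only the standard library (POSIX only); the file layout and the `run_concurrent_registrations` name are assumptions:

```python
import json
import subprocess
import sys
import tempfile
from pathlib import Path
from typing import Any, Dict

# Child process: take the lock, read the shared file, add one entry, replace atomically.
CHILD = """\
import fcntl, json, os, sys, tempfile
path, key = sys.argv[1], sys.argv[2]
with open(path + ".lock", "a") as lock:
    fcntl.flock(lock, fcntl.LOCK_EX)  # released when the lock file is closed
    data = {}
    if os.path.exists(path):
        with open(path) as fh:
            data = json.load(fh)
    data[key] = {"pid": os.getpid()}
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path), suffix=".tmp")
    with os.fdopen(fd, "w") as fh:
        json.dump(data, fh)
    os.replace(tmp, path)
"""


def run_concurrent_registrations(metadata: Path, workers: int) -> Dict[str, Any]:
    """Start `workers` processes at once, wait for all, return the parsed metadata."""
    procs = [
        subprocess.Popen([sys.executable, "-c", CHILD, str(metadata), f"commit-{i}"])
        for i in range(workers)
    ]
    failed = [p.args for p in procs if p.wait() != 0]
    if failed:
        raise RuntimeError(f"{len(failed)} worker(s) failed")
    # json.loads raises if the file was left half-written
    return json.loads(metadata.read_text())


if __name__ == "__main__":
    target = Path(tempfile.mkdtemp()) / "metadata.json"
    entries = run_concurrent_registrations(target, workers=8)
    missing = {f"commit-{i}" for i in range(8)} - set(entries)
    print("lost updates:", sorted(missing) or "none")
```

Deleting the `flock` line from `CHILD` gives an unlocked variant in which two workers can read the same snapshot and one entry is lost; whether that shows up on a given run depends on timing.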
Summary by CodeRabbit
Bug Fixes
Chores