Conversation
Pull request overview
This PR implements a per-user-per-challenge locking mechanism to prevent race conditions during flag submission. It introduces a singleton UserLockManager using sync.Map to manage locks and modifies the flag submission flow to acquire locks before processing submissions. Additionally, it refactors the database logic in SaveFlagSubmission to explicitly check for already-solved challenges before creating new records.
Key changes:
- New lock manager with TryLock pattern to prevent concurrent submissions for the same user-challenge pair
- Refactored SaveFlagSubmission from FirstOrCreate to an explicit check + Create to better handle race conditions
- Reorganized submission validation flow in submitFlagHandler
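The locking pattern described above can be sketched as follows. The names GetUserLockManager, TryLock, Unlock, and the "userID_challID" key format come from the review snippets; the bodies are an illustrative reconstruction under those assumptions, not the PR's actual code.

```go
package main

import (
	"fmt"
	"sync"
)

// UserLockManager keeps one mutex per lock ID in a sync.Map.
type UserLockManager struct {
	locks sync.Map // lockId -> *sync.Mutex
}

var (
	managerOnce sync.Once
	manager     *UserLockManager
)

// GetUserLockManager returns the process-wide singleton.
func GetUserLockManager() *UserLockManager {
	managerOnce.Do(func() { manager = &UserLockManager{} })
	return manager
}

// TryLock acquires the mutex for lockId without blocking; it returns false
// if another request for the same user-challenge pair is already in flight.
func (m *UserLockManager) TryLock(lockId string) bool {
	v, _ := m.locks.LoadOrStore(lockId, &sync.Mutex{})
	return v.(*sync.Mutex).TryLock()
}

// Unlock releases the mutex for lockId, if it exists.
func (m *UserLockManager) Unlock(lockId string) {
	if v, ok := m.locks.Load(lockId); ok {
		v.(*sync.Mutex).Unlock()
	}
}

func main() {
	m := GetUserLockManager()
	lockId := fmt.Sprintf("%d_%s", 42, "web-1")
	fmt.Println(m.TryLock(lockId)) // true: lock acquired
	fmt.Println(m.TryLock(lockId)) // false: a submission is already in flight
	m.Unlock(lockId)
}
```

Because LoadOrStore is atomic, two concurrent requests for the same pair always end up contending on the same mutex, and the TryLock pattern rejects the loser instead of queueing it.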
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
| api/lock_manager.go | Implements singleton UserLockManager with per-resource mutex management using sync.Map, TryLock pattern, and cleanup functionality |
| api/submit.go | Integrates lock acquisition before flag validation, adds challenge existence check, and reorganizes attempt limit checking logic |
| core/database/challenges.go | Changes SaveFlagSubmission from FirstOrCreate to explicit solved-check followed by Create to prevent duplicate solved records |
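The SaveFlagSubmission change can be illustrated with a simplified stand-in: a map plays the role of the solved-records table and a mutex plays the role of the serialization that makes check-then-create safe. This is a sketch of the pattern only; the PR's real implementation queries the database.

```go
package main

import (
	"fmt"
	"sync"
)

// store simulates the solved-records table. The mutex stands in for the
// per-user-per-challenge lock that makes the check-then-create sequence
// safe against concurrent submissions.
type store struct {
	mu     sync.Mutex
	solved map[string]bool // key: "userID_challID"
}

// SaveFlagSubmission first checks for an existing solved record and only
// creates one when none exists, mirroring the explicit check + Create flow.
func (s *store) SaveFlagSubmission(userID int, challID string) (created bool) {
	s.mu.Lock()
	defer s.mu.Unlock()
	key := fmt.Sprintf("%d_%s", userID, challID)
	if s.solved[key] {
		return false // already solved: do not insert a duplicate record
	}
	s.solved[key] = true
	return true
}

func main() {
	s := &store{solved: map[string]bool{}}
	fmt.Println(s.SaveFlagSubmission(1, "web-1")) // true: first solve recorded
	fmt.Println(s.SaveFlagSubmission(1, "web-1")) // false: duplicate rejected
}
```

Without the lock, two goroutines could both pass the check before either creates a record, which is exactly the race the PR targets.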
```go
lockVal, exists := m.locks.Load(userID)
if !exists {
	log.Errorf("unlock called for non-existent lock, user_id: %s", userID)
```
The parameter name should be lockId instead of userID to match the usage throughout the code and maintain consistency with TryLock.
```go
// Catch panic from double-unlock
defer func() {
	if r := recover(); r != nil {
		log.Errorf("double unlock detected for user_id: %s", userID)
```
The parameter name should be lockId instead of userID to match the usage throughout the code.
```go
lockManager := GetUserLockManager()
lockId := fmt.Sprintf("%d_%s", user.ID, challId)
if !lockManager.TryLock(lockId) {
	c.JSON(http.StatusOK, FlagSubmitResp{
		Message: "Request already in process",
		Success: false,
	})
	return
}
defer lockManager.Unlock(lockId)
```
The lock is acquired too early in the request flow. It's acquired before lightweight validation checks (challenge status, prerequisites check). This means that users will be blocked from making concurrent requests even when those requests would fail fast on basic validation. Consider moving the lock acquisition to just before line 152 (before CheckPreviousSubmissions), after all the cheap validation checks have passed.
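A minimal sketch of the suggested ordering follows. The handler shape and check names are illustrative stand-ins, not the PR's actual submitFlagHandler.

```go
package main

import (
	"fmt"
	"sync"
)

var locks sync.Map // lockId -> *sync.Mutex

func tryLock(id string) bool {
	v, _ := locks.LoadOrStore(id, &sync.Mutex{})
	return v.(*sync.Mutex).TryLock()
}

func unlock(id string) {
	if v, ok := locks.Load(id); ok {
		v.(*sync.Mutex).Unlock()
	}
}

// submitFlag runs the cheap validation first, and only acquires the
// per-user-per-challenge lock before the stateful submission checks.
func submitFlag(userID int, challID, flag string, challengeOpen, prereqsMet bool) string {
	// Cheap validation: fail fast without ever touching the lock.
	if !challengeOpen {
		return "challenge not open"
	}
	if !prereqsMet {
		return "prerequisites not met"
	}
	// Only now serialize the expensive, racy part of the flow.
	lockId := fmt.Sprintf("%d_%s", userID, challID)
	if !tryLock(lockId) {
		return "Request already in process"
	}
	defer unlock(lockId)
	// CheckPreviousSubmissions / SaveFlagSubmission would run here.
	return "accepted"
}

func main() {
	fmt.Println(submitFlag(1, "web-1", "flag{x}", false, true)) // fails fast, no lock taken
	fmt.Println(submitFlag(1, "web-1", "flag{x}", true, true))  // acquires lock, accepted
}
```

With this ordering, a burst of requests that would fail validation anyway never contends on the lock at all.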
```go
	err = database.UpdateUserChallengeTries(user.ID, challenge.ID)
	if err != nil {
		c.JSON(http.StatusInternalServerError, HTTPErrorResp{
			Error: "DATABASE ERROR while processing the request.",
		})
		return
	}
}
```
The logic for updating user tries has been moved inside the MaxAttemptLimit > 0 check, which means tries will not be tracked for challenges without attempt limits. In the original code (deleted lines 151-178), UpdateUserChallengeTries was called unconditionally for all challenges. This changes the behavior and may cause incorrect try counts for challenges without limits. Consider moving lines 182-188 outside of the if challenge.MaxAttemptLimit > 0 block to maintain the original behavior.
Suggested change:
```diff
-		err = database.UpdateUserChallengeTries(user.ID, challenge.ID)
-		if err != nil {
-			c.JSON(http.StatusInternalServerError, HTTPErrorResp{
-				Error: "DATABASE ERROR while processing the request.",
-			})
-			return
-		}
-	}
 }
+err = database.UpdateUserChallengeTries(user.ID, challenge.ID)
+if err != nil {
+	c.JSON(http.StatusInternalServerError, HTTPErrorResp{
+		Error: "DATABASE ERROR while processing the request.",
+	})
+	return
+}
```
```go
func (m *UserLockManager) Cleanup() {
	m.locks.Range(func(key, value interface{}) bool {
		mutex := value.(*sync.Mutex)
		// Only delete if mutex is unlocked (not in use)
		if mutex.TryLock() {
			mutex.Unlock()
			m.locks.Delete(key)
		}
		return true
	})
```
The Cleanup function has a logical issue: it attempts to TryLock to check if the mutex is unlocked, then immediately unlocks it, and then deletes it from the map. However, if the mutex is currently locked by another goroutine (TryLock returns false), the mutex won't be deleted, which is correct. But if TryLock succeeds (returns true), you're essentially stealing the lock from any goroutine that might be about to use it between TryLock and Delete. A safer approach would be to track lock metadata (last used timestamp) separately and clean up locks that haven't been used for a certain period, rather than trying to determine lock state through TryLock.
```go
// Catch panic from double-unlock
defer func() {
	if r := recover(); r != nil {
		log.Errorf("double unlock detected for user_id: %s", userID)
	}
}()
```
The panic recovery for double-unlock is insufficient. In Go, calling Unlock() on an unlocked mutex results in a panic that says "sync: unlock of unlocked mutex", but the defer/recover only logs an error without propagating the issue to the caller. This could mask serious bugs. Additionally, double-unlock is usually a programming error that should be fixed rather than caught and logged. Consider removing the panic recovery and letting the panic occur to surface the bug, or at minimum, log at FATAL level rather than ERROR.
Suggested change:
```diff
-// Catch panic from double-unlock
-defer func() {
-	if r := recover(); r != nil {
-		log.Errorf("double unlock detected for user_id: %s", userID)
-	}
-}()
+// Unlock the mutex; let any panic propagate to surface programming errors
```
```go
	return lock.TryLock()
}

func (m *UserLockManager) Unlock(userID string) {
```
Parameter name is misleading and inconsistent. The function receives userID but should be lockId to match the usage in the calling code (line 124 of submit.go where lockId is passed) and to be consistent with the TryLock function which uses lockId.
v1bh475u left a comment:
I think this is a fine approach for resolving the concurrency issue but I am greatly concerned about the latency issues that may arise from the locking mechanism.
```go
}

lockManager := GetUserLockManager()
lockId := fmt.Sprintf("%d_%s", user.ID, challId)
```
Can we have a non-string approach for the key?
Yes, we can use the high bits for the user ID and the low bits for the challenge ID in that case.
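For example, if both IDs fit in 32 bits, they could be packed into a single uint64 key. Note this assumes a numeric challenge ID; the PR's challId is a string, so it would need parsing or a separate numeric ID first.

```go
package main

import "fmt"

// packKey combines a numeric user ID (high 32 bits) and challenge ID
// (low 32 bits) into one uint64, avoiding a string allocation per request.
func packKey(userID, challID uint32) uint64 {
	return uint64(userID)<<32 | uint64(challID)
}

// unpackKey recovers the two IDs from a packed key.
func unpackKey(key uint64) (userID, challID uint32) {
	return uint32(key >> 32), uint32(key)
}

func main() {
	key := packKey(42, 7)
	u, c := unpackKey(key)
	fmt.Println(key, u, c) // 180388626439 42 7
}
```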
This is among the very few approaches I could find that use a per-user lock without affecting other users. A lock is necessary to prevent the race condition.
What to do with this PR?
The code seems fine to me, but I would like the submission route to be simplified and broken down into functions. Thoughts?
This PR integrates sync locks so that a user can enter the critical section only once at a time: no two threads from the same user for the same challenge can be in the critical section simultaneously.
(Testing pending.)
TODO: