Modules: Concurrency issue with module context fix #4603
pm-nikhil-vaidya wants to merge 8 commits into prebid:master from
Conversation
Add holdReadLock mechanism to prevent concurrent map access in moduleContexts during parallel bidder hook execution. This fixes panics caused by concurrent reads and writes to the moduleContexts map when multiple bidders execute hooks in parallel. This is an alternate implementation to PR prebid#4603, using a simpler approach with a boolean flag to conditionally hold the read lock for the entire duration of hook execution in bidder stages.

Changes:
- Add holdReadLock boolean field to executionContext
- Add getModuleContextLocked() helper that reads without acquiring locks
- Modify executeGroup() to conditionally hold RLock for the entire execution
- Enable holdReadLock for ExecuteBidderRequestStage and ExecuteRawBidderResponseStage

The RWMutex is now held for the entire duration that hook goroutines are executing in bidder stages, preventing saveModuleContexts() from acquiring the write lock while reads are in progress.

🤖 Generated with [Claude Code](https://claude.com/claude-code)
Co-Authored-By: Claude Sonnet 4.5 (1M context) <noreply@anthropic.com>
Please see #4631 as an alternative approach, which uses the read lock when calling the parallel bidder hook stages. NOTE: I wrote it with Claude Code and still need to review it myself - it may not be exactly what I wanted, and I may want it to have better control over the lock than it currently does. I just walked through it and it seems fine: each of the hookExecutor methods calls saveModuleContexts AFTER calling executeStage, so holding a single lock for the duration of executeGroup is fine. I might have grabbed and released the lock at each hook invocation instead, but conceptually you can't return until all hooks are called, so it's perfectly fine to hold only one read lock, so 👍
CORRECTION from Gemini: implementing this with a lock around the map is probably the best/fastest approach, apparently.
I'll have to take another look at this when I'm fresh. Could be ready to go 🤞
@postindustria-code can you please review?
@pm-nikhil-vaidya - Happy New Year - can you take a look at the diff and see if you are comfortable applying it?
@scr-oath I have implemented the solution you suggested, as it removes the intermediate copy when copying the module context and removes the extra lock mechanism when reading. I have just added a check in the Insert() function in case the map is nil:

```go
func (mc *ModuleContext) Insert(seq iter.Seq2[string, any]) {
	if mc == nil {
		return
	}
	mc.Lock()
	defer mc.Unlock()
	if mc.data == nil {
		mc.data = make(map[string]any)
	}
	maps.Insert(mc.data, seq)
}
```

You can review and let me know if any more changes are required.
I understand the motivation behind introducing All() for performance reasons. That makes sense, especially given how hot this path can be. However, I see a possible deadlock issue with its usage. ModuleContext.All() holds an RLock for the entire duration of iteration and executes caller-provided code while that lock is held. This creates a structural deadlock hazard.
For instance, a pattern where the caller writes back to the same context from inside the iteration is very natural and currently deadlocks. GetAll() seems to be safer to use, although it is a bit slower and uses copies. @pm-nikhil-vaidya @scr-oath what are your thoughts on this?
Quick follow-up / clarification: The previous comment was also from me (@legendko) — just posted from my personal GitHub account by mistake. Apologies for any confusion; I wanted to continue the discussion here from this account. While we’re on the topic, I’d like to raise one additional concern, since this is a breaking API change for module authors:
@bsardo @scr-oath @postindustria-code I’ve rolled back to the previous implementation. I’ve also retained the approach suggested by @scr-oath for cases where a user prefers performance over the potential risk of a deadlock.
hooks/hookstage/invocation.go

```go
	mc.Lock()
	defer mc.Unlock()
	if mc.data == nil {
		mc.data = make(map[string]any)
```

suggestion: Make with the right size for the copy if nil.

```diff
-		mc.data = make(map[string]any)
+		mc.data = make(map[string]any, len(data))
```
hooks/hookstage/invocation.go

```go
var emptyMapIter iter.Seq2[string, any] = func(yield func(string, any) bool) {}

// All returns an iterator over key-value pairs from the module context with read lock held
func (mc *ModuleContext) All() iter.Seq2[string, any] {
```
suggestion: remove this
I thought we had decided against exposing All due to its risk of deadlock if held in reverse order with another lock.
hooks/hookstage/invocation.go

```go
}

// Insert adds the key-value pairs from seq to the module context with write lock held
func (mc *ModuleContext) Insert(seq iter.Seq2[string, any]) {
```
suggestion: remove this
I thought we decided against exposing Insert also, for deadlock concerns.
@pm-nikhil-vaidya we discussed @scr-oath's comments today. Let's remove All() and Insert().
scr-oath left a comment
All three previous change requests have been addressed:
- ✅ `make(map[string]any, len(data))` — properly sized allocation in `SetAll` when `data` is nil
- ✅ `All()` iterator removed — eliminates deadlock risk from holding a lock across iteration
- ✅ `Insert()` removed — same concern addressed
Concurrency analysis of put(): The copy-based approach (existingCtx.SetAll(mCtx.GetAll())) is safe. Go evaluates mCtx.GetAll() first (acquires/releases mCtx.RLock), then passes the returned plain map to existingCtx.SetAll() (acquires/releases existingCtx.Lock). No two ModuleContext locks are ever held simultaneously. No reverse-order deadlock is possible since ModuleContext methods never reference back to moduleContexts.
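That evaluation-order argument can be sketched as follows (a minimal sketch; the GetAll/SetAll bodies are assumed from the snippets earlier in this thread, not copied from the actual PR):

```go
package main

import (
	"fmt"
	"maps"
	"sync"
)

// Assumed shape of the type under discussion.
type ModuleContext struct {
	sync.RWMutex
	data map[string]any
}

// GetAll snapshots the data under the read lock and releases it
// before returning the plain map to the caller.
func (mc *ModuleContext) GetAll() map[string]any {
	mc.RLock()
	defer mc.RUnlock()
	return maps.Clone(mc.data)
}

// SetAll merges data into the context under the write lock,
// sizing the allocation for the incoming copy when nil.
func (mc *ModuleContext) SetAll(data map[string]any) {
	mc.Lock()
	defer mc.Unlock()
	if mc.data == nil {
		mc.data = make(map[string]any, len(data))
	}
	maps.Copy(mc.data, data)
}

func main() {
	existingCtx := &ModuleContext{data: map[string]any{"a": 1}}
	mCtx := &ModuleContext{data: map[string]any{"b": 2}}

	// mCtx.GetAll() runs to completion (acquiring and releasing mCtx's
	// RLock) before existingCtx.SetAll() acquires existingCtx's write
	// lock, so the two locks are never held at the same time.
	existingCtx.SetAll(mCtx.GetAll())

	fmt.Println(len(existingCtx.GetAll())) // 2
}
```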
nitpick (non-blocking): SetAll comment says "replaces all data" but maps.Copy merges into the existing map (existing keys not in data are preserved). The merge behavior is correct for this use case (accumulating context across stages), but the comment could say "merges data into the context" to be precise.
LGTM — the RWMutex + copy-based approach is sound, modules are all updated correctly, and tests are comprehensive.
#4239