Skip to content

Crash fix in case of forks: remove usage of pthread_once#483

Closed
r1viollet wants to merge 4 commits intomainfrom
r1viollet/pthread_once_crash_fix
Closed

Crash fix in case of forks: remove usage of pthread_once#483
r1viollet wants to merge 4 commits intomainfrom
r1viollet/pthread_once_crash_fix

Conversation

@r1viollet
Copy link
Collaborator

@r1viollet r1viollet commented Jan 14, 2026

What does this PR do?

Disclaimer: I do not fully understand why we have a crash where the key was not initialized.
However we know that in the case of forks we can be in a state where we try to init in the hot path (during an allocation).
This could be the reason for the crash.

This PR addresses the symptoms.

  • We remove the init from the hot path
  • We add a fork handler to make sure init of the key is OK
    We could go one step further here, to make sure we wrap the full init sequence (not just thread local state)
  • We use the key as synchronization mechanism

Motivation

Avoid a crash.

Additional Notes

NA

How to test the change?

I added a test for forking. Though it does not reproduce the crash.

When compiling on Alpine Linux (musl libc) and running on glibc systems,
pthread_once has incompatible internal state machines that cause crashes
after fork(). This was observed as SIGSEGV in PHP binaries that use
forking when returning from the pthread_once API.

Root cause:
- musl uses state value 1 during pthread_once initialization
- glibc uses state value (__fork_generation | 1), typically 5
- After fork(), glibc sees musl's state=1, treats it as invalid for
  the current fork generation, and re-initializes
- This causes double initialization of pthread keys

Solution:
Replaced pthread_once with atomic-based initialization in
AllocationTracker::get_tl_state():
- Uses std::atomic<int> with three states: kKeyNotInitialized,
  kKeyInitializing, kKeyInitialized
- Hot path optimized with relaxed atomic load
- Slow path uses compare-and-swap with proper memory ordering
- Added safety timeout to prevent infinite loops
- Returns nullptr on timeout
- Cover TLS init with forking
@r1viollet r1viollet requested a review from nsavoire as a code owner January 14, 2026 12:01
@pr-commenter
Copy link

pr-commenter bot commented Jan 14, 2026

Benchmark results for collatz

Parameters

Baseline Candidate
config baseline candidate
profiler-version ddprof 0.23.0+f9e8f218.87303337 ddprof 0.23.0+d40c8110.93509981

Summary

Found 0 performance improvements and 0 performance regressions! Performance is the same for 1 metrics, 0 unstable metrics.

See unchanged results
scenario Δ mean execution_time
scenario:ddprof -S bench-collatz --preset cpu_only collatz_runner.sh unsure
[+13.370ms; +37.728ms] or [+0.338%; +0.954%]

@pr-commenter
Copy link

pr-commenter bot commented Jan 14, 2026

Benchmark results for BadBoggleSolver_run

Parameters

Baseline Candidate
config baseline candidate
profiler-version ddprof 0.23.0+f9e8f218.87303337 ddprof 0.23.0+d40c8110.93509981

Summary

Found 0 performance improvements and 0 performance regressions! Performance is the same for 1 metrics, 0 unstable metrics.

See unchanged results
scenario Δ mean execution_time
scenario:ddprof -S bench-bad-boggle-solver BadBoggleSolver_run work 1000 same

@r1viollet r1viollet marked this pull request as draft January 27, 2026 09:57
@r1viollet
Copy link
Collaborator Author

Sadly, we will have to rethink this PR.

@r1viollet r1viollet closed this Jan 27, 2026
@r1viollet
Copy link
Collaborator Author

What could be happening:

  • init is incomplete at fork (hard to understand why, this runs in main thread 🤔 )
  • second init triggers in the fork in a lazy hot path (as we would not re-run the init)
    the second init triggers an alloc which crashes the app

A fix could be about securing the init after the fork ?

@r1viollet r1viollet reopened this Jan 29, 2026
@r1viollet r1viollet marked this pull request as ready for review January 29, 2026 09:44
pthread_key_t key = _tl_state_key.load(std::memory_order_relaxed);

// In debug builds, verify our assumption
assert(key != kInvalidKey && "pthread key should be initialized before use");
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this might trigger with a relaxed order, but at least I will know about it
I might change mem order

@r1viollet r1viollet marked this pull request as draft January 29, 2026 09:48
@r1viollet r1viollet force-pushed the r1viollet/pthread_once_crash_fix branch from 5dd2c62 to dc0d1b6 Compare January 29, 2026 09:56
- Use the key as synchronization
- Avoid init during a get of the thread local state
- Add a fork handler to ensure we have the key initialized
@r1viollet r1viollet force-pushed the r1viollet/pthread_once_crash_fix branch from dc0d1b6 to 57ba49e Compare January 29, 2026 09:59
@r1viollet r1viollet marked this pull request as ready for review January 29, 2026 09:59
@nsavoire
Copy link
Collaborator

nsavoire commented Feb 5, 2026

I think we could completely remove pthread_once / pthread_(set|get)_specific.
The reason we introduced pthread_get_specific was that TLS access through tls_get_addr could trigger an allocation the first time.
But if we force the TLS model to be initial-exec instead of general dynamic, we can avoid tls_get_addr and since ddprof library is not dlopen'ed (it's either ld-preloaded or loaded at startup through DT_NEEDED), it's guaranteed to work.
Here is a proof of concept of the idea:
#486

@r1viollet
Copy link
Collaborator Author

Thanks, This is a super interesting simplification. That would indeed guarantee that the variable is available.

Can LD_PRELOAD + initial-exec not lead to ?
cannot allocate memory in static TLS block

@r1viollet
Copy link
Collaborator Author

looks like defaults are at 512 bytes in glibc if I'm looking at the correct setting: The default allocation of optional static TLS is 512 bytes and is allocated in every thread.

@nsavoire
Copy link
Collaborator

nsavoire commented Feb 6, 2026

Thanks, This is a super interesting simplification. That would indeed guarantee that the variable is available.

Can LD_PRELOAD + initial-exec not lead to ? cannot allocate memory in static TLS block

looks like defaults are at 512 bytes in glibc if I'm looking at the correct setting: The default allocation of optional static TLS is 512 bytes and is allocated in every thread.

Static TLS size is computed at startup from the requirements of all the initially loaded libs (ie. DT_NEEDED libs + preloaded libs), therefore it cannot fail with cannot allocate memory in static TLS block.

glibc allocates some additional ("static surplus") space to static TLS to account for dlopen'ed libraries that use initial-exec TLS model for performance (libgomp, audit libraries?).
That's what triggers cannot allocate memory in static TLS block when dlopen'ing a library that uses initial-exec TLS but no more space is available in static TLS.

But since libdd_profiling is not dlopen'ed (at least not the loader, that's why I put the TLS variable in it), static TLS space is not an issue.

@r1viollet
Copy link
Collaborator Author

Ok, I'm convinced! Thanks for the proposal, I'll close this again to do a new refactor.

@r1viollet r1viollet closed this Feb 6, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants