Summary
In storage hammer tests, when the process crashes inside malloc() (with the arena lock held), the signal handler logs through spdlog, which triggers another malloc() call and causes a self-deadlock. The same issue occurs when spdlog::shutdown() waits for threads that are stuck in allocation.
Deadlock Chain
malloc() → [holds arena lock] → SIGSEGV
→ signal_handler()
→ spdlog::critical()
→ fwrite()
→ _IO_file_doallocate()
→ malloc() → [waits for same arena lock] → DEADLOCK
Stack Trace
Thread 67 (Thread 0x799a70ff9680 (LWP 32) "storage_mgr"):
#0 0x0000799a9eed3f0b in __lll_lock_wait_private () from /lib/x86_64-linux-gnu/libc.so.6
#1 0x0000799a9eee8920 in malloc () from /lib/x86_64-linux-gnu/libc.so.6
#2 0x0000799a9eec01b5 in _IO_file_doallocate () from /lib/x86_64-linux-gnu/libc.so.6
#3 0x0000799a9eed0524 in _IO_doallocbuf () from /lib/x86_64-linux-gnu/libc.so.6
#4 0x0000799a9eecdf90 in _IO_file_overflow () from /lib/x86_64-linux-gnu/libc.so.6
#5 0x0000799a9eeceaaf in _IO_file_xsputn () from /lib/x86_64-linux-gnu/libc.so.6
#6 0x0000799a9eec1a12 in fwrite () from /lib/x86_64-linux-gnu/libc.so.6
#7 0x0000608c00c9462f in spdlog::details::file_helper::write() at file_helper-inl.h:104
#8 0x0000608c00c97318 in spdlog::sinks::rotating_file_sink<std::mutex>::sink_it_() at rotating_file_sink-inl.h:88
#9 0x0000608c00c7e27a in spdlog::sinks::base_sink<std::mutex>::log() at base_sink-inl.h:28
#10 0x0000608c00c79013 in spdlog::logger::sink_it_() at logger-inl.h:138
#11 0x0000608c00c79fba in spdlog::logger::log_it_() at logger-inl.h:128
#12 0x0000608c00c66970 in spdlog::logger::log_() at logger.h:332
#13 0x0000608c00c66330 in spdlog::logger::log() at logger.h:85
#14 0x0000608c00c66330 in spdlog::logger::critical() at logger.h:155
#15 sisl::logging::crash_handler (signal_number=<optimized out>) at stacktrace.cpp:133
#16 <signal handler called>
#17 0x0000799a9eee67fc in ?? () from /lib/x86_64-linux-gnu/libc.so.6
#18 0x0000799a9eee86f4 in malloc () from /lib/x86_64-linux-gnu/libc.so.6 ← Original crash location
#19 0x0000799a9f226904 in operator new(unsigned long) () from /lib/x86_64-linux-gnu/libstdc++.so.6
#20 0x0000608c0076cdee in folly::futures::detail::Core<bool>::make() at Core.h:563
#21 0x0000608c0076cdee in folly::makeFuture<bool>() at Future-inl.h:1357
#22 0x0000608c0076cdee in folly::makeFuture<bool>() at Future-inl.h:1306
#23 homestore::DataSvcCPCallbacks::cp_flush() at data_svc_cp.cpp:36
#24 0x0000608c007414ca in homestore::CPManager::cp_start_flush() at cp_mgr.cpp:280
#25 0x0000608c00741e14 in homestore::CPGuard::~CPGuard() at cp_mgr.cpp:400
#26 0x0000608c0074293a in homestore::CPManager::do_trigger_cp_flush() at cp_mgr.cpp:270
#27 0x0000608c007435c6 in homestore::CPManager::trigger_cp_flush() at cp_mgr.cpp:198
#28 0x0000608c0056089e in homeobject::HSHomeObject::destroy_pg_superblk() at unique_ptr.h:199
Root Cause
- Primary issue: Unknown crash in malloc() during checkpoint flush (frames #18 to #28), possibly memory corruption
- Immediate cause of hang: Signal handler invokes non-async-signal-safe spdlog, which calls malloc() and deadlocks on the arena lock already held by the interrupted malloc()
Suggested Fix
Avoid malloc-dependent functions in signal handlers by using async logging or async-signal-safe alternatives.
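A minimal sketch of what an async-signal-safe handler could look like, assuming a POSIX system. It is not the existing sisl crash_handler: the names format_crash_message, safe_crash_handler and install_safe_crash_handler are hypothetical, and writing to stderr stands in for whatever file descriptor the service would open at startup. The handler formats the message into a stack buffer by hand and emits it with write(2), so no stdio buffer allocation or spdlog call happens on the crash path.

```cpp
#include <csignal>
#include <cstddef>
#include <cstring>
#include <unistd.h>

// Hypothetical helper: builds "crash: received signal N\n" in a caller-supplied
// buffer without allocating. Returns the number of bytes written.
std::size_t format_crash_message(int signal_number, char* buf, std::size_t cap) {
    static const char prefix[] = "crash: received signal ";
    std::size_t n = 0;
    for (std::size_t i = 0; prefix[i] != '\0' && n < cap; ++i) {
        buf[n++] = prefix[i];
    }
    // Convert the number by hand; snprintf is not on the POSIX
    // async-signal-safe list.
    char digits[12];
    std::size_t d = 0;
    unsigned int v = signal_number < 0 ? 0u : static_cast<unsigned int>(signal_number);
    do {
        digits[d++] = static_cast<char>('0' + v % 10);
        v /= 10;
    } while (v != 0 && d < sizeof(digits));
    while (d > 0 && n < cap) {
        buf[n++] = digits[--d];
    }
    if (n < cap) {
        buf[n++] = '\n';
    }
    return n;
}

// Hypothetical handler: only write(2) and raise(3) are called, both of which
// POSIX lists as async-signal-safe.
extern "C" void safe_crash_handler(int signal_number) {
    char buf[64];
    const std::size_t len = format_crash_message(signal_number, buf, sizeof(buf));
    // Assumption: stderr is acceptable here; a real service would likely use
    // a log file descriptor opened before any crash can happen.
    ssize_t ignored = write(STDERR_FILENO, buf, len);
    (void)ignored;
    // SA_RESETHAND already restored the default disposition, so re-raising
    // terminates the process (and produces a core dump if enabled).
    raise(signal_number);
}

// Hypothetical installer, called once at startup from normal (non-signal) context.
void install_safe_crash_handler() {
    struct sigaction sa;
    std::memset(&sa, 0, sizeof(sa));
    sa.sa_handler = safe_crash_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESETHAND;  // one-shot: reset to SIG_DFL on handler entry
    sigaction(SIGSEGV, &sa, nullptr);
    sigaction(SIGABRT, &sa, nullptr);
}
```

Because the handler never waits on other threads and never calls spdlog::shutdown(), it cannot block on a thread that is stuck in allocation. The trade-off is that the crash message bypasses the normal log sinks and formatting.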
Notes
This is a secondary issue tracking the hang behavior. Priority is low as the root cause of the initial crash remains unknown. Recording here for future investigation.