Process Hangs on Exit Due to spdlog Allocation in Signal Handler #290

@Besroy

Description

Summary
In storage hammer tests, the process hangs on exit when it crashes inside malloc() while holding the arena lock. The signal handler logs through spdlog, which calls malloc() again on the same thread and waits for the lock that thread already holds, so the thread deadlocks against itself. spdlog::shutdown() hangs for the same reason: it waits for threads that are stuck in allocation.

Deadlock Chain

  malloc() → [holds arena lock] → SIGSEGV
    → signal_handler()
      → spdlog::critical()
        → fwrite()
          → _IO_file_doallocate()
            → malloc() → [waits for same arena lock] → DEADLOCK

Stack Trace


  Thread 67 (Thread 0x799a70ff9680 (LWP 32) "storage_mgr"):
  #0  0x0000799a9eed3f0b in __lll_lock_wait_private () from /lib/x86_64-linux-gnu/libc.so.6
  #1  0x0000799a9eee8920 in malloc () from /lib/x86_64-linux-gnu/libc.so.6
  #2  0x0000799a9eec01b5 in _IO_file_doallocate () from /lib/x86_64-linux-gnu/libc.so.6
  #3  0x0000799a9eed0524 in _IO_doallocbuf () from /lib/x86_64-linux-gnu/libc.so.6
  #4  0x0000799a9eecdf90 in _IO_file_overflow () from /lib/x86_64-linux-gnu/libc.so.6
  #5  0x0000799a9eeceaaf in _IO_file_xsputn () from /lib/x86_64-linux-gnu/libc.so.6
  #6  0x0000799a9eec1a12 in fwrite () from /lib/x86_64-linux-gnu/libc.so.6
  #7  0x0000608c00c9462f in spdlog::details::file_helper::write() at file_helper-inl.h:104
  #8  0x0000608c00c97318 in spdlog::sinks::rotating_file_sink<std::mutex>::sink_it_() at rotating_file_sink-inl.h:88
  #9  0x0000608c00c7e27a in spdlog::sinks::base_sink<std::mutex>::log() at base_sink-inl.h:28
  #10 0x0000608c00c79013 in spdlog::logger::sink_it_() at logger-inl.h:138
  #11 0x0000608c00c79fba in spdlog::logger::log_it_() at logger-inl.h:128
  #12 0x0000608c00c66970 in spdlog::logger::log_() at logger.h:332
  #13 0x0000608c00c66330 in spdlog::logger::log() at logger.h:85
  #14 0x0000608c00c66330 in spdlog::logger::critical() at logger.h:155
  #15 sisl::logging::crash_handler (signal_number=<optimized out>) at stacktrace.cpp:133
  #16 <signal handler called>
  #17 0x0000799a9eee67fc in ?? () from /lib/x86_64-linux-gnu/libc.so.6
  #18 0x0000799a9eee86f4 in malloc () from /lib/x86_64-linux-gnu/libc.so.6  ← Original crash location
  #19 0x0000799a9f226904 in operator new(unsigned long) () from /lib/x86_64-linux-gnu/libstdc++.so.6
  #20 0x0000608c0076cdee in folly::futures::detail::Core<bool>::make() at Core.h:563
  #21 0x0000608c0076cdee in folly::makeFuture<bool>() at Future-inl.h:1357
  #22 0x0000608c0076cdee in folly::makeFuture<bool>() at Future-inl.h:1306
  #23 homestore::DataSvcCPCallbacks::cp_flush() at data_svc_cp.cpp:36
  #24 0x0000608c007414ca in homestore::CPManager::cp_start_flush() at cp_mgr.cpp:280
  #25 0x0000608c00741e14 in homestore::CPGuard::~CPGuard() at cp_mgr.cpp:400
  #26 0x0000608c0074293a in homestore::CPManager::do_trigger_cp_flush() at cp_mgr.cpp:270
  #27 0x0000608c007435c6 in homestore::CPManager::trigger_cp_flush() at cp_mgr.cpp:198
  #28 0x0000608c0056089e in homeobject::HSHomeObject::destroy_pg_superblk() at unique_ptr.h:199

Root Cause

The crash handler (sisl::logging::crash_handler, stacktrace.cpp:133) calls spdlog::critical(), which is not async-signal-safe. The call reaches fwrite(), which allocates its stream buffer through malloc() (frames #1 to #6). Because SIGSEGV was raised inside malloc() on the same thread (frame #18), the arena lock is already held and the handler blocks on it forever.

Suggested Fix

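Keep malloc-dependent functions out of the signal handler: either restrict the handler to async-signal-safe calls, or hand the message off to asynchronous logging, provided the hand-off itself neither allocates nor takes locks inside the handler.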
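
As a rough illustration of the async-signal-safe option, here is a minimal sketch of a handler that formats its message into a stack buffer and emits it with write(2), so it never enters malloc(). All names here (append_str, append_uint, format_crash_message, safe_crash_handler, install_safe_crash_handler) are hypothetical and are not part of sisl or spdlog. The sketch assumes a POSIX system and that writing to stderr is acceptable in place of the rotating log file.

```cpp
#include <signal.h>
#include <unistd.h>
#include <cstddef>
#include <cstring>

// Hypothetical helper: append a C string to buf without allocating.
// Always leaves room for a trailing NUL.
static std::size_t append_str(char* buf, std::size_t pos, std::size_t cap, const char* s) {
    while (*s != '\0' && pos + 1 < cap) { buf[pos++] = *s++; }
    return pos;
}

// Hypothetical helper: append an unsigned integer in decimal without allocating.
static std::size_t append_uint(char* buf, std::size_t pos, std::size_t cap, unsigned long v) {
    char digits[24];
    std::size_t n = 0;
    do { digits[n++] = static_cast<char>('0' + v % 10); v /= 10; } while (v != 0);
    while (n > 0 && pos + 1 < cap) { buf[pos++] = digits[--n]; }
    return pos;
}

// Builds "FATAL: received signal <N>\n" in buf, truncating if buf is too small.
// Returns the message length, excluding the trailing NUL.
std::size_t format_crash_message(char* buf, std::size_t cap, int signo) {
    if (cap == 0) { return 0; }
    std::size_t pos = 0;
    pos = append_str(buf, pos, cap, "FATAL: received signal ");
    pos = append_uint(buf, pos, cap, static_cast<unsigned long>(signo));
    pos = append_str(buf, pos, cap, "\n");
    buf[pos] = '\0';
    return pos;
}

// Handler restricted to calls on the POSIX async-signal-safe list:
// write(2) and raise(3). No stdio, no spdlog, no heap allocation.
extern "C" void safe_crash_handler(int signo) {
    char msg[64];
    const std::size_t len = format_crash_message(msg, sizeof(msg), signo);
    const ssize_t ignored = ::write(STDERR_FILENO, msg, len);
    (void)ignored;
    // SA_RESETHAND (set below) has already restored the default action,
    // so re-raising terminates the process instead of re-entering here.
    ::raise(signo);
}

// Installs the handler for SIGSEGV and SIGABRT. Returns true on success.
bool install_safe_crash_handler() {
    struct sigaction sa;
    std::memset(&sa, 0, sizeof(sa));
    sa.sa_handler = safe_crash_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESETHAND;
    return ::sigaction(SIGSEGV, &sa, nullptr) == 0 &&
           ::sigaction(SIGABRT, &sa, nullptr) == 0;
}
```

Calling install_safe_crash_handler() once at startup would be enough in this sketch. Richer output such as a stack trace would need the same care: any buffers or file descriptors it uses should be prepared before the signal arrives rather than inside the handler.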

Notes

This issue tracks only the secondary hang, not the original crash. Priority is low because the root cause of the initial crash remains unknown. Recorded here for future investigation.
