
bneradt commented Dec 10, 2025

Add a destructor to the `Stripe` class to properly free the directory memory allocated in `_init_directory()`. The memory was allocated via `ats_memalign` or `ats_alloc_hugepage` but never freed when `Stripe` objects were destroyed. A new private member tracks whether hugepages were used so the destructor can call the matching free function. This was discovered via an infrequent ASAN failure in the `cache_disk_replacement_stability` regression test, which reported a ~27 GiB leak from 24 `StripeSM` objects that went out of scope without releasing their directory buffers.
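
A minimal C++ sketch of the fix described above, under stated assumptions: `StripeSketch`, `dir_memalign`, `dir_alloc_hugepage`, `dir_memalign_free`, and `dir_free_hugepage` are hypothetical stand-ins, not the ATS `Stripe` class or the real `ats_memalign`/`ats_alloc_hugepage` signatures. The sketch also assumes that a failed hugepage allocation falls back to the aligned path, and it simulates the hugepage allocator with `malloc` to stay portable. What it illustrates is the role of the new member: it records which allocator actually produced the buffer, so the destructor can pair the buffer with the matching free function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <stdlib.h> // posix_memalign (POSIX)

namespace sketch
{
// Live-allocation counters, present only so the sketch can show nothing leaks.
inline int live_aligned  = 0;
inline int live_hugepage = 0;
// Simulates whether the (hypothetical) hugepage allocator can satisfy requests.
inline bool hugepages_available = true;

// Hypothetical stand-in for ats_memalign(): aligned heap allocation.
inline void *dir_memalign(std::size_t alignment, std::size_t size)
{
  void *p = nullptr;
  if (posix_memalign(&p, alignment, size) != 0) {
    throw std::bad_alloc();
  }
  ++live_aligned;
  return p;
}

// Hypothetical free function paired with dir_memalign().
inline void dir_memalign_free(void *p)
{
  assert(live_aligned > 0 && "aligned directory freed twice");
  std::free(p);
  --live_aligned;
}

// Hypothetical stand-in for ats_alloc_hugepage(); returns nullptr on failure.
// A real hugepage allocator would map hugepages; malloc keeps the sketch portable.
inline void *dir_alloc_hugepage(std::size_t size)
{
  void *p = hugepages_available ? std::malloc(size) : nullptr;
  if (p != nullptr) {
    ++live_hugepage;
  }
  return p;
}

// Hypothetical free function paired with dir_alloc_hugepage(); takes the size
// because unmapping-style releases typically need it.
inline void dir_free_hugepage(void *p, std::size_t /* size */)
{
  assert(live_hugepage > 0 && "hugepage directory freed twice");
  std::free(p);
  --live_hugepage;
}

// Simplified, hypothetical stand-in for Stripe: owns one directory buffer.
class StripeSketch
{
public:
  StripeSketch(std::size_t dir_bytes, bool want_hugepages) : dir_bytes_(dir_bytes)
  {
    if (want_hugepages) {
      raw_dir_ = dir_alloc_hugepage(dir_bytes_);
    }
    // Record which allocator actually produced the buffer, not which was requested.
    dir_uses_hugepages_ = (raw_dir_ != nullptr);
    if (raw_dir_ == nullptr) {
      raw_dir_ = dir_memalign(4096, dir_bytes_); // fallback: page-aligned heap
    }
  }

  // The fix: release the buffer with the free function matching its allocator.
  ~StripeSketch()
  {
    if (dir_uses_hugepages_) {
      dir_free_hugepage(raw_dir_, dir_bytes_);
    } else {
      dir_memalign_free(raw_dir_);
    }
  }

  // A raw owning pointer must not be copied, or two objects would free it.
  StripeSketch(const StripeSketch &)            = delete;
  StripeSketch &operator=(const StripeSketch &) = delete;

  bool uses_hugepages() const { return dir_uses_hugepages_; }

private:
  std::size_t dir_bytes_;
  void *raw_dir_           = nullptr;
  bool dir_uses_hugepages_ = false;
};
} // namespace sketch
```

When a `StripeSketch` is destroyed, for example when a `std::unique_ptr` holding it goes out of scope (as with the `std::make_unique<StripeSM>` frame in the trace), the destructor returns the buffer through the allocator that created it. Holding the buffer in a `std::unique_ptr` with a custom deleter would be an alternative that also makes the type movable; the flag-plus-destructor form above simply mirrors what this PR describes.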

```
=================================================================
==5207==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 29272965120 byte(s) in 24 object(s) allocated from:
    #0 0x7f8b9cd1669c in __interceptor_posix_memalign (/lib64/libasan.so.6+0xb569c)
    #1 0xa69d99 in ats_memalign(unsigned long, unsigned long) ../src/tscore/ink_memory.cc:108
    #2 0xfd96e2 in Stripe::_init_directory(unsigned long, int, int) ../src/iocore/cache/Stripe.cc:159
    #3 0xfd905f in Stripe::Stripe(CacheDisk*, long, long, int, int) ../src/iocore/cache/Stripe.cc:97
    #4 0xfdc7ef in StripeSM::StripeSM(CacheDisk*, long, long, int, int) ../src/iocore/cache/StripeSM.cc:120
    #5 0xff8c81 in std::_MakeUniq<StripeSM>::__single_object std::make_unique<StripeSM, CacheDisk*, long, int>(CacheDisk*&&, long&&, int&&) (/tmp/ats-quiche/bin/traffic_server+0xff8c81)
    #6 0xff4797 in RegressionTest_cache_disk_replacement_stability(RegressionTest*, int, int*) ../src/iocore/cache/CacheTest.cc:458
    #7 0xa4c994 in start_test ../src/tscore/Regression.cc:83
    #8 0xa4cd6c in RegressionTest::run(char const*, int) ../src/tscore/Regression.cc:106
    #9 0xa1a2a8 in mainEvent ../src/traffic_server/traffic_server.cc:1570
    #10 0x9fd61a in Continuation::handleEvent(int, void*) ../include/iocore/eventsystem/Continuation.h:228
    #11 0x13373e7 in EThread::process_event(Event*, int, long) ../src/iocore/eventsystem/UnixEThread.cc:171
    #12 0x133809c in EThread::execute_regular() ../src/iocore/eventsystem/UnixEThread.cc:288
    #13 0x1338a7b in EThread::execute() ../src/iocore/eventsystem/UnixEThread.cc:383
    #14 0x13356e8 in spawn_thread_internal ../src/iocore/eventsystem/Thread.cc:75
    #15 0x7f8b9abeb1c9 in start_thread (/lib64/libpthread.so.0+0x81c9)
```
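
For scale, the byte count in the report converts as follows. This is plain arithmetic on the numbers in the trace; the per-directory figure is an average, since LeakSanitizer reports only the total across the 24 objects.

```math
\frac{29\,272\,965\,120\ \text{B}}{2^{30}\ \text{B/GiB}} \approx 27.26\ \text{GiB},
\qquad
\frac{29\,272\,965\,120\ \text{B}}{24} = 1\,219\,706\,880\ \text{B} \approx 1.14\ \text{GiB per directory}
```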

bneradt added this to the 10.2.0 milestone Dec 10, 2025
bneradt self-assigned this Dec 10, 2025

bneradt commented Dec 10, 2025

@zwoop has a fix for this in #12717 (which may be broken out into a separate PR later). I'll close this PR, which has its own problems at any rate, judging from the failed CI runs.

bneradt closed this Dec 10, 2025