Fix read-only deadlock #145

heifner · 2025-12-19T17:07:18Z

See Wire-Network/eos-vm#7 for description of read-only deadlock issue.

Update read_only_trx_test.py so that it is much more likely to fail without the fix. I was able to get the test to fail every run on my machine without the fix of: Wire-Network/eos-vm#7. Also added another contract with an infinite transaction so that more than one wasm module is active for the test.

Depends on: Wire-Network/eos-vm#7

Updated platform_timer to have an explicit set_expired to call from wire-sys-vm timeout_exception since that can be thrown on a thread where the thread's platform_timer has not timed out with wire-sys-vm 1.1.0.

…issue.

…rm_timer on timeout_exception

…ys-vm 1.1.0

Copilot

Pull request overview

This PR fixes a read-only transaction deadlock issue by updating the wire-sys-vm dependency and adding an explicit set_expired() method to platform_timer. The fix addresses a multi-threaded scenario where timeout_exception can be thrown on threads whose timers haven't actually expired yet, requiring explicit state transitions to ensure proper timeout handling.

Key changes:

Updated wire-sys-vm from version 1.0.1-rc1 to 1.1.0 to include the deadlock fix
Added set_expired() method to platform_timer for explicit timeout state setting
Enhanced read_only_trx_test.py to be more likely to trigger the deadlock without the fix
Added new "infinite" test contract to provide multiple active WASM modules during testing

Reviewed changes

Copilot reviewed 15 out of 16 changed files in this pull request and generated 7 comments.

Show a summary per file

File	Description
vcpkg.json	Updates wire-sys-vm dependency from 1.0.1-rc1 to 1.1.0
vcpkg-configuration.json	Updates vcpkg registry baseline to support new wire-sys-vm version
unittests/test-contracts/infinite/infinite.cpp	Implements infinite contract with slow, forever, and segfault actions for testing
unittests/test-contracts/infinite/infinite.hpp	Header file for infinite contract defining action interfaces
unittests/test-contracts/infinite/infinite.abi	ABI definition for infinite contract actions
unittests/test-contracts/infinite/infinite.wasm	Compiled WASM binary for infinite contract
unittests/test-contracts/infinite/CMakeLists.txt	Build configuration for infinite contract
unittests/test-contracts/CMakeLists.txt	Adds infinite subdirectory to test contracts build
tests/read_only_trx_test.py	Enhances test with infinite contract, additional threads, and relaxed assertions for timeout scenarios
plugins/producer_plugin/producer_plugin.cpp	Adds debug logging for read-only transaction tracking
libraries/chain/webassembly/runtimes/sys-vm.cpp	Calls set_expired() when handling timeout_exception to ensure checktime() sees timeout
libraries/chain/platform_timer_posix.cpp	Implements set_expired() method for POSIX platform timer
libraries/chain/platform_timer_kqueue.cpp	Implements set_expired() method for kqueue platform timer
libraries/chain/platform_timer_asio_fallback.cpp	Implements set_expired() method for ASIO fallback platform timer
libraries/chain/include/sysio/chain/transaction_context.hpp	Adds set_expired() wrapper method to transaction_checktime_timer
libraries/chain/include/sysio/chain/platform_timer.hpp	Declares set_expired() method in platform_timer interface

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2025-12-23T17:15:47Z

unittests/test-contracts/infinite/infinite.cpp

+
+void infinite::segv() {
+   volatile int* p = nullptr;
+   int x = *p; // cause segfault


The variable 'x' is assigned but never used after the initial dereference. The dereference of nullptr on line 65 will cause the segfault as intended, but the subsequent assignments to 'x' in the while loop (line 67) serve no functional purpose. While this may be intentional to prevent compiler optimizations, consider adding a comment explaining why 'x' is being assigned in the loop, or use the value in some way to make the intent clearer (e.g., volatile reads or a side effect).

Suggested change

int x = *p; // cause segfault

volatile int x = *p; // cause segfault (and keep dereferences from being optimized out)

Copilot · 2025-12-23T17:15:47Z

tests/read_only_trx_test.py

+    threadList.append(threading.Thread(target = foreverTransactions))
+    threadList.append(threading.Thread(target = infiniteTransactions))
+    threadList.append(threading.Thread(target = foreverTransactions))
    Print("Sending different speeds of read only transactions simutaneously")


The comment has a typo: "simutaneously" should be "simultaneously".

Copilot · 2025-12-23T17:15:48Z

tests/read_only_trx_test.py

+def sendReadOnlySEGV():
+    return sendTransaction('infinite', action='segv', data={}, auth=[], opts='--read')
+
 # Send read-only trxs from mutltiple threads to bump load


The comment has a typo: "mutltiple" should be "multiple".

Copilot · 2025-12-23T17:15:48Z

libraries/chain/platform_timer_posix.cpp

+void platform_timer::set_expired() {
+   timer_state_t expected{.state = state_t::running, .callback_in_flight = false, .generation_running = generation};
+   _state.compare_exchange_strong(expected, timer_state_t{state_t::timed_out, false, generation});
+}


The set_expired() implementation uses the non-atomic 'generation' member variable, while interrupt_timer() loads 'generation_running' from the atomic state before using it. For consistency and to avoid potential race conditions, consider following the same pattern as interrupt_timer() by loading generation_running from the state:

const generation_t generation_running = _state.load().generation_running;
timer_state_t expected{.state = state_t::running, .callback_in_flight = false, .generation_running = generation_running};

This ensures the generation value used in compare_exchange matches the actual running generation in the atomic state, which is more robust in multi-threaded scenarios where timeout_exception can be thrown on a thread before its timer has actually expired.

Copilot · 2025-12-23T17:15:48Z

libraries/chain/platform_timer_asio_fallback.cpp

+   timer_state_t expected{.state = state_t::running, .callback_in_flight = false, .generation_running = generation};
+   _state.compare_exchange_strong(expected, timer_state_t{state_t::timed_out, false, generation});


The set_expired() implementation uses the non-atomic 'generation' member variable, while interrupt_timer() loads 'generation_running' from the atomic state before using it. For consistency and to avoid potential race conditions, consider following the same pattern as interrupt_timer() by loading generation_running from the state:

const generation_t generation_running = _state.load().generation_running;
timer_state_t expected{.state = state_t::running, .callback_in_flight = false, .generation_running = generation_running};

This ensures the generation value used in compare_exchange matches the actual running generation in the atomic state, which is more robust in multi-threaded scenarios where timeout_exception can be thrown on a thread before its timer has actually expired.

Suggested change

timer_state_t expected{.state = state_t::running, .callback_in_flight = false, .generation_running = generation};

_state.compare_exchange_strong(expected, timer_state_t{state_t::timed_out, false, generation});

const generation_t generation_running = _state.load().generation_running;

timer_state_t expected{.state = state_t::running, .callback_in_flight = false, .generation_running = generation_running};

_state.compare_exchange_strong(expected, timer_state_t{state_t::timed_out, false, generation_running});

Copilot · 2025-12-23T17:15:48Z

libraries/chain/platform_timer_kqueue.cpp

+   timer_state_t expected{.state = state_t::running, .callback_in_flight = false, .generation_running = generation};
+   _state.compare_exchange_strong(expected, timer_state_t{state_t::timed_out, false, generation});


The set_expired() implementation uses the non-atomic 'generation' member variable, while interrupt_timer() loads 'generation_running' from the atomic state before using it. For consistency and to avoid potential race conditions, consider following the same pattern as interrupt_timer() by loading generation_running from the state:

const generation_t generation_running = _state.load().generation_running;
timer_state_t expected{.state = state_t::running, .callback_in_flight = false, .generation_running = generation_running};

This ensures the generation value used in compare_exchange matches the actual running generation in the atomic state, which is more robust in multi-threaded scenarios where timeout_exception can be thrown on a thread before its timer has actually expired.

Suggested change

timer_state_t expected{.state = state_t::running, .callback_in_flight = false, .generation_running = generation};

_state.compare_exchange_strong(expected, timer_state_t{state_t::timed_out, false, generation});

const generation_t generation_running = _state.load().generation_running;

timer_state_t expected{.state = state_t::running, .callback_in_flight = false, .generation_running = generation_running};

_state.compare_exchange_strong(expected, timer_state_t{state_t::timed_out, false, generation_running});

Copilot · 2025-12-23T17:15:49Z

libraries/chain/include/sysio/chain/platform_timer.hpp

   void stop();
   //interrupt_timer() can be called from any thread
   void interrupt_timer();
+   //explicit set to expired


The comment for set_expired() should provide more context about its purpose and thread safety guarantees, similar to the comment for interrupt_timer(). Consider documenting:

When this method should be called (e.g., when handling timeout_exception in sys-vm contexts)

Thread safety characteristics (whether it can be called from any thread)

What happens if the timer is not in the running state

This would help API consumers understand when and how to use this method safely.

Suggested change

//explicit set to expired

/* Explicitly force the timer into an "expired" state.

This is typically used by higher‑level code when a timeout has already been detected

(for example, when handling a timeout_exception in a system VM or similar context) and

the platform_timer needs to be updated to reflect that expiration.

Thread safety: like interrupt_timer(), this function is intended to be callable from

any thread and may be invoked even while the timer is running.

If the timer is not currently in the running state, calling set_expired() has no harmful

effect; it will not start the timer or schedule a new timeout, and may simply update the

internal state to reflect that no further waiting is required.

*/

…k-read-only

… Also update comment with additional info.

…k-read-only

heifner added 8 commits December 19, 2025 11:00

Improve read only test to more easily reproduce read-only "deadlock" …

b4154fc

…issue.

Update to wire-sys-vm 1.1.0 with read-only deadlock fix

a16ada3

Add logging for read-only trx

22e28ad

Add explicit set_expired for manually expiring platform_timer

37f7d8c

Integrate wire-sys-vm 1.1.0 which requires setting expired for platfo…

769819c

…rm_timer on timeout_exception

Add a SEGV to the mix

242b574

Update to latest wire-sys-vm 1.1.0

5757c15

Update baseline version in vcpkg configuration with the latest wire-s…

8aaab7a

…ys-vm 1.1.0

brianjohnson5972 requested a review from Copilot December 23, 2025 17:10

Copilot started reviewing on behalf of brianjohnson5972 December 23, 2025 17:10 View session

Copilot AI reviewed Dec 23, 2025

View reviewed changes

heifner added 5 commits December 26, 2025 08:32

Merge remote-tracking branch 'origin/master' into feature/fix-deadloc…

f243f3e

…k-read-only

Add volatile and additional comment

dd4ae32

Fix spelling

4ff997c

Update to match interrupt_timer implementation by loading generation.…

86a8305

… Also update comment with additional info.

Merge remote-tracking branch 'origin/master' into feature/fix-deadloc…

f016b5b

…k-read-only

heifner requested a review from brianjohnson5972 December 26, 2025 15:08

brianjohnson5972 approved these changes Dec 29, 2025

View reviewed changes

heifner mentioned this pull request Dec 29, 2025

Multiple read-only trx running on eos-vm AntelopeIO/spring#1882

Open

heifner merged commit 77c4531 into master Dec 29, 2025
52 of 54 checks passed

heifner deleted the feature/fix-deadlock-read-only branch December 29, 2025 14:03

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Fix read-only deadlock #145

Fix read-only deadlock #145

Uh oh!

heifner commented Dec 19, 2025

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Dec 23, 2025

Uh oh!

Copilot AI Dec 23, 2025

Uh oh!

Copilot AI Dec 23, 2025

Uh oh!

Copilot AI Dec 23, 2025

Uh oh!

Copilot AI Dec 23, 2025

Uh oh!

Copilot AI Dec 23, 2025

Uh oh!

Copilot AI Dec 23, 2025

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

	int x = *p; // cause segfault
	volatile int x = *p; // cause segfault (and keep dereferences from being optimized out)

		timer_state_t expected{.state = state_t::running, .callback_in_flight = false, .generation_running = generation};
		_state.compare_exchange_strong(expected, timer_state_t{state_t::timed_out, false, generation});

-   timer_state_t expected{.state = state_t::running, .callback_in_flight = false, .generation_running = generation};
-   _state.compare_exchange_strong(expected, timer_state_t{state_t::timed_out, false, generation});
+   const generation_t generation_running = _state.load().generation_running;
+   timer_state_t expected{.state = state_t::running, .callback_in_flight = false, .generation_running = generation_running};
+   _state.compare_exchange_strong(expected, timer_state_t{state_t::timed_out, false, generation_running});

-   //explicit set to expired
+   /* Explicitly force the timer into an "expired" state.
+      This is typically used by higher‑level code when a timeout has already been detected
+      (for example, when handling a timeout_exception in a system VM or similar context) and
+      the platform_timer needs to be updated to reflect that expiration.
+      Thread safety: like interrupt_timer(), this function is intended to be callable from
+      any thread and may be invoked even while the timer is running.
+      If the timer is not currently in the running state, calling set_expired() has no harmful
+      effect; it will not start the timer or schedule a new timeout, and may simply update the
+      internal state to reflect that no further waiting is required.
+   */

Fix read-only deadlock #145

Fix read-only deadlock #145

Uh oh!

Conversation

heifner commented Dec 19, 2025

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Dec 23, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 23, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 23, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 23, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 23, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 23, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Dec 23, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants