

@0xmrree commented Dec 8, 2025

Fixes a TOCTOU (time-of-check to time-of-use) issue in the beacon node by replacing unsafe port-finding with OS-assigned ports (port 0), so that port allocation happens atomically at bind time, without race conditions. (Basically, this fixes a theoretical race condition that could prevent the node from starting up.) The production beacon client is fully fixed by the new compute_listen_ports() helper; the integration tests temporarily retain the deprecated approach pending maintainer input.
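A minimal std-only Rust sketch of the two patterns, for readers less familiar with the issue. Both function names are hypothetical and not code from this PR. The unsafe port-finding being replaced works roughly like the first function; binding to port 0 and keeping the socket, as in the second, is what removes the race:

```rust
use std::io;
use std::net::{Ipv4Addr, SocketAddr, TcpListener};

/// Racy pattern: ask the OS for a free port, then release it and return
/// only the number. Another process can grab the port before the caller
/// binds it again (time-of-check to time-of-use).
fn find_free_port_racy() -> io::Result<u16> {
    let listener = TcpListener::bind(SocketAddr::from((Ipv4Addr::LOCALHOST, 0)))?;
    let port = listener.local_addr()?.port();
    drop(listener); // port released here: this is the race window
    Ok(port)
}

/// Race-free pattern: bind to port 0 and keep the listener. The OS picks a
/// free port atomically inside `bind`, and it stays reserved for as long as
/// the listener is alive.
fn bind_ephemeral() -> io::Result<(TcpListener, u16)> {
    let listener = TcpListener::bind(SocketAddr::from((Ipv4Addr::LOCALHOST, 0)))?;
    let port = listener.local_addr()?.port();
    Ok((listener, port))
}

fn main() -> io::Result<()> {
    let racy = find_free_port_racy()?;
    println!("racy: got port {racy}, but nothing is holding it any more");

    let (listener, port) = bind_ephemeral()?;
    println!("safe: listening on {}", listener.local_addr()?);
    assert_eq!(listener.local_addr()?.port(), port);
    Ok(())
}
```

In the beacon node itself the sockets are bound by the networking stack, so setting the configured ports to 0 gives the same guarantee without handing sockets around.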

Issue Addressed

Which issue # does this PR address? - #8490

Proposed Changes

Changes Introduced

  • Fixes a TOCTOU vulnerability in beacon node network port allocation by using OS-assigned ports (port 0)
  • Adds compute_listen_ports() helper function in listen_addr.rs to centralize port computation logic for TCP, discovery UDP, and QUIC UDP ports
  • Implements overflow-safe port handling (e.g. when deriving the QUIC port as TCP port + 1 at the top of the u16 range)
  • Adds unit tests covering zero-ports flag, explicit port overrides, and overflow scenarios
  • Updates beacon_node/src/config.rs to use new helper for IPv4, IPv6, and dual-stack configurations
  • Adds integration tests in lighthouse/tests/beacon_node.rs to verify --zero-ports flag behavior
  • Provides safe bind_*_any() APIs for future migration of test code (currently unused)
  • Retains one of the deprecated functions for the execution_engine_integration tests pending maintainer guidance; the rest of the old methods were removed
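The port rules behind compute_listen_ports() (the zero-ports flag, explicit overrides, QUIC defaulting to TCP port + 1, overflow safety) can be sketched roughly as follows. This is a hypothetical, simplified stand-in: the signature, the use of Option for the overrides, and the fallback to TCP port - 1 on overflow are assumptions, not necessarily what the PR does.

```rust
/// Hypothetical sketch of the port-derivation rules described in this PR
/// (not the actual `compute_listen_ports` signature or body).
/// Returns `(tcp, discovery_udp, quic_udp)`.
fn compute_listen_ports_sketch(
    use_zero_ports: bool,
    tcp_port: u16,
    disc_port: Option<u16>,
    quic_port: Option<u16>,
) -> (u16, u16, u16) {
    // --zero-ports: every socket is bound to port 0 and the OS picks the port.
    if use_zero_ports {
        return (0, 0, 0);
    }
    // TCP and UDP have separate port spaces, so discovery can reuse the TCP number.
    let disc = disc_port.unwrap_or(tcp_port);
    let quic = quic_port.unwrap_or_else(|| {
        if tcp_port == 0 {
            // Assumption: an OS-assigned TCP port implies an OS-assigned QUIC port.
            0
        } else {
            // Default to TCP + 1 so QUIC (also UDP) does not clash with a discovery
            // socket on the TCP number. Assumption: at u16::MAX, fall back to TCP - 1
            // instead of overflowing.
            tcp_port.checked_add(1).unwrap_or(tcp_port - 1)
        }
    });
    (tcp_port, disc, quic)
}

fn main() {
    println!("{:?}", compute_listen_ports_sketch(false, 9000, None, None)); // (9000, 9000, 9001)
    println!("{:?}", compute_listen_ports_sketch(false, u16::MAX, None, None)); // (65535, 65535, 65534)
    println!("{:?}", compute_listen_ports_sketch(true, 9000, None, None)); // (0, 0, 0)
}
```

Using checked_add instead of a plain `+ 1` keeps the derivation from panicking in debug builds or silently wrapping to 0 in release builds.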

What's Fixed / Additional Info

  • Production beacon node: fully protected against port allocation races
  • Integration tests: still use deprecated approach (to be addressed separately)

Next steps

I think this PR can be merged as is, but the CL/EL integration tests still need to be addressed, and here I need help from the maintainers. The EL/CL integration tests still use the old approach, which has the TOCTOU issue: after starting the EL process, we need to pass the ports chosen by the EL downstream, and that is not easily possible without parsing the child process's stdout. That is a fragile approach, and I'm not sure you want to take it. That said, apart from passing a file descriptor to the EL through its CLI, it is the only approach I can think of.

Let me know your thoughts. The relevant code is here: https://github.com/sigp/lighthouse/blob/stable/testing/execution_engine_integration/src/execution_engine.rs#L55
If you want me to go down the path of parsing the EL's stdout, I will open a follow-up PR after this one is merged.
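If the stdout-parsing route is taken, the parsing side could look roughly like this. It is a hypothetical sketch: parse_port_from_output is not an existing Lighthouse function, and the log line it matches is invented. A real version would have to match whatever the EL client actually prints (possibly on stderr rather than stdout) and would still need a timeout so that a silent child cannot hang the test.

```rust
use std::io::{BufRead, BufReader, Cursor};

/// Hypothetical helper: scan a child's output line by line and return the
/// first port number that directly follows `marker`.
fn parse_port_from_output<R: BufRead>(reader: R, marker: &str) -> Option<u16> {
    for line in reader.lines() {
        // Give up on I/O errors (e.g. invalid UTF-8 in the output).
        let line = line.ok()?;
        if let Some(idx) = line.find(marker) {
            // Take the run of ASCII digits right after the marker.
            let digits: String = line[idx + marker.len()..]
                .chars()
                .take_while(|c| c.is_ascii_digit())
                .collect();
            // Non-numeric or out-of-range values are skipped.
            if let Ok(port) = digits.parse::<u16>() {
                return Some(port);
            }
        }
    }
    None
}

fn main() {
    // With a real EL this would be something like
    // BufReader::new(child.stdout.take().unwrap()) for a process spawned with
    // Stdio::piped(); a Cursor over an invented log stands in for it here.
    let fake_log = "starting node\nserver started endpoint=127.0.0.1:43117\n";
    let port = parse_port_from_output(BufReader::new(Cursor::new(fake_log)), "127.0.0.1:");
    println!("{port:?}"); // Some(43117)
}
```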

Testing

Ran cargo nextest run -p lighthouse --release; all tests passed (excerpt below):

...
PASS [   0.014s] lighthouse::lighthouse_tests validator_manager::validator_exit_using_beacon_and_presign_flags
PASS [   0.019s] lighthouse::lighthouse_tests validator_manager::validator_import_defaults
PASS [   0.020s] lighthouse::lighthouse_tests validator_manager::validator_import_misc_flags
PASS [   0.015s] lighthouse::lighthouse_tests validator_manager::validator_import_missing_both_file_flags
PASS [   0.017s] lighthouse::lighthouse_tests validator_manager::validator_import_missing_token
PASS [   0.015s] lighthouse::lighthouse_tests validator_manager::validator_import_using_both_file_flags
PASS [   0.019s] lighthouse::lighthouse_tests validator_manager::validator_list_defaults
PASS [   0.020s] lighthouse::lighthouse_tests validator_manager::validator_move_count
PASS [   0.021s] lighthouse::lighthouse_tests validator_manager::validator_move_defaults
PASS [   0.019s] lighthouse::lighthouse_tests validator_manager::validator_move_misc_flags_0
PASS [   0.019s] lighthouse::lighthouse_tests validator_manager::validator_move_misc_flags_1
PASS [   0.019s] lighthouse::lighthouse_tests validator_manager::validator_move_misc_flags_2
PASS [  24.726s] lighthouse::lighthouse_tests beacon_node::validator_monitor_file_flag
PASS [  24.531s] lighthouse::lighthouse_tests beacon_node::validator_monitor_metrics_threshold_custom
PASS [  52.022s] lighthouse::lighthouse_tests beacon_node::test_builder_disable_ssz_flag
PASS [  25.477s] lighthouse::lighthouse_tests beacon_node::validator_monitor_pubkeys_flag
PASS [  25.558s] lighthouse::lighthouse_tests beacon_node::validator_monitor_metrics_threshold_default
PASS [  25.330s] lighthouse::lighthouse_tests beacon_node::wss_checkpoint_flag
PASS [  27.968s] lighthouse::lighthouse_tests beacon_node::zero_ports_flag
────────────
Summary [ 728.214s] 310 tests run: 310 passed, 0 skipped

Ran cargo test -p network_utils --lib; all tests passed (excerpt below):

...
running 13 tests
test listen_addr::tests::test_compute_listen_ports_default_behavior ... ok
test listen_addr::tests::test_compute_listen_ports_max_port_overflow ... ok
test listen_addr::tests::test_compute_listen_ports_max_port_with_explicit_quic ... ok
test listen_addr::tests::test_compute_listen_ports_quic_avoids_discovery_conflict ... ok
test listen_addr::tests::test_compute_listen_ports_tcp_and_udp_can_share_port ... ok
test listen_addr::tests::test_compute_listen_ports_with_all_explicit_ports ... ok
test listen_addr::tests::test_compute_listen_ports_with_explicit_disc_port ... ok
test listen_addr::tests::test_compute_listen_ports_with_explicit_quic_port ... ok
test listen_addr::tests::test_compute_listen_ports_with_zero_ports_flag ... ok
test listen_addr::tests::test_compute_listen_ports_with_zero_tcp_and_explicit_ports ... ok
test listen_addr::tests::test_compute_listen_ports_with_zero_tcp_port ... ok
test enr_ext::tests::test_ed25519_peer_conversion ... ok
test enr_ext::tests::test_secp256k1_peer_id_conversion ... ok

test result: ok. 13 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 0.00s

Ran cargo test -p lighthouse_network; all tests passed (excerpt below):

...
tus(V2(StatusMessageV2 { fork_digest: [0, 0, 0, 0], finalized_root: 0x0000000000000000000000000000000000000000000000000000000000000000, finalized_epoch: Epoch(1), head_root: 0x0000000000000000000000000000000000000000000000000000000000000000, head_slot: Slot(1), earliest_available_slot: Slot(0) }))
test rpc_tests::test_active_requests ... ok
2025-12-08T02:40:26.947027Z DEBUG Receiver: lighthouse_network_tests::rpc_tests: Sending message 18
2025-12-08T02:40:27.949528Z DEBUG Receiver: lighthouse_network_tests::rpc_tests: Sending message 19
2025-12-08T02:40:28.952231Z DEBUG Receiver: lighthouse_network_tests::rpc_tests: Sending message 20
test rpc_tests::test_tcp_blocks_by_range_chunked_rpc_terminates_correctly ... ok

test result: ok. 14 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out; finished in 20.23s

   Doc-tests lighthouse_network

running 1 test

@0xmrree requested a review from jxs as a code owner December 8, 2025 03:29

cla-assistant bot commented Dec 8, 2025

CLA assistant check
All committers have signed the CLA.

@0xmrree marked this pull request as draft December 8, 2025 20:47
@0xmrree marked this pull request as ready for review December 8, 2025 20:47
@0xmrree force-pushed the TOCTOU_vulnerability branch from 8897c84 to 529da42 December 12, 2025 08:05
@jxs (Member) left a comment

Hi, thanks for this. I left some comments.

/// - QUIC port defaults to TCP port + 1 (to avoid conflict with discovery UDP)
pub fn compute_listen_ports(
    use_zero_ports: bool,
    port: u16,
Member

Suggested change
-    port: u16,
+    tcp_port: u16,

        return (0, 0, 0);
    }

    let tcp_port = port;
Member

Suggested change (remove this line)
-    let tcp_port = port;

}

#[cfg(test)]
mod tests {
Member

Do we need all these tests?

@0xmrree (Author), Dec 12, 2025

I can lighten them up a bit.

Comment on lines +88 to +112
pub fn bind_tcp4_any() -> Result<TcpListener, String> {
    let addr = std::net::SocketAddr::new(std::net::Ipv4Addr::LOCALHOST.into(), 0);
    TcpListener::bind(addr).map_err(|e| format!("Failed to bind TCPv4 listener: {:?}", e))
}

/// Bind a TCPv6 listener on localhost with an ephemeral port (port 0) and return it.
/// Safe against TOCTOU: the socket remains open and reserved by the OS.
pub fn bind_tcp6_any() -> Result<TcpListener, String> {
    let addr = std::net::SocketAddr::new(std::net::Ipv6Addr::LOCALHOST.into(), 0);
    TcpListener::bind(addr).map_err(|e| format!("Failed to bind TCPv6 listener: {:?}", e))
}

/// Bind a UDPv4 socket on localhost with an ephemeral port (port 0) and return it.
/// Safe against TOCTOU: the socket remains open and reserved by the OS.
pub fn bind_udp4_any() -> Result<UdpSocket, String> {
    let addr = std::net::SocketAddr::new(std::net::Ipv4Addr::LOCALHOST.into(), 0);
    UdpSocket::bind(addr).map_err(|e| format!("Failed to bind UDPv4 socket: {:?}", e))
}

/// Bind a UDPv6 socket on localhost with an ephemeral port (port 0) and return it.
/// Safe against TOCTOU: the socket remains open and reserved by the OS.
pub fn bind_udp6_any() -> Result<UdpSocket, String> {
    let addr = std::net::SocketAddr::new(std::net::Ipv6Addr::LOCALHOST.into(), 0);
    UdpSocket::bind(addr).map_err(|e| format!("Failed to bind UDPv6 socket: {:?}", e))
}
Member

Are these functions used anywhere?

Author

Those methods are not used; they are there as part of the requirements from the issue.

@jxs (Member), Dec 12, 2025

IIUC the issue mentions

PR #8016 by @sashaodessa proposed a fix by replacing these functions with secure APIs that return already-bound sockets:

bind_tcp4_any() / bind_tcp6_any() → returns TcpListener
bind_udp4_any() / bind_udp6_any() → returns UdpSocket

Not that they are required, right?

Author

To get rid of the theoretical TOCTOU when actually running the node, you just need to set the ports to zero when setting up the config (you don't need those methods). The integration tests are a separate issue; please read the Next steps section, because the answer there will determine whether these methods stay. They were not used in the original PR from @sashaodessa (#8016) either.

@jxs requested a review from michaelsproul December 12, 2025 15:09
Eliminates TOCTOU vulnerability in beacon node by replacing unsafe port-finding with OS-assigned ephemeral ports (port 0), ensuring atomic port allocation without race conditions. Production beacon client is fully fixed with new compute_listen_ports() helper; integration tests retain deprecated approach temporarily pending maintainer input.

This is related to issue 8490
@0xmrree force-pushed the TOCTOU_vulnerability branch from 529da42 to 6b94f6b December 12, 2025 20:49
@0xmrree (Author) commented Dec 12, 2025

Should be good for another review.

@0xmrree requested a review from jxs December 12, 2025 20:56
