[DO NOT MERGE] test faster ci for interop tests #392

Closed
kamilsa wants to merge 11 commits into main from optimize/use-faster-ci-runners

@kamilsa kamilsa commented Feb 12, 2026

🗒️ Description

🔗 Related Issues or PRs

✅ Checklist

  • Ran tox checks to avoid unnecessary CI fails:
    uvx tox
  • Considered adding appropriate tests for the changes.
  • Considered updating the online docs in the ./docs/ directory.

The aggregated attestation pipeline was broken at multiple points,
preventing finalization in multi-node setups:

- Add missing GossipAggregatedAttestationEvent to network events
- Add AGGREGATED_ATTESTATION decoding and dispatch in event sources
- Fix SyncService.publish_aggregated_attestation to use a callback
  instead of a missing method on NetworkRequester
- Wire publish callback directly in Node.from_genesis
- Publish aggregates from ChainService._initial_tick (was discarded)
- Enable test_late_joiner_sync with is_aggregator=True on node 0
- Unpack (store, aggregates) tuple from on_tick and
  aggregate_committee_signatures in fork choice fill framework
- Update attestation target selection tests for the +1 safe_target
  allowance introduced in get_attestation_target
- Remove @pytest.mark.skip from test_mesh_finalization and test_mesh_2_2_2_finalization
- Update store attribute references: latest_new_attestations -> latest_new_aggregated_payloads,
  latest_known_attestations -> latest_known_aggregated_payloads
- test_partition_recovery remains xfail (known sync service limitation)
- Fix test_store_attestations indentation: steps 2-3 were inside for loop,
  causing aggregation to clear signatures before second validator check
- Fix store.py lint: remove blank line after docstring, wrap long line
- Fix type errors: peer_id accepts None for self-produced blocks throughout
  sync pipeline (SyncService, HeadSync, BlockCache)
- Fix formatting in service.py, node_runner.py, test_multi_node.py
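
The callback wiring mentioned in the list above (publishing aggregates through an injected callback rather than a method on NetworkRequester) could look roughly like the following. This is a minimal sketch, not the project's code: `SyncServiceSketch`, `FakeNetwork`, `publish_aggregate_fn`, and the `Aggregate` payload type are hypothetical stand-ins, and only the method name `publish_aggregated_attestation` is taken from the commit message.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Awaitable, Callable, Optional

# Hypothetical payload type; the real aggregated attestation type is not shown here.
Aggregate = bytes
PublishFn = Callable[[Aggregate], Awaitable[None]]


@dataclass
class SyncServiceSketch:
    """Illustrative stand-in for a sync service that publishes via a callback."""

    # Injected by whoever builds the node; None means no network is attached.
    publish_aggregate_fn: Optional[PublishFn] = None

    async def publish_aggregated_attestation(self, aggregate: Aggregate) -> bool:
        """Forward the aggregate to the injected callback, if one is wired."""
        if self.publish_aggregate_fn is None:
            # Nothing wired: skip instead of calling a method that does not exist.
            return False
        await self.publish_aggregate_fn(aggregate)
        return True


@dataclass
class FakeNetwork:
    """Test double that records what would have been gossiped."""

    published: list = field(default_factory=list)

    async def publish(self, aggregate: Aggregate) -> None:
        self.published.append(aggregate)


def run_demo() -> list:
    network = FakeNetwork()
    # Wiring step: the node constructor passes the bound method as the callback.
    service = SyncServiceSketch(publish_aggregate_fn=network.publish)
    asyncio.run(service.publish_aggregated_attestation(b"agg-1"))
    return network.published
```

Injecting the callback at construction time keeps the service independent of the network object's interface, which is one way to avoid the "missing method" failure described above.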

Replace fixed 70s duration loops with convergence helpers:
- assert_all_finalized_to: polls until finalization target reached
- assert_heads_consistent: polls until head slots converge
- assert_same_finalized_checkpoint: polls until nodes agree
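
The three helpers above share a poll-until-condition shape. A minimal synchronous sketch of that pattern follows, under stated assumptions: `wait_until` is a hypothetical utility, and the signature given to `assert_all_finalized_to` here (a callable returning each node's finalized slot, a target slot, and a timeout) is invented for illustration. The project's helpers may well be async and take node objects instead.

```python
import time
from typing import Callable, List


def wait_until(
    condition: Callable[[], bool],
    timeout: float,
    poll_interval: float = 0.5,
) -> float:
    """Poll `condition` until it returns True and return the elapsed seconds.

    Raises AssertionError if `timeout` seconds pass without success.
    """
    start = time.monotonic()
    while True:
        if condition():
            # Exit early on success instead of sleeping for a fixed duration.
            return time.monotonic() - start
        if time.monotonic() - start >= timeout:
            raise AssertionError(f"condition not met within {timeout}s")
        time.sleep(poll_interval)


def assert_all_finalized_to(
    get_finalized_slots: Callable[[], List[int]],  # hypothetical accessor
    target_slot: int,
    timeout: float,
    poll_interval: float = 0.5,
) -> float:
    """Wait until every node reports a finalized slot >= target_slot."""
    return wait_until(
        lambda: all(slot >= target_slot for slot in get_finalized_slots()),
        timeout=timeout,
        poll_interval=poll_interval,
    )
```

With this shape the timeout only bounds the worst case: a fast machine returns as soon as the condition holds, while a slow one gets the full budget before the assertion fails.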

This fixes CI flakiness where slow machines cause nodes to diverge
during the fixed wait period. Tests now exit early on success and
tolerate slower environments via generous timeouts.

- Increase gossipsub mesh stabilization from 5s to 10s in start_all
  (CI machines need more time for mesh formation before block production)
- Increase finalization timeout from 100s to 150s
- Increase peer connection timeout from 15s to 30s
- Increase pytest timeout from 200s to 300s

The CI failure showed all 3 nodes stuck at finalized slot 0,
indicating gossip mesh wasn't fully formed when services started.

Remove all logger.debug() and logger.info() statements plus
logging imports that were added during development for debugging.

Switch from ubuntu-latest to macos-15 (M1 runner) for interop tests.
Revert convergence polling and timeout increases - the original
fixed-duration tests should pass on faster macOS hardware.
…i-runners

# Conflicts:
#	src/lean_spec/subspecs/networking/service/service.py
@kamilsa kamilsa closed this Feb 12, 2026
@kamilsa kamilsa deleted the optimize/use-faster-ci-runners branch February 12, 2026 15:04