fix: prevent massive resyncs when RPC returns stale block number#85

Open
vp993 wants to merge 2 commits into debridge-finance:master from vp993:fix/stale-rpc-block-range-handling

@vp993 vp993 commented Jan 23, 2026

Summary

When an RPC returns a stale block number, so that fromBlock > toBlock, the error handling from PR #80 queries the submissions table for the last event block and resets the chain's sync position to that block.

The problem: on chains with low debridge activity (like Zilliqa), the last event can be millions of blocks behind the current sync position, so the reset causes a massive, unnecessary resync. On Zilliqa, a momentarily stale RPC response triggered a resync of about 2.9 million blocks.

Root Cause Analysis

The bug was discovered while debugging validator alerts showing Zilliqa lagging 2.8M blocks behind. Log analysis revealed:

Time (UTC) | Event
5:50:57    | Normal sync at block 18,653,005-18,653,006
5:51:06    | RPC returned stale data: toBlock (18653005) < fromBlock (18653006)
5:51:06    | Error triggered: "Invalid block range"
5:51:08    | Fallback logic queried DB: "Found last event block number: 15729178"
5:51:08    | Chain progress reset to 15,729,178 (~2.9M blocks behind)
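The size of the rollback follows directly from the two block numbers in the log excerpt above. A quick arithmetic check, with variable names chosen purely for illustration:

```typescript
// Values copied from the log excerpt; the variable names are illustrative
// and do not correspond to identifiers in the codebase.
const syncPositionBeforeReset = 18_653_006; // fromBlock when the error fired
const lastEventBlock = 15_729_178;          // block found in the submissions table

// Number of blocks the validator has to re-scan after the reset.
const blocksToResync = syncPositionBeforeReset - lastEventBlock;

console.log(blocksToResync); // 2923828, i.e. roughly 2.9M blocks
```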

The Fix

This PR changes the fallback behavior to use toBlock (the RPC's current confirmed block) instead of querying the submissions table. This is consistent with what the code already does when there are NO events:

// Before (inconsistent behavior):
const newLatestBlock = lastEvent?.blockNumber ?? toBlock;

// After (always use toBlock):
await this.supportedChainRepository.update(chainId, { latestBlock: toBlock });
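The decision the fix makes can be sketched as a small pure function. This is a simplified model under stated assumptions, not the code in AddNewEventsAction: the `RangeDecision` type and the `decideRange` function are hypothetical names introduced here, and the real action persists the result through `supportedChainRepository.update` rather than returning it.

```typescript
// Hypothetical model of the range check; names are assumptions for illustration.
type RangeDecision =
  | { kind: "fetch"; fromBlock: number; toBlock: number }
  | { kind: "reset"; latestBlock: number };

function decideRange(fromBlock: number, toBlock: number): RangeDecision {
  if (fromBlock > toBlock) {
    // Stale RPC response: fall back to the RPC's confirmed block (toBlock)
    // instead of the last event block stored in the submissions table.
    return { kind: "reset", latestBlock: toBlock };
  }
  // Valid range: proceed with the normal event fetch.
  return { kind: "fetch", fromBlock, toBlock };
}

// The Zilliqa incident: fromBlock is one block ahead of the stale toBlock.
const staleCase = decideRange(18_653_006, 18_653_005);

// A healthy range, for comparison.
const normalCase = decideRange(18_653_006, 18_653_010);
```

With this fallback, the worst case after a stale response is re-scanning the few blocks between toBlock and the previous sync position, rather than everything since the chain's last recorded event.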

Changes

  • Remove unnecessary SubmissionEntity repository injection from AddNewEventsAction
  • Simplify error handling to use toBlock as fallback
  • Update tests to reflect the fix

Test plan

  • Build passes (npm run build)
  • Unit tests pass (npm test -- --testPathPattern="AddNewEventsAction")
  • Deploy to staging validator and monitor for stale RPC responses
  • Verify no massive resyncs occur when RPC returns stale block numbers

🤖 Generated with Claude Code

vp993 and others added 2 commits January 23, 2026 15:21
When an RPC returns a stale block number (fromBlock > toBlock), the
previous fix in PR debridge-finance#80 would query the submissions table for the last
event block and reset the chain's sync position to that block.

The problem: On chains with low debridge activity (like Zilliqa), the
last event could be millions of blocks behind the current sync position,
causing massive unnecessary resyncs (e.g., 2.9M blocks on Zilliqa).

This fix simply uses toBlock as the fallback, which is the RPC's current
confirmed block number. This prevents the resync while maintaining
consistency with the RPC's view of the chain.

The old code already used toBlock when there were NO events - this fix
makes the behavior consistent for all cases.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

Adds a dedicated test that verifies the fix for stale RPC responses:
- Simulates scenario where DB has latestBlock=200 but RPC returns block 100
- Verifies no events are fetched when fromBlock > toBlock
- Verifies latestBlock is reset to toBlock (99) to prevent massive resync

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>