Conversation

@KolbyML (Member) commented Nov 25, 2025

Resolves NIT-4118

Pulls in OffchainLabs/go-ethereum#588

Goal of this PR

We now support eth_sendTransactionSync.

Adds a mechanism that sequences an express lane transaction synchronously and returns the receipt.

Questions I have

I have some concerns regarding code duplication:

  • I duplicated txSyncTimeoutError because it lives in github.com/ethereum/go-ethereum/internal/ethapi, an internal package.
  • The majority of SendExpressLaneTransactionSync is duplicated from eth_sendTransactionSync. Normally I would refactor the internal code and just pass in different backends, but I am assuming we don't want to change eth_sendTransactionSync, or that doing so would make it harder to maintain. I am not sure if it is ok to refactor, so if someone could let me know.

I also have some questions about my geth PR. The whole internal-package and go-ethereum-submodule setup is a little confusing, as I don't know the best practices yet and don't feel I have the seniority on the team to choose what we do.

Maybe what I have in this PR is good? Maybe it isn't? Let me know 👍

@github-actions bot commented Nov 26, 2025

❌ 4 Tests Failed:

Tests completed: 2177 | Failed: 4 | Passed: 2173 | Skipped: 0
View the top 3 failed tests by shortest run time
TestMultigasStylus_StorageWrite/out_of_gas
Stack Traces | 0.010s run time
=== RUN   TestMultigasStylus_StorageWrite/out_of_gas
    multigas_stylus_program_test.go:380: 
        	Error Trace:	/home/runner/work/nitro/nitro/system_tests/multigas_stylus_program_test.go:380
        	Error:      	An error is expected but got nil.
        	Test:       	TestMultigasStylus_StorageWrite/out_of_gas
--- FAIL: TestMultigasStylus_StorageWrite/out_of_gas (0.01s)
TestMultigasStylus_StorageWrite
Stack Traces | 2.400s run time
=== RUN   TestMultigasStylus_StorageWrite
INFO [11-27|20:05:47.548] New Key                                  name=Owner       Address=0x26E554a8acF9003b83495c7f45F06edCB803d4e3
INFO [11-27|20:05:47.549] New Key                                  name=Faucet      Address=0xaF24Ca6c2831f4d4F629418b50C227DF0885613A
WARN [11-27|20:05:47.549] Sequencer ReadFromTxQueueTimeout is higher than MaxBlockSpeed ReadFromTxQueueTimeout=1s MaxBlockSpeed=10ms
WARN [11-27|20:05:47.549] Sequencer ReadFromTxQueueTimeout is higher than MaxBlockSpeed ReadFromTxQueueTimeout=1s MaxBlockSpeed=10ms
=== PAUSE TestMultigasStylus_StorageWrite
=== CONT  TestMultigasStylus_StorageWrite
storage: len 5.46K vs 13.85K
storage: deployed to 0x3a0a61C11D96F5B8c1492bEaA5bDAedefFff15E8
Time to activate storage: 127.186205ms
ERROR[11-27|20:12:50.099] Dangling trie nodes after full cleanup
--- FAIL: TestMultigasStylus_StorageWrite (2.40s)
TestTimeboostExpressLaneTransactionHandling
Stack Traces | 17.980s run time
... [CONTENT TRUNCATED: Keeping last 20 lines]
(repeated "Dereferenced trie from memory database" DEBUG lines and node shutdown logs omitted)
--- FAIL: TestTimeboostExpressLaneTransactionHandling (17.98s)


@ganeshvanahalli (Contributor) left a comment

LGTM!

@diegoximenes (Contributor) left a comment

Nice 🙂

System tests should be created covering the following scenarios:

  • SendExpressLaneTransactionSync call is made to the active sequencer.
  • SendExpressLaneTransactionSync call is made to a non active sequencer.
  • SendExpressLaneTransactionSync receives a tx that should succeed.
  • SendExpressLaneTransactionSync receives a tx that should fail.

return nil, err
}
chainEvent := make(chan core.ChainEvent, 128)
sub := a.backend.BlockChain().SubscribeChainEvent(chainEvent)
@diegoximenes (Contributor) commented Nov 26, 2025

There are some failures during sequencing that will not be detected through this approach.
An example is when the transaction is too large; this is detected by the sequencer here.
That happens outside of the PublishExpressLaneTransaction call, without geth's Blockchain object being aware of it.

The current PublishTransaction API is synchronous and doesn't have this issue.
It uses resultChan to wait for the transaction to be processed.

@KolbyML (Member, Author) commented Nov 27, 2025

Nice 🙂

System tests should be created covering the following scenarios:

  • SendExpressLaneTransactionSync call is made to the active sequencer.
  • SendExpressLaneTransactionSync call is made to a non active sequencer.
  • SendExpressLaneTransactionSync receives a tx that should succeed
  • SendExpressLaneTransactionSync receives a tx that should fail

@diegoximenes what is nice?

@KolbyML (Member, Author) commented Nov 27, 2025

Nice 🙂

System tests should be created covering the following scenarios:

  • SendExpressLaneTransactionSync call is made to the active sequencer.
  • SendExpressLaneTransactionSync call is made to a non active sequencer.
  • SendExpressLaneTransactionSync receives a tx that should succeed
  • SendExpressLaneTransactionSync receives a tx that should fail

@diegoximenes ready for another look; I added 2 tests:

  • SendExpressLaneTransactionSync call is made to the active sequencer.
  • SendExpressLaneTransactionSync call is made to a non active sequencer.

One for each of these, which in essence also amounts to 2 tests for "SendExpressLaneTransactionSync receives a tx that should succeed".

  • SendExpressLaneTransactionSync receives a tx that should fail

I didn't make a test for this ^, because I don't believe the "bug" mentioned in #4074 (comment) is related to this PR or the implementation; it is instead a side effect of a missing check in ValidateExpressLaneTx(). The missing check is a MaxTxSize check, so ValidateExpressLaneTx() reports that invalid transactions are valid, which is bad.

I was testing locally: I added the size check to ValidateExpressLaneTx(), ran timeboost_sendExpressLaneTransaction with a tx that was invalid by being too big, and it failed as expected. That check isn't included in this PR, of course, and I made a Linear ticket here: https://linear.app/offchain-labs/issue/NIT-4157/validateexpresslanetx-is-missing-max-transaction-size-check-which. A test for MaxTxSize can be added with the fix.

I believe this resolves your concerns in #4074 (comment). That said, the problem mentioned isn't related to this PR or the current issue at hand; it was present before my PR and can be reproduced by calling timeboost_sendExpressLaneTransaction. Hence I don't believe this PR is blocked by this concern, and it can be handled by the Linear ticket I opened.
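For reference, the shape of the missing guard might look like the sketch below; checkTxSize and errTxTooLarge are illustrative names, and the real check would live in ValidateExpressLaneTx() as part of NIT-4157, not this PR:

```go
package main

import "errors"

var errTxTooLarge = errors.New("transaction exceeds the sequencer's max size")

// checkTxSize is an illustrative stand-in for the MaxTxSize guard described
// above: it rejects a raw transaction whose encoded size exceeds the limit.
func checkTxSize(encodedTx []byte, maxTxSize int) error {
	if len(encodedTx) > maxTxSize {
		return errTxTooLarge
	}
	return nil
}
```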

@KolbyML KolbyML requested a review from diegoximenes November 27, 2025 19:46
@KolbyML KolbyML assigned diegoximenes and unassigned KolbyML Nov 27, 2025
@KolbyML KolbyML force-pushed the NIT-4118 branch 2 times, most recently from 73b3b82 to 3f0a604 Compare November 27, 2025 19:54
@diegoximenes (Contributor) left a comment

Thank you for opening the ticket to check MaxTxSize in the timeboost_sendExpressLaneTransaction call life cycle; this will provide a better UX for users 🙂.

The MaxTxSize error was just an example though.
There are other errors that can occur during sequencing, outside of the PublishExpressLaneTransaction call path, which BlockChain().SubscribeChainEvent will not detect.
One is this, which depends on the last block while sequencing the tx.
We likely don't want to detect this error during the timeboost_sendExpressLaneTransaction call cycle, but we can, and should, detect it in the timeboost_sendExpressLaneTransactionSync call cycle.
It is not unlikely that the list of such errors will grow in the future.

This PR should do something like this:

  • Abstract that select into a waitTxToBeSequenced func.
  • Create a resultChan in SendExpressLaneTransactionSync, and pass it through the appropriate func calls until it reaches the publishTransactionToQueue func.
  • After SendExpressLaneTransactionSync calls PublishExpressLaneTransaction, it will call waitTxToBeSequenced, and after that it will call GetTransactionReceipt. a.backend.BlockChain().SubscribeChainEvent will then be redundant and can be removed.

This is the strategy used by PublishTransaction, which the eth_sendRawTransaction* APIs rely on.
With this approach, timeboost_sendExpressLaneTransactionSync will detect all those errors that can occur during sequencing outside of the PublishExpressLaneTransaction call path, and return them to the user.

Ideally this PR should also have a test covering SendExpressLaneTransactionSync failing; it can be a MaxTxSize failure, but not necessarily.
This test can likely guide you in improving error handling in this PR, guaranteeing that this sync API is able to catch errors generated outside of the PublishExpressLaneTransaction call path.


timeout := defaultTimeout
if timeoutMs != nil && *timeoutMs > 0 {
timeoutMs := int64(math.Min(float64(*timeoutMs), math.MaxInt64))
Contributor:
Why is this line needed?

Member Author:

I was getting a G115 lint warning (integer overflow in conversion), so I was trying to get around it. Maybe instead I should do

timeoutMs := int64(min(*timeoutMs, uint64(math.MaxInt64)))

What do you think?

package gethexec

import "github.com/ethereum/go-ethereum/common"

Contributor:

How about adding a comment describing that those constructs were copied from an internal geth package?

Member Author:

Sure, I will apply this change once we finish discussing what to do and I make the bigger changes 👍

@diegoximenes diegoximenes assigned KolbyML and unassigned diegoximenes Nov 28, 2025
@KolbyML (Member, Author) commented Nov 28, 2025

Thank you for opening the ticket to check MaxTxSize timeboost_sendExpressLaneTransaction call life cycle, this will provide a better UX for users 🙂.

The MaxTxSize error was just an example though. There can be other errors that can occur during sequencing, out of the PublishExpressLaneTransaction call path, in which BlockChain().SubscribeChainEvent will not detect. One is this, which depends on the last block while sequencing the tx. We likely don't want to detect this error during timeboost_sendExpressLaneTransaction call cycle, but we can, and should, detect in the timeboost_sendExpressLaneTransactionSync call cycle. It is not unlikely that the list of possible errors like that will grow in the future.

This PR should do something like this:

  • Abstract that select into a waitTxToBeSequenced func.
  • Create a resultChan in SendExpressLaneTransactionSync, and pass it to the appropriate func calls until it reaches publishTransactionToQueue func.
  • After SendExpressLaneTransactionSync calls PublishExpressLaneTransaction, it will call waitTxToBeSequenced, and after that it will call GetTransactionReceipt. a.backend.BlockChain().SubscribeChainEvent will be redundant and can be removed then.

This is the strategy used by PublishTransaction, in which eth_sendRawTransaction* APIs rely on. With this approach timeboost_sendExpressLaneTransactionSync will detect all those errors that can occur during sequencing, out of the PublishExpressLaneTransaction call path, and return to the user.

Ideally this PR should also have a test covering SendExpressLaneTransactionSync failing, it can be a MaxTxSize failure, but not necessarily. This test can likely guide you on improving error handling on this PR, guaranteeing that this sync API is able to catch errors generated out of the PublishExpressLaneTransaction call path.

What you are describing for PublishTransaction() used to be the strategy/behavior for PublishExpressLaneTransaction(), but it was removed in #3010.

The current eth_sendRawTransactionSync and SendExpressLaneTransactionSync implementations fetch the receipt locally, and I think that is a requirement, to reduce calls to the forwarders/sequencer.

So if you want that behavior, that is fine, but I think we would need to add a new endpoint to achieve it: timeboost_sendExpressLaneTransactionSyncWithoutReceipt/PublishExpressLaneTransactionSyncWithoutReceipt.

SendExpressLaneTransactionSync would then call PublishExpressLaneTransactionSyncWithoutReceipt, which would use the old behavior from #3010 (the behavior you are asking for, which is what exists for PublishTransaction), instead of calling PublishExpressLaneTransaction as it does currently.

What do you think of this? @diegoximenes

@Tristan-Wilson (Member)

@diegoximenes wrote:

After SendExpressLaneTransactionSync calls PublishExpressLaneTransaction, it will call waitTxToBeSequenced, and after that it will call GetTransactionReceipt.

@KolbyML wrote:

So if you want that behavior that is fine, but I think we would need to add a new endpoint to achieve that timeboost_sendExpressLaneTransactionSyncWithoutReceipt/PublishExpressLaneTransactionSyncWithoutReceipt.

I think there's been a misunderstanding, we definitely want to return the receipt with SendExpressLaneTransactionSync.

I think what Diego is proposing is to pass resultChan all the way through to publishTransactionToQueue, then wait on resultChan, which will catch all sequencing errors, and then call GetTransactionReceipt afterwards without the need for polling.

I agree that we need to surface sequencing errors in some way as many Timeboost users have been asking for this. But there is an added complication relating to the background of #3010.

The added complication

The reordering queue (msgBySequenceNumber) creates a two-level queue system:

  • Level 1: Reordering queue - buffers out-of-order transactions, waits indefinitely for gaps to fill, and only publishes when sequence numbers are consecutive
  • Level 2: Sequencer queue - processes transactions for block inclusion; resultChan reports success/failure from there

The problem with waiting on resultChan is that if, e.g., a tx with seq=5 arrives when the next expected sequence number is 3, it will be buffered in msgBySequenceNumber and not published to the sequencer queue. If we have a timeout external to this, we would return the timeout to the user; but if messages with seq=3 and 4 then arrive, msg 5 will be released from the reordering queue into the sequencer queue.

The original context of #3010 was that we could have exactly this situation, leading to a weird experience for the user where they think their tx failed because they got a timeout error but it actually didn't.

We have some logic inside publishTransactionToQueue itself that sets a blockstamp on the transaction when it is first introduced to the queue, to be used to filter out express lane transactions that are too old. But the blockstamp is only set at Level 2, the sequencer queue, so this doesn't really work as-is, though it might conceptually.

Maybe a way to use this concept is to set an arrival time on the tx in msgBySequenceNumber, which is checked against a timeout shorter than the desired RPC timeout before we actually send it down to the sequencer. This is still a bit yucky because it relies on the timeouts being different lengths, so it would be nice if there were some bidirectional signalling between the top-level place where the waiting is happening and the inside of the express lane service. Then, if the top-level timeout is reached, the ELS definitely won't queue the transaction.

I think the way you could do this is actually by passing the top-level context down to the msgBySequenceNumber, have something like

  msgBySequenceNumber map[uint64]*expressLaneSubmissionWithContext                         

Then when publishing from reordering queue:

  if msg.ctx != nil && msg.ctx.Err() != nil {                                              
      // Caller already timed out / cancelled - don't publish                              
      continue                                                                             
  }    

What do you think of this? It's a bit of a rework, sorry!

As a possibly-relevant aside, I really dislike the Timeboost sequence number feature, precisely because of all the complication it adds. The idea was originally for it to be used by bundlers, but I don't know if it's really getting any use. UserOps-style bundlers might actually make more sense with Timeboost than bundlers relying on the sequence number feature (which can add considerable delay if txs arrive out of sequence). I will discuss this with the product team.

@diegoximenes (Contributor) commented Dec 2, 2025

It makes sense to attach the context to the msgBySequenceNumber entries 🙂.
Regarding // Caller already timed out / cancelled - don't publish: you can likely still publish the transaction to the queue, passing a derived context to it.
I need to double-check, but the sequencer should already be instrumented to detect that the context was cancelled and properly write the error to the resultChan.

@KolbyML (Member, Author) commented Dec 2, 2025

@Tristan-Wilson asked if I could hold off on working on this PR while some questions regarding timeboost are answered internally.

So I will see what happens, and potentially start working on this PR again in 1 to 2 days.
