Conversation
StephenButtolph
left a comment
There was a problem hiding this comment.
not doing a full review since it isn't marked as r4r yet. Just dumping my thoughts as I looked through things
alarso16
left a comment
There was a problem hiding this comment.
I know it's not ready, but I had some questions that might be easier to address early (especially since I'll be out starting late tomorrow)
…d no longer exported
Co-authored-by: Austin Larson <78000745+alarso16@users.noreply.github.com> Signed-off-by: Arran Schlosberg <519948+ARR4N@users.noreply.github.com>
| assert.Equal(t, b, lastExecuted.Load(), "Atomic pointer to last-executed block") | ||
| require.NoError(t, b.MarkExecuted(db, gasTime, wallTime, baseFee.ToBig(), receipts, stateRoot, lastExecuted), "MarkExecuted()") | ||
|
|
||
| fromDB := newBlock(t, b.EthBlock(), b.ParentBlock(), b.LastSettled()) |
There was a problem hiding this comment.
These tests are identical to the old ones, just placed into a test-table loop to allow them to be run on the original (post-MarkExecuted) and the restored Blocks.
|
|
||
| func (b *Block) setAncestors(parent, lastSettled *Block) error { | ||
| // SetAncestors sets the block's ancestry while enforcing invariants. | ||
| func (b *Block) SetAncestors(parent, lastSettled *Block) error { |
There was a problem hiding this comment.
The comment isn't very helpful (maybe just to satisfy the linter), but why would the parent be nil? Is it just the genesis block?
There was a problem hiding this comment.
You would typically have both or neither be nil. For example, in VerifyBlock() the rebuilding is performed without known ancestry (i.e. both nil via a call from New()) and then the ancestors are copied in with this function.
| "bounds", | ||
| "interimExecutionTime", |
There was a problem hiding this comment.
cmputils question, but you didn't allow these even though they're unexported - why do you list them explicitly?
There was a problem hiding this comment.
For full context:
cmp.AllowUnexported(Block{}, ancestry{}),
cmpopts.IgnoreFields(
Block{},
"bounds",
"interimExecutionTime",
),The first line tells it to compare the un-exported fields of Block while the second option says "buuut, ignore these ones". The latter also supports ignoring exported fields. The two ignored ones are effectively just optional scratch space that aren't critical to normal operation.
| lastExecuted *atomic.Pointer[Block], | ||
| ) error { | ||
| if it := b.interimExecutionTime.Load(); it != nil && byGas.Compare(it) < 0 { | ||
| // The final execution time is scaled to the new gas target but interim |
There was a problem hiding this comment.
I found this comment confusing. It took me several minutes to understand:
- What the point of the interim execution time is
- Why the post-target scaling is monotonic
- The expected relation between these two variables.
I think I was mostly confused because it's not a "rounding error", but just actually different, right?
There was a problem hiding this comment.
FWIW, this code isn't introduced by this PR, it's just moved. Have a look at the call site in saexec/execution.go to see how it's set and blocks/settlement.go (LastToSettleAt()) to see how it's used.
I think I was mostly confused because it's not a "rounding error", but just actually different, right?
It is a rounding error.
We have the interim clock that ticks for each transaction and the execution clock that ticks for the sum of per-transaction gas. In total they have both ticked by the same amount so are initially equal value.
But then the execution clock MUST be scaled to the new gas target to keep with ACP-176. This scaling might induce a rounding error due to the fractional numerator not being properly divisible by the new denominator. If we didn't handle that in a monotonic fashion (achieved by rounding up) then LastToSettleAt() could return different blocks based on whether interim or execution clocks were checked.
| // execution so no error is returned and execution MUST continue optimistically. | ||
| // Any such log in development will cause tests to fail. | ||
| func (b *Block) CheckBaseFeeBound(actual *uint256.Int) { | ||
| if b.bounds == nil { |
There was a problem hiding this comment.
This function is only used during execution, so even though the bounds aren't instantiated when loading from disk, this doesn't seem necessary. Am I missing something?
There was a problem hiding this comment.
The block replay at recovery requires execution of all blocks since the last one with an available state root. The iter.Seq2 returned in recovery.go will yield blocks that hit this bit of the code.
| return vm.close() | ||
| } | ||
|
|
||
| func (vm *VM) close() error { |
There was a problem hiding this comment.
nit: I don't think you need this change.
There was a problem hiding this comment.
Why not? The vm.close() method is used in NewVM() to tear down things already constructed if there's a later failure.
StephenButtolph
left a comment
There was a problem hiding this comment.
I haven't looked at any of the tests yet, but the actual code makes sense to me.
| // This would require the node to crash at such a precise point in time | ||
| // that it's not worth a preemptive fix. If this ever occurs then just | ||
| // try the root [params.CommitTrieDBEvery] blocks earlier. |
There was a problem hiding this comment.
I'm a bit confused, when can this case happen? We commit the state tree before we update the head block, so shouldn't we be guaranteed that the state is always available here?
I guess can we be more specific about what precise point in time a crash would have to occur? I'm hoping to determine whether or not it is actually a problem that needs a preemptive fix haha
| if err := canonicaliseLastSynchronous(db, lastSynchronous); err != nil { | ||
| return nil, err | ||
| } |
There was a problem hiding this comment.
Could you be able to explain the rationale on why we MarkSynchronous in SinceGenesis.Initialize and then canonicalize in NewVM?
To me, it feels like it would flow more naturally for NewVM to take in a lastSynchronous *types.Block, and then inside NewVM manage marking the block as synchronous (if needed) and making sure the state is correct.
There was a problem hiding this comment.
If we did that then NewVM() would also have to take the starting gas excess. It's absolutely doable, but I'm not sure how much is gained because there's only a single "degree of freedom" in the call to MarkSynchronous() so it's not like NewVM() would be ensuring any invariants.
Co-authored-by: Austin Larson <78000745+alarso16@users.noreply.github.com> Signed-off-by: Arran Schlosberg <519948+ARR4N@users.noreply.github.com>
…genesis reference
Co-authored-by: Austin Larson <78000745+alarso16@users.noreply.github.com> Signed-off-by: Arran Schlosberg <519948+ARR4N@users.noreply.github.com>
Adds support for shutdown and restart without state sync, recovering entirely from the local database.
Recommended review order
blocks.Block.Mark{Executed,Synchronous}()and the newBlock.RestoreExecutionArtefacts(); plus accompanying tests.sae.NewVM()andsae.SinceGenesis.Initialize().sae/recovery.goto support (2), and associated test inrecovery_test.go.Mempool rationale
The upstream
legacypoolimplementation expects a synchronous blockchain, initially requesting the current block and then updating based on chain-head events, in both cases opening astate.StateDBat the latesttypes.Header.Root. In an asynchronous implementation this results in the mempool acting on settled, not executed state. So far this has caused two undesirable properties:sae.VM.WaitForEvent(), only to be filtered out byworstcase. This also suggests an underlying inefficiency in which everyBuildBlock()first discards some prefix of already-included transactions.sae.TestRecoverFromDatabase()in this PR) that don't allow theirBuildBlock()method to include any transaction from an EOA with included but not settled transactions.The wrapper returned by
txgossip.NewBlockChain()addresses this by always serving the latest executed state, regardless of which root is requested. The impossibility of re-orgs makes this safe and efficient (no mempool resets), and addresses (2) entirely. Although this doesn't address all of (1) and some empty blocks and discarded prefixes can occur, it significantly curtails the issue.