Make parallel apply compatible with RUN_STANDALONE#5146
Make parallel apply compatible with RUN_STANDALONE#5146SirTyson wants to merge 1 commit intostellar:masterfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This PR makes RUN_STANDALONE compatible with parallel/background ledger apply so that more unit tests (and apply-load runs) can exercise the parallel apply path.
Changes:
- Allow
RUN_STANDALONEto usePARALLEL_LEDGER_APPLYby removing the standalone restriction and updating manual-close/test flows to wait for background apply completion. - Prevent virtual time from advancing while background-thread actions are queued (to avoid time jumps that can delay processing apply results).
- Adjust several tests/configs to be robust under background apply (timeouts, draining apply before shutdown, disabling parallel apply in tests that mutate headers).
Reviewed changes
Copilot reviewed 12 out of 12 changed files in this pull request and generated 7 comments.
Show a summary per file
| File | Description |
|---|---|
| src/util/Timer.cpp | Avoid virtual-time fast-forward when background-thread actions are pending. |
| src/overlay/test/TCPPeerTests.cpp | Increase timeouts to account for extra main-thread callbacks under background apply. |
| src/main/Config.cpp | Stop auto-disabling parallel apply in standalone; disable only for in-memory SQLite. |
| src/main/CommandLine.cpp | Force parallel apply in apply-load max-sac-tps mode. |
| src/main/ApplicationImpl.cpp | In standalone manual-close, crank until background apply finishes and LCL advances. |
| src/ledger/test/LedgerCloseMetaStreamTests.cpp | Drain background apply before shutdown for consistent debug txset/LCL reads. |
| src/ledger/LedgerManagerImpl.cpp | Improve test-only error for txset mismatch on background apply thread. |
| src/invariant/test/ConservationOfLumensTests.cpp | Disable parallel apply in tests that directly mutate the ledger header. |
| src/history/test/HistoryTests.cpp | Wait for background apply to catch up after externalization in catchup simulation. |
| src/herder/test/TransactionQueueTests.cpp | Wait for background apply to finish before asserting queue state. |
| src/herder/test/HerderTests.cpp | Wait for background apply to finish after externalization helper. |
| docs/apply-load-max-sac-tps.cfg | Enable parallel apply for the max-sac-tps apply-load config. |
| // With background apply, triggerNextLedger posts work to the | ||
| // apply thread. Crank until the ledger is fully applied and | ||
| // LCL has advanced. | ||
| while (getLedgerManager().getLastClosedLedgerNum() < | ||
| targetLedgerSeq) | ||
| { | ||
| getClock().crank(true); | ||
| } |
There was a problem hiding this comment.
This introduces an unbounded while loop that will crank forever if the ledger never advances (e.g., apply thread stuck or an exception prevented completion). To avoid hanging the manualclose command, add a timeout/deadline and throw a descriptive error if targetLedgerSeq isn’t reached in time.
| if (PARALLEL_LEDGER_APPLY && !parallelLedgerClose()) | ||
| { | ||
| if (RUN_STANDALONE) | ||
| { | ||
| LOG_WARNING(DEFAULT_LOG, "RUN_STANDALONE is enabled, disabling " | ||
| "PARALLEL_LEDGER_APPLY"); | ||
| PARALLEL_LEDGER_APPLY = false; | ||
| } | ||
| else | ||
| { | ||
| std::string msg = | ||
| "Invalid configuration: PARALLEL_LEDGER_APPLY " | ||
| "does not support in-memory database modes."; | ||
| throw std::runtime_error(msg); | ||
| } | ||
| LOG_WARNING(DEFAULT_LOG, | ||
| "PARALLEL_LEDGER_APPLY is not supported with " | ||
| "in-memory SQLite, disabling."); | ||
| PARALLEL_LEDGER_APPLY = false; |
There was a problem hiding this comment.
Behavior change: previously, enabling PARALLEL_LEDGER_APPLY with an in-memory DB would throw unless RUN_STANDALONE was set; now it silently disables parallel apply for all modes. This can mask misconfiguration outside tests (users may think parallel apply is active). Consider preserving the previous behavior (throw when not RUN_STANDALONE, or at least when not BUILD_TESTS) and only auto-disable in standalone/testing contexts.
| // With background apply, crank until all queued ledgers are applied | ||
| while (lm.getLastClosedLedgerNum() < nextLedger + 5) | ||
| { | ||
| app->getClock().crank(true); | ||
| } |
There was a problem hiding this comment.
This unbounded crank loop can hang the test indefinitely if something goes wrong and LCL never advances. Since this file already uses testutil::crankUntil elsewhere, consider switching to a bounded crankUntil/deadline-based wait that fails with a clear message on timeout.
| // With background apply, crank until all queued ledgers are applied | |
| while (lm.getLastClosedLedgerNum() < nextLedger + 5) | |
| { | |
| app->getClock().crank(true); | |
| } | |
| // With background apply, crank until all queued ledgers are applied. | |
| REQUIRE(testutil::crankUntil(app->getClock(), [&]() { | |
| return lm.getLastClosedLedgerNum() >= nextLedger + 5; | |
| })); |
| // With background apply, crank until the ledger is fully applied | ||
| while (lm.getLastClosedLedgerNum() < ledgerSeq) | ||
| { |
There was a problem hiding this comment.
This unbounded crank loop can hang the test indefinitely if the ledger apply never completes. Consider adding a deadline (e.g., auto timeout = clock.now() + ...; REQUIRE(clock.now() < timeout);) or using a helper that waits with a bounded timeout so failures don’t wedge the test runner.
| // With background apply, crank until the ledger is fully applied | |
| while (lm.getLastClosedLedgerNum() < ledgerSeq) | |
| { | |
| // With background apply, crank until the ledger is fully applied | |
| auto const timeout = | |
| clock.now() + std::chrono::seconds(10); | |
| while (lm.getLastClosedLedgerNum() < ledgerSeq) | |
| { | |
| REQUIRE(clock.now() < timeout); |
| // With background apply, crank until the ledger is fully applied | ||
| while (lm.getLastClosedLedgerNum() < ledgerSeq) | ||
| { | ||
| app.getClock().crank(true); |
There was a problem hiding this comment.
This helper now contains an unbounded while loop that will crank forever if ledger apply stalls. Since this is used by multiple tests, it’s especially important to add a deadline/timeout and fail fast with a clear error when ledgerSeq isn’t reached.
| // With background apply, crank until the ledger is fully applied | |
| while (lm.getLastClosedLedgerNum() < ledgerSeq) | |
| { | |
| app.getClock().crank(true); | |
| // With background apply, crank until the ledger is fully applied, | |
| // but fail fast if it does not close within a reasonable time. | |
| auto timeout = app.getClock().now() + std::chrono::seconds(5); | |
| while (lm.getLastClosedLedgerNum() < ledgerSeq) | |
| { | |
| app.getClock().crank(true); | |
| REQUIRE(app.getClock().now() < timeout); |
790b206 to
576c1d6
Compare
marta-lokhova
left a comment
There was a problem hiding this comment.
Thanks, this is a helpful change! Looks good overall, I just had a couple of suggestions and a question.
576c1d6 to
b601bbc
Compare
b601bbc to
1dde4b8
Compare
Description
This change makes RUN_STANDALONE compatible with parallel ledger apply. There are a few motivations for this. First, many tests enable
RUN_STANDALONE, so we silently disable parallel apply in many unit tests. This change increases test coverage (and we'll need to do it eventually anyway if we make parallel apply the unconditional default). 2nd, this allows us to use parallel apply in apply load tests, which significantly reduces variance of tests like max-sac-tps test.This touches a few tests I'm not very familiar with (and was AI assisted), but it looks reasonable to me and CI passes, but I'd like for someone more familiar with these unit tests to take a look.
Checklist
clang-formatv8.0.0 (viamake formator the Visual Studio extension)