Deterministic crash testing for storage engines
Filesystem Injection and Reliability Stress Test
Design • Limitations • Examples
FIRST tests what happens when your storage system crashes. It injects crashes at precise points in your code, restarts, runs recovery, and validates invariants — deterministically and reproducibly.
first::test()
.run(|env| {
let mut wal = Wal::open(&env.path("wal")).unwrap();
let tx = wal.begin();
wal.put(tx, "key", "value");
first::crash_point("before_commit");
wal.commit(tx);
})
.verify(|env, _crash| {
let wal = Wal::open(&env.path("wal")).unwrap();
// Invariant: uncommitted data must not be visible
assert!(wal.get("key").is_none() || wal.get("key") == Some("value"));
})
.execute();Real crashes don't happen at function boundaries. They happen:
- After
write()but beforefsync() - After appending to WAL but before the commit marker
- After
rename()but before directory sync
Traditional tests don't catch these. FIRST does.
| Traditional Testing | FIRST |
|---|---|
| Tests happy paths | Tests crash paths |
| Random fault injection | Deterministic crash points |
| Failures are flaky | Every failure is reproducible |
| Recovery rarely exercised | Recovery runs after every crash |
# Cargo.toml
[dev-dependencies]
first = "0.1"// tests/my_crash_test.rs
#[test]
fn test_atomicity() {
first::test()
.run(|env| {
// Your workload with crash_point() calls
})
.verify(|env, crash| {
// Recovery + invariant checks
})
.execute();
}cargo test┌─────────────┐ ┌─────────────┐ ┌─────────────┐
│ Orchestrator│────▶│ Execution │────▶│ Verify │
│ (Parent) │ │ (Child) │ │ (Child) │
└─────────────┘ └─────────────┘ └─────────────┘
│ │ │
│ SIGKILL at Recovery +
│ crash_point() Invariants
│ │ │
└───────────────────┴────────────────────┘
Repeat for each crash point
- Execution: Your workload runs until hitting the target crash point →
SIGKILL - Verify: Fresh process opens the preserved filesystem state, runs recovery, checks invariants
- Iterate: Repeat for each crash point discovered
[first] crash point 1: OK
[first] crash point 2: OK
[first] crash point 3: FAILED (see /tmp/first/run_3)
[first] crash label: "after_commit_write"
[first] to reproduce:
FIRST_PHASE=VERIFY FIRST_CRASH_TARGET=3 FIRST_WORK_DIR=/tmp/first/run_3 \
cargo test my_test -- --exact
| Feature | Description |
|---|---|
| Deterministic | Same seed/target = same crash = same failure |
| Reproducible | Every failure includes exact reproduction command |
| Zero Setup | Works with cargo test, no kernel modules or VMs |
| Invariant-Based | Assert properties, not expected outputs |
| Artifact Preservation | Filesystem state preserved on failure for debugging |
- Linux only
- Single-threaded execution
- Explicit crash points (no syscall interception yet)
- One
first::test()per#[test]function - No async test support
See docs/limitations.md for details.
v0.1 — Core crash testing functionality complete.
- ✅ Deterministic crash injection
- ✅ Orchestrator lifecycle
- ✅ Invariant validation
- ✅ Reference WAL example
- ⬜ Syscall interception (planned v0.2)
- ⬜ Parallel execution (planned v0.2)
- Design Document — Architecture and implementation details
- Limitations — Known constraints and crash model
- Reference WAL — Complete example with crash-consistency proof
Licensed under the Apache License, Version 2.0.
FIRST — Making crash consistency testable
