SIGKILL does not model deterministic power loss (page cache footgun) #33

@siphonite

Description

Problem

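SIGKILL terminates the process but does nothing to the OS page cache. Data that was written but not fsynced is still in the page cache after the kill, and whether it has already reached the disk at that moment depends on when the kernel's background writeback last ran.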
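The effect is easy to observe without FIRST. Below is a minimal sketch, assuming a Unix system with `sh` on the PATH and using only std (`write_then_sigkill` is a hypothetical helper, not part of FIRST). A child shell writes a file without fsync and then SIGKILLs itself. The parent still reads the data back intact, because the kill ends the process but leaves the page cache alone. What the sketch cannot show is whether those bytes had reached the disk, and that is the part that would decide the outcome of a real power loss.

```rust
use std::fs;
use std::os::unix::process::ExitStatusExt;
use std::path::Path;
use std::process::Command;

/// Hypothetical helper: spawns a child shell that writes `payload` to `path`
/// without fsync and then SIGKILLs itself; returns what the parent reads back.
fn write_then_sigkill(path: &Path, payload: &str) -> String {
    let status = Command::new("sh")
        .arg("-c")
        // `$$` is the child shell's own PID, so this is a self-SIGKILL issued
        // right after write(2) returns, with no fsync in between.
        .arg(r#"printf '%s' "$1" > "$2"; kill -9 $$"#)
        .arg("sh") // becomes $0 inside the script
        .arg(payload) // $1
        .arg(path) // $2
        .status()
        .expect("failed to spawn sh");
    assert_eq!(status.signal(), Some(9), "child should die from SIGKILL");
    // The parent still sees the data: it lives in the page cache, which the
    // kill did not touch, whether or not writeback has reached the disk yet.
    fs::read_to_string(path).expect("file should exist after the kill")
}
```

For example, calling it with a path in `std::env::temp_dir()` and the payload `"not fsynced"` returns `"not fsynced"`.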

Code reference:
https://github.com/siphonite/first/blob/main/src/rt.rs#L161-L178

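```rust
fn trigger_crash() -> ! {
    // ...
    unsafe {
        // SIGKILL cannot be caught or ignored: the process dies here, but
        // anything already passed to write(2) stays in the kernel page cache.
        libc::kill(libc::getpid(), libc::SIGKILL);
    }
    // ...
}
```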
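By contrast, data only counts as durable across a real power loss if the workload syncs it before the crash point. Below is a minimal sketch of such a write using only std, assuming Linux and a storage device that honors cache-flush requests (`append_durably` is a hypothetical helper, not part of FIRST).

```rust
use std::fs::{File, OpenOptions};
use std::io::{self, Write};
use std::path::Path;

/// Hypothetical helper: appends `record` to `path` and returns only after
/// fsync has completed for both the file and its parent directory.
fn append_durably(path: &Path, record: &[u8]) -> io::Result<()> {
    let mut file = OpenOptions::new().create(true).append(true).open(path)?;
    file.write_all(record)?;
    // sync_all() calls fsync(2): file data and metadata leave the page cache.
    file.sync_all()?;
    // A newly created file also needs its directory entry persisted; on
    // Linux a directory can be opened read-only and fsynced.
    let dir = match path.parent() {
        Some(d) if !d.as_os_str().is_empty() => d,
        _ => Path::new("."),
    };
    File::open(dir)?.sync_all()?;
    Ok(())
}
```

Only records written this way before `trigger_crash()` are ones a test may require to survive; everything else is at the mercy of writeback timing.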

Impact

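  • Tests asserting “data is lost without fsync” cannot be relied on: after SIGKILL, writes that were not fsynced are still in the page cache and are visible to the recovery step on the same machine. Whether they have also reached the disk depends on background writeback by the kernel's flusher threads (kworker; pdflush on old kernels), so what a real power loss at that point would leave behind is not reproducible.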
  • Users may assume stronger guarantees than FIRST provides: they might expect FIRST to intercept writes and drop anything that was not fsynced, but it does not; what survives the crash is entirely up to the OS.
  • The “deterministic” claim is misleading: the crash point is deterministic (in single-threaded code), but the on-disk state after the crash is not, because it depends on page cache writeback timing.
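Given these limitations, a post-crash test under SIGKILL can safely assert only that fsynced records survive. Records that were written but not synced must be accepted whether present or absent. Below is a minimal sketch of such a tolerant check. `check_recovery` is hypothetical, not part of FIRST, and for simplicity it treats records as unique strings.

```rust
/// Hypothetical post-crash oracle for a SIGKILL-based harness.
/// Every record fsynced before the crash point must be present; a record
/// written but not fsynced may be present or absent, so either is accepted.
fn check_recovery(synced: &[&str], unsynced: &[&str], recovered: &[&str]) -> Result<(), String> {
    for rec in synced {
        if !recovered.contains(rec) {
            return Err(format!("lost fsynced record {rec:?}"));
        }
    }
    for rec in recovered {
        // Anything recovered must have been written at some point.
        if !synced.contains(rec) && !unsynced.contains(rec) {
            return Err(format!("unexpected record {rec:?}"));
        }
    }
    Ok(())
}
```

The check never asserts that unsynced data is gone, so it does not depend on whether writeback happened to run before the kill.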

Acceptance Criteria

  • README explicitly documents page cache nondeterminism: clarify that FIRST models “sudden process termination”, not “power loss”. A power loss (or an OS crash, which behaves similarly here) also discards the page cache, and a power loss can additionally drop data held in a disk controller's volatile write cache.
  • “Deterministic” claim is qualified with clear constraints.
  • Optional future “strict mode” is explicitly deferred (not implied).

Metadata

  • Assignees: none
  • Labels: design, documentation
  • Projects: none
  • Milestone: none
  • Relationships: none
  • Development: no branches or pull requests