disk: replaces lmdb with book #936

Open

matthew-levan wants to merge 42 commits into develop from ml/book

Conversation

@matthew-levan
Contributor

@matthew-levan matthew-levan commented Jan 2, 2026

This PR replaces LMDB with book, a custom append-only file-based event log persistence layer tailored to Urbit's sequential access patterns.

Motivation

Unlimited event size

LMDB's general-purpose key-value store features (random access, transactions) are unnecessary overhead for Urbit's strictly append-only event log. With LMDB, reducing log size on disk is impossible (due to B+tree) and maximum value size (event size, in our case) is limited to 4GB or less. This new API provides a simpler, more focused solution.

Faster writes

Additionally, write speeds with book will exceed LMDB's, thus removing a potential bottleneck (should we approach it after integrating SKA with the core operating function).

Implementation

Double-Buffered Headers (LMDB-style Durability)

Book uses a double-buffered header strategy inspired by LMDB to achieve single-fsync durability:

  • Two header slots exist at page-aligned offsets (0 and 4096)
  • Each header contains a monotonically increasing sequence number (seq_d)
  • On commit: write deed data, then write updated header to the inactive slot, then fsync once
  • On startup: read both headers, use the one with the higher valid seq_d

This provides crash consistency without requiring two fsyncs per commit. If a crash occurs mid-write, the old header remains valid (the new one will have an invalid CRC), and any uncommitted deed data is overwritten/truncated on next startup.
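
As an illustration, the startup-time slot selection described above might look like the following sketch. The struct mirrors u3_book_head (with standard fixed-width types for self-containedness), and `crc` is a stand-in for whatever checksum routine the log actually uses; the real code in this PR may differ in details.

```c
#include <stdint.h>
#include <stddef.h>

/* sketch: fields mirror u3_book_head */
typedef struct {
  uint32_t mag_w;  /* magic: 0x424f4f4b ("BOOK") */
  uint32_t ver_w;  /* format version: 1          */
  uint64_t fir_d;  /* first event number         */
  uint64_t las_d;  /* last event number          */
  uint64_t seq_d;  /* double-buffer sequence     */
  uint32_t crc_w;  /* checksum of prior fields   */
} head_t;

/* a header is valid if magic, version, and checksum all match */
static int _head_ok(const head_t* hed_u,
                    uint32_t (*crc)(const void*, size_t))
{
  return hed_u->mag_w == 0x424f4f4b
      && hed_u->ver_w == 1
      && hed_u->crc_w == crc(hed_u, offsetof(head_t, crc_w));
}

/* returns 0 or 1 for the slot to trust, or -1 if neither is valid */
static int _pick_slot(const head_t* a_u, const head_t* b_u,
                      uint32_t (*crc)(const void*, size_t))
{
  int a_ok = _head_ok(a_u, crc);
  int b_ok = _head_ok(b_u, crc);

  if ( a_ok && b_ok ) return (b_u->seq_d > a_u->seq_d) ? 1 : 0;
  if ( a_ok )         return 0;
  if ( b_ok )         return 1;
  return -1;
}
```

A torn write to the inactive slot corrupts only its CRC, so `_pick_slot` falls back to the other slot, which still describes the last committed state.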

File Format

Events are stored in book.log with the following layout:

Offset 0:      Header Slot A (32 bytes, padded to 4096)
Offset 4096:   Header Slot B (32 bytes, padded to 4096)
Offset 8192:   Deeds start here
/* u3_book_head: on-disk file header (32 bytes, page-aligned slots)
**
**   two header slots at offsets 0 and 4096; deeds start at 8192.
*/
typedef struct _u3_book_head {
  c3_w mag_w;      //  magic number: 0x424f4f4b ("BOOK")
  c3_w ver_w;      //  format version: 1
  c3_d fir_d;      //  first event number in file
  c3_d las_d;      //  last event number (commit marker)
  c3_d seq_d;      //  sequence number (for double-buffer)
  c3_w crc_w;      //  CRC32 checksum (of preceding fields)
} u3_book_head;

Events on-disk are written as deeds with a minimal framing format:

/* u3_book_deed: on-disk event record
**
**   on-disk format: len_d | buffer_data | let_d
**   where buffer_data is len_d bytes of opaque buffer data
**   and let_d echoes len_d for validation (used for backward scanning)
*/
typedef struct _u3_book_deed {
  c3_d len_d;    //  buffer size (bytes)
  // c3_y buf_y[];  //  variable-length buffer data
  c3_d let_d;    //  length trailer (echoes len_d)
} u3_book_deed;

The trailing let_d field enables efficient backward scanning during crash recovery—we can read the last 8 bytes to determine the previous deed's size without a forward scan.
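
The backward step that let_d enables can be sketched as follows. Here `read8` is a hypothetical stand-in for a `pread()` of 8 bytes at a given offset, and the frame arithmetic follows the `len_d | buffer | let_d` layout above; it is not the PR's actual recovery code.

```c
#include <stdint.h>

#define FRAME_W 8  /* bytes occupied by each of len_d and let_d */

/* given the offset just past a deed, jump to that deed's start
** by reading only its trailing 8-byte length echo
*/
static uint64_t _deed_start(uint64_t end_d,
                            uint64_t (*read8)(uint64_t off_d))
{
  uint64_t let_d = read8(end_d - FRAME_W);   /* trailing length  */
  return end_d - FRAME_W - let_d - FRAME_W;  /* skip buf + len_d */
}
```

Repeating this step walks the log from its tail toward its head in one small read per deed, which is what makes backward scans cheap.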

Reeds are used to represent deeds in memory:

/* u3_book_reed: in-memory event record representation for I/O
*/
typedef struct _u3_book_reed {
  c3_d  len_d;    //  total buffer size (bytes)
  c3_y* buf_y;    //  complete buffer (caller owns)
} u3_book_reed;

The u3_book structure is used for operations like reading, writing, etc.:

/* u3_book: event log handle
*/
typedef struct _u3_book {
  c3_i         fid_i;      //  file descriptor for book.log
  c3_i         met_i;      //  file descriptor for meta.bin
  c3_c*        pax_c;      //  file path to book.log
  u3_book_head hed_u;      //  cached header (current valid state)
  c3_d         las_d;      //  cached last event number
  c3_d         off_d;      //  cached append offset (end of last event)
  c3_w         act_w;      //  active header slot (0 or 1)
} u3_book;

Batch Writes with Scatter-Gather I/O

Batch writes use pwritev() with iovecs to write multiple deeds in a single syscall, avoiding both per-deed syscall overhead and buffer copying. Each deed requires 3 iovecs (len_d, buffer, let_d), chunked to respect IOV_MAX limits.
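
A minimal sketch of the iovec construction, assuming the reed layout above: because let_d simply echoes len_d, the trailing iovec can point at the same field as the leading one. IOV_MAX chunking and short-write retries are omitted for brevity; the real implementation handles both.

```c
#include <sys/uio.h>
#include <stdint.h>

/* sketch: mirrors u3_book_reed */
typedef struct {
  uint64_t len_d;  /* buffer size (bytes) */
  uint8_t* buf_y;  /* buffer data         */
} reed_t;

/* write num_i deeds at offset off with one vectored syscall;
** returns bytes written, or -1 on error
*/
static ssize_t _write_batch(int fid_i, reed_t* red_u, int num_i, off_t off)
{
  struct iovec vec_u[3 * num_i];  /* VLA, fine for a sketch */

  for ( int i = 0; i < num_i; i++ ) {
    vec_u[3*i + 0].iov_base = &red_u[i].len_d;  /* leading length */
    vec_u[3*i + 0].iov_len  = sizeof(uint64_t);
    vec_u[3*i + 1].iov_base = red_u[i].buf_y;   /* opaque payload */
    vec_u[3*i + 1].iov_len  = red_u[i].len_d;
    vec_u[3*i + 2].iov_base = &red_u[i].len_d;  /* let_d echoes len_d */
    vec_u[3*i + 2].iov_len  = sizeof(uint64_t);
  }
  return pwritev(fid_i, vec_u, 3 * num_i, off);
}
```

Gathering the frames this way avoids copying each deed into a contiguous staging buffer before writing.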

Features:

  • LMDB-style double-buffered headers for single-fsync durability
  • Automatic crash recovery via backward/forward scanning and truncation
  • Embedded key-value metadata storage (separate meta.bin file)
  • Iterator API for sequential reads (u3_book_walk_*)
  • Batch writes with scatter-gather I/O (pwritev)
  • Thread-safe when used with libuv (maintains existing async patterns)
  • ACID (at the event batch level)
  • Functional partial and full (via play -f) replay support
  • Stateless operations via pread and pwrite (no cursor position tracking)
  • Drop-in replacement for LMDB (API mirrors u3_lmdb_* functions)

Testing

Tests focus on failure modes, edge cases, recovery, and benchmarks. This PR also adds write benchmarks for LMDB (executable via zig build lmdb-test).

Run: zig build book-test

Compatibility

This PR changes how events are stored in future epochs, but it continues to use LMDB to store global pier metadata in the top-level log directory ($pier/.urb/log/data.mdb). This ensures that helpful error messages can be printed even when users attempt to boot book-style piers with old binaries. Note that the top-level metadata should be considered canonical; the metadata stored within epochs (meta.bin, as of this PR) is kept consistent with it.

Performance

Book's performance is slightly favorable in the single-event case, and marginally favorable with larger event batches. Disk use is equivalent.

Metric           book single   lmdb single   book batched   lmdb batched
Events written   1000          1000          100000         100000
Event size       128 bytes     128 bytes     1280 bytes     1280 bytes
Total data       0.12 MB       0.12 MB      122.07 MB      122.07 MB
Total time       4.020 s       4.045 s        0.625 s        0.662 s
Write speed      249 ev/s      247 ev/s    160083 ev/s    151125 ev/s
Throughput       0.03 MB/s     0.03 MB/s    195.41 MB/s    184.48 MB/s
Latency          4020.2 μs     4045.2 μs       6.2 μs         6.6 μs

To-do

  • Migrations from vere-v3,4.x piers
  • Failure mode tests

@dozreg-toplud
Contributor

Overall looks good, couple of comments:

@dozreg-toplud
Contributor

_book_scan_end iterates over every event in the file, validating them and the event count in the header, and it is called on every event log load (including on every boot) to locate the append offset.

On my laptop it took 0.320835 seconds to iterate over 119054 events. On ~dozreg-toplud (far from the busiest ship on the network) there are epochs with ~20M events. This means that it would take around a minute to just read the event log in order to boot.

Surely the last offset should just be stored in the header, and _book_scan_end should be reserved for corruption recovery. With that we could also iterate from end to the start of the iterator range in u3_book_walk_init whenever it would make sense: the deeds already have sizes in their tails.

@dozreg-toplud
Contributor

_book_scan_end will also attempt to truncate all events after a corrupted event was encountered. Is this desirable?

@matthew-levan matthew-levan marked this pull request as ready for review January 23, 2026 01:37
@matthew-levan matthew-levan requested a review from a team as a code owner January 23, 2026 01:37
@matthew-levan matthew-levan force-pushed the ml/book branch 2 times, most recently from 85c7a88 to 53751f5 Compare January 26, 2026 19:25