Performance: 2x faster page splitting and extraction #30
Merged
Add benchmarks for page splitting, copying, and merging (#26). Synthetic 100-page and 2000-page PDFs are generated from sample.pdf and cached to disk for reuse.

New benchmark suites:
- splitting.bench.ts: single-page extraction, full split, batch extract
- copying.bench.ts: cross-doc copy, duplication, merging
- comparison.bench.ts: head-to-head vs pdf-lib for all of the above

Report generation:
- scripts/bench-report.ts transforms vitest JSON output to markdown
- reports/benchmarks.md committed to repo, updated by CI
- .github/workflows/bench.yml runs weekly + on push to main
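As an illustration of how such a suite plugs into vitest's bench runner, here is a minimal sketch (PdfDocument, its load/extractPage methods, and the fixture path are hypothetical stand-ins, not the library's confirmed API):

```ts
import { bench, describe } from 'vitest';
import { readFile } from 'node:fs/promises';
import { PdfDocument } from '../src'; // hypothetical import path

// Synthetic 2000-page fixture generated from sample.pdf and cached on disk
// (fixture path assumed).
const bytes = await readFile('benchmarks/fixtures/2000-pages.pdf');

describe('splitting', () => {
  bench('extract single page', async () => {
    const doc = await PdfDocument.load(bytes); // hypothetical API
    await doc.extractPage(1000);               // hypothetical API
  });
});
```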
ObjectCopier does zero I/O — every method was async but never awaited anything asynchronous. Removing async/await eliminates microtask scheduling overhead on every recursive call in the deep-copy graph walk.

Benchmarks show ~15% improvement on full-split workloads:
- 100-page split: 31.6ms → 27.3ms (1.16x)
- 2000-page split: 582.5ms → 506.6ms (1.15x)
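The shape of the change, as a minimal sketch (the real ObjectCopier handles dictionaries, streams, and reference cycles; this reduces it to arrays):

```ts
// Minimal stand-in types for illustration.
type PdfObject = number | string | PdfArray;

class PdfArray {
  constructor(readonly items: PdfObject[]) {}
}

// Before: async everywhere, but nothing in the body ever awaits real I/O.
// Every `await this.copy(...)` still bounces through the microtask queue.
class AsyncObjectCopier {
  async copy(obj: PdfObject): Promise<PdfObject> {
    if (obj instanceof PdfArray) {
      const items: PdfObject[] = [];
      for (const item of obj.items) {
        items.push(await this.copy(item)); // one microtask hop per element
      }
      return new PdfArray(items);
    }
    return obj;
  }
}

// After: the same graph walk as plain synchronous recursion.
class SyncObjectCopier {
  copy(obj: PdfObject): PdfObject {
    if (obj instanceof PdfArray) {
      return new PdfArray(obj.items.map((item) => this.copy(item)));
    }
    return obj;
  }
}
```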
The internal LRU cache did Map.delete() + Map.set() on every get() to maintain recency ordering. The npm lru-cache package uses a doubly linked list for O(1) recency updates without Map rehashing.

Benchmarks show significant gains, especially on large PDF parsing:
- 2000-page split: 506.6ms → 432.3ms (1.17x incremental)
- Single page from 2000p: 41.0ms → 25.5ms (1.61x incremental)
- Cumulative from baseline: 1.35x–1.60x across split workloads
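A sketch of the difference (cache size and value shape illustrative):

```ts
import { LRUCache } from 'lru-cache';

// Before: a Map doubling as an LRU. Map iteration order is insertion order,
// so every hit must delete and re-insert the key to mark it most recent:
// two structural mutations (and potential rehashing) per get().
class MapLru<K, V> {
  private map = new Map<K, V>();
  constructor(private readonly max: number) {}

  get(key: K): V | undefined {
    const value = this.map.get(key);
    if (value !== undefined) {
      this.map.delete(key);     // drop from its current position...
      this.map.set(key, value); // ...and re-append as most recent
    }
    return value;
  }

  set(key: K, value: V): void {
    if (!this.map.has(key) && this.map.size >= this.max) {
      // The least recently used entry is first in iteration order.
      this.map.delete(this.map.keys().next().value as K);
    }
    this.map.set(key, value);
  }
}

// After: lru-cache tracks recency with an internal doubly linked list,
// so a hit is a couple of pointer swaps instead of a delete + insert.
const names = new LRUCache<string, { name: string }>({ max: 10_000 });
names.set('/Type', { name: '/Type' });
names.get('/Type'); // O(1), no Map mutation
```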
Three changes:
- PdfName.toBytes() caches serialized bytes on the interned instance (computed once, then writeBytes on every subsequent call). An ASCII fast-path skips TextEncoder entirely for the 99% of names that are pure ASCII.
- A shared HEX_TABLE in buffer.ts replaces per-byte toString(16) calls in both bytesToHex and escapeName.
- Skip deflate for streams under 512 bytes (configurable via compressionThreshold). Deflate init zeros a 64KB hash table per call; for tiny streams the overhead dwarfs any savings.
- Expose compressStreams and compressionThreshold on SaveOptions.

Cumulative from baseline: 582ms → 245ms (2.38x) on 2000-page split.
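Sketches of these changes under assumed internal shapes (the real bytesToHex, escapeName, and interning code live in the library; helper names and the maybeCompress function here are illustrative):

```ts
import { deflate } from 'pako';

// Shared 256-entry lookup: computed once, used by both bytesToHex and
// escapeName instead of calling toString(16) per byte.
const HEX_TABLE = Array.from({ length: 256 }, (_, b) =>
  b.toString(16).padStart(2, '0'),
);

function bytesToHex(bytes: Uint8Array): string {
  let out = '';
  for (const b of bytes) out += HEX_TABLE[b];
  return out;
}

// Interned name with cached serialization (escaping of PDF delimiter
// characters omitted for brevity).
class PdfName {
  private static pool = new Map<string, PdfName>();
  private bytes?: Uint8Array;

  private constructor(readonly name: string) {}

  static of(name: string): PdfName {
    let n = PdfName.pool.get(name);
    if (!n) PdfName.pool.set(name, (n = new PdfName(name)));
    return n;
  }

  toBytes(): Uint8Array {
    if (this.bytes) return this.bytes; // computed once per interned instance
    if (/^[\x20-\x7e]*$/.test(this.name)) {
      // ASCII fast-path: write char codes directly, no TextEncoder.
      const out = new Uint8Array(this.name.length);
      for (let i = 0; i < this.name.length; i++) {
        out[i] = this.name.charCodeAt(i);
      }
      this.bytes = out;
    } else {
      this.bytes = new TextEncoder().encode(this.name);
    }
    return this.bytes;
  }
}

// Deflate init zeros a 64KB hash table; below the threshold the setup cost
// outweighs any size savings, so the stream is written unfiltered.
function maybeCompress(data: Uint8Array, threshold = 512): Uint8Array {
  return data.length < threshold ? data : deflate(data);
}
```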
Runs splitting benchmarks on both base and PR branches when .ts files are changed. Posts a comparison table as a sticky PR comment showing per-benchmark speedup/regression with 🟢/🔴 indicators at ±5% threshold.
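The per-row logic might look like this (a sketch only; scripts/bench-compare.ts is the actual implementation, the JSON field names are assumptions based on vitest's bench output, and the neutral marker inside the ±5% band is an assumption):

```ts
// Shape of one parsed vitest bench result (only the fields used here).
interface BenchResult {
  name: string;
  hz: number; // ops/sec
}

// speedup > 1 means the PR branch is faster than base.
function compareRow(base: BenchResult, pr: BenchResult): string {
  const speedup = pr.hz / base.hz;
  const indicator = speedup >= 1.05 ? '🟢' : speedup <= 0.95 ? '🔴' : '⚪';
  return `| ${base.name} | ${speedup.toFixed(2)}x | ${indicator} |`;
}

const header = ['| Benchmark | Speedup | |', '| --- | --- | --- |'];
const row = compareRow(
  { name: 'Split 2000-page PDF', hz: 1.72 }, // ≈ 582.5ms per run
  { name: 'Split 2000-page PDF', hz: 4.08 }, // ≈ 245ms per run
);
console.log([...header, row].join('\n'));
```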
Just run benchmarks and post results as a PR comment. No base comparison — check manually if needed.
Benchmark Results

Comparison (benchmarks/comparison.bench.ts)
- Load PDF
- Create blank PDF
- Add 10 pages
- Draw 50 rectangles
- Load and save PDF
- Load, modify, and save PDF
- Extract single page from 100-page PDF
- Split 100-page PDF into single-page PDFs
- Split 2000-page PDF into single-page PDFs (0.9MB)
- Copy 10 pages between documents
- Merge 2 x 100-page PDFs

Copying (benchmarks/copying.bench.ts)
- Copy pages between documents
- Duplicate pages within same document
- Merge PDFs

Drawing (benchmarks/drawing.bench.ts)

Forms (benchmarks/forms.bench.ts)

Loading (benchmarks/loading.bench.ts)

Saving (benchmarks/saving.bench.ts)

Splitting (benchmarks/splitting.bench.ts)
- Extract single page
- Split into single-page PDFs
- Batch page extraction

Environment
Results are machine-dependent.
Page splitting and extraction were slower than they needed to be. Profiling the 2000-page split workload revealed a few easy wins that compound nicely.
What changed
- Sync ObjectCopier — ObjectCopier was fully async despite doing zero I/O. Every recursive call in the deep-copy graph walk went through the microtask queue for no reason. Made all methods synchronous. (~15% on split workloads)
- npm lru-cache — Our internal LRU cache did Map.delete() + Map.set() on every get() to maintain recency. Replaced it with the lru-cache package, which uses a doubly linked list internally. Biggest impact on large PDF loading, where PdfRef.of() and PdfName.of() are called thousands of times. (~17% on split, ~60% on single-page extraction from large PDFs)
- Cached PdfName serialization — PdfName.toBytes() was calling new TextEncoder(), encoding to bytes, and iterating the result on every single write. Since names are interned, we can cache the serialized bytes on the instance. Added an ASCII fast-path that skips the encoder entirely for the 99% of PDF names that are plain ASCII. Also extracted a shared HEX_TABLE lookup used by both bytesToHex and escapeName. (~23% on split)
- Skip deflate for tiny streams — pako's Deflate constructor zeros a 64KB hash table on every call (~0.023ms). When splitting 2000 pages, each output PDF has a few tiny unfiltered content streams (2-74 bytes) — that's 6000+ deflate initializations for streams that never compress meaningfully. Added a configurable compressionThreshold (default 512 bytes) to skip compression below that size. Also exposed compressStreams and compressionThreshold on SaveOptions (usage sketch after the numbers below). (~30% on split)

Numbers

2000-page split: 582.5ms → 245ms cumulative (2.38x from baseline).
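A usage sketch for the new options (the package name and the load/save method shapes are assumptions; only the two option names come from this PR):

```ts
import { PdfDocument } from 'the-library'; // hypothetical package name

declare const inputBytes: Uint8Array;

const doc = await PdfDocument.load(inputBytes); // hypothetical load API

// compressStreams and compressionThreshold are the options exposed on
// SaveOptions above; the values here just illustrate overriding defaults.
const bytes = await doc.save({
  compressStreams: true,       // set false to skip deflate entirely
  compressionThreshold: 1024,  // raise the 512-byte default to 1 KiB
});
```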
Also included

- Splitting benchmark suite (benchmarks/splitting.bench.ts)
- CI workflow that runs the splitting benchmarks on both base and PR branches when .ts files change and posts a comparison comment
- scripts/bench-compare.ts for the comparison logic