Anup Ghatage edited this page Feb 12, 2026 · 5 revisions

Zeppelin

S3-native vector search engine.
Object storage is the source of truth. Nodes are stateless.

Key Features

  • S3-Native — All data lives in object storage (S3, GCS, Azure Blob, local filesystem). No local databases. Nodes are disposable.
  • IVF Indexing — IVF-Flat, IVF-SQ8 (4x compression), IVF-PQ (16-32x compression), and Hierarchical ANN indexes.
  • BM25 Full-Text Search — Inverted indexes with configurable tokenization, stemming, and multi-field rank_by expressions. Combine vector and keyword search.
  • Bitmap Pre-Filters — RoaringBitmap indexes for sub-millisecond attribute filtering before vector search.
  • Strong Consistency — WAL-based writes with CAS (compare-and-swap) on manifest. Strong or eventual consistency per query.
  • Formally Verified — 15 TLA+ specifications verify correctness of compaction, multi-writer leases, BM25 consistency, and more.
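The compression figures quoted for IVF-SQ8 and IVF-PQ can be checked with simple byte arithmetic. The sketch below is illustrative only and is not Zeppelin's implementation: it assumes float32 source vectors (4 bytes per dimension), one byte per dimension for SQ8, and one byte per subquantizer for PQ. The function names `sq8_encode`, `sq8_decode`, and `compression_ratio` are hypothetical, and the quantizer uses a per-vector min/max range for brevity, whereas production scalar quantizers typically learn ranges from the dataset.

```python
def sq8_encode(vec):
    """Quantize a list of floats to one byte per dimension (illustrative only).

    Assumption: the range is taken from this single vector's min and max.
    """
    lo, hi = min(vec), max(vec)
    scale = (hi - lo) / 255 or 1.0  # fall back to 1.0 for constant vectors
    codes = bytes(min(255, round((x - lo) / scale)) for x in vec)
    return codes, lo, scale


def sq8_decode(codes, lo, scale):
    """Reconstruct approximate floats from the one-byte codes."""
    return [lo + c * scale for c in codes]


def compression_ratio(dim, code_bytes):
    """Ratio of float32 storage (4 bytes per dimension) to the encoded size.

    Ignores per-vector or per-index metadata such as ranges and codebooks.
    """
    return (dim * 4) / code_bytes


if __name__ == "__main__":
    dim = 128
    print("SQ8:", compression_ratio(dim, dim))              # one byte per dimension
    print("PQ, 32 subquantizers:", compression_ratio(dim, 32))
    print("PQ, 16 subquantizers:", compression_ratio(dim, 16))

    codes, lo, scale = sq8_encode([0.0, 0.25, 1.0])
    print(sq8_decode(codes, lo, scale))
```

Under these assumptions, a 128-dimensional vector shrinks from 512 bytes to 128 bytes with SQ8 (4x), and to 32 or 16 bytes with PQ using 32 or 16 one-byte subquantizers (16x or 32x), which matches the ranges listed above. The subquantizer counts are example values, not Zeppelin defaults.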

At a Glance

  • 198 unit tests across all modules
  • 15 TLA+ specifications with 26,000+ explored states
  • 6 stress tests and 6 end-to-end performance tests
  • Written in Rust (1.84+), async with Tokio, HTTP via Axum 0.7
  • Apache 2.0 license

Quick Links

| Section | Description |
|---|---|
| Getting Started | Docker quickstart, first API calls |
| API Reference | Complete HTTP API with JSON schemas |
| Configuration | All config options, env vars, TOML format |
| Filters | Filter query language with examples |
| Full-Text Search | BM25 ranking, tokenization, rank_by expressions |
| Python SDK | Python client guide |
| TypeScript SDK | TypeScript client guide |
| Architecture | System design, data flow, S3 key structure |
| Indexing | Vector indexing internals (IVF, SQ8, PQ, H-ANN) |
| WAL & Compaction | Write-ahead log, manifest, compaction pipeline |
| Deployment | Docker, EC2, production tuning |
| Contributing | Build, test, CI pipeline, code style |
| Formal Verification | TLA+ specs and findings |
| Metrics Reference | All Prometheus metrics |
