Skip to content

Configuration

Anup Ghatage edited this page Feb 12, 2026 · 1 revision

Configuration

Zeppelin is configured through environment variables, a TOML config file, or both.

Loading Priority

  1. Environment variables — highest priority, override everything
  2. TOML config file — set via ZEPPELIN_CONFIG env var
  3. Built-in defaults — used when no override exists

Server

Field Env Var Default Description
host ZEPPELIN_HOST 0.0.0.0 Listen address
port ZEPPELIN_PORT 8080 Listen port
request_timeout_secs ZEPPELIN_REQUEST_TIMEOUT_SECS 30 HTTP request timeout
max_concurrent_queries ZEPPELIN_MAX_CONCURRENT_QUERIES 64 Max in-flight queries
max_batch_size ZEPPELIN_MAX_BATCH_SIZE 10000 Max vectors per upsert
max_top_k ZEPPELIN_MAX_TOP_K 10000 Max results per query
shutdown_timeout_secs ZEPPELIN_SHUTDOWN_TIMEOUT_SECS 30 Graceful shutdown timeout
max_dimensions ZEPPELIN_MAX_DIMENSIONS 65536 Max vector dimensions
max_vector_id_length ZEPPELIN_MAX_VECTOR_ID_LENGTH 1024 Max vector ID length (chars)
max_request_body_mb ZEPPELIN_MAX_REQUEST_BODY_MB 50 Max HTTP request body size (MB)

Storage

Field Env Var Default Description
backend STORAGE_BACKEND s3 Storage backend: s3, gcs, azure, local
bucket S3_BUCKET zeppelin Bucket / container name
s3_region AWS_REGION AWS region
s3_endpoint S3_ENDPOINT Custom S3 endpoint (MinIO, R2, etc.)
s3_access_key_id AWS_ACCESS_KEY_ID AWS access key
s3_secret_access_key AWS_SECRET_ACCESS_KEY AWS secret key
s3_allow_http S3_ALLOW_HTTP false Allow HTTP (non-TLS) connections
gcs_service_account_path GCS_SERVICE_ACCOUNT_PATH Path to GCS service account JSON
azure_account AZURE_ACCOUNT Azure storage account name
azure_access_key AZURE_ACCESS_KEY Azure storage access key

Cache

Field Env Var Default Description
dir ZEPPELIN_CACHE_DIR /var/cache/zeppelin Local disk cache directory
max_size_gb ZEPPELIN_CACHE_MAX_SIZE_GB 50 Max cache size in GB
eviction lru Eviction policy (LRU only)

Indexing

Field Env Var Default Description
default_num_centroids ZEPPELIN_DEFAULT_NUM_CENTROIDS 256 K-means centroids for IVF
default_nprobe ZEPPELIN_DEFAULT_NPROBE 16 Default clusters to probe
max_nprobe 128 Maximum clusters to probe
kmeans_max_iterations 25 K-means convergence iterations
kmeans_convergence_epsilon 1e-4 K-means convergence threshold
oversample_factor 3 Centroid oversampling factor
quantization ZEPPELIN_QUANTIZATION none "none", "sq8" / "scalar", "pq" / "product"
pq_m 8 PQ subspace count
rerank_factor 4 Quantized search rerank multiplier
hierarchical ZEPPELIN_HIERARCHICAL false Use hierarchical ANN index
beam_width ZEPPELIN_BEAM_WIDTH 10 Beam width for H-ANN search
leaf_size ZEPPELIN_LEAF_SIZE H-ANN leaf cluster size
bitmap_index ZEPPELIN_BITMAP_INDEX true Build bitmap pre-filter indexes
fts_index ZEPPELIN_FTS_INDEX false Build full-text search indexes

Compaction

Field Env Var Default Description
interval_secs ZEPPELIN_COMPACTION_INTERVAL_SECS 30 Compaction check interval
max_wal_fragments_before_compact ZEPPELIN_MAX_WAL_FRAGMENTS 1000 Fragment threshold to trigger compaction
retrain_imbalance_threshold 5.0 Cluster imbalance ratio to retrain centroids

Consistency

Field Default Description
default strong Default query consistency: "strong" or "eventual"

Logging

Field Env Var Default Description
level RUST_LOG info Log level (standard RUST_LOG filter syntax)
format ZEPPELIN_LOG_FORMAT json Log format: "json" or "pretty"

Example TOML Config

Save as zeppelin.toml and set ZEPPELIN_CONFIG=./zeppelin.toml:

[server]
host = "0.0.0.0"
port = 8080
request_timeout_secs = 30
max_concurrent_queries = 64
max_batch_size = 10000
max_top_k = 10000
max_dimensions = 65536
max_request_body_mb = 50

[storage]
backend = "s3"
bucket = "zeppelin"
s3_region = "us-east-1"
# s3_endpoint = "http://minio:9000"  # Uncomment for MinIO

[cache]
dir = "/var/cache/zeppelin"
max_size_gb = 50

[indexing]
default_num_centroids = 256
default_nprobe = 16
quantization = "none"
hierarchical = false
bitmap_index = true
fts_index = false

[compaction]
interval_secs = 30
max_wal_fragments_before_compact = 1000

[consistency]
default = "strong"

[logging]
level = "info"
format = "json"

Environment Variable Quick Reference

# Server
ZEPPELIN_HOST=0.0.0.0
ZEPPELIN_PORT=8080
ZEPPELIN_REQUEST_TIMEOUT_SECS=30
ZEPPELIN_MAX_CONCURRENT_QUERIES=64
ZEPPELIN_MAX_BATCH_SIZE=10000
ZEPPELIN_MAX_TOP_K=10000
ZEPPELIN_SHUTDOWN_TIMEOUT_SECS=30
ZEPPELIN_MAX_DIMENSIONS=65536
ZEPPELIN_MAX_VECTOR_ID_LENGTH=1024
ZEPPELIN_MAX_REQUEST_BODY_MB=50

# Storage
STORAGE_BACKEND=s3
S3_BUCKET=zeppelin
AWS_REGION=us-east-1
S3_ENDPOINT=              # custom endpoint for MinIO/R2
AWS_ACCESS_KEY_ID=
AWS_SECRET_ACCESS_KEY=
S3_ALLOW_HTTP=false
GCS_SERVICE_ACCOUNT_PATH= # GCS backend
AZURE_ACCOUNT=            # Azure backend
AZURE_ACCESS_KEY=         # Azure backend

# Cache
ZEPPELIN_CACHE_DIR=/var/cache/zeppelin
ZEPPELIN_CACHE_MAX_SIZE_GB=50

# Indexing
ZEPPELIN_DEFAULT_NUM_CENTROIDS=256
ZEPPELIN_DEFAULT_NPROBE=16
ZEPPELIN_QUANTIZATION=none
ZEPPELIN_HIERARCHICAL=false
ZEPPELIN_BEAM_WIDTH=10
ZEPPELIN_LEAF_SIZE=
ZEPPELIN_BITMAP_INDEX=true
ZEPPELIN_FTS_INDEX=false

# Compaction
ZEPPELIN_COMPACTION_INTERVAL_SECS=30
ZEPPELIN_MAX_WAL_FRAGMENTS=1000

# Logging
RUST_LOG=info
ZEPPELIN_LOG_FORMAT=json

# Config file
ZEPPELIN_CONFIG=./zeppelin.toml

Clone this wiki locally