72 changes: 71 additions & 1 deletion vdb_benchmark/README.md
The benchmark process consists of three main steps:

### Step 1: Load Vectors into the Database
Use the `load_vdb.py` script to generate and load 10 million vectors into your vector database (this process can take up to 8 hours):

#### Default / Standard Mode

##### Basic execution with config file
```bash
python vdbbench/load_vdb.py --config vdbbench/configs/10m_diskann.yaml
```

##### With explicit parameters (no config)
```bash
python vdbbench/load_vdb.py --collection-name benchmark_test \
--dimension 1536 \
--num-vectors 1000000 \
--batch-size 10000
```

##### Override config values
```bash
python vdbbench/load_vdb.py --config vdbbench/configs/10m_diskann.yaml \
--collection-name custom_collection \
--num-vectors 500000 \
--force
```

##### With reproducible seed
```bash
python vdbbench/load_vdb.py --config vdbbench/configs/10m_diskann.yaml \
--seed 42
```
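The `--seed` flag makes vector generation deterministic, so two runs produce identical collections. The script's internal generator is not shown in this diff; the sketch below is one way per-batch seeded generation can work (the `generate_batch` helper and its `(seed, batch_index)` seeding scheme are illustrative assumptions, not the tool's actual code):

```python
import numpy as np

def generate_batch(seed: int, batch_index: int,
                   batch_size: int, dimension: int) -> np.ndarray:
    """Generate one batch of float32 vectors deterministically.

    Seeding the RNG with (seed, batch_index) makes every batch
    reproducible on its own, regardless of generation order.
    """
    rng = np.random.default_rng([seed, batch_index])
    return rng.random((batch_size, dimension), dtype=np.float32)

# The same seed and batch index always yield identical vectors.
assert np.array_equal(generate_batch(42, 0, 4, 8), generate_batch(42, 0, 4, 8))
```

Per-batch seeding also means a failed run can be resumed from any batch without regenerating the ones before it.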

#### Adaptive Mode (Memory-Aware Batch Sizing)

##### Enable adaptive batching (auto-scales based on memory pressure)
```bash
python vdbbench/load_vdb.py --config vdbbench/configs/100m_diskann.yaml \
--adaptive
```

##### With explicit memory budget
```bash
python vdbbench/load_vdb.py --config vdbbench/configs/100m_diskann.yaml \
--adaptive \
--memory-budget 4G
```

##### Adaptive with smaller budget for constrained systems
```bash
python vdbbench/load_vdb.py --config vdbbench/configs/100m_diskann.yaml \
--adaptive \
--memory-budget 2G \
--batch-size 5000
```
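Under `--adaptive`, the loader scales the batch size with current memory pressure, which is what the newly added `psutil` dependency can report. A minimal sketch of such a policy — the `adapt_batch_size` helper and its thresholds are illustrative, not the tool's actual implementation:

```python
def adapt_batch_size(batch_size: int, used_pct: float,
                     min_size: int = 1_000, max_size: int = 50_000,
                     high_water: float = 85.0, low_water: float = 60.0) -> int:
    """Halve the batch under memory pressure, double it with headroom.

    In practice used_pct would come from psutil.virtual_memory().percent.
    """
    if used_pct > high_water:
        return max(min_size, batch_size // 2)   # back off under pressure
    if used_pct < low_water:
        return min(max_size, batch_size * 2)    # headroom: speed up
    return batch_size                           # steady state

# Shrinks under pressure, grows with headroom, holds in between.
assert adapt_batch_size(10_000, used_pct=90.0) == 5_000
assert adapt_batch_size(10_000, used_pct=40.0) == 20_000
assert adapt_batch_size(10_000, used_pct=70.0) == 10_000
```

An explicit `--memory-budget` would cap the same calculation against a fixed limit rather than total system memory.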

#### Disk-Backed Mode (Billion-Scale / Low Memory)

##### Use memory-mapped temp file (default temp directory)
```bash
python vdbbench/load_vdb.py --config vdbbench/configs/1b_diskann.yaml \
--disk-backed
```

##### Specify fast NVMe for temp storage
```bash
python vdbbench/load_vdb.py --config vdbbench/configs/1b_diskann.yaml \
--disk-backed \
--temp-dir /mnt/nvme/tmp
```

##### Disk-backed with seed for reproducibility
```bash
python vdbbench/load_vdb.py --config vdbbench/configs/1b_diskann.yaml \
--disk-backed \
--temp-dir /mnt/nvme/tmp \
--seed 12345
```
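Disk-backed mode keeps the working set small by staging vectors in a memory-mapped temp file instead of RAM, which is why pointing `--temp-dir` at fast NVMe storage helps. A rough sketch of the idea using `numpy.memmap` — the `stage_vectors_on_disk` helper below is an assumption about the approach, not `load_vdb.py`'s code:

```python
import os
import tempfile
import numpy as np

def stage_vectors_on_disk(num_vectors: int, dimension: int,
                          temp_dir=None, seed: int = 0, chunk: int = 100_000):
    """Write vectors to a memory-mapped temp file, one chunk at a time.

    Only the chunk currently being generated lives in RAM; the OS pages
    the rest out to the backing file (put temp_dir on fast storage).
    """
    fd, path = tempfile.mkstemp(suffix=".f32", dir=temp_dir)
    os.close(fd)
    vectors = np.memmap(path, dtype=np.float32, mode="w+",
                        shape=(num_vectors, dimension))
    rng = np.random.default_rng(seed)
    for start in range(0, num_vectors, chunk):
        stop = min(start + chunk, num_vectors)
        vectors[start:stop] = rng.random((stop - start, dimension),
                                         dtype=np.float32)
    vectors.flush()
    return vectors, path
```

Upload batches can then be sliced from the returned `memmap` without ever materializing the full array in memory.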

For testing, we recommend using a smaller dataset by passing the `--num-vectors` option:
```bash
python vdbbench/load_vdb.py --config vdbbench/configs/10m_diskann.yaml --collection-name mlps_500k_10shards_1536dim_uniform_diskann --num-vectors 500000
```
3 changes: 2 additions & 1 deletion vdb_benchmark/pyproject.toml
```toml
dependencies = [
    # ... (earlier entries not shown in the diff)
    "pandas",
    "pymilvus",
    "pyyaml",
    "tabulate",
    "psutil"
]
```

[project.urls]
Expand Down