
Add recall@k metric to simple bench and milvus-cluster.yaml config #7

Open

idevasena wants to merge 7 commits into main from di/vdb-test

Conversation

@idevasena
Collaborator

  1. CSV Fields Updated: Added recall_at_10, recall_at_5, recall_at_1 to the output data
  2. Recall Calculation Function: New calculate_recall() function that computes recall@k between search results and ground truth (see the sketch below)
  3. Brute-Force Ground Truth: For each query batch, performs both:
    • Regular search with the index (ef: 50)
    • Brute-force search as ground truth (ef: 1000)
  4. Batch-Level Recall: Calculates recall for each query in the batch, then averages across the batch
  5. Added recall statistics:
    • Mean, median, min, max for recall@1, recall@5, recall@10
  6. Output Display: New "RECALL STATISTICS" section in the benchmark summary

Details:

  • Recall@k = (number of retrieved top-k items that appear in the ground-truth top-k) / (size of the ground-truth top-k)
  • Ground truth comes from a brute-force search (an exhaustive comparison against the collection)
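A minimal sketch of what this computation could look like (calculate_recall is the name given in the summary above; its exact signature in the patch may differ, and the batch-averaging helper is illustrative):

```python
def calculate_recall(retrieved_ids, ground_truth_ids, k):
    """Recall@k: fraction of the ground-truth top-k found in the retrieved top-k."""
    truth = set(ground_truth_ids[:k])
    if not truth:
        return 0.0
    return len(set(retrieved_ids[:k]) & truth) / len(truth)

# Batch-level recall (step 4 above): score each query in the batch, then average.
def batch_recall(batch_results, batch_ground_truth, k):
    scores = [calculate_recall(r, g, k) for r, g in zip(batch_results, batch_ground_truth)]
    return sum(scores) / len(scores)
```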

@idevasena idevasena requested a review from wvaske September 19, 2025 12:23
@wvaske
Owner

wvaske commented Oct 13, 2025

Functionally, this works, but the recall stats are going to mess with the performance numbers, since each batch now has two queries associated with it -- the actual query and the higher-limit query. I did a quick test, and this change shows throughput dropping by half, since we're doubling the queries but not counting them.

Calculating the ground truth needs to happen outside the timed portion of the benchmark. The way vectordb bench does this is by pre-calculating the ground truth for the queries it will execute. The other option would be to capture each query vector and its responses to an output file, then calculate the ground truth from that record of queries and responses after the benchmark run.
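A sketch of the pre-calculation option, assuming the query and corpus vectors are available as NumPy arrays before the run (all names here are illustrative, not from the patch):

```python
import numpy as np

def brute_force_top_k(queries, corpus, corpus_ids, k=10):
    """Exact top-k neighbours by L2 distance, computed once before the timed loop."""
    # Pairwise squared L2 distances: shape (num_queries, num_corpus_vectors).
    d2 = ((queries[:, None, :] - corpus[None, :, :]) ** 2).sum(axis=-1)
    top = np.argsort(d2, axis=1)[:, :k]
    return [[corpus_ids[j] for j in row] for row in top]

# Runs before the benchmark starts, so the timed loop only issues the ef=50 searches.
ground_truth = brute_force_top_k(query_vectors, corpus_vectors, corpus_ids, k=10)
```

For a large corpus this full broadcast would need chunking, but it shows the ordering: ground truth first, timed searches after.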

And we can't use an ANN index to calculate the ground truth, as the recall metric is effectively trying to measure whether the ANN is accurate. Capturing more results doesn't guarantee that we've got the 'ground truth', as the algorithm could be bad all the way down.

We would need to either generate the list of queries ahead of time and compute the ground truth for that set outside Milvus, or duplicate the collection but use FLAT for the index, which gives brute-force results and should give ground-truth accuracy (I think).
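For the duplicate-collection option, a rough sketch with the pymilvus ORM API (the collection and field names are made up for illustration):

```python
from pymilvus import Collection

# Hypothetical copy of the benchmark collection holding the same vectors.
gt_collection = Collection("bench_vectors_flat")
gt_collection.create_index(
    field_name="embedding",  # illustrative field name
    index_params={"index_type": "FLAT", "metric_type": "L2", "params": {}},
)
gt_collection.load()

# FLAT scans every vector, so these results are exact rather than approximate.
ground_truth = gt_collection.search(
    data=query_vectors,
    anns_field="embedding",
    param={"metric_type": "L2", "params": {}},
    limit=10,
    output_fields=["id"],
)
```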

As it stands we can't merge this in, as it greatly affects the measured performance.

What's the primary goal of the recall metric? Since we're using synthetic data we know our recall is going to be wonky.

Review comment on the search call in the patch:

```python
limit=10,
output_fields=["id"]
)
```


The way timing is measured in your patch has an issue that will affect your benchmark results. The goal of this script is to report how fast the primary search (ef=50) is.

Place the batch_end time measurement before the ground_truth search. Ideally, this should fix the performance issue!
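In loop terms, the suggested ordering would look roughly like this (assuming a pymilvus Collection named collection and a batch_queries list; variable and field names are illustrative):

```python
import time

batch_start = time.perf_counter()
results = collection.search(
    data=batch_queries, anns_field="embedding",
    param={"metric_type": "L2", "params": {"ef": 50}},
    limit=10, output_fields=["id"],
)
batch_end = time.perf_counter()  # stop the clock before any recall work

# The ground-truth search (ef: 1000 in the current patch) now falls outside the timing.
ground_truth = collection.search(
    data=batch_queries, anns_field="embedding",
    param={"metric_type": "L2", "params": {"ef": 1000}},
    limit=10, output_fields=["id"],
)
```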

@idevasena
Collaborator Author

Addressed the review comments for this PR and updated the recall calculation in the storage repo's TF_VectorDB branch; see mlcommons/storage@f9ab288. Please review. Thank you!
