Add recall@k metric to simple bench and milvus-cluster.yaml config #7
…ch to reflect accuracy trade-off between search configs
Functionally, this works, but the recall stats are going to skew the performance numbers, since each batch now has 2 queries associated with it: the actual query and the higher-limit query. I did a quick test, and with this change throughput drops by half, since we're doubling the queries but not counting them.

Calculating the ground truth needs to happen outside the timed portion of the benchmark. The way VectorDBBench does this is by pre-calculating the ground truth for the queries it will execute. The other option would be to capture each query vector and its responses to an output file and then, after the benchmark run, calculate the ground truth from that record of queries and responses.

Also, we can't use an ANN index to calculate the ground truth, as the recall metric is effectively trying to measure whether the ANN is accurate. Capturing more results doesn't guarantee that we've got the 'ground truth', as the algorithm could be bad all the way down. We would need to generate the list of queries ahead of time and compute the ground truth for that set outside Milvus, or duplicate the collection with a FLAT index, which gives brute-force results and should give ground-truth accuracy (I think).

As it stands, we can't merge this, as it greatly affects the measured performance. What's the primary goal of the recall metric? Since we're using synthetic data, we know our recall is going to be wonky.
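To illustrate the "pre-calculate outside the timed portion" option, here is a minimal brute-force sketch in plain Python. It assumes L2 distance and a dataset small enough to hold in memory; `exact_top_k` and `precompute_ground_truth` are hypothetical names for illustration, not functions from this PR, Milvus, or VectorDBBench. The distance function would need to match the metric the collection actually uses.

```python
import heapq
from typing import Dict, List, Sequence

Vector = Sequence[float]


def squared_l2(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance (ranking by it matches ranking by L2)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def exact_top_k(base: Dict[int, Vector], query: Vector, k: int) -> List[int]:
    """Score every base vector exhaustively and return the ids of the k nearest."""
    return heapq.nsmallest(k, base, key=lambda vid: squared_l2(base[vid], query))


def precompute_ground_truth(
    base: Dict[int, Vector], queries: Sequence[Vector], k: int
) -> List[List[int]]:
    """Run once before the benchmark so no ground-truth work is timed."""
    return [exact_top_k(base, q, k) for q in queries]
```

The resulting id lists can be written to a file next to the query set and loaded by the benchmark, so the timed loop only issues the primary search.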
limit=10,
output_fields=["id"]
)
The way timing is measured in your patch has an issue that will affect your benchmark results.
The goal of this script is to report how fast the primary search (ef=50) is.
Place the "batch_end" time measurement before the ground_truth search. This should ideally fix the performance issue!
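A rough sketch of that ordering, assuming the batch loop can be expressed as two callables: `run_batch`, `primary_search`, and `ground_truth_search` are placeholder names for illustration, not the identifiers used in the patch. The only point is that the clock stops before any recall-related work starts.

```python
import time
from typing import Any, Callable, List, Sequence, Tuple

SearchFn = Callable[[Sequence[Any]], List[Any]]


def run_batch(
    batch: Sequence[Any],
    primary_search: SearchFn,
    ground_truth_search: SearchFn,
) -> Tuple[float, List[Any], List[Any]]:
    """Time only the primary search; run the ground-truth search untimed."""
    batch_start = time.perf_counter()
    results = primary_search(batch)
    batch_end = time.perf_counter()  # stop the clock before the recall query

    # Anything below this line is excluded from the reported batch latency.
    truth = ground_truth_search(batch)
    return batch_end - batch_start, results, truth
```

This only changes what is measured. It does not address the earlier concern that the ground truth itself should not come from an ANN index.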
Addressed the review comments for this PR and updated the recall calculation in the storage repo TF_VectorDB branch PR here: mlcommons/storage@f9ab288. Please review. Thank you!
`calculate_recall()` function that computes `recall@k` between search results and ground truth

Details:
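The PR's own implementation is not reproduced here. As a hedged sketch of what a `calculate_recall()`-style helper might look like, the version below assumes both inputs are ranked lists of ids and defines recall@k as the fraction of the top-k ground-truth ids that appear in the top-k search results. The signature and the empty-ground-truth behavior are assumptions for illustration.

```python
from typing import Hashable, Sequence


def calculate_recall(
    results: Sequence[Hashable], ground_truth: Sequence[Hashable], k: int
) -> float:
    """recall@k = |top-k results ∩ top-k ground truth| / |top-k ground truth|."""
    if k <= 0:
        raise ValueError("k must be positive")
    truth = set(ground_truth[:k])
    if not truth:
        return 0.0  # assumption: no ground truth available counts as zero recall
    hits = len(truth.intersection(results[:k]))
    return hits / len(truth)
```

For example, `calculate_recall([1, 2, 3, 4], [1, 2, 5, 6], k=4)` finds two of the four true neighbors and returns `0.5`.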