- exhaustive searches on a benchmark dataset (Wikipedia)
- compare recall/latency
- compare against pure Python, faiss, etc.
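
A minimal sketch of what the recall/latency comparison could look like, using only the pure-Python baseline. Everything here is an assumption made for illustration: the function names (`exhaustive_search`, `recall_at_k`, `benchmark`) are hypothetical, inner product is assumed as the similarity measure, and small random vectors stand in for the Wikipedia embeddings. Any other backend (faiss, etc.) would plug in as a different `search_fn` with the same signature.

```python
import heapq
import random
import time


def exhaustive_search(query, corpus, k):
    """Pure-Python brute force: indices of the k vectors with the highest dot product."""
    scores = (
        (sum(q * c for q, c in zip(query, vec)), idx)
        for idx, vec in enumerate(corpus)
    )
    return [idx for _, idx in heapq.nlargest(k, scores)]


def recall_at_k(retrieved, ground_truth):
    """Fraction of ground-truth neighbours that appear in the retrieved list."""
    if not ground_truth:
        return 0.0
    return len(set(retrieved) & set(ground_truth)) / len(ground_truth)


def benchmark(search_fn, queries, corpus, ground_truth, k):
    """Run search_fn on every query; return (mean recall, mean latency in seconds)."""
    recalls, latencies = [], []
    for query, truth in zip(queries, ground_truth):
        start = time.perf_counter()
        result = search_fn(query, corpus, k)
        latencies.append(time.perf_counter() - start)
        recalls.append(recall_at_k(result, truth))
    return sum(recalls) / len(recalls), sum(latencies) / len(latencies)


def random_vectors(n, dim, rng):
    """Synthetic stand-in for real embeddings (assumption, not the benchmark data)."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n)]


if __name__ == "__main__":
    rng = random.Random(0)
    corpus = random_vectors(1000, 32, rng)
    queries = random_vectors(10, 32, rng)
    k = 10
    # Exhaustive search defines the ground truth, so its own recall is 1.0;
    # other backends would be scored against this same list.
    truth = [exhaustive_search(q, corpus, k) for q in queries]
    recall, latency = benchmark(exhaustive_search, queries, corpus, truth, k)
    print(f"recall@{k}={recall:.3f}  mean latency={latency * 1e3:.2f} ms")
```

Timing each query separately with `time.perf_counter` keeps index construction out of the latency figure, which matters once a backend with a build step is added to the comparison.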