Add a way to obtain internal statistics and parameters of an HNSW index #594
Open
mbautin wants to merge 1 commit into nmslib:master
mbautin added a commit to yugabyte/yugabyte-db that referenced this pull request Oct 22, 2024
Summary:
Fixing some inconsistencies in index parameters that are causing a discrepancy between Usearch and Hnswlib performance:
- Correctly specifying connectivity for hnswlib as num_neighbors_per_vertex instead of max_neighbors_per_vertex.
- Passing the ef option into hnswlib configuration.

Adding internal statistics introspection to Usearch and Hnswlib index wrappers.
PR for hnswlib changes: nmslib/hnswlib#594.
PR for usearch changes: unum-cloud/USearch#508

Also allow specifying multiple values of k to pass in as input, as long as they are not greater than the precomputed ground truth result list size.

Updating hnsw_tool to always convert uint8_t coordinates to float32 when using Hnswlib to have a fair comparison with Usearch on the SIFT1B dataset. Usearch does not currently support the uint8_t type natively.

The changes to src/inline-thirdparty will be pushed as separate commits generated by `build-support/thirdparty_tool --sync-inline-thirdparty`.

Test Plan:
Jenkins

Manual testing using hnsw_tool

- hnswlib: https://gist.githubusercontent.com/mbautin/d21580dcac0b51ad2d7bc9fc130c5f9e/raw
```
Hnswlib index with 5 levels
    max_elements: 1000000
    M: 16
    maxM: 16
    maxM0: 32
    ef_construction: 128
    ef: 10
    mult: 0.360674
    Level 0: 1000000 nodes, 21613828 edges, 21.61 average edges per node
    Level 1: 62323 nodes, 885027 edges, 14.20 average edges per node
    Level 2: 3855 nodes, 50515 edges, 13.10 average edges per node
    Level 3: 238 nodes, 2543 edges, 10.68 average edges per node
    Level 4: 17 nodes, 244 edges, 14.35 average edges per node
    Totals: 1066433 nodes, 22552157 edges, 21.15 average edges per node
i-recall @ 50, i=1..10:
    1-recall @ 50: 0.9695000052
    2-recall @ 50: 0.9645000100
    3-recall @ 50: 0.9604333043
    4-recall @ 50: 0.9568499923
    5-recall @ 50: 0.9541400075
    6-recall @ 50: 0.9504333138
    7-recall @ 50: 0.9467428327
    8-recall @ 50: 0.9435999990
    9-recall @ 50: 0.9406333566
    10-recall @ 50: 0.9377999902
```
- usearch: https://gist.githubusercontent.com/mbautin/74948b310780562e74831eb29e43cb13/raw
```
Usearch index with 4 levels
    connectivity: 16
    connectivity_base: 32
    expansion_add: 128
    expansion_search: 10
    inverse_log_connectivity: 0.360674
    Level 0: 1000000 nodes, 20973352 edges, 20.97 average edges per node
    Level 1: 64036 nodes, 890428 edges, 13.91 average edges per node
    Level 2: 5090 nodes, 66295 edges, 13.02 average edges per node
    Level 3: 481 nodes, 5304 edges, 11.03 average edges per node
    Totals: 1069607 nodes, 21935379 edges, 20.51 average edges per node
i-recall@50, i=1..10:
    1-recall @ 40: 0.9305999875
    2-recall @ 40: 0.9201999903
    3-recall @ 40: 0.9141333103
    4-recall @ 40: 0.9085000157
    5-recall @ 40: 0.9036399722
    6-recall @ 40: 0.8987166882
    7-recall @ 40: 0.8932142854
    8-recall @ 40: 0.8890249729
    9-recall @ 40: 0.8852999806
    10-recall @ 40: 0.8813199997
```

Reviewers: sergei, aleksandr.ponomarenko
Reviewed By: sergei, aleksandr.ponomarenko
Subscribers: ybase
Differential Revision: https://phorge.dev.yugabyte.com/D38977
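The `mult` value and the per-level node counts in the hnswlib output above can be sanity-checked with a few lines of arithmetic. The sketch below assumes the usual HNSW level assignment, where a node's top level is `floor(-ln(U) * mult)` with `mult = 1 / ln(M)` and `U` uniform on (0, 1), so the expected fraction of nodes reaching level `l` is `M ** -l`. The function names are illustrative only and are not part of hnswlib, Usearch, or hnsw_tool.

```python
import math


def level_multiplier(m: int) -> float:
    """Level multiplier under the assumed rule mult = 1 / ln(M)."""
    return 1.0 / math.log(m)


def expected_nodes_per_level(num_elements: int, m: int, num_levels: int) -> list[float]:
    """Expected node count on each level, assuming P(level >= l) = M ** -l."""
    return [num_elements * float(m) ** (-level) for level in range(num_levels)]


if __name__ == "__main__":
    m = 16
    print(f"mult: {level_multiplier(m):.6f}")
    # Node counts reported for the hnswlib index in the commit message above.
    observed = [1000000, 62323, 3855, 238, 17]
    expected = expected_nodes_per_level(1_000_000, m, len(observed))
    for level, (exp, obs) in enumerate(zip(expected, observed)):
        print(f"Level {level}: expected ~{exp:.0f} nodes, observed {obs}")
```

With M = 16 the multiplier rounds to the reported 0.360674, and the observed level sizes sit close to the geometric 1/16 falloff, which is the kind of check the introspection output is meant to enable.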
This allows us to diagnose potential configuration issues and compare the statistics of the internal structure of the index against other HNSW implementations.
Also allow specifying ef as a constructor parameter of HierarchicalNSW.
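As a rough illustration of what such an introspection pass computes, here is a minimal sketch over a toy layered graph. It is not the code from this pull request: the data layout (one `dict` of adjacency lists per level) and the names `level_stats` and `format_stats` are hypothetical, and it assumes that "edges" counts adjacency-list entries, so an undirected link stored on both endpoints is counted twice.

```python
def level_stats(levels):
    """Return (nodes, edges, average edges per node) for each level.

    `levels[i]` maps a node id to its neighbor list on level i
    (a hypothetical layout chosen for this example).
    """
    stats = []
    for adjacency in levels:
        nodes = len(adjacency)
        # Assumption: every adjacency-list entry counts as one edge.
        edges = sum(len(neighbors) for neighbors in adjacency.values())
        stats.append((nodes, edges, edges / nodes if nodes else 0.0))
    return stats


def format_stats(stats):
    """Render the statistics in the style of the output quoted in the commit message."""
    lines = [
        f"Level {i}: {n} nodes, {e} edges, {avg:.2f} average edges per node"
        for i, (n, e, avg) in enumerate(stats)
    ]
    total_nodes = sum(n for n, _, _ in stats)
    total_edges = sum(e for _, e, _ in stats)
    total_avg = total_edges / total_nodes if total_nodes else 0.0
    lines.append(
        f"Totals: {total_nodes} nodes, {total_edges} edges, "
        f"{total_avg:.2f} average edges per node"
    )
    return lines


if __name__ == "__main__":
    toy_levels = [
        {0: [1, 2], 1: [0], 2: [0, 1]},  # level 0: every node is present
        {0: [2], 2: [0]},                # level 1: a sparser subset
    ]
    for line in format_stats(level_stats(toy_levels)):
        print(line)
```

Comparing these per-level figures between two implementations built with the same parameters is one way to spot a configuration mismatch, such as a different connectivity setting.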