Add interpolation search as an alternative to binary search #14247
@joshkang97 has imported this pull request. If you are a Meta employee, you can view this in D91063163.
    if (raw_key_.IsUserKey()) {
      // Need to add this counter here because .user_comparator() points to the
      // raw comparator and not the wrapper that increments this counter.
      PERF_COUNTER_ADD(user_key_comparison_count, 1);
One thing I noticed is that adding this perf counter slightly regressed benchmark performance. I'm not sure whether it's desirable to keep it.
Discussing in direct chat
Summary
Interpolation search is an alternative to binary search that performs better on uniformly distributed keys. Whereas binary search always probes the midpoint of the left and right boundaries, interpolation search "interpolates" the probe position based on the distance to the target. Fortunately, we can reuse the existing block format to support interpolation search.
Interpolation search is usually done with numerical target values. With variable-length binary keys, we compute the "value" of a key as the big-endian value of its first 8 bytes. If the key is shorter than 8 bytes, we pad it on the right with 0s. This also means interpolation search is only really effective with the bytewise comparator. If interpolation search detects a non-bytewise comparator, or if the prefix of left equals the prefix of right, we fall back to classic binary search. Interpolation search is significantly more computationally expensive per iteration than binary search, so when the search distance is small, we also fall back to binary search. Interpolation search also performs best when there is minimal shortening, especially shortening of the last block, as shortening can heavily skew the distribution of the actual keys.
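To illustrate the key-to-value mapping described above, here is a minimal sketch. The function name `KeyPrefixValue` is hypothetical and is not taken from the PR; it only shows one way to read the first 8 bytes of a key as a big-endian integer with zero padding on the right.

```cpp
#include <cstddef>
#include <cstdint>
#include <string_view>

// Hypothetical helper (not the PR's actual code): interpret the first 8 bytes
// of `key` as a big-endian unsigned integer. Keys shorter than 8 bytes are
// padded on the right with zero bytes.
uint64_t KeyPrefixValue(std::string_view key) {
  uint64_t value = 0;
  for (size_t i = 0; i < 8; ++i) {
    value <<= 8;  // make room for the next byte (most significant first)
    if (i < key.size()) {
      value |= static_cast<unsigned char>(key[i]);
    }
    // else: the shifted-in low byte stays 0, which is the right padding
  }
  return value;
}
```

Because the most significant byte comes first, comparing two such values agrees with bytewise comparison of the first 8 bytes, which is why the mapping is only meaningful under a bytewise ordering. Keys that share their first 8 bytes map to the same value, which corresponds to the "prefix of left equals prefix of right" fallback case.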
Note that each search algorithm is guaranteed to make progress because at each iteration the search space is guaranteed to be reduced by at least 1.
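The following is a simplified sketch of how such a search loop could look over a plain sorted vector of keys. It is not the PR's implementation: the names `InterpolationLowerBound`, `PrefixValue`, and the threshold `kMinInterpolationDistance` (and its value of 4) are assumptions made for illustration, and the comparator check is omitted because the sketch always compares bytewise.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>
#include <vector>

// Hypothetical threshold: at or below this distance, use the binary midpoint.
constexpr size_t kMinInterpolationDistance = 4;

// First 8 bytes of `key` as a big-endian integer, zero-padded on the right.
inline uint64_t PrefixValue(std::string_view key) {
  uint64_t v = 0;
  for (size_t i = 0; i < 8; ++i) {
    v = (v << 8) | (i < key.size() ? static_cast<unsigned char>(key[i]) : 0u);
  }
  return v;
}

// Returns the index of the first key >= target in the sorted `keys`,
// or keys.size() if there is none.
size_t InterpolationLowerBound(const std::vector<std::string>& keys,
                               std::string_view target) {
  size_t left = 0;              // every key before `left` is < target
  size_t right = keys.size();   // every key at or after `right` is >= target
  const uint64_t t = PrefixValue(target);
  while (left < right) {
    size_t probe = left + (right - left) / 2;  // binary search fallback
    if (right - left > kMinInterpolationDistance) {
      const uint64_t lo = PrefixValue(keys[left]);
      const uint64_t hi = PrefixValue(keys[right - 1]);
      if (lo < hi) {  // equal prefixes: keep the binary midpoint
        if (t <= lo) {
          probe = left;
        } else if (t >= hi) {
          probe = right - 1;
        } else {
          const long double frac = static_cast<long double>(t - lo) /
                                   static_cast<long double>(hi - lo);
          const size_t span = right - 1 - left;
          // Clamp so the probe always stays inside [left, right - 1].
          probe = left + std::min(span, static_cast<size_t>(frac * span));
        }
      }
    }
    // The probe is in [left, right - 1], so either branch shrinks the
    // search space by at least one element.
    if (std::string_view(keys[probe]).compare(target) < 0) {
      left = probe + 1;
    } else {
      right = probe;
    }
  }
  return left;
}
```

Keeping the probe inside `[left, right - 1]` is what gives the progress guarantee: whichever boundary moves, it moves past the probed position.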
Other minor changes
CompareCurrentKey did not properly increment user_key_comparison_count when raw_key_ was a user key, so I manually added a PERF_COUNTER_ADD to it.
Test Plan
Updated unit tests and crash test with the new search option
Benchmark
The default benchmark sets up keys in a generally uniform distribution, so it was a good way to test performance improvements.
Setup:
./db_bench -benchmarks=fillseq,compact -index_shortening_mode=1