
joshkang97 commented Jan 20, 2026

Summary

Interpolation search is an alternative to binary search that performs better on uniformly distributed keys. Binary search always probes the midpoint between the left and right boundaries. Interpolation search instead "interpolates" the probe position from where the target's value lies between the values of the boundary keys. Fortunately, the existing block format can be reused to support interpolation search.
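The core idea can be sketched on a sorted array of unsigned integers. This is a minimal, hypothetical illustration and not the PR's implementation. The function name `InterpolationLowerBound` is invented, and the sketch works on a plain `std::vector` rather than a RocksDB block. Instead of probing the midpoint, it estimates the probe position from the target's value relative to the boundary values. Every probe still shrinks the search range by at least one entry.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical sketch: interpolation search over sorted integers. Returns the
// index of the first element >= target, or keys.size() if there is none.
size_t InterpolationLowerBound(const std::vector<uint64_t>& keys,
                               uint64_t target) {
  // Invariant: keys[i] < target for i < lo, and keys[i] >= target for i >= hi.
  size_t lo = 0;
  size_t hi = keys.size();
  while (lo < hi) {
    const uint64_t lo_val = keys[lo];
    const uint64_t hi_val = keys[hi - 1];
    if (target <= lo_val) return lo;  // keys[lo] is the first match
    if (target > hi_val) return hi;   // nothing in [lo, hi) matches
    // Here lo_val < target <= hi_val, so hi_val > lo_val (no division by
    // zero). Binary search would probe lo + (hi - lo) / 2; interpolation
    // search instead estimates where target falls between the boundary values.
    const double fraction = static_cast<double>(target - lo_val) /
                            static_cast<double>(hi_val - lo_val);
    size_t pos =
        lo + static_cast<size_t>(fraction * static_cast<double>(hi - 1 - lo));
    pos = std::min(pos, hi - 1);  // guard against floating-point rounding
    assert(pos >= lo && pos < hi);
    // Either branch shrinks [lo, hi) by at least one entry, so the loop ends.
    if (keys[pos] < target) {
      lo = pos + 1;
    } else {
      hi = pos;
    }
  }
  return lo;
}
```

Rounding the estimate down keeps the probe inside `[lo, hi - 1]`. That is enough to guarantee progress, although on heavily skewed data the search can degrade toward a linear scan.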

Interpolation search is usually applied to numeric target values. For variable-length binary keys, we compute the "value" of a key as the big-endian integer formed by its first 8 bytes. If the key is shorter than 8 bytes, we right-pad it with zero bytes. This also means interpolation search is only really effective with the bytewise comparator. If interpolation search detects a non-bytewise comparator, or if the left and right boundary keys have the same 8-byte prefix, it falls back to classic binary search. Each interpolation step is significantly more computationally expensive than a binary search step, so we also fall back to binary search when the search distance is small. Interpolation search also performs best when there is minimal index key shortening, especially shortening of the last block's key, because shortening can heavily skew the distribution of the actual keys.
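The key-to-number mapping and the fallback checks described above might look roughly like this. It is a sketch under stated assumptions. `KeyPrefixValue`, `ShouldInterpolate`, and `kMinInterpolationDistance` are hypothetical names. The threshold value of 8 is a placeholder, because the PR does not state the actual cutoff.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>

// Hypothetical placeholder: the PR does not state the real cutoff below which
// the search falls back to binary search.
constexpr size_t kMinInterpolationDistance = 8;

// Treats the first 8 bytes of a key as a big-endian unsigned integer,
// right-padding keys shorter than 8 bytes with zero bytes.
uint64_t KeyPrefixValue(std::string_view key) {
  uint64_t value = 0;
  for (size_t i = 0; i < 8; ++i) {
    const uint8_t byte = i < key.size() ? static_cast<uint8_t>(key[i]) : 0;
    value = (value << 8) | byte;
  }
  return value;
}

// Returns true if interpolation is worth trying for the range between
// left_key and right_key containing num_entries entries; false means fall
// back to classic binary search.
bool ShouldInterpolate(bool is_bytewise_comparator, std::string_view left_key,
                       std::string_view right_key, size_t num_entries) {
  // The numeric mapping only follows key order under the bytewise comparator.
  if (!is_bytewise_comparator) return false;
  // Small ranges: the extra arithmetic per step outweighs the saved probes.
  if (num_entries < kMinInterpolationDistance) return false;
  // Equal 8-byte prefixes leave no value range to interpolate over.
  return KeyPrefixValue(left_key) != KeyPrefixValue(right_key);
}
```

The mapping is non-decreasing in bytewise key order: a key that sorts before another never maps to a larger number. That makes it safe to interpolate over with the bytewise comparator. With other comparators, numeric order and key order can disagree.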

Note that both search algorithms are guaranteed to make progress, because each iteration reduces the search space by at least one entry.

Other minor changes

  • CompareCurrentKey did not increment user_key_comparison_count when raw_key_ was a user key, so I added an explicit PERF_COUNTER_ADD for that case
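A self-contained sketch can show why the counter has to be added by hand. Every name here (`RawBytewiseComparator`, `CountingComparator`, `CompareUserKey`, `g_user_key_comparison_count`) is hypothetical and is not RocksDB's actual code. The sketch only mimics the situation described in the PR: a counting wrapper sits around a raw comparator, and any caller that asks for the raw `user_comparator()` bypasses the wrapper's counting.

```cpp
#include <cassert>
#include <cstdint>
#include <string_view>

// Hypothetical stand-in for the user_key_comparison_count perf counter.
thread_local uint64_t g_user_key_comparison_count = 0;

// Raw comparator: orders keys bytewise and never touches the counter.
struct RawBytewiseComparator {
  int Compare(std::string_view a, std::string_view b) const {
    return a.compare(b);
  }
};

// Wrapper that counts each comparison, then delegates to the raw comparator.
struct CountingComparator {
  const RawBytewiseComparator* raw;

  int Compare(std::string_view a, std::string_view b) const {
    ++g_user_key_comparison_count;
    return raw->Compare(a, b);
  }
  // Exposes the raw comparator; calls through it bypass the counting above.
  const RawBytewiseComparator* user_comparator() const { return raw; }
};

// A code path that compares via user_comparator() must bump the counter
// itself; this increment plays the role of PERF_COUNTER_ADD in the PR.
int CompareUserKey(const CountingComparator& cmp, std::string_view a,
                   std::string_view b) {
  ++g_user_key_comparison_count;
  return cmp.user_comparator()->Compare(a, b);
}
```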

Test Plan

Updated the unit tests and the crash test to cover the new search option

Benchmark

The default benchmark generates keys with a roughly uniform distribution, which made it a good way to measure the performance improvement.

Setup: ./db_bench -benchmarks=fillseq,compact -index_shortening_mode=1

./db_bench -use_existing_db=true -benchmarks=readrandom -perf_level=3 -seed=1 -index_search_type=interpolation_search
readrandom   :       2.705 micros/op 369624 ops/sec 2.705 seconds 1000000 operations;   40.9 MB/s (1000000 of 1000000 found)
 PERF_CONTEXT:
user_key_comparison_count = 12698466
./db_bench -use_existing_db=true -benchmarks=readrandom -perf_level=3 -seed=1 -index_search_type=binary_search
readrandom   :       2.992 micros/op 334175 ops/sec 2.992 seconds 1000000 operations;   37.0 MB/s (1000000 of 1000000 found)
 PERF_CONTEXT:
user_key_comparison_count = 26103307

meta-cla bot added the CLA Signed label Jan 20, 2026
joshkang97 force-pushed the interpolation_seek branch 2 times, most recently from 9d8f304 to c9220ee, January 20, 2026 19:27

meta-codesync bot commented Jan 20, 2026

@joshkang97 has imported this pull request. If you are a Meta employee, you can view this in D91063163.


meta-codesync bot commented Jan 21, 2026

@joshkang97 has imported this pull request. If you are a Meta employee, you can view this in D91063163.

joshkang97 marked this pull request as ready for review January 21, 2026 20:11
joshkang97 requested a review from pdillinger January 21, 2026 20:11

meta-codesync bot commented Jan 22, 2026

@joshkang97 has imported this pull request. If you are a Meta employee, you can view this in D91063163.

if (raw_key_.IsUserKey()) {
  // Need to add this counter here because .user_comparator() points to the
  // raw comparator and not the wrapper that increments this counter.
  PERF_COUNTER_ADD(user_key_comparison_count, 1);
Author


One thing I noticed is that adding this perf counter slightly regressed benchmark performance, so I'm not sure whether it's desirable to keep it.

pdillinger
Contributor

Discussing in direct chat

joshkang97 marked this pull request as draft January 26, 2026 21:16

meta-codesync bot commented Jan 26, 2026

@joshkang97 has imported this pull request. If you are a Meta employee, you can view this in D91063163.

joshkang97 marked this pull request as ready for review January 27, 2026 00:32