Open
Conversation
Mytherin
added a commit
to duckdb/duckdb
that referenced
this pull request
Aug 6, 2025
…value for key comparison (#18374) This PR fixes the salt collision handling for the Join. The HashTable first checks whether the upper 16 bits of the hash (= salt) match between the probing tuple and the key in the slot. If these match, the complete keys are compared. The keys can still be distinct if the salts collide. In this case, we need to continue probing. There was an error in this logic, which led to salts and offsets of probing tuples being combined incorrectly when there were many collisions (See #17891) . For example, this can happen with strings that don't result in a high-entropy hash, such as genome strings. The approach for this fix is systematically different from previous ones. Instead of adding the salt later only for the keys that did not match, the salt and offsets are now always stored for all tuples that require a key comparison. This means one instruction more per match, but it also makes the whole process logically easier. To fully stress test this, I have a PR that always extracts the salt to zero: gropaul#22 Here, the salt will never lead to faster linear probing and constantly collide, which stress-tests the new logic. All the logic tests are green for this one. I would suggest that such a test should be included in the base DuckDB CI, but I am unsure how to add it.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
No description provided.