Skip to content

Comments

Salt always zero#22

Open
gropaul wants to merge 2 commits intomainfrom
salt-always-zero
Open

Salt always zero#22
gropaul wants to merge 2 commits intomainfrom
salt-always-zero

Conversation

@gropaul
Copy link
Owner

@gropaul gropaul commented Jul 22, 2025

No description provided.

Mytherin added a commit to duckdb/duckdb that referenced this pull request Aug 6, 2025
…value for key comparison (#18374)

This PR fixes the salt collision handling for the Join.

The HashTable first checks whether the upper 16 bits of the hash (=
salt) match between the probing tuple and the key in the slot. If these
match, the complete keys are compared. The keys can still be distinct if
the salts collide. In this case, we need to continue probing.

There was an error in this logic, which led to salts and offsets of
probing tuples being combined incorrectly when there were many
collisions (See #17891) . For
example, this can happen with strings that don't result in a
high-entropy hash, such as genome strings.

The approach for this fix is systematically different from previous
ones. Instead of adding the salt later only for the keys that did not
match, the salt and offsets are now always stored for all tuples that
require a key comparison. This means one instruction more per match, but
it also makes the whole process logically easier.

To fully stress test this, I have a PR that always extracts the salt to
zero: gropaul#22
Here, the salt will never lead to faster linear probing and constantly
collide, which stress-tests the new logic. All the logic tests are green
for this one. I would suggest that such a test should be included in the
base DuckDB CI, but I am unsure how to add it.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant