Open
Conversation
mojadita
reviewed
Oct 23, 2024
| if (ch >= 'A' && ch <= 'Z') ch += 'a' - 'A'; | ||
| hash += hash * 67 + ch - 113; | ||
| hash = hash * 67 + ch - 113; | ||
| } |
Collaborator
There was a problem hiding this comment.
What was the problem with that hash calculation?
Author
Collaborator
|
For me is ok. I have not checked the spreadness of this or the other
approach, it simply sounds strange to me. I've seen that approach in the
hash proposals made by NetBeans on Java classes to calculate hashes, and
they initialize to a prime number (instead of 0) and apply the `=`
operator, as you do.
*Luis Colorado Urcola*
*Hakarinne 2, N170*
*02100 EspooFI*
*Cellphone: +358 41 31 82 147*
El jue, 24 oct 2024 a las 1:22, Martin ***@***.***>)
escribió:
… I stumbled over the issue #7 <#7>
reported by @aimfiz <https://github.com/aimfiz> who spotted a weakness
(just a weakness, not a problem) in the hash calculation that I fixed for
my local build. I've provided the simple change as a PR in case you want to
use it.
—
Reply to this email directly, view it on GitHub
<#12 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/ACCU2OZQ24ZD4TB72E3HYGTZ5AORBAVCNFSM6AAAAABQHUBQ4SVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDIMZTGU3TONJUGE>
.
You are receiving this because you commented.Message ID:
***@***.***>
|
|
Please see the comment at the top of zmac.y:
gwp 10-4-22 Much better symbol table hash function from Al Petrofsky as
used in gcc and gas.
What I suggested, and George intended, was to switch from the primitive
hash function zmac had used since 1978 to the hash function gcc has used
since 1999. That would have meant changing this line:
hash += ch;
to this:
hash = hash * 67 + ch - 113;
but a one-character editing mistake resulted in this:
hash += hash * 67 + ch - 113;
which is equivalent to this:
hash = hash * 68 + ch - 113;
This is clearly inferior, because 68^16 = 0 mod 2^32, and therefore only
the last 16 characters of a symbol contribute to the hash value at all (if
longs are 32 bits).
I suggest switching to the correct multiplier (67) and perhaps also
throwing in a 32-bit mask:
hash = (hash * 67 + ch - 113) & 0xffffffff;
I suggest the mask because: (1) this will make bugs more reproducible
across platforms; and (2) while it would seem that allowing more bits would
only improve performance, that sort of intuition about hash and
random-number functions is often wrong, and the 32-bit version is the one
that's backed by decades of experience with gcc.
…On Thu, Oct 24, 2024 at 6:44 AM Luis Colorado ***@***.***> wrote:
For me is ok. I have not checked the spreadness of this or the other
approach, it simply sounds strange to me. I've seen that approach in the
hash proposals made by NetBeans on Java classes to calculate hashes, and
they initialize to a prime number (instead of 0) and apply the `=`
operator, as you do.
*Luis Colorado Urcola*
*Hakarinne 2, N170*
*02100 EspooFI*
*Cellphone: +358 41 31 82 147*
El jue, 24 oct 2024 a las 1:22, Martin ***@***.***>)
escribió:
> I stumbled over the issue #7 <#7>
> reported by @aimfiz <https://github.com/aimfiz> who spotted a weakness
> (just a weakness, not a problem) in the hash calculation that I fixed
for
> my local build. I've provided the simple change as a PR in case you want
to
> use it.
>
> —
> Reply to this email directly, view it on GitHub
> <#12 (comment)>, or
> unsubscribe
> <
https://github.com/notifications/unsubscribe-auth/ACCU2OZQ24ZD4TB72E3HYGTZ5AORBAVCNFSM6AAAAABQHUBQ4SVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDIMZTGU3TONJUGE>
> .
> You are receiving this because you commented.Message ID:
> ***@***.***>
>
—
Reply to this email directly, view it on GitHub
<#12 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/A747EZM3QBKS5WWJNL5FHC3Z5DFQFAVCNFSM6AAAAABQHUBQ4SVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDIMZUHEZDMOJVGM>
.
You are receiving this because you were mentioned.Message ID:
***@***.***>
|
|
Looking at the rest of the function, I see it would be equivalent and
possibly faster to just do the masking once, after the loop, like so:
unsigned long hash = 0;
char *p = name;
while (*p) {
char ch = *p++;
if (ch >= 'A' && ch <= 'Z') ch += 'a' - 'A';
hash = hash * 67 + ch - 113;
}
/* Mask to 32 bits before remaindering, so that the result is
the same regardless of how long a long is. */
hash = (hash & 0xffffffff) % table_size;
ip = &table[hash];
…On Thu, Oct 24, 2024 at 1:34 PM Al Petrofsky ***@***.***> wrote:
Please see the comment at the top of zmac.y:
gwp 10-4-22 Much better symbol table hash function from Al Petrofsky
as used in gcc and gas.
What I suggested, and George intended, was to switch from the primitive
hash function zmac had used since 1978 to the hash function gcc has used
since 1999. That would have meant changing this line:
hash += ch;
to this:
hash = hash * 67 + ch - 113;
but a one-character editing mistake resulted in this:
hash += hash * 67 + ch - 113;
which is equivalent to this:
hash = hash * 68 + ch - 113;
This is clearly inferior, because 68^16 = 0 mod 2^32, and therefore only
the last 16 characters of a symbol contribute to the hash value at all (if
longs are 32 bits).
I suggest switching to the correct multiplier (67) and perhaps also
throwing in a 32-bit mask:
hash = (hash * 67 + ch - 113) & 0xffffffff;
I suggest the mask because: (1) this will make bugs more reproducible
across platforms; and (2) while it would seem that allowing more bits would
only improve performance, that sort of intuition about hash and
random-number functions is often wrong, and the 32-bit version is the one
that's backed by decades of experience with gcc.
On Thu, Oct 24, 2024 at 6:44 AM Luis Colorado ***@***.***>
wrote:
> For me is ok. I have not checked the spreadness of this or the other
> approach, it simply sounds strange to me. I've seen that approach in the
> hash proposals made by NetBeans on Java classes to calculate hashes, and
> they initialize to a prime number (instead of 0) and apply the `=`
> operator, as you do.
>
> *Luis Colorado Urcola*
> *Hakarinne 2, N170*
>
> *02100 EspooFI*
> *Cellphone: +358 41 31 82 147*
>
>
> El jue, 24 oct 2024 a las 1:22, Martin ***@***.***>)
> escribió:
>
> > I stumbled over the issue #7 <#7>
> > reported by @aimfiz <https://github.com/aimfiz> who spotted a weakness
> > (just a weakness, not a problem) in the hash calculation that I fixed
> for
> > my local build. I've provided the simple change as a PR in case you
> want to
> > use it.
> >
> > —
> > Reply to this email directly, view it on GitHub
> > <#12 (comment)>, or
> > unsubscribe
> > <
> https://github.com/notifications/unsubscribe-auth/ACCU2OZQ24ZD4TB72E3HYGTZ5AORBAVCNFSM6AAAAABQHUBQ4SVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDIMZTGU3TONJUGE>
>
> > .
> > You are receiving this because you commented.Message ID:
> > ***@***.***>
> >
>
> —
> Reply to this email directly, view it on GitHub
> <#12 (comment)>, or
> unsubscribe
> <https://github.com/notifications/unsubscribe-auth/A747EZM3QBKS5WWJNL5FHC3Z5DFQFAVCNFSM6AAAAABQHUBQ4SVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDIMZUHEZDMOJVGM>
> .
> You are receiving this because you were mentioned.Message ID:
> ***@***.***>
>
|
Collaborator
|
Very good answer, thanks for it. The nonprime factor actually results in a
bad hash... good reasoning. I didn't realize on that. Thanks.
*Luis Colorado Urcola*
*Hakarinne 2, N170*
*02100 EspooFI*
*Cellphone: +358 41 31 82 147*
El jue, 24 oct 2024 a las 20:35, aimfiz ***@***.***>)
escribió:
… Please see the comment at the top of zmac.y:
gwp 10-4-22 Much better symbol table hash function from Al Petrofsky as
used in gcc and gas.
What I suggested, and George intended, was to switch from the primitive
hash function zmac had used since 1978 to the hash function gcc has used
since 1999. That would have meant changing this line:
hash += ch;
to this:
hash = hash * 67 + ch - 113;
but a one-character editing mistake resulted in this:
hash += hash * 67 + ch - 113;
which is equivalent to this:
hash = hash * 68 + ch - 113;
This is clearly inferior, because 68^16 = 0 mod 2^32, and therefore only
the last 16 characters of a symbol contribute to the hash value at all (if
longs are 32 bits).
I suggest switching to the correct multiplier (67) and perhaps also
throwing in a 32-bit mask:
hash = (hash * 67 + ch - 113) & 0xffffffff;
I suggest the mask because: (1) this will make bugs more reproducible
across platforms; and (2) while it would seem that allowing more bits
would
only improve performance, that sort of intuition about hash and
random-number functions is often wrong, and the 32-bit version is the one
that's backed by decades of experience with gcc.
On Thu, Oct 24, 2024 at 6:44 AM Luis Colorado ***@***.***>
wrote:
> For me is ok. I have not checked the spreadness of this or the other
> approach, it simply sounds strange to me. I've seen that approach in the
> hash proposals made by NetBeans on Java classes to calculate hashes, and
> they initialize to a prime number (instead of 0) and apply the `=`
> operator, as you do.
>
> *Luis Colorado Urcola*
> *Hakarinne 2, N170*
>
> *02100 EspooFI*
> *Cellphone: +358 41 31 82 147*
>
>
> El jue, 24 oct 2024 a las 1:22, Martin ***@***.***>)
> escribió:
>
> > I stumbled over the issue #7 <#7>
> > reported by @aimfiz <https://github.com/aimfiz> who spotted a
weakness
> > (just a weakness, not a problem) in the hash calculation that I fixed
> for
> > my local build. I've provided the simple change as a PR in case you
want
> to
> > use it.
> >
> > —
> > Reply to this email directly, view it on GitHub
> > <#12 (comment)>, or
> > unsubscribe
> > <
>
https://github.com/notifications/unsubscribe-auth/ACCU2OZQ24ZD4TB72E3HYGTZ5AORBAVCNFSM6AAAAABQHUBQ4SVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDIMZTGU3TONJUGE>
>
> > .
> > You are receiving this because you commented.Message ID:
> > ***@***.***>
> >
>
> —
> Reply to this email directly, view it on GitHub
> <#12 (comment)>, or
> unsubscribe
> <
https://github.com/notifications/unsubscribe-auth/A747EZM3QBKS5WWJNL5FHC3Z5DFQFAVCNFSM6AAAAABQHUBQ4SVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDIMZUHEZDMOJVGM>
> .
> You are receiving this because you were mentioned.Message ID:
> ***@***.***>
>
—
Reply to this email directly, view it on GitHub
<#12 (comment)>, or
unsubscribe
<https://github.com/notifications/unsubscribe-auth/ACCU2O4IIBSZNWJAWG44NMLZ5EVVVAVCNFSM6AAAAABQHUBQ4SVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDIMZVHEZDSMZQGE>
.
You are receiving this because you commented.Message ID:
***@***.***>
|
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
I've implemented the proposed fix for #7 in my fork, maybe you want to pull it too.