[WIP] implement slice-by-4 with SIMD instructions#21
Closed
Closing this PR in favor of the other one
I need to run more comprehensive performance tests in various browsers (the benchmark in the README has been out of sync for a long time, I know), but preliminary results in Deno show a ~28% speedup of the CRC32 computation for the relatively small price of 0.5 kB of code (including the fallback for browsers without SIMD support).
The performance gain might increase someday, but the SIMD proposal does not (so far) include a `load` that'd take 4 separate memargs as a vector (that one is unlikely to ever exist anyway). Therefore, I could not vectorise all of the slice-by-4 at this time.
Anyway, 28% is already a significant improvement to the hottest part of the code.
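For reference, here is a minimal scalar sketch of slice-by-4 in TypeScript. It is not the WASM code from this PR, and the names `tables` and `crc32SliceBy4` are made up for illustration. Each iteration performs four table lookups at data-dependent indices, which is the part that would need a gather-style load to vectorise fully.

```typescript
// Reflected CRC-32 polynomial (the one used by ZIP).
const POLY = 0xedb88320;

// Four 256-entry lookup tables; tables[0] is the classic bytewise table.
const tables: Uint32Array[] = [0, 1, 2, 3].map(() => new Uint32Array(256));

for (let i = 0; i < 256; i++) {
  let c = i;
  for (let k = 0; k < 8; k++) c = c & 1 ? (c >>> 1) ^ POLY : c >>> 1;
  tables[0][i] = c >>> 0;
}
// tables[t][i] is tables[t - 1][i] advanced by one more zero byte.
for (let i = 0; i < 256; i++) {
  for (let t = 1; t < 4; t++) {
    const prev = tables[t - 1][i];
    tables[t][i] = ((prev >>> 8) ^ tables[0][prev & 0xff]) >>> 0;
  }
}

// Hypothetical helper: CRC-32 of `data`, consuming 4 bytes per iteration.
function crc32SliceBy4(data: Uint8Array): number {
  let crc = -1; // same bits as 0xFFFFFFFF
  let i = 0;
  for (; i + 4 <= data.length; i += 4) {
    // Fold the next little-endian 32-bit word into the running CRC.
    crc ^= data[i] | (data[i + 1] << 8) | (data[i + 2] << 16) | (data[i + 3] << 24);
    // Four lookups whose indices depend on the data: the "gather" step.
    crc =
      tables[3][crc & 0xff] ^
      tables[2][(crc >>> 8) & 0xff] ^
      tables[1][(crc >>> 16) & 0xff] ^
      tables[0][crc >>> 24];
  }
  // Remaining 0 to 3 bytes go through the plain bytewise loop.
  for (; i < data.length; i++) {
    crc = (crc >>> 8) ^ tables[0][(crc ^ data[i]) & 0xff];
  }
  return ~crc >>> 0;
}
```

With the standard check input `123456789` (9 bytes, so both the 4-byte loop and the tail run), this should return `0xCBF43926`.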
That being said, the way I constructed the WASM binary in `crc32.ts` is really bad for maintenance. It's probably better to generate two full binaries and use a dynamic import (though that would prevent the synchronous import of `downloadZip`). Alternatively, the SIMD optimisation could be an add-on behind a Promise, while the basic implementation would remain hard-coded. I'd love to have some feedback about that issue.
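To make the second option more concrete, here is a rough sketch of the "add-on with a Promise" shape. Everything in it is hypothetical (`Crc32Fn`, `crc32Basic`, `enableSimd` and the `load` callback are illustrative names, not code from this repository): the baseline stays synchronous and hard-coded, and the faster implementation is swapped in only if its loader resolves.

```typescript
type Crc32Fn = (data: Uint8Array) => number;

// Baseline implementation, always available synchronously.
// A plain bytewise table lookup stands in for the hard-coded version here.
const table = new Uint32Array(256);
for (let i = 0; i < 256; i++) {
  let c = i;
  for (let k = 0; k < 8; k++) c = c & 1 ? (c >>> 1) ^ 0xedb88320 : c >>> 1;
  table[i] = c >>> 0;
}
const crc32Basic: Crc32Fn = (data) => {
  let crc = -1;
  for (let i = 0; i < data.length; i++) {
    crc = (crc >>> 8) ^ table[(crc ^ data[i]) & 0xff];
  }
  return ~crc >>> 0;
};

// The implementation currently in use; starts as the baseline.
let active: Crc32Fn = crc32Basic;

// Synchronous entry point: callers never have to await anything.
function crc32(data: Uint8Array): number {
  return active(data);
}

// Optional add-on. `load` is assumed to fetch and instantiate the SIMD
// binary and to reject when the engine cannot compile it.
async function enableSimd(load: () => Promise<Crc32Fn>): Promise<boolean> {
  try {
    active = await load();
    return true;
  } catch {
    // Keep the baseline when the loader fails.
    return false;
  }
}
```

The trade-off in this shape is that synchronous importers keep working unchanged, while callers who want the speedup opt in by awaiting `enableSimd` once before computing checksums.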