@derekbruening

Adds two new chunk types to the Snappy framing format: compressed data
without a checksum, and uncompressed data without a checksum. These
types are identical to their existing counterparts except they do not
contain a CRC-32C checksum. Essentially, this makes including
checksums for each data chunk optional rather than required.
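As a rough illustration of the difference, the sketch below frames an uncompressed data chunk both ways. It assumes the layout from the existing framing format description (a 1-byte chunk type, a 3-byte little-endian length, then the body, which for type 0x01 starts with a 4-byte masked CRC-32C). The value `CHUNK_UNCOMPRESSED_NO_CRC = 0x03` is hypothetical and is not taken from this PR; the helper names are also made up for the example.

```python
import struct

CHUNK_UNCOMPRESSED = 0x01         # existing type in the framing format
CHUNK_UNCOMPRESSED_NO_CRC = 0x03  # HYPOTHETICAL value, not taken from the PR


def _make_crc32c_table():
    # Table for the reflected Castagnoli polynomial used by CRC-32C.
    table = []
    for n in range(256):
        crc = n
        for _ in range(8):
            crc = (crc >> 1) ^ 0x82F63B78 if crc & 1 else crc >> 1
        table.append(crc)
    return table


_CRC32C_TABLE = _make_crc32c_table()


def crc32c(data: bytes) -> int:
    # Plain byte-at-a-time CRC-32C; slow, but enough to show the cost
    # that the checksum-free chunk types avoid.
    crc = 0xFFFFFFFF
    for byte in data:
        crc = _CRC32C_TABLE[(crc ^ byte) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def masked_crc32c(data: bytes) -> int:
    # Masking as described by the framing format: rotate right by 15 bits,
    # then add a constant, all modulo 2**32.
    crc = crc32c(data)
    return (((crc >> 15) | (crc << 17)) + 0xA282EAD8) & 0xFFFFFFFF


def chunk_header(chunk_type: int, body_len: int) -> bytes:
    return bytes([chunk_type]) + body_len.to_bytes(3, "little")


def uncompressed_chunk(data: bytes, with_checksum: bool = True) -> bytes:
    if with_checksum:
        body = struct.pack("<I", masked_crc32c(data)) + data
        return chunk_header(CHUNK_UNCOMPRESSED, len(body)) + body
    # Checksum-free variant: the body is just the data.
    return chunk_header(CHUNK_UNCOMPRESSED_NO_CRC, len(data)) + data
```

With this sketch, `uncompressed_chunk(data, with_checksum=False)` is 4 bytes shorter than the checksummed chunk and skips the CRC-32C pass over the data entirely.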

In some use cases, computing the CRC-32C checksums for the data chunks
in the Snappy framing format ends up dominating execution time.
Eliminating the checksums yields a 2.5x performance improvement in our
use of Snappy for compressing address trace data before storing it to
disk.

Existing readers of the Snappy framing format are expected to fail up
front with an unknown chunk type error when they encounter the new
types. Updating a reader to handle them should be a simple code
change.
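A minimal sketch of that reader behavior follows. It relies on the framing format's existing rule that chunk types 0x02-0x7f are reserved and unskippable while 0x80-0xfd are reserved and skippable, and it assumes the new types fall in the unskippable range. The values in `HYPOTHETICAL_NEW_TYPES` and the function name `iter_chunks` are invented for the example and do not come from this PR.

```python
LEGACY_TYPES = {0x00, 0x01, 0xFF}      # compressed, uncompressed, stream identifier
HYPOTHETICAL_NEW_TYPES = {0x02, 0x03}  # HYPOTHETICAL values for the checksum-free types


def iter_chunks(stream: bytes, known_types):
    """Yield (chunk_type, body) for each chunk the reader understands."""
    pos = 0
    while pos < len(stream):
        if pos + 4 > len(stream):
            raise ValueError("truncated chunk header")
        chunk_type = stream[pos]
        length = int.from_bytes(stream[pos + 1:pos + 4], "little")
        body = stream[pos + 4:pos + 4 + length]
        if len(body) != length:
            raise ValueError("truncated chunk body")
        pos += 4 + length
        if chunk_type in known_types:
            yield chunk_type, body
        elif 0x02 <= chunk_type <= 0x7F:
            # Reserved unskippable range: a reader that does not know
            # the type must stop instead of guessing.
            raise ValueError(f"unknown unskippable chunk type 0x{chunk_type:02x}")
        # Anything else (0x80-0xfd reserved skippable, 0xfe padding) is skipped.


# Stream identifier chunk followed by one hypothetical checksum-free chunk.
stream = b"\xff\x06\x00\x00sNaPpY" + b"\x03\x03\x00\x00abc"
```

Passing `LEGACY_TYPES` models a reader that has not been updated and raises on the second chunk, while passing `LEGACY_TYPES | HYPOTHETICAL_NEW_TYPES` models an updated reader that accepts it. A real reader would also decompress or verify each body, which is omitted here.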

@derekbruening
Author

@pwnall PTAL
