# File Hashing
Details on Maxlo file hashing.
For small files, simply calculate the SHA256 of the file. Provide this to the backend and it will respond with a signed PUT URL for uploading the file. The S3 backend will enforce SHA256 correctness.
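For example, in Node.js (a minimal sketch; `hashSmallFile` is an illustrative helper, and whether the backend expects a hex or base64 encoded digest is an assumption here):

```js
const crypto = require("crypto");
const fs = require("fs");

// Read the whole file into memory (fine for small files) and hash it.
// The hex encoding is an assumption; the backend may expect base64.
function hashSmallFile(path) {
  const bytes = fs.readFileSync(path);
  return crypto.createHash("sha256").update(bytes).digest("hex");
}
```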
We handle file integrity and hashing based on S3's SHA256 support. All supported S3 providers give us SHA256 integrity per part; we leverage that to give multipart files whole-file integrity by hashing those part hashes.
The Maxlo backend enforces all of the following requirements before providing signed upload URLs:
- Between 3 and 999 file parts.
- All parts must be the same size, except the last part, which may be smaller.
- The first part must be at least 5MB.
- The sum of the `part_size`s of the parts must equal the `file_size`.
- The SHA256 of the concatenated bytes of the `part_hash`es must equal the `file_hash`.
Additionally, you must have an active file in the appropriate volume to create a multipart upload. We do not support uploading expired versions of multipart files at this time.
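As a rough illustration, a client could pre-validate its part list before requesting signed URLs. `validateParts` is a hypothetical helper, not part of the Maxlo API; it checks the size requirements above but not the `file_hash`:

```js
const MIN_FIRST_PART = 5 * 1024 * 1024; // 5MB

// `parts` is an array of { part_size, part_hash } objects in upload order.
function validateParts(parts, file_size) {
  if (parts.length < 3 || parts.length > 999) return false;
  const size = parts[0].part_size;
  if (size < MIN_FIRST_PART) return false;
  // All parts except the last must be exactly `size`.
  for (let i = 0; i < parts.length - 1; i++) {
    if (parts[i].part_size !== size) return false;
  }
  // The last part may be smaller, but not larger.
  if (parts[parts.length - 1].part_size > size) return false;
  // The part sizes must sum to the file size.
  const sum = parts.reduce((acc, p) => acc + p.part_size, 0);
  return sum === file_size;
}
```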
Calculate the chunk size (see below) and then hash each part independently.
Record the `part_hash` and `part_size` for each part. Concatenate the bytes of all the `part_hash`es into one buffer and then hash that buffer. The result is the `file_hash`. You will need the `part_hash` and `part_size` of every part to obtain signed upload parameters.
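In Node.js, the hash-of-hashes construction could look like this (a minimal sketch; `hashParts` is an illustrative name, and the raw-digest concatenation follows the description above):

```js
const crypto = require("crypto");

// `parts` is an array of Buffers, one per chunk of the file.
function hashParts(parts) {
  const partInfo = parts.map((part) => ({
    part_size: part.length,
    // Raw 32-byte SHA256 digest of this part's bytes.
    part_hash: crypto.createHash("sha256").update(part).digest(),
  }));
  // Concatenate the raw part hashes into one buffer and hash that buffer.
  const file_hash = crypto
    .createHash("sha256")
    .update(Buffer.concat(partInfo.map((p) => p.part_hash)))
    .digest();
  return { parts: partInfo, file_hash };
}
```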
In general, the following algorithm should be used to calculate the chunk size. The backend records the chunk size for uploaded objects, but for objects stored on the file system it is not always possible to record the chunk size. Consistently calculating chunk sizes on all platforms therefore significantly improves storage efficiency by preventing two identical files from having different hashes due to different chunk sizes.
The algorithm roughly described:
- If the file is 16MB or smaller, do not chunk it.
- Start with an 8MB chunk size.
- If more than 900 chunks are required, multiply the chunk size by 64 and retry.
- The max chunk size is 5GB.
Given this, the possible chunk sizes are:
- 8MB
- 512MB
- 5GB
Example in JavaScript:
```js
const START_CHUNK_SIZE = 8 * 1024 * 1024;          // 8MB
const START_MULTIPART_SIZE = START_CHUNK_SIZE * 2; // 16MB: at or below this, do not chunk
const MAX_CHUNK_SIZE = 5 * 1024 * 1024 * 1024;     // 5GB
const MAX_PARTS = 900;
const CHUNK_MULT = 64;
const SIZE_BREAK1 = START_CHUNK_SIZE * MAX_PARTS;              // largest size for 8MB chunks
const SIZE_BREAK2 = START_CHUNK_SIZE * CHUNK_MULT * MAX_PARTS; // largest size for 512MB chunks

exports.calcChunkSize = calcChunkSize;
exports.START_MULTIPART_SIZE = START_MULTIPART_SIZE;

function calcChunkSize(size) {
  let chunk_size;
  if (size <= START_MULTIPART_SIZE) {
    // The file is small enough to upload as a single object.
    chunk_size = START_MULTIPART_SIZE;
  } else if (size <= SIZE_BREAK1) {
    chunk_size = START_CHUNK_SIZE;              // 8MB chunks
  } else if (size <= SIZE_BREAK2) {
    chunk_size = START_CHUNK_SIZE * CHUNK_MULT; // 512MB chunks
  } else {
    chunk_size = MAX_CHUNK_SIZE;                // 5GB chunks
  }
  return chunk_size;
}
```
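For example, assuming the module above is saved as `chunk_size.js` (the file name is an assumption):

```js
const { calcChunkSize, START_MULTIPART_SIZE } = require("./chunk_size");

// 10MB file: at or below 16MB, so it is not chunked at all.
console.log(10 * 1024 * 1024 <= START_MULTIPART_SIZE); // true
// 1GB file: 8MB chunks (128 parts).
console.log(calcChunkSize(1024 ** 3));                  // 8388608
// 100GB file: 512MB chunks (200 parts).
console.log(calcChunkSize(100 * 1024 ** 3));            // 536870912
```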