Description
We would like this library to handle slightly corrupted PDFs more gracefully. We have hit this error in production twice when parsing PDFs:
{
"stack": "Error: AES ciphertext length must be multiple of 16, got 7204\n at aesDecrypt (/dist/main.js:1473705:55)\n at AES256Handler.decrypt (/dist/main.js:1473956:12)\n at AES256Handler.decryptString (/dist/main.js:1473674:17)\n at StandardSecurityHandler.decryptString (/dist/main.js:1474436:40)\n at decryptObject (/dist/main.js:1475806:74)\n at decryptObject (/dist/main.js:1475824:74)\n at Object.getObject (/dist/main.js:1475842:55)\n at ObjectRegistry.resolver (/dist/main.js:1481323:43)\n at ObjectRegistry.resolve (/dist/main.js:1472836:24)\n at PDFContext.resolve (/dist/main.js:1476967:26)",
"message": "AES ciphertext length must be multiple of 16, got 7204"
}
After inspecting the source code with Claude Code, I got this response:
Root Cause
The error originates at src/security/ciphers/aes.ts:74. The aesDecrypt function:
1. Receives the raw data (IV + ciphertext)
2. Extracts the 16-byte IV from the front
3. Strictly validates that the remaining ciphertext is a multiple of 16 bytes (AES block size)
4. Throws if it's not
In your case, the ciphertext portion is 7204 bytes (7204 % 16 = 4 extra bytes), so the check fails.
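The four steps above can be sketched roughly as follows. This is only an illustration of the described check, not the library's actual code: `splitIvAndCiphertext`, `SplitResult`, and `AES_BLOCK_SIZE` are hypothetical names, and the short-input guard is an added assumption.

```typescript
// Hypothetical sketch of the strict check described above.
// Names are illustrative and not taken from the library's source.
const AES_BLOCK_SIZE = 16;

interface SplitResult {
  iv: Uint8Array;
  ciphertext: Uint8Array;
}

function splitIvAndCiphertext(data: Uint8Array): SplitResult {
  // Assumption: inputs shorter than one block cannot even hold the IV.
  if (data.length < AES_BLOCK_SIZE) {
    throw new Error(`AES data too short for IV, got ${data.length}`);
  }

  // Steps 1-2: the first 16 bytes are the IV, the rest is ciphertext.
  const iv = data.subarray(0, AES_BLOCK_SIZE);
  const ciphertext = data.subarray(AES_BLOCK_SIZE);

  // Steps 3-4: reject ciphertext that is not a whole number of blocks.
  if (ciphertext.length % AES_BLOCK_SIZE !== 0) {
    throw new Error(
      `AES ciphertext length must be multiple of 16, got ${ciphertext.length}`,
    );
  }

  return { iv, ciphertext };
}
```

With a 16-byte IV followed by 7204 bytes of ciphertext, this sketch throws the same message as in the stack trace, since 7204 = 450 × 16 + 4.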
Why It Happens
The full call chain has zero error handling:
document-parser.ts:542 → securityHandler.decryptString(obj.bytes, ...) // no try-catch
→ standard-handler.ts:455 → this.handlers.stringHandler.decryptString(...)
→ abstract.ts:36 → this.decrypt(...)
→ aes256.ts:24 → aesDecrypt(this.fileKey, data)
→ aes.ts:74 → 💥 throws Error
No layer in this chain catches the error, so it propagates straight to the caller.
Why the Data Is Misaligned
This typically happens with real-world PDFs when:
- Buggy PDF generators didn't properly PKCS#7-pad the data before encryption
- String boundary parsing grabbed slightly too many or too few bytes from the hex/literal string
- Corruption — a few bytes were lost or added during processing
- Mixed encryption — the PDF claims AES but some objects were written with RC4 or no encryption
Comparison to pdf.js / PDFBox
Both reference libraries handle this gracefully rather than crashing:
- pdf.js wraps decryption in try-catch and returns the original (undecrypted) bytes on failure
- PDFBox catches exceptions, logs a warning, and continues parsing
The Fix
Two things should happen, following the project's "be super lenient" design principle:
1. aesDecrypt should truncate to the nearest block boundary instead of throwing — this way the maximum amount of data can still be decrypted
2. decryptObject should wrap decryption in try-catch as a safety net — if decryption fails for any reason, return the original object with a warning instead of crashing
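A minimal sketch of what those two changes could look like, under stated assumptions: `truncateToBlockBoundary`, `decryptOrPassThrough`, and the `Decryptor` type are hypothetical names for illustration, and the real `aesDecrypt` and `decryptObject` signatures may differ.

```typescript
// Hypothetical sketch of the two proposed changes; not the library's API.
const BLOCK_SIZE = 16;

// Change 1: drop trailing bytes that do not fill a whole AES block,
// so every complete block can still be decrypted.
function truncateToBlockBoundary(ciphertext: Uint8Array): Uint8Array {
  const usable = ciphertext.length - (ciphertext.length % BLOCK_SIZE);
  return ciphertext.subarray(0, usable);
}

// Assumed shape of a decryption routine: bytes in, bytes out, may throw.
type Decryptor = (data: Uint8Array) => Uint8Array;

// Change 2: safety net around decryption. On any failure, report a
// warning and return the original bytes so parsing can continue.
function decryptOrPassThrough(
  decrypt: Decryptor,
  data: Uint8Array,
  warn: (message: string) => void = console.warn,
): Uint8Array {
  try {
    return decrypt(data);
  } catch (error) {
    const reason = error instanceof Error ? error.message : String(error);
    warn(`Decryption failed, keeping original bytes: ${reason}`);
    return data;
  }
}
```

For the 7204-byte case, truncation would keep 7200 bytes and discard the last 4. One caveat with this approach: after truncation the final block may no longer end in valid PKCS#7 padding, so the padding-removal step would likely need to be tolerant as well.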
Perhaps we should take the same approach as PDFBox and continue parsing the PDF even when the encrypted data is misaligned? That seems to align with what you want this library to become.
Thanks for making an awesome PDF lib!!