This Python script automates the compression and linearization of PDF files using two open-source tools: Ghostscript and QPDF.
- Compress PDFs using Ghostscript: Reduces file size substantially, with a quality preset that controls how closely the output matches the source.
- Linearize PDFs using QPDF: Enables fast web viewing of large PDF files.
- Batch Processing: Recursively scans directories for PDFs matching a given keyword.
- Automatic File Naming: Prevents overwriting and infinite loops with unique output filenames.
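
The batch scan and file naming described above could be sketched as follows. This is a minimal illustration, not the script's actual code: the function names, the `_compressed` suffix, the numeric counter, and the case-insensitive keyword match are all assumptions made for the example.

```python
from pathlib import Path


def find_matching_pdfs(input_dir, keyword, skip_suffix="_compressed"):
    """Recursively yield PDFs under input_dir whose filename contains keyword.

    Files whose name already carries skip_suffix are ignored, so output
    written inside the input tree is not picked up and processed again.
    (Hypothetical helper; the suffix is an assumption.)
    """
    for path in sorted(Path(input_dir).rglob("*.pdf")):
        if not path.is_file():
            continue
        if skip_suffix and skip_suffix in path.stem:
            continue  # looks like a file a previous run produced
        if keyword.lower() in path.name.lower():
            yield path


def unique_output_path(output_dir, source, suffix="_compressed"):
    """Return a path in output_dir that does not collide with an existing file."""
    output_dir = Path(output_dir)
    source = Path(source)
    candidate = output_dir / f"{source.stem}{suffix}.pdf"
    counter = 1
    # Append an increasing counter until the name is free.
    while candidate.exists():
        candidate = output_dir / f"{source.stem}{suffix}_{counter}.pdf"
        counter += 1
    return candidate
```

Skipping files that already carry the suffix is one way to avoid reprocessing output when the output directory sits inside the input directory. Note that `rglob("*.pdf")` may miss files with an uppercase `.PDF` extension on case-sensitive filesystems.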
Ghostscript is used for compressing PDFs. After thorough testing, the following command options were chosen because they gave the best results for this use case; they can be changed to suit the user's needs. A fuller list of options can be found at https://community.intersystems.com/post/pdf-compression-using-ghostscript
gs_commands = [
    "gs", "-sDEVICE=pdfwrite",
    "-dCompatibilityLevel=1.4",
    f"-dPDFSETTINGS=/{quality}",
    "-dNOPAUSE",
    "-dBATCH",
    f"-sOutputFile={output_path}",
    input_path
]

QPDF is used to linearize the compressed PDF, enabling faster loading on web browsers. The process to actually linearize a file is much simpler. More information on the commands can be found at https://www.linux-magazine.com/Issues/2019/226/Reconstruction
qpdf --linearize input.pdf output.pdf

- Start: The script begins traversing the specified input directory (and all its subdirectories).
- Search: It looks for all .pdf files that contain a specified keyword in the filename.
- Compress: Matching files are compressed using Ghostscript.
- Linearize: The compressed PDF is then linearized using QPDF.
- Output: Results are saved with unique names in the specified output directory.
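
The compress and linearize steps above can be chained with `subprocess`. The sketch below reuses the Ghostscript and QPDF commands shown earlier; the function names, the `ebook` default preset, and the temporary `.tmp.pdf` file are assumptions made for illustration, and `unlink(missing_ok=True)` needs Python 3.8 or later.

```python
import subprocess
from pathlib import Path


def build_gs_command(input_path, output_path, quality="ebook"):
    """Build the Ghostscript argument list used for compression."""
    return [
        "gs", "-sDEVICE=pdfwrite",
        "-dCompatibilityLevel=1.4",
        f"-dPDFSETTINGS=/{quality}",  # e.g. screen, ebook, printer, prepress
        "-dNOPAUSE",
        "-dBATCH",
        f"-sOutputFile={output_path}",
        str(input_path),
    ]


def build_qpdf_command(input_path, output_path):
    """Build the QPDF argument list used for linearization."""
    return ["qpdf", "--linearize", str(input_path), str(output_path)]


def compress_and_linearize(input_path, output_path, quality="ebook"):
    """Compress with Ghostscript, then linearize the result with QPDF."""
    output_path = Path(output_path)
    # Hypothetical intermediate file holding the compressed, non-linearized PDF.
    temp_path = output_path.with_name(output_path.stem + ".tmp.pdf")
    try:
        subprocess.run(build_gs_command(input_path, temp_path, quality), check=True)
        subprocess.run(build_qpdf_command(temp_path, output_path), check=True)
    finally:
        temp_path.unlink(missing_ok=True)  # clean up even if a step fails
    return output_path
```

With `check=True`, a non-zero exit status from either tool raises `subprocess.CalledProcessError`. QPDF exits with status 3 when it succeeds with warnings, so a stricter or more lenient policy may be appropriate depending on the input files.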
- Python 3.x
- Ghostscript
- QPDF
brew install ghostscript
brew install qpdf

python3 filename.py
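
Because the script relies on the `gs` and `qpdf` executables being on the `PATH`, a quick check before processing can give a clearer error than a failed subprocess call. The helper below is a hypothetical sketch and is not part of the original script.

```python
import shutil


def missing_tools(tools=("gs", "qpdf")):
    """Return the names of required executables that are not on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


# Example use at the start of a run:
# missing = missing_tools()
# if missing:
#     raise SystemExit("Missing required tools: " + ", ".join(missing))
```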