Automates PDF compression and linearization using Ghostscript and QPDF for efficient storage and fast web viewing.

hamim23z/PDF-Compression

PDF Compression and Linearization Script

This Python script automates the compression and linearization of PDF files using two open-source tools: Ghostscript and QPDF.

📦 Features

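  • Compress PDFs using Ghostscript: Reduces file size substantially while keeping output quality close to the source. The exact trade-off depends on the chosen -dPDFSETTINGS preset.
  • Linearize PDFs using QPDF: Enables fast web viewing, so a browser can display the first pages of a large PDF before the whole file has downloaded.
  • Batch Processing: Recursively scans a directory and its subdirectories for PDFs whose filenames contain a given keyword.
  • Automatic File Naming: Gives every output file a unique name, which prevents overwriting existing files and avoids infinite reprocessing loops.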

📚 Tools Used

Ghostscript

Ghostscript is used for compressing PDFs. After testing, the following command options were chosen as giving the best results for this use case; they can be changed to suit the user's needs. The full set of commands can be found at https://community.intersystems.com/post/pdf-compression-using-ghostscript

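import subprocess

# Example values (assumed); the script supplies its own paths and preset
quality = "ebook"  # a -dPDFSETTINGS preset: screen, ebook, printer, prepress, or default
input_path = "input.pdf"
output_path = "compressed.pdf"

gs_commands = [
  "gs", "-sDEVICE=pdfwrite",      # write PDF output
  "-dCompatibilityLevel=1.4",     # target PDF version 1.4
  f"-dPDFSETTINGS=/{quality}",    # preset controlling image resolution and quality
  "-dNOPAUSE",                    # don't pause between pages
  "-dBATCH",                      # exit when finished instead of waiting for input
  f"-sOutputFile={output_path}",
  input_path
]
subprocess.run(gs_commands, check=True)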
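The options above can be wrapped in a small helper that validates the preset before calling Ghostscript and reports how much space was saved. This is a minimal sketch, not the repository's actual code: the names build_gs_command, compress_pdf, and VALID_PRESETS are hypothetical, and it assumes the gs executable is on the PATH. The preset names are Ghostscript's standard -dPDFSETTINGS values.

```python
import os
import subprocess

# Standard Ghostscript -dPDFSETTINGS presets (hypothetical constant name).
VALID_PRESETS = {"screen", "ebook", "printer", "prepress", "default"}


def build_gs_command(input_path: str, output_path: str, quality: str = "ebook") -> list:
    """Return the Ghostscript argument list used to compress one PDF."""
    if quality not in VALID_PRESETS:
        raise ValueError(f"unknown preset {quality!r}; expected one of {sorted(VALID_PRESETS)}")
    return [
        "gs", "-sDEVICE=pdfwrite",
        "-dCompatibilityLevel=1.4",
        f"-dPDFSETTINGS=/{quality}",
        "-dNOPAUSE",
        "-dBATCH",
        f"-sOutputFile={output_path}",
        input_path,
    ]


def compress_pdf(input_path: str, output_path: str, quality: str = "ebook") -> float:
    """Compress input_path into output_path and return the percentage size reduction."""
    # check=True raises CalledProcessError if Ghostscript exits with an error
    subprocess.run(build_gs_command(input_path, output_path, quality), check=True)
    before = os.path.getsize(input_path)
    after = os.path.getsize(output_path)
    # Negative if the rewritten file ended up larger than the original
    return 100.0 * (before - after) / before
```

For example, compress_pdf("report.pdf", "report_small.pdf", quality="screen") returns the percentage saved. Reporting the figure is useful because re-processing a PDF that is already well optimized can occasionally produce a larger file.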

QPDF

QPDF is used to linearize the compressed PDF so that web browsers can start displaying it before the whole file has downloaded. Linearizing a file is much simpler than compressing it and takes a single command. More information on the commands can be found at https://www.linux-magazine.com/Issues/2019/226/Reconstruction

qpdf --linearize input.pdf output.pdf
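From Python, the same QPDF call can be made through subprocess. The sketch below also includes a quick heuristic for confirming that a file is linearized: the PDF specification requires the linearization parameter dictionary (which holds the /Linearized key) to lie within the first 1024 bytes of a linearized file. The function names are hypothetical, and the byte check is only a heuristic; qpdf --check-linearization performs a real check.

```python
import subprocess


def linearize_pdf(input_path: str, output_path: str) -> None:
    """Rewrite input_path as a linearized PDF at output_path using QPDF."""
    # Equivalent to: qpdf --linearize input.pdf output.pdf
    result = subprocess.run(["qpdf", "--linearize", input_path, output_path])
    # QPDF exits with 0 on success, 3 if it succeeded with warnings, 2 on errors
    if result.returncode not in (0, 3):
        raise RuntimeError(f"qpdf failed with exit code {result.returncode}")


def looks_linearized(path: str) -> bool:
    """Heuristic: a linearized PDF has its /Linearized dictionary in the first 1024 bytes."""
    with open(path, "rb") as f:
        head = f.read(1024)
    return b"/Linearized" in head
```

Exit code 3 is treated as success here because QPDF still writes the output file when it only encounters warnings.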

How It Works

  1. Start: The script begins traversing the specified input directory (and all its subdirectories).
  2. Search: It looks for all .pdf files that contain a specified keyword in the filename.
  3. Compress: Matching files are compressed using Ghostscript.
  4. Linearize: The compressed PDF is then linearized using QPDF.
  5. Output: Results are saved with unique names in the specified output directory.
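The five steps above can be sketched end to end in Python. This is an illustrative reconstruction under stated assumptions, not the repository's actual script: the function names, the _optimized output suffix, the -1, -2 numbering scheme, and the default ebook preset are all hypothetical. Collecting the complete file list before writing any output is one way to keep the script from re-processing its own results when the output directory sits inside the input directory.

```python
from __future__ import annotations

import os
import subprocess
import tempfile
from pathlib import Path


def find_matching_pdfs(root: str, keyword: str) -> list[Path]:
    """Steps 1-2: walk root recursively and collect PDFs whose filename contains keyword."""
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            # Assumption: ".pdf" is matched case-insensitively, the keyword case-sensitively
            if name.lower().endswith(".pdf") and keyword in name:
                matches.append(Path(dirpath) / name)
    return sorted(matches)


def unique_path(path: Path) -> Path:
    """Step 5 helper: append -1, -2, ... to the stem until the name is unused."""
    candidate, counter = path, 1
    while candidate.exists():
        candidate = path.with_name(f"{path.stem}-{counter}{path.suffix}")
        counter += 1
    return candidate


def process_directory(input_dir: str, output_dir: str, keyword: str,
                      quality: str = "ebook") -> list[Path]:
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    # Gather every match before writing anything, so new outputs are never re-scanned
    sources = find_matching_pdfs(input_dir, keyword)
    results = []
    for src in sources:
        with tempfile.TemporaryDirectory() as tmp:
            compressed = Path(tmp) / "compressed.pdf"
            # Step 3: compress with Ghostscript into a temporary file
            subprocess.run(
                ["gs", "-sDEVICE=pdfwrite", "-dCompatibilityLevel=1.4",
                 f"-dPDFSETTINGS=/{quality}", "-dNOPAUSE", "-dBATCH",
                 f"-sOutputFile={compressed}", str(src)],
                check=True,
            )
            # Steps 4-5: linearize into a uniquely named file in the output directory
            final = unique_path(out / f"{src.stem}_optimized.pdf")
            result = subprocess.run(["qpdf", "--linearize", str(compressed), str(final)])
            if result.returncode not in (0, 3):  # 3 = succeeded with warnings
                raise RuntimeError(f"qpdf failed on {src}")
            results.append(final)
    return results
```

Writing the intermediate Ghostscript output to a temporary directory keeps only the final, linearized file in the output directory.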

🛠 Usage + Running

Requirements:

  1. Python 3.x
  2. Ghostscript
  3. QPDF

Installing (with Homebrew)

brew install ghostscript
brew install qpdf
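After installing, shutil.which offers a quick way to confirm that both executables are reachable on the PATH. This is a minimal sketch; the function name missing_tools is hypothetical. On Windows, the Ghostscript console executable is usually named gswin64c (or gswin32c) rather than gs, so the tuple of names would need adjusting there.

```python
import shutil


def missing_tools(tools=("gs", "qpdf")) -> list:
    """Return the names of required executables that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing: " + ", ".join(missing))
    else:
        print("Ghostscript and QPDF found on PATH")
```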

Running The Script

python3 filename.py
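The README does not say how the input directory, output directory, and keyword are supplied to the script. If they are hard-coded, a small command-line front end is one option. The sketch below is entirely hypothetical: none of these arguments are known to exist in the repository's script, and the ebook default is an assumption.

```python
import argparse


def parse_args(argv=None):
    """Parse hypothetical command-line options for the compression script."""
    parser = argparse.ArgumentParser(
        description="Compress and linearize PDFs whose filenames contain a keyword.")
    parser.add_argument("input_dir", help="directory to scan recursively")
    parser.add_argument("output_dir", help="directory for the processed PDFs")
    parser.add_argument("keyword", help="substring a filename must contain")
    parser.add_argument("--quality", default="ebook",
                        choices=["screen", "ebook", "printer", "prepress", "default"],
                        help="Ghostscript -dPDFSETTINGS preset")
    return parser.parse_args(argv)
```

With this in place, a run might look like python3 filename.py ./scans ./out invoice --quality screen.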
