Lightweight, Dockerized EXIF cleaner for fast publishing of JPEG photos without leaking sensitive metadata

scrubexif

🎯 Stats powered by ClonePulse

GitHub: per2jensen/scrubexif

Docker Hub: per2jensen/scrubexif

High-trust JPEG scrubbing. Removes location, serial and private camera tags while preserving photographic context. The most excellent ExifTool is used to process the JPEGs.

Breaking change in 0.7.11

The default behaviour is now to leave the original JPEG files unmodified and write the scrubbed copies to <current_directory>/output/ instead.

Full documentation moved → DETAILS.md
This README is intentionally short for Docker Hub visibility.

Quick Start

Easiest one-liner (default safe mode, non-destructive)

Scrub all JPEGs in the current directory ($PWD) and write cleaned copies to
$PWD/output/:

docker run --rm -v "$PWD:/photos" per2jensen/scrubexif:0.7.12

This:

  • scans your current directory ($PWD) for *.jpg / *.jpeg (also in capital letters)
  • writes scrubbed copies to $PWD/output/
  • leaves the originals untouched in $PWD/
  • refuses to run if $PWD/output/ already exists
  • prints host paths by default (use --show-container-paths to include /photos/... paths)
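
The default-mode behaviour listed above can be illustrated with a short Python sketch. It is not scrubexif's own code: `find_jpegs` and `prepare_output_dir` are hypothetical names, and the non-recursive scan is an assumption, since this README does not say whether subdirectories are visited.

```python
from pathlib import Path
from typing import List

JPEG_SUFFIXES = {".jpg", ".jpeg"}


def find_jpegs(directory: Path) -> List[Path]:
    """Return JPEG files directly inside `directory`, matching the suffix case-insensitively.

    Assumption: only the top level is scanned (no recursion).
    """
    return sorted(
        p for p in directory.iterdir()
        if p.is_file() and p.suffix.lower() in JPEG_SUFFIXES
    )


def prepare_output_dir(directory: Path) -> Path:
    """Create `directory/output`, refusing to continue if it already exists."""
    output = directory / "output"
    if output.exists():
        raise FileExistsError(f"{output} already exists; refusing to run")
    output.mkdir()
    return output
```

Refusing to reuse an existing output/ directory means an earlier batch of scrubbed copies cannot be overwritten silently.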

Hardened in-place scrub (current directory)

Same idea, but with container hardening and in-place overwrite (destructive):

docker run -it --rm \
  --read-only --security-opt no-new-privileges \
  --tmpfs /tmp \
  -v "$PWD:/photos" \
  per2jensen/scrubexif:0.7.12 --clean-inline

Batch workflow (PhotoPrism / intake style)

Use auto mode with explicit input/output/processed directories:

mkdir input output processed errors
docker run -it --rm \
  --read-only --security-opt no-new-privileges \
  --tmpfs /tmp \
  -v "$PWD/input:/photos/input" \
  -v "$PWD/output:/photos/output" \
  -v "$PWD/processed:/photos/processed" \
  -v "$PWD/errors:/photos/errors" \
  per2jensen/scrubexif:0.7.12 --from-input

Uploads → input/
Scrubbed → output/
Originals → processed/ (or deleted)
Duplicates → deleted or errors/
Corrupted → logged as failures, originals relocated to processed/ for inspection

Data flow overview (auto mode: --from-input)

This flow diagram describes what happens only in auto mode (--from-input), where four directories (input/, output/, processed/, errors/) are used.

[input/]  --scrub-->  [output/]
     |
     +-->  [processed/]   (original JPEGs moved here after successful scrub,
                           unless --delete-original is used)
     |
     +-->  [errors/]      (duplicates or corrupted files; only used when
                           --on-duplicate move)

Meaning:

  • input/ New JPEGs arrive here (e.g. from uploads or PhotoSync).

  • output/ Scrubbed JPEGs with safe EXIF metadata.

  • processed/ Original JPEGs moved here after scrub (or deleted when requested).

  • errors/ Only created/used when --on-duplicate move is enabled.
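
The routing described above can be sketched in Python. This is an illustration under stated assumptions, not the scrubexif implementation: `route_original` is a hypothetical helper, the `scrub` callable stands in for the real ExifTool step, and a "duplicate" is assumed to mean a file whose name already exists in output/.

```python
import shutil
from pathlib import Path
from typing import Callable


def route_original(
    src: Path,
    output: Path,
    processed: Path,
    errors: Path,
    scrub: Callable[[Path, Path], object],
    on_duplicate: str,
    delete_original: bool = False,
) -> str:
    """Handle one file from input/ and report what happened to it."""
    if on_duplicate not in ("delete", "move"):
        raise ValueError("on_duplicate must be 'delete' or 'move'")

    target = output / src.name
    if target.exists():
        # Assumption: same file name in output/ counts as a duplicate.
        if on_duplicate == "move":
            errors.mkdir(exist_ok=True)
            shutil.move(str(src), str(errors / src.name))
            return "duplicate-moved"
        src.unlink()
        return "duplicate-deleted"

    try:
        scrub(src, target)  # placeholder for the real scrub step
    except Exception:
        # Failed scrubs: the original is kept in processed/ for inspection.
        shutil.move(str(src), str(processed / src.name))
        return "failed"

    if delete_original:
        src.unlink()
    else:
        shutil.move(str(src), str(processed / src.name))
    return "scrubbed"
```

Passing `shutil.copyfile` as `scrub` is enough to dry-run the directory flow without touching any metadata.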

Build & Run Locally

# build the image from the Dockerfile in this repo
docker build -t scrubexif:local .

# show CLI usage (ENTRYPOINT runs python -m scrubexif.scrub)
docker run --rm scrubexif:local --help

# scrub the current directory with hardened defaults
docker run -it --rm \
  --read-only --security-opt no-new-privileges \
  --tmpfs /tmp \
  -v "$PWD:/photos" \
  scrubexif:local

Any arguments appended to docker run … scrubexif:* are forwarded to the underlying python3 -m scrubexif.scrub entrypoint.
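
When the container is launched from a script rather than a shell, the same hardened command line can be assembled as an argument list. The sketch below is a hypothetical helper (`build_docker_cmd` is not part of scrubexif). It only reuses flags shown in this README and leaves out `-it`, which a non-interactive run does not need.

```python
from pathlib import Path
from typing import List, Sequence


def build_docker_cmd(photos_dir: Path, image: str, extra_args: Sequence[str] = ()) -> List[str]:
    """Build a hardened `docker run` argv; `extra_args` are forwarded to the entrypoint."""
    return [
        "docker", "run", "--rm",
        "--read-only", "--security-opt", "no-new-privileges",
        "--tmpfs", "/tmp",
        "-v", f"{photos_dir}:/photos",
        image,
        *extra_args,  # everything after the image name reaches scrubexif
    ]
```

A call such as `subprocess.run(build_docker_cmd(Path.cwd(), "scrubexif:local", ["--preview"]), check=True)` would then run the locally built image in preview mode.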

Key Features

  • Removes GPS and personal data
  • Keeps camera + exposure metadata
  • Default run uses read-only + no-new-privileges hardening
  • Duplicate handling: delete or move
  • Optional state-file for high-volume pipelines
  • --preview, --paranoia, --stable-seconds N
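
One way to spot-check the first feature on a scrubbed file is to ask ExifTool for its metadata as JSON and look for any remaining GPS keys. This is a minimal sketch that assumes `exiftool` is installed on the host; `read_metadata` and `gps_keys` are hypothetical helpers, and the exact set of tags scrubexif removes is documented in DETAILS.md, not here.

```python
import json
import subprocess
from pathlib import Path
from typing import Dict, List


def read_metadata(jpeg: Path) -> Dict[str, object]:
    """Return ExifTool metadata for one file; -j emits JSON, -G1 prefixes keys with their group."""
    result = subprocess.run(
        ["exiftool", "-j", "-G1", str(jpeg)],
        capture_output=True, text=True, check=True,
    )
    return json.loads(result.stdout)[0]  # one JSON object per input file


def gps_keys(metadata: Dict[str, object]) -> List[str]:
    """Return metadata keys that mention GPS, in either the group or the tag name."""
    return sorted(key for key in metadata if "gps" in key.lower())
```

An empty list from `gps_keys(read_metadata(path))` suggests no GPS tags survived. It is a spot check, not a full audit of private tags.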

Supply Chain Transparency

  • Every release is produced by a public GitHub Actions workflow that builds the Docker image, runs Syft to publish an SPDX SBOM, and scans the image with Grype (failing on high/critical CVEs).
  • The vulnerability results (grype-results-<version>.sarif) and SBOM (sbom-v<version>.spdx.json) are attached to each GitHub Release; see the Releases tab for the latest artifacts.
  • doc/build-history.json tracks every tag with the Git commit, image digest, and (when available) the Grype severity counts, giving downstream users a verifiable audit trail.
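
A downloaded SBOM can be inspected with a few lines of Python. The sketch below assumes the file follows the SPDX 2.x JSON layout, where a top-level `packages` array holds entries with `name` and `versionInfo`. `load_sbom` and `list_sbom_packages` are hypothetical helper names.

```python
import json
from pathlib import Path
from typing import Dict, List, Tuple


def load_sbom(path: Path) -> Dict[str, object]:
    """Parse an SPDX JSON document from disk."""
    return json.loads(path.read_text(encoding="utf-8"))


def list_sbom_packages(sbom: Dict[str, object]) -> List[Tuple[str, str]]:
    """Return sorted (name, version) pairs; missing fields are reported as '?'."""
    packages = sbom.get("packages", [])
    return sorted(
        (pkg.get("name", "?"), pkg.get("versionInfo", "?"))
        for pkg in packages
    )
```

Comparing this list between two releases is a quick way to see which packages changed in the image.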

Common Options

--from-input          auto mode
--clean-inline        in-place scrub (destructive)
--show-container-paths include container paths in output
-q, --quiet           no output on success
--preview             no write, view only
--paranoia            maximum scrub, removes ICC
--on-duplicate        delete | move
--stable-seconds N    intake stability window
--state-file PATH     override queue DB

Full CLI reference → DETAILS.md
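
The idea behind an intake stability window such as --stable-seconds N is to skip files that may still be uploading. A minimal sketch follows, assuming stability is judged by the file's modification time. This README does not say how scrubexif measures it, so treat `is_stable` as a hypothetical illustration only.

```python
import time
from pathlib import Path
from typing import Optional


def is_stable(path: Path, stable_seconds: float, now: Optional[float] = None) -> bool:
    """Return True if `path` has not been modified for at least `stable_seconds`.

    Assumption: the last-modified timestamp is the stability signal.
    """
    current = time.time() if now is None else now
    return current - path.stat().st_mtime >= stable_seconds
```

With a 10-second window, a file written 5 seconds ago would be left for the next run.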

Example setup

This is an example of my workflow to quickly upload JPEG files to PhotoPrism. One use case is to quickly show dog owners photos at exhibitions.

Host filesystem path      Container path        Purpose
/some/directory/          /photos/input/        Location for new JPEG uploads on the server
/photoprism/sooc/         /photos/output/       Destination for scrubbed JPEG versions, for PhotoPrism import
/photoprism/processed/    /photos/processed/    Holding area for already-imported files

Systemd

/etc/systemd/system/scrubexif.service:

[Service]
ExecStart=/usr/bin/docker run --rm \
  --read-only --security-opt no-new-privileges \
  --tmpfs /tmp \
  -v /some/directory:/photos/input \
  -v /photoprism/sooc:/photos/output \
  -v /photoprism/processed:/photos/processed \
  per2jensen/scrubexif:0.7.12 --from-input --stable-seconds 10

/etc/systemd/system/scrubexif.timer:

[Unit]
Description=Run scrubexif every 5 minutes

[Timer]
OnBootSec=1min
OnUnitActiveSec=5min
Persistent=true

[Install]
WantedBy=timers.target

PhotoPrism systemd script

I use scrubexif to clean my JPEGs at dog exhibitions. I upload the files to a server using rclone, and a systemd timer runs my script every 5 minutes.

You can see my (anonymized) script in the scrubexif GitHub repo.

Development

make dev-clean   # remove dev image
make test        # make dev image and run full test suite
pytest -m soak   # optional 10 min run or try scripts/soak.sh

License

GPL-3.0-or-later

Licensed under GNU GENERAL PUBLIC LICENSE v3, see the supplied file "LICENSE" for details.

THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW, not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See section 15 and section 16 in the supplied "LICENSE" file.

