idem is a command-line tool for finding identical files across one or more directories. It helps you identify unnecessary duplicates in a safe, deterministic, and restartable way, without ever deleting anything automatically.
idem currently focuses on indexing and content identification. User-facing reports and review workflows are planned but not yet available.
- Recursively scans one or more root directories
- Identifies files by content, not by name
- Uses SHA-256 for content identification
- Stores results in a SQLite-backed index
- Supports resumable indexing
- Uses bounded parallelism for hashing
- Keeps memory usage low, even for large directory trees
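The features listed above can be illustrated with a minimal sketch of how such an index might work. This is not idem's implementation: the function names (`sha256_of`, `iter_files`, `index`, `duplicate_groups`), the `files` table schema, and the default `max_workers` and `batch_size` values are hypothetical. The sketch hashes each file in 1 MiB chunks so memory stays flat regardless of file size, runs at most `max_workers` hashes at once, commits each batch to SQLite so an interrupted run can resume by skipping recorded paths, and groups paths by digest to find identical content. Unlike a production indexer, it does not detect files that changed after they were recorded.

```python
import hashlib
import os
import sqlite3
from concurrent.futures import ThreadPoolExecutor
from contextlib import closing
from pathlib import Path

CHUNK_SIZE = 1 << 20  # 1 MiB per read keeps memory flat for huge files


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(CHUNK_SIZE), b""):
            digest.update(chunk)
    return digest.hexdigest()


def iter_files(roots):
    """Lazily yield every regular (non-symlink) file under each root."""
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = Path(dirpath, name)
                if path.is_file() and not path.is_symlink():
                    yield path


def _hash_and_store(con, pool, batch):
    # Only len(batch) hashes are queued; the pool runs max_workers at once.
    rows = [(str(p), d) for p, d in zip(batch, pool.map(sha256_of, batch))]
    with con:  # one transaction per batch: committed work survives a crash
        # OR IGNORE tolerates the same path appearing twice (overlapping roots).
        con.executemany(
            "INSERT OR IGNORE INTO files (path, sha256) VALUES (?, ?)", rows
        )


def index(roots, db_path, max_workers=4, batch_size=64):
    """Hash every file under `roots`, skipping paths already in the index."""
    with closing(sqlite3.connect(db_path)) as con:
        con.execute(
            "CREATE TABLE IF NOT EXISTS files ("
            "path TEXT PRIMARY KEY, sha256 TEXT NOT NULL)"
        )
        batch = []
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            for path in iter_files(roots):
                # Resume support: a path recorded by an earlier run is skipped.
                seen = con.execute(
                    "SELECT 1 FROM files WHERE path = ?", (str(path),)
                ).fetchone()
                if seen:
                    continue
                batch.append(path)
                if len(batch) >= batch_size:
                    _hash_and_store(con, pool, batch)
                    batch = []
            if batch:
                _hash_and_store(con, pool, batch)


def duplicate_groups(db_path):
    """Map each digest shared by two or more paths to those paths."""
    with closing(sqlite3.connect(db_path)) as con:
        rows = con.execute(
            "SELECT sha256, path FROM files WHERE sha256 IN ("
            "SELECT sha256 FROM files GROUP BY sha256 HAVING COUNT(*) > 1) "
            "ORDER BY sha256, path"
        ).fetchall()
    groups = {}
    for digest, path in rows:
        groups.setdefault(digest, []).append(path)
    return groups
```

Threads are a reasonable fit in this sketch because CPython's `hashlib` releases the GIL while hashing large buffers, so several files can be hashed concurrently; processing files in batches keeps the amount of queued work bounded even for very large trees.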
idem never deletes files. It is designed to surface information, not to make destructive decisions. After reviewing the results, you can create a new directory tree that contains a single copy of each file and then remove the original directories yourself.
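idem does not perform the copy step described above; it is left to you. As a hypothetical sketch of that manual step, assume you already have a mapping from each content digest to every path under a single root with that content, including digests seen only once. The function name `build_single_copy_tree` and the "keep the lexicographically smallest path" rule are illustrative assumptions, not part of idem.

```python
import shutil
from pathlib import Path


def build_single_copy_tree(groups, root, dest):
    """Copy one file per content digest from `root` into a new tree at `dest`.

    Hypothetical helper. `groups` maps every digest (including those seen
    only once) to all paths under `root` with that content. Originals are
    only read, never modified, moved, or deleted.
    """
    root, dest = Path(root), Path(dest)
    copied = []
    for digest in sorted(groups):
        # Deterministic pick: keep the lexicographically smallest path.
        keeper = Path(min(str(p) for p in groups[digest]))
        # Mirror the keeper's position relative to `root` inside `dest`.
        target = dest / keeper.relative_to(root)
        target.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(keeper, target)  # copy2 also preserves timestamps
        copied.append(target)
    return copied
```

Once you have checked the new tree, removing the original directories remains a deliberate, manual action.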
- No user-facing report of duplicate files
- No grouping or tagging of ambiguous duplicates
- No interactive review or deletion workflow
These capabilities will be added in future releases.
idem is currently installed directly from Git using pip.
```shell
pip install git+https://github.com/venekamp/idem.git@v0.1.0
```