Image Tagger is a cross-platform Python app and library that captions and tags photos. It can run a local BLIP captioning model or connect to an Ollama server so you can choose the workflow that best fits your computer and bandwidth.
- Drag-and-drop photos or whole folders and process them in batches with live progress
- Create clean, alt-text-friendly captions plus keyword tags
- Write results straight into the image metadata or into YAML sidecar files
- Switch between local BLIP models and remote Ollama vision models without touching the code
- Surface suggested filenames in the results grid (and optionally rename files) so you can accept or ignore them.
- Highlight the drag-and-drop area with a dashed frame that always shows the drop zone and brightens when you hover files over it.
- Suggested filenames now also appear in sidecar metadata (null when disabled) so downstream tools can consistently read that value.
- Ask the model to suggest safe, descriptive filenames and optionally auto-rename your images
- Runs directly on your CPU or GPU using PyTorch and Hugging Face Transformers.
- First launch downloads ~1 GB of weights and caches them in your Hugging Face cache directory (usually `~/.cache/huggingface`).
- Works fully offline once the weights are cached (a minimal sketch of the underlying call follows below).
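For context, the snippet below is a minimal sketch of the underlying Hugging Face Transformers call, not Image Tagger's internal code; the model id and example file name are placeholders for illustration.

```python
# Minimal BLIP captioning sketch using Hugging Face Transformers directly.
# Illustrative only; Image Tagger wraps this kind of call internally.
import torch
from PIL import Image
from transformers import BlipForConditionalGeneration, BlipProcessor

model_id = "Salesforce/blip-image-captioning-base"  # downloaded and cached on first use
device = "cuda" if torch.cuda.is_available() else "cpu"

processor = BlipProcessor.from_pretrained(model_id)
model = BlipForConditionalGeneration.from_pretrained(model_id).to(device)

image = Image.open("photo.jpg").convert("RGB")      # any local image
inputs = processor(images=image, return_tensors="pt").to(device)
output_ids = model.generate(**inputs, max_new_tokens=30)
print(processor.decode(output_ids[0], skip_special_tokens=True))
```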
- Requires the free Ollama desktop/server app.
- You download the multimodal model you want (`ollama run llava`, `qwen2.5-vl`, `minicpm-v`, etc.).
- Image Tagger talks to the local Ollama HTTP endpoint and streams the captions/tags back into the app (see the sketch after this list).
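The sketch below shows roughly the kind of HTTP request involved, written against Ollama's public `/api/generate` endpoint. It is not Image Tagger's internal client; the prompt text and file name are placeholders.

```python
# Sketch of a caption request to a local Ollama server (not Image Tagger's code).
import base64
import json
import urllib.request

OLLAMA_URL = "http://127.0.0.1:11434/api/generate"  # default local endpoint

with open("photo.jpg", "rb") as f:                  # placeholder image path
    image_b64 = base64.b64encode(f.read()).decode("ascii")

payload = {
    "model": "llava",                               # any multimodal model you have pulled
    "prompt": "Write a short, descriptive caption for this photo.",
    "images": [image_b64],
    "stream": False,
}

request = urllib.request.Request(
    OLLAMA_URL,
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(request) as response:
    print(json.loads(response.read())["response"])  # the generated caption text
```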
- Windows, macOS, or Linux with Python 3.10+ already installed.
- ~3 GB of free disk space (Python env + BLIP weights download).
- Optional: an NVIDIA/AMD GPU or Apple Silicon for faster BLIP inference. CPU-only still works.
- Internet access the first time you install dependencies, download BLIP weights, or pull an Ollama model.
All commands below run inside the project folder. Replace the activation command with the Windows version if needed.
- Get the code

  ```bash
  git clone https://github.com/taggedzi/image-tagger.git
  cd image-tagger
  ```

- Create a virtual environment so you do not pollute your global Python

  ```bash
  python -m venv .venv
  source .venv/bin/activate  # Windows PowerShell: .venv\Scripts\Activate.ps1
  python -m pip install --upgrade pip
  ```

- Install the app

  ```bash
  pip install -e .
  ```

- Install BLIP support (required for local captioning). The first BLIP run triggers an automatic Hugging Face download; keep the terminal open until it finishes.

  ```bash
  pip install -e .[blip]
  ```
- (Optional) list the available models

  ```bash
  python -m image_tagger --list-models
  ```
- Download and install Ollama from ollama.com/download. Launch it so the background service starts.
- Pull a multimodal model once. The first call both downloads and tests the model. Example:

  ```bash
  ollama run llava
  ```

- Keep Ollama running. Image Tagger connects to `http://127.0.0.1:11434` by default. If you host Ollama elsewhere, copy the base URL and API key (if any) for later.
```bash
python -m image_tagger
```

- Drop images or folders into the window, or use the buttons to pick them.
- Open Settings → Model to pick `caption.blip-base`, `caption.blip-large`, or `remote.ollama`.
- When `remote.ollama` is selected, fill in Remote base URL, Remote model id, and other fields (temperature, token limit, timeout, API key). Click Refresh list to see which Ollama models are currently running.
- Under Metadata Output, enable Filename suggestions to include a proposed slug in results, and turn on Auto-rename if you want Image Tagger to rename files on disk using those suggestions (collisions get `-1`, `-2`, etc.).
- Choose Output mode: Embedded metadata edits supported image formats in place, while YAML sidecars keeps the original files untouched and writes `.yaml` files next to them (see the sidecar-reading sketch after this list).
- Changes are saved automatically to `%APPDATA%\image_tagger\settings.yaml` on Windows or `~/.config/image_tagger/settings.yaml` on Linux/macOS.
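Downstream scripts can pick those sidecars up easily. The snippet below is only a sketch: the field names (`caption`, `tags`, `suggested_filename`) are assumptions based on the features described above, so check a generated sidecar for the real schema.

```python
# Illustrative sidecar reader; field names are assumptions, not a documented schema.
from pathlib import Path

import yaml  # pip install pyyaml

for sidecar in Path("/path/to/images").glob("*.yaml"):
    data = yaml.safe_load(sidecar.read_text(encoding="utf-8"))
    print(
        sidecar.stem,
        data.get("caption"),
        data.get("tags"),
        data.get("suggested_filename"),  # null/None when suggestions are disabled
    )
```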
```bash
python -m image_tagger \
  --headless \
  --input /path/to/images \
  --model caption.blip-base \
  --output-mode sidecar \
  --suggest-filenames \
  --auto-rename-files
```

Headless runs print a JSON summary with every caption, tag list, and destination path. This mode is handy for automation or server use.
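As a starting point for automation, the sketch below shells out to the headless CLI and parses whatever JSON it prints. The summary's exact keys are not documented here, so inspect the output before relying on specific fields.

```python
# Sketch: drive the headless CLI from a script and load its JSON summary.
import json
import subprocess
import sys

result = subprocess.run(
    [
        sys.executable, "-m", "image_tagger",
        "--headless",
        "--input", "/path/to/images",
        "--model", "caption.blip-base",
        "--output-mode", "sidecar",
    ],
    capture_output=True,
    text=True,
    check=True,
)

summary = json.loads(result.stdout)   # captions, tags, and destination paths
print(json.dumps(summary, indent=2))  # inspect the structure before using keys
```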
- Slow first run? BLIP weight downloads or Ollama model loads can take several minutes. They are cached, so future runs are fast.
- CPU vs GPU: BLIP auto-detects CUDA/Metal. If you only have a CPU, expect longer processing times but the results are identical.
- Need to start fresh? Delete the settings file listed above; Image Tagger will recreate it with safe defaults.
- Remote timeouts: Ollama may take longer than 90 s to warm up. Increase Remote timeout under Settings if you see timeout errors.
- Install tooling plus the package in editable mode:

  ```bash
  make install-dev  # Equivalent to pip install -e .[blip] plus test/lint deps
  ```

- Helpful commands (defined in the `Makefile`):
  - `make fmt` – format with `ruff format`.
  - `make lint` – static analysis via `ruff check`.
  - `make style` – legacy `pycodestyle`.
  - `make test` – run the pytest suite.
  - `make check` – lint + style + tests (CI default).
  - `make coverage` – run coverage and print a summary.
- Project layout:

  ```
  image_tagger/
  ├── models/             # BLIP + Ollama implementations registered via ModelRegistry
  ├── services/           # High-level pipeline and metadata orchestration
  ├── gui/                # PySide6 desktop UI
  ├── io/                 # Metadata writers and YAML sidecars
  ├── config.py           # Pydantic settings shared by GUI and CLI
  └── settings_store.py   # Cross-platform persistence helper
  ```

- Adding new models:
  - Implement `TaggingModel` in `image_tagger/models/your_model.py` (see the sketch below).
  - Register it with `ModelRegistry.register("your.id", YourModel)`.
  - List extra dependencies inside `pyproject.toml` under `[project.optional-dependencies]`.
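As a rough starting point, a new model file might look like the sketch below. The import path, method name, and return shape are assumptions; copy the actual `TaggingModel` interface from an existing implementation such as the BLIP one.

```python
# image_tagger/models/your_model.py — schematic only. The exact TaggingModel
# interface and import paths are assumptions; mirror an existing model in the repo.
from image_tagger.models import ModelRegistry, TaggingModel  # assumed import path


class YourModel(TaggingModel):
    """Minimal placeholder model that returns a fixed caption and tags."""

    def tag(self, image_path):  # assumed method name and signature
        caption = "a placeholder caption"
        tags = ["placeholder", "example"]
        return caption, tags


# Register under the id users will select via --list-models or the GUI.
ModelRegistry.register("your.id", YourModel)
```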
With these steps, someone with basic command-line skills can install Image Tagger in an isolated Python environment, choose either local BLIP or Ollama-powered captions, and start tagging images in minutes.
- Image Tagger itself is released under the MIT License (see `LICENSE`). Third-party libraries and models that ship with or are installed alongside the app are documented in `THIRD_PARTY_NOTICES.md` plus the companion files inside the `licenses/` directory.
- The desktop UI relies on PySide6/Qt for Python, which is licensed under the GNU LGPL v3.0. If you redistribute a packaged build, you must preserve Qt's license text and keep the Qt libraries relinkable (typically by shipping shared libraries).
- Local captioning uses Salesforce Research's BLIP checkpoints via Hugging Face Transformers. Cite Salesforce BLIP if you redistribute the model weights and retain the upstream license terms in your distributions.
- Remote captioning can talk to any Ollama-served multimodal model (LLaVA, Qwen-VL, MiniCPM-V, etc.). Each model and Ollama itself has its own license/usage policy—be sure your usage and redistribution comply with those upstream terms.
- Set the version. Edit `pyproject.toml` and bump `[project].version` to the number you intend to release. Commit this change (and anything else required) before building artifacts; release files should be created from clean, tagged commits.
- Run checks. Execute `make check` (or `make test`, `make lint`, etc.) to verify the codebase is ready. Commit any fixes.
- Tag the release. Create an annotated git tag such as `git tag -a v1.1.0 -m "Image Tagger 1.1.0"` and push it with `git push origin --tags`.
- Build distributables. Run `make build` (it invokes `python -m build`) to generate both the source distribution and wheel in `dist/`. If you prefer not to use the Makefile, you can run `python -m build` directly. The `build` package is included in the `dev` extra, so `make install-dev` installs everything required.
- Publish. Attach the wheel (`dist/*.whl`) and source tarball (`dist/*.tar.gz`) to a GitHub release for the corresponding tag, or upload them to PyPI with `twine` if desired. These generated files are artifacts; do not commit them to git.
- Document the release. In the GitHub Release notes (or CHANGELOG), summarize notable changes, requirements, and any manual steps (e.g., BLIP model downloads).
- Release notes tip. For `v1.1.0`, highlight the new Ollama filename suggestions/auto-rename option, the CLI/GUI sinks for that feature, and remind users to enable the new metadata options if they want file renaming behavior.