Build and test image captioning pipelines with diverse AI models using a flexible, node-based visual interface.
- Visual Node Editor - Build AI pipelines with drag-and-drop nodes
- AI Models - BLIP and R-4B (detailed reasoning), plus many more options
- Smart Templates - Combine prompts and AI outputs with Conjunction nodes
- Real-time Stats - Live progress tracking with speed & ETA
- Flexible Export - ZIP with embedded EXIF/PNG metadata
Docker provides the simplest setup with no dependency management required.
Pull and run the pre-built image:
```bash
docker run --gpus all -p 5000:5000 \
  -v ai-captioner-data:/app/backend/data \
  -v ai-captioner-thumbnails:/app/backend/thumbnails \
  -v huggingface-cache:/root/.cache/huggingface \
  ghcr.io/maxiarat1/ai-image-captioner:latest-python312-cuda128
```

OR build locally with docker-compose:

```bash
git clone https://github.com/maxiarat1/ai-image-captioner.git
cd ai-image-captioner
docker-compose up
```

Note: GPU support requires the NVIDIA Container Toolkit.
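If the container can't see your GPU, it helps to test the toolkit in isolation before debugging the app itself. A quick sanity check, using any CUDA base image tag available to you (the tag below is just an example):

```bash
# Prints your GPU table if the NVIDIA Container Toolkit is installed correctly;
# an error here means the toolkit, not the app, needs fixing
docker run --rm --gpus all nvidia/cuda:12.8.0-base-ubuntu22.04 nvidia-smi
```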
Pre-built executables are available for Windows and Linux with CUDA 12.8 support.
- Download from Releases
- Extract the archive:
  - Windows: if split archives (`.zip.001`, `.zip.002`), use 7-Zip to extract `.zip.001`
  - Linux: if split archives (`.tar.gz.partaa`, `.tar.gz.partab`), join and extract them:

    ```bash
    cat ai-image-captioner-linux-*.tar.gz.part* > ai-image-captioner.tar.gz
    tar -xzf ai-image-captioner.tar.gz
    ```

- Run the executable:
  - Windows: `ai-image-captioner.exe`
  - Linux: `./ai-image-captioner/ai-image-captioner`
- Open `frontend/index.html` in your browser
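If the page loads but stays empty, check that the backend is actually listening. Assuming the bundled executable serves on port 5000 like the Docker and source setups:

```bash
# Any HTTP response means the backend is up; "connection refused" means it is not
curl -i http://localhost:5000/
```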
Install from source with conda for full control over your environment.
Prerequisites:
- Conda or Anaconda
- Git
- For GPU: NVIDIA GPU with CUDA 12.1+ drivers
Linux/macOS:

```bash
git clone https://github.com/maxiarat1/ai-image-captioner.git
cd ai-image-captioner

# GPU installation (auto-detects CUDA version)
./setup.sh --gpu

# OR specify CUDA version
./setup.sh --gpu --cuda 12.8

# OR CPU-only installation
./setup.sh --cpu

# Activate environment and run
conda activate captioner-gpu  # or captioner-cpu
cd backend && python app.py
```

Windows:
```bat
git clone https://github.com/maxiarat1/ai-image-captioner.git
cd ai-image-captioner

REM GPU installation (auto-detects CUDA version)
setup.bat /gpu

REM OR specify CUDA version
setup.bat /gpu /cuda 12.8

REM OR CPU-only installation
setup.bat /cpu

REM Activate environment and run
conda activate captioner-gpu
cd backend && python app.py
```

Then open http://localhost:5000 in your browser.
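If captions run on CPU even though you installed the GPU environment, it's worth confirming the conda env can actually see your GPU. A minimal check, assuming the setup script installs PyTorch (the framework these models typically run on):

```bash
conda activate captioner-gpu
# Expect "True" plus a CUDA version; "False None" means a CPU-only install
python -c "import torch; print(torch.cuda.is_available(), torch.version.cuda)"
```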
Demo video: ai-image-captioner-usage.mp4
- Upload images to the Input node
- Connect nodes: Input → AI Model → Output
- Add Prompt or Conjunction nodes for custom templates
- Click Process and monitor progress
- Export results
Example pipeline:
```
Images ┐
       ├─→ BLIP ─→ Output
Prompt ┘
```
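Conceptually, a pipeline is just a small graph: source nodes feed processor nodes, which feed sinks. Purely as an illustration of that structure (this is not the app's actual save format; all names here are made up), the example above could be sketched as:

```bash
# Hypothetical sketch of the pipeline above, NOT the app's real file format
cat > pipeline-example.json <<'JSON'
{
  "nodes": [
    {"id": "images", "type": "input"},
    {"id": "prompt", "type": "prompt", "text": "Describe the scene"},
    {"id": "blip",   "type": "model",  "model": "BLIP"},
    {"id": "out",    "type": "output"}
  ],
  "edges": [["images", "blip"], ["prompt", "blip"], ["blip", "out"]]
}
JSON
```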
Hi! This project is alive and growing. I’m building it to make it easy to connect small, efficient models into simple pipelines, so together they can pull more detail out of images than any single model could on its own.
- There are still some rough edges while I refactor a fairly large codebase (~60k lines).
- Performance and memory usage are improving steadily. The first run may be slow while models download.
- It runs fastest on GPU, but CPU works too. If you’re short on VRAM, try 4-bit or 8-bit modes.
- More nodes and adapters (OCR, VLMs, and utility helpers)
- Better multi-model workflows and templates for richer image captions
- Cleaner docs, more example pipelines, and a few quality-of-life fixes
- Open an issue; please include steps to reproduce, logs, and screenshots
- Submit a PR for a bug fix, performance tweak, doc improvement, or a new node/adapter
- Share any pipeline setups that work well for you
Thanks for checking this out! I read the issues and try to ship improvements regularly, and your feedback genuinely shapes where the project goes next.
| Issue | Solution |
|---|---|
| Out of memory | Use 4-bit/8-bit precision |
| Slow first run | Models download to ~/.cache/huggingface/ (~2GB) |
| Docker GPU error | Install nvidia-container-toolkit |
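For context on the out-of-memory row: 4-bit loading typically shrinks model weights to roughly a quarter of their fp16 size. A standalone sketch with Hugging Face Transformers and bitsandbytes, independent of this app's internals (the checkpoint shown is the public BLIP base model):

```bash
python - <<'PY'
# Standalone sketch: load a captioning model in 4-bit to reduce VRAM.
# Requires: pip install transformers accelerate bitsandbytes
from transformers import BlipForConditionalGeneration, BitsAndBytesConfig

model = BlipForConditionalGeneration.from_pretrained(
    "Salesforce/blip-image-captioning-base",
    quantization_config=BitsAndBytesConfig(load_in_4bit=True),
    device_map="auto",
)
print(model.get_memory_footprint())  # approximate bytes used by the weights
PY
```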
