AI Image Captioner

Build and test image captioning pipelines with diverse AI models using a flexible, node-based visual interface.


[Screenshot: AI Image Captioner interface]

Features

  • Visual Node Editor - Build AI pipelines with drag-and-drop nodes
  • AI Models - BLIP, R-4B (detailed reasoning), and more
  • Smart Templates - Combine prompts and AI outputs with Conjunction nodes
  • Real-time Stats - Live progress tracking with speed & ETA
  • Flexible Export - ZIP download with embedded EXIF/PNG metadata (see the example below)
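
The Flexible Export option writes captions into the images themselves. As a rough illustration, here is a minimal sketch of reading a caption back out of an exported file with Pillow; the metadata keys and the file path are assumptions, so check your own exports for the actual field names.

# Read a caption embedded in exported image metadata (illustrative sketch).
# Assumes Pillow is installed; the metadata keys below are guesses.
from PIL import Image
from PIL.ExifTags import TAGS

def read_embedded_caption(path: str) -> str | None:
    img = Image.open(path)

    # PNG: text chunks show up in img.info, e.g. {"Description": "..."}.
    for key in ("Description", "Caption", "Comment"):  # hypothetical key names
        if key in img.info:
            return img.info[key]

    # JPEG: check the EXIF ImageDescription tag.
    for tag_id, value in img.getexif().items():
        if TAGS.get(tag_id) == "ImageDescription":
            return value
    return None

print(read_embedded_caption("exported/example.png"))  # example path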

Quick Start

Option 1: Docker (Recommended)

Docker provides the simplest setup with no dependency management required.

Pull and run the pre-built image:

docker run --gpus all -p 5000:5000 \
  -v ai-captioner-data:/app/backend/data \
  -v ai-captioner-thumbnails:/app/backend/thumbnails \
  -v huggingface-cache:/root/.cache/huggingface \
  ghcr.io/maxiarat1/ai-image-captioner:latest-python312-cuda128

OR build locally with docker-compose:

git clone https://github.com/maxiarat1/ai-image-captioner.git
cd ai-image-captioner
docker-compose up

Note: GPU support requires the NVIDIA Container Toolkit.

Option 2: Download Executable

Pre-built executables are available for Windows and Linux with CUDA 12.8 support.

  1. Download from Releases
  2. Extract the archive:
    • Windows: If the download is split (.zip.001, .zip.002, ...), use 7-Zip to extract .zip.001
    • Linux: If the download is split (.tar.gz.partaa, .tar.gz.partab, ...), reassemble and extract it:
      cat ai-image-captioner-linux-*.tar.gz.part* > ai-image-captioner.tar.gz
      tar -xzf ai-image-captioner.tar.gz
  3. Run the executable:
    • Windows: ai-image-captioner.exe
    • Linux: ./ai-image-captioner/ai-image-captioner
  4. Open frontend/index.html in your browser

Option 3: From Source

Install from source with conda for full control over your environment.

Prerequisites:

  • Conda or Anaconda
  • Git
  • For GPU: NVIDIA GPU with CUDA 12.1+ drivers

Linux/macOS:

git clone https://github.com/maxiarat1/ai-image-captioner.git
cd ai-image-captioner

# GPU installation (auto-detects CUDA version)
./setup.sh --gpu

# OR specify CUDA version
./setup.sh --gpu --cuda 12.8

# OR CPU-only installation
./setup.sh --cpu

# Activate environment and run
conda activate captioner-gpu  # or captioner-cpu
cd backend && python app.py

Windows:

git clone https://github.com/maxiarat1/ai-image-captioner.git
cd ai-image-captioner

REM GPU installation (auto-detects CUDA version)
setup.bat /gpu

REM OR specify CUDA version
setup.bat /gpu /cuda 12.8

REM OR CPU-only installation
setup.bat /cpu

REM Activate environment and run
conda activate captioner-gpu
cd backend && python app.py

Then open http://localhost:5000 in your browser.
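
If you prefer a scripted check that the backend came up, the root URL serves the web UI, so a request like the one below (using the requests library) should return HTTP 200; the port is the default from the steps above.

# Sanity check that the backend is reachable on the default port.
import requests

resp = requests.get("http://localhost:5000/", timeout=5)
print(resp.status_code)  # expect 200 once the server is running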

Usage

[Demo video: ai-image-captioner-usage.mp4]

  1. Upload images to the Input node
  2. Connect nodes: Input → AI Model → Output
  3. Add Prompt or Conjunction nodes for custom templates
  4. Click Process and monitor progress
  5. Export results

Example pipeline:

Images ┐
       ├─→ BLIP ─→ Output
Prompt ┘
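
Conceptually, the BLIP step in that pipeline does something like the sketch below, written against the Hugging Face transformers API rather than the app's internal code; the Prompt/Conjunction part is shown as a plain template string, and the model ID and file name are just examples.

# Rough stand-in for the Images + Prompt -> BLIP -> Output pipeline above
# (uses Hugging Face transformers directly; not the app's own implementation).
from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

model_id = "Salesforce/blip-image-captioning-base"
processor = BlipProcessor.from_pretrained(model_id)
model = BlipForConditionalGeneration.from_pretrained(model_id)

image = Image.open("photo.jpg").convert("RGB")         # example input image
inputs = processor(images=image, return_tensors="pt")
output_ids = model.generate(**inputs, max_new_tokens=30)
caption = processor.decode(output_ids[0], skip_special_tokens=True)

# A Conjunction node merges prompt text with the model output.
template = "high quality photo, {caption}"             # example template
print(template.format(caption=caption))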

Project Overview

Hi! This project is alive and growing. I’m building it to make it easy to connect small, efficient models into simple pipelines, so together they can pull more detail out of images than any single model could on its own.

A few notes

  • There are still some rough edges while I refactor a fairly large codebase (~60k lines).
  • Performance and memory usage are improving steadily. The first run may be slow while models download.
  • It runs fastest on a GPU, but CPU works too. If you’re short on VRAM, try the 4-bit or 8-bit modes (see the sketch below).
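
On the low-VRAM point, the general technique behind 4-bit/8-bit modes is quantized model loading; the sketch below shows how that looks with transformers and bitsandbytes, as an illustration of the idea rather than a description of how this app configures it internally.

# Load a captioning model in 4-bit to reduce VRAM usage (general technique;
# whether this app uses bitsandbytes under the hood is an assumption).
import torch
from transformers import BitsAndBytesConfig, BlipForConditionalGeneration

quant_config = BitsAndBytesConfig(
    load_in_4bit=True,                     # or use load_in_8bit=True instead
    bnb_4bit_compute_dtype=torch.float16,  # compute in fp16 for speed
)

model = BlipForConditionalGeneration.from_pretrained(
    "Salesforce/blip-image-captioning-base",
    quantization_config=quant_config,
    device_map="auto",                     # spread layers across GPU/CPU as needed
)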

What’s coming next

  • More nodes and adapters (OCR, VLMs, and utility helpers)
  • Better multi-model workflows and templates for richer image captions
  • Cleaner docs, more example pipelines, and a few quality-of-life fixes

How to contribute

  • Open an issue; please include reproduction steps, logs, and screenshots
  • Submit a PR for a bug fix, performance tweak, doc improvement, or a new node/adapter
  • Share any pipeline setups that work well for you

Thanks for checking this out! I read the issues and try to ship improvements regularly; your feedback genuinely shapes where the project goes next.

Troubleshooting

Issue            | Solution
Out of memory    | Use 4-bit/8-bit precision
Slow first run   | Models download to ~/.cache/huggingface/ (~2 GB)
Docker GPU error | Install nvidia-container-toolkit
