AI transcription without the cloud.
An open-source web app powered by Whisper that lets you transcribe audio and video locally — private, fast, and developer-friendly.
Developed by Brendown Ferreira.
Easily upload local files or paste a YouTube URL, watch a sleek progress overlay, and download your transcript as `.txt`.
Built with FastAPI, Jinja2, Tailwind, and Alpine.js — lightweight, modern, and easy to hack on.
## Features

- 🎥 Transcribe from YouTube URLs or local audio/video files
- ⚡ Two Whisper backends with automatic fallback (see the sketch after this list):
  - `openai-whisper` (preferred when FFmpeg is available)
  - `faster-whisper` (fallback; works even without FFmpeg)
- 🚀 GPU acceleration (CUDA) when available, seamless CPU fallback otherwise
- 🔎 Robust FFmpeg detection (`PATH` and `imageio-ffmpeg`)
- 🌓 Clean UI with dark/light mode, progress bar, and smooth overlay
- 📜 Transcript history with one-click download
- 🧹 Temporary media cleanup after transcription
- ✅ Health check endpoint for quick status
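As a rough illustration of the detection-and-fallback logic described above (a sketch, not the project's actual `app/services/transcriber.py`; function names and the model size are assumptions):

```python
import shutil


def find_ffmpeg() -> str | None:
    """Look for FFmpeg on PATH, then fall back to the imageio-ffmpeg bundled binary."""
    path = shutil.which("ffmpeg")
    if path:
        return path
    try:
        import imageio_ffmpeg
        return imageio_ffmpeg.get_ffmpeg_exe()  # static binary shipped with the wheel
    except Exception:
        return None


def load_backend():
    """Prefer openai-whisper when FFmpeg is usable; otherwise use faster-whisper."""
    import torch
    device = "cuda" if torch.cuda.is_available() else "cpu"
    if find_ffmpeg():
        import whisper  # openai-whisper needs FFmpeg to decode media
        return whisper.load_model("base", device=device)
    # faster-whisper decodes audio itself (via PyAV), so FFmpeg is optional
    from faster_whisper import WhisperModel
    compute_type = "float16" if device == "cuda" else "int8"
    return WhisperModel("base", device=device, compute_type=compute_type)
```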
## Tech Stack

- Backend: FastAPI, Uvicorn
- Frontend: Jinja2 templates, Tailwind (CDN), Alpine.js
- Media & ML: yt-dlp, Whisper (openai-whisper + faster-whisper), PyTorch
- Utilities: python-dotenv, pydantic
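For context on the media side, this is roughly the kind of yt-dlp call a YouTube download service wraps (options here are illustrative, not the project's exact configuration):

```python
import yt_dlp


def download_audio(url: str, out_dir: str = "storage/uploads") -> None:
    """Fetch the best available audio track for a video URL."""
    opts = {
        "format": "bestaudio/best",
        "outtmpl": f"{out_dir}/%(id)s.%(ext)s",  # name files by video id
    }
    with yt_dlp.YoutubeDL(opts) as ydl:
        ydl.download([url])
```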
## Project Structure

```text
app/
├─ main.py               # FastAPI app + router registration
├─ core/
│  ├─ config.py          # Environment + directories
│  └─ theme.py           # Default theme + template globals
├─ routers/
│  ├─ home.py            # GET /
│  ├─ upload.py          # POST /transcribe/*
│  └─ history.py         # GET /history
├─ services/
│  ├─ youtube.py         # YouTube download logic
│  ├─ transcriber.py     # FFmpeg detection + Whisper backends
│  └─ file_manager.py    # File save/load/transcript listing
├─ templates/            # Jinja2 templates (UI)
└─ static/               # CSS, JS, etc.
storage/
├─ uploads/
└─ transcriptions/
tests/                   # End-to-end validation scripts
```
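Wiring-wise, `main.py` might look something like this (a sketch assuming each router module exposes a `router` object; the app title is a guess):

```python
from fastapi import FastAPI
from fastapi.staticfiles import StaticFiles

from app.routers import history, home, upload

app = FastAPI(title="Transcribe Hub")

# Serve static assets and register the routers from app/routers/
app.mount("/static", StaticFiles(directory="app/static"), name="static")
app.include_router(home.router)
app.include_router(upload.router)
app.include_router(history.router)
```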
## Requirements

- Python 3.10+
- FFmpeg (optional but recommended)
- NVIDIA GPU + CUDA (optional, for acceleration)
On Windows, `requirements.txt` pins the CUDA 12.4 wheel index for PyTorch; CPU-only mode works out of the box.
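Such a pin typically looks like the following (a sketch; check `requirements.txt` for the exact lines):

```text
--extra-index-url https://download.pytorch.org/whl/cu124
torch
```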
## Quick Start

Windows (PowerShell):

```powershell
python -m venv .venv
.\.venv\Scripts\Activate.ps1
pip install -r requirements.txt --upgrade --no-cache-dir
python -m uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
```

macOS/Linux:

```bash
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt --upgrade --no-cache-dir
uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
```

Then open: http://localhost:8000
## Docker

```bash
docker build -t transcribe-hub .
docker run --rm -p 8000:8000 transcribe-hub
```

For GPU support, enable the NVIDIA Container Toolkit.
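With the toolkit installed, GPU access is usually granted via Docker's `--gpus` flag:

```bash
docker run --rm --gpus all -p 8000:8000 transcribe-hub
```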
## Usage

- Go to `/` → paste a YouTube URL or upload a file
- Watch the progress overlay
- View the transcript + download as `.txt`
- Visit `/history` to browse past transcripts
## API

Endpoints:

- `GET /` → UI
- `POST /transcribe/youtube` → YouTube → transcript
- `POST /transcribe/upload` → Local file → transcript
- `GET /history` → List transcripts
- `GET /download/{filename}` → Download transcript
- `GET /health` → Health check
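Two quick checks from the command line (the upload form field name `file` is an assumption; adjust to the actual form schema):

```bash
# Quick status
curl http://localhost:8000/health

# Transcribe a local file (field name assumed)
curl -F "file=@sample.mp3" http://localhost:8000/transcribe/upload
```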
## Testing

Run all tests with:

```bash
pytest -q
```

Key tests include:

- `test_imports.py` → ML/media library checks
- `test_download.py` → yt-dlp validation
- `test_transcribe_direct.py` → Direct transcription flow
- `test_call_api.py` → API endpoint validation
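To iterate on a single script, target it directly:

```bash
pytest tests/test_imports.py -q
```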
## Contributing

Contributions are welcome! 🎉 Whether it's bug fixes, features, or docs:

- Fork the repo
- Create a branch (`feature/my-idea`)
- Commit & push
- Open a Pull Request
## Troubleshooting

- FFmpeg not found: install FFmpeg or rely on the `imageio-ffmpeg` fallback
- GPU not used: check CUDA drivers and run `torch.cuda.is_available()` (see the snippet below)
- Large files slow: the progress bar is an estimate; consider SSE/WebSockets for real-time updates
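A quick sanity check from a Python shell:

```python
import torch

print(torch.cuda.is_available())           # True means PyTorch can see a CUDA device
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))   # name of GPU 0
```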
## License

Open Source — see LICENSE.
