# listen

minimal audio transcription tool. 100% on-premise using OpenAI Whisper: no API keys, no cloud services, no data leaves your device.

## features
- record from microphone and transcribe in real-time
- transcribe audio files (mp3, wav, m4a, flac, ogg, etc.)
- multiple language support
- multiple stopping modes: manual (SPACE), auto (VAD), signal (SIGUSR1)
- scripting support with JSON output and quiet mode
- persistent configuration
- HTTP API server mode
- claude integration
- clipboard support
- works on macOS, Linux, Android (Termux)
## installation

### macOS (Homebrew)

brew tap gmoqa/listen
brew install listen

### android (Termux)

# via package manager (coming soon - pending approval)
pkg install listen

# current install method (one-liner)
curl -sSL https://raw.githubusercontent.com/gmoqa/listen/main/install-termux.sh | bash

see TERMUX.md for detailed android setup

### arch linux (AUR)

yay -S listen

### install script

curl -sSL https://raw.githubusercontent.com/gmoqa/listen/main/install.sh | bash

## usage

listen # record and transcribe (english)
listen -l es # spanish
listen -l fr # french
listen -m medium # better model
listen -c # send to claude
listen -v # verbose mode

press SPACE to stop recording (default mode)

## stopping modes
### manual stop (default)
listen # press SPACE to stop

### auto-stop with silence detection (VAD)
listen --vad 2 # stop after 2 seconds of silence
listen --vad 3 # stop after 3 seconds of silence

### signal control (for scripts/automation)
listen --signal-mode &
PID=$!
# do something...
kill -SIGUSR1 $PID # stop recording

## file transcription

listen -f audio.mp3 # transcribe audio file
listen -f audio.wav -l es # transcribe with language
listen -f audio.m4a -m medium # transcribe with better model
listen -f audio.mp3 -c # transcribe and send to claude

supports: mp3, wav, m4a, flac, ogg, and other formats supported by ffmpeg
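the --signal-mode stop shown above is plain POSIX signal handling; here is a self-contained sketch of the same stop-on-SIGUSR1 pattern (no listen involved, works in any POSIX shell):

```shell
# loop until SIGUSR1 arrives, the same idea listen --signal-mode uses
record_loop() {
    running=1
    trap 'running=0' USR1
    while [ "$running" -eq 1 ]; do
        sleep 1   # stand-in for capturing a chunk of audio
    done
    echo "stopped by SIGUSR1"
}

record_loop &        # background it, like listen --signal-mode &
PID=$!
sleep 1
kill -USR1 "$PID"    # same idea as kill -SIGUSR1 $PID above
wait "$PID"
```

the trap replaces SIGUSR1's default action (terminate) with a clean shutdown of the loop, which is why a transcription can still be written out after the signal.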
# output options
listen --quiet # no UI, just text
listen --json # JSON output
listen --clipboard # copy to clipboard
listen -o transcript.txt # save to file
# combine flags for automation
listen --quiet --json -o output.json # silent JSON output
listen --quiet --json --signal-mode # for background recording
listen --vad 2 --json # auto-stop with JSON

### example: background recording with signal
listen --quiet --json --signal-mode -l es > output.json &
PID=$!
sleep 5 # record for 5 seconds
kill -SIGUSR1 $PID
wait $PID
cat output.json

## configuration

### set defaults (persistent)
listen config -l es # set spanish as default language
listen config -m tiny # set tiny model as default
listen config --vad 3 # enable VAD with 3s silence
listen config --signal-mode # toggle signal mode on/off
listen config --verbose # toggle verbose mode
listen config --claude # toggle claude integration
listen config --show # view current config
listen config --reset # delete config file

### server configuration
listen config --host 0.0.0.0 # set server host
listen config --port 8080 # set server port

configuration is saved to ~/.listen/config.json
precedence: CLI args > config file > defaults
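listen resolves options in Python, but the same precedence chain can be sketched with shell default expansion (illustrative values only, not listen's actual implementation):

```shell
config_lang="es"   # pretend ~/.listen/config.json sets language = es
cli_lang=""        # empty unless -l was passed on the command line

# CLI arg beats config value beats built-in default (en)
lang="${cli_lang:-${config_lang:-en}}"
echo "$lang"   # es: no CLI override, so the config value wins
```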
listen config -l es -m tiny # save defaults
listen # uses: es, tiny (from config)
listen -l en # uses: en (override), tiny (from config)

## server mode

listen -s # start server on :5000
listen -s --port 8080 --host 127.0.0.1
listen -s -m medium -l es # server with specific model/language
# test the server
curl http://localhost:5000/health
curl -X POST -F "audio=@file.mp3" http://localhost:5000/transcribe
curl -X POST -F "audio=@file.mp3" -F "language=es" http://localhost:5000/transcribe

## models

this project is committed to always using the best open source language model available at any given time. when a superior open source model emerges, we will update to it. currently we use OpenAI Whisper, as it represents the state of the art in open source speech recognition.
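when scripting against the server, the first request can race model loading; a small helper that polls /health before posting audio (a sketch, assuming curl and the endpoints shown above):

```shell
# poll /health until the server answers, or give up after N tries
wait_for_server() {
    host=${1:-localhost}
    port=${2:-5000}
    tries=${3:-20}
    i=0
    until curl -fsS "http://$host:$port/health" >/dev/null 2>&1; do
        i=$((i + 1))
        if [ "$i" -ge "$tries" ]; then
            echo "server not reachable on $host:$port" >&2
            return 1
        fi
        sleep 0.5
    done
}

# usage: block until ready, then transcribe
# wait_for_server && curl -X POST -F "audio=@file.mp3" http://localhost:5000/transcribe
```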
models ordered by speed/accuracy tradeoff:
| model | size | speed | accuracy | use case |
|---|---|---|---|---|
| tiny | ~75MB | fastest | lowest | quick tests, mobile |
| base | ~150MB | fast | good | default, balanced |
| small | ~500MB | medium | better | more accuracy needed |
| medium | ~1.5GB | slow | high | production quality |
| large | ~3GB | slowest | highest | maximum accuracy |
recommendations:

- development/testing: tiny or base
- production: small or medium
- mobile/termux: tiny (avoid medium/large due to memory)
## android (Termux)

### microphone permissions
# grant storage and microphone permissions
termux-setup-storage
# allow microphone when prompted

### recommended models for mobile

- tiny: fastest, lowest memory (~1GB RAM)
- base: balanced (default, ~2GB RAM)
- avoid medium/large on phones (high memory usage)
### troubleshooting
# if audio fails, try:
pkg install pulseaudio
pulseaudio --start
# check microphone access
termux-microphone-record -f test.wav -l 1
termux-microphone-record -q

## tests

### quick start
./run_tests.sh # run all tests
./run_tests.sh quick # quick run (minimal output)
./run_tests.sh coverage # run with coverage report
./run_tests.sh config # run only config tests
./run_tests.sh listen # run only listen tests

### using pytest directly
# run all tests
pytest
# run with coverage report
pytest --cov=. --cov-report=html
# run specific test file
pytest test_config.py -v
# run specific test class
pytest test_config.py::TestConfigDefaults -v
# run specific test
pytest test_config.py::TestConfigDefaults::test_get_defaults -v

### test files

- test_config.py - tests for configuration management
- test_listen.py - tests for CLI and main functionality
- conftest.py - shared fixtures and test utilities
- pytest.ini - pytest configuration
view coverage report after running tests with --cov-report=html:
open htmlcov/index.html # mac
xdg-open htmlcov/index.html # linux

## cli reference

### modes
listen # default: record from microphone
listen -f FILE # transcribe audio file
listen -s # start HTTP API server
listen config # manage configuration

### recording options
-l, --language LANG # language code (en, es, fr, de, etc.)
-m, --model MODEL # whisper model (tiny, base, small, medium, large)
-c, --claude # send transcription to claude
-v, --verbose # verbose debug output

### stopping modes
# default: press SPACE to stop
--vad SECONDS # auto-stop after N seconds of silence
--signal-mode # stop with SIGUSR1 signal

### output options
-q, --quiet # no UI, only transcription text
-j, --json # output in JSON format
--clipboard # copy transcription to clipboard
-o, --output FILE # write transcription to file

### server options
--host HOST # server host (default: 0.0.0.0)
--port PORT # server port (default: 5000)

## examples

### basic recording
listen # english, base model
listen -l es # spanish
listen -l es -m medium # spanish with better model

### file transcription
listen -f audio.mp3 # transcribe file
listen -f audio.mp3 -l es -m small # with options

### automation
listen --quiet --json # minimal output
listen --vad 2 --json # auto-stop with JSON
listen --signal-mode --quiet & # background recording

### server
listen -s # start server
listen -s --port 8080 -m medium # custom port and model

## uninstall

### homebrew
brew uninstall listen
brew untap gmoqa/listen

### arch linux
yay -R listen
# or
sudo pacman -R listen

### termux
rm $PREFIX/bin/listen
pip uninstall openai-whisper sounddevice numpy scipy

### manual install
rm -rf ~/.local/share/listen ~/.local/bin/listen

## notes

- all processing happens locally on your device
- no API keys required
- no cloud services or external dependencies
- first run downloads the specified Whisper model (~75MB-3GB)
- models are cached for future use
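to check which models are already on disk, look in the whisper cache; the path below is openai-whisper's default location (an assumption here, since listen could configure its own):

```shell
# openai-whisper's default model cache (XDG aware)
cache="${XDG_CACHE_HOME:-$HOME/.cache}/whisper"
ls -lh "$cache" 2>/dev/null || echo "no models cached yet in $cache"
```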