Scribe

RTSP audio transcription service. Scribe captures audio from IP cameras and transcribes it locally using whisper.cpp.

Features

RTSP Capture - Pulls audio from IP camera streams via ffmpeg
Whisper Transcription - Local speech-to-text using whisper.cpp
Speaker Diarization - Identifies who said what via whisperX (optional)
Audio Filters - High-pass, normalize, boost, RNNoise noise reduction
Multiple Outputs - SQLite, file, stdout, Slack webhook
Timeline View - Query merged transcripts across all cameras

Quick Start

# Install dependencies
make setup

# Copy and edit config
cp config.example.yaml config.yaml
# Edit config.yaml with your RTSP URLs

# Run
make run

Requirements

Go 1.22+
ffmpeg (for RTSP capture)
whisper.cpp (one of whisper-cli, whisper-cpp, or whisper on your PATH)
uv (for diarization, optional)

Configuration

streams:
  - name: front-door
    url: rtsp://192.168.1.100:554/stream1
  - name: garage
    url: rtsp://192.168.1.101:554/stream1

chunk_seconds: 30
model: models/ggml-base.en.bin

audio_filter:
  high_pass: true   # Remove HVAC rumble (80Hz cutoff)
  normalize: true   # Consistent volume levels
  boost: true       # Amplify quiet speech from distant mics
  rnnoise: false    # ML noise suppression

transcribe:
  diarize: false    # Speaker diarization (requires HuggingFace token)
  device: mps       # cpu, cuda, or mps

outputs:
  - type: sqlite    # Store in data/scribe.db
  - type: file      # Write to data/transcripts/
  - type: slack     # Post to Slack
    webhook_url: "https://hooks.slack.com/services/XXX/YYY/ZZZ"

Usage

# Run transcription
./scribe -config config.yaml -data ./data

# Query timeline (last hour)
./scribe-timeline -since 1h

# Query today's transcripts
./scribe-timeline -since "00:00"

# Filter by camera
./scribe-timeline -since 1h -stream front-door,garage

Makefile Targets

make build          # Build scribe binary
make build-all      # Build all binaries
make run            # Run with config.yaml
make test           # Run tests
make lint           # Run golangci-lint
make timeline       # Show last hour of transcripts
make db-stats       # Show database statistics
make setup          # Full first-time setup

Diarization Setup

Speaker diarization requires a HuggingFace token for the gated pyannote models:

1. Accept the model terms at https://huggingface.co/pyannote/speaker-diarization-3.1
2. Create an access token at https://huggingface.co/settings/tokens
3. Add the token to your config:

transcribe:
  diarize: true
  hf_token: "hf_..."

Architecture

RTSP Stream → ffmpeg → WAV chunks → whisper.cpp → Outputs
                                         ↓
                              (optional) whisperX diarization
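
The flow in the diagram maps naturally onto one goroutine per stream fanning in to a shared results channel. This sketch simplifies heavily: `Chunk`, `Result`, and `runPipelines` are hypothetical, chunks arrive as a finite slice rather than a live ffmpeg stream, and `transcribe` stands in for the whisper.cpp call (it must be safe for concurrent use).

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Chunk and Result are hypothetical stand-ins for a WAV chunk and its transcript.
type Chunk struct {
	Stream string
	Seq    int
}

type Result struct {
	Stream string
	Seq    int
	Text   string
}

// runPipelines starts one goroutine per stream and fans all results in to a
// single channel, which an output stage (SQLite, file, Slack) would consume.
func runPipelines(streams map[string][]Chunk, transcribe func(Chunk) string) []Result {
	results := make(chan Result)
	var wg sync.WaitGroup
	for _, chunks := range streams {
		wg.Add(1)
		go func(chunks []Chunk) {
			defer wg.Done()
			for _, c := range chunks {
				results <- Result{Stream: c.Stream, Seq: c.Seq, Text: transcribe(c)}
			}
		}(chunks)
	}
	// Close the fan-in channel once every stream pipeline has finished.
	go func() {
		wg.Wait()
		close(results)
	}()
	var out []Result
	for r := range results {
		out = append(out, r)
	}
	// Goroutines interleave nondeterministically; sort for stable output.
	sort.Slice(out, func(i, j int) bool {
		if out[i].Stream != out[j].Stream {
			return out[i].Stream < out[j].Stream
		}
		return out[i].Seq < out[j].Seq
	})
	return out
}

func main() {
	streams := map[string][]Chunk{
		"front-door": {{Stream: "front-door", Seq: 0}, {Stream: "front-door", Seq: 1}},
		"garage":     {{Stream: "garage", Seq: 0}},
	}
	fake := func(c Chunk) string { return fmt.Sprintf("chunk %d", c.Seq) }
	for _, r := range runPipelines(streams, fake) {
		fmt.Println(r.Stream, r.Seq, r.Text)
	}
}
```

Isolating each camera in its own goroutine means a stalled RTSP connection delays only that stream's transcripts.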

Each stream runs in its own pipeline goroutine.
Audio is captured in 30-second WAV chunks (configurable via chunk_seconds).
Transcripts are stored with timestamps and, when diarization is enabled, speaker labels.
The timeline tool merges and deduplicates transcripts across cameras.

License

MIT

2389-research/scribe