
2389-research/scribe


Scribe

RTSP audio transcription service. Captures audio from IP camera streams and transcribes it locally using whisper.cpp.

Features

  • RTSP Capture - Pulls audio from IP camera streams via ffmpeg
  • Whisper Transcription - Local speech-to-text using whisper.cpp
  • Speaker Diarization - Identify who said what via whisperX (optional)
  • Audio Filters - High-pass filter, volume normalization, gain boost, RNNoise noise reduction
  • Multiple Outputs - SQLite, file, stdout, Slack webhook
  • Timeline View - Query merged transcripts across all cameras

Quick Start

# Install dependencies
make setup

# Copy and edit config
cp config.example.yaml config.yaml
# Edit config.yaml with your RTSP URLs

# Run
make run

Requirements

  • Go 1.22+
  • ffmpeg (for RTSP capture)
  • whisper.cpp (one of whisper-cli, whisper-cpp, or whisper on your PATH)
  • uv (for diarization, optional)

Configuration

streams:
  - name: front-door
    url: rtsp://192.168.1.100:554/stream1
  - name: garage
    url: rtsp://192.168.1.101:554/stream1

chunk_seconds: 30
model: models/ggml-base.en.bin

audio_filter:
  high_pass: true   # Remove HVAC rumble (80Hz cutoff)
  normalize: true   # Consistent volume levels
  boost: true       # Amplify quiet speech from distant mics
  rnnoise: false    # ML noise suppression

transcribe:
  diarize: false    # Speaker diarization (requires HuggingFace token)
  device: mps       # cpu, cuda, or mps

outputs:
  - type: sqlite    # Store in data/scribe.db
  - type: file      # Write to data/transcripts/
  - type: slack     # Post to Slack
    webhook_url: "https://hooks.slack.com/services/XXX/YYY/ZZZ"

Usage

# Run transcription
./scribe -config config.yaml -data ./data

# Query timeline (last hour)
./scribe-timeline -since 1h

# Query today's transcripts
./scribe-timeline -since "00:00"

# Filter by camera
./scribe-timeline -since 1h -stream front-door,garage

Makefile Targets

make build          # Build scribe binary
make build-all      # Build all binaries
make run            # Run with config.yaml
make test           # Run tests
make lint           # Run golangci-lint
make timeline       # Show last hour of transcripts
make db-stats       # Show database statistics
make setup          # Full first-time setup

Diarization Setup

Speaker diarization requires a HuggingFace token for pyannote models:

  1. Accept terms at https://huggingface.co/pyannote/speaker-diarization-3.1
  2. Get token from https://huggingface.co/settings/tokens
  3. Add to config:
    transcribe:
      diarize: true
      hf_token: "hf_..."
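
The configuration rules implied by this README (RTSP stream URLs, a positive `chunk_seconds`, `device` limited to cpu, cuda, or mps, and an `hf_token` whenever `diarize` is on) can be checked before any pipeline starts. A sketch over a hypothetical `Config` struct with YAML decoding omitted; scribe's own validation may differ.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
)

// Stream and Config are hypothetical mirrors of the config.yaml fields checked here.
type Stream struct{ Name, URL string }

type Config struct {
	Streams      []Stream
	ChunkSeconds int
	Diarize      bool
	HFToken      string
	Device       string
}

// validate reports every problem at once via errors.Join (Go 1.20+).
func validate(c Config) error {
	var errs []error
	if len(c.Streams) == 0 {
		errs = append(errs, errors.New("at least one stream is required"))
	}
	seen := map[string]bool{}
	for _, s := range c.Streams {
		if seen[s.Name] {
			errs = append(errs, fmt.Errorf("duplicate stream name %q", s.Name))
		}
		seen[s.Name] = true
		if u, err := url.Parse(s.URL); err != nil || u.Scheme != "rtsp" {
			errs = append(errs, fmt.Errorf("stream %q: url must start with rtsp://", s.Name))
		}
	}
	if c.ChunkSeconds <= 0 {
		errs = append(errs, errors.New("chunk_seconds must be positive"))
	}
	switch c.Device {
	case "", "cpu", "cuda", "mps": // empty: fall back to a default (assumption)
	default:
		errs = append(errs, fmt.Errorf("device %q: want cpu, cuda, or mps", c.Device))
	}
	if c.Diarize && c.HFToken == "" {
		errs = append(errs, errors.New("transcribe.diarize requires transcribe.hf_token"))
	}
	return errors.Join(errs...) // nil when errs is empty
}

func main() {
	cfg := Config{
		Streams:      []Stream{{Name: "front-door", URL: "rtsp://192.168.1.100:554/stream1"}},
		ChunkSeconds: 30,
		Device:       "mps",
		Diarize:      true,
	}
	if err := validate(cfg); err != nil {
		fmt.Println("config error:", err)
	}
}
```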

Architecture

RTSP Stream → ffmpeg → WAV chunks → whisper.cpp → Outputs
                                         ↓
                              (optional) whisperX diarization

  • Each stream runs in its own pipeline goroutine.
  • Audio is captured in 30-second WAV chunks (configurable via chunk_seconds).
  • Transcripts are stored with timestamps and, when diarization is enabled, speaker labels.
  • The timeline tool (scribe-timeline) merges and deduplicates transcripts across cameras.

License

MIT
