Stream Sentry

HDMI passthrough with real-time ML-based ad detection and blocking using dual NPUs:

PaddleOCR on RK3588 NPU (~300ms per frame)
Qwen3-VL-2B on Axera LLM 8850 NPU (~1.5s per frame)
Audio passthrough with auto-mute during ads
Spanish vocabulary practice during ad blocks!
Web UI for remote monitoring and control via Tailscale

Overview

Stream Sentry captures video from HDMI-RX, displays it via GStreamer kmssink at 30fps, while running two ML workers concurrently to detect ads. When ads are detected, instantly switches to a blocking overlay with Spanish vocabulary practice.

Key features:

Instant ad blocking - GStreamer input-selector switches in ~1 frame (no black screen!)
Audio passthrough - HDMI audio with instant mute during ads, silent keepalive prevents stalls
Dual NPU inference - OCR and VLM run concurrently on separate NPUs
Web UI - Remote monitoring/control via Tailscale (mobile-friendly)
MPP hardware encoding - 60fps 4K streaming via RK3588 VPU
No X11 required - Pure DRM/KMS display via kmssink
Spanish learning - Practice vocabulary while ads are blocked
30fps display - Smooth passthrough without stutter
Set and forget - systemd service, health monitoring, automatic recovery

┌──────────────┐     ┌────────────────────┐     ┌─────────────────────┐
│   HDMI-RX    │────▶│     ustreamer      │────▶│  GStreamer Pipeline │
│ /dev/video0  │     │ (MJPEG encoding)   │     │  (input-selector)   │
│  4K@30fps    │     │                    │     │                     │
│              │     │   :9090/stream     │     │  Video ◄──► Blocking│
│  Audio ──────┼─────┼────────────────────┼────▶│   INSTANT SWITCH!   │
│  hw:4,0      │     │   :9090/snapshot   │     │         │           │
└──────────────┘     └────────┬───────────┘     │    kmssink + audio  │
                              │                 │   (auto-mute on ad) │
                              │                 └─────────────────────┘
                              │
                              ▼ HTTP snapshot (~150ms)
              ┌───────────────┴───────────────┐
              │                               │
     ┌────────┴────────┐           ┌──────────┴──────────┐
     │   OCR Worker    │           │    VLM Worker       │
     │   PaddleOCR     │           │   Qwen3-VL-2B       │
     │   RK3588 NPU    │           │   Axera LLM 8850    │
     │   ~400ms        │           │   ~1.5s             │
     └─────────────────┘           └─────────────────────┘

Hardware

Board: Radxa with RK3588
HDMI-RX: /dev/video0 (rk_hdmirx driver)
HDMI-RX Audio: hw:4,0 (rockchip,hdmiin @ 48kHz)
HDMI-TX Audio: hw:0,0 (rockchip-hdmi0)
OCR NPU: RK3588 6 TOPS NPU
VLM NPU: Axera M5 LLM 8850 (AX650N) via M.2
Supported resolutions: Up to 4K@30fps

Quick Start

cd /home/radxa/StreamSentry

# Install dependencies (first time only)
sudo apt install -y imagemagick ffmpeg curl

# Build ustreamer (first time only)
git clone https://github.com/pikvm/ustreamer.git
cd ustreamer && make -j$(nproc) && sudo cp ustreamer /usr/local/bin/

# Python packages
pip3 install pyclipper shapely numpy opencv-python pexpect PyGObject

# Run everything
python3 stream_sentry.py

# Check HDMI signal only
python3 stream_sentry.py --check-signal

Command Line Options

Option	Description
`--device PATH`	Video device (default: /dev/video0)
`--screenshot-dir DIR`	Screenshot directory (default: screenshots)
`--ocr-timeout SEC`	Skip OCR frames taking longer than this (default: 1.5s)
`--max-screenshots N`	Keep only N recent screenshots (default: 50, 0=unlimited)
`--check-signal`	Check HDMI signal and exit
`--connector-id N`	DRM connector ID (default: 215)
`--plane-id N`	DRM plane ID (default: 72)
`--webui-port N`	Web UI port (default: 8080)

Performance

Metric	Value
Display framerate	30fps (video), 2-3fps (blocking overlay)
ustreamer stream	~60fps (MPP hardware encoding at 4K)
Ad blocking switch	INSTANT (~1 frame)
Snapshot capture	~150ms (4K JPEG download)
OCR latency	250-400ms per frame (960x540 input)
VLM latency	1.3-1.5s per frame
VLM model load	~40s (once at startup)
JPEG quality	80% (MPP hardware encoder)

FPS Monitoring: Output FPS is logged every 60 seconds via health monitor.

Ad Detection Logic (Weighted Model)

OCR is PRIMARY (high trust):

Triggers blocking immediately on 1 detection
Needs 3 consecutive no-ads to stop blocking

VLM is SECONDARY (contextual trust):

If OCR detected within last 5s: VLM is trusted
If no recent OCR: VLM needs 5 consecutive detections to trigger alone
Needs 2 consecutive no-ads to stop

Anti-flicker protection:

Minimum 3 seconds blocking duration
Both OCR and VLM must agree to stop (when VLM has context)

Blocking Overlay

When ads are detected, the screen shows:

Header: BLOCKING (OCR), BLOCKING (VLM), or BLOCKING (OCR+VLM)
Spanish word: Random intermediate-level vocabulary
Translation: English meaning
Example: Sentence using the word
Rotation: New vocabulary every 11-15 seconds

Example display:

BLOCKING (OCR)

aprovechar
= to take advantage of

Hay que aprovechar el tiempo.

Spanish Vocabulary

120+ intermediate-level words and phrases including:

Common verbs: aprovechar, lograr, desarrollar, destacar, enfrentar...
Reflexive verbs: comprometerse, enterarse, arrepentirse, darse cuenta...
Adjectives: disponible, imprescindible, agotado, capaz, dispuesto...
Nouns: desarrollo, comportamiento, conocimiento, ambiente, herramienta...
Expressions: sin embargo, a pesar de, de repente, hoy en dia, cada vez mas...
False friends: embarazada, exito, sensible, libreria, asistir...
Subjunctive triggers: es importante que, espero que, dudo que, ojala...
Time expressions: hace poco, dentro de poco, a la larga, de antemano...

Ad Keywords Detected

Exact phrases (match anywhere):

skip ad, skip ads, skipad, skipads
sponsored, advertisement, ad break
shop now, buy now, promoted

Whole words:

skip, sponsor

Log Output

2025-12-24 02:32:47 [I] Starting Stream Sentry...
2025-12-24 02:32:47 [I] HDMI signal: 3840x2160 @ 30.0fps
2025-12-24 02:32:50 [I] ustreamer started on port 9090
2025-12-24 02:32:52 [I] Display pipeline started - 30 FPS with instant ad blocking
2025-12-24 02:32:52 [I] [AudioPassthrough] Audio passthrough started
2025-12-24 02:33:33 [I] VLM model loaded successfully
2025-12-24 02:33:45 [W] AD BLOCKING STARTED (OCR)
2025-12-24 02:33:45 [I] [DRMAdBlocker] Switching to blocking overlay (ocr)
2025-12-24 02:33:45 [I] [AudioPassthrough] Audio MUTED
2025-12-24 02:34:04 [W] AD BLOCKING ENDED after 19.3s
2025-12-24 02:34:04 [I] [DRMAdBlocker] Switching to video stream
2025-12-24 02:34:04 [I] [AudioPassthrough] Audio UNMUTED

Project Structure

stream-sentry/
├── stream_sentry.py      # Main entry point
├── stream_sentry.spec    # PyInstaller build spec
├── src/
│   ├── ocr.py            # PaddleOCR on RKNN NPU
│   ├── vlm.py            # Qwen3-VL-2B on Axera NPU
│   ├── ad_blocker.py     # GStreamer video pipeline with input-selector
│   ├── audio.py          # GStreamer audio passthrough with mute control
│   ├── health.py         # Health monitor for all subsystems
│   ├── webui.py          # Flask web UI server
│   ├── templates/
│   │   └── index.html    # Web UI single-page app
│   └── static/
│       └── style.css     # Web UI dark theme styles
├── install.sh            # Install as systemd service
├── uninstall.sh          # Remove systemd service
├── stop.sh               # Graceful shutdown script
├── stream-sentry.service # systemd service file
├── models/
│   └── paddleocr/        # RKNN models (or symlink)
├── screenshots/
│   ├── ocr/              # Ad detection screenshots (auto-truncated)
│   └── non_ad/           # Non-ad screenshots for VLM training
├── README.md             # This file
├── CLAUDE.md             # Development notes
└── AUDIO.md              # Audio implementation details

VLM Model

The VLM uses Qwen3-VL-2B-INT4 on the Axera LLM 8850 NPU:

Metric	Value
Accuracy	96% on ad detection benchmark
Inference	1.3-1.7s per frame
Model load	~40s (once at startup)
Prompt	"Is this an advertisement? Answer Yes or No."

Model location:

/home/radxa/axera_models/Qwen3-VL-2B/
├── main_axcl_aarch64_rebuilt
├── qwen3_tokenizer.txt
└── Qwen3-VL-2B-Instruct-AX650-c128_p1152-int4/

Dependencies

# System packages
sudo apt install -y imagemagick ffmpeg curl libevent-dev libjpeg-dev libbsd-dev

# Build ustreamer with MPP hardware encoding
git clone https://github.com/garagehq/ustreamer.git /home/radxa/ustreamer-garagehq
cd /home/radxa/ustreamer-garagehq && make WITH_MPP=1
cp ustreamer /home/radxa/ustreamer-patched

# Python packages
pip3 install --break-system-packages pyclipper shapely numpy opencv-python pexpect PyGObject flask requests

Troubleshooting

No HDMI signal: When started without HDMI input, Stream Sentry displays "NO HDMI INPUT" on screen and waits for user to connect a source. To check signal manually:

v4l2-ctl -d /dev/video0 --query-dv-timings

ustreamer fails to start:

fuser -k /dev/video0  # Kill processes using device
pkill -9 ustreamer    # Kill orphaned ustreamer

VLM not loading:

axcl_smi              # Check Axera card status
ls /home/radxa/axera_models/Qwen3-VL-2B/  # Verify model files

Display issues:

modetest -M rockchip -p | grep -A5 "plane\[72\]"  # Check DRM plane
modetest -M rockchip -c | grep HDMI               # Check connector

OCR not detecting:

curl http://localhost:9090/snapshot -o test.jpg  # Test snapshot

Audio issues:

# Check audio devices
arecord -l                                        # List capture devices
aplay -l                                          # List playback devices
v4l2-ctl -d /dev/video0 --get-ctrl audio_present # Check if HDMI has audio

# Test audio passthrough with silent keepalive (prevents stalls)
gst-launch-1.0 \
  alsasrc device=hw:4,0 ! "audio/x-raw,rate=48000,channels=2,format=S16LE" ! \
  queue ! audioconvert ! "audio/x-raw,rate=48000,channels=2,format=F32LE" ! mix. \
  audiotestsrc wave=silence is-live=true ! "audio/x-raw,rate=48000,channels=2,format=F32LE" ! mix. \
  audiomixer name=mix ! alsasink device=hw:0,0 sync=false

The audiomixer with audiotestsrc wave=silence keeps the pipeline alive even when the HDMI source has no audio (between songs, during silence, etc.).

Color correction: Adjust saturation/contrast/brightness in src/ad_blocker.py via the videobalance element:

# In _init_pipeline(), modify:
videobalance saturation=0.85  # Range 0-2, default 1.0

Running as a Service

Stream Sentry can run as a systemd service for 24/7 unattended operation:

# Install as systemd service
sudo ./install.sh

# View logs
journalctl -u stream-sentry -f

# Stop service
sudo systemctl stop stream-sentry

# Uninstall
sudo ./uninstall.sh

The service automatically:

Starts on boot
Restarts on crashes (5 attempts per 5 minutes)
Disables X11/display managers to avoid conflicts

Health Monitoring

Stream Sentry includes a unified health monitor that runs in the background:

What it monitors:

HDMI signal (detects unplug/replug, shows "NO SIGNAL" message)
ustreamer health (HTTP health check, not just PID)
Video pipeline health (buffer flow, pipeline state)
Output FPS (logged every 60s, warning if < 25 fps)
VLM/OCR health (consecutive timeout detection)
Memory usage (warning at 80%, critical at 90%)
Disk space (warning below 500MB)

Automatic recovery:

HDMI signal lost → Shows "NO SIGNAL" overlay, mutes audio
HDMI signal restored → Restarts ustreamer + video pipeline, unmutes audio (~7s recovery)
ustreamer stall → Restarts ustreamer + video pipeline
Video pipeline stall → Restarts pipeline with exponential backoff (1s-30s)
VLM failure → Degrades to OCR-only mode, attempts VLM restart
Critical memory → Triggers garbage collection, cleans old screenshots

HDMI Cable Robustness:

Jiggling or unplugging HDMI cables triggers automatic recovery
No manual restart required - system recovers automatically
Video pipeline watchdog detects stalls (10s threshold)
Exponential backoff prevents restart storms

Graceful degradation:

If VLM fails repeatedly (5+ consecutive timeouts), switches to OCR-only mode
VLM restart is attempted after 30 seconds
OCR continues working even if VLM is disabled
30-second startup grace period before health checks begin

Web UI

Stream Sentry includes a mobile-friendly web UI for remote monitoring and control:

Access:

Local: http://localhost:8080
Tailscale: http://<tailscale-hostname>:8080
Direct video stream: http://<hostname>:9090/stream

Features:

Live video feed - Real-time MJPEG stream from ustreamer
Status display - Blocking state, FPS, HDMI resolution, uptime
Pause controls - 1/2/5/10 minute presets to pause ad blocking
Detection history - Recent OCR/VLM detections with timestamps
Log viewer - Collapsible log output for debugging

Pause & Training Data: When you pause blocking via the WebUI, Stream Sentry automatically saves a screenshot to screenshots/non_ad/. This creates training data for improving the VLM:

Pausing = "this is NOT an ad" (false positive correction)
Screenshots saved with non_ad_ prefix for easy labeling
Use these to fine-tune the VLM and reduce false positives

Housekeeping

Log File:

Location: /tmp/stream_sentry.log
Max 5MB per log file
Keeps 3 backup files (stream_sentry.log.1, .2, .3)

Screenshot Management:

Ad screenshots: screenshots/ocr/ (auto-truncated to last 50)
Non-ad screenshots: screenshots/non_ad/ (saved when pausing via WebUI)
Configurable via --max-screenshots (0 = unlimited)

Audio Error Recovery:

Watchdog checks every 3 seconds, restarts if stalled for 6+ seconds
Exponential backoff for restart attempts (1s → 2s → 4s → ... → 60s max)
No maximum restart limit - always tries to recover
Backoff resets after 5 seconds of sustained audio flow

Video Pipeline Recovery:

Watchdog checks every 3 seconds, restarts if stalled for 10+ seconds
Monitors GStreamer pipeline state and buffer flow
Handles HTTP connection errors (ustreamer restart)
Handles unexpected EOS events
Exponential backoff for restart attempts (1s → 2s → 4s → ... → 30s max)
Backoff resets after 10 seconds of sustained buffer flow
Preserves blocking overlay state across restarts

Building Executable

Stream Sentry can be compiled into a standalone executable using PyInstaller:

# Install PyInstaller
pip3 install pyinstaller

# Build executable
pyinstaller stream_sentry.spec

# Output will be in dist/stream_sentry

Note: The executable still requires external model files at runtime:

PaddleOCR models in standard location
VLM models in /home/radxa/axera_models/Qwen3-VL-2B/

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 19 Commits
src		src
tests		tests
.gitignore		.gitignore
AUDIO.md		AUDIO.md
BENCHMARK-STREAMING.md		BENCHMARK-STREAMING.md
CLAUDE.md		CLAUDE.md
DISPLAY_COLOR_QUALITY.md		DISPLAY_COLOR_QUALITY.md
FPS_DEBUGGING.md		FPS_DEBUGGING.md
LICENSE		LICENSE
LICENSE-DOCS		LICENSE-DOCS
LICENSE-HARDWARE		LICENSE-HARDWARE
README.md		README.md
compare_colors.sh		compare_colors.sh
config.yaml		config.yaml
disable-lockscreen.sh		disable-lockscreen.sh
install.sh		install.sh
requirements.txt		requirements.txt
start.sh		start.sh
stop.sh		stop.sh
stream-sentry.service		stream-sentry.service
stream-sentry.sudoers		stream-sentry.sudoers
stream_sentry.py		stream_sentry.py
stream_sentry.spec		stream_sentry.spec
uninstall.sh		uninstall.sh

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Licenses found

Uh oh!

Repository files navigation

Stream Sentry

Overview

Hardware

Quick Start

Command Line Options

Performance

Ad Detection Logic (Weighted Model)

Blocking Overlay

Spanish Vocabulary

Ad Keywords Detected

Log Output

Project Structure

VLM Model

Dependencies

Troubleshooting

Running as a Service

Health Monitoring

Web UI

Housekeeping

Building Executable

License

About

Licenses found

Uh oh!

Releases

Packages

Languages

License

Licenses found

garagehq/StreamSentry

Folders and files

Latest commit

History

Repository files navigation

Stream Sentry

Overview

Hardware

Quick Start

Command Line Options

Performance

Ad Detection Logic (Weighted Model)

Blocking Overlay

Spanish Vocabulary

Ad Keywords Detected

Log Output

Project Structure

VLM Model

Dependencies

Troubleshooting

Running as a Service

Health Monitoring

Web UI

Housekeeping

Building Executable

License

About

Resources

License

Licenses found

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages