Prism Demo

Prism

Semantic Search for Autonomous Vehicle & Robotics Datasets

Find any frame in terabytes of sensor data using natural language. 100% local. Zero cloud dependencies.


Quick Start · How It Works · Features · Docs · Contributing


The Problem

Engineers working with autonomous vehicles and robotics generate terabytes of sensor data. Finding specific scenarios—a pedestrian jaywalking, a cyclist at dusk, a truck blocking an intersection—means hours of manual review or brittle keyword searches through metadata.

The Solution

Prism lets you search your image and video datasets with natural language:

"red car turning left at intersection"

"pedestrian with umbrella crossing street"

"construction zone with orange cones"

Prism uses state-of-the-art vision AI (YOLOv8 for detection, Google SigLIP for semantic understanding) running entirely on your local machine. Your data never leaves your network.


Quick Start

Requirements

  • Python 3.9+ (GPU recommended: CUDA or Apple MPS)
  • Go 1.21+

Installation

```shell
# Clone the repository
git clone https://github.com/sjanney/prism.git
cd prism

# Install dependencies and build
make install
make build

# Launch Prism
./run_prism.sh
```

Your First Search

  1. Select Index New Data → enter data/sample → press Enter
  2. Wait for indexing to complete (~10 seconds)
  3. Select Search Dataset → type car → press Enter

You're now searching images with natural language.

💡 First run downloads AI models (~2GB). Subsequent launches are instant.


How It Works

```text
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│   Images/   │────▶│   YOLOv8    │────▶│   SigLIP    │
│   Videos    │     │  Detection  │     │  Embedding  │
└─────────────┘     └─────────────┘     └─────────────┘
                                              │
                                              ▼
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│   Search    │◀────│   Vector    │◀────│   SQLite    │
│   Results   │     │  Similarity │     │   + NumPy   │
└─────────────┘     └─────────────┘     └─────────────┘
```
  1. Indexing: For each image/video frame, Prism detects objects (cars, pedestrians, signs) and generates semantic embeddings
  2. Storage: Embeddings are stored locally in SQLite with NumPy vector blobs
  3. Search: Your query is embedded and compared against all indexed frames using cosine similarity
  4. Results: Matching frames are ranked and displayed in the TUI
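The storage and search steps above can be sketched in a few lines of Python. This is an illustrative sketch, not Prism's actual code: the `frames` table name, column layout, and brute-force scan are assumptions made for the example.

```python
import sqlite3
import numpy as np

def store(conn: sqlite3.Connection, frame_path: str, emb: np.ndarray) -> None:
    """Persist one frame's embedding as a raw NumPy blob in SQLite."""
    conn.execute("INSERT INTO frames (path, emb) VALUES (?, ?)",
                 (frame_path, emb.astype(np.float32).tobytes()))

def search(conn: sqlite3.Connection, query_emb: np.ndarray, top_k: int = 5):
    """Rank all indexed frames against the query by cosine similarity."""
    rows = conn.execute("SELECT path, emb FROM frames").fetchall()
    paths = [p for p, _ in rows]
    mat = np.stack([np.frombuffer(b, dtype=np.float32) for _, b in rows])
    # Cosine similarity: unit-normalize both sides, then take dot products.
    mat = mat / np.linalg.norm(mat, axis=1, keepdims=True)
    q = query_emb / np.linalg.norm(query_emb)
    scores = mat @ q
    order = np.argsort(scores)[::-1][:top_k]
    return [(paths[i], float(scores[i])) for i in order]
```

A linear scan like this is simple and, for datasets that fit on one machine, fast enough for the sub-second queries described below.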

Why Local-First?

| Challenge | Prism's Approach |
| --- | --- |
| Data Sensitivity | Proprietary AV data stays on your machine; no cloud uploads |
| Cost | No egress fees, no API costs, no subscriptions |
| Speed | Sub-second queries on local hardware |
| Compliance | Full control for GDPR, SOC 2, and enterprise security requirements |
| Offline | Works without internet after the initial model download |

Features

Core Capabilities

| Feature | Description |
| --- | --- |
| Natural Language Search | Query in plain English: "truck at loading dock" |
| Video Indexing | Automatically extracts and indexes frames from MP4, AVI, MOV, and MKV |
| Object-Aware | YOLOv8 detects 80+ object classes for context-rich indexing |
| Cross-Platform | Runs on macOS (MPS), Linux (CUDA), and Windows (CPU/CUDA) |
| Terminal UI | Beautiful, keyboard-driven interface with real-time progress |
| gRPC API | Integrate Prism into your existing pipelines |

Video Support

Prism extracts frames intelligently:

  • 1 frame per second by default (configurable)
  • Max 300 frames per video to prevent index bloat
  • Frames reference source video with timestamps
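The sampling policy above (1 fps by default, capped at 300 frames) comes down to simple index arithmetic. The helper below is an illustrative sketch of that logic, not Prism's actual implementation; the names and defaults mirror the config keys documented further down.

```python
def frame_indices(total_frames: int, video_fps: float,
                  frames_per_second: float = 1.0,
                  max_frames: int = 300) -> list[int]:
    """Pick which frame indices to extract: sample at roughly
    `frames_per_second`, capped at `max_frames` per video."""
    step = max(1, round(video_fps / frames_per_second))
    return list(range(0, total_frames, step))[:max_frames]

def timestamp(index: int, video_fps: float) -> float:
    """Source-video timestamp (in seconds) for an extracted frame index."""
    return index / video_fps
```

For example, a 60-second clip at 30 fps yields 60 sampled frames, while a 20-minute clip would yield 1200 and gets capped at 300.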

Configuration

```yaml
# ~/.prism/config.yaml
video:
  enabled: true
  frames_per_second: 1.0
  max_frames_per_video: 300

device: auto  # auto, cuda, mps, cpu
```
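The `device: auto` setting implies a fallback chain across backends. A minimal sketch of how that resolution could work; the availability flags stand in for `torch.cuda.is_available()` and `torch.backends.mps.is_available()`, and this is an assumption about the behavior, not Prism's actual code:

```python
def resolve_device(setting: str = "auto",
                   cuda_available: bool = False,
                   mps_available: bool = False) -> str:
    """Map the `device` config value to a concrete backend.
    'auto' prefers CUDA, then Apple MPS, then CPU; any explicit
    value is passed through unchanged."""
    if setting != "auto":
        return setting
    if cuda_available:
        return "cuda"
    if mps_available:
        return "mps"
    return "cpu"
```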

Documentation

| Guide | Description |
| --- | --- |
| Getting Started | Detailed installation and first run |
| Architecture | System design and data flow |
| Configuration | All configuration options |
| Benchmarks | Performance testing and diagnostics |
| API Reference | gRPC service documentation |
| Error Codes | Troubleshooting common issues |

Tech Stack

| Component | Technology |
| --- | --- |
| Frontend | Go, Bubbletea, Lipgloss |
| Backend | Python, PyTorch, gRPC |
| Detection | YOLOv8 (Ultralytics) |
| Embeddings | SigLIP-SO400M (Google) |
| Storage | SQLite, NumPy |

Roadmap

  • Prism Pro: Unlimited indexing, S3/GCP ingestion, remote GPU mode
  • Export: YOLO/COCO format output for training pipelines
  • Clustering: Automatic scene grouping and anomaly detection
  • Web UI: Browser-based interface option

Contributing

We welcome contributions! Please see CONTRIBUTING.md for guidelines.

```shell
# Run the test suite
make test

# Check formatting
make fmt
```

License

Apache 2.0 — see LICENSE for details.


Built with ❤️ for the AV & robotics community by Shane Janney
