Real-time AI video transformation using StreamDiffusion and NDI (Network Device Interface) for Windows.
This tool captures video from an NDI source, applies real-time AI transformation using StreamDiffusion, and outputs the result as a new NDI stream.
Pipeline:
NDI Input → StreamDiffusion (img2img) → NDI Output
- List and select from available NDI sources
- Auto-select NDI sources by name (text search)
- Real-time AI-powered video transformation with SD-Turbo
- GPU-accelerated processing with xformers or TensorRT
- Output as NDI stream for integration with OBS, vMix, Wirecast, etc.
- Easy Windows batch file launchers
- OS: Windows 10/11
- GPU: NVIDIA GPU with 8GB+ VRAM (RTX 2060 or better recommended)
- CUDA: CUDA 12.1+
- Python: 3.10
- Disk Space: ~10GB for models and dependencies
Download and install CUDA Toolkit 12.1: https://developer.nvidia.com/cuda-12-1-0-download-archive
Verify installation:
```
nvcc --version
```
Download Miniconda for Windows: https://docs.conda.io/en/latest/miniconda.html
Install to a location with plenty of space (e.g., C:\miniconda3)
Download and install the NDI Tools: https://ndi.tv/tools/
This includes the NDI SDK required for video streaming.
```
conda create -n streamdiffusion python=3.10 -y
conda activate streamdiffusion
```

Install PyTorch with CUDA 12.1 support:

```
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
```

Verify CUDA is available:

```
python -c "import torch; print(torch.cuda.is_available())"
```

Should print: `True`
Install xformers:

```
pip install xformers==0.0.22.post7
```

Clone and install StreamDiffusion:

```
cd /d C:\Projects
git clone https://github.com/cumulo-autumn/StreamDiffusion.git
cd StreamDiffusion
python -m pip install -e .
```

Install TinyVAE (required):
```
python -m streamdiffusion.tools.install-tensorrt
```

Install TensorRT for 2-3x faster inference:
```
pip install tensorrt==8.6.1 --extra-index-url https://pypi.nvidia.com
pip install polygraphy onnx-graphsurgeon --extra-index-url https://pypi.nvidia.com
```

Note: TensorRT compilation takes 5-10 minutes on first run but is cached for subsequent runs.
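A quick way to confirm the TensorRT Python bindings installed correctly:

```python
import tensorrt

# Should print 8.6.1 to match the pinned install above.
print(tensorrt.__version__)
```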
```
cd /d C:\Projects
git clone https://github.com/ktamas77/streamdiffusion-ndi.git
cd streamdiffusion-ndi
conda activate streamdiffusion
pip install ndi-python
```

Edit start.bat and update these paths for your system:
```bat
REM Set Python path - UPDATE THIS to match your conda environment location
set PYTHON_BIN=C:\miniconda3\envs\streamdiffusion\python.exe

REM Set StreamDiffusion path - UPDATE THIS to match your StreamDiffusion installation
set STREAMDIFFUSION_PATH=C:\Projects\StreamDiffusion\streamdiffusion_repo
```

Option 1: Interactive Mode
```
start.bat
```

- Lists all available NDI sources
- Prompts you to select one
- Uses xformers acceleration (fast startup)
Option 2: Manual Command
```
python main.py --acceleration tensorrt --ndi-source "your-source-name"
```

Command-line options:

```
--timeout <seconds>             NDI source search timeout (default: 5)
--acceleration <mode>           xformers or tensorrt (default: xformers)
--device <device>               cuda or cpu (default: cuda)
--ndi-source <name>             Auto-select NDI source by name (text search)
--streamdiffusion-path <path>   Path to StreamDiffusion repository
```
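For reference, these flags correspond to a plain argparse setup; a minimal sketch using the documented names and defaults (illustrative, not the actual main.py source):

```python
import argparse

parser = argparse.ArgumentParser(description="StreamDiffusion NDI processor")
parser.add_argument("--timeout", type=int, default=5,
                    help="NDI source search timeout in seconds")
parser.add_argument("--acceleration", choices=["xformers", "tensorrt"],
                    default="xformers", help="inference acceleration backend")
parser.add_argument("--device", choices=["cuda", "cpu"], default="cuda",
                    help="compute device")
parser.add_argument("--ndi-source",
                    help="auto-select NDI source by name (text search)")
parser.add_argument("--streamdiffusion-path",
                    help="path to the StreamDiffusion repository")
args = parser.parse_args()
```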
Auto-select any source containing "obs":
```
python main.py --ndi-source obs
```

Use TensorRT for maximum performance:

```
python main.py --acceleration tensorrt
```

Longer search timeout for remote sources:

```
python main.py --timeout 10
```

The processed video is available as an NDI source named:
```
streamdiffusion-ndi-render
```
Add this as a source in:
- OBS Studio (NDI plugin)
- vMix
- Wirecast
- Any NDI-compatible application
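To sanity-check the output stream from Python rather than OBS/vMix, here is a short receiver sketch using the same ndi-python bindings installed earlier. The calls mirror the library's bundled examples; treat the exact API surface as an assumption:

```python
import sys
import NDIlib as ndi  # provided by the ndi-python package

# Start the NDI runtime.
if not ndi.initialize():
    sys.exit("NDI runtime could not be initialized")

# Discover sources on the network (5 s, matching the tool's default timeout).
finder = ndi.find_create_v2()
ndi.find_wait_for_sources(finder, 5000)
sources = ndi.find_get_current_sources(finder)

# Pick the processed stream by name (same text-search idea as --ndi-source).
target = next((s for s in sources
               if "streamdiffusion-ndi-render" in s.ndi_name), None)
if target is None:
    sys.exit("streamdiffusion-ndi-render not found")

# Connect and grab a single video frame to confirm the stream is live.
receiver = ndi.recv_create_v3(ndi.RecvCreateV3())
ndi.recv_connect(receiver, target)
frame_type, video, audio, metadata = ndi.recv_capture_v2(receiver, 5000)
if frame_type == ndi.FRAME_TYPE_VIDEO:
    print(f"Receiving {video.xres}x{video.yres} frames")
    ndi.recv_free_video_v2(receiver, video)
```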
When running, you'll see output like this:
```
Searching for NDI sources (timeout: 5s)...
Found 2 NDI source(s):
  [0] MY-PC (OBS Studio)
  [1] DESKTOP-ABC (vMix - Camera 1)
Select NDI source [0-1]: 0
Selected source: MY-PC (OBS Studio)
Creating NDI receiver...
Creating NDI sender: streamdiffusion-ndi-render
Initializing StreamDiffusion...
  Model: stabilityai/sd-turbo
  Device: cuda
  Resolution: 512x512
  Prompt: cyberpunk, neon lights, dark background, glowing, futuristic
  Acceleration: xformers
StreamDiffusion initialized successfully!
================================================================================
STREAMING STARTED
================================================================================
Input Source: MY-PC (OBS Studio)
Input Resolution: 1920x1080
Internal Resolution: 512x512
Output Source: streamdiffusion-ndi-render
Output Resolution: 1920x1080
Model: stabilityai/sd-turbo
Device: cuda:0
Acceleration: xformers
Prompt: cyberpunk, neon lights, dark background, glowing, futuristic
Negative Prompt: black and white, blurry, low resolution, pixelated, pixel art, low quality, low fidelity
================================================================================
Press Ctrl+C to stop
2025-10-25 14:23:45 | FPS: 18.34 | RX: 2.45 GB (24.3 MB/s) | TX: 3.12 GB (31.4 MB/s) | Frames: 1834
```
Stats Legend:
- FPS: Average frames per second since start
- RX: Total data received from input source (per-second rate)
- TX: Total data sent to output stream (per-second rate)
- Frames: Total frames processed
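The legend maps onto a handful of running counters; a minimal sketch of how such a stats line can be produced (illustrative only, not the actual main.py implementation):

```python
import time

class StreamStats:
    """Running counters behind a log line like the one above."""

    def __init__(self):
        self.start = time.time()
        self.frames = 0
        self.rx_bytes = 0
        self.tx_bytes = 0

    def update(self, rx: int, tx: int) -> None:
        # Call once per processed frame with the frame's byte counts.
        self.frames += 1
        self.rx_bytes += rx
        self.tx_bytes += tx

    def line(self) -> str:
        elapsed = max(time.time() - self.start, 1e-6)
        fps = self.frames / elapsed  # average FPS since start
        return (f"FPS: {fps:.2f} | "
                f"RX: {self.rx_bytes / 1e9:.2f} GB "
                f"({self.rx_bytes / elapsed / 1e6:.1f} MB/s) | "
                f"TX: {self.tx_bytes / 1e9:.2f} GB "
                f"({self.tx_bytes / elapsed / 1e6:.1f} MB/s) | "
                f"Frames: {self.frames}")
```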
When you press Ctrl+C:
```
Stopping...
Cleaning up...
Processed 1834 frames in 100.0s (18.34 FPS average)
Done!
```
xformers (default):

- Startup: Fast (30-60 seconds)
- Performance: Good (~15-20 FPS on RTX 3080)
- Use case: Quick testing, development

TensorRT:

- Startup: Slow first time (5-10 minutes compilation), then fast
- Performance: Excellent (~25-35 FPS on RTX 3080)
- Use case: Production, maximum performance
- Note: Engine is cached after first compilation
Edit main.py to customize:

```python
DEFAULT_PROMPT = "cyberpunk, neon lights, dark background, glowing, futuristic"
DEFAULT_NEGATIVE_PROMPT = "black and white, blurry, low resolution"
MODEL_ID = "stabilityai/sd-turbo"  # or any Stable Diffusion model
WIDTH = 512   # Higher = better quality but slower
HEIGHT = 512
```

Example style prompts:

```python
# Anime style
DEFAULT_PROMPT = "anime style, detailed, vibrant colors, studio quality"

# Oil painting
DEFAULT_PROMPT = "oil painting, classical art style, detailed brushstrokes"

# Sketch
DEFAULT_PROMPT = "pencil sketch, hand-drawn, artistic, detailed lines"

# Watercolor
DEFAULT_PROMPT = "watercolor painting, soft colors, artistic, flowing"
```

Performance tips:

- Use TensorRT for best performance (2-3x faster than xformers)
- Close other GPU applications (browsers, games, etc.)
- Lower resolution if needed (try 256x256 for maximum speed)
- Reduce denoising steps (line 37: `T_INDEX_LIST = [35, 45]`; see the sketch below)
- Use the SD-Turbo model (already configured, optimized for speed)
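Applied to the constants in main.py, those tips boil down to edits like these (values are illustrative, not tuned recommendations):

```python
# Quarter the pixel count for maximum speed (quality drops accordingly).
WIDTH = 256
HEIGHT = 256

# Fewer denoising indices mean fewer U-Net passes per frame (main.py line 37).
# A single index is the fastest setting; [35, 45] is the shipped default.
T_INDEX_LIST = [40]
```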
| GPU | Resolution | xformers | TensorRT |
|---|---|---|---|
| RTX 4090 | 512x512 | ~30 FPS | ~50+ FPS |
| RTX 3080 | 512x512 | ~15 FPS | ~30 FPS |
| RTX 2060 | 512x512 | ~8 FPS | ~15 FPS |
No NDI sources found:

- Ensure NDI Tools are installed
- Check that NDI sources are on the same network
- Increase the timeout: `python main.py --timeout 10`
- Verify the firewall isn't blocking NDI (port 5960)
If `ndi-python` fails to import, reinstall it:

```
pip install ndi-python
```

Out of memory or slow processing:

- Lower resolution in `main.py` (try 256x256)
- Close other GPU applications
- Reduce batch size (line 38: `FRAME_BUFFER_SIZE = 1`)
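To see how close the pipeline runs to the VRAM limit, PyTorch's allocator stats can be printed from the same environment:

```python
import torch

# GPU memory currently held by PyTorch tensors, and the peak since startup.
print(f"allocated: {torch.cuda.memory_allocated() / 1024**3:.2f} GB")
print(f"peak:      {torch.cuda.max_memory_allocated() / 1024**3:.2f} GB")
```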
StreamDiffusion not found:

Ensure the StreamDiffusion path is correct in start.bat:

```bat
set STREAMDIFFUSION_PATH=C:\Projects\StreamDiffusion\streamdiffusion_repo
```

Or pass it directly via the command line:

```
python main.py --streamdiffusion-path "C:/Projects/StreamDiffusion/streamdiffusion_repo"
```

TensorRT errors:

- Ensure TensorRT is installed: `pip install tensorrt==8.6.1`
- Make sure CUDA 12.1 is installed
- Try xformers first to verify the rest of the setup
Low FPS:

- Use TensorRT: `--acceleration tensorrt`
- Lower the resolution in `main.py`
- Check GPU usage with Task Manager
- Ensure no other GPU-intensive apps are running
- `main.py` - Main NDI processor script
- `start.bat` - Quick launcher with xformers (interactive)
- `requirements.txt` - Python dependencies
```
main.py
├── NDI Input
│   ├── List available sources (with timeout)
│   ├── Auto-select by name or prompt user
│   ├── Connect to source
│   └── Receive UYVY/RGBA video frames
│
├── Frame Conversion
│   ├── NDI → PIL Image (RGB)
│   └── Resize to model resolution (512x512)
│
├── StreamDiffusion Pipeline
│   ├── Load SD-Turbo model
│   ├── Initialize with acceleration (xformers/TensorRT)
│   ├── Prepare prompts
│   ├── Process img2img transformation
│   └── Return PIL Image
│
├── Frame Conversion
│   ├── PIL Image → RGBA numpy array
│   └── Create NDI VideoFrameV2
│
└── NDI Output
    ├── Create NDI sender
    ├── Send processed frames
    └── Output as "streamdiffusion-ndi-render"
```
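The two Frame Conversion stages and the NDI Output stage amount to a few numpy/PIL calls plus the ndi-python send API. A sketch, assuming frames arrive as RGBA numpy arrays and that `stream(...)` is the initialized StreamDiffusion img2img pipeline (both are assumptions for illustration):

```python
import numpy as np
import NDIlib as ndi
from PIL import Image

ndi.initialize()

# Sender named to match the output source documented above.
send_settings = ndi.SendCreate()
send_settings.ndi_name = "streamdiffusion-ndi-render"
sender = ndi.send_create(send_settings)

def transform_and_send(rgba: np.ndarray, stream) -> None:
    # NDI frame -> PIL Image (RGB), resized to the model resolution.
    image = Image.fromarray(rgba, "RGBA").convert("RGB").resize((512, 512))

    # img2img transformation; `stream` is a hypothetical handle to the
    # initialized StreamDiffusion pipeline, returning a PIL Image.
    styled = stream(image=image)

    # PIL Image -> RGBA numpy array -> NDI VideoFrameV2.
    out = np.ascontiguousarray(np.array(styled.convert("RGBA")))
    frame = ndi.VideoFrameV2()
    frame.data = out
    frame.FourCC = ndi.FOURCC_VIDEO_TYPE_RGBA
    ndi.send_send_video_v2(sender, frame)
```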
- Triton warning: Harmless warning about missing Triton optimization (optional)
- TensorRT TracerWarnings: Normal during first-time compilation
- First frame slow: Model warmup takes a few seconds
- FutureWarnings: Diffusers library deprecation warnings (cosmetic)
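If the cosmetic FutureWarnings clutter your console, Python's standard warnings module can silence them:

```python
import warnings

# Hide diffusers deprecation notices; they do not affect processing.
warnings.filterwarnings("ignore", category=FutureWarning)
```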
Set HuggingFace cache location (optional):
```
set HF_HOME=C:\huggingface_cache
```

Add verbose logging in main.py:

```python
import logging
logging.basicConfig(level=logging.DEBUG)
```

Based on StreamDiffusion (Apache 2.0).
- StreamDiffusion - Real-time diffusion pipeline
- NDI SDK - Network Device Interface
- Stability AI - SD-Turbo model
- xformers - Memory-efficient attention
- TensorRT - High-performance inference
For issues and questions:
- StreamDiffusion: https://github.com/cumulo-autumn/StreamDiffusion/issues
- NDI: https://ndi.tv/support/
- This repo: Create an issue
Tested on:
- Windows 11
- NVIDIA RTX 3080 (10GB VRAM)
- CUDA 12.1
- Python 3.10
- StreamDiffusion v1