A high-performance C++ application for generating 3D point clouds from stereo camera images using GPU acceleration (CUDA for NVIDIA or HIP for AMD GPUs).
```bash
# Quick setup and run
git clone https://github.com/username/stereo-vision-app.git
cd stereo-vision-app
./setup_dev_environment.sh
./run.sh
```

- One-command startup: `./run.sh up` launches the stack with sensible defaults.
- Browser-based GUI (noVNC) option works cross-platform; native X11 GUI also available on Linux.
- Consistent dev/prod environment via Docker and Compose profiles.
- GPU-ready: toggle NVIDIA/AMD with ENABLE_CUDA/ENABLE_HIP at build time.
- Persistent data and logs: host directories ./data and ./logs are bind-mounted to /app/data and /app/logs and survive rebuilds/restarts.
- Reproducible builds with BuildKit caching and clean isolation from host toolchains.
- 📋 Manual Calibration Wizard: Interactive step-by-step calibration guide (✅ Now Available!)
- 🤖 AI Auto-Calibration: Intelligent automatic calibration with quality assessment (✅ Fully Functional)
- 🧠 Enhanced Neural Matcher: Real AI stereo matching with ONNX Runtime integration (✅ Just Added!)
- ⚡ Multi-Model Support: HITNet, RAFT-Stereo, CREStereo with adaptive selection
- 🚀 TensorRT Optimization: GPU-accelerated neural inference for maximum performance
- Real-time Stereo Vision: GPU-accelerated stereo matching algorithms
- ⚡ Live Processing: Real-time disparity mapping and 3D reconstruction
- Webcam Capture Integration: Direct capture from USB/built-in cameras with device selection
- Live Camera Preview: Real-time preview from left and right cameras
- Synchronized Capture: Capture perfectly synchronized stereo image pairs
- 🎯 Single Camera Mode: Manual stereo capture workflow for single camera setups
- Cross-Platform GPU Support: NVIDIA CUDA and AMD HIP backends
- 3D Point Cloud Generation: Convert disparity maps to dense point clouds
- 📊 Performance Monitoring: Real-time FPS and processing quality metrics
- Interactive GUI: User-friendly interface for parameter tuning and visualization
- Multiple Export Formats: Support for PLY, PCD, and other point cloud formats
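Point cloud generation rests on the standard stereo reprojection equations: depth Z = f·B/d, with X and Y recovered from the pixel coordinates and the principal point. A minimal sketch in plain C++ follows; the pinhole model is standard, but the function and type names are illustrative, not the application's API:

```cpp
#include <cmath>

struct Point3 { double x, y, z; };

// Reproject one pixel with disparity d into camera space:
// Z = f*B/d, X = (u-cx)*Z/f, Y = (v-cy)*Z/f.
// focal_px: focal length in pixels; baseline_m: camera separation in meters.
// Returns false when disparity is non-positive (no valid depth).
bool reprojectPixel(double u, double v, double disparity,
                    double focal_px, double baseline_m,
                    double cx, double cy, Point3& out) {
    if (disparity <= 0.0) return false;          // invalid match
    out.z = focal_px * baseline_m / disparity;   // depth along optical axis
    out.x = (u - cx) * out.z / focal_px;
    out.y = (v - cy) * out.z / focal_px;
    return true;
}
```

In the real pipeline this runs per pixel over the disparity map, typically via a reprojection matrix on the GPU; the scalar form above shows the underlying geometry.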
Comprehensive interactive calibration wizard with professional-grade features:
- Step-by-step guided workflow with clear instructions at each stage
- Live pattern detection with visual feedback and quality assessment
- Multiple calibration patterns (chessboard, circles, asymmetric circles)
- Real-time quality metrics for optimal frame selection
- Interactive frame review with thumbnail gallery and detailed analysis
- Professional results display with comprehensive calibration data
- Start Camera: Camera → Start Left Camera
- Launch Wizard: Process → Calibrate Cameras (Ctrl+C)
- Configure Pattern: Set your calibration pattern type and dimensions
- Capture Frames: Follow guided frame capture with live feedback
- Review Quality: Examine captured frames and remove poor quality ones
- Generate Results: Automatic calibration computation with error analysis
Advanced AI-powered calibration system that automatically detects and captures optimal calibration frames:
- Automatic Chessboard Detection: Real-time detection with quality assessment
- Intelligent Frame Selection: AI selects frames with optimal pose diversity
- Quality Metrics: Multi-factor quality scoring (sharpness, coverage, uniformity)
- Progress Monitoring: Real-time feedback on calibration progress
- Single & Stereo Support: Works with both single camera and stereo camera setups
- Configurable Parameters: Adjustable quality thresholds and capture settings
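One of the quality factors listed above, sharpness, is commonly scored as the variance of a Laplacian response: blurry frames give a low variance, crisp chessboard frames a high one. A self-contained sketch of that proxy metric (a common convention, not necessarily the exact metric this app uses):

```cpp
#include <vector>
#include <cstddef>

// Variance of a 3x3 Laplacian response over a grayscale image (row-major,
// w*h values). Higher variance indicates a sharper frame.
double laplacianVariance(const std::vector<double>& img, int w, int h) {
    std::vector<double> resp;
    resp.reserve(static_cast<std::size_t>(w) * h);
    for (int y = 1; y + 1 < h; ++y)
        for (int x = 1; x + 1 < w; ++x) {
            double c = img[y * w + x];
            // 4-neighbor discrete Laplacian
            double lap = img[(y - 1) * w + x] + img[(y + 1) * w + x]
                       + img[y * w + x - 1] + img[y * w + x + 1] - 4.0 * c;
            resp.push_back(lap);
        }
    if (resp.empty()) return 0.0;                // image too small
    double mean = 0.0;
    for (double r : resp) mean += r;
    mean /= resp.size();
    double var = 0.0;
    for (double r : resp) var += (r - mean) * (r - mean);
    return var / resp.size();
}
```

A calibration selector can threshold this score to reject motion-blurred frames before they pollute the optimization.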
- Start Capture: Begin webcam capture from configured cameras
- Launch AI Calibration: Process → AI Auto-Calibration (Ctrl+Alt+C)
- Position Chessboard: Move 9x6 chessboard through various positions and orientations
- Automatic Collection: AI automatically captures 20+ optimal frames
- Calibration Complete: Parameters automatically calculated and ready for use
| Feature | 📋 Manual Wizard | 🤖 AI Auto-Calibration |
|---|---|---|
| User Control | ✅ Full control over each frame | ⚡ Automated frame selection |
| Pattern Support | ✅ Multiple pattern types | 🔧 Chessboard only |
| Learning Curve | 📚 Educational, step-by-step | 🚀 Instant results |
| Quality Control | 🎯 Manual frame review | 🤖 AI quality assessment |
| Time Required | ⏱️ 5-10 minutes | ⚡ 2-3 minutes |
| Best For | 📖 Learning, precision control | 🏃 Quick setup, beginners |
Recommendation: Use the Manual Wizard for learning calibration concepts and precise control, or AI Auto-Calibration for quick, reliable results.
Revolutionary AI-powered stereo matching with real neural network inference capabilities:
- Real Neural Network Inference: Genuine ONNX Runtime integration (no more placeholders!)
- Multiple Model Architecture Support: HITNet (speed), RAFT-Stereo (accuracy), CREStereo (balanced)
- Adaptive Backend Selection: TensorRT optimization with intelligent CPU/GPU fallback
- Professional Model Management: Automatic model loading, validation, and error handling
- Production-Ready Implementation: Comprehensive logging, error handling, and performance monitoring
- HITNet: High-speed inference optimized for real-time applications
- RAFT-Stereo: Maximum accuracy for precision-critical scenarios
- CREStereo: Balanced performance for general-purpose stereo matching
- Custom Models: Extensible architecture for additional ONNX-compatible models
- Model Selection: Choose neural model based on speed/accuracy requirements
- Automatic Setup: Model manager handles loading and optimization
- Real-time Inference: Process stereo pairs with genuine AI acceleration
- Quality Monitoring: Live performance metrics and quality assessment
- Fallback Support: Seamless fallback to traditional methods if needed
- ONNX Runtime 1.15+: Industry-standard neural inference engine
- TensorRT Integration: NVIDIA GPU optimization for maximum performance
- Smart Memory Management: Efficient model caching and memory optimization
- Error Recovery: Robust error handling with graceful degradation
- Cross-Platform Support: Windows, Linux, macOS with unified API
Recommendation: Use HITNet for real-time applications, RAFT-Stereo for maximum accuracy, or CREStereo for balanced performance.
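The adaptive backend selection with fallback described above amounts to probing each backend in preference order and keeping the first one that initializes. A hedged sketch of that pattern; all names here are illustrative, not the application's actual API:

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical backend descriptor: a name plus an initializer that
// reports whether the backend is usable on this machine.
struct Backend {
    std::string name;
    std::function<bool()> init;
};

// Probe backends in preference order (e.g. TensorRT -> CUDA -> CPU) and
// return the first usable one; "none" tells the caller to fall back to
// traditional (non-neural) stereo matching.
std::string selectBackend(const std::vector<Backend>& preferred) {
    for (const auto& b : preferred)
        if (b.init()) return b.name;   // first usable backend wins
    return "none";
}
```

The same shape handles model-load failures: if the chosen backend cannot load a model, re-run the probe without it.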
Real-time stereo vision processing with live disparity mapping and 3D point cloud generation:
- Real-time Disparity Maps: Live computation during webcam capture
- 3D Point Cloud Generation: Instant 3D reconstruction with color mapping
- Performance Monitoring: Live FPS tracking and queue management
- GPU Acceleration: Automatic CUDA/HIP acceleration with CPU fallback
- Interactive Parameters: Real-time adjustment of processing parameters
- Quality Indicators: Live feedback on processing quality and performance
- Complete Calibration: Ensure cameras are calibrated (manual or AI)
- Start Live Processing: Process → Toggle Live Processing (Ctrl+Shift+P)
- View Live Results: Switch to "Live Processing" tab for real-time view
- Monitor Performance: Watch FPS and quality metrics in status bar
- Adjust Parameters: Use parameter panel for real-time fine-tuning
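The FPS figure shown in the status bar can be computed with a sliding window over frame timestamps. A minimal sketch of that idea (the app's actual implementation is not documented here); the caller supplies timestamps in seconds, so the same logic plugs into a `std::chrono`-based capture loop:

```cpp
#include <deque>

// Sliding-window FPS estimate from frame timestamps (seconds).
class FpsCounter {
public:
    explicit FpsCounter(double window_s = 1.0) : window_s_(window_s) {}

    // Record one frame and return the current FPS estimate.
    double tick(double t) {
        stamps_.push_back(t);
        // Drop frames that fell outside the averaging window.
        while (!stamps_.empty() && t - stamps_.front() > window_s_)
            stamps_.pop_front();
        if (stamps_.size() < 2) return 0.0;
        double span = stamps_.back() - stamps_.front();
        return span > 0.0 ? (stamps_.size() - 1) / span : 0.0;
    }

private:
    double window_s_;
    std::deque<double> stamps_;
};
```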
The application now supports direct webcam capture for real-time stereo vision processing:
- Camera Device Detection: Automatically detect available USB and built-in cameras
- Dual Camera Setup: Configure separate left and right camera devices
- Single Camera Mode: Use one camera for manual stereo capture (move camera between shots)
- Live Preview: Real-time preview from both cameras simultaneously
- Synchronized Capture: Capture perfectly timed stereo image pairs
- Device Testing: Test camera connections before starting capture
- Flexible Configuration: Support for different camera resolutions and frame rates
- Robust Error Handling: Clear feedback on connection issues and permissions
- Select Cameras: Use File → Select Cameras... to configure camera devices
  - Choose different cameras for left and right channels for true stereo
  - OR choose the same camera for both to enable single camera manual stereo mode
  - Test camera connections with live preview
  - Configure camera parameters if needed
- Start Live Capture: Use File → Start Webcam Capture (Ctrl+Shift+S)
  - Live preview appears in the image display tabs
  - Dual camera mode: Both cameras stream at ~30 FPS
  - Single camera mode: Same feed shows in both panels for manual positioning
  - Real-time feedback on capture status
- Capture Images: Multiple capture options available
  - Capture Left Image (L key): Save current camera frame as left image
  - Capture Right Image (R key): Save current camera frame as right image
  - Capture Stereo Pair (Space key): Save synchronized stereo pair
  - Single Camera: Move camera between left/right captures for stereo pairs
- Stop Capture: Use File → Stop Webcam Capture (Ctrl+Shift+T)
- Ctrl+Shift+C: Open camera selector dialog
- Ctrl+Shift+S: Start webcam capture
- Ctrl+Shift+T: Stop webcam capture
- L: Capture left image (during capture)
- R: Capture right image (during capture)
- Space: Capture synchronized stereo pair
- Supported Formats: PNG, JPEG, BMP, TIFF for captured images
- Frame Rate: Up to 30 FPS live preview (hardware dependent)
- Resolution: Automatic detection of optimal camera resolution
- Synchronization: Frame-level synchronization for stereo pairs
- File Naming: Automatic timestamped file naming for captured images
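Timestamped file naming of the kind described above is straightforward with `std::chrono` and `std::put_time`. A sketch follows; the exact pattern (`prefix_YYYYMMDD_HHMMSS_mmm.ext`) is illustrative, not necessarily what the app writes:

```cpp
#include <chrono>
#include <ctime>
#include <iomanip>
#include <sstream>
#include <string>

// Builds names like "left_20250101_120000_042.png" from a prefix and a
// UTC time point, with millisecond precision for stereo-pair ordering.
std::string timestampedName(const std::string& prefix,
                            std::chrono::system_clock::time_point tp,
                            const std::string& ext = "png") {
    using namespace std::chrono;
    std::time_t tt = system_clock::to_time_t(tp);
    auto ms = duration_cast<milliseconds>(tp.time_since_epoch()) % 1000;
    std::tm tm{};
#if defined(_WIN32)
    gmtime_s(&tm, &tt);
#else
    gmtime_r(&tt, &tm);
#endif
    std::ostringstream os;
    os << prefix << '_' << std::put_time(&tm, "%Y%m%d_%H%M%S")
       << '_' << std::setfill('0') << std::setw(3) << ms.count()
       << '.' << ext;
    return os.str();
}
```

Sorting such names lexicographically also sorts them chronologically, which keeps captured stereo pairs adjacent on disk.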
- Left Mouse + Drag: Rotate view around the point cloud
- Right Mouse + Drag: Pan the camera view
- Mouse Wheel: Zoom in/out with smooth scaling
- Double Click: Reset view to default position
- R: Reset view to default position
- 1: Front view
- 2: Side view
- 3: Top view
- A: Toggle auto-rotation animation
- G: Toggle grid display
- X: Toggle coordinate axes
- Statistical Outlier Removal: Removes noisy points based on statistical analysis
- Voxel Grid Filtering: Downsamples point cloud to reduce noise and improve performance
- Radius Outlier Removal: Removes isolated points based on neighborhood density
- Real-time Preview: See filtering effects immediately
- Adjustable Parameters: Fine-tune filtering strength
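Voxel grid filtering buckets points into cubic cells of a chosen leaf size and replaces each occupied cell by the centroid of its points. A standard-library sketch of the idea (PCL's `VoxelGrid` is the production version; the hashing scheme here is illustrative):

```cpp
#include <cmath>
#include <cstdint>
#include <unordered_map>
#include <vector>

struct Pt { double x, y, z; };

// Downsample a cloud by averaging all points that share a voxel cell.
std::vector<Pt> voxelDownsample(const std::vector<Pt>& in, double leaf) {
    struct Acc { double x = 0, y = 0, z = 0; int n = 0; };
    auto key = [leaf](const Pt& p) {
        auto cell = [leaf](double v) {
            return static_cast<std::int64_t>(std::floor(v / leaf));
        };
        // Pack three 21-bit cell indices into one 64-bit map key.
        return (cell(p.x) & 0x1FFFFF) | ((cell(p.y) & 0x1FFFFF) << 21)
             | ((cell(p.z) & 0x1FFFFF) << 42);
    };
    std::unordered_map<std::int64_t, Acc> cells;
    for (const auto& p : in) {
        Acc& a = cells[key(p)];
        a.x += p.x; a.y += p.y; a.z += p.z; ++a.n;
    }
    std::vector<Pt> out;
    out.reserve(cells.size());
    for (const auto& kv : cells) {
        const Acc& a = kv.second;
        out.push_back({a.x / a.n, a.y / a.n, a.z / a.n});
    }
    return out;
}
```

A larger leaf size downsamples more aggressively, which is the "filtering strength" knob mentioned above.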
- RGB Color Mode: Display original colors from stereo cameras
- Depth Color Mode: Color-code points by distance (blue=near, red=far)
- Height Color Mode: Color-code points by Y-coordinate
- Intensity Mode: Grayscale visualization based on brightness
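The depth color mode maps distance onto a blue-to-red ramp (near = blue, far = red, per the convention above). A minimal sketch of one such linear mapping; the app's exact ramp may differ:

```cpp
#include <algorithm>
#include <array>

// Map depth z in [z_near, z_far] to an {r, g, b} triple in [0, 255]:
// near points come out blue, far points red, with linear blending between.
std::array<int, 3> depthToColor(double z, double z_near, double z_far) {
    double t = (z - z_near) / (z_far - z_near);
    t = std::clamp(t, 0.0, 1.0);                        // saturate outside range
    int r = static_cast<int>(255.0 * t + 0.5);          // far -> red
    int b = static_cast<int>(255.0 * (1.0 - t) + 0.5);  // near -> blue
    return {r, 0, b};
}
```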
- Quality Levels: Fast/Medium/High rendering quality
- Smooth Shading: Enhanced visual quality with lighting
- Adaptive Point Size: Automatically adjust point size based on distance
- Level-of-Detail: Optimize rendering for large point clouds
- Point Count: Total number of points in cloud
- Depth Range: Minimum and maximum depth values
- Noise Level: Percentage of potentially noisy points
- Bounding Box: 3D dimensions of the point cloud
- Memory Usage: Real-time memory consumption
- PLY Format: Binary and ASCII variants
- PCD Format: Point Cloud Data format
- XYZ Format: Simple coordinate format
- Image Export: Save current view as image
- Video Recording: Capture rotating animations
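Of the formats listed above, ASCII PLY has a particularly simple layout: a small header declaring the vertex count and properties, then one line per point. A minimal writer sketch (binary PLY and color attributes omitted for brevity; names are illustrative):

```cpp
#include <sstream>
#include <string>
#include <vector>

struct P3 { double x, y, z; };

// Serialize an XYZ point cloud as ASCII PLY.
std::string toAsciiPly(const std::vector<P3>& pts) {
    std::ostringstream os;
    os << "ply\nformat ascii 1.0\n"
       << "element vertex " << pts.size() << "\n"
       << "property float x\nproperty float y\nproperty float z\n"
       << "end_header\n";
    for (const auto& p : pts)
        os << p.x << ' ' << p.y << ' ' << p.z << '\n';   // one vertex per line
    return os.str();
}
```

The same skeleton extends to colored clouds by declaring `property uchar red/green/blue` and appending the RGB values per vertex.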
- 3D Reconstruction: Build detailed 3D models from stereo images
- Robotics: Navigation and obstacle detection
- AR/VR: Content creation for immersive experiences
- Research: Academic and industrial computer vision projects
- Quality Control: Dimensional analysis and inspection
This project supports both NVIDIA and AMD GPUs:
- NVIDIA GPUs: Uses CUDA for acceleration
- AMD GPUs: Uses ROCm/HIP for acceleration
- CPU Fallback: Automatic fallback to CPU-only mode if no GPU is detected
- OpenCV (>= 4.5): Computer vision and image processing with CUDA/OpenCL support
- PCL (Point Cloud Library >= 1.12): Point cloud processing and visualization
- Qt6 (>= 6.0): GUI framework with modern Windows 11 styling
- VTK (>= 9.0): Visualization toolkit (dependency of PCL)
- CMake (>= 3.18): Build system with AI/ML integration support
- ONNX Runtime (>= 1.15): Neural network inference engine for stereo matching
- TensorRT (>= 8.5, Optional): NVIDIA GPU acceleration for neural models
- OpenCV DNN (>= 4.5): Deep neural network support for enhanced stereo vision
- NVIDIA: CUDA Toolkit (>= 11.0) with TensorRT for optimal neural model performance
- AMD: ROCm (>= 5.0) with HIP support for GPU acceleration
📋 Complete setup instructions for Ubuntu, Windows, and macOS are available in docs/SETUP_REQUIREMENTS.md
```bash
# Run the main setup script (auto-detects NVIDIA)
./setup_dev_environment.sh
```

```bash
# First run basic setup
./setup_dev_environment.sh

# Then run AMD-specific setup
./setup_amd_gpu.sh
```

```bash
# Install OpenCV
sudo apt update
sudo apt install libopencv-dev

# Install PCL and VTK
sudo apt install libpcl-dev libvtk9-dev

# Install Qt6
sudo apt install qt6-base-dev qt6-opengl-dev qt6-opengl-widgets-dev

# Install additional dependencies
sudo apt install libboost-all-dev libeigen3-dev libglew-dev

# For NVIDIA: Install CUDA (follow NVIDIA's official guide)
# For AMD: Install ROCm (see setup_amd_gpu.sh)
```

```bash
# Auto-detects GPU and builds accordingly
./build.sh
```

- `./run.sh` - Build and run with GUI (default, at project root)
- `./build_scripts/build.sh` - Build only
- `./build_scripts/build_amd.sh` - AMD/HIP specific build
- `./build_scripts/build_debug.sh` - Debug build with symbols
```bash
# NVIDIA (CUDA + TensorRT)
mkdir build && cd build
cmake .. -DUSE_CUDA=ON -DUSE_HIP=OFF -DWITH_ONNX=ON -DWITH_TENSORRT=ON
make -j$(nproc)
```

```bash
# AMD (HIP)
mkdir build && cd build
cmake .. -DUSE_CUDA=OFF -DUSE_HIP=ON -DWITH_ONNX=ON -DWITH_TENSORRT=OFF
make -j$(nproc)
```

```bash
# CPU-only
mkdir build && cd build
cmake .. -DUSE_CUDA=OFF -DUSE_HIP=OFF -DWITH_ONNX=ON -DWITH_TENSORRT=OFF
make -j$(nproc)
```

```bash
./stereo_vision_app
```

- Print a checkerboard pattern (9x6 recommended)
- Capture calibration images with both cameras
- Use the calibration tool to compute camera parameters
- Load calibrated camera parameters
- Capture or load stereo image pairs
- Adjust stereo matching parameters
- Generate and export point cloud
```
computer-vision/              # 🎯 Clean, modern project structure with AI/ML integration
├── 📁 src/                   # Source code
│   ├── core/                 # Core algorithms (stereo, calibration)
│   ├── ai/                   # Neural network implementations (Enhanced Neural Matcher)
│   ├── gui/                  # Qt GUI components with modern Windows 11 styling
│   ├── gpu/                  # GPU acceleration (CUDA/HIP)
│   ├── multicam/             # Multi-camera system
│   └── utils/                # Utility functions
├── 📁 include/               # Header files (mirrors src/)
│   ├── ai/                   # Neural stereo matching (enhanced_neural_matcher.hpp)
│   ├── gui/                  # GUI component headers
│   ├── multicam/             # Multi-camera headers
│   └── benchmark/            # Performance benchmarking
├── 📁 tests/                 # Unit and integration tests
├── 📁 test_programs/         # 🧪 Standalone test utilities
│   └── README.md             # Test program guide
├── 📁 documentation/         # 📖 Organized project documentation
│   ├── features/             # Feature implementation docs
│   ├── build/                # Build system documentation
│   └── setup/                # Environment setup guides
├── 📁 build_scripts/         # ⚙️ Build and utility scripts
│   ├── build*.sh             # Various build configurations
│   ├── setup*.sh             # Environment setup scripts
│   └── README.md             # Script documentation
├── 📁 reports/               # 📊 Generated reports and benchmarks
│   └── benchmarks/           # Performance benchmark results
├── 📁 archive/               # Historical documentation and temp files
│   ├── milestone_docs/       # Completed milestone documentation
│   └── temp_tests/           # Completed Priority 2 test implementations
├── 📁 data/                  # Sample data and calibration files
├── 📁 docs/                  # Technical documentation
├── 📁 logs/                  # 📋 Build and runtime logs
├── 📁 scripts/               # Utility scripts
├── 📁 cmake/                 # CMake modules
├── 📄 CMakeLists.txt         # Build configuration
├── 📄 README.md              # This file (modern, comprehensive)
├── 📄 PROJECT_MODERNIZATION_STRATEGY.md  # Modernization roadmap
└── 🚀 run.sh                 # Main build and run script
```
- Start Here: README.md → run.sh
- Documentation: documentation/
- Test Hardware: test_programs/
- Build Issues: build_scripts/ → logs/
- Development: src/ → include/
- Performance Reports: reports/benchmarks/
- Project History: archive/milestone_docs/
- 🧠 Enhanced Neural Matcher - Advanced AI stereo matching with multiple model support
- 🚀 ONNX Runtime Integration - Real neural network inference replacing placeholder implementations
- 🎯 Multiple Model Support - HITNet, RAFT-Stereo, CREStereo with adaptive selection
- ⚡ TensorRT Optimization - Optional GPU acceleration for maximum performance
- 🔧 Smart Model Management - Automatic model loading, validation, and fallback handling
- Enhanced Neural Matcher: Real ONNX Runtime integration with production-ready inference
- Model Architecture Support: HITNet (high-speed), RAFT-Stereo (accuracy), CREStereo (balanced)
- Adaptive Backend: TensorRT optimization with CPU/GPU fallback handling
- Professional API: Clean C++ interface with comprehensive error handling and logging
- Neural Network Stereo Matching - TensorRT/ONNX backends with adaptive optimization
- Multi-Camera Support - Synchronized capture and real-time processing
- Professional Installers - Cross-platform packaging framework
- Enhanced Performance Benchmarking - Comprehensive testing with HTML/CSV reports
See archive/milestone_docs/PRIORITY2_COMPLETE.md for full details.
- Enhanced Neural Models: Real-time inference with ONNX Runtime optimization
- Neural Networks: 274 FPS (StereoNet), 268 FPS (PSMNet)
- Multi-Camera: 473 FPS (2 cameras), 236 FPS (4 cameras)
- Latest Reports: Available in reports/benchmarks/