Parallel implementation of a 9-stage stereo matching algorithm using OpenMP on x86 and TI C6678 DSP platforms.
- Course: PPEM (Parallel Programming for Embedded Multicore)
- Institution: INSA Rennes
- Team Size: 3 members
- Duration: 3 weeks - 10 hours
- Target Platforms: x86 (Windows/Linux), TI C6678 8-core DSP
This project parallelizes a depth-from-stereo algorithm that processes stereo camera images to generate depth maps. The algorithm executes 9 sequential stages per frame, with parallelization opportunities in the computationally intensive stages.
YUV Read → YUV2RGB → RGB2Gray → Census → Cost Construction (60 disparities)
→ Aggregate Costs → Disparity Select → Median Filter → MD5 Validation → Display
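As an illustration of why the cost stages parallelize well, here is a minimal sketch of a cost-construction loop. It is not the project's actual `costConstruction.c`: the function name, the 8-bit census signatures, the Hamming-distance cost, and the `cost[d][y][x]` layout are all assumptions made for the example. Only the 60-disparity range comes from the pipeline above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define N_DISPARITIES 60 /* disparity range from the pipeline description */

/* Count the set bits of an 8-bit value (portable, no compiler builtins). */
static unsigned popcount8(uint8_t v)
{
    unsigned n = 0;
    while (v) {
        n += v & 1u;
        v >>= 1;
    }
    return n;
}

/* Hypothetical cost stage: cost[d][y][x] is the Hamming distance between
 * the left census signature at x and the right signature at x - d.
 * Pixels with x < d have no candidate match and get the maximum cost (8). */
void cost_construction_sketch(const uint8_t *census_left,
                              const uint8_t *census_right,
                              uint8_t *cost, int width, int height)
{
    assert(width > 0 && height > 0);

    /* Each disparity writes its own slice of cost, so threads never
     * touch the same output byte and no synchronization is needed. */
    #pragma omp parallel for schedule(static)
    for (int d = 0; d < N_DISPARITIES; d++) {
        for (int y = 0; y < height; y++) {
            uint8_t *out = cost + ((size_t)d * height + y) * width;
            const uint8_t *l = census_left + (size_t)y * width;
            const uint8_t *r = census_right + (size_t)y * width;
            for (int x = 0; x < width; x++)
                out[x] = (x >= d) ? (uint8_t)popcount8(l[x] ^ r[x - d]) : 8;
        }
    }
}
```

The loop is split over disparities rather than collapsed over disparity and row, because the default OpenMP mode of the Visual Studio compiler implements OpenMP 2.0, which has no `collapse` clause. Without OpenMP enabled, the pragma is ignored and the function runs sequentially.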
- ✅ Parallelize on x86 using OpenMP (≥2x speedup)
- ✅ Port to C6678 DSP and parallelize (≥2x speedup)
- ✅ Cross-platform validation (identical MD5 hashes)
- ✅ Professional report (≤5 pages, English)
- ✅ Live demonstration on both platforms
For x86 development:
- Windows: Visual Studio 2022, CMake ≥3.15
- Linux: GCC/Clang with OpenMP, CMake, SDL2 development libraries
For C6678 development:
- Code Composer Studio (CCS) v12.x
- TI C6000 compiler tools
- C6678 hardware access (EVM or lab setup)
# Generate Visual Studio solution
cmake -G "Visual Studio 17 2022" -B bin
# Build
cmake --build bin --config Release
# Run
cd bin/Release
./stereo.exe

# Generate Makefiles
cmake -B bin
# Build
cmake --build bin
# Run
cd bin
./stereo

See docs/issues/epic_04_c6678_sequential/ for detailed CCS project setup instructions.
PPEM-project/
├── src/ # Algorithm implementation (9 stages)
│ ├── main.c # Main loop, orchestrates pipeline
│ ├── costConstruction.c # Hotspot #1 for parallelization
│ ├── aggregateCost.c # Hotspot #2 for parallelization
│ └── ... # Other algorithm stages
├── include/ # Header files
│ ├── params.h # Image dimensions, disparity range
│ └── ... # Function declarations
├── dat/ # Test stereo image datasets
├── lib/ # SDL2 libraries (x86 only)
├── bin/ # Build outputs, generated by CMake
├── CMakeLists.txt # x86 build configuration
└── README.md # This file
Browse issues in docs/issues/ organized by epic:
- Epic 1: Foundation (setup, validation)
- Epic 2: x86 Baseline (profiling, documentation)
- Epic 3: x86 Parallel (OpenMP implementation)
- Epic 4: C6678 Sequential (DSP porting)
- Epic 5: C6678 Parallel (DSP parallelization)
- Epic 6: Validation & Report (cross-platform, demo, report)
git checkout main
git pull origin main
git checkout -b epic-XX/issue-YY-description
- Follow subtasks in the issue file (docs/issues/epic_XX_name/issue_YY.md)
- Test locally: build, run, validate MD5
- Commit frequently with clear messages
- Use PR template (.github/PULL_REQUEST_TEMPLATE.md)
- Request code review from ≥1 team member
- Address feedback, then merge
- OpenMP: Parallel programming API (thread-level parallelism)
- CMake: Cross-platform build system generator
- SDL2: Simple DirectMedia Layer (visualization on x86)
- TI C6678: 8-core DSP (1.25 GHz C66x cores, 512 KB L2 per core, 4 MB shared MSMC SRAM)
- Code Composer Studio: TI's IDE for embedded development
| Platform | Sequential Time | Parallel Time | Speedup | Target |
|---|---|---|---|---|
| x86 (8 cores) | Baseline | TBD | ≥2.0x | Required |
| C6678 (8 cores) | Baseline | TBD | ≥2.0x | Required |
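The speedup column is the sequential time divided by the parallel time. A minimal sketch of how it could be measured and checked against the ≥2.0x target follows. The helper names are hypothetical and not part of the project sources.

```c
#include <assert.h>
#include <time.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Current time in seconds. With OpenMP enabled this is wall-clock time
 * from omp_get_wtime(). The fallback uses clock(), which reports CPU
 * time and is therefore only meaningful for sequential runs. */
double now_seconds(void)
{
#ifdef _OPENMP
    return omp_get_wtime();
#else
    return (double)clock() / CLOCKS_PER_SEC;
#endif
}

/* Speedup = sequential time / parallel time. */
double speedup(double t_sequential, double t_parallel)
{
    assert(t_sequential > 0.0 && t_parallel > 0.0);
    return t_sequential / t_parallel;
}

/* Returns 1 if the measured speedup reaches the 2.0x target, else 0. */
int meets_target(double t_sequential, double t_parallel)
{
    return speedup(t_sequential, t_parallel) >= 2.0;
}
```

A typical use would be to call `now_seconds()` before and after processing a fixed number of frames in each build, then pass the two elapsed times to `speedup()`.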
Critical validation: MD5 hashes must match across:
- Sequential vs. parallel (same platform)
- x86 vs. C6678 (cross-platform)
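Both checks above reduce to comparing two 16-byte MD5 digests. The sketch below shows one way to do that and to print a digest as hex for logging. The function names are hypothetical, and how the project actually stores its digests is not specified here.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define MD5_DIGEST_BYTES 16

/* Write the digest as 32 lowercase hex characters plus a terminating NUL. */
void md5_to_hex(const uint8_t digest[MD5_DIGEST_BYTES],
                char hex[2 * MD5_DIGEST_BYTES + 1])
{
    assert(digest != NULL && hex != NULL);
    for (int i = 0; i < MD5_DIGEST_BYTES; i++)
        snprintf(hex + 2 * i, 3, "%02x", digest[i]);
}

/* Returns 1 if the two digests are byte-for-byte identical, else 0. */
int md5_equal(const uint8_t a[MD5_DIGEST_BYTES],
              const uint8_t b[MD5_DIGEST_BYTES])
{
    assert(a != NULL && b != NULL);
    return memcmp(a, b, MD5_DIGEST_BYTES) == 0;
}
```

Comparing raw bytes rather than hex strings avoids false mismatches caused by upper- versus lowercase formatting on different platforms.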
- M1 - Foundation Complete (Week 1, Dec 1)
  - Development environments working
  - Sequential code validated with baseline MD5
- M2 - x86 Parallelization Complete (Week 2-3, Dec 12)
  - x86 parallelized with ≥2x speedup
  - MD5 validation passing
- M3 - C6678 Sequential Port Complete (Week 2-3, Dec 12)
  - C6678 compiles and runs sequentially
  - MD5 matches x86 baseline
- M4 - C6678 Parallelization Complete (Week 3-4, Dec 19)
  - C6678 parallelized with ≥2x speedup
  - MD5 validation passing
- M5 - Project Delivery (Week 4-5, Dec 27)
  - Report complete (≤5 pages)
  - Demo rehearsed and successful
  - Code packaged for submission
See docs/project_milestones.md for detailed timeline.
The algorithm computes an MD5 hash of the output depth map for correctness validation.
Baseline MD5: Stored in bin/md5.txt after first sequential run (Issue #4)
Validation rules:
- Parallel code must produce the same MD5 as the sequential code
- C6678 must produce the same MD5 as x86
- Use schedule(static) in OpenMP to ensure determinism
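To illustrate the last rule, here is a minimal sketch of a winner-takes-all disparity selection parallelized with `schedule(static)`. The function name and the `cost[d * n_pixels + p]` layout are assumptions for the example, not the project's actual code. Each pixel is written by exactly one iteration and ties are broken the same way every time, so the output should not depend on the thread count. The static schedule additionally fixes which thread handles which iterations.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Per pixel, pick the disparity with the lowest cost.
 * Assumed layout: cost[d * n_pixels + p]. */
void disparity_select_sketch(const uint8_t *cost, uint8_t *disparity,
                             int n_pixels, int n_disparities)
{
    assert(n_pixels > 0 && n_disparities > 0 && n_disparities <= 256);

    /* Iterations are independent: each one reads shared input and writes
     * only disparity[p], so no locking or reduction is required. */
    #pragma omp parallel for schedule(static)
    for (int p = 0; p < n_pixels; p++) {
        int best = 0;
        for (int d = 1; d < n_disparities; d++) {
            /* Strict '<' keeps the lowest disparity when costs tie. */
            if (cost[(size_t)d * n_pixels + p] <
                cost[(size_t)best * n_pixels + p])
                best = d;
        }
        disparity[p] = (uint8_t)best;
    }
}
```

Because the arithmetic is integer-only, there is no floating-point reordering to worry about. A mismatching MD5 after parallelizing a loop like this would more likely point to a data race or a shared temporary than to the schedule itself.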