ephraimfeldblum/cpp-cbom-builder

cpp-sbom-builder

A Rust-based CLI tool that generates Software Bill of Materials (SBOM) documents for C++ projects in CycloneDX format. The tool scans C++ project directories to discover dependencies from multiple package managers and source code.

Features

  • Multi-source dependency detection:

    • Conan package manager (conanfile.txt, conanfile.py)
    • vcpkg package manager (vcpkg.json)
    • CMake (CMakeLists.txt)
    • Meson (meson.build)
    • Bazel (WORKSPACE, MODULE.bazel)
    • ROS (package.xml)
    • Source code analysis (C++ #include directives)
    • Binary artifacts (Windows .dll imports, .exe, .lib, .so, .a, .o, .obj, .dylib)
  • Standard SBOM output: CycloneDX 1.7 JSON

  • High-performance parallel scanning: Optimized for large monorepos (10GB+) using rayon

  • Smart deduplication: Merges components from multiple sources, prioritizing entries with version information

  • Multi-layer version detection: Extracts versions from build artifacts (vcpkg_installed/), CMake config files, lock files (conanfile.lock, vcpkg-lock.json, meson.lock, Bazel .lock, Meson .wrap), and source code macros (including major/minor/patch semantic versions)

  • Build artifact exclusion: Automatically skips 34 common build/vendor directories for faster scanning

  • Fallback source scanning: Analyzes C++ #include directives when no manifest files are found

Build

Prerequisites

  • Rust 1.56+ (edition 2021)
  • Cargo

Debug Build

cargo build

Release Build

cargo build --release

Usage

Basic Usage

cargo run -- --path <C++ project directory>

With Custom Output Path

cargo run -- --path <C++ project directory> --output my-sbom.json

Command-line Options

  • -p, --path <PATH>: Path to the C++ project root (required)
  • -o, --output <PATH>: Output file path (default: sbom.json)
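
As an illustration of the two options above, here is a minimal sketch of the argument handling using only the standard library. The `Args` struct and `parse_args` function are hypothetical names, and the real tool may well use a dedicated argument-parsing crate instead; only the flag names and the `sbom.json` default come from this README.

```rust
use std::path::PathBuf;

/// Parsed command-line options (hypothetical struct for illustration).
#[derive(Debug, PartialEq)]
struct Args {
    path: PathBuf,
    output: PathBuf,
}

/// Parses `-p/--path` (required) and `-o/--output` (defaults to sbom.json).
fn parse_args<I: IntoIterator<Item = String>>(args: I) -> Result<Args, String> {
    let mut path = None;
    let mut output = PathBuf::from("sbom.json");
    let mut iter = args.into_iter();
    while let Some(arg) = iter.next() {
        match arg.as_str() {
            "-p" | "--path" => {
                let value = iter.next().ok_or("missing value for --path")?;
                path = Some(PathBuf::from(value));
            }
            "-o" | "--output" => {
                let value = iter.next().ok_or("missing value for --output")?;
                output = PathBuf::from(value);
            }
            other => return Err(format!("unknown argument: {}", other)),
        }
    }
    let path = path.ok_or("--path is required")?;
    Ok(Args { path, output })
}

fn main() {
    // Skip the program name, then parse the remaining arguments.
    match parse_args(std::env::args().skip(1)) {
        Ok(args) => println!("scan {} -> {}", args.path.display(), args.output.display()),
        Err(e) => eprintln!("error: {}", e),
    }
}
```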

How It Works

  1. Efficient Directory Traversal:

    • Walks the project directory tree with early directory skipping (build artifacts, vendor dirs, etc.)
    • Uses the walkdir crate for efficient recursive scanning
  2. Manifest-based Detection (Primary):

    • Discovers and dispatches to format-specific parsers via a compile-time perfect hash function
    • Conan: Parses conanfile.txt/conanfile.py for explicit dependencies and versions
    • vcpkg: Parses vcpkg.json for dependencies
    • CMake: Extracts find_package() and FetchContent_Declare() calls
    • Meson/Bazel/ROS: Format-specific manifest parsing
    • Collects version information from lock files and build artifacts
  3. Component Deduplication:

    • Merges components discovered from multiple sources using hash-based lookup
    • Prefers entries with version information over those without
    • Uses zero-copy insertion via raw HashMap entry API
  4. Fallback Source Scanning (When no manifests found):

    • Analyzes C++ source files for #include directives
    • Filters out standard library headers using a comprehensive C++98-C++26 header set
    • Extracts library names and validates against naming conventions
    • Searches source code for version macros
  5. SBOM Generation:

    • Serializes all components to CycloneDX 1.7 format JSON
    • Maintains consistent ordering and structure
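
The deduplication step (step 3) can be sketched as follows. This is a simplified stand-in, not the tool's code: it uses the standard library's `HashMap` entry API, which clones the key, whereas the README describes a raw entry API. The `Component` struct and `dedup` function are hypothetical names.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;

/// A discovered dependency (hypothetical shape for illustration).
#[derive(Debug, Clone, PartialEq)]
struct Component {
    name: String,
    version: Option<String>,
}

/// Merges components by name, keeping version information when any source has it.
fn dedup(components: Vec<Component>) -> Vec<Component> {
    let mut merged: HashMap<String, Component> = HashMap::new();
    for c in components {
        match merged.entry(c.name.clone()) {
            Entry::Occupied(mut slot) => {
                // Upgrade an unversioned entry; never overwrite an existing version.
                if slot.get().version.is_none() && c.version.is_some() {
                    slot.get_mut().version = c.version;
                }
            }
            Entry::Vacant(slot) => {
                slot.insert(c);
            }
        }
    }
    // Sort by name so the output order is stable across runs.
    let mut out: Vec<Component> = merged.into_values().collect();
    out.sort_by(|a, b| a.name.cmp(&b.name));
    out
}

fn main() {
    let found = vec![
        Component { name: "zlib".into(), version: None },
        Component { name: "zlib".into(), version: Some("1.3.1".into()) },
    ];
    println!("{:?}", dedup(found));
}
```

Sorting after the merge is one simple way to get the consistent ordering mentioned in step 5; the tool itself may order components differently.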

Accuracy & Limitations

False Positives & Inaccuracies

Challenge: Without a compiler, distinguishing between standard library, internal headers, and third-party libraries is difficult.

Solutions Implemented:

  • Standard Library Filtering: Maintains a phf_set with 120+ C++ standard library headers (C++98 through C++26) for constant-time lookup
  • System Header Exclusion: Skips relative paths (../, ./) and system paths (sys/, linux/, windows/, etc.)
  • Library Name Validation: Only accepts lowercase alphanumeric identifiers with underscores or hyphens; rejects file names, paths, and C++ keywords
  • Path Analysis: Extracts the first component of include paths (e.g., boost/ from boost/asio.hpp) as the library name
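
The filters above can be combined into a small sketch. It is illustrative only: `STD_HEADERS` and `SYSTEM_PREFIXES` are tiny placeholder lists rather than the tool's real sets, plain slices stand in for `phf_set`, the C++ keyword check is omitted, and `library_from_include` is a hypothetical function name.

```rust
// Placeholder lists; the real tool is described as using a much larger phf_set.
const STD_HEADERS: &[&str] = &["algorithm", "iostream", "map", "memory", "string", "vector"];
const SYSTEM_PREFIXES: &[&str] = &["sys", "linux", "windows"];

/// Accepts only lowercase alphanumeric names with underscores or hyphens.
fn is_valid_library_name(name: &str) -> bool {
    !name.is_empty()
        && name
            .chars()
            .all(|c| c.is_ascii_lowercase() || c.is_ascii_digit() || c == '_' || c == '-')
}

/// Returns a candidate library name for one source line, if it passes all filters.
fn library_from_include(line: &str) -> Option<&str> {
    let rest = line.trim_start().strip_prefix("#include")?.trim_start();
    // The header is enclosed in <...> or "...".
    let (close, body) = match rest.chars().next()? {
        '<' => ('>', &rest[1..]),
        '"' => ('"', &rest[1..]),
        _ => return None,
    };
    let header = &body[..body.find(close)?];
    if header.starts_with("./") || header.starts_with("../") {
        return None; // relative path: a project-internal header
    }
    if STD_HEADERS.contains(&header) {
        return None; // standard library header
    }
    // First path component, e.g. "boost" from "boost/asio.hpp";
    // for a bare file name, drop the extension ("zlib.h" -> "zlib").
    let first = header.split('/').next()?;
    let name = first.split('.').next()?;
    if SYSTEM_PREFIXES.contains(&name) || !is_valid_library_name(name) {
        return None;
    }
    Some(name)
}

fn main() {
    for line in ["#include <boost/asio.hpp>", "#include <vector>", "#include \"util.h\""] {
        println!("{:?} -> {:?}", line, library_from_include(line));
    }
}
```

Note that `"util.h"` yields `util` in this sketch, which is the same kind of false positive that the limitations below describe for internal headers with single-word names.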

Remaining Limitations:

  • Header-only libraries detected via source code scanning have no version info
  • Internal project headers with single-word names may be detected as dependencies
  • Third-party headers not following common naming patterns may be missed
  • CMake string variables are not expanded

Version Detection Strategy

For vcpkg dependencies:

  1. Build artifacts: Extracts versions from vcpkg_installed/<lib>/<version> directory structure
  2. CMake config files: Parses FindXXX.cmake and XXXConfig.cmake for version patterns
  3. Lock files: Extracts versions from vcpkg-lock.json metadata

For Conan dependencies:

  • Parses version directly from manifest files (conanfile.txt, conanfile.py)
  • Falls back to conanfile.lock for lock file version info

For all detected dependencies (source scanning):

  • Searches source code for version macros (e.g., BOOST_LIB_VERSION, OPENSSL_VERSION)

This layered approach provides version info even when dependencies lack explicit manifest entries or are only detected via source code scanning.
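
A rough sketch of the version-macro search, assuming simple `#define NAME value` lines. The tool is described as using cached regex patterns; this version uses plain string handling instead so that it needs no external crates. The function names and the `FOO` prefix are hypothetical.

```rust
/// Extracts (macro name, value) from a `#define` line whose name mentions VERSION
/// and whose value starts with a digit (surrounding quotes are stripped).
fn version_macro(line: &str) -> Option<(&str, &str)> {
    let rest = line.trim_start().strip_prefix("#define")?;
    let mut parts = rest.split_whitespace();
    let name = parts.next()?;
    let value = parts.next()?.trim_matches('"');
    if name.contains("VERSION") && value.starts_with(|c: char| c.is_ascii_digit()) {
        Some((name, value))
    } else {
        None
    }
}

/// Combines <PREFIX>_VERSION_MAJOR/MINOR/PATCH macros into "major.minor.patch".
fn semver_from_macros(source: &str, prefix: &str) -> Option<String> {
    let mut parts = [None, None, None];
    for line in source.lines() {
        if let Some((name, value)) = version_macro(line) {
            for (i, suffix) in ["MAJOR", "MINOR", "PATCH"].iter().enumerate() {
                if name == format!("{}_VERSION_{}", prefix, suffix) {
                    parts[i] = Some(value);
                }
            }
        }
    }
    // All three parts must be present to report a semantic version.
    Some(format!("{}.{}.{}", parts[0]?, parts[1]?, parts[2]?))
}

fn main() {
    let header = "#define FOO_VERSION_MAJOR 2\n#define FOO_VERSION_MINOR 4\n#define FOO_VERSION_PATCH 1\n";
    println!("{:?}", semver_from_macros(header, "FOO"));
    println!("{:?}", version_macro("#define BOOST_LIB_VERSION \"1_83\""));
}
```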

Testing

To run the test suite, which verifies the SBOM output format:

cargo test

Performance Characteristics

The tool is optimized for efficiency through:

  • Early directory skipping: Avoids traversing build artifacts, vendor directories, and cache folders
  • Lazy-evaluated regex patterns: Patterns compiled once and cached via lazy-regex
  • Perfect hash function for parser dispatch: O(1) manifest type lookup using phf::Map
  • HashSet-based component collection: Avoids repeated allocations during scanning

Excluded Directories

The tool automatically skips these 34 directories to avoid scanning artifacts:

Build Systems: build/, target/, bazel-out/, bazel-bin/, bazel-cache/, cmake-build-debug/, cmake-build-release/, cmake-build-relwithdebinfo/, dist/, out/, obj/, bin/

Version Control & IDEs: .git/, .github/, .vscode/, .idea/, .cmake/, .tox/

Package Managers: vendor/, node_modules/, .conan/, .gradle/, .m2/, .venv/, venv/, env/

Dependencies: deps/, dependencies/, third_party/, ext/, external/

Cache/Temp: __pycache__/, .pytest_cache/, .gitignore/
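
The pruning behaviour can be sketched with a plain recursive walk. This is a simplified, single-threaded stand-in: the tool is described as using walkdir and rayon, while this sketch uses only `std::fs`. `SKIP_DIRS` is a short sample of the list above and `collect_files` is a hypothetical function name.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

// A short sample of the excluded directory names listed above.
const SKIP_DIRS: &[&str] = &["build", "target", ".git", "node_modules", "third_party"];

/// Recursively collects files under `dir`, never descending into skipped directories.
fn collect_files(dir: &Path, out: &mut Vec<PathBuf>) -> io::Result<()> {
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        let path = entry.path();
        if entry.file_type()?.is_dir() {
            let name = entry.file_name();
            if SKIP_DIRS.iter().any(|d| name.to_str() == Some(*d)) {
                continue; // prune the whole subtree without reading it
            }
            collect_files(&path, out)?;
        } else {
            out.push(path);
        }
    }
    Ok(())
}

fn main() -> io::Result<()> {
    let mut files = Vec::new();
    collect_files(Path::new("."), &mut files)?;
    println!("{} files outside skipped directories", files.len());
    Ok(())
}
```

Checking the directory name before recursing is what makes the skip "early": an excluded directory costs one name comparison no matter how many files it contains.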

Implementation Details

Module Architecture

  • main.rs: Entry point and orchestration
  • cli.rs: Command-line argument parsing
  • parser.rs: Manifest discovery and dispatcher logic
  • binary_parser.rs: Binary artifact parsing for Windows DLLs
  • {conan,vcpkg,cmake,meson,bazel,ros}_parser.rs: Format-specific parsers
  • source_parser.rs: C++ source code scanning with header filtering and parallel processing
  • cyclonedx.rs: SBOM data structures and JSON serialization
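
To give a feel for what the serialization module produces, here is a minimal hand-rolled sketch of a CycloneDX JSON document. It is an assumption-laden illustration: the real module presumably uses a serialization crate and emits more fields, the `Component` struct and `to_cyclonedx` function are hypothetical, and tagging every component as `"library"` is a guess. Only the top-level `bomFormat`, `specVersion`, `version`, and `components` fields are shown.

```rust
/// A discovered dependency (hypothetical shape for illustration).
struct Component {
    name: String,
    version: Option<String>,
}

/// Escapes a string for inclusion inside a JSON string literal.
fn escape(s: &str) -> String {
    let mut out = String::new();
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Builds a compact CycloneDX-style JSON document from the components.
fn to_cyclonedx(components: &[Component]) -> String {
    let items: Vec<String> = components
        .iter()
        .map(|c| {
            let mut obj = format!("{{\"type\":\"library\",\"name\":\"{}\"", escape(&c.name));
            // The version field is omitted when no version was detected.
            if let Some(v) = &c.version {
                obj.push_str(&format!(",\"version\":\"{}\"", escape(v)));
            }
            obj.push('}');
            obj
        })
        .collect();
    format!(
        "{{\"bomFormat\":\"CycloneDX\",\"specVersion\":\"1.7\",\"version\":1,\"components\":[{}]}}",
        items.join(",")
    )
}

fn main() {
    let components = vec![
        Component { name: "fmt".into(), version: Some("10.2.1".into()) },
        Component { name: "zlib".into(), version: None },
    ];
    println!("{}", to_cyclonedx(&components));
}
```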

Key Design Decisions

Performance-First Architecture:

  • Efficient hash-based deduplication: Uses hashbrown::HashMap with raw entry API for zero-copy component merging; prioritizes entries with version information over those without
  • Compile-time perfect hashing: Uses phf::Map (perfect hash functions) for manifest parser dispatch, giving O(1) lookup from a table built at compile time rather than at startup
  • Lazy regex evaluation: Regex patterns are compiled once and cached via lazy-regex to avoid repeated compilation
  • Macro-driven code generation: Uses the paste! macro to generate parser dispatch logic, reducing boilerplate while maintaining type safety

Dependency Detection Strategy:

  • Multi-tier version extraction (for vcpkg): Walks vcpkg_installed/ folders → parses CMake config files → extracts from vcpkg-lock.json
  • Lock file scanning (for Conan): Parses conanfile.lock for version information
  • Header-level version detection: Searches for version macros (e.g., BOOST_LIB_VERSION) in source code when manifest versions unavailable
  • Directory exclusion filter: Skips 34 known build/vendor directories early to avoid traversing 100+ GB of artifacts in large monorepos

Robustness:

  • Fallback cascading: When no manifests exist, automatically switches to source code scanning via #include directive analysis
  • Trait-based parser interface: ManifestParser trait allows easy extension for new build systems without modifying core dispatch logic
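
A sketch of how a trait-based parser interface with file-name dispatch might look. Only the trait name `ManifestParser` comes from this README; its method signature, the two parser structs, and `parser_for` are assumptions. A plain `match` stands in for the `phf::Map` lookup, and the CMake parser only matches lowercase `find_package(` at the start of a line.

```rust
/// Common interface for manifest parsers (the method signature is an assumption).
trait ManifestParser {
    fn parse(&self, content: &str) -> Vec<String>;
}

struct ConanTxtParser;
struct CMakeParser;

impl ManifestParser for ConanTxtParser {
    /// Collects entries such as "zlib/1.3.1" from the [requires] section.
    fn parse(&self, content: &str) -> Vec<String> {
        let mut in_requires = false;
        let mut deps = Vec::new();
        for line in content.lines().map(str::trim) {
            if line.starts_with('[') {
                in_requires = line == "[requires]";
            } else if in_requires && !line.is_empty() && !line.starts_with('#') {
                deps.push(line.to_string());
            }
        }
        deps
    }
}

impl ManifestParser for CMakeParser {
    /// Collects the package name from each `find_package(Name ...)` call.
    fn parse(&self, content: &str) -> Vec<String> {
        content
            .lines()
            .filter_map(|line| {
                let rest = line.trim_start().strip_prefix("find_package(")?;
                let name = rest.split(|c: char| c.is_whitespace() || c == ')').next()?;
                if name.is_empty() { None } else { Some(name.to_string()) }
            })
            .collect()
    }
}

/// Picks a parser by manifest file name (a `match` standing in for a perfect hash map).
fn parser_for(file_name: &str) -> Option<Box<dyn ManifestParser>> {
    match file_name {
        "conanfile.txt" => Some(Box::new(ConanTxtParser)),
        "CMakeLists.txt" => Some(Box::new(CMakeParser)),
        _ => None,
    }
}

fn main() {
    if let Some(parser) = parser_for("CMakeLists.txt") {
        println!("{:?}", parser.parse("find_package(Boost 1.83 REQUIRED)"));
    }
}
```

With this shape, supporting a new build system means adding one struct, one `impl ManifestParser`, and one dispatch entry, which matches the extension story described above.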

Completed Features ✓

  • Multi-format dependency manifest parsing (Conan, vcpkg, CMake, Meson, Bazel, ROS)
  • Binary artifact analysis (Windows DLL imports)
  • High-performance parallel directory scanning (Rayon)
  • Version detection from version macros in headers (including semantic versions)
  • Skip build artifacts and vendor directories for performance
  • Smart deduplication across multiple sources with version preference
  • Fallback source code scanning for projects without manifests
  • CycloneDX 1.7 JSON output format

Future Improvements

  • SPDX format support in addition to CycloneDX
  • Configuration file to exclude custom paths or patterns
  • Vulnerability correlation with SBOM components
  • Support for transitive dependency relationships
  • Command-line option to limit directory scanning depth
