A Rust-based CLI tool that generates Software Bill of Materials (SBOM) documents for C++ projects in CycloneDX format. The tool scans C++ project directories to discover dependencies from multiple package managers and source code.
- Multi-source dependency detection:
  - Conan package manager (conanfile.txt, conanfile.py)
  - vcpkg package manager (vcpkg.json)
  - CMake (CMakeLists.txt)
  - Meson (meson.build)
  - Bazel (WORKSPACE, MODULE.bazel)
  - ROS (package.xml)
  - Source code analysis (C++ #include directives)
  - Binary artifacts (Windows .dll imports, .exe, .lib, .so, .a, .o, .obj, .dylib)
- Standard SBOM output: CycloneDX 1.7 format JSON
- High-performance parallel scanning: Optimized for large monorepos (10GB+) using rayon
- Smart deduplication: Merges components from multiple sources, prioritizing entries with version information
- Multi-layer version detection: Extracts versions from build artifacts (vcpkg_installed/), CMake config files, lock files (conanfile.lock, vcpkg-lock.json, meson.lock, Bazel .lock, Meson .wrap), and source code macros (including major/minor/patch semantic versions)
- Build artifact exclusion: Automatically skips 34 common build/vendor directories for faster scanning
- Fallback source scanning: Analyzes C++ #include directives when no manifest files are found
- Rust 1.56+ (edition 2021)
- Cargo

    cargo build
    cargo build --release
    cargo run -- --path <C++ project directory>
    cargo run -- --path <C++ project directory> --output my-sbom.json

- -p, --path <PATH>: Path to the C++ project root (required)
- -o, --output <PATH>: Output file path (default: sbom.json)
- Efficient Directory Traversal:
  - Walks the project directory tree with early directory skipping (build artifacts, vendor dirs, etc.)
  - Uses the walkdir crate for efficient recursive scanning
- Manifest-based Detection (Primary):
  - Discovers and dispatches to format-specific parsers via a compile-time perfect hash function
  - Conan: Parses conanfile.txt/conanfile.py for explicit dependencies and versions
  - vcpkg: Parses vcpkg.json for dependencies
  - CMake: Extracts find_package() and FetchContent_Declare() calls
  - Meson/Bazel/ROS: Format-specific manifest parsing
  - Collects version information from lock files and build artifacts
- Component Deduplication:
  - Merges components discovered from multiple sources using hash-based lookup
  - Prefers entries with version information over those without
  - Uses zero-copy insertion via the raw HashMap entry API
- Fallback Source Scanning (when no manifests are found):
  - Analyzes C++ source files for #include directives
  - Filters out standard library headers using a comprehensive C++98-C++26 header set
  - Extracts library names and validates them against naming conventions
  - Searches source code for version macros
- SBOM Generation:
  - Serializes all components to CycloneDX 1.7 format JSON
  - Maintains consistent ordering and structure
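The final step, turning the collected components into a CycloneDX document with stable ordering, can be sketched as follows. This is a std-only illustration, not the tool's cyclonedx.rs: the `BomEntry` type and both function names are hypothetical, the JSON is hand-rolled to avoid external crates, and only the top-level `bomFormat`, `specVersion`, `version` and `components` fields are emitted.

```rust
/// One SBOM entry. Hypothetical type for this sketch, not the tool's own struct.
pub struct BomEntry {
    pub name: String,
    pub version: Option<String>,
}

/// Escape a string so it can be embedded inside a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Render entries as a minimal CycloneDX JSON document. Entries are sorted
/// by name so repeated runs over the same input give identical output.
pub fn to_cyclonedx_json(entries: &[BomEntry]) -> String {
    let mut sorted: Vec<&BomEntry> = entries.iter().collect();
    sorted.sort_by(|a, b| a.name.cmp(&b.name));

    let components: Vec<String> = sorted
        .iter()
        .map(|e| {
            let mut obj = format!(
                "{{\"type\":\"library\",\"name\":\"{}\"",
                json_escape(&e.name)
            );
            // The version field is omitted when no version was detected.
            if let Some(v) = &e.version {
                obj.push_str(&format!(",\"version\":\"{}\"", json_escape(v)));
            }
            obj.push('}');
            obj
        })
        .collect();

    format!(
        "{{\"bomFormat\":\"CycloneDX\",\"specVersion\":\"1.7\",\"version\":1,\"components\":[{}]}}",
        components.join(",")
    )
}
```

Sorting before serialization is one simple way to get the consistent ordering mentioned above; a real implementation would more likely derive serialization than build strings by hand.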
Challenge: Without a compiler, distinguishing between standard library, internal headers, and third-party libraries is difficult.
Solutions Implemented:
- Standard Library Filtering: Maintains a phf_set with 120+ C++ standard library headers (C++98 through C++26) for constant-time lookup
- System Header Exclusion: Skips relative paths (../, ./) and system paths (sys/, linux/, windows/, etc.)
- Library Name Validation: Only accepts lowercase alphanumeric identifiers with underscores or hyphens; rejects file names, paths, and C++ keywords
- Path Analysis: Extracts the first component of include paths (e.g., boost/ from boost/asio.hpp) as the library name
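The four filters above can be combined into a single function along these lines. This is a simplified sketch, not the tool's source_parser.rs: the function name is hypothetical, the two constant lists are tiny illustrative subsets, a plain slice stands in for the phf_set, and the C++ keyword check is left out.

```rust
/// Tiny illustrative subset of standard headers; the tool is described as
/// keeping 120+ entries in a perfect-hash set.
const STD_HEADERS: &[&str] = &["vector", "string", "map", "iostream", "cstdio", "memory"];

/// Illustrative subset of system path prefixes to ignore.
const SYSTEM_PREFIXES: &[&str] = &["sys/", "linux/", "windows/"];

/// Return the candidate library name for one source line, if there is one.
pub fn library_from_include(line: &str) -> Option<String> {
    // Accept "#include" and "#  include" with optional leading whitespace.
    let rest = line.trim_start().strip_prefix('#')?.trim_start();
    let rest = rest.strip_prefix("include")?.trim_start();

    // Accept both <path> and "path" forms.
    let close = match rest.chars().next()? {
        '<' => '>',
        '"' => '"',
        _ => return None,
    };
    let inner = &rest[1..];
    let path = &inner[..inner.find(close)?];

    // Relative paths, system paths and standard headers are not dependencies.
    if path.starts_with("./") || path.starts_with("../") {
        return None;
    }
    if SYSTEM_PREFIXES.iter().any(|p| path.starts_with(p)) {
        return None;
    }
    if STD_HEADERS.contains(&path) {
        return None;
    }

    // First path component: "boost/asio.hpp" -> "boost".
    let name = path.split('/').next()?;

    // Lowercase alphanumerics, '_' and '-' only. A bare file name such as
    // "zlib.h" fails here because of the dot (an assumption of this sketch).
    let valid = !name.is_empty()
        && name
            .chars()
            .all(|c| c.is_ascii_lowercase() || c.is_ascii_digit() || c == '_' || c == '-');
    if valid {
        Some(name.to_string())
    } else {
        None
    }
}
```

For example, `#include <boost/asio.hpp>` yields `boost`, while `#include <vector>` and `#include "../util.h"` yield nothing. A single-word internal header such as `#include "config"` would still pass the checks, which is the kind of false positive the limitations describe.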
Remaining Limitations:
- Header-only libraries detected via source code scanning have no version info
- Internal project headers with single-word names may be detected as dependencies
- Third-party headers not following common naming patterns may be missed
- CMake string variables are not expanded
For vcpkg dependencies:
- Build artifacts: Extracts versions from the vcpkg_installed/<lib>/<version> directory structure
- CMake config files: Parses FindXXX.cmake and XXXConfig.cmake for version patterns
- Lock files: Extracts versions from vcpkg-lock.json metadata
For Conan dependencies:
- Parses versions directly from manifest files (conanfile.txt, conanfile.py)
- Falls back to conanfile.lock for lock file version info
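As an illustration of reading versions straight from a Conan manifest, the sketch below parses the `[requires]` section of a conanfile.txt, where each reference has the form `name/version` with an optional `@user/channel` suffix. The function name and the tuple return type are hypothetical; conanfile.py and version ranges are not handled.

```rust
/// Parse the `[requires]` section of a conanfile.txt into (name, version) pairs.
pub fn parse_conanfile_txt(content: &str) -> Vec<(String, Option<String>)> {
    let mut in_requires = false;
    let mut deps = Vec::new();

    for raw in content.lines() {
        // Drop everything after '#'. This removes comments and, as a side
        // effect, any "#revision" suffix, which is not needed here.
        let line = raw.split('#').next().unwrap_or("").trim();
        if line.is_empty() {
            continue;
        }
        // Section headers such as [requires] or [generators].
        if line.starts_with('[') && line.ends_with(']') {
            in_requires = line == "[requires]";
            continue;
        }
        if !in_requires {
            continue;
        }
        // Strip an optional "@user/channel" suffix, then split name/version.
        let reference = line.split('@').next().unwrap_or(line);
        let mut parts = reference.splitn(2, '/');
        let name = parts.next().unwrap_or("").trim();
        let version = parts
            .next()
            .map(|v| v.trim().to_string())
            .filter(|v| !v.is_empty());
        if !name.is_empty() {
            deps.push((name.to_string(), version));
        }
    }
    deps
}
```

A dependency written without a version comes back with `None`, which is where a lock file fallback like the one described above would take over.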
For all detected dependencies (source scanning):
- Searches source code for version macros (e.g., BOOST_LIB_VERSION, OPENSSL_VERSION)
This layered approach provides version info even when dependencies lack explicit manifest entries or are only detected via source code scanning.
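The macro-based layer can be sketched for the major/minor/patch case as follows. The sketch assumes macros named `<PREFIX>_VERSION_MAJOR`, `<PREFIX>_VERSION_MINOR` and `<PREFIX>_VERSION_PATCH`, which is one common convention but not necessarily the pattern set the tool uses. The function name is hypothetical, and plain string handling stands in for the tool's regex patterns.

```rust
use std::collections::HashMap;

/// Scan header text for `#define <PREFIX>_VERSION_{MAJOR,MINOR,PATCH} <n>`
/// and join the three numbers into "major.minor.patch" when all are present.
pub fn semver_from_macros(source: &str, prefix: &str) -> Option<String> {
    let mut found: HashMap<&'static str, u64> = HashMap::new();

    for line in source.lines() {
        // Accept both "#define" and "# define".
        let body = match line.trim_start().strip_prefix('#') {
            Some(rest) => rest.trim_start(),
            None => continue,
        };
        let rest = match body.strip_prefix("define") {
            Some(r) if r.starts_with(char::is_whitespace) => r,
            _ => continue,
        };

        // The first two tokens after "define" are the macro name and value.
        let mut tokens = rest.split_whitespace();
        let (name, value) = match (tokens.next(), tokens.next()) {
            (Some(n), Some(v)) => (n, v),
            _ => continue,
        };

        let suffix = match name.strip_prefix(prefix) {
            Some(s) => s,
            None => continue,
        };
        let part = match suffix {
            "_VERSION_MAJOR" => "MAJOR",
            "_VERSION_MINOR" => "MINOR",
            "_VERSION_PATCH" => "PATCH",
            _ => continue,
        };
        // Non-numeric values (e.g. other macros) are ignored.
        if let Ok(n) = value.parse::<u64>() {
            found.insert(part, n);
        }
    }

    Some(format!(
        "{}.{}.{}",
        found.get("MAJOR")?,
        found.get("MINOR")?,
        found.get("PATCH")?
    ))
}
```

The function returns `None` unless all three parts are found, so a partial match never produces a misleading version string.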
To verify the SBOM output format:

    cargo test

The tool is optimized for efficiency through:
- Early directory skipping: Avoids traversing build artifacts, vendor directories, and cache folders
- Lazy-evaluated regex patterns: Patterns are compiled once and cached via lazy-regex
- Perfect hash function for parser dispatch: O(1) manifest type lookup using phf::Map
- HashSet-based component collection: Avoids repeated allocations during scanning
The tool automatically skips these 34 directories to avoid scanning artifacts:
Build Systems: build/, target/, bazel-out/, bazel-bin/, bazel-cache/, cmake-build-debug/, cmake-build-release/, cmake-build-relwithdebinfo/, dist/, out/, obj/, bin/
Version Control & IDEs: .git/, .github/, .vscode/, .idea/, .cmake/, .tox/
Package Managers: vendor/, node_modules/, .conan/, .gradle/, .m2/, .venv/, venv/, env/
Dependencies: deps/, dependencies/, third_party/, ext/, external/
Cache/Temp: __pycache__/, .pytest_cache/, .gitignore/
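Early skipping means an excluded directory is rejected by name before its contents are ever read. The sketch below shows the idea using only the standard library. It is an illustration rather than the tool's code: the tool is described as using walkdir and rayon, the function names are hypothetical, and `SKIP_DIRS` holds only a handful of the names listed above.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Illustrative subset of the excluded directory names.
const SKIP_DIRS: &[&str] = &["build", "target", ".git", "node_modules", "third_party"];

/// True when a directory with this name should not be descended into.
pub fn should_skip(dir_name: &str) -> bool {
    SKIP_DIRS.contains(&dir_name)
}

/// Recursively collect file paths under `root`, pruning excluded
/// directories before their contents are listed.
pub fn collect_files(root: &Path, out: &mut Vec<PathBuf>) -> io::Result<()> {
    for entry in fs::read_dir(root)? {
        let entry = entry?;
        let path = entry.path();
        if entry.file_type()?.is_dir() {
            let name = entry.file_name();
            if should_skip(&name.to_string_lossy()) {
                continue; // prune: this subtree is never opened
            }
            collect_files(&path, out)?;
        } else {
            out.push(path);
        }
    }
    Ok(())
}
```

With the walkdir crate, the same pruning is usually expressed by passing a predicate to `filter_entry` on the iterator, which likewise prevents descent into rejected directories.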
- main.rs: Entry point and orchestration
- cli.rs: Command-line argument parsing
- parser.rs: Manifest discovery and dispatcher logic
- binary_parser.rs: Binary artifact parsing for Windows DLLs
- {conan,vcpkg,cmake,meson,bazel,ros}_parser.rs: Format-specific parsers
- source_parser.rs: C++ source code scanning with header filtering and parallel processing
- cyclonedx.rs: SBOM data structures and JSON serialization
Performance-First Architecture:
- Efficient hash-based deduplication: Uses hashbrown::HashMap with the raw entry API for zero-copy component merging; prioritizes entries with version information over those without
- Compile-time perfect hashing: Uses phf::Map (perfect hash functions) for manifest parser dispatch, giving O(1) lookup with no table construction at runtime
- Lazy regex evaluation: Regex patterns are compiled once and cached via lazy-regex to avoid repeated compilation
- Macro-driven code generation: Uses the paste! macro to generate parser dispatch logic, reducing boilerplate while maintaining type safety
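The version-preferring merge can be sketched with the standard library's `HashMap` entry API. This is a simplified stand-in for the hashbrown raw entry approach described above: it clones the name to build the key, so it is not zero-copy, and the `Component` type and function name are hypothetical.

```rust
use std::collections::hash_map::Entry;
use std::collections::HashMap;

/// A detected component. Hypothetical type for this sketch.
#[derive(Debug, Clone, PartialEq)]
pub struct Component {
    pub name: String,
    pub version: Option<String>,
}

/// Merge components by name. When the same name is seen twice, an entry
/// that carries a version replaces one that does not; otherwise the first
/// entry seen is kept.
pub fn merge_components(found: Vec<Component>) -> Vec<Component> {
    let mut by_name: HashMap<String, Component> = HashMap::new();

    for c in found {
        match by_name.entry(c.name.clone()) {
            Entry::Vacant(slot) => {
                slot.insert(c);
            }
            Entry::Occupied(mut slot) => {
                if slot.get().version.is_none() && c.version.is_some() {
                    slot.insert(c);
                }
            }
        }
    }

    // Sort so the result does not depend on hash iteration order.
    let mut merged: Vec<Component> = by_name.into_values().collect();
    merged.sort_by(|a, b| a.name.cmp(&b.name));
    merged
}
```

The entry API performs one hash lookup per component for both the check and the insert, which is the main point of using it here.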
Dependency Detection Strategy:
- Multi-tier version extraction (for vcpkg): Walks vcpkg_installed/ folders → parses CMake config files → extracts from vcpkg-lock.json
- Lock file scanning (for Conan): Parses conanfile.lock for version information
- Header-level version detection: Searches for version macros (e.g., BOOST_LIB_VERSION) in source code when manifest versions are unavailable
- Directory exclusion filter: Skips 34 known build/vendor directories early to avoid traversing 100+ GB of artifacts in large monorepos
Robustness:
- Fallback cascading: When no manifests exist, automatically switches to source code scanning via #include directive analysis
- Trait-based parser interface: The ManifestParser trait allows easy extension for new build systems without modifying core dispatch logic
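A trait-based parser interface might look roughly like the sketch below. Only the trait name `ManifestParser` comes from the description above. Its methods, the `Dependency` type, both parser structs and `parser_for` are assumptions made for illustration, and a linear search stands in for the phf::Map dispatch. The CMake parser handles only single-line `find_package(` calls with no space before the parenthesis.

```rust
use std::path::Path;

/// A parsed dependency. Hypothetical type for this sketch.
pub struct Dependency {
    pub name: String,
    pub version: Option<String>,
}

/// Assumed shape of the parser interface; the real trait may differ.
pub trait ManifestParser {
    /// File names this parser is responsible for.
    fn file_names(&self) -> &'static [&'static str];
    /// Extract dependencies from the manifest text.
    fn parse(&self, content: &str) -> Vec<Dependency>;
}

struct CMakeParser;

impl ManifestParser for CMakeParser {
    fn file_names(&self) -> &'static [&'static str] {
        &["CMakeLists.txt"]
    }

    fn parse(&self, content: &str) -> Vec<Dependency> {
        const CALL: &str = "find_package(";
        let mut deps = Vec::new();
        for line in content.lines() {
            let line = line.trim();
            // CMake command names are case-insensitive.
            if !line.to_ascii_lowercase().starts_with(CALL) {
                continue;
            }
            let args = line[CALL.len()..].split(')').next().unwrap_or("");
            let mut tokens = args.split_whitespace();
            if let Some(name) = tokens.next() {
                // A second argument counts as a version only if it starts with a digit.
                let version = tokens
                    .next()
                    .filter(|t| t.chars().next().map_or(false, |c| c.is_ascii_digit()))
                    .map(str::to_string);
                deps.push(Dependency { name: name.to_string(), version });
            }
        }
        deps
    }
}

struct ConanTxtParser;

impl ManifestParser for ConanTxtParser {
    fn file_names(&self) -> &'static [&'static str] {
        &["conanfile.txt"]
    }

    fn parse(&self, content: &str) -> Vec<Dependency> {
        content
            .lines()
            .map(str::trim)
            .skip_while(|l| *l != "[requires]")
            .skip(1)
            .take_while(|l| !l.starts_with('['))
            .filter(|l| !l.is_empty())
            .map(|l| {
                let mut parts = l.splitn(2, '/');
                Dependency {
                    name: parts.next().unwrap_or("").to_string(),
                    version: parts.next().map(str::to_string),
                }
            })
            .collect()
    }
}

/// Pick a parser by file name. Supporting a new build system means adding
/// one struct and one entry to this list.
pub fn parser_for(path: &Path) -> Option<Box<dyn ManifestParser>> {
    let file_name = path.file_name()?.to_str()?;
    let parsers: Vec<Box<dyn ManifestParser>> =
        vec![Box::new(CMakeParser), Box::new(ConanTxtParser)];
    parsers
        .into_iter()
        .find(|p| p.file_names().contains(&file_name))
}
```

Callers only see `parser_for` and the trait, so the scanning loop does not change when a parser is added.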
- Multi-format dependency manifest parsing (Conan, vcpkg, CMake, Meson, Bazel, ROS)
- Binary artifact analysis (Windows DLL imports)
- High-performance parallel directory scanning (Rayon)
- Version detection from version macros in headers (including semantic versions)
- Skip build artifacts and vendor directories for performance
- Smart deduplication across multiple sources with version preference
- Fallback source code scanning for projects without manifests
- CycloneDX 1.7 JSON output format
Planned:
- SPDX format support in addition to CycloneDX
- Configuration file to exclude custom paths or patterns
- Vulnerability correlation with SBOM components
- Support for transitive dependency relationships
- Command-line option to limit directory scanning depth