The author prefers working on code directly rather than focusing on specifications. While AI BOM specifications are being sorted out by committees, this repository focuses on analyzing the code of AI/LLM-based projects to identify components. Once BOM formats are finalized, this tool may become the AI BOM generator by Cyfinoid, or be archived as more mature solutions appear.
To reiterate: This is experimental code. We make no guarantees it will produce output in the correct format or work consistently.
A client-side web application that analyzes GitHub repositories for AI/LLM usage and generates machine-readable AI Bill of Materials (AI BOM) in CycloneDX 1.7 and SPDX 3.0.1 formats.
- Smart Detection: Language-aware scanning with provider-specific model identification
- Machine-Readable: Individual components per dependency with exact line numbers and code snippets
- Precise Evidence: Every finding links to exact file and line number in GitHub
- False-Positive Filtering: Excludes MIME types, framework imports, and documentation files from results
- No Backend: Runs entirely in your browser
- Standards Compliant: CycloneDX 1.7 (JSON/XML) and SPDX 3.0.1
GitHub Personal Access Token (REQUIRED)
The tool requires a GitHub token to operate efficiently. Analysis typically needs 100+ API requests:
- Without token: 60 requests/hour (insufficient for most repositories)
- With token: 5,000 requests/hour
Create a token: GitHub Settings → Developer settings → Personal access tokens → Generate new token (classic)
- No scopes required - leave all checkboxes unchecked
- This provides read-only access to public repositories
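The rate-limit difference comes down to attaching the token to each GitHub API request. A minimal sketch (the helper name and structure are illustrative, not the tool's actual internals):

```javascript
// Build request headers for the GitHub REST API. Without a token,
// requests are limited to 60/hour; with one, 5,000/hour.
function githubHeaders(token) {
  const headers = { Accept: "application/vnd.github+json" };
  if (token) headers.Authorization = `Bearer ${token}`;
  return headers;
}

// Usage (network call shown for illustration only):
// fetch("https://api.github.com/repos/owner/repo", { headers: githubHeaders(myToken) });
```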
- Clone this repository
- Open `index.html` in a modern browser (Chrome, Firefox, Safari, Edge)
- Enter a GitHub repository (`owner/repo` or full URL)
- Paste your GitHub token (required for accurate analysis)
- Click "Analyze Repository"
No server, no installation, no dependencies required - just open the HTML file!
Note: The code is now split into manageable JavaScript files in the js/ folder for easier maintenance.
Individual components for each AI/LLM library with version and exact location:
- Primary Detection: Uses GitHub's Dependency Graph SBOM API (SPDX format) for comprehensive, accurate dependency detection
- Fallback Detection: Manual parsing of manifest files when SBOM API is unavailable
- Supported Ecosystems: Python, Node.js, Go, Java, Rust
- Example Libraries: `openai`, `anthropic`, `langchain`, `transformers`, `chromadb`, `@anthropic-ai/sdk`, `ai`
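The manual fallback for when the SBOM API is unavailable can be sketched as a watchlist scan over manifest files. Here the package list and parsing rules are hypothetical simplifications of whatever the tool actually ships:

```javascript
// Hypothetical watchlist of AI/LLM package names.
const AI_PACKAGES = new Set([
  "openai", "anthropic", "langchain", "transformers", "chromadb",
  "@anthropic-ai/sdk", "ai",
]);

// Scan a requirements.txt, returning AI-related entries with 1-based line numbers
// so each finding can be linked back to its exact location.
function scanRequirements(text) {
  const hits = [];
  text.split("\n").forEach((line, i) => {
    const m = line.trim().match(/^([A-Za-z0-9_.-]+)\s*(?:==\s*([^\s;#]+))?/);
    if (m && AI_PACKAGES.has(m[1].toLowerCase())) {
      hits.push({ name: m[1], version: m[2] || null, line: i + 1 });
    }
  });
  return hits;
}
```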
Identifies specific models with type classification:
- Text Generation: GPT-4o, Claude-3, Gemini-Pro
- Embeddings: text-embedding-3-large, models/embedding-001
- Image Generation: DALL-E-3, Stable Diffusion
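Type classification can be approximated by name patterns. The regexes below are illustrative guesses at the kind of rules involved, not the tool's exact logic:

```javascript
// Classify a model name into one of the categories listed above.
// Patterns are illustrative; order matters (embedding names like
// "text-embedding-3-large" must win before generic provider matches).
function classifyModel(name) {
  const n = name.toLowerCase();
  if (/embedding/.test(n)) return "embeddings";
  if (/dall-e|stable[- ]?diffusion/.test(n)) return "image-generation";
  if (/gpt|claude|gemini|llama/.test(n)) return "text-generation";
  return "unknown";
}
```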
- SDK imports and API calls with precise line numbers
- Model names and API keys in config files
Detects specialized compute requirements:
- GPU: CUDA, cuDNN, PyTorch GPU, TensorFlow GPU
- TPU: TensorFlow TPU, JAX TPU configurations
- Specialized: TensorRT, OpenVINO, ONNX Runtime
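One plausible way to infer compute requirements is a substring match over dependency names. The mapping table below is a hypothetical sketch; the shipped detector may use different signals:

```javascript
// Hypothetical mapping from hardware category to dependency-name signals.
const HARDWARE_SIGNALS = {
  gpu: ["cuda", "cudnn", "torch", "tensorflow-gpu"],
  tpu: ["tensorflow-tpu", "jax"],
  specialized: ["tensorrt", "openvino", "onnxruntime"],
};

// Return the hardware categories implied by a list of dependency names.
function detectHardware(dependencies) {
  const deps = dependencies.map((d) => d.toLowerCase());
  const found = [];
  for (const [kind, signals] of Object.entries(HARDWARE_SIGNALS)) {
    if (deps.some((d) => signals.some((s) => d.includes(s)))) found.push(kind);
  }
  return found;
}
```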
Identifies deployment platforms and tools:
- Containerization: Docker, Docker Compose, GPU-enabled containers
- Orchestration: Kubernetes deployments and services
- Cloud Platforms: AWS SageMaker, GCP Vertex AI, Azure ML, AWS Bedrock
- MLOps: MLflow, Weights & Biases, TensorBoard, ClearML
Analyzes model documentation for responsible AI:
- Intended Use: Purpose and use case documentation
- Limitations: Known constraints and limitations
- Ethical Considerations: Privacy, consent, responsible use
- Bias & Fairness: Demographic parity, fairness assessments
Evaluates security and compliance risks:
- Missing Documentation: README, MODEL_CARD, SECURITY files
- Vulnerabilities: Deprecated packages, known issues
- Compliance: Documentation completeness scoring
- Recommendations: Actionable improvement suggestions
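A documentation completeness score of this kind can be sketched as a simple ratio over expected files, with each missing file turning into a recommendation. The weighting and file list here are assumptions, not the tool's actual scoring:

```javascript
// Documentation files the scan expects to find (hypothetical list).
const EXPECTED_DOCS = ["README.md", "MODEL_CARD.md", "SECURITY.md"];

// Score documentation completeness (0-100) and list missing files
// plus actionable recommendations.
function assessDocs(filePaths) {
  const names = new Set(filePaths.map((p) => p.split("/").pop().toUpperCase()));
  const missing = EXPECTED_DOCS.filter((d) => !names.has(d.toUpperCase()));
  const score = Math.round(
    ((EXPECTED_DOCS.length - missing.length) / EXPECTED_DOCS.length) * 100
  );
  return { score, missing, recommendations: missing.map((d) => `Add a ${d} file`) };
}
```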
This project was developed with Cursor IDE and Claude Code. All AI-generated code has been reviewed and validated to ensure quality and correctness.
- Individual components per dependency with PURL
- ML model components with type classification
- Evidence with file:line precision
- Relationship tracking for duplicate models
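Per-dependency components with PURLs and file:line evidence can be assembled roughly like this (the shape mirrors the example output in this README; the helper name and input shape are illustrative):

```javascript
// Build one CycloneDX component from a detected dependency, attaching
// a package URL and an evidence property per source location.
function makeComponent(dep) {
  return {
    type: "library",
    name: dep.name,
    version: dep.version,
    purl: `pkg:${dep.ecosystem}/${dep.name}@${dep.version}`,
    properties: dep.evidence.map((loc, i) => ({
      name: `cdx:evidence:location:${i}`,
      value: `${loc.file}:${loc.line}`,
    })),
  };
}
```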
- AIPackage elements for models
- Provider attribution and model types
- Detection method provenance
- Standards-compliant relationships
A comprehensive format that enhances standard BOMs with industry best practices from Snyk and Trail of Bits:
Hardware Detection:
- GPU/TPU/specialized compute requirements
- CUDA, TensorRT, OpenVINO detection
- Hardware libraries and dependencies
Infrastructure & Deployment:
- Containerization (Docker, container images)
- Orchestration (Kubernetes)
- Cloud platforms (AWS SageMaker, GCP Vertex AI, Azure ML)
- MLOps tools (MLflow, Weights & Biases, TensorBoard)
Model Governance:
- Documented intended use
- Limitations and constraints
- Ethical considerations
- Bias and fairness assessments
- Model provenance
Risk Assessment:
- Missing documentation warnings
- Security considerations
- Deprecated dependencies
- Overall risk level evaluation
- Actionable recommendations
Data Pipeline:
- Data loading libraries
- Preprocessing frameworks
- Feature engineering tools
- ML frameworks
Summary Statistics:
- Documentation completeness score
- Category breakdown
- Risk level assessment
Analysis Notes: Scan-specific gaps
- Components scanned for but not found in this repository
- Documentation: README, MODEL_CARD, SECURITY files (if missing)
- Hardware: GPU/TPU libraries (if not detected)
- Infrastructure: Docker, Kubernetes, cloud configs (if not found)
- Governance: Model documentation (if models present but no governance)
- Data Pipeline: Data processing libraries (if not detected)
- Actionable suggestions for improving AIBOM completeness
Note: Analysis notes are scan-specific - they list what we actively searched for in THIS repository but didn't find. These are not philosophical limitations, but practical gaps that could be filled.
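Generating such notes amounts to inverting the detection results: anything searched for but absent becomes a note. A minimal sketch, assuming a hypothetical shape for the scan-results object:

```javascript
// Derive scan-specific analysis notes from detection results.
// The `results` shape here is a hypothetical simplification.
function analysisNotes(results) {
  const notes = [];
  if (!results.docsFound.includes("MODEL_CARD.md"))
    notes.push("No MODEL_CARD.md found in this repository.");
  if (results.models.length > 0 && results.governance.length === 0)
    notes.push("Models detected but no governance documentation found.");
  if (results.hardware.length === 0)
    notes.push("No GPU/TPU libraries detected in this repository.");
  return notes;
}
```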
- SBOM-First Approach: Leverages GitHub's Dependency Graph SBOM API for efficient, standardized dependency detection
- Automatic Fallback: If SBOM API is unavailable, falls back to manual parsing of manifest files
- Smart Scanning: Only scans file types matching detected repository languages (TypeScript → .ts/.tsx, Python → .py)
- Provider Detection: Identifies which AI providers are used (OpenAI, Google, HuggingFace, etc.) based on dependencies
- Model Classification: Distinguishes between LLM (text-generation), embedding models, and image generation
- Precise Evidence: Tracks exact file and line number for every detection
- Machine-Readable: Each dependency is a separate component, queryable and tooling-compatible
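The smart-scanning step above can be sketched as a language-to-extension lookup (the mapping table is a hypothetical subset; the actual table presumably covers more languages):

```javascript
// Hypothetical mapping from repository language to scannable extensions.
const LANGUAGE_EXTENSIONS = {
  TypeScript: [".ts", ".tsx"],
  Python: [".py"],
  JavaScript: [".js", ".jsx"],
  Go: [".go"],
};

// Filter a repository's file list down to files worth scanning,
// given the languages GitHub reports for the repository.
function filesToScan(repoLanguages, filePaths) {
  const exts = repoLanguages.flatMap((l) => LANGUAGE_EXTENSIONS[l] || []);
  return filePaths.filter((p) => exts.some((e) => p.endsWith(e)));
}
```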
For a repository using LangChain with Google AI:
```json
{
  "components": [
    {
      "type": "library",
      "name": "langchain-google-genai",
      "version": "1.0.0",
      "purl": "pkg:pypi/langchain-google-genai@1.0.0",
      "properties": [
        {"name": "cdx:evidence:location:0", "value": "requirements.txt:12"}
      ]
    },
    {
      "type": "machine-learning-model",
      "author": "Google",
      "name": "models/embedding-001",
      "properties": [
        {"name": "category", "value": "embeddings"},
        {"name": "intended-use", "value": "Text embeddings for semantic search"},
        {"name": "evidence:location:1", "value": "main.py:23"}
      ]
    }
  ]
}
```

- ✅ All processing in browser (no backend)
- ✅ No data sent to external servers (except GitHub API)
- ✅ Tokens never stored or transmitted
- ✅ Generated BOMs remain local
Chrome/Edge 90+, Firefox 88+, Safari 14+, and modern mobile browsers
GNU General Public License v3 (GPLv3) - see LICENSE file for details.
Join our Discord server for discussions, questions, and collaboration:
Connect with other security researchers, share your findings, and get help with usage and development.
- Attempts to follow CycloneDX 1.7 and SPDX 3.0.1 specifications (not claiming full compliance)
- Inspired by the need for AI transparency in software supply chains
This tool is designed for security auditing and analysis of systems you own or have explicit permission to analyze. Always ensure you have proper authorization before using this tool against any systems or repositories you don't own. The authors are not responsible for any misuse of this software.
Cutting-Edge Software Supply Chain Security Research
Pioneering advanced software supply chain security research and developing innovative security tools for the community. This tool is part of our free research toolkit - helping security researchers and organizations identify software supply chain vulnerabilities and assess license compliance.
Specializing in software supply chain attacks, CI/CD pipeline security, and offensive security research. Our research tools help organizations understand their software supply chain vulnerabilities and develop effective defense strategies.
Explore our professional training programs, latest research insights, and free open source tools developed from our cutting-edge cybersecurity research.
Upcoming Trainings | Read Our Blog | Open Source by Cyfinoid
Hands-on training in software supply chain security, CI/CD pipeline attacks, and offensive security techniques
© 2025 Cyfinoid Research.