Analyze $50B+ in SBIR/STTR funding data: Track technology transitions, patent outcomes, and economic impact of federal R&D investments.
- 533K+ SBIR awards from 1983-present across all federal agencies
- 40K-80K technology transitions detected using 6 independent signals
- CET classification for Critical & Emerging Technology trend analysis
- Economic impact analysis with ROI and federal tax receipt estimates
- Patent ownership chains tracking SBIR-funded innovation outcomes
- Python 3.11+ (required)
- Docker (optional, for local Neo4j database)
- AWS credentials (optional, for cloud features and S3 data)
Get started in 2 minutes:
git clone https://github.com/your-org/sbir-analytics
cd sbir-analytics
make install # Install dependencies with uv
make dev # Start Dagster UI
# Open http://localhost:3000
Next steps:
- Materialize the `raw_sbir_awards` asset in the Dagster UI (or from Python; see the sketch below)
- Explore data in Neo4j Browser (http://localhost:7474)
- See Getting Started Guide for detailed walkthrough
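If you prefer scripting to the UI, the same asset can be materialized with Dagster's Python API. This is a minimal sketch; the `src.assets` import path and asset name are assumed from the repository layout and may not match the actual module structure.

```python
# Sketch only: materialize the raw awards asset from a script instead of the UI.
from dagster import materialize

from src.assets import raw_sbir_awards  # hypothetical import path

if __name__ == "__main__":
    result = materialize([raw_sbir_awards])
    assert result.success
```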
For production use, see Deployment Guide for:
- GitHub Actions (orchestrates ETL pipelines via `dagster job execute`)
- AWS Lambda (serverless, for scheduled data downloads)
- Five-stage ETL: Extract → Validate → Enrich → Transform → Load (sketched as Dagster assets below)
- Asset-based orchestration: Dagster with dependency management
- Data quality gates: Comprehensive validation at each stage
- Cloud-first design: AWS S3 + Neo4j Aura + GitHub Actions
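To make the asset-based pattern concrete, here is a minimal sketch of chained Dagster assets. The asset names, sample data, and checks are hypothetical; the real pipeline's assets live under `src/assets/` and differ from this.

```python
# Illustrative sketch: three of the five ETL stages expressed as Dagster assets,
# with dependencies declared implicitly through parameter names.
import pandas as pd
from dagster import asset


@asset
def raw_sbir_awards() -> pd.DataFrame:
    """Extract: pull raw award records (source omitted here)."""
    return pd.DataFrame([{"award_id": "A-001", "amount": 150_000}])


@asset
def validated_awards(raw_sbir_awards: pd.DataFrame) -> pd.DataFrame:
    """Validate: drop rows that fail basic quality checks."""
    return raw_sbir_awards.dropna(subset=["award_id"])


@asset
def enriched_awards(validated_awards: pd.DataFrame) -> pd.DataFrame:
    """Enrich: attach external identifiers (stubbed)."""
    return validated_awards.assign(uei="UNKNOWN")
```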
| System | Purpose | Documentation |
|---|---|---|
| Transition Detection | Identify SBIR → federal contract transitions (≥85% precision; see the query sketch below) | docs/transition/ |
| CET Classification | ML-based technology area classification | docs/ml/ |
| PaECTER Embeddings | Patent-award similarity using semantic embeddings | docs/ml/paecter.md |
| Fiscal Returns | Economic impact & ROI analysis using StateIO | docs/fiscal/ |
| Patent Analysis | USPTO patent chains and tech transfer tracking | docs/schemas/patent-neo4j-schema.md |
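As a rough illustration of how these subsystems' outputs might be explored in the graph, the snippet below runs a Cypher query through the official Neo4j Python driver against the local instance from the quick start. The node labels, relationship type, and properties are hypothetical stand-ins, not the project's actual schema (see docs/schemas/).

```python
# Hypothetical query sketch: Award, Contract, and TRANSITIONED_TO are invented
# names used only to illustrate exploring transition results in Neo4j.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

query = """
MATCH (a:Award)-[t:TRANSITIONED_TO]->(c:Contract)
WHERE t.confidence >= 0.85
RETURN a.award_id AS award, c.contract_id AS contract, t.confidence AS confidence
ORDER BY confidence DESC
LIMIT 10
"""

with driver.session() as session:
    for record in session.run(query):
        print(record["award"], record["contract"], record["confidence"])

driver.close()
```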
- Orchestration: Dagster 1.7+ (asset-based pipeline), GitHub Actions
- Database: Neo4j 5.x (graph database for relationships)
- Processing: DuckDB 1.0+ (analytical queries; example below), Pandas 2.2+
- Configuration: Pydantic 2.8+ (type-safe YAML config)
- Deployment: Docker, AWS Lambda, GitHub Actions
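As an example of the processing layer, the snippet below runs an ad hoc DuckDB aggregation over a Parquet export of awards and hands the result to pandas. The file name and column names are assumptions for illustration, not artifacts the pipeline is guaranteed to produce.

```python
# Illustrative only: aggregate hypothetical award data with DuckDB, then
# continue analysis in pandas.
import duckdb

df = duckdb.sql(
    """
    SELECT agency, count(*) AS awards, sum(award_amount) AS total_funding
    FROM 'sbir_awards.parquet'   -- hypothetical local export
    GROUP BY agency
    ORDER BY total_funding DESC
    """
).df()

print(df.head())
```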
| Topic | Description |
|---|---|
| Getting Started | Detailed setup guides for local, cloud, and ML workflows |
| Architecture | System design, patterns, and technical decisions |
| Deployment | Production deployment options and guides |
| Testing | Testing strategy, guides, and coverage |
| Schemas | Neo4j graph schema and data models |
| API Reference | Code documentation and API reference |
See Documentation Index for complete map.
sbir-analytics/
├── src/                 # Source code
│   ├── assets/          # Dagster asset definitions
│   ├── extractors/      # Data extraction (SBIR, USAspending, USPTO)
│   ├── enrichers/       # External enrichment and fuzzy matching
│   ├── transformers/    # Business logic and normalization
│   ├── loaders/         # Neo4j loading and relationship creation
│   └── ml/              # Machine learning (CET classification)
├── tests/               # Unit, integration, and E2E tests
├── docs/                # Documentation
├── config/              # YAML configuration files
├── .kiro/               # Kiro specifications
└── infrastructure/      # AWS CDK and deployment configs
See CONTRIBUTING.md for detailed breakdown.
# Development
make install # Install dependencies
make dev # Start Dagster UI
make test # Run tests
make lint # Run linters
# Docker (alternative)
make docker-build # Build Docker image
make docker-up-dev # Start development stack
make docker-test # Run tests in container
# Data operations
make transition-mvp-run # Run transition detection
make cet-pipeline-dev # Run CET classification
See Makefile for all available commands.
Configuration uses YAML files with environment variable overrides:
# Override any config using SBIR_ETL__SECTION__KEY pattern
export SBIR_ETL__NEO4J__URI="neo4j+s://your-instance.databases.neo4j.io"
export SBIR_ETL__ENRICHMENT__BATCH_SIZE=200
See Configuration Guide for details.
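As a sketch of how the `SBIR_ETL__SECTION__KEY` pattern can map onto typed settings, the example below uses pydantic-settings with an env prefix and nested delimiter. The config classes and defaults are illustrative, not the project's actual models.

```python
# Sketch only: nested env vars like SBIR_ETL__NEO4J__URI override typed defaults.
from pydantic import BaseModel
from pydantic_settings import BaseSettings, SettingsConfigDict


class Neo4jConfig(BaseModel):
    uri: str = "bolt://localhost:7687"


class EnrichmentConfig(BaseModel):
    batch_size: int = 100


class Settings(BaseSettings):
    model_config = SettingsConfigDict(env_prefix="SBIR_ETL__", env_nested_delimiter="__")

    neo4j: Neo4jConfig = Neo4jConfig()
    enrichment: EnrichmentConfig = EnrichmentConfig()


settings = Settings()
print(settings.neo4j.uri, settings.enrichment.batch_size)
```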
We welcome contributions! See CONTRIBUTING.md for:
- Development setup and workflow
- Code quality standards (black, ruff, mypy)
- Testing requirements (≥80% coverage)
- Pull request process
make test # Run all tests
make test-unit # Unit tests only
make test-integration # Integration tests
make test-e2e # End-to-end tests
See Testing Guide for details.
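For a sense of what the unit tier looks like, here is a minimal pytest sketch. `normalize_agency_name` is an invented stand-in, not a function from the codebase; real tests exercise the extractors, transformers, and loaders under tests/.

```python
# Hypothetical unit test sketch illustrating the style run by `make test-unit`.
import pytest


def normalize_agency_name(name: str) -> str:
    """Stand-in for a transformer utility; trims and upper-cases agency codes."""
    return name.strip().upper()


@pytest.mark.parametrize("raw, expected", [(" dod ", "DOD"), ("NASA", "NASA")])
def test_normalize_agency_name(raw: str, expected: str) -> None:
    assert normalize_agency_name(raw) == expected
```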
This project is licensed under the MIT License. Copyright (c) 2025 Conrad Hollomon.
This project builds on and gratefully acknowledges the following open-source tools and research:
- StateIO - State-level economic input-output modeling framework by USEPA
- Bayesian Mixture-of-Experts - Research on calibration and uncertainty estimation by Albus Yizhuo Li
- PaECTER - Patent similarity model by Max Planck Institute
- @SquadronConsult - Help with SAM.gov data integration
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Documentation: docs/