Blazingly fast Git repository ingestion tool that transforms any repository into LLM-ready content
Transform any Git repository into structured, AI-ready content with lightning-fast performance. Built in Rust for maximum speed and efficiency.
This project reimagines the Python gitingest with enterprise-grade performance:
- Dramatic Speed Improvements: 18.5x faster processing in the Kubernetes benchmark below
- ⚡ True Concurrency: Tokio async runtime vs. Python's limited threading
- Memory Efficiency: Streaming architecture instead of loading the whole repository into RAM
- Production Ready: Structured error handling and configuration management
- Compiled Performance: Native binary vs interpreted Python execution
- Memory Safety: Rust's ownership system prevents common memory errors
- Async by Design: Built-in concurrent processing capabilities
- Zero Runtime Dependencies: Single binary deployment
Head-to-head comparison on the Kubernetes repository, one of the largest and most complex Go codebases:

Fast GitIngest (Rust):

```text
Repository: kubernetes/kubernetes
├─ Total Time: 26.40 seconds ⚡
├─ Files Processed: 27,621 files
├─ Memory Usage: 319 MB peak
├─ Output Size: 141 MB (text)
├─ Processing Speed: 1,046 files/sec
└─ CPU Usage: 67%
```

Python gitingest:

```text
Repository: kubernetes/kubernetes
├─ Total Time: 487.58 seconds (8:07)
├─ Files Processed: 10,002 files (hit limit)
├─ Memory Usage: 817 MB peak
├─ Output Size: 64 MB (text)
├─ Processing Speed: 20.5 files/sec
└─ CPU Usage: 96% (maxed out)
```
| Metric | Fast GitIngest | Python gitingest | Improvement |
|---|---|---|---|
| Processing Time | 26.40 s | 487.58 s | 18.5x faster |
| Files Processed | 27,621 | 10,002 (hit limit) | 2.8x more files |
| Peak Memory | 319 MB | 817 MB | 2.6x less memory |
| Throughput | 1,046 files/sec | 20.5 files/sec | 51x higher |
| CPU Usage | 67% | 96% | Lower utilization |
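The Rust version processed nearly 3x more files in roughly 1/18th of the time while using 2.6x less memory. Because the Python run stopped at its file limit, throughput (files/sec) is the most like-for-like comparison.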
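The Improvement column is plain division of the two measurements. A minimal sketch that recomputes the ratios from the raw figures; the `Run` struct and `compare` function are illustrative names, not part of the tool:

```rust
/// One benchmark run, using figures from the comparison table.
struct Run {
    seconds: f64,
    files: f64,
    peak_mb: f64,
}

impl Run {
    /// Files processed per second.
    fn throughput(&self) -> f64 {
        self.files / self.seconds
    }
}

/// Round to one decimal place, as in the "Improvement" column.
fn round1(x: f64) -> f64 {
    (x * 10.0).round() / 10.0
}

/// Ratios of `fast` relative to `slow`:
/// (speedup, file-count ratio, memory ratio, throughput ratio).
fn compare(fast: &Run, slow: &Run) -> (f64, f64, f64, f64) {
    (
        round1(slow.seconds / fast.seconds),
        round1(fast.files / slow.files),
        round1(slow.peak_mb / fast.peak_mb),
        round1(fast.throughput() / slow.throughput()),
    )
}
```

With `Run { seconds: 26.40, files: 27_621.0, peak_mb: 319.0 }` for Rust and `Run { seconds: 487.58, files: 10_002.0, peak_mb: 817.0 }` for Python, `compare` returns `(18.5, 2.8, 2.6, 51.0)`.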
- Concurrent Processing: 1,000 parallel file operations by default
- Batch Processing: Smart chunking (500 files/batch) prevents memory overload
- Streaming Architecture: Process large repositories without loading everything into RAM
- Shallow Cloning: Skip Git history, get code instantly (depth=1)
- Pattern Matching: Advanced gitignore support with include/exclude rules
- Security First: Private repository support with GitHub tokens
- Format Flexibility: JSON, Markdown, Plain Text output
- Size Controls: Configurable limits and intelligent filtering
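Include/exclude rules such as `--include "*.rs,*.toml"` or `--exclude "node_modules,dist,build"` take comma-separated glob patterns. The sketch below shows one simple way to apply such rules with `*` and `?` wildcards. It is an illustration under stated assumptions (include patterns match the file name, exclude patterns match any path component), not the tool's actual matcher, and it omits `.gitignore` semantics entirely. `glob_match`, `parse_patterns`, and `is_selected` are hypothetical names.

```rust
/// Match `text` against a glob `pattern` supporting `*` (any run of
/// characters, including none) and `?` (exactly one character).
fn glob_match(pattern: &str, text: &str) -> bool {
    let p: Vec<char> = pattern.chars().collect();
    let t: Vec<char> = text.chars().collect();
    let (mut pi, mut ti) = (0usize, 0usize);
    // Position of the last `*` seen and the text index it currently covers up to.
    let mut star: Option<(usize, usize)> = None;
    while ti < t.len() {
        if pi < p.len() && p[pi] == '*' {
            star = Some((pi, ti));
            pi += 1;
        } else if pi < p.len() && (p[pi] == '?' || p[pi] == t[ti]) {
            pi += 1;
            ti += 1;
        } else if let Some((sp, st)) = star {
            // Backtrack: let the last `*` absorb one more character.
            pi = sp + 1;
            ti = st + 1;
            star = Some((sp, st + 1));
        } else {
            return false;
        }
    }
    // Any pattern left over must consist only of `*`.
    p[pi..].iter().all(|&c| c == '*')
}

/// Split a comma-separated list like "*.rs,*.toml" into patterns.
fn parse_patterns(list: &str) -> Vec<&str> {
    list.split(',').map(str::trim).filter(|s| !s.is_empty()).collect()
}

/// Assumed rule: drop a path if any component matches an exclude pattern;
/// otherwise keep it if there are no include patterns or its file name
/// matches at least one of them.
fn is_selected(path: &str, include: &[&str], exclude: &[&str]) -> bool {
    let excluded = path
        .split('/')
        .any(|part| exclude.iter().any(|pat| glob_match(pat, part)));
    if excluded {
        return false;
    }
    let file_name = path.rsplit('/').next().unwrap_or(path);
    include.is_empty() || include.iter().any(|pat| glob_match(pat, file_name))
}
```

Checking exclusions first means a directory such as `build/` prunes everything beneath it, even files that would match an include pattern.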
- AI/LLM Engineers - Get repository context for code analysis
- DevOps Teams - Repository auditing and documentation
- Code Analysis - Extract and analyze codebase structure
- Documentation Generation - Auto-generate project overviews
- Security Audits - Rapid codebase scanning
```bash
git clone https://github.com/yourusername/fast-gitingest
cd fast-gitingest
cargo build --release

# Analyze any repository instantly
./target/release/gitingest https://github.com/user/awesome-project

# The remaining examples assume the release binary is on your PATH as `gitingest`

# Specify output format and file
gitingest https://github.com/user/repo --format json -o analysis.json

# Include specific patterns only
gitingest https://github.com/user/repo --include "*.rs,*.toml" --format markdown

# Process with custom limits and verbose output
gitingest https://github.com/user/repo --max-files 50000 --verbose
```

Text Format (Default)
```text
Repository: kubernetes/kubernetes
Summary:
Repository: kubernetes/kubernetes
Files processed: 27621
Total size: 277.9 MB
Host: github.com

Directory Structure:
└── kubernetes/
    ├── .github/
    ├── api/
    ├── build/
    ├── cmd/

File Contents:
// Structured file content here...
```
JSON Format

```json
{
  "id": "uuid-here",
  "repo_url": "https://github.com/kubernetes/kubernetes",
  "short_repo_url": "kubernetes/kubernetes",
  "summary": "Repository: kubernetes/kubernetes\nFiles processed: 27621...",
  "tree": "└── kubernetes/\n    ├── .github/\n...",
  "content": "// File contents here...",
  "status": "Completed"
}
```

Markdown Format
```markdown
# Repository: kubernetes/kubernetes

## Summary
Repository: kubernetes/kubernetes
Files processed: 27621
Total size: 277.9 MB

## Directory Structure
└── kubernetes/
    ├── .github/
    ├── cmd/

## File Contents
[Structured content with syntax highlighting]
```
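The Directory Structure sections above use box-drawing connectors. A minimal sketch of rendering such a tree from a list of relative file paths; the `Node` type, `tree_text` function, and path-list input are assumptions for illustration, and the real formatter may order or annotate entries differently.

```rust
use std::collections::BTreeMap;

/// A directory node: child name -> subtree (files have no children).
#[derive(Default)]
struct Node {
    children: BTreeMap<String, Node>,
}

/// Insert a relative path such as "cmd/kubectl/main.go" into the tree.
fn insert(root: &mut Node, path: &str) {
    let mut node = root;
    for part in path.split('/').filter(|p| !p.is_empty()) {
        node = node.children.entry(part.to_string()).or_default();
    }
}

/// Render the children of `node`: `├── ` for all but the last entry,
/// `└── ` for the last, and `│   ` to continue branches that stay open.
fn render(node: &Node, prefix: &str, out: &mut String) {
    let count = node.children.len();
    for (i, (name, child)) in node.children.iter().enumerate() {
        let last = i + 1 == count;
        let connector = if last { "└── " } else { "├── " };
        // Entries with children are directories and get a trailing slash.
        let suffix = if child.children.is_empty() { "" } else { "/" };
        out.push_str(&format!("{prefix}{connector}{name}{suffix}\n"));
        let next = format!("{prefix}{}", if last { "    " } else { "│   " });
        render(child, &next, out);
    }
}

/// Build the tree text for a repository root name and its file paths.
fn tree_text(root_name: &str, paths: &[&str]) -> String {
    let mut root = Node::default();
    for p in paths {
        insert(&mut root, p);
    }
    let mut out = format!("└── {root_name}/\n");
    render(&root, "    ", &mut out);
    out
}
```

A `BTreeMap` keeps siblings sorted by byte order, so `.github` sorts before `api`, matching the listings above.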
```bash
# High-performance mode for large repositories
export CONCURRENT_FILE_LIMIT=1000   # Parallel processing limit
export BATCH_SIZE=500               # Files per batch
export MAX_FILE_SIZE=10485760       # 10 MB per-file limit

# Memory-optimized mode for constrained environments
export CONCURRENT_FILE_LIMIT=100
export BATCH_SIZE=50
export MAX_FILES=5000
```

```bash
# Private repository access
export GITHUB_TOKEN="ghp_your_token_here"

# Allowed Git hosting platforms
export ALLOWED_HOSTS="github.com,gitlab.com,bitbucket.org"
```

```bash
# Repository size controls
export MAX_TOTAL_SIZE=524288000     # 500 MB total limit
export MAX_DIRECTORY_DEPTH=20       # Recursion depth limit
export DEFAULT_TIMEOUT=120          # Processing timeout (seconds)
```

Modular, high-performance design built for scale:
```text
┌─────────────────────┐     ┌─────────────────────┐
│ gitingest-cli/      │ ──▶ │ gitingest/          │
│ • CLI Interface     │     │ • Core Engine       │
│ • Argument Parsing  │     │ • Business Logic    │
│ • Output Formatting │     │ • Performance Opts  │
└─────────────────────┘     └─────────────────────┘
                                       │
                          ┌────────────▼────────────┐
                          │        Services         │
                          │ ┌─────────────────────┐ │
                          │ │ • IngestService     │ │
                          │ │ • GitService        │ │
                          │ │ • FileService       │ │
                          │ │ • PatternService    │ │
                          │ └─────────────────────┘ │
                          └─────────────────────────┘
```
- GitService: Lightning-fast shallow cloning with authentication
- FileService: Concurrent file processing with semaphore-controlled threading
- PatternService: Advanced pattern matching and .gitignore support
- IngestService: End-to-end pipeline orchestration with detailed metrics
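FileService is described as semaphore-controlled concurrent processing, and the configuration exposes `CONCURRENT_FILE_LIMIT` and `BATCH_SIZE`. The tool itself runs on Tokio; the std-only sketch below illustrates the same idea with OS threads, assuming a counting semaphore built from `Mutex` and `Condvar` caps how many files are in flight while input is processed in fixed-size batches. `Semaphore` and `process_in_batches` are hypothetical names, not the crate's API.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Condvar, Mutex};
use std::thread;

/// Minimal counting semaphore: at most `permits` holders at a time.
struct Semaphore {
    available: Mutex<usize>,
    cond: Condvar,
}

impl Semaphore {
    fn new(permits: usize) -> Self {
        Semaphore { available: Mutex::new(permits), cond: Condvar::new() }
    }

    /// Block until a permit is free, then take it.
    fn acquire(&self) {
        let mut n = self.available.lock().unwrap();
        while *n == 0 {
            n = self.cond.wait(n).unwrap();
        }
        *n -= 1;
    }

    /// Return a permit and wake one waiting thread.
    fn release(&self) {
        *self.available.lock().unwrap() += 1;
        self.cond.notify_one();
    }
}

/// Apply `work` to every item, `batch_size` items per batch, with at most
/// `limit` items in flight. Returns results in input order plus the peak
/// concurrency observed.
fn process_in_batches<T, R, F>(
    items: &[T],
    batch_size: usize,
    limit: usize,
    work: F,
) -> (Vec<R>, usize)
where
    T: Sync,
    R: Send,
    F: Fn(&T) -> R + Sync,
{
    assert!(batch_size > 0 && limit > 0, "batch_size and limit must be positive");
    let sem = Semaphore::new(limit);
    let in_flight = AtomicUsize::new(0);
    let peak = AtomicUsize::new(0);
    let mut results = Vec::with_capacity(items.len());

    for batch in items.chunks(batch_size) {
        // Scoped threads may borrow the semaphore, counters, and `work`.
        let batch_results: Vec<R> = thread::scope(|s| {
            // Shared references are Copy, so each `move` closure gets its own.
            let (sem, in_flight, peak, work) = (&sem, &in_flight, &peak, &work);
            let handles: Vec<_> = batch
                .iter()
                .map(|item| {
                    s.spawn(move || {
                        sem.acquire();
                        let now = in_flight.fetch_add(1, Ordering::SeqCst) + 1;
                        peak.fetch_max(now, Ordering::SeqCst);
                        let result = work(item);
                        in_flight.fetch_sub(1, Ordering::SeqCst);
                        sem.release();
                        result
                    })
                })
                .collect();
            // Joining in spawn order preserves input order.
            handles.into_iter().map(|h| h.join().unwrap()).collect()
        });
        results.extend(batch_results);
    }
    (results, peak.into_inner())
}
```

Spawning one OS thread per item keeps the sketch short but would be costly at 1,000 concurrent files; in an async design each item is instead a lightweight task gated by something like `tokio::sync::Semaphore`.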
```bash
# Prepare codebase for LLM analysis
gitingest https://github.com/company/backend-service --format json

# Extract specific file types for training data
gitingest https://github.com/org/ml-project --include "*.py,*.md,*.yml"
```

```bash
# Generate project documentation
gitingest https://github.com/team/frontend --format markdown -o project-docs.md

# Security audit preparation
gitingest https://github.com/company/webapp --include "*.js,*.ts,*.json" --verbose
```

```bash
# Repository structure analysis
gitingest https://github.com/org/microservices --format json | jq '.tree'

# Code review preparation
gitingest https://github.com/team/feature-branch --exclude "node_modules,dist,build"
```

```bash
# Clone and build
git clone https://github.com/yourusername/fast-gitingest
cd fast-gitingest
cargo build --release

# Run tests
cargo test

# Development build with debug symbols
cargo build

# Type-check without producing a binary
cargo check
```

```bash
# Enable detailed performance logging
RUST_LOG=debug ./target/release/gitingest https://github.com/user/repo --verbose

# Benchmark with system metrics (GNU time: `gtime` on macOS via Homebrew, `/usr/bin/time` on Linux)
gtime -v ./target/release/gitingest https://github.com/user/repo

# Compare formats
./target/release/gitingest https://github.com/user/repo --format text
./target/release/gitingest https://github.com/user/repo --format json
./target/release/gitingest https://github.com/user/repo --format markdown
```

Roadmap (planned):
- Multi-format Output: XML, YAML, TOML support
- Plugin System: Custom content processors
- Cloud Integration: S3, GCS direct upload
- REST API: HTTP service mode
- Docker Images: Containerized deployment
- GitHub Actions: CI/CD integration
- Web Interface: Browser-based analysis
MIT License - see LICENSE for details.
We welcome contributions! Please see our Contributing Guidelines for:
- Bug reports and fixes
- Feature requests and implementations
- Documentation improvements
- Performance optimizations
- Platform support expansion
Star this repo if it helped you process repositories 18.5x faster!
Report Bug • Request Feature • Documentation
Made with ⚡ by developers, for developers