
Blazingly fast Git repository ingestion tool that transforms any repository into LLM-ready content


0yik/fast-gitingest


⚡ Fast GitIngest


Transform any Git repository into structured, AI-ready content with lightning-fast performance. Built in Rust for maximum speed and efficiency.


🤖 Inspired By

This project reimagines the Python gitingest (coderamp-labs/gitingest) in Rust, with a focus on performance.

Why We Built This

  • 🚀 Dramatic Speed Improvements: 18.5x faster than the Python version on the Kubernetes benchmark below
  • ⚡ True Concurrency: Tokio async runtime vs Python's limited threading
  • 🧠 Memory Efficiency: Streaming architecture instead of loading everything into RAM
  • 🛠️ Production Ready: Structured error handling and configuration management

Architecture Advantages

  • Compiled Performance: Native binary vs interpreted Python execution
  • Memory Safety: Rust's ownership system prevents common memory errors
  • Async by Design: Built-in concurrent processing capabilities
  • Zero Runtime Dependencies: Single binary deployment

🚀 Real-World Benchmark

Head-to-head comparison on the Kubernetes repository, one of the largest and most complex Go projects:

Fast GitIngest (Rust) - Text Format

Repository: kubernetes/kubernetes
├─ Total Time:         26.40 seconds ⚡
├─ Files Processed:    27,621 files
├─ Memory Usage:       319 MB peak
├─ Output Size:        141 MB (text)
├─ Processing Speed:   1,046 files/sec
└─ CPU Usage:          67%

Python coderamp-labs/gitingest - Text Format

Repository: kubernetes/kubernetes
├─ Total Time:         487.58 seconds (8:07) 🐌
├─ Files Processed:    10,002 files (hit limit)
├─ Memory Usage:       817 MB peak
├─ Output Size:        64 MB (text)
├─ Processing Speed:   20.5 files/sec
└─ CPU Usage:          96% maxed out

📊 Performance Improvements

Metric            | Fast GitIngest  | Python gitingest | Improvement
------------------|-----------------|------------------|-----------------
Processing time   | 26.40 s         | 487.58 s         | 18.5x faster
Files processed   | 27,621          | 10,002           | 2.8x more files
Peak memory       | 319 MB          | 817 MB           | 2.6x less memory
Throughput        | 1,046 files/s   | 20.5 files/s     | 51x higher
CPU utilization   | 67%             | 96%              | Lower

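The Rust version processed nearly 3x more files in 18.5x less time while using 2.6x less memory! Because the Python run stopped at its 10,002-file limit, per-file throughput (51x) is the most like-for-like comparison.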
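The derived columns can be re-checked from the raw figures. The sketch below uses only numbers reported in this section; `Run`, `throughput`, and `ratio` are illustrative names, not part of the project. The 51x throughput gain is simply the time ratio multiplied by the file-count ratio (18.47 × 2.76 ≈ 51).

```rust
/// Raw figures from one benchmark run (values copied from the results above).
struct Run {
    seconds: f64,
    files: u64,
    peak_mb: f64,
}

const RUST: Run = Run { seconds: 26.40, files: 27_621, peak_mb: 319.0 };
const PYTHON: Run = Run { seconds: 487.58, files: 10_002, peak_mb: 817.0 };

/// Files processed per second of wall-clock time.
fn throughput(run: &Run) -> f64 {
    run.files as f64 / run.seconds
}

/// How many times larger `a` is than `b`.
fn ratio(a: f64, b: f64) -> f64 {
    a / b
}
```

For example, `ratio(PYTHON.seconds, RUST.seconds)` comes out at about 18.47 and `throughput(&RUST)` at about 1,046 files/sec, matching the rounded values in the table.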


💪 Why Fast GitIngest?

🔥 Performance Optimized

  • Concurrent Processing: 1,000 parallel file operations by default
  • Batch Processing: Smart chunking (500 files/batch) prevents memory overload
  • Streaming Architecture: Process large repositories without loading the whole repository into RAM
  • Shallow Cloning: Skip Git history, get code instantly (depth=1)
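The batching and concurrency limits above can be illustrated with a small std-only sketch. The project itself is described as using Tokio; this version swaps in scoped OS threads (`std::thread::scope`) to show the same idea: files are processed in fixed-size batches, and a shared counter ensures that no more than `concurrency` files are in flight at once. `process_in_batches` and its parameters are hypothetical names, not the crate's API.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;
use std::thread;

/// Processes `paths` batch by batch. Within a batch, at most `concurrency`
/// worker threads pull the next unclaimed index from a shared counter.
/// `work` stands in for the per-file step (read, filter, format) and
/// returns a number, e.g. the bytes kept for that file.
fn process_in_batches<F>(paths: &[String], batch_size: usize, concurrency: usize, work: F) -> Vec<usize>
where
    F: Fn(&str) -> usize + Sync,
{
    assert!(batch_size > 0 && concurrency > 0, "limits must be positive");
    let mut results = Vec::with_capacity(paths.len());
    for batch in paths.chunks(batch_size) {
        // One result slot per file, so output order matches input order.
        let slots: Vec<Mutex<Option<usize>>> = batch.iter().map(|_| Mutex::new(None)).collect();
        let next = AtomicUsize::new(0);
        thread::scope(|s| {
            for _ in 0..concurrency.min(batch.len()) {
                s.spawn(|| loop {
                    let i = next.fetch_add(1, Ordering::Relaxed);
                    if i >= batch.len() {
                        break; // every file in this batch has been claimed
                    }
                    let out = work(batch[i].as_str());
                    *slots[i].lock().unwrap() = Some(out);
                });
            }
        }); // `scope` joins all workers before returning
        results.extend(slots.into_iter().map(|slot| slot.into_inner().unwrap().unwrap()));
    }
    results
}
```

Because OS threads are much heavier than Tokio tasks, a thread-based version would use a far smaller `concurrency` than the 1,000 quoted above; with a batch size of 500, no more than 500 files can be in flight per batch in any case.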

🛡️ Enterprise Ready

  • Pattern Matching: .gitignore support with include/exclude rules
  • Security First: Private repository support with GitHub tokens
  • Format Flexibility: JSON, Markdown, Plain Text output
  • Size Controls: Configurable limits and intelligent filtering
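Include and exclude rules such as `--include "*.rs,*.toml"` boil down to wildcard matching on relative paths. The following is a simplified sketch, assuming `*` and `?` wildcards only; it does not implement full `.gitignore` semantics (no `**`, negation, or anchored patterns), and `glob_match`, `parse_patterns`, and `keep` are hypothetical names rather than the PatternService API.

```rust
/// Wildcard match: `*` matches any run of characters (including `/`),
/// `?` matches exactly one character, everything else matches literally.
fn glob_match(pattern: &str, text: &str) -> bool {
    let p: Vec<char> = pattern.chars().collect();
    let t: Vec<char> = text.chars().collect();
    let (mut pi, mut ti) = (0, 0);
    // Most recent `*` in the pattern and the text position it has absorbed up to.
    let mut star: Option<(usize, usize)> = None;
    while ti < t.len() {
        if pi < p.len() && p[pi] == '*' {
            star = Some((pi, ti));
            pi += 1;
        } else if pi < p.len() && (p[pi] == '?' || p[pi] == t[ti]) {
            pi += 1;
            ti += 1;
        } else if let Some((sp, st)) = star {
            // Backtrack: let the last `*` absorb one more character.
            star = Some((sp, st + 1));
            pi = sp + 1;
            ti = st + 1;
        } else {
            return false;
        }
    }
    // Trailing `*`s can match the empty string.
    while pi < p.len() && p[pi] == '*' {
        pi += 1;
    }
    pi == p.len()
}

/// Split a comma-separated list such as "*.rs,*.toml" into patterns.
fn parse_patterns(list: &str) -> Vec<String> {
    list.split(',').map(str::trim).filter(|s| !s.is_empty()).map(String::from).collect()
}

/// Keep a path if it matches some include pattern (or none are given) and no
/// exclude pattern. Patterns are tried on the full path and on each component.
fn keep(path: &str, include: &[String], exclude: &[String]) -> bool {
    let matches = |pat: &String| glob_match(pat, path) || path.split('/').any(|c| glob_match(pat, c));
    let included = include.is_empty() || include.iter().any(matches);
    included && !exclude.iter().any(matches)
}
```

Matching each path component as well as the full path is what lets a bare name like `node_modules` exclude everything beneath that directory.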

🎯 Perfect For

  • AI/LLM Engineers - Get repository context for code analysis
  • DevOps Teams - Repository auditing and documentation
  • Code Analysis - Extract and analyze codebase structure
  • Documentation Generation - Auto-generate project overviews
  • Security Audits - Rapid codebase scanning

🚀 Quick Start

Installation

git clone https://github.com/0yik/fast-gitingest
cd fast-gitingest
cargo build --release

Basic Usage

# Analyze any repository instantly
./target/release/gitingest https://github.com/user/awesome-project

# Specify output format and file
gitingest https://github.com/user/repo --format json -o analysis.json

# Include specific patterns only
gitingest https://github.com/user/repo --include "*.rs,*.toml" --format markdown

# Process with custom limits and verbose output
gitingest https://github.com/user/repo --max-files 50000 --verbose
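The flags used in these examples map naturally onto a small options struct. The std-only parser below is a hypothetical sketch (the actual `gitingest-cli` crate may well use an argument-parsing library such as clap); it accepts only the flags shown in this README, and `Args`, `OutputFormat`, and `parse_args` are illustrative names.

```rust
use std::str::FromStr;

/// Output formats named in the examples above.
#[derive(Debug, Clone, Copy, PartialEq)]
enum OutputFormat {
    Text,
    Json,
    Markdown,
}

impl FromStr for OutputFormat {
    type Err = String;
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "text" => Ok(Self::Text),
            "json" => Ok(Self::Json),
            "markdown" => Ok(Self::Markdown),
            other => Err(format!("unknown format {other:?}")),
        }
    }
}

/// Options collected from the command line (hypothetical structure).
#[derive(Debug, Default)]
struct Args {
    url: String,
    format: Option<OutputFormat>, // None: fall back to the default text output
    output: Option<String>,
    include: Option<String>,
    exclude: Option<String>,
    max_files: Option<usize>,
    verbose: bool,
}

fn parse_args<I: IntoIterator<Item = String>>(args: I) -> Result<Args, String> {
    let mut parsed = Args::default();
    let mut it = args.into_iter();
    while let Some(arg) = it.next() {
        // Flags that take a value consume the following argument.
        let mut value = |flag: &str| it.next().ok_or_else(|| format!("{flag} needs a value"));
        match arg.as_str() {
            "--format" => parsed.format = Some(value("--format")?.parse()?),
            "-o" => parsed.output = Some(value("-o")?),
            "--include" => parsed.include = Some(value("--include")?),
            "--exclude" => parsed.exclude = Some(value("--exclude")?),
            "--max-files" => {
                let raw = value("--max-files")?;
                parsed.max_files = Some(raw.parse().map_err(|e| format!("--max-files {raw:?}: {e}"))?);
            }
            "--verbose" => parsed.verbose = true,
            flag if flag.starts_with('-') => return Err(format!("unknown flag {flag}")),
            url if parsed.url.is_empty() => parsed.url = url.to_string(),
            extra => return Err(format!("unexpected argument {extra}")),
        }
    }
    if parsed.url.is_empty() {
        return Err("missing repository URL".to_string());
    }
    Ok(parsed)
}
```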

🎨 Output Examples

📝 Text Format (Default)

Repository: kubernetes/kubernetes
Summary:
Repository: kubernetes/kubernetes
Files processed: 27621
Total size: 277.9 MB
Host: github.com

Directory Structure:
└── kubernetes/
    β”œβ”€β”€ .github/
    β”œβ”€β”€ api/
    β”œβ”€β”€ build/
    └── cmd/

File Contents:
// Structured file content here...
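The directory listing uses `tree`-style connectors. A minimal std-only sketch of how such a listing can be rendered from relative file paths follows; `Node`, `build_tree`, and `render_tree` are hypothetical names, and the real output may order or annotate entries differently (this sketch sorts entries by name and marks directories with a trailing `/`).

```rust
use std::collections::BTreeMap;

/// A tree node: child name -> child node. Files are nodes with no children,
/// so an empty directory would render like a file in this sketch.
#[derive(Default)]
struct Node {
    children: BTreeMap<String, Node>,
}

/// Insert every path component into a sorted tree.
fn build_tree(paths: &[&str]) -> Node {
    let mut root = Node::default();
    for path in paths {
        let mut node = &mut root;
        for part in path.split('/').filter(|p| !p.is_empty()) {
            node = node.children.entry(part.to_string()).or_default();
        }
    }
    root
}

/// Depth-first rendering: the last child gets `└── `, the others `├── `,
/// and the prefix for grandchildren continues the vertical bar only if needed.
fn render(node: &Node, prefix: &str, out: &mut String) {
    let count = node.children.len();
    for (i, (name, child)) in node.children.iter().enumerate() {
        let last = i + 1 == count;
        out.push_str(prefix);
        out.push_str(if last { "└── " } else { "├── " });
        out.push_str(name);
        if !child.children.is_empty() {
            out.push('/');
        }
        out.push('\n');
        let child_prefix = format!("{prefix}{}", if last { "    " } else { "│   " });
        render(child, &child_prefix, out);
    }
}

fn render_tree(repo_name: &str, paths: &[&str]) -> String {
    let mut out = format!("└── {repo_name}/\n");
    render(&build_tree(paths), "    ", &mut out);
    out
}
```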

📊 JSON Format

{
  "id": "uuid-here",
  "repo_url": "https://github.com/kubernetes/kubernetes",
  "short_repo_url": "kubernetes/kubernetes", 
  "summary": "Repository: kubernetes/kubernetes\nFiles processed: 27621...",
  "tree": "└── kubernetes/\n    ├── .github/\n...",
  "content": "// File contents here...",
  "status": "Completed"
}
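The string fields in the JSON output, especially `content` and `tree`, contain newlines, quotes, and backslashes that must be escaped. A real build would most likely use `serde_json`; this std-only sketch, with a hypothetical `IngestResult` struct mirroring the fields in the example above, shows the escaping a serializer has to perform (per RFC 8259, control characters below U+0020 must be escaped).

```rust
/// Quote and escape a string as a JSON string literal.
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters use the \uXXXX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Hypothetical result record with the same fields as the JSON example.
struct IngestResult<'a> {
    id: &'a str,
    repo_url: &'a str,
    short_repo_url: &'a str,
    summary: &'a str,
    tree: &'a str,
    content: &'a str,
    status: &'a str,
}

/// Serialize the record as pretty-printed JSON with two-space indentation.
fn to_json(r: &IngestResult) -> String {
    let fields = [
        ("id", r.id),
        ("repo_url", r.repo_url),
        ("short_repo_url", r.short_repo_url),
        ("summary", r.summary),
        ("tree", r.tree),
        ("content", r.content),
        ("status", r.status),
    ];
    let body: Vec<String> = fields
        .iter()
        .map(|(k, v)| format!("  {}: {}", json_escape(k), json_escape(v)))
        .collect();
    format!("{{\n{}\n}}", body.join(",\n"))
}
```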

📝 Markdown Format

# Repository: kubernetes/kubernetes

## Summary
Repository: kubernetes/kubernetes
Files processed: 27621
Total size: 277.9 MB

## Directory Structure

└── kubernetes/ β”œβ”€β”€ .github/ └── cmd/


## File Contents
[Structured content with syntax highlighting]

⚙️ Advanced Configuration

Performance Tuning

# High-performance mode for large repositories
export CONCURRENT_FILE_LIMIT=1000    # Parallel processing limit
export BATCH_SIZE=500                # Files per batch
export MAX_FILE_SIZE=10485760        # 10MB per file limit

# Memory-optimized mode for constrained environments
export CONCURRENT_FILE_LIMIT=100
export BATCH_SIZE=50
export MAX_FILES=5000
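A sketch of how these variables might be read, with a fallback when a variable is unset and an error (rather than a silent default) when a value is not a number. It assumes the defaults are the high-performance values above (1,000 concurrent operations, 500 files per batch, 10 MB per file) and that `MAX_FILES` has no default; `Tuning`, `load_tuning`, and `load_tuning_from_env` are hypothetical names. Taking the lookup as a closure keeps the function testable without mutating the process environment.

```rust
use std::env;
use std::str::FromStr;

/// Tuning knobs named after the environment variables above (hypothetical).
#[derive(Debug, PartialEq)]
struct Tuning {
    concurrent_file_limit: usize,
    batch_size: usize,
    max_file_size: u64,
    max_files: Option<usize>,
}

/// Ok(None) if the variable is unset, Err if it is set but unparsable.
fn parse_var<T: FromStr>(get: &impl Fn(&str) -> Option<String>, key: &str) -> Result<Option<T>, String> {
    match get(key) {
        None => Ok(None),
        Some(raw) => raw
            .trim()
            .parse::<T>()
            .map(Some)
            .map_err(|_| format!("{key}={raw:?} is not a valid number")),
    }
}

fn load_tuning(get: impl Fn(&str) -> Option<String>) -> Result<Tuning, String> {
    Ok(Tuning {
        concurrent_file_limit: parse_var(&get, "CONCURRENT_FILE_LIMIT")?.unwrap_or(1000),
        batch_size: parse_var(&get, "BATCH_SIZE")?.unwrap_or(500),
        max_file_size: parse_var(&get, "MAX_FILE_SIZE")?.unwrap_or(10 * 1024 * 1024),
        max_files: parse_var(&get, "MAX_FILES")?,
    })
}

/// Production entry point: read from the real process environment.
fn load_tuning_from_env() -> Result<Tuning, String> {
    load_tuning(|k| env::var(k).ok())
}
```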

Security & Access

# Private repository access
export GITHUB_TOKEN="ghp_your_token_here"

# Allowed Git hosting platforms  
export ALLOWED_HOSTS="github.com,gitlab.com,bitbucket.org"
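A repository URL should be checked against `ALLOWED_HOSTS` before anything is cloned. This sketch assumes HTTPS URLs of the form `https://host/owner/repo` and compares the host for exact, case-insensitive equality, so a lookalike such as `github.com.evil.example` is rejected where a substring or suffix test would accept it. `host_of` and `is_allowed` are hypothetical helpers.

```rust
/// Extract the host from an https URL, dropping any `user:token@` part and `:port`.
fn host_of(url: &str) -> Option<&str> {
    let rest = url.strip_prefix("https://")?; // plain http is rejected outright
    let authority = rest.split('/').next()?; // "user:token@host:port" or "host"
    let host_port = authority.rsplit('@').next()?; // drop credentials
    let host = host_port.split(':').next()?; // drop port
    if host.is_empty() {
        None
    } else {
        Some(host)
    }
}

/// True only if the URL's host exactly matches one entry of the comma-separated list.
fn is_allowed(url: &str, allowed_hosts: &str) -> bool {
    match host_of(url) {
        Some(host) => allowed_hosts
            .split(',')
            .map(str::trim)
            .any(|h| h.eq_ignore_ascii_case(host)),
        None => false,
    }
}
```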

Processing Limits

# Repository size controls
export MAX_TOTAL_SIZE=524288000      # 500MB total limit
export MAX_DIRECTORY_DEPTH=20        # Recursion depth limit
export DEFAULT_TIMEOUT=120           # Processing timeout (seconds)
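One way to enforce `MAX_DIRECTORY_DEPTH` and `MAX_TOTAL_SIZE` is during the directory walk itself, so oversized repositories are cut off early. This std-only sketch uses an explicit stack, does not follow symlinks, and assumes the `.git` directory should be skipped; `Limits` and `collect_files` are hypothetical names, and the timeout is not modelled.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Limits mirroring MAX_DIRECTORY_DEPTH and MAX_TOTAL_SIZE (hypothetical struct).
struct Limits {
    max_depth: usize,
    max_total_size: u64,
}

/// Collect regular files under `root`, skipping directories deeper than
/// `max_depth` and stopping once the next file would exceed `max_total_size`.
/// Returns the files kept and their combined size in bytes.
fn collect_files(root: &Path, limits: &Limits) -> io::Result<(Vec<PathBuf>, u64)> {
    let mut files = Vec::new();
    let mut total = 0u64;
    // Stack of (directory, depth below root) instead of recursion.
    let mut stack = vec![(root.to_path_buf(), 0usize)];
    while let Some((dir, depth)) = stack.pop() {
        let mut entries = fs::read_dir(&dir)?.collect::<io::Result<Vec<_>>>()?;
        entries.sort_by_key(|e| e.file_name()); // deterministic order
        for entry in entries {
            let file_type = entry.file_type()?; // does not follow symlinks
            if file_type.is_dir() {
                // Assumption: Git metadata is never ingested.
                if entry.file_name() == ".git" {
                    continue;
                }
                if depth < limits.max_depth {
                    stack.push((entry.path(), depth + 1));
                }
            } else if file_type.is_file() {
                let size = entry.metadata()?.len();
                if total + size > limits.max_total_size {
                    return Ok((files, total)); // size budget exhausted
                }
                total += size;
                files.push(entry.path());
            }
        }
    }
    Ok((files, total))
}
```

Stopping at the first file that would exceed the budget keeps the sketch simple; skipping that file and continuing would fit more small files under the same limit.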

🏗️ Architecture

Modular, high-performance design built for scale:

┌───────────────────────┐    ┌──────────────────────┐
│  gitingest-cli/       │───▶│     gitingest/       │
│  • CLI Interface      │    │   • Core Engine      │
│  • Argument Parsing   │    │   • Business Logic   │
│  • Output Formatting  │    │   • Performance Opts │
└───────────────────────┘    └──────────────────────┘
                                        │
                           ┌────────────▼────────────┐
                           │       Services          │
                           │  ┌────────────────────┐ │
                           │  │ • IngestService    │ │
                           │  │ • GitService       │ │
                           │  │ • FileService      │ │
                           │  │ • PatternService   │ │
                           │  └────────────────────┘ │
                           └─────────────────────────┘

Core Performance Features

  • 🔧 GitService: Lightning-fast shallow cloning with authentication
  • 📁 FileService: Concurrent file processing with semaphore-controlled threading
  • 🎯 PatternService: Advanced pattern matching and .gitignore support
  • ⚡ IngestService: End-to-end pipeline orchestration with detailed metrics
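The shallow clone attributed to GitService corresponds to `git clone --depth 1`, which fetches only the latest commit and, per git's documentation, implies `--single-branch`. The sketch below only builds the `std::process::Command`, so it can be inspected without network access; `shallow_clone_command` is a hypothetical name, the real service may use a Git library rather than the `git` binary, and token authentication is not shown.

```rust
use std::path::Path;
use std::process::Command;

/// Build (but do not run) a shallow clone of `repo_url` into `dest`,
/// optionally pinned to one branch or tag.
fn shallow_clone_command(repo_url: &str, dest: &Path, branch: Option<&str>) -> Command {
    let mut cmd = Command::new("git");
    // --depth 1: latest commit only; it also implies --single-branch.
    cmd.args(["clone", "--depth", "1"]);
    if let Some(b) = branch {
        cmd.args(["--branch", b]);
    }
    cmd.arg(repo_url).arg(dest);
    // Fail instead of hanging on an interactive credential prompt.
    cmd.env("GIT_TERMINAL_PROMPT", "0");
    cmd
}
```

Calling `.status()` or `.output()` on the returned `Command` would actually run the clone.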

💡 Use Cases

AI & Machine Learning

# Prepare codebase for LLM analysis
gitingest https://github.com/company/backend-service --format json

# Extract specific file types for training data
gitingest https://github.com/org/ml-project --include "*.py,*.md,*.yml"

Documentation & Analysis

# Generate project documentation
gitingest https://github.com/team/frontend --format markdown -o project-docs.md

# Security audit preparation
gitingest https://github.com/company/webapp --include "*.js,*.ts,*.json" --verbose

DevOps & CI/CD

# Repository structure analysis
gitingest https://github.com/org/microservices --format json | jq '.tree'

# Code review preparation  
gitingest https://github.com/team/feature-branch --exclude "node_modules,dist,build"

🔧 Development

Building from Source

# Clone and build
git clone https://github.com/0yik/fast-gitingest
cd fast-gitingest
cargo build --release

# Run tests
cargo test

# Development build with debug symbols
cargo build

# Check compilation without building
cargo check

Performance Testing

# Enable detailed performance logging
RUST_LOG=debug ./target/release/gitingest https://github.com/user/repo --verbose

# Benchmark with system metrics (gtime is GNU time as installed by Homebrew on macOS; on Linux use /usr/bin/time -v)
gtime -v ./target/release/gitingest https://github.com/user/repo

# Compare formats
./target/release/gitingest https://github.com/user/repo --format text
./target/release/gitingest https://github.com/user/repo --format json  
./target/release/gitingest https://github.com/user/repo --format markdown

📈 Roadmap

  • Multi-format Output: XML, YAML, TOML support
  • Plugin System: Custom content processors
  • Cloud Integration: S3, GCS direct upload
  • REST API: HTTP service mode
  • Docker Images: Containerized deployment
  • GitHub Actions: CI/CD integration
  • Web Interface: Browser-based analysis

📄 License

MIT License - see LICENSE for details.


🀝 Contributing

We welcome contributions! Please see our Contributing Guidelines for:

  • 🐛 Bug reports and fixes
  • ✨ Feature requests and implementations
  • 📚 Documentation improvements
  • 🧪 Performance optimizations
  • 🔧 Platform support expansion

⭐ Star this repo if it helped you process repositories 18.5x faster!

Report Bug • Request Feature • Documentation


Made with ⚡ by developers, for developers
