A powerful CLI tool that combines PR comments mining and automated documentation generation for GitHub repositories. Extract coding guidelines from PR reviews and generate comprehensive documentation using AWS Bedrock LLM.
- Extract and analyze PR review comments from GitHub repositories
- Generate LLM-friendly coding guidelines from expert reviews
- Classify comments by type (code standards, discussions, general)
- Support for checkpoint-based processing with resume capability
- Automatically generate comprehensive documentation for codebases
- Support for JavaScript, TypeScript, JSX, and TSX files
- Dependency graph analysis and visualization
- AWS Bedrock LLM-powered intelligent documentation
- Compressed SKF format for efficient AI parsing
- Automatic GitHub repository cloning
- Support for both local and remote repositories
- Intelligent file discovery and parsing
- Clean temporary file management
- Python 3.8+
- Node.js and npm (for JavaScript/TypeScript parsing)
- AWS CLI configured with Bedrock access
- Git
git clone <repository-url>
cd LLM-AutoDoc
pip install -e .pip install -r requirements.txt# GitHub Access
export GITHUB_TOKEN="your_github_token"
# AWS Configuration
export AWS_PROFILE="qa" # or your preferred profile
export AWS_REGION="us-east-1"
export BEDROCK_MODEL_ID="us.anthropic.claude-3-5-sonnet-20241022-v2:0"
# Optional: Custom settings
export BEDROCK_TEMPERATURE="0.1"
export BEDROCK_MAX_TOKENS="4000"- Configure AWS credentials:
aws configure - Request access to Claude models in AWS Bedrock console
- Ensure your region supports Bedrock (us-east-1 recommended)
The unified tool provides several commands for different use cases:
Set up all required dependencies and validate configuration:
# Complete environment setup
autodoc setup
# Run diagnostics to check current setup
autodoc setup --diagnosticsExtract coding guidelines from PR review comments:
# Generate guidelines from top 10 PRs
autodoc generate https://github.com/owner/repo -k 10
# Custom output file
autodoc generate https://github.com/owner/repo -k 5 --output my-guidelines.txt
# Resume interrupted processing
autodoc generate https://github.com/owner/repo -k 10 --resumeGet repositories with most review activity:
# Top 5 PRs by comment count
autodoc top https://github.com/owner/repo -k 5
# JSON output format
autodoc top https://github.com/owner/repo -k 10 --format jsonGet detailed information about a specific PR:
autodoc pr https://github.com/owner/repo/pull/123Analyze and classify PR comments:
autodoc classify https://github.com/owner/repo -k 5 --output analysis.txt# Basic documentation generation
autodoc document /path/to/local/repo
# With custom output file
autodoc document /path/to/local/repo --output my-docs.md
# Generate both full and compressed versions
autodoc document /path/to/local/repo --compress# Clone and generate documentation
autodoc document https://github.com/owner/repo
# Keep cloned repository after processing
autodoc document https://github.com/owner/repo --keep-clone
# Generate compressed documentation
autodoc document https://github.com/owner/repo --compress--token: GitHub personal access token--quiet: Reduce verbose output
Generate LLM-friendly coding guidelines from PR comments.
Arguments:
repo_url: GitHub repository URL-k: Number of top PRs to analyze (default: 5)--output: Output file name (auto-generated if not provided)--resume: Resume from checkpoint--checkpoint-dir: Checkpoint directory (default: .checkpoints)
Generate comprehensive documentation for a repository.
Arguments:
repo_path: Repository path (local directory or GitHub URL)--output: Output documentation file name (default: documentation.md)--compress: Also generate compressed SKF format--keep-clone: Keep cloned repository after processing
Fetch top PRs by comment count.
Arguments:
repo_url: GitHub repository URL-k: Number of top PRs to fetch (default: 5)--format: Output format (text/json, default: text)
Fetch detailed information about a specific PR.
Arguments:
pr_url: GitHub PR URL--format: Output format (text/json, default: text)
Classify PR comments using Bedrock.
Arguments:
repo_url: GitHub repository URL-k: Number of top PRs to analyze (default: 5)--output: Output file name (default: pr_analysis.txt)--resume: Resume from checkpoint--checkpoint-dir: Checkpoint directory
# 1. Generate coding guidelines from PR comments
autodoc generate https://github.com/facebook/react -k 15 --output react-guidelines.txt
# 2. Generate comprehensive documentation
autodoc document https://github.com/facebook/react --compress --output react-docs.md
# 3. Analyze top PRs for insights
autodoc top https://github.com/facebook/react -k 10 --format json > react-top-prs.json# Generate documentation for current project
autodoc document . --compress
# Analyze your team's PR patterns
autodoc generate https://github.com/yourorg/yourproject -k 20- Markdown (.md): Human-readable comprehensive documentation
- SKF (.skf.txt): Compressed format optimized for AI/LLM consumption
- Text (.txt): LLM-friendly coding guidelines
- JSON: Structured data for programmatic use
- JavaScript (.js)
- TypeScript (.ts)
- JSX (.jsx)
- TSX (.tsx)
- JavaScript/TypeScript: Node.js and npm
# Check AWS configuration
aws sts get-caller-identity
# Verify Bedrock access
aws bedrock list-foundation-models --region us-east-1# Use personal access token
export GITHUB_TOKEN="your_token_here"
# Check rate limit status
curl -H "Authorization: token $GITHUB_TOKEN" https://api.github.com/rate_limit# Setup Node.js dependencies
cd LLM-AutoDoc/parsers && npm install
- "AWS credentials not found": Configure AWS CLI or set environment variables
- "Cannot connect to AWS Bedrock": Check region and model access permissions
- "Node.js parser not set up": Run
npm installin parsers directory - "Repository not found": Verify repository URL and access permissions