
A GitHub CLI extension for comprehensive repository auditing and migration planning
Collect detailed statistics about repositories and organizations, with intelligent rate limiting and resume capability.
- Quick Start
- Install
- Required Permissions
- Usage
- Examples
- Features
- Feature Flags: Controlling Data Collection
- Output Format
- Performance & Rate Limits
- Troubleshooting
- Architecture
- Building from Source
- Contributing
- Developer Guide
- License
```bash
# Install the extension
gh extension install mona-actions/gh-stats

# Collect stats for an organization
gh stats --org your-org-name

# Output: gh-stats-2025-10-30.json
```

That's it! The tool collects comprehensive statistics (organization metadata, repository details, activity metrics, security configs, packages) and saves them to a date-stamped JSON file.
See Output Format for complete data details and Feature Flags to reduce API usage by 30-70%.
```bash
gh extension install mona-actions/gh-stats
```

```bash
gh extension upgrade gh-stats
```

To ensure your GitHub CLI has the required scopes:

```bash
# Check current authentication status
gh auth status

# Refresh authentication with required scopes
gh auth refresh --scopes repo,admin:org,admin:org_hook,read:packages,read:project,read:user,workflow,gist
```

If you need to authenticate for the first time:

```bash
gh auth login --scopes repo,admin:org,admin:org_hook,read:packages,read:project,read:user,workflow,gist
```

Collect comprehensive statistics for repositories in one or more organizations:
```text
Usage:
  gh stats [flags]

Flags:
  -o, --org string               GitHub organization to analyze
  -i, --input string             File with list of orgs to analyze
  -O, --output string            Output file path (default: gh-stats-YYYY-MM-DD.json)
  -w, --max-workers int          Maximum number of concurrent API calls (default 3)
      --graphql-batch-size int   Repos per GraphQL batch query (1-50, default 20; higher = fewer API calls)
  -f, --fail-fast                Stop processing on first error
  -v, --verbose                  Enable verbose output
  -r, --resume                   Resume from existing output file, skipping already processed repositories
      --dry-run                  Show what would be collected without making API calls (preview mode)
      --minimal                  Collect only essential data (base GraphQL + org metadata, ~3-5 calls/repo)
  -h, --help                     Display help
```

```bash
gh stats --org mona-actions
```

```bash
# Create a file with organization names (one per line)
echo "mona-actions" > orgs.txt
echo "another-org" >> orgs.txt

# Run analysis
gh stats --input orgs.txt
```

See what data would be collected without making any API calls:
```bash
gh stats --org mona-actions --dry-run
```

This shows:
- List of organizations to process
- Configuration summary
- Estimated API usage per repository
- Data points that will be collected
Use this to verify your configuration and estimate rate limit impact before starting.
Use `--minimal` for fast audits or when rate-limited:

```bash
gh stats --org mona-actions --minimal
```

Enables all `--no-*` flags (see Feature Flags). Override specific flags as needed:

```bash
gh stats --org mona-actions --minimal --no-actions=false
```

The tool automatically resumes from the existing date-stamped output file:
```bash
# Run analysis - automatically resumes if gh-stats-2025-10-30.json exists
gh stats --org mona-actions
```

The tool detects existing data and skips already-processed repositories. Each day creates a new file with the current date.
```bash
gh stats --org mona-actions --output custom-stats.json
```

```bash
gh stats \
  --input orgs.txt \
  --output multi-org-stats.json \
  --max-workers 5 \
  --verbose \
  --fail-fast
```

- Repository Statistics: Collects metrics including PRs, issues, releases, collaborators, branches, commits, and activity data
- Organization Metadata: Gathers organization settings, security configuration, members, teams, and packages
- Package Information: Retrieves package counts and version information per repository
- Multi-Organization Support: Process multiple organizations from a file or command line
- Concurrent Processing: Configurable worker pool (default 3 workers) for parallel API requests
- Automatic Resume: Detects existing output and skips already-processed repositories
- Real-time Rate Limit Tracking: Displays remaining REST and GraphQL API quota during execution
- Verbose Logging: Optional detailed output for debugging and monitoring
- JSON Output: Structured JSON file with organizations, repositories, and packages
Use `--minimal` to disable all optional data collection (~1 call/repo vs ~30-40), or use individual `--no-*` flags for fine-tuned control:

```bash
# Minimal mode
gh stats --org my-org --minimal

# Minimal + Actions only
gh stats --org my-org --minimal --no-actions=false
```

Benefits:
- Managing API rate limits when working with large organizations
- Speeding up collection by skipping unnecessary data
- Reducing bandwidth for slow connections
These fields are always included in the GraphQL query because they're cheap (no pagination or just counts):
Repository Metadata:
- Name, URL, SSH URL, homepage URL, description
- Visibility, timestamps (created, updated, pushed)
- Size (disk usage), fork/archive/template status
- Boolean flags (isPrivate, isLocked, isMirror, isEmpty, isDisabled, etc.)
Engagement Metrics:
- Stars, forks, watchers (count only)
Feature Flags:
- hasWikiEnabled, hasIssuesEnabled, hasProjectsEnabled
- hasDiscussionsEnabled, hasVulnerabilityAlertsEnabled, hasSponsorshipsEnabled
Merge Settings:
- Merge options (squash, rebase, merge commit allowed)
- Auto-merge, delete branch on merge, web commit signoff
- Merge commit message/title formats
Counts (no data fetched - just totals):
- Branches, tags, issues (open/closed), pull requests (open/closed/merged)
- Labels, projects, discussions, commit comments, vulnerability alerts
Default Branch Info:
- Name, latest commit OID, last commit date
Additional Metadata:
- Security policy URL
All of the above data comes from a single GraphQL query per repository (or batched for multiple repos).
Use these flags to skip expensive GraphQL sections that fetch large amounts of nested data:
| Flag | Data Skipped | Impact | Use Case |
|---|---|---|---|
| `--no-collaborators` | Collaborators with permissions (up to 100 users) | High | Skip for public-only repos or privacy concerns |
| `--no-branch-protection` | Branch protection rules (up to 50 rules) | High | Skip if not auditing security policies |
| `--no-rulesets` | Repository rulesets (up to 50 rulesets) | Medium | Skip if not using rulesets |
| `--no-milestones` | Milestones with issue counts (up to 50) | High | Skip if not tracking project management |
| `--no-releases` | Releases with details (up to 50) | Medium | Skip if not analyzing release cadence |
| `--no-deploy-keys` | Deploy keys (up to 100 keys) | Medium | Skip if not auditing deployment access |
| `--no-environments` | Deployment environments (up to 50) | Low | Skip if not using GitHub Environments |
| `--no-deployments` | Deployment history (up to 50) | High | Skip if not tracking deployments |
| `--no-languages` | Language breakdown with sizes (up to 100) | Medium | Skip if not analyzing language distribution |
| `--no-topics` | Repository topics (up to 100) | Low | Skip if not using topics |
| `--no-license` | License information (name, key, SPDX ID, URL) | Low | Skip if license info not needed |
| `--no-community-files` | Code of conduct, contributing, CODEOWNERS | Low | Skip if not analyzing community health |
These flags skip additional REST API calls made after the base GraphQL query:
| Flag | Data Skipped | API Savings | Use Case |
|---|---|---|---|
| `--no-packages` | Package/dependency info | ~5-10 calls/repo | Skip if not using package registries |
| `--no-actions` | Actions workflows & runs | ~5 calls/repo | Skip if not using GitHub Actions |
| `--no-security` | Security alerts & configs | ~2-4 calls/repo | Skip if not analyzing security |
| `--no-settings` | Advanced repo settings | ~1 call/repo | Skip if only basic metadata is needed |
| `--no-custom-props` | Custom properties | ~1 call/repo | Skip if not using custom properties |
| `--no-branches` | Branch details (keeps counts) | ~1 call/repo | Skip detailed branch info |
| `--no-webhooks` | Webhook configurations | ~1 call/repo | Skip webhook details |
| `--no-autolinks` | Autolink references | ~1 call/repo | Skip autolink configs |
| `--no-pages` | GitHub Pages config | ~1 call/repo | Skip if not using Pages |
| `--no-issues-data` | Issue details (keeps counts) | ~1 call/repo | Skip detailed issue info |
| `--no-prs-data` | PR details (keeps counts) | ~1 call/repo | Skip detailed PR info |
| `--no-traffic` | Traffic/views statistics | ~2 calls/repo | Skip traffic metrics |
| `--no-tags` | Git tag details | ~1 call/repo | Skip tag information |
| `--no-git-refs` | Git references | ~1 call/repo | Skip git ref details |
| `--no-lfs` | Git LFS information | ~1 call/repo | Skip LFS details |
| `--no-files` | File tree information | ~1 call/repo | Skip file listings |
| `--no-contributors` | Contributor statistics | ~1 call/repo | Skip contributor data |
| `--no-commits` | Commit details (keeps counts) | ~1 call/repo | Skip detailed commit info |
| `--no-issue-events` | Issue event counts | ~1 call/repo | Skip issue event counting |
| Mode | Command | API Calls/Repo | Use Case |
|---|---|---|---|
| Minimal | `--minimal` | ~1 (batched GraphQL only) | Quick audits, rate limited |
| Medium | `--no-packages --no-traffic --no-lfs --no-files --no-contributors --no-commits` | ~20-25 | Balanced (saves ~40%) |
| Full | (default) | ~30-40 | Complete dataset |
All modes: ~3-5 org-level calls + per-repo calls + 1 batched GraphQL query per 50 repos.
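For a quick back-of-envelope estimate before a run, you can multiply out the per-repo figures above. A rough shell sketch, where the 35 calls/repo midpoint, 50-repo batch size, and 5 org-level calls are assumptions taken from the table and defaults above:

```bash
# Rough API-call estimate for a full-mode run (numbers assumed from the table above)
repos=500
per_repo=35                       # midpoint of ~30-40 calls per repo in full mode
batches=$(( (repos + 49) / 50 ))  # batched base GraphQL queries (50 repos/batch)
org_calls=5                       # upper end of ~3-5 org-level calls
echo "Estimated total: $(( repos * per_repo + batches + org_calls )) calls"
```

At 500 repos this lands around 17,500 calls, i.e., several hours even with a 15k/hour REST limit, which is where `--minimal` or the `--no-*` flags pay off.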
The input file should contain one organization name per line. Lines starting with # are treated as comments:
```text
# Organizations to analyze
mona-actions
another-org
# third-org  # commented out
```
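If you want to seed the input file from the organizations your token can already see, a small sketch like this works (uses the standard `gh api` pagination; adjust the endpoint if you are enumerating an enterprise instead):

```bash
# Sketch: build orgs.txt from the orgs visible to the authenticated user
gh api user/orgs --paginate --jq '.[].login' > orgs.txt
```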
The tool generates a consolidated JSON file with a date-stamped filename (e.g., gh-stats-2025-10-29.json) containing three main sections:
```json
{
  "orgs": [
    {
      "login": "mona-actions",
      "url": "https://github.com/mona-actions",
      ...
    }
  ],
  "repos": [
    {
      "org": "mona-actions",
      "repo": "example-repo",
      "url": "https://github.com/mona-actions/example-repo",
      ...
    }
  ],
  "packages": [
    {
      "org": "mona-actions",
      "repo": "example-repo",
      "packageName": "my-package",
      ...
    }
  ]
}
```

The `orgs` array is your organizational overview dashboard, with rich metadata including:
- Identity & Branding: Organization name, login, description, URL, blog, Twitter username
- Access & Permissions: Default repository permissions, 2FA requirements, member/admin settings
- Collaboration Stats: Public/private repository counts, member counts, team structures, runners count
- Billing & Resources: Disk usage, private repo slots
- Security Posture: Security managers, commit signoff requirements, blocked users
- Advanced Features: GitHub Apps installations, webhooks, custom repository properties, Actions secrets/variables, organization rulesets
Perfect for auditing organizational health, planning migrations, and compliance reporting!
The `repos` array contains everything you need to know about your repositories in meticulous detail:
- Core Metadata: Name, full name, URL (HTML/Git/SSH), description, homepage, primary language
- Repository Status: Fork status, archived state, disabled state, template designation, visibility (public/private/internal)
- Size & Activity: Disk size (KB), star/fork/watcher counts
- Timestamps & Freshness: Creation date, last update, last push dates
- Collaboration Metrics: Collaborator counts with detailed permissions, team access mappings
- Branch Management: Total branches, protected branches, branch protection rules, rulesets, deploy keys
- Issue & PR Analytics: Issue counts (open/closed), PR counts (open/closed/merged), PR merge time averages, issue event counts, commit comment counts
- Project Management: Milestones with progress tracking, projects count, discussions count, labels count
- Release & Versioning: Release counts with details, tag counts, latest release information
- CI/CD & Automation: GitHub Actions workflows, secrets/variables counts, runners, cache usage, deployment environments, deployment history
- Security & Compliance: Code scanning alerts, secret scanning alerts, Dependabot alerts, security advisories
- Configuration Deep-Dive: Wiki enabled, issues enabled, projects enabled, discussions enabled, merge strategies (squash/merge/rebase), auto-merge settings, delete branch on merge, web commit signoff, forking permissions
- Traffic & Engagement: Views, clones, referrers with daily breakdowns (requires push access)
- Advanced Features: Custom properties, autolink references, webhooks, GitHub Pages configuration, Git LFS enabled status, community profile health, license information, topics, language breakdown
Whether you're planning a migration, conducting an audit, or optimizing your workflows, it's all here!
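Because the output is plain JSON, it is easy to slice with `jq`. A minimal sketch, assuming field names such as `isArchived` follow the GraphQL booleans listed earlier (verify against your own output file):

```bash
# Sketch: list URLs of archived repositories from a run's output
jq -r '.repos[] | select(.isArchived == true) | .url' gh-stats-2025-10-30.json
```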
The `packages` array delivers comprehensive package registry insights including:
- Package Identity: Package name, package type (npm, Maven, Docker, NuGet, RubyGems, etc.)
- Visibility & Access: Public vs private visibility, organizational ownership
- Version History: Version count (when available from API)
- Repository Linkage: Associated repository (or "unassigned" for org-level packages)
- Lifecycle Tracking: Creation timestamp, last update timestamp
Essential for dependency audits, security reviews, and understanding your artifact landscape!
Note: Some package types may not provide version count information through the GitHub API, so this field may be empty for certain packages.
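A similar `jq` sketch can summarize the registry mix; the `packageType` field name here is an assumption, so check it against your output first:

```bash
# Sketch: count packages by registry type (packageType field name assumed)
jq -r '[.packages[].packageType] | group_by(.) | map("\(.[0]): \(length)") | .[]' gh-stats-2025-10-30.json
```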
GitHub enforces two separate rate limits:
- REST API: 5,000-15,000 calls/hour (depends on token type: 5k for personal access tokens, 15k for OAuth/GitHub App tokens)
- GraphQL API: 5,000 points/hour (queries cost variable points based on complexity)
The tool tracks both limits separately and displays them in real-time:
```text
Processing 32 repositories... [REST: 64/15000 - GQL: 39/5000] [32/32] 100% | 13s
```
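You can also inspect both quotas yourself before kicking off a large run, using the standard `gh api rate_limit` endpoint:

```bash
# Show remaining REST (core) and GraphQL quota and their reset timestamps
gh api rate_limit --jq '.resources.core, .resources.graphql'
```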
Base GraphQL: 1 batched query per 50 repositories (fetches base metadata for all 50 repos in a single call)
Per-repository processing: ~30-40 REST API calls + 1 GraphQL call for expensive fields (varies by features enabled and repository size)
| Repos | Total API Calls | Time with 3 workers (default) |
|---|---|---|
| 100 | ~3,100-4,100 | 15-25 minutes |
| 1,000 | ~31,000-41,000 | 2-5 hours |
Rate limit constraints (assuming 80% REST / 20% GraphQL split):
- 5k REST / 5k GraphQL: ~125-156 repos/hour maximum
- 15k REST / 5k GraphQL: ~139-178 repos/hour (GraphQL bottleneck)
Use `--minimal` or individual `--no-*` flags (see Feature Flags) to skip unnecessary data.
- Default: 3 workers (good balance)
- Conservative: 1-2 workers (safest, slower)
- Aggressive: 4-5 workers (faster, may hit rate limits)
```bash
gh stats --org my-org -w 5
```

Note: Higher concurrency can trigger rate limiting faster. Monitor the progress bar for rate limit status.
The tool writes each repository immediately to disk. If interrupted, just re-run the same command; it automatically detects existing data and skips already-processed repositories.
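If you expect a run to span several rate-limit windows, a simple retry loop works with this resume behavior; the sketch below assumes gh stats exits non-zero when it stops early:

```bash
# Sketch: keep re-running until the collection completes (finished repos are skipped)
until gh stats --org my-org; do
  echo "Run stopped early (interrupt or rate limit); retrying in an hour..."
  sleep 3600
done
```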
The tool is optimized to minimize API calls:
- Bulk queries: Fetch multiple data points in single GraphQL requests
- Count-only fields: Use `first: 0` to get counts without transferring data
- Smart batching: Combine related queries to reduce round trips (default: 20 repos per batch)
- Incremental writes: Stream output to prevent memory bloat
Adjust GraphQL batch size for your organization size and rate limit constraints:
```bash
# Smaller batches (more API calls, but smaller responses)
gh stats --org my-org --graphql-batch-size 10

# Larger batches (fewer API calls, best for large orgs with available quota)
gh stats --org my-org --graphql-batch-size 50
```

The tool provides real-time visibility into API consumption:
During execution:

```text
Processing 32 repositories... [REST: 64/15000 - GQL: 39/5000] [32/32] 100% | 13s
```

At completion:

```text
INFO  REST: 64/15000 used (14936 remaining, resets at 01:46:48)
INFO  GraphQL: 40/5000 used (4960 remaining, resets at 02:03:54)
SUCCESS  ✓ Complete! Processed 32/32 repos | REST API: 14 calls | GraphQL API: 38 calls
```
Progress updates: Displays rate limit status every 50 API calls during large operations
If you encounter rate limit errors:
- Check reset time: `gh api rate_limit`
- Reduce data collection: Use `--no-*` flags to skip optional data
- Lower concurrency: Use `-w 1` or `-w 2` to slow down API consumption
- Wait and resume: Rate limits reset hourly; the tool auto-resumes where it left off
- Run during off-peak hours: Better API response times and availability
✅ Do:
- Use feature flags for large organizations (>100 repos)
- Monitor the progress bar for rate limit status
- Let batches complete before interrupting
- Use resume capability for multi-hour runs
- Start with default 3 workers, adjust based on rate limit feedback
❌ Don't:
- Set workers >5 without monitoring rate limits
- Collect unnecessary data (use flags strategically)
- Interrupt frequently (hurts efficiency)
Symptom: 401 or 403 errors
Solution:
- Verify GitHub CLI authentication: `gh auth status`
- Re-authenticate if needed: `gh auth login`
- Ensure the token has the required scopes: `repo`, `read:org`, `read:packages`
Symptom: Some repositories show warnings or missing data
Solution:
- Check the output for specific error messages
- Verify you have access to all repositories in the organization
- Some fields may be unavailable if repositories are archived or have features disabled
See the Performance & Rate Limits section for:
- Understanding rate limits and monitoring
- Optimization strategies with feature flags
- Timing estimates and troubleshooting guidance
Enable detailed output for debugging (use the `-v` or `--verbose` flag):

```bash
gh stats --org my-org --verbose
```

This shows:
- Individual API calls being made
- Rate limit status after each major operation
- Detailed error messages for failures
- Progress for each repository processed
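When filing an issue, it helps to capture that verbose output to a file; this sketch assumes the log goes to stdout/stderr:

```bash
# Capture a full verbose run for later inspection or an issue report
gh stats --org my-org --verbose 2>&1 | tee gh-stats-debug.log
```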
If you encounter issues:
- Check the GitHub Issues for similar problems
- Enable verbose mode and include the output in your issue report
- Include your gh-stats version: `gh extension list`
- Describe your environment (OS, GitHub CLI version, organization size)
gh stats uses a layered architecture with a worker pool for concurrent data collection from GitHub's REST and GraphQL APIs:
```mermaid
graph TB
subgraph CLI["CLI Layer"]
direction TB
MAIN["<b>Entry Point</b><br/>main.go<br/><i>Application bootstrap</i>"]
CMD["<b>Command Interface</b><br/>cmd/root.go<br/>cmd/run.go<br/><i>Flag parsing & validation</i>"]
MAIN --> CMD
end
subgraph ORCH["Orchestration Layer"]
direction TB
PROC["<b>Main Coordinator</b><br/>stats/processor.go<br/><i>Workflow orchestration</i>"]
CONC["<b>Worker Pool</b><br/>stats/concurrency.go<br/><i>Parallel processing (default 3 workers)</i>"]
VALID["<b>Input Validator</b><br/>stats/validation.go<br/><i>Org & config validation</i>"]
end
subgraph API["API Layer"]
direction TB
CORE["<b>Core Infrastructure</b><br/>ghapi/core_*.go<br/><i>Rate limiting, retry logic, REST/pagination helpers</i>"]
GRAPHQL["<b>GraphQL Operations</b><br/>ghapi/graphql_*.go<br/><i>Two-step batching strategy:<br/>1. GetBatchedRepoBasicDetails (50 repos/batch)<br/>2. ProcessRepoWithBasicData (per-repo enrichment)</i>"]
ORG["<b>Organization Operations</b><br/>ghapi/orgs_*.go<br/><i>Org metadata, packages, teams</i>"]
REPO["<b>Repository Operations</b><br/>ghapi/repos_*.go<br/><i>Repo metadata, settings, access control</i>"]
end
subgraph STATE["State Management"]
direction TB
GLOBAL["<b>Global State</b><br/>state/state.go<br/><i>Shared state & synchronization</i>"]
RATE["<b>Rate Limit Tracker</b><br/><i>REST: 5k-15k/hr (token-dependent)<br/>GraphQL: 5k pts/hr</i>"]
PROGRESS["<b>Progress Display</b><br/><i>Real-time UI updates (pterm)</i>"]
end
subgraph OUTPUT["Output Layer"]
direction TB
MODELS["<b>Data Models</b><br/>output/models.go<br/><i>Org, Repo, Package structs</i>"]
JSON["<b>JSON Writer</b><br/>output/json.go<br/><i>Incremental file writes</i>"]
end
subgraph EXT["External Dependencies"]
direction TB
GHCLI["<b>GitHub CLI</b><br/><i>gh api - handles REST & GraphQL<br/>Rate limits: REST 5k-15k/hr (token-dependent), GraphQL 5k pts/hr</i>"]
end
CMD --> PROC
PROC --> VALID
PROC --> ORG
PROC --> CONC
CONC --> GRAPHQL
ORG --> CORE
ORG --> GHCLI
GRAPHQL --> GHCLI
GRAPHQL --> REPO
REPO --> CORE
CORE --> GHCLI
PROC --> GLOBAL
CONC --> GLOBAL
ORG --> GLOBAL
GRAPHQL --> GLOBAL
REPO --> GLOBAL
GLOBAL --> RATE
GLOBAL --> PROGRESS
CONC --> JSON
JSON --> MODELS
classDef cliStyle fill:#e3f2fd,stroke:#1976d2,stroke-width:2px,color:#000
classDef orchStyle fill:#fff3e0,stroke:#f57c00,stroke-width:2px,color:#000
classDef apiStyle fill:#f3e5f5,stroke:#7b1fa2,stroke-width:2px,color:#000
classDef stateStyle fill:#e8f5e9,stroke:#388e3c,stroke-width:2px,color:#000
classDef outputStyle fill:#fff9c4,stroke:#f9a825,stroke-width:2px,color:#000
classDef extStyle fill:#ffebee,stroke:#c62828,stroke-width:2px,color:#000
class MAIN,CMD cliStyle
class PROC,CONC,VALID orchStyle
class CORE,GRAPHQL,ORG,REPO apiStyle
class GLOBAL,RATE,PROGRESS stateStyle
class MODELS,JSON outputStyle
class GHCLI extStyle
```
- Input Processing: CLI parses flags and validates organization names or input files
- Rate Limit Check: Verifies sufficient API quota before starting
- Organization Loop: Processes each organization sequentially
- Batch GraphQL Fetching: Fetches base repository data for batches of repositories (default 50 per batch) using single GraphQL queries
- Parallel Repository Enrichment: Worker pool (default 3 workers) processes individual repositories to fetch REST API data and expensive GraphQL fields
- Incremental Persistence: Each repository's data is written to disk immediately upon completion
- Rate Limit Monitoring: Continuously tracks both REST and GraphQL limits
- Resume Capability: Automatically skips already-processed repositories on restart
- Worker Pool: Semaphore-based concurrency control (default: 3 workers)
- Dual Rate Limiting: Separate tracking for REST and GraphQL APIs
- Incremental Output: Append-only writes prevent data loss on interruption
- Error Resilience: Non-critical errors logged as warnings, processing continues
- Context Propagation: Graceful cancellation with Ctrl-C support
| Package | Responsibility | Key Files |
|---|---|---|
| `cmd` | CLI interface and command routing | `root.go`, `run.go` |
| `stats` | Orchestration and parallel processing | `processor.go`, `concurrency.go`, `validation.go` |
| `ghapi` | GitHub API interaction layer | Core: `core_*.go` (5 files)<br/>GraphQL: `graphql_*.go` (4 files)<br/>Organizations: `orgs_*.go` (3 files)<br/>Repositories: `repos_*.go` (2 files) |
| `state` | Global state and rate limit tracking | `state.go` |
| `output` | Data serialization and file I/O | `json.go`, `models.go`, `formatting.go` |
```bash
# Clone the repository
git clone https://github.com/mona-actions/gh-stats.git
cd gh-stats

# Install dependencies
go mod download

# Build the extension
go build -o gh-stats

# Run tests
go test ./...
go test -race ./...

# Install golangci-lint v2.x (required for linting)
go install github.com/golangci/golangci-lint/v2/cmd/golangci-lint@v2.6.0

# Run linter
golangci-lint run

# Install as a GitHub CLI extension
gh extension install .
```

We welcome contributions! Please see CONTRIBUTING.md for detailed guidelines and our Developer Guide for technical documentation.
- Fork and clone the repository
- Create a feature branch: `git checkout -b feature/your-feature`
- Make your changes following our Developer Guide
- Add tests for new functionality
- Update documentation (godoc comments, README)
- Run tests: `go test ./...` and `go test -race ./...`
- Submit a pull request with a clear description
This project is licensed under the MIT License - see the LICENSE file for details.
Built with ❤️ by the Expert Services Team