A comprehensive Python package for analyzing Git repositories to generate developer productivity insights without requiring external project management tools. Extract actionable metrics directly from Git history with ML-enhanced commit categorization, automated developer identity resolution, and professional reporting.
- Zero PM-Tool Dependencies: Analyze productivity without requiring JIRA, Linear, or other PM tools
- ML-Powered Intelligence: Advanced commit categorization with 85-95% accuracy
- Smart Identity Resolution: Automatically consolidate developer identities across email addresses
- Enterprise Ready: Organization-wide repository discovery with intelligent caching
- Professional Reports: Rich markdown narratives and CSV exports for executive dashboards
Get up and running in 5 minutes:

```bash
# 1. Install GitFlow Analytics
pip install gitflow-analytics

# 2. Install ML dependencies (optional but recommended)
python -m spacy download en_core_web_sm

# 3. Create a simple configuration (YAML indentation matters)
echo 'version: "1.0"
github:
  token: "${GITHUB_TOKEN}"
  organization: "your-org"' > config.yaml

# 4. Set your GitHub token
echo 'GITHUB_TOKEN=ghp_your_token_here' > .env

# 5. Run analysis
gitflow-analytics -c config.yaml --weeks 8
```

What you get:
- Weekly metrics CSV with developer productivity trends
- Developer profiles with project distribution and work styles
- Untracked work analysis with ML-powered categorization
- Executive summary with actionable insights
- Rich markdown report ready for stakeholders
```markdown
## Executive Summary
- **Total Commits**: 156 across 3 projects
- **Active Developers**: 5 team members
- **Ticket Coverage**: 73.2% (industry benchmark: 60-80%)
- **Top Contributor**: Sarah Chen (32 commits, FRONTEND focus)

## Key Insights
**High Productivity**: Team averaged 31 commits/week
**Balanced Workload**: No single developer >40% of total work
**Good Process**: 73% ticket coverage shows strong tracking
```

- Two-Step Processing: Optimized fetch-then-classify workflow for better performance
- Cost Tracking: Monitor LLM API usage with detailed token and cost reporting
- Smart Caching: Intelligent caching reduces analysis time by up to 90%
- Automatic Updates: Repositories automatically fetch the latest commits before analysis
- Weekly Trends: Track classification pattern changes over time
- Enhanced Categorization: All commits are categorized with confidence scores
Analysis & Insights
- Multi-repository analysis with intelligent project grouping
- ML-enhanced commit categorization (85-95% accuracy)
- Developer productivity metrics and work pattern analysis
- Story point extraction from commits and PRs
- Ticket tracking across JIRA, GitHub, ClickUp, and Linear

Enterprise Features
- Organization-wide repository discovery from GitHub
- Automated developer identity resolution and consolidation
- Database-backed caching for sub-second report generation
- Data anonymization for secure external sharing
- Batch processing optimized for large repositories

Professional Reporting
- Rich markdown narratives with executive summaries
- Weekly CSV exports with trend analysis
- Customizable output formats and filtering
- Performance benchmarking and team comparisons
Comprehensive guides for every use case:
| Getting Started | Advanced Usage | Integration |
|---|---|---|
| Installation | Complete Configuration | CLI Reference |
| 5-Minute Tutorial | ML Categorization | JSON Export Schema |
| First Analysis | Enterprise Setup | CI Integration |
Quick Links:
- Documentation Hub - Complete guide index
- Quick Start - Get running in 5 minutes
- Configuration - Full reference
- Contributing - Join the project
```bash
pip install gitflow-analytics
python -m spacy download en_core_web_sm
```

For development, install from source:

```bash
git clone https://github.com/bobmatnyc/gitflow-analytics.git
cd gitflow-analytics
pip install -e ".[dev]"
python -m spacy download en_core_web_sm
```

Configuration with organization discovery:

```yaml
# config.yaml
version: "1.0"
github:
  token: "${GITHUB_TOKEN}"
  organization: "your-org"  # Auto-discovers all repositories
analysis:
  ml_categorization:
    enabled: true
    min_confidence: 0.7
```

Or list repositories explicitly:

```yaml
# config.yaml
version: "1.0"
github:
  token: "${GITHUB_TOKEN}"
repositories:
  - name: "my-app"
    path: "~/code/my-app"
    github_repo: "myorg/my-app"
    project_key: "APP"
```

```bash
# .env (same directory as config.yaml)
GITHUB_TOKEN=ghp_your_token_here
```

```bash
# Analyze last 8 weeks
gitflow-analytics -c config.yaml --weeks 8

# With custom output directory
gitflow-analytics -c config.yaml --weeks 8 --output ./reports
```

Need more configuration options? See the Complete Configuration Guide for advanced features, integrations, and customization.
GitFlow Analytics can exclude merge commits from filtered line count calculations, following DORA metrics best practices.
Merge commits represent repository management, not original development work:
- Average merge commit: 236.6 filtered lines vs 30.8 for regular commits (7.7x higher)
- Merge commits can skew productivity metrics and velocity calculations
- DORA metrics best practice: Focus on original development work, not repository management
Add this setting to your analysis configuration:
```yaml
analysis:
  # Exclude merge commits from filtered line counts (DORA metrics best practice)
  exclude_merge_commits: true  # Default: false
```

Real metrics from EWTN dataset analysis:
| Metric | With Merge Commits | Without Merge Commits | Change |
|---|---|---|---|
| Total Filtered Lines | 138,730 | 54,808 | -60% |
| Merge Commits | 355 commits | 355 commits | (excluded from line counts) |
| Regular Commits | 1,426 commits | 1,426 commits | (unchanged) |
When `exclude_merge_commits: true`:
- Filtered Stats: Merge commits (2+ parents) have `filtered_insertions = 0` and `filtered_deletions = 0`
- Raw Stats: Always preserved for all commits (accurate commit counts)
- Reports: Line count metrics reflect only original development work
- Not affected: Commit counts, developer activity tracking, ticket references
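A minimal sketch of the rule above, assuming a simplified commit record: the `Commit` dataclass and the `apply_merge_exclusion` / `total_filtered_lines` helpers are hypothetical and not the package's internal model. Merge commits keep their raw stats and still count as commits; only their filtered line counts are zeroed.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Commit:
    """Hypothetical, simplified commit record (not the package's real model)."""
    sha: str
    parents: List[str]
    insertions: int           # raw stats: always preserved
    deletions: int
    filtered_insertions: int  # stats after path filtering
    filtered_deletions: int


def apply_merge_exclusion(commits: List[Commit], exclude_merge_commits: bool) -> None:
    """Zero filtered line counts for merge commits (2+ parents) when the option is on."""
    if not exclude_merge_commits:
        return
    for commit in commits:
        if len(commit.parents) >= 2:
            commit.filtered_insertions = 0
            commit.filtered_deletions = 0


def total_filtered_lines(commits: List[Commit]) -> int:
    return sum(c.filtered_insertions + c.filtered_deletions for c in commits)


commits = [
    Commit("a1b2c3d", ["p1"], insertions=40, deletions=10,
           filtered_insertions=30, filtered_deletions=5),
    Commit("m4e5f6a", ["p1", "p2"], insertions=300, deletions=120,
           filtered_insertions=200, filtered_deletions=80),
]
apply_merge_exclusion(commits, exclude_merge_commits=True)
print(total_filtered_lines(commits))  # 35: only the regular commit's lines remain
print(len(commits))                   # 2: the merge commit is still counted
```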
Enable when:
- You want DORA-compliant metrics for productivity tracking
- Your workflow uses merge commits for pull requests
- You need accurate developer velocity without repository overhead
- You're comparing metrics across teams with different merge strategies

Disable when:
- You want to track all repository activity including management overhead
- Merge commits represent significant manual conflict resolution in your workflow
- You're analyzing repositories without merge-heavy workflows
- You need to measure total repository churn including merges
```yaml
# Full configuration example
analysis:
  weeks_back: 8
  include_weekends: true
  # DORA-compliant metrics: exclude merge commits
  exclude_merge_commits: true
  # Analyze ALL branches to capture feature branch work
  branch_patterns:
    - "*"  # Include all branches (feature, develop, hotfix, etc.)
```

Pro Tip: Combine `exclude_merge_commits: true` with `branch_patterns: ["*"]` to analyze all development work without merge overhead.
GitFlow Analytics generates comprehensive reports for different audiences:
- `weekly_metrics.csv` - Developer productivity trends by week
- `weekly_velocity.csv` - Lines-per-story-point velocity analysis
- `developers.csv` - Complete team profiles and statistics
- `summary.csv` - Project-wide statistics and benchmarks
- `untracked_commits.csv` - ML-categorized analysis of commits without ticket references
- `narrative_summary.md` - Rich markdown report with:
  - Executive summary with key metrics
  - Team composition and work distribution
  - Project activity breakdown
  - Development patterns and recommendations
  - Weekly trend analysis
```markdown
## Executive Summary
- **Total Commits**: 324 commits across 4 projects
- **Active Developers**: 8 team members
- **Ticket Coverage**: 78.4% (above industry benchmark)
- **Top Areas**: Frontend (45%), API (32%), Infrastructure (23%)

## Key Insights
**Strong Process Adherence**: 78% ticket coverage
**Balanced Team**: No developer >35% of total work
**Growth Trend**: +15% productivity vs last quarter
```

Team Lead Dashboard
- Track individual developer productivity and growth
- Identify workload distribution and potential burnout
- Monitor code quality trends and technical debt
Engineering Management
- Generate executive reports on team velocity
- Analyze process adherence and ticket coverage
- Benchmark performance across projects and quarters

Process Optimization
- Identify untracked work patterns that should be formalized
- Optimize developer focus and reduce context switching
- Improve estimation accuracy with historical data

Enterprise Analytics
- Organization-wide repository analysis across dozens of projects
- Automated identity resolution for large, distributed teams
- Cost-effective analysis without expensive PM tool dependencies
```bash
# Analyze repositories (default command)
gitflow-analytics -c config.yaml --weeks 12 --output ./reports

# Explicit analyze command (backward compatibility)
gitflow-analytics analyze -c config.yaml --weeks 12 --output ./reports

# Show cache statistics
gitflow-analytics cache-stats -c config.yaml

# List known developers
gitflow-analytics list-developers -c config.yaml

# Analyze developer identities
gitflow-analytics identities -c config.yaml

# Merge developer identities
gitflow-analytics merge-identity -c config.yaml dev1_id dev2_id

# Discover story point fields in your PM platform
gitflow-analytics discover-storypoint-fields -c config.yaml
```

Options:
- `--weeks, -w`: Number of weeks to analyze (default: 12)
- `--output, -o`: Output directory for reports (default: ./reports)
- `--anonymize`: Anonymize developer information
- `--no-cache`: Disable caching for fresh analysis
- `--clear-cache`: Clear cache before analysis
- `--validate-only`: Validate configuration without running the analysis
- `--skip-identity-analysis`: Skip automatic identity analysis
- `--apply-identity-suggestions`: Apply identity suggestions without prompting
Here's a complete example showing a `.env` file and the corresponding YAML configuration:

```bash
# GitHub Configuration
GITHUB_TOKEN=ghp_xxxxxxxxxxxxxxxxxxxx
GITHUB_ORG=your-organization

# PM Platform Configuration
JIRA_ACCESS_USER=developer@company.com
JIRA_ACCESS_TOKEN=ATATT3xxxxxxxxxxx
LINEAR_API_KEY=lin_api_xxxxxxxxxxxx
CLICKUP_API_TOKEN=pk_xxxxxxxxxxxx
# Note: GitHub Issues uses GITHUB_TOKEN automatically
```

```yaml
version: "1.0"

# GitHub configuration with organization discovery
github:
  token: "${GITHUB_TOKEN}"
  organization: "${GITHUB_ORG}"

# Multi-platform PM integration
pm:
  jira:
    access_user: "${JIRA_ACCESS_USER}"
    access_token: "${JIRA_ACCESS_TOKEN}"
    base_url: "https://company.atlassian.net"
  linear:
    api_key: "${LINEAR_API_KEY}"
    team_ids: ["team_123abc"]  # Optional: filter by specific teams
  clickup:
    api_token: "${CLICKUP_API_TOKEN}"
    workspace_url: "https://app.clickup.com/12345/v/"

# JIRA story point integration (optional)
jira_integration:
  enabled: true
  fetch_story_points: true
  story_point_fields:
    - "Story point estimate"  # Your field name
    - "customfield_10016"     # Fallback field ID

# Analysis configuration
analysis:
  # Track tickets from all configured platforms
  ticket_platforms:
    - jira
    - linear
    - clickup
    - github  # GitHub Issues (uses GITHUB_TOKEN)

  # Exclude bot commits and boilerplate files
  exclude:
    authors:
      - "dependabot[bot]"
      - "renovate[bot]"
    paths:
      - "**/node_modules/**"
      - "**/*.min.js"
      - "**/package-lock.json"

  # Developer identity consolidation
  identity:
    similarity_threshold: 0.85
    manual_mappings:
      - name: "John Doe"
        primary_email: "john.doe@company.com"
        aliases:
          - "jdoe@oldcompany.com"
          - "john@personal.com"

# Output configuration
output:
  directory: "./reports"
  formats:
    - csv
    - markdown
```

The tool generates comprehensive CSV reports and markdown summaries:
- Weekly Metrics (`weekly_metrics_YYYYMMDD.csv`)
  - Week-by-week developer productivity
  - Story points, commits, lines changed
  - Ticket coverage percentages
  - Per-project breakdown
- Weekly Velocity (`weekly_velocity_YYYYMMDD.csv`)
  - Lines of code per story point analysis
  - Efficiency trends and velocity patterns
  - PR-based vs commit-based story points breakdown
  - Team velocity benchmarking and week-over-week trends
- Summary Statistics (`summary_YYYYMMDD.csv`)
  - Overall project statistics
  - Platform-specific ticket counts
  - Top contributors
- Developer Report (`developers_YYYYMMDD.csv`)
  - Complete developer profiles
  - Total contributions
  - Identity aliases
- Untracked Commits Report (`untracked_commits_YYYYMMDD.csv`)
  - Detailed analysis of commits without ticket references
  - Commit categorization (bug_fix, feature, refactor, documentation, maintenance, test, style, build)
  - Enhanced metadata: commit hash, author, timestamp, project, message, file/line changes
  - Configurable file change threshold for filtering significant commits
The untracked commits report provides deep insights into work that bypasses ticket tracking:

CSV Columns:
- `commit_hash` / `short_hash`: Full and abbreviated commit identifiers
- `author` / `author_email` / `canonical_id`: Developer identification (with anonymization support)
- `date`: Commit timestamp
- `project`: Project key for multi-repository analysis
- `message`: Commit message (truncated for readability)
- `category`: Automated categorization of work type
- `files_changed` / `lines_added` / `lines_removed` / `lines_changed`: Change metrics
- `is_merge`: Boolean flag for merge commits
Automatic Categorization:
- Feature: New functionality development (`add`, `new`, `implement`, `create`)
- Bug Fix: Error corrections (`fix`, `bug`, `error`, `resolve`, `hotfix`)
- Refactor: Code restructuring (`refactor`, `optimize`, `improve`, `cleanup`)
- Documentation: Documentation updates (`doc`, `readme`, `comment`, `guide`)
- Maintenance: Routine upkeep (`update`, `upgrade`, `dependency`, `config`)
- Test: Testing-related changes (`test`, `spec`, `mock`, `fixture`)
- Style: Formatting changes (`format`, `lint`, `prettier`, `whitespace`)
- Build: Build system changes (`build`, `compile`, `ci`, `docker`)
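A rough keyword matcher for the categories above, as a sketch only: the keyword lists are copied from this README, but the matching rule (case-insensitive word-prefix search, first matching category wins, checked in the order shown) is an assumption, and `categorize` is a hypothetical helper, not the package's API.

```python
import re
from typing import Dict, List

# Keyword lists from the categories above. The checking order is an assumption.
CATEGORY_KEYWORDS: Dict[str, List[str]] = {
    "bug_fix": ["fix", "bug", "error", "resolve", "hotfix"],
    "feature": ["add", "new", "implement", "create"],
    "refactor": ["refactor", "optimize", "improve", "cleanup"],
    "documentation": ["doc", "readme", "comment", "guide"],
    "maintenance": ["update", "upgrade", "dependency", "config"],
    "test": ["test", "spec", "mock", "fixture"],
    "style": ["format", "lint", "prettier", "whitespace"],
    "build": ["build", "compile", "ci", "docker"],
}


def categorize(message: str) -> str:
    """Return the first category with a keyword at the start of any word in the message."""
    text = message.lower()
    for category, keywords in CATEGORY_KEYWORDS.items():
        for keyword in keywords:
            # \b anchors the keyword to a word start, so "doc" matches "docs"
            # but "ci" does not match inside "decision".
            if re.search(rf"\b{re.escape(keyword)}", text):
                return category
    return "uncategorized"


print(categorize("Fix typo in error message"))                        # bug_fix
print(categorize("Update dependency versions for security patches"))  # maintenance
```

First match wins, so a message such as "Add JSDoc comments to utility functions" lands in `feature` even though a reviewer would call it documentation. Ambiguity like this is what the ML-enhanced categorization is meant to reduce.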
- Narrative Summary (`narrative_summary_YYYYMMDD.md`)
  - Executive Summary: High-level metrics and team overview
  - Team Composition: Developer profiles with project percentages and work patterns
  - Project Activity: Detailed breakdown by project with contributor percentages and commit classifications
  - Development Patterns: Key insights from productivity and collaboration analysis
  - Pull Request Analysis: PR metrics including size, lifetime, and review activity
  - Weekly Trends (v1.1.0+): Week-over-week changes in classification patterns
  - Issue Tracking: Platform usage and coverage analysis with simplified display
  - Enhanced Untracked Work Analysis: Comprehensive categorization with dual percentage metrics
  - PM Platform Integration: Story point tracking and correlation insights (when available)
  - Recommendations: Actionable insights based on analysis patterns
- Database-Backed Qualitative Report (`database_qualitative_report_YYYYMMDD.md`, v1.1.0+)
  - Generated directly from SQLite storage for fast retrieval
  - Includes weekly trend analysis per developer/project
  - Shows classification changes over time (e.g., "Features: +15%, Bug Fixes: -5%")
The narrative report provides comprehensive insights through multiple detailed sections:
- Developer Profiles: Individual developer statistics with commit counts
- Project Distribution: Shows ALL projects each developer works on with precise percentages
- Work Style Classification: Categorizes developers as "Focused", "Multi-project", or "Highly Focused"
- Activity Patterns: Identifies time patterns like "Standard Hours" or "Extended Hours"
Example developer profile:

```markdown
**John Developer**
- Commits: 15
- Projects: FRONTEND (85.0%), SERVICE_TS (15.0%)
- Work Style: Focused
- Active Pattern: Standard Hours
```

- Activity by Project: Commits and percentage of total activity per project
- Contributor Breakdown: Shows each developer's contribution percentage within each project
- Lines Changed: Quantifies the scale of changes per project
- Platform Usage: Clean display of ticket platform distribution (JIRA, GitHub, etc.)
- Coverage Analysis: Percentage of commits that reference tickets
- Enhanced Untracked Work Analysis: Detailed categorization and recommendations
The enhanced untracked work analysis provides two key percentage metrics for better context:
- Percentage of Total Untracked Work: Shows how much each developer contributes to the overall untracked work pool
- Percentage of Developer's Individual Work: Shows what proportion of a specific developer's commits are untracked
Example interpretation:
- John Doe: 25 commits (40% of untracked, 15% of their work) - maintenance, style
This means:
- John contributed 25 untracked commits
- These represent 40% of all untracked commits in the analysis period
- Only 15% of John's total work was untracked (85% was properly tracked)
- Most untracked work was maintenance and style changes (acceptable categories)
Process Insights:
- High "% of untracked" + low "% of their work" = Developer doing most of the acceptable maintenance work
- Low "% of untracked" + high "% of their work" = Developer needs process guidance
- High percentages in feature/bug_fix categories = Process improvement opportunity
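The two percentages are simple ratios over different denominators. Here is a minimal sketch with made-up counts; `untracked_shares` is a hypothetical helper, and rounding to one decimal is an assumption chosen to match the report's display.

```python
from typing import Dict, Tuple


def untracked_shares(
    untracked_by_dev: Dict[str, int],
    total_by_dev: Dict[str, int],
) -> Dict[str, Tuple[float, float]]:
    """Return (% of all untracked commits, % of the developer's own commits) per developer."""
    team_untracked = sum(untracked_by_dev.values())
    shares = {}
    for dev, untracked in untracked_by_dev.items():
        # Denominator 1: the whole team's untracked pool
        pct_of_untracked = 100.0 * untracked / team_untracked if team_untracked else 0.0
        # Denominator 2: everything this developer committed
        total = total_by_dev.get(dev, 0)
        pct_of_own_work = 100.0 * untracked / total if total else 0.0
        shares[dev] = (round(pct_of_untracked, 1), round(pct_of_own_work, 1))
    return shares


shares = untracked_shares(
    untracked_by_dev={"John Doe": 10, "Jane Smith": 30},
    total_by_dev={"John Doe": 50, "Jane Smith": 40},
)
print(shares)  # {'John Doe': (25.0, 20.0), 'Jane Smith': (75.0, 75.0)}
```

In this made-up data, three quarters of Jane's own commits are untracked, which is the kind of signal the process insights above flag for follow-up.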
Example untracked commits CSV:

```csv
commit_hash,short_hash,author,author_email,canonical_id,date,project,message,category,files_changed,lines_added,lines_removed,lines_changed,is_merge
a1b2c3d4e5f6...,a1b2c3d,John Doe,john@company.com,ID0001,2024-01-15 14:30:22,FRONTEND,Update dependency versions for security patches,maintenance,2,45,12,57,false
f6e5d4c3b2a1...,f6e5d4c,Jane Smith,jane@company.com,ID0002,2024-01-15 09:15:10,BACKEND,Fix typo in error message,bug_fix,1,1,1,2,false
9876543210ab...,9876543,Bob Wilson,bob@company.com,ID0003,2024-01-14 16:45:33,FRONTEND,Add JSDoc comments to utility functions,documentation,3,28,0,28,false
```
Example narrative report:

```markdown
# GitFlow Analytics Report

**Generated**: 2025-08-04 14:27:47
**Analysis Period**: Last 4 weeks

## Executive Summary
- **Total Commits**: 35
- **Active Developers**: 3
- **Lines Changed**: 910
- **Ticket Coverage**: 71.4%
- **Active Projects**: FRONTEND, SERVICE_TS, SERVICES
- **Top Contributor**: John Developer with 15 commits

## Team Composition

### Developer Profiles
**John Developer**
- Commits: 15
- Projects: FRONTEND (85.0%), SERVICE_TS (15.0%)
- Work Style: Focused
- Active Pattern: Standard Hours

**Jane Smith**
- Commits: 12
- Projects: SERVICE_TS (70.0%), FRONTEND (30.0%)
- Work Style: Multi-project
- Active Pattern: Extended Hours

## Project Activity

### Activity by Project
**FRONTEND**
- Commits: 14 (50.0% of total)
- Lines Changed: 450
- Contributors: John Developer (71.4%), Jane Smith (28.6%)

**SERVICE_TS**
- Commits: 8 (28.6% of total)
- Lines Changed: 280
- Contributors: Jane Smith (100.0%)

## Issue Tracking

### Platform Usage
- **Jira**: 15 tickets (60.0%)
- **Github**: 8 tickets (32.0%)
- **Clickup**: 2 tickets (8.0%)

### Untracked Work Analysis
**Summary**: 10 commits (28.6% of total) lack ticket references.

#### Work Categories
- **Maintenance**: 4 commits (40.0%), avg 23 lines *(acceptable untracked)*
- **Bug Fix**: 3 commits (30.0%), avg 15 lines *(should be tracked)*
- **Documentation**: 2 commits (20.0%), avg 12 lines *(acceptable untracked)*

#### Top Contributors (Untracked Work)
- **John Developer**: 1 commits (50.0% of untracked, 6.7% of their work) - *refactor*
- **Jane Smith**: 1 commits (50.0% of untracked, 8.3% of their work) - *style*

#### Recommendations for Untracked Work
**Excellent tracking**: Less than 20% of commits are untracked - the team shows strong process adherence.

## Recommendations
The team shows healthy development patterns. Continue current practices while monitoring for changes.
```

The narrative reports automatically include all available sections based on your configuration and data availability:
Always Generated:
- Executive Summary, Team Composition, Project Activity, Development Patterns, Issue Tracking, Recommendations
Conditionally Generated:
- Pull Request Analysis: Requires GitHub integration with PR data
- PM Platform Integration: Requires JIRA or other PM platform configuration
- Qualitative Analysis: Requires ChatGPT integration setup
Customizing Report Content:

```yaml
# config.yaml
output:
  formats:
    - csv
    - markdown  # Enables narrative report generation

# Optional: Enhance narrative reports with additional data
jira:
  access_user: "${JIRA_ACCESS_USER}"
  access_token: "${JIRA_ACCESS_TOKEN}"
  base_url: "https://company.atlassian.net"

# Optional: Add qualitative insights
analysis:
  chatgpt:
    enabled: true
    api_key: "${OPENAI_API_KEY}"
```

Configure custom regex patterns to match your team's story point format:
```yaml
story_point_patterns:
  - "SP: (\\d+)"          # SP: 5
  - "\\[([0-9]+) pts\\]"  # [3 pts]
  - "estimate: (\\d+)"    # estimate: 8
```

Automatically detects and tracks tickets from multiple PM platforms:
- JIRA: `PROJ-123`
- GitHub Issues: `#123`, `GH-123`
- ClickUp: `CU-abc123`
- Linear: `ENG-123`
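The story point patterns and ticket formats above can be tried out with Python's `re` module before you commit them to your config. This is a sketch under assumptions: the story point regexes are the YAML patterns above (YAML's `"\\d"` inside double quotes is the regex `\d`), but the ticket regexes are hypothetical reconstructions of the listed formats, not the tool's own patterns. Because JIRA and Linear share the `KEY-123` shape, the sketch cannot tell them apart and reports both under one key.

```python
import re
from typing import Dict, List, Optional

# Story point patterns from the YAML above, written as Python raw strings.
STORY_POINT_PATTERNS = [
    r"SP: (\d+)",
    r"\[([0-9]+) pts\]",
    r"estimate: (\d+)",
]

# Hypothetical ticket regexes for the formats listed above. The GH- and CU-
# prefixes are excluded from the KEY-123 pattern so they are reported under
# GitHub and ClickUp instead.
TICKET_PATTERNS: Dict[str, str] = {
    "jira_or_linear": r"\b(?!GH-|CU-)[A-Z][A-Z0-9]+-\d+\b",
    "github": r"(?<!\w)#\d+\b|\bGH-\d+\b",
    "clickup": r"\bCU-[0-9a-z]+\b",
}


def extract_story_points(message: str) -> Optional[int]:
    """Return the value captured by the first matching pattern, else None."""
    for pattern in STORY_POINT_PATTERNS:
        match = re.search(pattern, message)
        if match:
            return int(match.group(1))
    return None


def extract_tickets(message: str) -> Dict[str, List[str]]:
    """Group ticket references found in a commit message by platform."""
    found: Dict[str, List[str]] = {}
    for platform, pattern in TICKET_PATTERNS.items():
        hits = re.findall(pattern, message)
        if hits:
            found[platform] = hits
    return found


message = "Fix login bug (#42) for PROJ-7, CU-abc123, SP: 3"
print(extract_story_points(message))  # 3
print(extract_tickets(message))
# {'jira_or_linear': ['PROJ-7'], 'github': ['#42'], 'clickup': ['CU-abc123']}
```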
GitFlow Analytics supports multiple project management platforms simultaneously. You can configure one or more platforms based on your team's workflow:
```yaml
# Configure which platforms to track
analysis:
  ticket_platforms:
    - jira
    - linear
    - clickup
    - github  # GitHub Issues

# Platform-specific configuration
pm:
  jira:
    access_user: "${JIRA_ACCESS_USER}"
    access_token: "${JIRA_ACCESS_TOKEN}"
    base_url: "https://your-company.atlassian.net"
  linear:
    api_key: "${LINEAR_API_KEY}"
    team_ids:  # Optional: filter by team
      - "team_123abc"
  clickup:
    api_token: "${CLICKUP_API_TOKEN}"
    workspace_url: "https://app.clickup.com/12345/v/"

# GitHub Issues uses existing GitHub token automatically
github:
  token: "${GITHUB_TOKEN}"
```

JIRA:
- Get API Token: Go to Atlassian API Tokens
- Required Permissions: Read access to projects and issues
- Configuration:
  ```yaml
  pm:
    jira:
      access_user: "${JIRA_ACCESS_USER}"  # Your Atlassian email
      access_token: "${JIRA_ACCESS_TOKEN}"
      base_url: "https://your-company.atlassian.net"
  ```
Linear:
- Get API Key: Go to Linear Settings → API
- Required Permissions: Read access to issues
- Configuration:
  ```yaml
  pm:
    linear:
      api_key: "${LINEAR_API_KEY}"
      team_ids: ["team_123abc"]  # Optional: specify team IDs
  ```
ClickUp:
- Get API Token: Go to ClickUp Settings → Apps
- Get Workspace URL: Copy it from the browser while viewing your workspace
- Configuration:
  ```yaml
  pm:
    clickup:
      api_token: "${CLICKUP_API_TOKEN}"
      workspace_url: "https://app.clickup.com/12345/v/"
  ```
GitHub Issues is automatically enabled when GitHub integration is configured. No additional setup is required:

```yaml
github:
  token: "${GITHUB_TOKEN}"  # Same token for repo access and issues
```

GitFlow Analytics can fetch story points directly from JIRA tickets:
```yaml
jira_integration:
  enabled: true
  fetch_story_points: true
  story_point_fields:
    - "Story point estimate"  # Your custom field name
    - "customfield_10016"     # Or use field ID
```

To discover your JIRA story point fields:
```bash
gitflow-analytics discover-storypoint-fields -c config.yaml
```

Store credentials securely in a `.env` file:
```bash
# .env file (keep this secure and don't commit it to git!)
GITHUB_TOKEN=ghp_your_token_here

# PM Platform Credentials
JIRA_ACCESS_USER=your.email@company.com
JIRA_ACCESS_TOKEN=ATATT3xxxxxxxxxxx
LINEAR_API_KEY=lin_api_xxxxxxxxxxxx
CLICKUP_API_TOKEN=pk_xxxxxxxxxxxx
```

The tool uses SQLite for intelligent caching:
- Commit analysis results
- Developer identity mappings
- Pull request data
Cache is automatically managed with configurable TTL.
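To illustrate what a TTL-managed SQLite cache looks like, here is a minimal sketch using only the standard library. The `TTLCache` class, its single-table schema, and the 30-day default are illustrative assumptions; the package's real cache layout is not documented here.

```python
import sqlite3
import time
from typing import Optional


class TTLCache:
    """Hypothetical key/value cache in SQLite where entries expire after ttl_seconds."""

    def __init__(self, path: str = ":memory:", ttl_seconds: float = 30 * 24 * 3600):
        self.ttl = ttl_seconds  # arbitrary default for this sketch
        self.conn = sqlite3.connect(path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS cache (key TEXT PRIMARY KEY, value TEXT, stored_at REAL)"
        )

    def set(self, key: str, value: str, now: Optional[float] = None) -> None:
        ts = time.time() if now is None else now
        self.conn.execute(
            "INSERT OR REPLACE INTO cache (key, value, stored_at) VALUES (?, ?, ?)",
            (key, value, ts),
        )
        self.conn.commit()

    def get(self, key: str, now: Optional[float] = None) -> Optional[str]:
        ts = time.time() if now is None else now
        row = self.conn.execute(
            "SELECT value, stored_at FROM cache WHERE key = ?", (key,)
        ).fetchone()
        if row is None:
            return None
        value, stored_at = row
        if ts - stored_at > self.ttl:
            # Expired: delete the stale entry so the next analysis recomputes it
            self.conn.execute("DELETE FROM cache WHERE key = ?", (key,))
            self.conn.commit()
            return None
        return value


cache = TTLCache(ttl_seconds=60)
cache.set("a1b2c3d", "feature", now=1000.0)
print(cache.get("a1b2c3d", now=1030.0))  # feature (still fresh)
print(cache.get("a1b2c3d", now=1061.0))  # None (expired and removed)
```

Passing `now` explicitly keeps the example deterministic. Real callers would omit it and use the wall clock.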
GitFlow Analytics intelligently consolidates developer identities across different email addresses and name variations:
Identity analysis now runs automatically by default when no manual mappings exist. The system will:
- Analyze all developer identities in your commits
- Show suggested consolidations with a clear preview
- Prompt for approval with a simple Y/n
- Update your configuration automatically
- Continue analysis with consolidated identities
Example of the interactive prompt:
```text
Analyzing developer identities...
Found 3 potential identity clusters:

Suggested identity mappings:
  john.doe@company.com
    → 123456+johndoe@users.noreply.github.com
    → jdoe@personal.email.com

Found 2 bot accounts to exclude:
  - dependabot[bot]
  - renovate[bot]

────────────────────────────────────────────────────────────
Apply these identity mappings to your configuration? [Y/n]:
```
This prompt appears at most once every 7 days.
To skip automatic identity analysis:

```bash
# Simplified syntax (default)
gitflow-analytics -c config.yaml --skip-identity-analysis

# Explicit analyze command
gitflow-analytics analyze -c config.yaml --skip-identity-analysis
```

To manually run identity analysis:

```bash
gitflow-analytics identities -c config.yaml
```

The system automatically detects:
- GitHub noreply emails (e.g., `150280367+username@users.noreply.github.com`)
- Name variations (e.g., "John Doe" vs "John D" vs "jdoe")
- Common email patterns across domains
- Bot accounts for automatic exclusion
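A sketch of how signals like these could be detected with the standard library. The noreply formats (`ID+username@users.noreply.github.com` and the older `username@users.noreply.github.com`) are GitHub's. The `[bot]` suffix check and the use of `difflib` with the 0.85 `similarity_threshold` from the sample config are assumptions about how such matching might work, not a description of the package's algorithm; all three helpers are hypothetical.

```python
import re
from difflib import SequenceMatcher
from typing import Optional, Tuple

# "ID+username@users.noreply.github.com" or "username@users.noreply.github.com"
NOREPLY_RE = re.compile(
    r"^(?:(\d+)\+)?([A-Za-z0-9-]+)@users\.noreply\.github\.com$", re.IGNORECASE
)


def parse_noreply(email: str) -> Optional[Tuple[Optional[str], str]]:
    """Return (numeric id or None, username) for a GitHub noreply email, else None."""
    match = NOREPLY_RE.match(email.strip())
    if match is None:
        return None
    return match.group(1), match.group(2)


def is_bot(name: str) -> bool:
    """Treat names such as "dependabot[bot]" as bot accounts."""
    return name.strip().lower().endswith("[bot]")


def names_look_similar(a: str, b: str, threshold: float = 0.85) -> bool:
    """Compare lower-cased names with difflib's ratio (0.0-1.0) against a threshold."""
    ratio = SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()
    return ratio >= threshold


print(parse_noreply("150280367+username@users.noreply.github.com"))  # ('150280367', 'username')
print(names_look_similar("John Doe", "John D"))                       # True (ratio ~0.857)
```

A pure string ratio misses pairs like "John Doe" vs "jdoe", which is one reason the tool also looks at email patterns and asks for confirmation before merging.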
You can also manually configure identity mappings in your YAML:

```yaml
analysis:
  identity:
    manual_mappings:
      - name: "John Doe"  # Optional: preferred display name for reports
        primary_email: john.doe@company.com
        aliases:
          - jdoe@personal.email.com
          - 123456+johndoe@users.noreply.github.com
      - name: "Sarah Smith"
        primary_email: sarah.smith@company.com
        aliases:
          - s.smith@oldcompany.com
```

The optional `name` field in manual mappings allows you to control how developer names appear in reports. This is particularly useful for:
- Standardizing display names across different email formats
- Resolving duplicates when the same person appears with slight name variations
- Using preferred names instead of technical email formats
Example use cases:

```yaml
analysis:
  identity:
    manual_mappings:
      # Consolidate Austin Zach identities
      - name: "Austin Zach"
        primary_email: "john.smith@company.com"
        aliases:
          - "150280367+jsmith@users.noreply.github.com"
          - "jsmith-company@users.noreply.github.com"

      # Standardize name variations
      - name: "John Doe"  # Consistent display across all reports
        primary_email: "john.doe@company.com"
        aliases:
          - "johndoe@company.com"
          - "j.doe@company.com"
```

Without the `name` field, the system uses the canonical email's associated name, which might not be ideal for reporting.
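Conceptually, `manual_mappings` is an alias-to-canonical lookup. The sketch below models that with a plain dictionary built from mappings already parsed out of YAML. `build_alias_index` and `resolve` are hypothetical helpers, and case-insensitive email matching is an assumption.

```python
from typing import Dict, List, Optional, Tuple

# The first manual_mappings example above, as parsed Python data.
MANUAL_MAPPINGS: List[dict] = [
    {
        "name": "John Doe",
        "primary_email": "john.doe@company.com",
        "aliases": ["jdoe@personal.email.com", "123456+johndoe@users.noreply.github.com"],
    },
    {
        "name": "Sarah Smith",
        "primary_email": "sarah.smith@company.com",
        "aliases": ["s.smith@oldcompany.com"],
    },
]

Identity = Tuple[Optional[str], str]  # (display name or None, primary email)


def build_alias_index(mappings: List[dict]) -> Dict[str, Identity]:
    """Map every known email (primary and aliases, lower-cased) to its canonical identity."""
    index: Dict[str, Identity] = {}
    for mapping in mappings:
        primary = mapping["primary_email"].lower()
        display_name = mapping.get("name")  # optional; None means "use the commit's name"
        for email in [primary, *mapping.get("aliases", [])]:
            index[email.lower()] = (display_name, primary)
    return index


def resolve(email: str, index: Dict[str, Identity]) -> Identity:
    """Return the canonical identity for an email; unknown emails map to themselves."""
    return index.get(email.lower(), (None, email.lower()))


index = build_alias_index(MANUAL_MAPPINGS)
print(resolve("JDoe@Personal.Email.com", index))  # ('John Doe', 'john.doe@company.com')
```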
To disable the automatic identity prompt:

```yaml
analysis:
  identity:
    auto_analysis: false
```

GitFlow Analytics includes sophisticated machine learning capabilities for categorizing commits with high accuracy and confidence scoring.
The ML categorization system uses a hybrid approach combining:
- Semantic Analysis: Uses spaCy NLP models to understand commit message meaning
- File Pattern Recognition: Analyzes changed files for additional context signals
- Rule-based Fallback: Falls back to traditional regex patterns when ML confidence is low
- Confidence Scoring: Provides confidence metrics for all categorizations
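The hybrid approach above can be approximated as a weighted blend with a rule-based fallback. This is a sketch under stated assumptions: the per-category scores are given as inputs (in the tool they would come from spaCy and file-pattern analysis), the 0.7/0.3 weights and the 0.5 `hybrid_threshold` are the documented defaults, and the blending formula, the `Prediction` type, and `hybrid_categorize` itself are hypothetical. How `min_confidence` interacts with `hybrid_threshold` is not modeled.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Prediction:
    category: str
    confidence: float
    method: str  # "ml" or "rules"


def hybrid_categorize(
    semantic_scores: Dict[str, float],
    file_scores: Dict[str, float],
    rule_fallback: Callable[[], str],
    semantic_weight: float = 0.7,
    file_pattern_weight: float = 0.3,
    hybrid_threshold: float = 0.5,
) -> Prediction:
    """Blend per-category scores; defer to rules when the best blended score is too weak."""
    categories = set(semantic_scores) | set(file_scores)
    blended = {
        cat: semantic_weight * semantic_scores.get(cat, 0.0)
        + file_pattern_weight * file_scores.get(cat, 0.0)
        for cat in categories
    }
    if not blended:
        return Prediction(rule_fallback(), 0.0, "rules")
    best = max(blended, key=blended.get)
    if blended[best] >= hybrid_threshold:
        return Prediction(best, round(blended[best], 2), "ml")
    # Low confidence: use the keyword rules, keeping the blended score for reference
    return Prediction(rule_fallback(), round(blended[best], 2), "rules")


strong = hybrid_categorize({"feature": 0.9, "bug_fix": 0.1}, {"feature": 0.6, "test": 0.4},
                           rule_fallback=lambda: "feature")
weak = hybrid_categorize({"maintenance": 0.4}, {"maintenance": 0.3},
                         rule_fallback=lambda: "maintenance")
print(strong)  # Prediction(category='feature', confidence=0.81, method='ml')
print(weak)    # Prediction(category='maintenance', confidence=0.37, method='rules')
```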
The system automatically categorizes commits into:
- Feature: New functionality development (`add`, `implement`, `create`)
- Bug Fix: Error corrections (`fix`, `resolve`, `correct`)
- Refactor: Code restructuring (`refactor`, `optimize`, `improve`)
- Documentation: Documentation updates (`docs`, `readme`, `comment`)
- Maintenance: Routine upkeep (`update`, `upgrade`, `dependency`)
- Test: Testing-related changes (`test`, `spec`, `coverage`)
- Style: Formatting changes (`format`, `lint`, `prettier`)
- Build: Build system changes (`build`, `ci`, `docker`)
- Security: Security-related fixes (`security`, `vulnerability`)
- Hotfix: Urgent production fixes (`hotfix`, `critical`, `emergency`)
- Config: Configuration changes (`config`, `settings`, `environment`)
```yaml
analysis:
  ml_categorization:
    # Enable/disable ML categorization (default: true)
    enabled: true
    # Minimum confidence for ML predictions (0.0-1.0, default: 0.6)
    min_confidence: 0.6
    # Semantic vs file pattern weighting (default: 0.7 vs 0.3)
    semantic_weight: 0.7
    file_pattern_weight: 0.3
    # Confidence threshold for ML vs rule-based (default: 0.5)
    hybrid_threshold: 0.5
    # Caching for performance
    enable_caching: true
    cache_duration_days: 30
    # Processing settings
    batch_size: 100
```

For ML categorization, install the spaCy English model:
```bash
python -m spacy download en_core_web_sm
```

Alternative models (if the default is unavailable):
```bash
# Medium model (more accurate, larger)
python -m spacy download en_core_web_md

# Large model (most accurate, largest)
python -m spacy download en_core_web_lg
```

- Accuracy: 85-95% accuracy on typical commit messages
- Speed: ~50-100 commits/second with caching enabled
- Fallback: Gracefully disables qualitative analysis if the spaCy model is unavailable, with a helpful error message
- Memory: ~200MB additional memory usage for spaCy models
With ML categorization enabled, reports include:
- Confidence scores for each categorization
- Method indicators (ML, rules, or cached)
- Alternative predictions for uncertain cases
- ML performance statistics in analysis summaries

```csv
commit_hash,category,ml_confidence,ml_method,message
a1b2c3d,feature,0.89,ml,"Add user authentication system"
f6e5d4c,bug_fix,0.92,ml,"Fix memory leak in cache cleanup"
9876543,maintenance,0.74,rules,"Update dependency versions"
```
GitFlow Analytics provides helpful error messages when YAML configuration issues are encountered. Here are common errors and their solutions:
```text
YAML configuration error at line 3, column 1:
Tab characters are not allowed in YAML files!
```

Fix: Replace all tabs with spaces (use 2 or 4 spaces for indentation).
- Most editors can show whitespace characters and convert tabs to spaces
- In VS Code: View → Render Whitespace, then Edit → Convert Indentation to Spaces
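If you want to locate tabs before running the tool, a few lines of standard-library Python are enough. This is a sketch: `find_tabs` and `tabs_to_spaces` are hypothetical helpers, and a blind tab-to-spaces replacement can still leave inconsistent indentation if a file mixes tabs and spaces, so review the result.

```python
from typing import List, Tuple


def find_tabs(yaml_text: str) -> List[Tuple[int, int]]:
    """Return 1-based (line, column) positions of every tab character."""
    positions = []
    for line_no, line in enumerate(yaml_text.splitlines(), start=1):
        for col, char in enumerate(line, start=1):
            if char == "\t":
                positions.append((line_no, col))
    return positions


def tabs_to_spaces(yaml_text: str, width: int = 2) -> str:
    """Replace each tab with `width` spaces (pick 2 or 4 and use it consistently)."""
    return yaml_text.replace("\t", " " * width)


config = 'version: "1.0"\ngithub:\n\ttoken: "${GITHUB_TOKEN}"\n'
print(find_tabs(config))  # [(3, 1)]: line 3, column 1, as in the error above
print(tabs_to_spaces(config))
```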
```text
YAML configuration error at line 5, column 10:
Missing colon (:) after a key name!
```

Fix: Add a colon and space after each key name.

```yaml
# Correct:
repositories:
  - name: my-repo

# Incorrect:
repositories
  - name my-repo
```

```text
YAML configuration error at line 8, column 15:
Unclosed quoted string!
```

Fix: Ensure all quotes are properly closed.

```yaml
# Correct:
token: "my-token-value"

# Incorrect:
token: "my-token-value
```

```text
YAML configuration error:
Indentation error or invalid structure!
```

Fix: Use consistent indentation (either 2 or 4 spaces).

```yaml
# Correct:
analysis:
  exclude:
    paths:
      - "vendor/**"

# Incorrect:
analysis:
  exclude:
     paths:  # 3 spaces - inconsistent!
      - "vendor/**"
```

- Use a YAML validator: Check your configuration with a YAML validator before running the tool
- Enable whitespace display: Make tabs and spaces visible in your editor
- Use quotes for special characters: Wrap values containing `:`, `#`, `@`, etc. in quotes
- Consistent indentation: Pick 2 or 4 spaces and stick to it throughout the file
- Check the sample config: Reference `config-sample.yaml` for proper structure
Beyond YAML syntax, GitFlow Analytics validates:
- Required fields (`repositories` must have `name` and `path`)
- Environment variable resolution
- File path existence
- Valid configuration structure
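Environment variable resolution means every `${VAR}` placeholder in the config must have a value, typically from `.env`. The sketch below shows one way to check that with the standard library. `load_dotenv_lines` and `resolve_placeholders` are hypothetical helpers, the simple `KEY=value` parsing ignores `.env` edge cases such as `export` prefixes or multi-line values, and raising on a missing variable is an assumed policy.

```python
import os
import re
from typing import Dict, Mapping, Optional

# Matches ${VAR_NAME} placeholders as used throughout the sample configs.
PLACEHOLDER_RE = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def load_dotenv_lines(text: str) -> Dict[str, str]:
    """Parse simple KEY=value lines, ignoring blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def resolve_placeholders(value: str, env: Optional[Mapping[str, str]] = None) -> str:
    """Substitute ${VAR} placeholders, raising KeyError if a variable is not set."""
    source = os.environ if env is None else env

    def substitute(match) -> str:
        name = match.group(1)
        if name not in source:
            raise KeyError(f"Environment variable {name} is not set")
        return source[name]

    return PLACEHOLDER_RE.sub(substitute, value)


env = load_dotenv_lines('# GitHub\nGITHUB_TOKEN=ghp_abc\nGITHUB_ORG="acme"\n')
print(resolve_placeholders("${GITHUB_ORG}/my-app", env))  # acme/my-app
```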
If you encounter persistent issues, run with `--debug` for detailed error information:

```bash
# Simplified syntax (default)
gitflow-analytics -c config.yaml --debug

# Explicit analyze command
gitflow-analytics analyze -c config.yaml --debug
```

Contributions are welcome! Please feel free to submit a Pull Request.
```bash
# Clone the repository
git clone https://github.com/bobmatnyc/gitflow-analytics.git
cd gitflow-analytics

# Install development dependencies
make install-dev

# Run tests
make test

# Format code
make format

# Run all quality checks
make quality-gate
```

This project uses a Makefile-based release workflow for simplicity and transparency. See RELEASE.md for detailed documentation.
Quick Reference:

```bash
make release-patch  # Bug fixes (3.13.1 → 3.13.2)
make release-minor  # New features (3.13.1 → 3.14.0)
make release-major  # Breaking changes (3.13.1 → 4.0.0)
```

For more details, see:
- RELEASE.md - Comprehensive release guide
- RELEASE_QUICKREF.md - Quick reference card
- `make help` - All available commands
This project is licensed under the MIT License - see the LICENSE file for details.