Autonomous build failure diagnosis and fixing using Claude AI
Real Evidence: This system has been tested with real GitHub Actions workflows. See TEST_RESULTS.md for proof with actual URLs.
## Contents

- Overview
- Key Features
- Quick Start
- Architecture
- Configuration
- Testing
- Cost Optimization
- Documentation
## Overview

Autonomous agent that detects, investigates, and fixes CI/CD build failures:
- Detects failure in GitHub Actions
- Investigates using iterative LLM conversations (up to 5 turns)
- Fixes by creating a PR with complete file replacements (sketched below)
- Coordinates across multiple build flavors (85% cost savings)
Cost: ~$0.50 per fix attempt | Success Rate: High for common issues
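A rough sketch of that fix step, assuming the proposed fix arrives as a map of complete replacement files; the branch naming, function name, and use of the `gh` CLI are illustrative assumptions, not the agent's actual git logic:

```python
# Illustrative only: apply complete replacement files, commit, and open a PR.
# Branch scheme, function name, and gh usage are assumptions, not the agent's API.
import pathlib
import subprocess

def apply_fix_and_open_pr(files: dict[str, str], attempt: int) -> None:
    branch = f"autofix/attempt-{attempt}"                            # hypothetical branch name
    subprocess.run(["git", "checkout", "-b", branch], check=True)
    for path, new_content in files.items():                          # complete files, not diffs
        pathlib.Path(path).write_text(new_content)
    subprocess.run(["git", "add", "-A"], check=True)
    subprocess.run(["git", "commit", "-m", f"autofix: attempt {attempt}"], check=True)
    subprocess.run(["git", "push", "-u", "origin", branch], check=True)
    subprocess.run(["gh", "pr", "create", "--fill"], check=True)     # requires gh auth / GITHUB_TOKEN
```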
## Key Features

- LLM requests files as needed, not all upfront (see the sketch after this list)
- Multi-turn investigation (configurable, default: 5 turns)
- Smart log extraction (handles 10+ MB logs)
- Token budget: 50K tokens (~$0.50)
- No string-matching errors
- LLM provides complete fixed files
- More reliable than diffs
- No human interaction required
- LLM constrained to file requests only
- Best-effort autonomous fixes
- Fetches files from GitHub raw URLs
- Includes recent commit history
- Regression detection (was working → now broken)
- 85% cost savings for multi-platform builds
- First flavor analyzes, others wait
- See: COORDINATION.md
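As a concrete illustration of the file-on-demand loop above, here is a minimal sketch assuming the LLM is constrained to reply with JSON actions; the message protocol, the `fetch_file` helper, and the model id are assumptions, not the real `agent/llm_client.py` interface:

```python
# Minimal sketch of the file-on-demand investigation loop (assumed JSON protocol).
import json
import anthropic

MAX_INVESTIGATION_TURNS = 5
MAX_TOTAL_TOKENS = 50_000

def investigate(error_excerpt: str, fetch_file) -> dict | None:
    client = anthropic.Anthropic()                                   # reads ANTHROPIC_API_KEY
    messages = [{"role": "user", "content": error_excerpt}]
    tokens_used = 0
    for _ in range(MAX_INVESTIGATION_TURNS):
        resp = client.messages.create(
            model="claude-sonnet-4-20250514",                        # illustrative model id
            max_tokens=4000,
            messages=messages,
        )
        tokens_used += resp.usage.input_tokens + resp.usage.output_tokens
        reply = json.loads(resp.content[0].text)                     # agent constrains replies to JSON
        if reply.get("action") == "request_file":                    # LLM asks for more context
            messages.append({"role": "assistant", "content": resp.content[0].text})
            messages.append({"role": "user", "content": fetch_file(reply["path"])})
        elif reply.get("action") == "propose_fix":                   # complete replacement files
            return reply                                             # {"files": {...}, "confidence": ...}
        if tokens_used > MAX_TOTAL_TOKENS:                           # stay within the ~$0.50 budget
            return None
    return None
```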
## Quick Start

Set the required environment variables:

```bash
export ANTHROPIC_API_KEY="your-key"
export GITHUB_TOKEN="your-token"
export GITHUB_REPOSITORY="owner/repo"
```

Run the agent locally (the `--mock-mode` flag avoids API costs):

```bash
python agent/autonomous_agent.py \
  --branch main \
  --build-status failure \
  --failure-log test-project/test-output.log \
  --mock-mode
```

Or run it as a GitHub Actions step on failure:

```yaml
- name: Autonomous Fix
  if: failure()
  env:
    ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
    GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
    BUILD_FLAVOR: "linux-x64"
  run: |
    python agent/autonomous_agent.py \
      --branch "${{ github.ref_name }}" \
      --build-status failure \
      --failure-log test-output.log
```

The agent routes each run through five cases:

| Case | Condition | Action |
|---|---|---|
| CASE 1 | First failure on main | Create fix branch, analyze with LLM |
| CASE 2 | Failure on fix branch | Retry with incremented attempt number |
| CASE 3 | Success on fix branch | Create PR |
| CASE 4 | Success on non-fix branch | Do nothing |
| CASE 5 | Attempt ≥ 7 | Escalate to human (create issue) |
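A minimal sketch of that routing, with the branch-naming convention and function name invented for illustration:

```python
# Hypothetical routing for the five cases above; branch naming is an assumption.
ESCALATION_THRESHOLD = 7

def route(build_status: str, branch: str, attempt: int) -> str:
    on_fix_branch = branch.startswith("autofix/")
    if build_status == "failure":
        if attempt >= ESCALATION_THRESHOLD:
            return "escalate"            # CASE 5: open an issue for a human
        if not on_fix_branch:
            return "create_fix_branch"   # CASE 1: first failure, start LLM analysis
        return "retry"                   # CASE 2: increment attempt and try again
    if on_fix_branch:
        return "create_pr"               # CASE 3: fix branch builds, propose the fix
    return "noop"                        # CASE 4: success elsewhere, nothing to do
```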
## Architecture

```
SmartLogExtractor → Extracts relevant error from massive logs
        ↓
ContextFetcher    → Fetches files/git history as requested
        ↓
LLMClient         → Iterative investigation with Claude
        ↓
GitOperations     → Applies fixes, commits, creates PRs
        ↓
FlavorCoordinator → Coordinates multi-platform builds
```
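How these pieces might compose, as a sketch; only the class names come from the codebase, the method names are assumptions:

```python
# Illustrative composition of the pipeline above; method names are assumptions.
def handle_failure(raw_log: str, branch: str, attempt: int) -> None:
    excerpt = SmartLogExtractor().extract(raw_log)        # trim a 10+ MB log to the failing part
    if not FlavorCoordinator().should_analyze():          # only the first flavor pays for analysis
        return
    fetcher = ContextFetcher()                            # serves the LLM's file/history requests
    fix = LLMClient().investigate(excerpt, fetcher)       # multi-turn conversation with Claude
    if fix and fix["confidence"] >= MIN_FIX_CONFIDENCE:
        GitOperations().apply_and_open_pr(fix, branch, attempt)   # write files, commit, open PR
```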
## Configuration

All parameters live in `agent/config.py`:

```python
# Model progression
SONNET_MAX_ATTEMPTS = 4 # Attempts 1-4
OPUS_MAX_ATTEMPTS = 6 # Attempts 5-6
ESCALATION_THRESHOLD = 7 # Attempt 7+
# Investigation limits
MAX_INVESTIGATION_TURNS = 5 # Max LLM back-and-forth
MAX_TOTAL_TOKENS = 50000 # ~$0.50 budget
MIN_FIX_CONFIDENCE = 0.85 # Min confidence to apply
# Multi-flavor coordination
ENABLE_FLAVOR_COORDINATION = True
```

All parameters are tunable.
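For example, the attempt-based model progression could be wired up like this (the model names and function are illustrative; the thresholds are the ones above):

```python
# Illustrative use of the thresholds above; model names and function are assumptions.
def pick_model(attempt: int) -> str | None:
    if attempt <= SONNET_MAX_ATTEMPTS:      # attempts 1-4: cheaper, faster model
        return "claude-sonnet"
    if attempt <= OPUS_MAX_ATTEMPTS:        # attempts 5-6: stronger model
        return "claude-opus"
    return None                             # attempt 7+: escalate to a human instead
```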
## Testing

Mock mode keeps testing free:

- Real LLM: $0.50 × 100 tests = $50
- Mock LLM: $0.00 × 1000 tests = $0 ✅
```
tests/
├── unit/         # Fast, isolated tests
├── integration/  # Mock LLM end-to-end
└── scenarios/    # Complex test cases
```
```bash
# All tests (free - uses the mock LLM)
pytest tests/ -v

# Specific test
pytest tests/unit/test_log_extractor.py -v

# With coverage
pytest --cov=agent --cov-report=html
```

See TESTING.md for the complete testing guide.
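A unit test in this style might look like the following; the `SmartLogExtractor` import path and `extract` method shown here are assumptions, not the actual test code:

```python
# Hypothetical unit test; the import path and extract() signature are assumptions.
from agent.log_extractor import SmartLogExtractor

def test_extracts_linker_error_from_noisy_log():
    noise = "\n".join(f"[build] ok step {i}" for i in range(100_000))
    log = noise + "\nerror: undefined reference to `foo'\ncollect2: error: ld returned 1 exit status\n"
    excerpt = SmartLogExtractor().extract(log)
    assert "undefined reference" in excerpt   # the real error survives extraction
    assert len(excerpt) < 10_000              # the surrounding noise does not
```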
## Cost Optimization

Typical token usage for a single fix:

| Component | Tokens | Cost |
|---|---|---|
| Error excerpt | 1,000 | $0.003 |
| Git history | 500 | $0.0015 |
| File requests (2 turns) | 4,000 | $0.012 |
| Fix proposal | 1,000 | $0.003 |
| Total | ~6,500 | ~$0.02 |
The token budget allows up to 5 investigation turns, capped at ~$0.50.
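For reference, the per-component costs above are consistent with an input price of roughly $3 per million tokens; a back-of-the-envelope check (the rate is an assumption, not a quoted price):

```python
# Back-of-the-envelope check of the table above; the $3/M-token rate is an assumption.
RATE_PER_TOKEN = 3.0 / 1_000_000

components = {
    "error excerpt": 1_000,
    "git history": 500,
    "file requests (2 turns)": 4_000,
    "fix proposal": 1_000,
}
total = sum(components.values())                              # 6,500 tokens
print(f"~{total} tokens ≈ ${total * RATE_PER_TOKEN:.2f}")     # ≈ $0.02
```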
For a project with 7 platforms (e.g., ApraPipes):

- Without coordination: 7 × $0.50 = $3.50/commit
- With coordination: $0.50/commit
- Annual savings (at 300 commits/month): $10,800
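One way the "first flavor analyzes, others wait" rule could be checked is with a shared marker visible to every runner; the marker-branch convention below is an assumption, not the protocol described in COORDINATION.md:

```python
# Illustrative "first flavor wins" check via a marker branch; this is an assumption,
# not the real protocol in agent/coordination.py.
import subprocess

def should_analyze(marker_branch: str = "autofix/analysis-in-progress") -> bool:
    """Only the flavor that finds no marker runs the paid LLM analysis."""
    out = subprocess.run(
        ["git", "ls-remote", "--heads", "origin", marker_branch],
        capture_output=True, text=True, check=True,
    ).stdout
    return out.strip() == ""    # no marker branch yet: this flavor claims the analysis
```

A real implementation would need an atomic claim to avoid two flavors racing; this only illustrates the check.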
## Documentation

- README.md (this file) - Overview & quick start
- TESTING.md - Comprehensive testing guide
- COORDINATION.md - Multi-flavor coordination
- TEST_RESULTS.md - Real test evidence
Key source files:

- `agent/config.py` - All parameters
- `agent/prompts.json` - LLM templates
- `agent/autonomous_agent.py` - Main logic
- `agent/llm_client.py` - Iterative investigation
- `agent/coordination.py` - Multi-flavor coordination
## Status

✅ Production Ready
- Core 5-case routing: Proven
- Iterative investigation: Implemented
- Multi-flavor coordination: Implemented
- Mock testing: In Progress
See TEST_RESULTS.md for evidence
## Contributing

- Write tests first (mock mode)
- Run `pytest tests/`
- Update docs (keep them consolidated)
- No new .md files without reason
## FAQ

- Q: Cost per fix? A: ~$0.50 (configurable budget)
- Q: What if it fails? A: After 6 attempts it escalates to a human
- Q: Language support? A: Language-agnostic (Python, C++, Rust, etc.)
- Q: Test without costs? A: Yes, use the `--mock-mode` flag
- Q: Disable the agent? A: Remove it from the workflow or set `ENABLE_AUTO_FIX = False`
## Support

- Issues: https://github.com/Apra-Labs/test-autonomous-devops/issues
- Real Evidence: TEST_RESULTS.md
- Testing Guide: TESTING.md
- Coordination: COORDINATION.md
License: MIT