This project is developed entirely by autonomous agents. No human questions or support will be provided.
A comprehensive tool suite for running reliability tests on Claude agents with flexible execution modes, template-based prompts, and detailed analysis capabilities.
Build the tools:
make build
Run a basic reliability test:
./build/agent-reliability-tests general-purpose --loops 5
Analyze the results:
./build/analyze chat_*.log --verbose
Binary: ./build/agent-reliability-tests
Source: cmd/reliability/main.go
The tool supports three execution modes:
- Queue Mode (Default) - Uses worker threads for controlled execution
- Parallel Mode - Executes tests in parallel batches
- Multi-Worker Queue - Scales queue mode with multiple workers
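Queue mode can be pictured as a small worker pool. The sketch below is illustrative only and is not taken from cmd/reliability/main.go: the `runQueue` helper and its signature are hypothetical, and it assumes each test iteration can be modeled as a function of its index.

```go
package main

import (
	"fmt"
	"sync"
)

// runQueue is a hypothetical helper, not the project's actual API.
// It runs `loops` jobs on `workers` goroutines that pull job indices
// from a shared channel, and returns the results in job order.
func runQueue(loops, workers int, job func(i int) string) []string {
	if workers < 1 {
		workers = 1 // mirrors the documented default of one worker
	}
	results := make([]string, loops)
	jobs := make(chan int)

	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := range jobs {
				results[i] = job(i) // each index is written by exactly one worker
			}
		}()
	}

	// Feed the queue, then close it so the workers exit their loops.
	for i := 0; i < loops; i++ {
		jobs <- i
	}
	close(jobs)
	wg.Wait()
	return results
}

func main() {
	out := runQueue(5, 2, func(i int) string { return fmt.Sprintf("run %d", i) })
	fmt.Println(len(out), out[0], out[4])
}
```

With one worker the jobs run strictly one after another; raising the worker count (as `--queue 3` does) bounds how many run at the same time.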
# Basic test (queue mode with 1 worker)
./build/agent-reliability-tests general-purpose --loops 10
# Multi-worker queue mode
./build/agent-reliability-tests general-purpose --loops 20 --queue 3
# Parallel batch execution
./build/agent-reliability-tests general-purpose --loops 15 --parallel --batch 5
# Using custom templates
./build/agent-reliability-tests multi-agent-coordinator --prompt example_prompt_templates/coordination_plan.tmpl --loops 5
- --loops, -l: Number of test iterations (default: 1)
- --queue, -q: Number of worker threads for queue mode (default: 1)
- --parallel, -p: Enable parallel batch execution
- --batch: Batch size for parallel mode (default: 5)
- --prompt: Path to Go template file for custom prompts
- --filename, -f: Base name for output log file (default: "chat")
The tool includes a flexible file-based Go template system for customizing agent prompts.
Note: Currently, SubAgentName is the only available template variable.
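To show how a prompt template with this single variable expands, here is a minimal sketch using Go's standard `text/template` package. The `promptData` struct and `renderPrompt` function are hypothetical names chosen for illustration; the project's own rendering code may be organized differently.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// promptData is a hypothetical data type exposing the one documented
// template variable, SubAgentName.
type promptData struct {
	SubAgentName string
}

// renderPrompt parses tmplText and fills in the agent name.
// Referencing any other field fails at execution time.
func renderPrompt(tmplText, agent string) (string, error) {
	t, err := template.New("prompt").Parse(tmplText)
	if err != nil {
		return "", fmt.Errorf("parse template: %w", err)
	}
	var out strings.Builder
	if err := t.Execute(&out, promptData{SubAgentName: agent}); err != nil {
		return "", fmt.Errorf("render template: %w", err)
	}
	return out.String(), nil
}

func main() {
	prompt, err := renderPrompt("Ask the {{.SubAgentName}} to explain quantum computing", "python-pro")
	if err != nil {
		panic(err)
	}
	fmt.Println(prompt) // Ask the python-pro to explain quantum computing
}
```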
Located in example_prompt_templates/:
- hello_world.tmpl: Basic agent interaction (default behavior)
- coordination_plan.tmpl: Multi-agent coordination testing
- code_review.tmpl: Code review capability testing
- feature_implementation.tmpl: Complex feature development testing
# Use coordination template
./build/agent-reliability-tests multi-agent-coordinator \
--prompt example_prompt_templates/coordination_plan.tmpl \
--loops 10 --queue 2
# Create custom template
echo "Ask the {{.SubAgentName}} to explain quantum computing" > my_template.tmpl
./build/agent-reliability-tests python-pro --prompt my_template.tmpl --loops 5
Binary: ./build/analyze
Source: cmd/analyze/main.go
Analyzes test logs to quantify response similarity, identify patterns, and detect outliers.
- Response similarity metrics
- Clustering analysis
- Pattern identification
- Outlier detection
- Reliability assessment
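The README does not say which similarity metric the analyzer uses. Purely as an illustration of what a pairwise response similarity score can look like, the sketch below computes Jaccard similarity over lowercase word sets. The metric choice and the `jaccard` and `wordSet` names are assumptions, not the analyzer's actual implementation.

```go
package main

import (
	"fmt"
	"strings"
)

// wordSet splits text on whitespace and returns the set of lowercase words.
func wordSet(text string) map[string]bool {
	set := make(map[string]bool)
	for _, w := range strings.Fields(strings.ToLower(text)) {
		set[w] = true
	}
	return set
}

// jaccard returns |A ∩ B| / |A ∪ B| for the word sets of a and b.
// Two empty responses are treated as identical (score 1).
func jaccard(a, b string) float64 {
	setA, setB := wordSet(a), wordSet(b)
	if len(setA) == 0 && len(setB) == 0 {
		return 1
	}
	shared := 0
	for w := range setA {
		if setB[w] {
			shared++
		}
	}
	union := len(setA) + len(setB) - shared
	return float64(shared) / float64(union)
}

func main() {
	fmt.Println(jaccard("the agent said hello", "the agent said goodbye")) // 0.6
}
```

A score near 1 means two runs produced nearly the same wording; a response whose scores against all others are low would be a candidate outlier under this assumed metric.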
# Analyze specific log file
./build/analyze chat_1234567890.log --verbose --output analysis.txt
# Debug mode (shows extracted responses)
./build/analyze chat_1234567890.log --debug
# Save detailed analysis
./build/analyze chat_1234567890.log --output detailed_analysis.txt --verbose
- --verbose, -v: Enable detailed output including similarity matrix
- --output, -o: Save results to file
- --debug, -d: Show extracted responses for debugging
For convenience, several Makefile targets are available for quick testing:
- make build: Build binaries into the ./build/ directory
- make test: Quick test: ./build/agent-reliability-tests general-purpose --loops 5
- make test-parallel: Quick parallel test with default settings
- make exec: Quick queue test: ./build/agent-reliability-tests general-purpose --loops 30 --queue 5
- make analyze: Analyze the most recent log file automatically
- make clean: Remove log files and build directory
- make deps: Install Go dependencies
- make help: Show available targets
Note: For production use and custom configurations, use the binaries directly as shown in the examples above.
- Queue Mode: Default mode using worker goroutines with job queue
- Parallel Mode: Batch-based parallel execution with configurable batch size
- Performance: Template caching eliminates repeated file I/O and parsing
- File-based: Go templates stored in .tmpl or .template files
- Validation: Input validation with clear error messages
- Caching: Templates parsed once and cached for performance
- Flexible: Support for complex template logic and formatting
├── cmd/
│ ├── reliability/ # Test runner CLI
│ └── analyze/ # Log analyzer CLI
├── pkg/reliability/ # Core reliability testing logic
├── example_prompt_templates/ # Template examples and documentation
├── build/ # Compiled binaries (created by make build)
└── Makefile # Build and test automation
Install dependencies:
make deps
Build tools:
make build
Run a quick test:
./build/agent-reliability-tests general-purpose --loops 3
Or use the convenience target:
make test