Test AI tool documentation against multiple models before shipping.
For Agents: See AGENTS.md for quick start and troubleshooting.
You write skill documentation for Claude. It works great. Then other models misunderstand it. You don't find out until production.
bun run src/cli.ts test ./skill.md "Generate a 3-voice podcast" -m qwen/qwen3-coder:freeFocus Group sends your documentation to multiple AI models and asks them to complete a task. Each model reports:
- What they understood
- How they'd approach the task
- What confused them
- What would cause them to fail
- Suggested improvements
# Install
git clone https://github.com/EmZod/Agent-Focus-Group
cd Agent-Focus-Group
bun install
# Configure (get key from https://openrouter.ai/keys)
export OPENROUTER_API_KEY="sk-or-v1-..."
# Test with free model (no cost)
bun run src/cli.ts test ./skill.md "Complete this task" -m qwen/qwen3-coder:free
# Test with default models (~$0.005)
bun run src/cli.ts test ./skill.md "Complete this task"Cost note: Without
-m <model>:free, the tool uses paid models. Always use-m qwen/qwen3-coder:freewhile learning.
See examples/sample-skill.md for a sample skill file format.
Focus Group Test
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Skill: SKILL.md
Task: Generate a podcast with 3 voices
Models: 2
qwen/qwen3-coder:free [OK] 14.3s $0.00
meta-llama/llama-3.2-3b-instruct [OK] 24.5s $0.00
Summary
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Completed: 2/2 models
Common Confusions:
• "voice path format unclear" (2/2 models)
• "no command to list available voices" (1/2 models)
Suggested Improvements:
• Add explicit full path example for --voice flag
• Include a command to list available voices
View full results: bun run src/cli.ts show 01KEHWY8AAK...
All commands use bun run src/cli.ts <command>:
| Command | Purpose |
|---|---|
test <skill> "<task>" |
Run test against models |
show <run-id> |
View detailed results (latest works) |
history |
List recent runs |
diff <run1> <run2> |
Compare two runs |
cost |
Show API cost summary |
config |
Show configuration |
# Use specific models
bun run src/cli.ts test ./skill.md "task" -m openai/gpt-5,anthropic/claude-sonnet-4.5
# Use presets
bun run src/cli.ts test ./skill.md "task" -p expensive
# Use free models (no cost)
bun run src/cli.ts test ./skill.md "task" -m qwen/qwen3-coder:free,meta-llama/llama-3.2-3b-instruct:free
# Don't save to database
bun run src/cli.ts test ./skill.md "task" --no-save
# Increase timeout for slow models (default: 60s)
bun run src/cli.ts test ./skill.md "task" --timeout 120| Preset | Models | Use When |
|---|---|---|
cheap |
gpt-5-mini, claude-haiku-4.5, gemini-2.5-flash | Quick iteration |
expensive |
gpt-5, claude-sonnet-4.5, gemini-2.5-pro | Before shipping |
frontier |
claude-opus-4.5, gpt-5.2-pro, o3-pro | Critical docs |
comprehensive |
All above + llama-4, qwen3 | Maximum coverage |
Test without cost using OpenRouter's free tier (add :free suffix):
# Single free model
bun run src/cli.ts test ./skill.md "task" -m qwen/qwen3-coder:free
# Multiple free models (comma-separated, no spaces)
bun run src/cli.ts test ./skill.md "task" -m qwen/qwen3-coder:free,meta-llama/llama-3.2-3b-instruct:freeFree models are slower (30-60s vs 10-20s for paid) but cost nothing.
Config: ~/.config/focus-group/config.toml
Database: ~/.local/share/focus-group/focus-group.db
| Variable | Required | Purpose |
|---|---|---|
OPENROUTER_API_KEY |
Yes | API authentication |
FOCUS_GROUP_CONFIG |
No | Override config path |
FOCUS_GROUP_DATA |
No | Override data directory |
v2: Skill Doc Rewriter — Generate improved skill docs from model feedback. Track progress →
| Doc | Purpose |
|---|---|
| AGENTS.md | Quick start for AI agents, troubleshooting |
| examples/sample-skill.md | Example skill file format |
| docs/DESIGN.md | Architecture and design decisions |
bun install # Install dependencies
bun test # Run tests (45 tests)
bun run typecheck # Type checkMIT