Test AI tool documentation against multiple models before shipping. Send skill docs to GPT, Claude, Gemini, Llama — get back confusions, failures, and improvements.

EmZod/Agent-Focus-Group

Focus Group

Test AI tool documentation against multiple models before shipping.

For Agents: See AGENTS.md for quick start and troubleshooting.

The Problem

You write skill documentation for Claude. It works great. Then other models misunderstand it. You don't find out until production.

The Solution

bun run src/cli.ts test ./skill.md "Generate a 3-voice podcast" -m qwen/qwen3-coder:free

Focus Group sends your documentation to multiple AI models and asks them to complete a task. Each model reports:

  • What they understood
  • How they'd approach the task
  • What confused them
  • What would cause them to fail
  • Suggested improvements
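The five report fields above could be modeled as a small type with a validating parser. This is only a sketch: the `ModelReport` name, its field names, and the assumption that models reply in JSON are all hypothetical and are not taken from the tool's source.

```typescript
// Hypothetical shape for one model's feedback; field names are illustrative only.
interface ModelReport {
  understood: string;      // What they understood
  approach: string;        // How they'd approach the task
  confusions: string[];    // What confused them
  failureRisks: string[];  // What would cause them to fail
  improvements: string[];  // Suggested improvements
}

// Parse a JSON reply, throwing if any of the five fields is missing or mistyped.
function parseReport(raw: string): ModelReport {
  const data = JSON.parse(raw) as Record<string, unknown>;
  const text = (key: string): string => {
    const value = data[key];
    if (typeof value !== "string") throw new Error(`"${key}" must be a string`);
    return value;
  };
  const list = (key: string): string[] => {
    const value = data[key];
    if (!Array.isArray(value) || !value.every((v) => typeof v === "string")) {
      throw new Error(`"${key}" must be an array of strings`);
    }
    return value as string[];
  };
  return {
    understood: text("understood"),
    approach: text("approach"),
    confusions: list("confusions"),
    failureRisks: list("failureRisks"),
    improvements: list("improvements"),
  };
}
```

Validating each reply up front would let a malformed response from one model be reported as a failure for that model instead of corrupting the summary.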

Quick Start

# Install
git clone https://github.com/EmZod/Agent-Focus-Group
cd Agent-Focus-Group
bun install

# Configure (get key from https://openrouter.ai/keys)
export OPENROUTER_API_KEY="sk-or-v1-..."

# Test with free model (no cost)
bun run src/cli.ts test ./skill.md "Complete this task" -m qwen/qwen3-coder:free

# Test with default models (~$0.005)
bun run src/cli.ts test ./skill.md "Complete this task"

Cost note: Without -m <model>:free, the tool uses paid models. Always use -m qwen/qwen3-coder:free while learning.

See examples/sample-skill.md for a sample skill file format.

Example Output

Focus Group Test
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Skill: SKILL.md
Task:  Generate a podcast with 3 voices
Models: 2

  qwen/qwen3-coder:free               [OK]  14.3s  $0.00
  meta-llama/llama-3.2-3b-instruct    [OK]  24.5s  $0.00

Summary
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Completed: 2/2 models

Common Confusions:
  • "voice path format unclear" (2/2 models)
  • "no command to list available voices" (1/2 models)

Suggested Improvements:
  • Add explicit full path example for --voice flag
  • Include a command to list available voices

View full results: bun run src/cli.ts show 01KEHWY8AAK...
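The "(2/2 models)" counts in the summary above can be read as a tally of how many models raised each confusion. The sketch below shows one way to compute such a tally. It assumes confusions are matched by exact string, which is a simplification; the tool may group similar wording differently. `tallyConfusions` and its input shape are hypothetical names.

```typescript
// Hypothetical input: each model ID mapped to the confusions it reported.
type ConfusionsByModel = Record<string, string[]>;

interface Tally {
  text: string;
  models: number; // how many models reported this confusion
}

function tallyConfusions(byModel: ConfusionsByModel): Tally[] {
  const counts = new Map<string, number>();
  for (const confusions of Object.values(byModel)) {
    // Count each confusion at most once per model.
    for (const text of Array.from(new Set(confusions))) {
      counts.set(text, (counts.get(text) ?? 0) + 1);
    }
  }
  // Most widely shared confusions first.
  return Array.from(counts.entries())
    .map(([text, models]) => ({ text, models }))
    .sort((a, b) => b.models - a.models);
}

const byModel: ConfusionsByModel = {
  "qwen/qwen3-coder:free": ["voice path format unclear", "no command to list available voices"],
  "meta-llama/llama-3.2-3b-instruct": ["voice path format unclear"],
};
const tallies = tallyConfusions(byModel);
const total = Object.keys(byModel).length;
for (const t of tallies) {
  console.log(`  • "${t.text}" (${t.models}/${total} models)`);
}
```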

Commands

All commands use bun run src/cli.ts <command>:

| Command | Purpose |
| --- | --- |
| test <skill> "<task>" | Run a test against models |
| show <run-id> | View detailed results ("latest" works as the run ID) |
| history | List recent runs |
| diff <run1> <run2> | Compare two runs |
| cost | Show API cost summary |
| config | Show configuration |

Options

# Use specific models
bun run src/cli.ts test ./skill.md "task" -m openai/gpt-5,anthropic/claude-sonnet-4.5

# Use presets
bun run src/cli.ts test ./skill.md "task" -p expensive

# Use free models (no cost)
bun run src/cli.ts test ./skill.md "task" -m qwen/qwen3-coder:free,meta-llama/llama-3.2-3b-instruct:free

# Don't save to database
bun run src/cli.ts test ./skill.md "task" --no-save

# Increase timeout for slow models (default: 60s)
bun run src/cli.ts test ./skill.md "task" --timeout 120

Model Presets

| Preset | Models | Use When |
| --- | --- | --- |
| cheap | gpt-5-mini, claude-haiku-4.5, gemini-2.5-flash | Quick iteration |
| expensive | gpt-5, claude-sonnet-4.5, gemini-2.5-pro | Before shipping |
| frontier | claude-opus-4.5, gpt-5.2-pro, o3-pro | Critical docs |
| comprehensive | All above + llama-4, qwen3 | Maximum coverage |
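As a rough illustration of how the presets relate to each other, the sketch below builds `comprehensive` from the other three lists. It assumes "All above" means the union of `cheap`, `expensive`, and `frontier`, and it uses the short model names from the table; the real OpenRouter IDs would carry a provider prefix. `resolvePreset` is a hypothetical helper, not the tool's API.

```typescript
// Short names copied from the presets table; not full OpenRouter model IDs.
const cheap = ["gpt-5-mini", "claude-haiku-4.5", "gemini-2.5-flash"];
const expensive = ["gpt-5", "claude-sonnet-4.5", "gemini-2.5-pro"];
const frontier = ["claude-opus-4.5", "gpt-5.2-pro", "o3-pro"];

const presets: Record<string, string[]> = {
  cheap,
  expensive,
  frontier,
  // Assumption: "All above" is the union of the three presets, plus two open models.
  comprehensive: [...cheap, ...expensive, ...frontier, "llama-4", "qwen3"],
};

// Look up a preset by name, failing loudly on a typo.
function resolvePreset(name: string): string[] {
  const models = presets[name];
  if (!models) {
    throw new Error(`Unknown preset "${name}". Available: ${Object.keys(presets).join(", ")}`);
  }
  return models;
}

console.log(resolvePreset("comprehensive").length); // 11 models under the assumption above
```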

Free Models

Test without cost using OpenRouter's free tier (add :free suffix):

# Single free model
bun run src/cli.ts test ./skill.md "task" -m qwen/qwen3-coder:free

# Multiple free models (comma-separated, no spaces)
bun run src/cli.ts test ./skill.md "task" -m qwen/qwen3-coder:free,meta-llama/llama-3.2-3b-instruct:free

Free models are slower (30-60s vs 10-20s for paid) but cost nothing.
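Because cost depends on every listed model carrying the `:free` suffix, a pre-flight check on the `-m` value can catch mistakes before any paid call is made. The following is a minimal sketch of such a check; `parseModelList` and `ModelSelection` are hypothetical names and this is not how the CLI itself parses the flag.

```typescript
interface ModelSelection {
  models: string[];
  allFree: boolean; // true only if every model ends in ":free"
}

// Parse a comma-separated model list as it would be passed to -m.
function parseModelList(arg: string): ModelSelection {
  if (/\s/.test(arg)) {
    throw new Error("Model list must be comma-separated with no spaces");
  }
  const models = arg.split(",").filter((m) => m.length > 0);
  if (models.length === 0) {
    throw new Error("No models given");
  }
  return { models, allFree: models.every((m) => /:free$/.test(m)) };
}

const selection = parseModelList("qwen/qwen3-coder:free,meta-llama/llama-3.2-3b-instruct:free");
console.log(selection.allFree); // true: both IDs end in ":free"
```

A wrapper script could refuse to continue when `allFree` is false unless the user explicitly opts in to paid models.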

Configuration

Config: ~/.config/focus-group/config.toml
Database: ~/.local/share/focus-group/focus-group.db

Environment Variables

| Variable | Required | Purpose |
| --- | --- | --- |
| OPENROUTER_API_KEY | Yes | API authentication |
| FOCUS_GROUP_CONFIG | No | Override config path |
| FOCUS_GROUP_DATA | No | Override data directory |
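The interaction between the default paths and the two override variables can be sketched as a small pure function. This is an illustration under assumptions: `resolvePaths` is a hypothetical name, and the sketch assumes the database keeps the file name `focus-group.db` inside whatever directory `FOCUS_GROUP_DATA` points to.

```typescript
type Env = Record<string, string | undefined>;

interface Paths {
  config: string;
  database: string;
}

// Resolve config and database locations from environment overrides and a home directory.
function resolvePaths(env: Env, home: string): Paths {
  const config = env["FOCUS_GROUP_CONFIG"] ?? `${home}/.config/focus-group/config.toml`;
  const dataDir = env["FOCUS_GROUP_DATA"] ?? `${home}/.local/share/focus-group`;
  // Assumption: the database file name is unchanged when the data directory is overridden.
  return { config, database: `${dataDir}/focus-group.db` };
}

const defaults = resolvePaths({}, "/home/alice");
console.log(defaults.config);   // /home/alice/.config/focus-group/config.toml
console.log(defaults.database); // /home/alice/.local/share/focus-group/focus-group.db
```

In a real script the arguments would typically be `process.env` and the value of `os.homedir()`; taking them as parameters keeps the function easy to test.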

Roadmap

v2: Skill Doc Rewriter — Generate improved skill docs from model feedback. Track progress →

Documentation

| Doc | Purpose |
| --- | --- |
| AGENTS.md | Quick start for AI agents, troubleshooting |
| examples/sample-skill.md | Example skill file format |
| docs/DESIGN.md | Architecture and design decisions |

Development

bun install          # Install dependencies
bun test             # Run tests (45 tests)
bun run typecheck    # Type check

License

MIT
