A modular web agent designed for the AGI SDK REAL benchmark, featuring dynamic prompt routing and Chain-of-Thought planning for autonomous web navigation.
System Architecture Diagram (Miro)
- Modular Architecture: a high-level Orchestrator (project manager) plus a focused Agent (LLM specialist)
- Dynamic Prompt Routing: automatically selects task-specific prompts based on the goal
- Chain-of-Thought Planning: a self-verification step reviews each plan for logical flaws before execution
- Advanced Self-Correction: detects stuck states and changes strategy to recover
- Granular Planning: breaks complex goals down into single-action steps for reliability
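As a rough illustration of dynamic prompt routing, the hypothetical sketch below maps a goal to one of the prompt families under `prompts/`. It swaps in plain keyword scoring; according to `config.py`, the actual router uses an LLM (`parser_model_name`), so `PROMPT_KEYWORDS`, `select_prompt_key`, and the `generic` fallback are illustrative assumptions, not the project's API.

```python
"""Hypothetical keyword-based stand-in for dynamic prompt routing.

The real PromptSelector reportedly asks a small LLM to pick a prompt; this
sketch swaps in deterministic keyword scoring so the idea runs offline.
"""

# Hypothetical mapping from prompt family to goal keywords.
PROMPT_KEYWORDS: dict[str, tuple[str, ...]] = {
    "omnizon": ("cart", "buy", "order", "product", "checkout"),
    "dashdish": ("restaurant", "food", "delivery", "menu", "meal"),
    "networkin": ("connect", "profile", "job", "message", "post"),
}

DEFAULT_PROMPT = "generic"  # hypothetical fallback prompt family


def select_prompt_key(goal: str) -> str:
    """Return the prompt family whose keywords best match the goal text."""
    text = goal.lower()
    # Count how many keywords of each family occur as substrings of the goal.
    scores = {
        name: sum(keyword in text for keyword in keywords)
        for name, keywords in PROMPT_KEYWORDS.items()
    }
    best_name, best_score = max(scores.items(), key=lambda item: item[1])
    # Fall back to a generic prompt when no keyword matches at all.
    return best_name if best_score > 0 else DEFAULT_PROMPT
```

Keyword matching is brittle to paraphrase, which is presumably why the project delegates this choice to a small model; the sketch only shows the shape of the decision.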
```
agiinc/
├── README.md
├── requirements.txt            # Root dependencies
├── Dockerfile                  # Docker support
├── docker-compose.yml          # Easy orchestration
│
├── agiwebagent/                # Web agent implementation
│   ├── main.py                 # Entry point
│   ├── requirements.txt        # Agent-specific dependencies
│   └── agent_src/
│       ├── agent.py            # LLM communication layer
│       ├── config.py           # Configuration dataclass
│       ├── memory.py           # Action history tracking
│       ├── orchestrator.py     # Task lifecycle manager
│       ├── prompt_selector.py  # Dynamic prompt routing
│       ├── llm_utils.py        # LLM utilities with retry logic
│       ├── vision_tools.py     # Visual OCR extraction
│       ├── utils.py            # Helper functions
│       └── prompts/            # Task-specific prompt files
│           ├── omnizon_prompts.py
│           ├── dashdish_prompts.py
│           └── ...
│
└── agisdk/                     # AGI SDK (submodule/dependency)
```
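`llm_utils.py` is described as wrapping LLM calls with retry logic, and the troubleshooting notes mention exponential backoff. A generic sketch of that pattern follows; `call_with_backoff` and its parameters are hypothetical, not the module's actual API.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_backoff(
    fn: Callable[[], T],
    max_retries: int = 3,
    base_delay: float = 1.0,
    retry_on: tuple[type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying on retry_on errors with exponential backoff.

    Hypothetical helper: waits base_delay * 2**attempt seconds (plus up to
    10% random jitter) between attempts and re-raises the last error once
    max_retries retries are exhausted.
    """
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_retries:
                raise  # out of retries: surface the original error
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, 0.1 * delay))
    raise RuntimeError("unreachable")  # keeps type checkers satisfied
```

In real use, `retry_on` would be narrowed to rate-limit errors (for example `openai.RateLimitError` in the v1 `openai` Python package) so that genuine bugs are not retried. The injectable `sleep` exists only to make the helper testable.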
1. Create a virtual environment

```bash
python -m venv agienv
source agienv/bin/activate  # On Windows: agienv\Scripts\activate
```

2. Install dependencies

```bash
pip install -r requirements.txt
pip install -r agiwebagent/requirements.txt
```

3. Configure your API key. Copy the example environment file and add your API key:

```bash
cp .env.example .env
# Edit .env and replace 'sk-your-api-key-here' with your actual OpenAI API key
```

4. Run the agent

```bash
python agiwebagent/main.py --task_name webclones.omnizon-1 --headless true
```
Alternatively, with Docker:

1. Configure your API key

```bash
cp .env.example .env
# Edit .env and add your OpenAI API key
```

2. Build and run

```bash
docker compose build
docker compose run --rm agiwebagent \
  --task_name webclones.omnizon-1 \
  --headless true
```
```bash
# Run a single task, bypassing the cache
python agiwebagent/main.py --task_name webclones.omnizon-1 --no-cache --headless true

# Run all Omnizon (e-commerce) tasks
python agiwebagent/main.py --task_type omnizon --headless true

# Run all DashDish (food delivery) tasks
python agiwebagent/main.py --task_type dashdish --headless true

# Run all NetworkIn (professional networking) tasks
python agiwebagent/main.py --task_type networkin --headless true
```

| Argument | Description | Example |
|---|---|---|
| `--task_name` | Run a single task by ID | `webclones.omnizon-1` |
| `--task_type` | Run all tasks of a type | `omnizon`, `dashdish` |
| `--headless` | Run the browser in the background | `true` / `false` |
| `--no-cache` | Force a re-run without the cache | Flag (takes no value) |
| `--model` | OpenAI model for the main agent | `gpt-4o`, `gpt-4o-mini` |
| `--vision_model` | Model for OCR/vision | `gpt-4o` |
| `--use_ocr` | Enable visual OCR | `true` / `false` |
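The flags above map naturally onto Python's `argparse`. This is a hedged sketch, not the actual `main.py`: `str2bool`, the mutually exclusive `--task_name`/`--task_type` group, and the `headless=True` default are assumptions; only the flag names and the model defaults come from this README.

```python
import argparse


def str2bool(value: str) -> bool:
    """Parse 'true'/'false'-style strings, as passed to --headless and --use_ocr."""
    lowered = value.strip().lower()
    if lowered in ("true", "1", "yes"):
        return True
    if lowered in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected true/false, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Run the AGI SDK REAL web agent.")
    # Assumption: a single task ID and a whole task family are mutually exclusive.
    target = parser.add_mutually_exclusive_group()
    target.add_argument("--task_name", help="single task ID, e.g. webclones.omnizon-1")
    target.add_argument("--task_type", help="task family, e.g. omnizon or dashdish")
    parser.add_argument("--headless", type=str2bool, default=True)  # assumed default
    parser.add_argument("--no-cache", action="store_true", help="force a re-run")
    parser.add_argument("--model", default="gpt-4o")
    parser.add_argument("--vision_model", default="gpt-4o")
    parser.add_argument("--use_ocr", type=str2bool, default=False)
    return parser
```

Note that `argparse` stores `--no-cache` as `args.no_cache`, and that a plain `type=bool` would be a trap here: `bool("false")` is `True`, which is why an explicit string parser is used.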
| Variable | Description | Required |
|---|---|---|
| `OPENAI_API_KEY` | Your OpenAI API key | Yes |
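Because `OPENAI_API_KEY` is the only required variable, failing fast on a missing key saves a wasted browser launch. The hypothetical `require_openai_key` below assumes `.env` has already been loaded into the process environment (for example by python-dotenv's `load_dotenv()` or Docker Compose's `env_file`); it also rejects the `sk-your-api-key-here` placeholder shipped in `.env.example`.

```python
import os
from collections.abc import Mapping

PLACEHOLDER_KEY = "sk-your-api-key-here"  # value shipped in .env.example


def require_openai_key(environ: Mapping[str, str] = os.environ) -> str:
    """Return OPENAI_API_KEY, failing fast if it is missing or still the placeholder.

    Hypothetical helper; assumes .env was already loaded into the environment.
    """
    key = environ.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set; copy .env.example to .env and fill it in.")
    if key == PLACEHOLDER_KEY:
        raise RuntimeError("OPENAI_API_KEY still holds the .env.example placeholder.")
    return key
```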
Edit agiwebagent/agent_src/config.py to customize:
```python
from dataclasses import dataclass


@dataclass
class AgentConfig:
    model_name: str = "gpt-4o"              # Main execution model
    plan_model_name: str = "gpt-4o"         # Planning model
    parser_model_name: str = "gpt-4o-mini"  # Prompt routing model
    vision_model_name: str = "gpt-4o"       # OCR/vision model
    max_steps: int = 25                     # Max steps per task
    max_retries: int = 3                    # Max plan generation retries
    use_screenshot: bool = True             # Include screenshots
    use_axtree: bool = True                 # Include accessibility tree
    use_ocr: bool = False                   # Enable visual OCR
```

Troubleshooting:

1. Playwright browsers not installed

```bash
playwright install chromium
playwright install-deps
```

2. Rate limit errors. The agent has built-in exponential backoff retry logic. If you're hitting limits frequently, consider:
- Using a model with higher rate limits
- Reducing parallel execution
3. Docker display issues (headed mode). For headed mode in Docker on Linux:

```bash
xhost +local:docker
docker compose run --rm agiwebagent --headless false
```

Architecture:

```
┌─────────────────────────────────────────────────────────────┐
│                           main.py                           │
│                        (Entry Point)                        │
└──────────────────────────────┬──────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────┐
│                      TaskOrchestrator                       │
│                  (Manages task lifecycle)                   │
│  • Creates plans    • Handles errors    • Tracks progress   │
└──────────────────────────────┬──────────────────────────────┘
                               │
                  ┌────────────┴────────────┐
                  ▼                         ▼
      ┌──────────────────────┐  ┌──────────────────────┐
      │    PromptSelector    │  │ HighPerformanceAgent │
      │ (Routes to prompts)  │  │   (LLM Interface)    │
      └──────────────────────┘  └──────────────────────┘
                  │                         │
                  ▼                         ▼
      ┌──────────────────────┐  ┌──────────────────────┐
      │     prompts/*.py     │  │     AgentMemory      │
      │   (Task-specific)    │  │  (History tracking)  │
      └──────────────────────┘  └──────────────────────┘
```
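The diagram implies a control loop in which TaskOrchestrator asks the agent for the next action, has it self-verify that plan, executes it, and watches memory for stuck states. Below is a minimal, hypothetical sketch of such a loop: `run_task`, `LoopLimits`, `stuck_window`, and the `propose`/`verify`/`execute` callables are illustrative stand-ins (in the project these roles would presumably fall to `HighPerformanceAgent`, `AgentMemory`, and the browser), and only the default limits are borrowed from `AgentConfig`.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class LoopLimits:
    max_steps: int = 25    # mirrors AgentConfig.max_steps
    max_retries: int = 3   # mirrors AgentConfig.max_retries
    stuck_window: int = 3  # hypothetical: identical actions in a row that count as "stuck"


def run_task(
    propose: Callable[[list[str], bool], str],  # (history, stuck) -> next action
    verify: Callable[[str], bool],              # self-check of a proposed action
    execute: Callable[[str], bool],             # performs action; True when goal is met
    limits: LoopLimits = LoopLimits(),
) -> tuple[bool, list[str]]:
    """Hypothetical plan -> verify -> act loop with naive stuck detection."""
    history: list[str] = []
    recent: deque = deque(maxlen=limits.stuck_window)
    for _ in range(limits.max_steps):
        # "Stuck" here simply means the last stuck_window actions were identical.
        stuck = len(recent) == recent.maxlen and len(set(recent)) == 1
        action: Optional[str] = None
        for _ in range(limits.max_retries):
            candidate = propose(history, stuck)
            if verify(candidate):  # discard plans that fail self-verification
                action = candidate
                break
        if action is None:
            return False, history  # no plan survived verification
        history.append(action)
        recent.append(action)
        if execute(action):
            return True, history
    return False, history  # step budget exhausted
```

Passing the `stuck` flag back into `propose` is what gives the planner a chance to change strategy; treating repeated identical actions as "stuck" is a deliberately crude heuristic chosen for the sketch.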
- Integrate DSPy for better prompt optimization
- Add RL post-training (GRPO, PPO)
- Test with different LLMs for role-based cost optimization
- Build on Nova-act and browser-use frameworks
- Fine-tune multimodal LLM components
This project is licensed under the MIT License - see the LICENSE file for details.
Contributions are welcome! Please feel free to submit a Pull Request.
1. Fork the repository
2. Create your feature branch (`git checkout -b feature/amazing-feature`)
3. Commit your changes (`git commit -m 'Add amazing feature'`)
4. Push to the branch (`git push origin feature/amazing-feature`)
5. Open a Pull Request