raj-gupta1/ComputerUse-WebAgent


🤖 AGI Web Agent

Python 3.11+ License: MIT Docker

A modular web agent designed for the AGI SDK REAL benchmark, featuring dynamic prompt routing and Chain-of-Thought planning for autonomous web navigation.

๐Ÿ“ System Architecture Diagram (Miro)


✨ Key Features

  • 🧠 Modular Architecture: a high-level Orchestrator (project manager) paired with a focused Agent (LLM specialist)
  • 🔀 Dynamic Prompt Routing: automatically selects a task-specific prompt based on the goal
  • 💭 Chain-of-Thought Planning: a self-verification step reviews plans for logical flaws before execution
  • 🔄 Advanced Self-Correction: detects stuck states and changes strategy to recover
  • 📋 Granular Planning: breaks complex goals down into single-action steps for reliability
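Stuck-state detection, one part of self-correction, can be illustrated with a minimal sketch. It assumes the agent logs each action as a string. `ActionHistory`, `record`, and `is_stuck` are hypothetical names chosen for illustration and are not the API of `memory.py`.

```python
from collections import deque


class ActionHistory:
    """Hypothetical action log; the real memory.py may work differently."""

    def __init__(self, window: int = 3) -> None:
        # Keep only the most recent `window` actions.
        self.recent: deque[str] = deque(maxlen=window)

    def record(self, action: str) -> None:
        self.recent.append(action)

    def is_stuck(self) -> bool:
        # Treat the agent as stuck once the window is full and every action
        # in it is identical (e.g. clicking the same element over and over).
        return len(self.recent) == self.recent.maxlen and len(set(self.recent)) == 1


history = ActionHistory(window=3)
for action in ["click('42')", "click('42')", "click('42')"]:
    history.record(action)

if history.is_stuck():
    # A real agent would re-plan here instead of repeating the same action.
    print("stuck: switching strategy")
```

A fixed-size window keeps the check cheap and forgets old repetitions, so a loop that was later broken does not keep triggering recovery.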

📂 Project Structure

agiinc/
├── README.md
├── requirements.txt              # Root dependencies
├── Dockerfile                    # Docker support
├── docker-compose.yml            # Easy orchestration
│
├── agiwebagent/                  # Web agent implementation
│   ├── main.py                   # Entry point
│   ├── requirements.txt          # Agent-specific dependencies
│   └── agent_src/
│       ├── agent.py              # LLM communication layer
│       ├── config.py             # Configuration dataclass
│       ├── memory.py             # Action history tracking
│       ├── orchestrator.py       # Task lifecycle manager
│       ├── prompt_selector.py    # Dynamic prompt routing
│       ├── llm_utils.py          # LLM utilities with retry logic
│       ├── vision_tools.py       # Visual OCR extraction
│       ├── utils.py              # Helper functions
│       └── prompts/              # Task-specific prompt files
│           ├── omnizon_prompts.py
│           ├── dashdish_prompts.py
│           └── ...
│
└── agisdk/                       # AGI SDK (submodule/dependency)

🚀 Quick Start

Option 1: Local Installation

  1. Create a virtual environment

    python -m venv agienv
    source agienv/bin/activate  # On Windows: agienv\Scripts\activate
  2. Install dependencies

    pip install -r requirements.txt
    pip install -r agiwebagent/requirements.txt
  3. Configure your API key

    Copy the example environment file and add your API key:

    cp .env.example .env
    # Edit .env and replace 'sk-your-api-key-here' with your actual OpenAI API key
  4. Run the agent

    python agiwebagent/main.py --task_name webclones.omnizon-1 --headless true
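Before the first run, you may want to confirm that `.env` actually contains a real key. The stdlib-only sketch below assumes `.env` holds simple `KEY=value` lines. It rejects the `sk-your-api-key-here` placeholder, but it is not a full replacement for a dedicated parser such as python-dotenv. `read_env_file` and `api_key_configured` are hypothetical helpers, not part of the project.

```python
from pathlib import Path

PLACEHOLDER = "sk-your-api-key-here"


def read_env_file(path: Path) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blanks and # comments."""
    values: dict[str, str] = {}
    for line in path.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and single or double quotes.
        values[key.strip()] = value.strip().strip("'\"")
    return values


def api_key_configured(path: Path = Path(".env")) -> bool:
    # True only if the file exists and the key is neither empty nor the placeholder.
    if not path.exists():
        return False
    key = read_env_file(path).get("OPENAI_API_KEY", "")
    return bool(key) and key != PLACEHOLDER
```

Call `api_key_configured()` from the repository root, where `.env` is created in step 3.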

Option 2: Docker (Recommended)

  1. Configure your API key

    cp .env.example .env
    # Edit .env and add your OpenAI API key
  2. Build and run

    docker compose build
    docker compose run --rm agiwebagent \
      --task_name webclones.omnizon-1 \
      --headless true

📖 Usage

Running a Single Task

python agiwebagent/main.py --task_name webclones.omnizon-1 --no-cache --headless true

Running a Full Task Suite

# Run all Omnizon (e-commerce) tasks
python agiwebagent/main.py --task_type omnizon --headless true

# Run all DashDish (food delivery) tasks
python agiwebagent/main.py --task_type dashdish --headless true

# Run all NetworkIn (professional networking) tasks
python agiwebagent/main.py --task_type networkin --headless true

Command-Line Arguments

Argument        Description                               Example
--task_name     Run a single task by ID                   webclones.omnizon-1
--task_type     Run all tasks of a type                   omnizon, dashdish
--headless      Run the browser without a visible window  true / false
--no-cache      Force a re-run without using the cache    (flag, takes no value)
--model         OpenAI model for the main agent           gpt-4o, gpt-4o-mini
--vision_model  Model used for OCR/vision                 gpt-4o
--use_ocr       Enable visual OCR                         true / false
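Because `--headless` and `--use_ocr` take the literal values `true` / `false`, they need an explicit string-to-boolean conversion. The following minimal `argparse` sketch mirrors the documented flags. The `str2bool` helper, the defaults, and the assumption that `--task_name` and `--task_type` are mutually exclusive are illustrative choices, not taken from `main.py`.

```python
import argparse


def str2bool(value: str) -> bool:
    """Convert 'true'/'false' style strings to bool (hypothetical helper)."""
    lowered = value.lower()
    if lowered in {"true", "1", "yes"}:
        return True
    if lowered in {"false", "0", "no"}:
        return False
    raise argparse.ArgumentTypeError(f"expected true/false, got {value!r}")


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Run the AGI web agent")
    # Exactly one of a single task or a whole task type (assumed behaviour).
    target = parser.add_mutually_exclusive_group(required=True)
    target.add_argument("--task_name", help="e.g. webclones.omnizon-1")
    target.add_argument("--task_type", help="e.g. omnizon, dashdish")
    parser.add_argument("--headless", type=str2bool, default=True)
    parser.add_argument("--no-cache", action="store_true")  # bare flag, no value
    parser.add_argument("--model", default="gpt-4o")
    parser.add_argument("--vision_model", default="gpt-4o")
    parser.add_argument("--use_ocr", type=str2bool, default=False)
    return parser


args = build_parser().parse_args(
    ["--task_name", "webclones.omnizon-1", "--no-cache", "--headless", "false"]
)
```

Using `type=str2bool` is what makes `--headless false` behave as expected. Plain `type=bool` would not work, because `bool("false")` is `True`.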

โš™๏ธ Configuration

Environment Variables

Variable        Description          Required
OPENAI_API_KEY  Your OpenAI API key  ✅ Yes

Agent Configuration

Edit agiwebagent/agent_src/config.py to customize:

from dataclasses import dataclass

@dataclass
class AgentConfig:
    model_name: str = "gpt-4o"          # Main execution model
    plan_model_name: str = "gpt-4o"     # Planning model
    parser_model_name: str = "gpt-4o-mini"  # Prompt routing model
    vision_model_name: str = "gpt-4o"   # OCR/vision model
    max_steps: int = 25                 # Max steps per task
    max_retries: int = 3                # Max plan generation retries
    use_screenshot: bool = True         # Include screenshots
    use_axtree: bool = True             # Include accessibility tree
    use_ocr: bool = False               # Enable visual OCR
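Instead of editing the defaults in place, you can derive a variant at runtime with the standard library's `dataclasses.replace`. The sketch redefines a trimmed `AgentConfig` so that it runs standalone. The cheaper-model variant is only an example, not a recommended setting.

```python
from dataclasses import dataclass, replace


@dataclass
class AgentConfig:
    # Trimmed copy of the fields above, repeated so this example runs on its own.
    model_name: str = "gpt-4o"
    plan_model_name: str = "gpt-4o"
    max_steps: int = 25
    use_ocr: bool = False


base = AgentConfig()
# Build a modified copy; `base` itself is left unchanged.
cheap = replace(base, model_name="gpt-4o-mini", max_steps=15)
```

Because `replace` returns a new instance, several configurations (for example, one per model under test) can coexist without mutating shared defaults.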

🔧 Troubleshooting

Common Issues

1. Playwright browsers not installed

playwright install chromium
playwright install-deps

2. Rate limit errors

The agent has built-in retry logic with exponential backoff. If you still hit rate limits frequently, consider:

  • Using a model with higher rate limits
  • Reducing parallel execution
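The retry behaviour attributed to `llm_utils.py` follows a common pattern, sketched below. This is not the project's implementation: `call_with_backoff`, its parameters, and the use of `RuntimeError` as a stand-in for a rate-limit exception are assumptions. The `sleep` function can be injected so the example runs without actually waiting.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_backoff(
    fn: Callable[[], T],
    max_retries: int = 3,
    base_delay: float = 1.0,
    retry_on: tuple[type[Exception], ...] = (RuntimeError,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn, retrying with exponentially growing, jittered delays."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_retries:
                raise  # out of retries: surface the last error
            # Delays of roughly 1s, 2s, 4s, ... plus up to 1s of random jitter.
            sleep(base_delay * 2**attempt + random.uniform(0, 1))
    raise AssertionError("unreachable")
```

The jitter spreads out retries from concurrent workers so they do not all hit the API again at the same instant, which matters most when running task suites in parallel.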

3. Docker display issues (headed mode)

For headed mode in Docker on Linux, first allow local connections to your X server:

xhost +local:docker
docker compose run --rm agiwebagent --task_name webclones.omnizon-1 --headless false

๐Ÿ—๏ธ Architecture

┌─────────────────────────────────────────────────────────────┐
│                        main.py                              │
│                    (Entry Point)                            │
└─────────────────────────┬───────────────────────────────────┘
                          │
                          ▼
┌─────────────────────────────────────────────────────────────┐
│                    TaskOrchestrator                         │
│              (Manages task lifecycle)                       │
│  • Creates plans  • Handles errors  • Tracks progress       │
└─────────────────────────┬───────────────────────────────────┘
                          │
            ┌─────────────┴─────────────┐
            ▼                           ▼
┌──────────────────────┐    ┌──────────────────────┐
│   PromptSelector     │    │ HighPerformanceAgent │
│ (Routes to prompts)  │    │   (LLM Interface)    │
└──────────────────────┘    └──────────────────────┘
            │                           │
            ▼                           ▼
┌──────────────────────┐    ┌──────────────────────┐
│  prompts/*.py        │    │    AgentMemory       │
│ (Task-specific)      │    │  (History tracking)  │
└──────────────────────┘    └──────────────────────┘
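To make the diagram more concrete, here is a heavily simplified, hypothetical control loop. `Orchestrator` and `StubAgent` are illustrative stand-ins for `TaskOrchestrator` and `HighPerformanceAgent`. A plain list plays the role of `AgentMemory`, and the LLM is replaced by a scripted stub so the example runs offline.

```python
from dataclasses import dataclass, field


@dataclass
class StubAgent:
    """Stands in for the LLM-backed agent; returns scripted actions."""
    script: list[str]

    def next_action(self, prompt: str, history: list[str]) -> str:
        # A real agent would send prompt + history + page state to the LLM.
        return self.script[len(history)] if len(history) < len(self.script) else "stop"


@dataclass
class Orchestrator:
    agent: StubAgent
    prompt: str  # would come from the prompt selector
    max_steps: int = 25
    history: list[str] = field(default_factory=list)  # plays the role of AgentMemory

    def run(self) -> list[str]:
        for _ in range(self.max_steps):
            action = self.agent.next_action(self.prompt, self.history)
            if action == "stop":
                break
            # In the real system the action would be executed in the browser here.
            self.history.append(action)
        return self.history


orchestrator = Orchestrator(
    agent=StubAgent(script=["fill('search', 'laptop')", "click('submit')"]),
    prompt="Omnizon shopping prompt (placeholder)",
)
actions = orchestrator.run()
```

Capping the loop at `max_steps` (25 by default in `AgentConfig`) guarantees termination even if the agent never emits a stop action.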

🔮 Future Improvements

  • Integrate DSPy for better prompt optimization
  • Add RL post-training (GRPO, PPO)
  • Test with different LLMs for role-based cost optimization
  • Build on Nova-act and browser-use frameworks
  • Fine-tune multimodal LLM components

๐Ÿ“ License

This project is licensed under the MIT License - see the LICENSE file for details.


๐Ÿค Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Commit your changes (git commit -m 'Add amazing feature')
  4. Push to the branch (git push origin feature/amazing-feature)
  5. Open a Pull Request
