AirCrawl Logo

AirCrawl

AirCrawl is an experimental browser automation system that uses LLMs to autonomously navigate websites, extract data, and complete complex web tasks.

AirCrawl Screenshot

Architecture

┌─────────────────────────────────────────────────────────────────┐
│                        AGENT LOOP                               │
│  ┌─────────┐    ┌─────────┐    ┌─────────┐    ┌─────────┐       │
│  │ OBSERVE │───▶│  PLAN   │───▶│ EXECUTE │───▶│ VERIFY  │──┐    │
│  └─────────┘    └─────────┘    └─────────┘    └─────────┘  │    │
│       ▲                                                    │    │
│       └────────────────────────────────────────────────────┘    │
└─────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                     MCP SERVER (Tools)                          │
│  ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌──────────┐            │
│  │ navigate │ │  click   │ │   type   │ │screenshot│  ...       │
│  └──────────┘ └──────────┘ └──────────┘ └──────────┘            │
└─────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                      SELENIUMBASE                               │
│              (Undetected Chrome Browser)                        │
└─────────────────────────────────────────────────────────────────┘
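The loop in the diagram can be sketched in a few lines of Python. This is an illustrative simplification, not the actual implementation in agent/agent.py; the observe/plan/execute/verify callables are hypothetical stand-ins for the real page-state capture, LLM planning, and MCP tool calls:

```python
# Simplified sketch of the OBSERVE -> PLAN -> EXECUTE -> VERIFY loop.
# The four callables are hypothetical stand-ins for the agent's real
# screenshot/LLM/MCP machinery.
def run_agent_loop(observe, plan, execute, verify, max_steps=30):
    """Run the loop until verify() reports success or steps run out."""
    for step in range(max_steps):
        state = observe()          # OBSERVE: page state + screenshot
        action = plan(state)       # PLAN: LLM picks the next tool call
        result = execute(action)   # EXECUTE: run the tool via MCP
        if verify(state, result):  # VERIFY: did we reach the goal?
            return {"status": "completed", "steps": step + 1}
    return {"status": "max_steps_reached", "steps": max_steps}
```

The VERIFY arrow looping back to OBSERVE corresponds to the `for` loop continuing when `verify` returns False; AGENT_MAX_STEPS in .env plays the role of `max_steps`.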

Features

  • Autonomous Navigation: LLM-powered decision making for complex web tasks
  • Vision-Based Analysis: Multimodal LLM analyzes screenshots to understand page state
  • Element Annotation: Visual labeling of interactive elements for precise targeting
  • Undetected Browser: SeleniumBase with UC mode bypasses bot detection
  • MCP Protocol: Standard Model Context Protocol for tool communication
  • REST API: Full-featured FastAPI server for integration
  • Interactive Sessions: Persistent browser sessions for step-by-step control
  • Quick Actions: Pre-built endpoints for common tasks (search, login, extract)
  • Agent System: Persistent agents with custom prompts and encrypted secrets

Installation

# Clone the repository
git clone https://github.com/Arkel-ai/aircrawl.git
cd aircrawl

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Copy environment template
cp .env.template .env
# Edit .env with your API keys

Configuration

Edit .env file:

# LLM Configuration (OpenAI-compatible)
OPENAI_API_KEY=your-api-key
OPENAI_BASE_URL=https://api.openai.com/v1  # Or other compatible endpoint
MODEL=gpt-4o  # Or any compatible model

# Browser Settings
BROWSER_UC_MODE=true
BROWSER_TIMEOUT=30

# Agent Settings
AGENT_MAX_STEPS=30

# API Server
API_HOST=0.0.0.0
API_PORT=8000
DEBUG=false

# Database (PostgreSQL)
POSTGRES_USER=aircrawl
POSTGRES_PASSWORD=your_secure_password
POSTGRES_DB=aircrawl
DATABASE_URL=postgresql+asyncpg://aircrawl:your_secure_password@localhost:5432/aircrawl

# Encryption key for secrets
ENCRYPTION_KEY=  # Generate with: python -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())"

# Browser Proxy (optional) - see "Using Gost Proxy" section below
BROWSER_PROXY=

Docker Compose

The easiest way to run AirCrawl is with Docker Compose:

# Copy environment template
cp .env.template .env
# Edit .env with your API keys and configuration

# Start all services (PostgreSQL, Backend, Frontend)
docker compose up -d

# View logs
docker compose logs -f

# Stop services
docker compose down

Services:

  • PostgreSQL: Database on port 5432
  • Backend API: FastAPI server on port 8000
  • Frontend: SvelteKit app on port 3000

Using Gost Proxy (Anti-Bot)

Gost is a tunneling tool that runs a local SOCKS5 proxy and helps bypass anti-bot detection by routing browser traffic through a residential IP. This is useful when websites block datacenter IP ranges.

How it works:

  1. Gost runs as a local SOCKS5 proxy (port 1081)
  2. It forwards traffic to your upstream proxy (e.g., a residential proxy service)
  3. The browser connects through the local proxy, appearing to come from a residential IP

Setup:

  1. Set up your upstream proxy (residential IP, home server, etc.)
  2. Configure .env:
# Local proxy that the browser will use
BROWSER_PROXY=socks5://127.0.0.1:1081

# Upstream proxy (your residential/home proxy)
GOST_UPSTREAM_PROXY=socks5://username:password@your-proxy-ip:1080
  3. Start with the proxy profile:
docker compose --profile proxy up -d

Without proxy: If BROWSER_PROXY is not set, the browser connects directly without any proxy.

Usage

Command Line

# Run a single task
python main.py task "Go to news.ycombinator.com and get the top 5 headlines"

# Interactive demo
python main.py demo

# Start API server
python main.py server --port 8000

# Start MCP server (for external clients)
python main.py mcp-server

API Server

Start the server:

python main.py server

Create async task:

curl -X POST http://localhost:8000/api/tasks \
  -H "Content-Type: application/json" \
  -d '{"task": "Go to wikipedia.org and search for Python programming", "max_steps": 20}'

Check task status:

curl http://localhost:8000/api/tasks/{task_id}
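In Python, the same create-then-poll flow looks roughly like this (a stdlib-only sketch; the `task_id` and `status` response fields are assumptions inferred from the endpoints above, so adjust them to the actual response shape):

```python
import json
import time
import urllib.request

BASE = "http://localhost:8000"

def wait_for_task(fetch_status, poll_interval=2.0, timeout=120.0):
    """Poll fetch_status() until the task leaves a pending/running state.
    fetch_status must return a dict with a 'status' key (assumed field)."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        task = fetch_status()
        if task.get("status") not in ("pending", "running"):
            return task
        time.sleep(poll_interval)
    raise TimeoutError("task did not finish in time")

def _post_json(url, payload):
    req = urllib.request.Request(
        url, data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as r:
        return json.load(r)

def _get_json(url):
    with urllib.request.urlopen(url) as r:
        return json.load(r)

if __name__ == "__main__":
    # Create the task, then poll its status endpoint until it settles.
    created = _post_json(f"{BASE}/api/tasks", {
        "task": "Go to wikipedia.org and search for Python programming",
        "max_steps": 20,
    })
    task_id = created["task_id"]  # assumed response field
    result = wait_for_task(
        lambda: _get_json(f"{BASE}/api/tasks/{task_id}"))
    print(result)
```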

Quick search:

curl -X POST http://localhost:8000/api/quick/search \
  -H "Content-Type: application/json" \
  -d '{"query": "latest AI news", "engine": "duckduckgo", "num_results": 5}'

Quick navigate and extract:

curl -X POST http://localhost:8000/api/quick/navigate \
  -H "Content-Type: application/json" \
  -d '{"url": "https://news.ycombinator.com", "extract": "headlines"}'

Interactive Sessions

Create persistent browser sessions for step-by-step control:

# Create session
curl -X POST http://localhost:8000/api/sessions \
  -H "Content-Type: application/json" \
  -d '{}'

# Execute action
curl -X POST http://localhost:8000/api/sessions/{session_id}/action \
  -H "Content-Type: application/json" \
  -d '{"action": "navigate", "parameters": {"url": "https://google.com"}}'

# Close session
curl -X DELETE http://localhost:8000/api/sessions/{session_id}
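For programmatic use, the create/act/close lifecycle fits naturally into a context manager so the session is always closed, even if an action raises. A minimal sketch, where the `create`/`act`/`close` callables are hypothetical wrappers around the three REST calls above:

```python
import contextlib

@contextlib.contextmanager
def browser_session(create, act, close):
    """Wrap the session lifecycle: create -> actions -> guaranteed close.
    create() returns a session id; act(session_id, action, params) runs one
    action; close(session_id) tears the session down (all hypothetical
    wrappers around the REST endpoints shown above)."""
    session_id = create()
    try:
        # Yield a convenience callable bound to this session.
        yield lambda action, **params: act(session_id, action, params)
    finally:
        close(session_id)
```

Usage would look like `with browser_session(create, act, close) as do: do("navigate", url="https://google.com")`, mirroring the curl sequence above.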

Python SDK

import asyncio
from agent.agent import WebAgent
from agent.mcp_client import create_direct_client

async def main():
    async with create_direct_client(headless=True) as browser:
        agent = WebAgent(mcp_client=browser)
  
        result = await agent.execute_task(
            "Go to bbc.com/news and extract the top 5 headlines"
        )
  
        print(f"Status: {result['status']}")
        print(f"Data: {result['extracted_data']}")

asyncio.run(main())

API Endpoints

Tasks

| Method | Endpoint | Description |
|--------|----------|-------------|
| POST | /api/tasks | Create async task |
| POST | /api/tasks/sync | Create sync task (blocking) |
| GET | /api/tasks | List all tasks |
| GET | /api/tasks/{id} | Get task details |
| GET | /api/tasks/{id}/history | Get execution history |
| DELETE | /api/tasks/{id} | Delete task |

Quick Actions

| Method | Endpoint | Description |
|--------|----------|-------------|
| POST | /api/quick/navigate | Navigate and extract content |
| POST | /api/quick/search | Web search |
| POST | /api/quick/login | Log in to a website |

Sessions

| Method | Endpoint | Description |
|--------|----------|-------------|
| POST | /api/sessions | Create browser session |
| POST | /api/sessions/{id}/action | Execute action |
| GET | /api/sessions | List sessions |
| DELETE | /api/sessions/{id} | Close session |

Vision & Screenshot Analysis

AirCrawl uses multimodal LLM vision capabilities to understand web pages visually, enabling more intelligent automation:

How It Works

  1. Page State Observation: When the agent calls get_page_state, a screenshot is automatically captured
  2. Visual Analysis: The screenshot is sent to the vision-capable LLM (e.g., GPT-4 Vision, Claude 3) along with the task context
  3. Element Annotation: Interactive elements are visually labeled with numbered overlays
  4. Intelligent Decision Making: The LLM analyzes the visual layout to determine the next action
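Step 2 typically means packing the screenshot into a multimodal chat message. A sketch of that payload, assuming the standard base64 data-URI image format used by OpenAI-compatible chat completion APIs (not necessarily the exact prompt AirCrawl builds internally):

```python
import base64

def build_vision_message(task: str, screenshot_png: bytes) -> dict:
    """Build one OpenAI-style user message pairing the task text with a
    base64-encoded PNG screenshot (standard data-URI image format)."""
    b64 = base64.b64encode(screenshot_png).decode("ascii")
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": task},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{b64}"}},
        ],
    }
```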

Annotated Screenshots

The system can overlay visual labels on interactive elements to help the LLM identify targets:

# Annotated elements are numbered and highlighted
# Example output:
{
  "annotated_elements": [
    {"id": 1, "selector": "#search-input", "type": "input", "text": "Search..."},
    {"id": 2, "selector": ".login-btn", "type": "button", "text": "Login"},
    {"id": 3, "selector": "a.nav-link", "type": "link", "text": "Home"}
  ]
}

The LLM can then reference elements by their visual ID, e.g. "Click element #2 to log in."

API Endpoints for Screenshots

# Get current session screenshot (base64)
curl http://localhost:8000/api/sessions/{session_id}/screenshot

# Get annotated screenshot with labeled elements
curl http://localhost:8000/api/sessions/{session_id}/screenshot/annotated
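From Python, the screenshot response can be decoded and written to disk. A small stdlib-only sketch, assuming the endpoint returns JSON with a base64-encoded `screenshot` field (an assumed field name; adjust the key to the real response shape):

```python
import base64
import json
import urllib.request

def decode_screenshot(payload: dict) -> bytes:
    """Decode the base64 'screenshot' field (assumed field name)."""
    return base64.b64decode(payload["screenshot"])

def save_screenshot(session_id: str, path: str,
                    base: str = "http://localhost:8000") -> None:
    """Fetch the current session screenshot and write the PNG to disk."""
    url = f"{base}/api/sessions/{session_id}/screenshot"
    with urllib.request.urlopen(url) as r:
        payload = json.load(r)
    with open(path, "wb") as f:
        f.write(decode_screenshot(payload))
```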

Benefits

  • Visual Context: Understands page layout, not just DOM structure
  • Dynamic Content: Handles JavaScript-rendered content that may not be in initial HTML
  • Complex UIs: Navigates modern SPAs with dynamic components
  • Error Detection: Visually identifies error messages, popups, and unexpected states
  • Selector Resilience: Works even when selectors are obfuscated or change frequently

Browser Tools

The agent has access to these browser automation tools:

Core Navigation

  • browser_start / browser_stop - Session lifecycle
  • navigate - Go to URL
  • go_back / go_forward / refresh - Browser navigation

Element Interaction

  • click - Click element by CSS selector
  • type_text - Enter text in input fields
  • press_key - Keyboard input (Enter, Tab, Escape, etc.)
  • hover / double_click / right_click - Mouse actions
  • select_option - Dropdown selection

Page Analysis

  • get_page_state - Observe page (elements, URL, title, screenshot)
  • annotate_elements - Add visual labels to interactive elements
  • remove_annotations - Remove visual labels
  • take_annotated_screenshot - Capture screenshot with element labels
  • get_text / get_attribute - Extract element data
  • extract_content - Get headlines, links, article, forms

Utilities

  • scroll - Scroll page
  • wait_for_element / wait_for_text - Wait for content
  • execute_script - Run JavaScript
  • switch_to_frame - Handle iframes
  • upload_file - File uploads
  • accept_alert / dismiss_alert - Handle dialogs

Project Structure

aircrawl/
├── mcp_server/
│   ├── __init__.py
│   ├── server.py          # MCP server with browser tools
│   └── browser_manager.py # SeleniumBase wrapper
├── agent/
│   ├── __init__.py
│   ├── agent.py           # Main agent loop
│   ├── mcp_client.py      # MCP client wrapper
│   └── state.py           # State management
├── api/
│   ├── __init__.py
│   └── server.py          # FastAPI REST API
├── config.py              # Configuration
├── main.py                # CLI entry point
├── requirements.txt
└── README.md

License

MIT License
