The 99.99% Success Rate Stealth Engine for AI Agents
The Sovereign, Self-Hosted Alternative to Firecrawl, Jina, and Tavily.
ShadowCrawl is not just a scraper; it's a Cyborg Intelligence Layer. While other APIs fail against Cloudflare, Akamai, and PerimeterX, ShadowCrawl leverages a unique Human-AI Collaboration model to achieve a near-perfect bypass rate on even the most guarded "Boss Level" sites (LinkedIn, Airbnb, Ticketmaster).
- 99.99% Bot Bypass: Featuring the "Non-Robot Search" engine. When automation hits a wall, ShadowCrawl bridges the gap with Human-In-The-Loop (HITL) interaction, allowing you to solve CAPTCHAs and login walls manually while the agent continues its work.
- Total Sovereignty: 100% Private. Self-hosted via Docker. No API keys, no monthly fees, and no third-party data tracking.
- Agent-Native (MCP): Deeply integrated with Cursor, Claude Desktop, and IDEs via the Model Context Protocol. Your AI agent now has eyes and hands in the real web.
- Universal Noise Reduction: Advanced Rust-based filtering that collapses "Skeleton Screens" and repeats, delivering clean, semantic Markdown that reduces LLM token costs.
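The noise-reduction idea is simple to picture: collapse runs of identical placeholder lines before the text ever reaches the LLM. A minimal Rust sketch of that one step (illustrative only, not ShadowCrawl's actual filter):

```rust
/// Collapse runs of identical lines (e.g. repeated skeleton-screen
/// placeholders) down to a single occurrence, dropping blank filler.
/// A minimal illustration; the real filter is more involved.
fn collapse_repeats(input: &str) -> String {
    let mut out: Vec<&str> = Vec::new();
    for line in input.lines().map(str::trim) {
        if line.is_empty() {
            continue; // drop blank filler lines
        }
        if out.last() != Some(&line) {
            out.push(line);
        }
    }
    out.join("\n")
}

fn main() {
    let noisy = "Loading...\nLoading...\nLoading...\n\nReal content";
    assert_eq!(collapse_repeats(noisy), "Loading...\nReal content");
    println!("{}", collapse_repeats(noisy));
}
```

Even this naive pass can shrink a skeleton-heavy page dramatically, which is where the token savings come from.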
Most scrapers try to "act" like a human and fail. ShadowCrawl uses a human when it matters.
stealth_scrape is our flagship tool for high-fidelity rendering. It launches a visible, native Brave Browser instance on your machine.
- Manual Intervention: If a site asks for a Login or a Puzzle, you solve it once; the agent scrapes the rest.
- Brave Integration: Uses your actual browser profiles (cookies/sessions) to look like a legitimate user, not a headless bot.
- Stealth Cleanup: Automatically strips automation markers (`navigator.webdriver`, etc.) before extraction.
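Concretely, this kind of cleanup usually means evaluating a small script in every new document before any site JavaScript runs (e.g. via CDP's `Page.addScriptToEvaluateOnNewDocument`). A hedged sketch of such a payload; the constant below is illustrative, not ShadowCrawl's exact script:

```rust
/// An illustrative anti-detection init script. A real cleanup pass
/// covers many more markers; this shows only the flagship one.
const STEALTH_INIT_JS: &str = r#"
Object.defineProperty(navigator, 'webdriver', { get: () => undefined });
"#;

fn main() {
    // In practice this string is handed to the browser before navigation,
    // e.g. through the Chrome DevTools Protocol.
    assert!(STEALTH_INIT_JS.contains("navigator, 'webdriver'"));
    println!("{}", STEALTH_INIT_JS.trim());
}
```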
Most scraping APIs surrender when facing enterprise-grade shields. ShadowCrawl is the Hammer that breaks through. We successfully bypass and extract data from:
- Cloudflare (Turnstile / Challenge Pages)
- DataDome (Interstitial & Behavioral blocks)
- Akamai (Advanced Bot Manager)
- PerimeterX / HUMAN
- Kasada & Shape Security
The Secret? The Cyborg Approach (HITL). ShadowCrawl doesn't just "imitate" a human; it bridges your real, native Brave/Chrome session into the agent's workflow. If a human can see it, ShadowCrawl can scrape it.
We don't just claim to bypass; we provide the receipts. All evidence below was captured using stealth_scrape (feature flag: non_robot_search) with the Safety Kill Switch enabled (2026-02-14).
| Target Site | Protection | Evidence Size | Evidence | Data Extracted | Status |
|---|---|---|---|---|---|
|  | Cloudflare + Auth | 413KB | JSON · Snippet | 60+ job IDs, listings | ✅ |
| Ticketmaster | Cloudflare Turnstile | 1.1MB | JSON · Snippet | Tour dates, venues | ✅ |
| Airbnb | DataDome | 1.8MB | JSON · Snippet | 1000+ Tokyo listings | ✅ |
| Upwork | reCAPTCHA | 300KB | JSON · Snippet | 160K+ job postings | ✅ |
| Amazon | AWS Shield | 814KB | JSON · Snippet | RTX 5070 Ti results | ✅ |
| nowsecure.nl | Cloudflare | 168KB | JSON · Screenshot | Manual button tested | ✅ |
Full Documentation: See `proof/README.md` for verification steps, protection analysis, and quality metrics.
| Feature | Description |
|---|---|
| Search & Discovery | Federated search via SearXNG. Finds what Google hides. |
| Deep Crawling | Recursive, bounded crawling to map entire subdomains. |
| Semantic Memory | (Optional) Qdrant integration for long-term research recall. |
| Proxy Master | Native rotation logic for HTTP/SOCKS5 pools. |
| Hydration Scraper | Specialized logic to extract "hidden" JSON data from React/Next.js sites. |
| Universal Janitor | Automatic removal of popups, cookie banners, and overlays. |
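The Hydration Scraper row is worth unpacking: frameworks like Next.js embed the full page state as JSON in a `<script id="__NEXT_DATA__">` tag, so pulling that blob out often beats parsing the rendered HTML. A minimal Rust sketch of the idea (it assumes the standard Next.js tag; real pages deserve a proper HTML parser and error handling):

```rust
/// Pull the embedded hydration JSON out of a Next.js page.
/// Sketch only: relies on the standard __NEXT_DATA__ script tag.
fn extract_next_data(html: &str) -> Option<&str> {
    let marker = r#"<script id="__NEXT_DATA__" type="application/json">"#;
    let start = html.find(marker)? + marker.len();
    let end = html[start..].find("</script>")? + start;
    Some(&html[start..end])
}

fn main() {
    let page = r#"<html><script id="__NEXT_DATA__" type="application/json">{"props":{"listings":42}}</script></html>"#;
    assert_eq!(extract_next_data(page), Some(r#"{"props":{"listings":42}}"#));
}
```

The extracted string is plain JSON, ready for `serde_json` or whatever your agent consumes, with no DOM scraping involved.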
| Feature | Firecrawl / Jina | ShadowCrawl |
|---|---|---|
| Cost | Monthly Subscription | $0 (Self-hosted) |
| Privacy | They see your data | 100% Private |
| LinkedIn/Airbnb | Often Blocked | 99.99% Success (via HITL) |
| JS Rendering | Cloud-only | Native Brave / Browserless |
| Memory | None | Semantic Research History |
Docker is the fastest way to bring up the full stack (SearXNG, proxy manager, etc.).
Important: Docker mode cannot use the HITL/GUI renderer (stealth_scrape) because containers cannot reliably access your host's native Brave/Chrome window, keyboard hooks, and OS permissions.
Use the Native Rust Way below when you want boss-level bypass.
```shell
# Clone and launch
git clone https://github.com/DevsHero/shadowcrawl.git
cd shadowcrawl
docker compose -f docker-compose-local.yml up -d --build
```
For the 99.99% bypass (HITL), you must run natively (tested on macOS; Windows supported via a verified install guide below).
Build the MCP stdio server with the HITL feature enabled:
```shell
cd mcp-server
cargo build --release --bin shadowcrawl-mcp --features non_robot_search
```

This produces the local MCP binary at `mcp-server/target/release/shadowcrawl-mcp`.
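Because `non_robot_search` is a Cargo feature, HITL-only code paths compile out entirely when the flag is absent rather than being disabled at runtime. A hypothetical sketch of that gating pattern (function names here are illustrative, not ShadowCrawl's internals):

```rust
/// Reports whether the HITL feature was compiled in. `cfg!` is resolved
/// at compile time, so without `--features non_robot_search` the
/// HITL-only tool simply doesn't exist in the binary.
fn hitl_enabled() -> bool {
    cfg!(feature = "non_robot_search")
}

// Only compiled when the feature flag is passed to cargo build.
#[cfg(feature = "non_robot_search")]
fn register_stealth_scrape() {
    println!("stealth_scrape registered");
}

fn main() {
    if hitl_enabled() {
        #[cfg(feature = "non_robot_search")]
        register_stealth_scrape();
    } else {
        println!("built without non_robot_search; HITL tools unavailable");
    }
}
```

This is why a binary built without the flag cannot offer `stealth_scrape` at all, and why rebuilding with the feature is required rather than flipping a config switch.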
Prereqs:
- Install Brave Browser (recommended) or Google Chrome
- Grant Accessibility permissions (required for the emergency ESC hold-to-abort kill switch)
Windows:
- Verified setup guide (tested): `docs/WINDOWS_SETUP.md`
ShadowCrawl can run as an MCP server in two modes:
- Docker MCP server: great for normal scraping/search tools, but cannot do HITL/GUI (`stealth_scrape`).
- Local MCP server (`shadowcrawl-local`): required for HITL tools (a visible Brave/Chrome window).
Add this to your MCP config to use the Dockerized server:
```json
{
  "mcpServers": {
    "shadowcrawl": {
      "command": "docker",
      "args": [
        "compose",
        "-f",
        "/YOUR_PATH/shadowcrawl/docker-compose-local.yml",
        "exec",
        "-i",
        "-T",
        "shadowcrawl",
        "shadowcrawl-mcp"
      ]
    }
  }
}
```
If you want to use HITL tools like stealth_scrape, configure a local MCP server that launches the native binary.
VS Code MCP config example ("servers" format):
Notes:
- MCP tool name: `stealth_scrape` (internal handler + feature flag name: `non_robot_search`).
- For HITL, prefer Brave + a real profile dir (`SHADOWCRAWL_RENDER_PROFILE_DIR`) so cookies/sessions persist.
- If you're running via the Docker MCP server, HITL tools will either be unavailable or fail (no host GUI).
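Under the hood, knobs like `CHROME_EXECUTABLE` are presumably resolved with sensible fallbacks; a hypothetical Rust sketch (the fallback path is the macOS Brave default from the sample config, and the function name is made up for illustration):

```rust
use std::env;

/// Pick the browser binary: an explicit CHROME_EXECUTABLE override wins,
/// otherwise fall back to the macOS Brave default path. Illustrative only;
/// not ShadowCrawl's actual resolution logic.
fn browser_path(env_override: Option<String>) -> String {
    env_override.unwrap_or_else(|| {
        "/Applications/Brave Browser.app/Contents/MacOS/Brave Browser".to_string()
    })
}

fn main() {
    let from_env = env::var("CHROME_EXECUTABLE").ok();
    println!("using browser: {}", browser_path(from_env));
}
```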
ShadowCrawl is built with ❤️ by a Solo Developer for the open-source community. If this tool helped you bypass a $500/mo API, consider supporting its growth!
- Found a bug? Open an Issue.
- Want a feature? Submit a request!
- Love the project? Star the repo ⭐ or buy me a coffee to fuel more updates!
License: MIT. Free for personal and commercial use.
VS Code MCP config example (`"servers"` format; the `//` comments require JSONC support, as in VS Code settings):

```jsonc
{
  "servers": {
    "shadowcrawl-local": {
      "type": "stdio",
      "command": "env",
      "args": [
        "RUST_LOG=info",
        // Optional (only if you run the full stack locally):
        "SEARXNG_URL=http://localhost:8890",
        "BROWSERLESS_URL=http://localhost:3010",
        "BROWSERLESS_TOKEN=mcp_stealth_session",
        "QDRANT_URL=http://localhost:6344",
        // Network + limits:
        "HTTP_TIMEOUT_SECS=30",
        "HTTP_CONNECT_TIMEOUT_SECS=10",
        "OUTBOUND_LIMIT=32",
        "MAX_CONTENT_CHARS=10000",
        "MAX_LINKS=100",
        // Optional (proxy manager):
        "IP_LIST_PATH=/YOUR_PATH/shadowcrawl/ip.txt",
        "PROXY_SOURCE_PATH=/YOUR_PATH/shadowcrawl/proxy_source.json",
        // HITL / stealth_scrape quality-of-life:
        // "SHADOWCRAWL_NON_ROBOT_AUTO_ALLOW=1",
        // "SHADOWCRAWL_RENDER_PROFILE_DIR=/YOUR_PROFILE_DIR",
        // "CHROME_EXECUTABLE=/Applications/Brave Browser.app/Contents/MacOS/Brave Browser",
        "/YOUR_PATH/shadowcrawl/mcp-server/target/release/shadowcrawl-mcp"
      ]
    }
  }
}
```