Releases: DevsHero/ShadowCrawl

Release v2.0.0-rc

14 Feb 14:12


🥷 ShadowCrawl v2.0.0-rc: The "Boss Slayer" Release

This release marks a major evolution from a simple scraper to a Sovereign Stealth Intelligence Engine. We’ve engineered v2.0.0-rc to shatter the "unscrapable" walls of enterprise anti-bot systems while giving full control back to the user.

💥 High-Fidelity "Cyborg" Extraction (Native Only)

The headline feature is the non_robot_search tool—our nuclear option for high-security targets.

  • 99.99% Bypass Success: Successfully tested against Cloudflare Turnstile, DataDome, Akamai, and LinkedIn gatekeepers.
  • HITL (Human-In-The-Loop): A unique hybrid approach that bridges your native Brave/Chrome session. If a human can see it, ShadowCrawl can scrape it.
  • ⚠️ Docker Limitation: non_robot_search requires a native GUI environment to launch the browser. It is NOT supported within Docker. For full HITL power, run the binary natively on macOS (primary target) or Linux.

🚀 Performance & Stability Hardening

  • Global Mutex Locking: Prevents browser profile race conditions. Tools now queue and execute sequentially to avoid "Profile Locked" errors.
  • Deep Metadata Fallback: Our engine now digs into HTML embedded IDs (like LinkedIn urn:li:jobPosting) when standard JSON-LD is missing.
  • Smart Watchdog: Implemented OS-level process management to guarantee no "Zombie" browser processes remain after a scrape.
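The global-mutex pattern described above can be sketched in Rust roughly as follows. This is a minimal illustration, not the project's actual code; the name `with_browser_profile` is invented for the example:

```rust
use std::sync::{Mutex, OnceLock};

// Hypothetical global lock guarding the single shared browser profile.
static BROWSER_LOCK: OnceLock<Mutex<()>> = OnceLock::new();

/// Run `task` with exclusive access to the browser profile. Concurrent
/// callers queue on the mutex and execute strictly one at a time, which
/// avoids "Profile Locked" races on the on-disk profile directory.
fn with_browser_profile<T>(task: impl FnOnce() -> T) -> T {
    let _guard = BROWSER_LOCK
        .get_or_init(|| Mutex::new(()))
        .lock()
        .expect("browser lock poisoned");
    task()
    // _guard drops here, releasing the profile for the next queued tool.
}
```

Any tool that touches the browser wraps its work in this guard, so tools serialize automatically without coordinating with each other.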

📂 Verified Evidence (Proof of Work)

Check the new sample-results/proof/ directory for live scrape artifacts generated by this version:

  • LinkedIn: Verified bypass of job posting gatekeepers.
  • nowsecure.nl: Successful extraction through Cloudflare Turnstile.
  • cdiscount.com: Confirmed bypass of DataDome behavioral blocks.

📦 Quick Start & Deployment

1. Standard Stack (Automated Scrapes)
Best for server-side deployment and standard JS-heavy sites via Browserless.

docker compose -f docker-compose-local.yml up -d --build

2. Stealth Stack (HITL / Anti-Bot Bypass)
Required for non_robot_search. Run natively on your host machine to allow GUI interaction.

cd mcp-server
cargo run --release --features non_robot_search

🔗 Latest Changes
Full Commit Details: 370f6fe

Highlights: Implement stealth techniques for Chromiumoxide, safety kill switches, and evidence documentation.

🙏 Support the Mission
Built by a Solo Developer for the open-source community. Help us grow:

Star the repo ⭐

Become a Sponsor 💖

Full Changelog: https://github.com/DevsHero/ShadowCrawl/compare/v1.0.0...v2.0.0-rc

Release v1.1.0

13 Feb 13:52


Docker Image

Published to: ghcr.io/devshero/shadowcrawl:1.1.0

release: v1.1.0 quality/runtime hardening

Changes

  • centralize shared content quality policy helpers
  • add runtime quality_mode support across structured/scrape/crawl/extract paths
  • implement strict_proxy_health with non-strict diagnostic fallback
  • improve proxy connection testing with HEAD->GET fallback
  • harden scrape outputs (raw HTML omitted by default in JSON/batch)
  • enrich search output with search_id, dedupe, metadata, published_at
  • improve stdio/agent-mode resilience for research_history when memory unavailable
  • remove leaked registry tokens and local test artifacts; harden gitignore
  • Build and push Docker image
  • ShadowCrawl MCP Server v1.1.0
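The HEAD->GET fallback in the list above can be sketched with the transport abstracted away. The closure names and the `< 400` health threshold are assumptions for illustration, not the project's actual code:

```rust
/// Probe a proxy: try a cheap HEAD request first and fall back to GET
/// when HEAD errors out or returns a failure status (some upstreams
/// reject HEAD outright). `head` and `get` stand in for real requests
/// and yield an HTTP status code on success.
fn probe_with_fallback<E>(
    head: impl FnOnce() -> Result<u16, E>,
    get: impl FnOnce() -> Result<u16, E>,
) -> Result<u16, E> {
    match head() {
        Ok(status) if status < 400 => Ok(status),
        _ => get(), // HEAD failed or was rejected: retry with GET
    }
}
```

The benefit of the fallback is that healthy proxies are validated with the cheapest possible request, while HEAD-hostile upstreams are still diagnosed correctly rather than marked dead.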

Pull the image:

docker pull ghcr.io/devshero/shadowcrawl:1.1.0

Full Changelog: v1.0.1...v1.1.0

Release v1.0.1

13 Feb 10:04


Docker Image

Published to: ghcr.io/devshero/shadowcrawl:1.0.1

Changes

  • Rebrand ShadowCrawl
  • Build and push Docker image
  • ShadowCrawl MCP Server v1.0.1

Pull the image:

docker pull ghcr.io/devshero/shadowcrawl:1.0.1

Full Changelog: v0.3.0...v1.0.1

Release v1.0.0

12 Feb 08:41


🚀 Release v1.0.0 (General Availability)

The search-scrape project has evolved. v1.0.0 GA delivers a robust, self-hosted alternative to premium scraping APIs, specifically engineered for AI Agent workflows and MCP-native environments.

📦 Docker Image

The official image is now available on GitHub Container Registry:
ghcr.io/devshero/search-scrape:1.0.0

💎 Key Features & Enhancements (Since v0.3.0)
Unified MCP Surface: 100% parity between HTTP and stdio transport layers. No more tool-drift between CLI and IDE usage.

The Power of 8: Fully validated and optimized tool catalog:

  • search_web & search_structured (Federated Intelligence)
  • scrape_url & scrape_batch (Stealth Extraction)
  • crawl_website (Recursive Discovery)
  • extract_structured (Schema-driven Heuristics)
  • research_history (Long-term Semantic Memory via Qdrant)
  • proxy_manager (Autonomous Proxy Rotation)
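As an illustration of the recursive-discovery pattern behind crawl_website, a minimal depth-limited breadth-first crawl might look like this. The `links_of` closure stands in for real fetching and link extraction; none of these names are the actual API:

```rust
use std::collections::{HashSet, VecDeque};

/// Breadth-first crawl from `start`, visiting each URL at most once and
/// never descending past `max_depth`. Returns URLs in visit order.
fn crawl(
    start: &str,
    max_depth: usize,
    links_of: impl Fn(&str) -> Vec<String>,
) -> Vec<String> {
    let mut visited = HashSet::new();
    let mut order = Vec::new();
    let mut queue = VecDeque::from([(start.to_string(), 0usize)]);
    while let Some((url, depth)) = queue.pop_front() {
        // Skip URLs we have already seen (insert returns false on repeats).
        if !visited.insert(url.clone()) {
            continue;
        }
        order.push(url.clone());
        if depth < max_depth {
            for next in links_of(&url) {
                queue.push_back((next, depth + 1));
            }
        }
    }
    order
}
```

The visited set is what keeps recursive discovery from looping on sites that link back to themselves.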

Production Reliability: Hardened process lifecycle and improved shutdown handling to prevent premature task cancellation in VS Code/Cursor.

Enterprise-Grade Proxy Ops: Simplified proxy management using ip.txt as a canonical source with improved normalization.

Ready-to-Use Documentation: New specialized guides for IDE Setup and SearXNG Tuning to ensure 99.9% success rates.

⚡ Quick Start
Get up and running in under 60 seconds:

git clone https://github.com/DevsHero/AnvilSynth.git && cd AnvilSynth
docker compose -f docker-compose-local.yml up -d --build

Verify Service Health

curl -s http://localhost:5001/health

Discover Tool Capabilities

curl -s http://localhost:5001/mcp/tools

📝 Important Notes
Stealth Performance: While v1.0.0 includes advanced anti-bot measures, highly protected sites (e.g., LinkedIn/PerimeterX) perform best with high-quality residential proxies.

Validation: This release has passed the RELEASE_READINESS_2026-02-12 validation suite.

Pull the image:

docker pull ghcr.io/devshero/search-scrape:1.0.0

Release v0.3.0

12 Feb 09:16


Pre-release

docker pull ghcr.io/devshero/search-scrape:0.3.0

Release v0.2.0

10 Feb 06:41


v0.2.0

Integrated key enhancements while maintaining original performance:

  • crawl_website: Added recursive crawling for deep content extraction.
  • scrape_batch: Added concurrent scraping for better efficiency.
  • extract_structured: Added LLM-based structured JSON output.
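The concurrent batch idea behind scrape_batch can be sketched with plain threads. The `fetch` closure is a stand-in for the real scraper, and the function name is illustrative, not the project's implementation:

```rust
use std::thread;

/// Fetch every URL on its own thread and return results in input order.
/// `fetch` stands in for the real scraping call.
fn scrape_batch<F>(urls: Vec<String>, fetch: F) -> Vec<String>
where
    F: Fn(&str) -> String + Clone + Send + 'static,
{
    let handles: Vec<_> = urls
        .into_iter()
        .map(|url| {
            let fetch = fetch.clone();
            // Each URL is scraped concurrently on its own OS thread.
            thread::spawn(move || fetch(&url))
        })
        .collect();
    // Joining in spawn order preserves the input order of results.
    handles.into_iter().map(|h| h.join().unwrap()).collect()
}
```

A production version would cap concurrency with a bounded pool or semaphore rather than spawning one thread per URL, but the fan-out/join shape is the same.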

Special thanks to @lutfi238 for the excellent work on these features!
Ref: https://github.com/lutfi238/search-scrape

  • Updated documentation
  • Updated Cargo packages
  • Added GitHub Actions workflows

Docker Image

Published to: ghcr.io/devshero/search-scrape:0.2.0

Changes

  • Build and push Docker image
  • Search-Scrape MCP Server v0.2.0

Pull the image:

docker pull ghcr.io/devshero/search-scrape:0.2.0