A CLI and microservice to fetch URLs and render them as:
- HTML: Rendered page content
- Markdown: Converted from HTML
- PNG screenshot: Full page capture
# Install globally
npm install -g web-capture
# Capture a URL as HTML (output to stdout)
web-capture https://example.com
# Capture as Markdown and save to file
web-capture https://example.com --format markdown --output page.md
# Take a screenshot
web-capture https://example.com --format png --output screenshot.png
# Start as API server
web-capture --serve
# Start server on custom port
web-capture --serve --port 8080

- HTML: GET /html?url=<URL>
- Markdown: GET /markdown?url=<URL>
- PNG screenshot: GET /image?url=<URL>
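Because the target address is itself passed as a query parameter, it is safest to percent-encode it when building a request. The following is a minimal JavaScript sketch, assuming the server listens on the default port 3000 and decodes the url parameter in the usual way. `buildCaptureUrl` and `ENDPOINTS` are hypothetical names used for illustration, not part of web-capture.

```javascript
// Hypothetical helper, not part of web-capture itself.
// Endpoint paths are taken from the list above; the base URL is an assumption.
const ENDPOINTS = new Map([
  ['html', '/html'],
  ['markdown', '/markdown'],
  ['png', '/image'],
]);

function buildCaptureUrl(format, targetUrl, baseUrl = 'http://localhost:3000') {
  const path = ENDPOINTS.get(format);
  if (!path) {
    throw new Error(`Unknown format: ${format}`);
  }
  const requestUrl = new URL(path, baseUrl);
  // searchParams percent-encodes the target, so its own "?" and "&"
  // cannot be mistaken for parameters of the capture request.
  requestUrl.searchParams.set('url', targetUrl);
  return requestUrl.toString();
}
```

For example, `buildCaptureUrl('markdown', 'https://example.com/?a=1&b=2')` yields `http://localhost:3000/markdown?url=https%3A%2F%2Fexample.com%2F%3Fa%3D1%26b%3D2`.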
npm install
# or
yarn install

Start the API server:
web-capture --serve [--port <port>]

| Option | Short | Description | Default |
|---|---|---|---|
| --serve | -s | Start as HTTP API server | - |
| --port | -p | Port to listen on | 3000 (or PORT env) |
Capture a URL directly:
web-capture <url> [options]

| Option | Short | Description | Default |
|---|---|---|---|
| --format | -f | Output format: html, markdown/md, image/png | html |
| --output | -o | Output file path | stdout (text) or auto-generated (images) |
| --engine | -e | Browser engine: puppeteer, playwright | puppeteer (or BROWSER_ENGINE env) |
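When the CLI is driven from a Node.js script, the options above can be assembled as an argument array instead of a shell string. This is only a sketch: `buildCliArgs` is a hypothetical helper, and it uses only the long flags documented in the table.

```javascript
// Hypothetical helper: turns an options object into web-capture CLI arguments.
// Only flags documented in the options table are emitted.
function buildCliArgs(url, { format, output, engine } = {}) {
  const args = [url];
  if (format) args.push('--format', format);
  if (output) args.push('--output', output);
  if (engine) args.push('--engine', engine);
  return args;
}
```

The resulting array can be passed to `execFile('web-capture', args)` from `node:child_process`, which runs the command without a shell, so URLs containing `&` or `?` need no extra quoting.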
# Capture HTML to stdout
web-capture https://example.com
# Capture Markdown to file
web-capture https://example.com -f markdown -o page.md
# Take screenshot with Playwright engine
web-capture https://example.com -f png -e playwright -o screenshot.png
# Pipe HTML to another command
web-capture https://example.com | grep "title"

- yarn dev - Start the development server with hot reloading using nodemon
- yarn start - Start the service using Docker Compose
- yarn test - Run all unit tests
- yarn test:watch - Run tests in watch mode
- yarn test:e2e - Run end-to-end tests
- yarn test:e2e:docker - Run end-to-end tests against the Docker container
- yarn test:all - Run all tests including build and e2e tests
- yarn build - Build and start the Docker container
- yarn examples:python - Run Python example scripts
- yarn examples:javascript - Run JavaScript example scripts
- yarn examples - Run all examples (requires build)
yarn dev
curl "http://localhost:3000/html?url=https://example.com"

# Build and run using Docker Compose
yarn start
# Or manually
docker build -t web-capture .
docker run -p 3000:3000 web-capture

GET /html?url=<URL>&engine=<ENGINE>

Returns the raw HTML content of the specified URL.
Parameters:
- url (required): The URL to fetch
- engine (optional): Browser engine to use (puppeteer or playwright). Default: puppeteer
Examples:
# Using default Puppeteer engine
curl "http://localhost:3000/html?url=https://example.com"
# Using Playwright engine (quote the URL so the shell does not treat & as a background operator)
curl "http://localhost:3000/html?url=https://example.com&engine=playwright"

GET /markdown?url=<URL>

Converts the HTML content of the specified URL to Markdown format.

GET /image?url=<URL>&engine=<ENGINE>

Returns a PNG screenshot of the specified URL.
Parameters:
- url (required): The URL to capture
- engine (optional): Browser engine to use (puppeteer or playwright). Default: puppeteer
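A client may want to confirm that the /image response really is a PNG before writing it to disk. The sketch below checks the standard 8-byte PNG signature. `looksLikePng` and `captureScreenshot` are hypothetical names, and it is an assumption (not stated in this README) that failed captures return a non-2xx status.

```javascript
// The fixed 8-byte signature that starts every PNG file.
const PNG_SIGNATURE = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];

// Returns true when the byte sequence starts with the PNG signature.
function looksLikePng(bytes) {
  if (bytes.length < PNG_SIGNATURE.length) return false;
  return PNG_SIGNATURE.every((value, index) => bytes[index] === value);
}

// Hypothetical usage sketch: needs a running server and Node 18+ (global fetch).
// requestUrl is a full /image URL including the url query parameter.
async function captureScreenshot(requestUrl) {
  const response = await fetch(requestUrl);
  if (!response.ok) {
    // Assumption: errors are reported with a non-2xx HTTP status.
    throw new Error(`Capture failed with HTTP ${response.status}`);
  }
  const bytes = new Uint8Array(await response.arrayBuffer());
  if (!looksLikePng(bytes)) {
    throw new Error('Response is not a PNG image');
  }
  return bytes;
}
```

The returned bytes can then be saved with `fs.promises.writeFile('screenshot.png', bytes)`.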
Examples:
# Using default Puppeteer engine
curl "http://localhost:3000/image?url=https://example.com" > screenshot.png
# Using Playwright engine (quote the URL so the shell does not treat & as a background operator)
curl "http://localhost:3000/image?url=https://example.com&engine=playwright" > screenshot.png

web-capture uses lino-arguments for unified configuration management. Configuration values are resolved with the following priority (highest to lowest):
- CLI arguments: --port 8080
- Environment variables: PORT=8080
- Custom configuration file: --configuration path/to/custom.lenv
- Default .lenv file: .lenv in the project root
- Built-in defaults
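The precedence order above can be pictured as a lookup through layered sources, where the first layer that defines a value wins. This sketch is not the lino-arguments API. `resolveSetting` and the layer names are hypothetical, and it assumes every source has already been normalized to the same key names.

```javascript
// Illustrative only: mimics the documented precedence, not lino-arguments itself.
// Each layer is a plain object keyed by setting name (for example PORT).
function resolveSetting(name, { cli = {}, env = {}, customFile = {}, defaultFile = {}, defaults = {} } = {}) {
  // Ordered from highest to lowest priority.
  const layers = [cli, env, customFile, defaultFile, defaults];
  for (const layer of layers) {
    if (layer[name] !== undefined) {
      return layer[name];
    }
  }
  return undefined;
}
```

With `cli: { PORT: '8080' }` and `env: { PORT: '9090' }`, the call `resolveSetting('PORT', ...)` returns `'8080'`. Dropping the CLI layer makes the environment value win.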
Create a .lenv file in your project root using Links Notation format:
# Server configuration
PORT: 3000
# Browser engine (puppeteer or playwright)
BROWSER_ENGINE: puppeteer
Specify a custom configuration file path:
web-capture --serve --configuration /path/to/custom.lenv

All configuration options support environment variables:
# Set port via environment variable
export PORT=8080
web-capture --serve
# Set browser engine
export BROWSER_ENGINE=playwright
web-capture https://example.com --format png

The service supports both Puppeteer and Playwright browser engines:
- Puppeteer: Default engine, mature and well-tested
- Playwright: Alternative engine with similar capabilities
You can choose the engine using:
- CLI argument: --engine playwright
- Environment variable: BROWSER_ENGINE=playwright
- Configuration file: BROWSER_ENGINE: playwright in .lenv
Supported engine values:
- puppeteer or pptr - Use Puppeteer
- playwright or pw - Use Playwright
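The alias handling can be sketched as a small lookup table. `normalizeEngine` and `ENGINE_ALIASES` are hypothetical names. Falling back to puppeteer when no value is given follows the documented default, while throwing on an unknown value is an assumption about reasonable behavior, not a description of what the service does.

```javascript
// Maps every documented engine value to its canonical engine name.
const ENGINE_ALIASES = new Map([
  ['puppeteer', 'puppeteer'],
  ['pptr', 'puppeteer'],
  ['playwright', 'playwright'],
  ['pw', 'playwright'],
]);

function normalizeEngine(value = 'puppeteer') {
  const engine = ENGINE_ALIASES.get(value);
  if (!engine) {
    // Assumption: reject unknown values instead of silently falling back.
    throw new Error(`Unsupported engine: ${value}`);
  }
  return engine;
}
```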
The service is built with:
- Express.js for the web server
- Puppeteer and Playwright for headless browser automation and screenshots
- Turndown for HTML to Markdown conversion
- Jest for testing
- capture-website - Capture website screenshots with a simple API
- pageres - Capture screenshots of websites in various resolutions
- puppeteer - Headless Chrome Node.js API for browser automation and screenshots
- playwright - Cross-browser automation library
- turndown - HTML to Markdown converter written in JavaScript
- html-to-markdown - Go library to convert HTML to Markdown with support for entire websites
- markdowner - Advanced HTML to Markdown conversion tool
- pandoc - Universal document converter supporting HTML to Markdown
- scrape-it - Node.js scraper with a clean API
- ScreenshotOne - Developer-focused screenshot API with advanced features
- ScrapFly - Screenshot API with antibot protection and rotating proxies
- ScreenshotAPI.net - High-quality screenshot API with retina support
- ApiFlash - Chrome-based screenshot API with S3 integration
- Scrapingdog - Cost-effective screenshot and scraping solution
- URLBox - Website screenshot API
- site-shot.com - Free website screenshot service
- pikwy.com - Website thumbnail and screenshot generator
- screenshotmachine.com - Website screenshot service
- screenshot.guru - Simple screenshot service
- urltomarkdown.com - Convert URLs to Markdown format
- CaptureKit - API for HTML to Markdown conversion
- MarkItDown - Microsoft's open-source tool for converting various file formats to Markdown
- html-to-markdown (Python) - Rust-powered Python library for HTML to Markdown conversion
UNLICENSED