web-capture

A CLI and microservice to fetch URLs and render them as:

HTML: Rendered page content
Markdown: Converted from HTML
PNG screenshot: Full page capture

Quick Start

CLI Usage

# Install globally
npm install -g web-capture

# Capture a URL as HTML (output to stdout)
web-capture https://example.com

# Capture as Markdown and save to file
web-capture https://example.com --format markdown --output page.md

# Take a screenshot
web-capture https://example.com --format png --output screenshot.png

# Start as API server
web-capture --serve

# Start server on custom port
web-capture --serve --port 8080

API Endpoints (Server Mode)

HTML: GET /html?url=<URL>
Markdown: GET /markdown?url=<URL>
PNG screenshot: GET /image?url=<URL>

Installation

npm install
# or
yarn install

CLI Reference

Server Mode

Start the API server:

web-capture --serve [--port <port>]

| Option | Short | Description | Default |
| --- | --- | --- | --- |
| `--serve` | `-s` | Start as HTTP API server | - |
| `--port` | `-p` | Port to listen on | 3000 (or PORT env) |

Capture Mode

Capture a URL directly:

web-capture <url> [options]

| Option | Short | Description | Default |
| --- | --- | --- | --- |
| `--format` | `-f` | Output format: `html`, `markdown`/`md`, `image`/`png` | `html` |
| `--output` | `-o` | Output file path | stdout (text) or auto-generated (images) |
| `--engine` | `-e` | Browser engine: `puppeteer`, `playwright` | `puppeteer` (or BROWSER_ENGINE env) |

Examples

# Capture HTML to stdout
web-capture https://example.com

# Capture Markdown to file
web-capture https://example.com -f markdown -o page.md

# Take screenshot with Playwright engine
web-capture https://example.com -f png -e playwright -o screenshot.png

# Pipe HTML to another command
web-capture https://example.com | grep "title"

Available Commands

Development

yarn dev - Start the development server with hot reloading using nodemon
yarn start - Start the service using Docker Compose

Testing

yarn test - Run all unit tests
yarn test:watch - Run tests in watch mode
yarn test:e2e - Run end-to-end tests
yarn test:e2e:docker - Run end-to-end tests against Docker container
yarn test:all - Run all tests including build and e2e tests

Building

yarn build - Build and start the Docker container

Examples

yarn examples:python - Run Python example scripts
yarn examples:javascript - Run JavaScript example scripts
yarn examples - Run all examples (requires build)

Usage

Local Development

yarn dev
curl "http://localhost:3000/html?url=https://example.com"

Docker

# Build and run using Docker Compose
yarn start

# Or manually
docker build -t web-capture .
docker run -p 3000:3000 web-capture

API Endpoints

HTML Endpoint

GET /html?url=<URL>&engine=<ENGINE>

Returns the raw HTML content of the specified URL.

Parameters:

url (required): The URL to fetch
engine (optional): Browser engine to use (puppeteer or playwright). Default: puppeteer

Examples:

# Using default Puppeteer engine
curl "http://localhost:3000/html?url=https://example.com"

# Using Playwright engine
# (quote the URL so the shell does not treat & as the background operator)
curl "http://localhost:3000/html?url=https://example.com&engine=playwright"
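When the target URL has its own query string, it needs to be percent-encoded before it is placed in the `url` parameter. The following is a minimal sketch of one way to build such request URLs from Node.js. `buildCaptureUrl` is a hypothetical helper, not part of web-capture, and the base URL assumes a server running locally on the default port.

```javascript
// Hypothetical helper (not part of web-capture): builds a request URL for
// the /html, /markdown, or /image endpoints described in this README.
function buildCaptureUrl(baseUrl, endpoint, targetUrl, engine) {
  const requestUrl = new URL(endpoint, baseUrl);
  // searchParams.set percent-encodes the value, so '&' or '?' inside the
  // target URL cannot be mistaken for separators of the outer query string.
  requestUrl.searchParams.set('url', targetUrl);
  if (engine) {
    requestUrl.searchParams.set('engine', engine);
  }
  return requestUrl.toString();
}

// Example: a target URL that carries its own query parameters.
console.log(
  buildCaptureUrl(
    'http://localhost:3000', // assumed local server address
    '/html',
    'https://example.com/search?q=a&page=2',
    'playwright'
  )
);
```

The resulting string can be passed to `curl` in quotes or to any HTTP client.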

Markdown Endpoint

GET /markdown?url=<URL>

Converts the HTML content of the specified URL to Markdown format.
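As an illustration, the endpoint could be called from Node.js roughly as follows. This is a sketch under assumptions: `fetchMarkdown` is a hypothetical helper, the server is assumed to listen on `http://localhost:3000`, it is assumed to answer failures with a non-2xx status, and the global `fetch` requires Node.js 18 or later.

```javascript
// Hypothetical client helper; assumes a web-capture server at baseUrl.
// fetchImpl defaults to the global fetch (Node.js 18+) and can be swapped
// for a stub when no server is available.
async function fetchMarkdown(
  targetUrl,
  { baseUrl = 'http://localhost:3000', fetchImpl = globalThis.fetch } = {}
) {
  const requestUrl = new URL('/markdown', baseUrl);
  // Percent-encode the target URL into the "url" query parameter.
  requestUrl.searchParams.set('url', targetUrl);

  const response = await fetchImpl(requestUrl);
  if (!response.ok) {
    // Assumption: the server signals failures with a non-2xx status code.
    throw new Error(`web-capture responded with HTTP ${response.status}`);
  }
  return response.text();
}
```

With a server running, `fetchMarkdown('https://example.com').then(console.log)` would print the converted Markdown.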

Image Endpoint

GET /image?url=<URL>&engine=<ENGINE>

Returns a PNG screenshot of the specified URL.

Parameters:

url (required): The URL to capture
engine (optional): Browser engine to use (puppeteer or playwright). Default: puppeteer

Examples:

# Using default Puppeteer engine
curl "http://localhost:3000/image?url=https://example.com" > screenshot.png

# Using Playwright engine
# (quote the URL so the shell does not treat & as the background operator)
curl "http://localhost:3000/image?url=https://example.com&engine=playwright" > screenshot.png
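Because `curl` writes whatever body the server returns, a failed request can leave an error message in `screenshot.png`. One way to check the saved file is to compare its first bytes with the standard 8-byte PNG signature. The sketch below is illustrative only; `isPng` is a hypothetical helper and not part of web-capture.

```javascript
// The 8-byte signature that starts every PNG file (per the PNG specification).
const PNG_SIGNATURE = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a]);

// Hypothetical helper: returns true when the bytes begin with the PNG signature.
// Accepts a Buffer or Uint8Array; shorter inputs simply compare as unequal.
function isPng(bytes) {
  return PNG_SIGNATURE.equals(bytes.subarray(0, PNG_SIGNATURE.length));
}

// An HTML error page is not a PNG.
console.log(isPng(Buffer.from('<html>error</html>'))); // false
```

For a saved capture, `isPng(require('node:fs').readFileSync('screenshot.png'))` performs the same check on the file.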

Configuration

web-capture uses lino-arguments for unified configuration management. Configuration values are resolved with the following priority (highest to lowest):

CLI arguments: --port 8080
Environment variables: PORT=8080
Custom configuration file: --configuration path/to/custom.lenv
Default .lenv file: .lenv in the project root
Built-in defaults
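The priority order above can be pictured as a first-match lookup across the five sources. The sketch below only illustrates that idea. `resolveSetting` and its option names are hypothetical and do not reflect the lino-arguments API.

```javascript
// Hypothetical sketch of the lookup order; this is not the lino-arguments API.
// Each source is a plain object; a missing key means "not set in this source".
function resolveSetting(
  key,
  { cli = {}, env = {}, customFile = {}, defaultFile = {}, defaults = {} } = {}
) {
  // Ordered from highest to lowest priority, as listed in this README.
  const sources = [cli, env, customFile, defaultFile, defaults];
  for (const source of sources) {
    if (source[key] !== undefined) {
      return source[key];
    }
  }
  return undefined;
}

// PORT is set in the environment and in .lenv; the environment wins.
const port = resolveSetting('PORT', {
  cli: {},
  env: { PORT: '8080' },
  defaultFile: { PORT: '3000' },
  defaults: { PORT: '3000' },
});
console.log(port); // 8080
```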

Configuration File (.lenv)

Create a .lenv file in your project root using Links Notation format:

# Server configuration
PORT: 3000

# Browser engine (puppeteer or playwright)
BROWSER_ENGINE: puppeteer

Using Custom Configuration Files

Specify a custom configuration file path:

web-capture --serve --configuration /path/to/custom.lenv

Environment Variables

All configuration options support environment variables:

# Set port via environment variable
export PORT=8080
web-capture --serve

# Set browser engine
export BROWSER_ENGINE=playwright
web-capture https://example.com --format png

Browser Engine Support

The service supports both Puppeteer and Playwright browser engines:

Puppeteer: Default engine, mature and well-tested
Playwright: Alternative engine with similar capabilities

You can choose the engine using:

CLI argument: --engine playwright
Environment variable: BROWSER_ENGINE=playwright
Configuration file: BROWSER_ENGINE: playwright in .lenv

Supported engine values:

puppeteer or pptr - Use Puppeteer
playwright or pw - Use Playwright
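A client or wrapper script might map these values to a canonical engine name before sending a request. The following is a minimal sketch, not the service's own code: `normalizeEngine` is a hypothetical helper, and both the case-insensitive matching and the error on unknown values are assumptions.

```javascript
// Alias table taken from the supported engine values listed above.
const ENGINE_ALIASES = new Map([
  ['puppeteer', 'puppeteer'],
  ['pptr', 'puppeteer'],
  ['playwright', 'playwright'],
  ['pw', 'playwright'],
]);

// Hypothetical helper: returns the canonical engine name.
// Falls back to the documented default (puppeteer) when no value is given.
function normalizeEngine(value = 'puppeteer') {
  // Assumption: matching ignores case and surrounding whitespace.
  const engine = ENGINE_ALIASES.get(String(value).trim().toLowerCase());
  if (!engine) {
    // Assumption: unknown values are rejected rather than silently defaulted.
    throw new Error(`Unsupported browser engine: ${value}`);
  }
  return engine;
}

console.log(normalizeEngine('pw')); // playwright
```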

Development

The service is built with:

Express.js for the web server
Puppeteer and Playwright for headless browser automation and screenshots
Turndown for HTML to Markdown conversion
Jest for testing

Related Resources

NPM Packages & Libraries

Web Capture & Screenshot Tools

capture-website - Capture website screenshots with a simple API
pageres - Capture screenshots of websites in various resolutions
puppeteer - Headless Chrome Node.js API for browser automation and screenshots
playwright - Cross-browser automation library

HTML to Markdown Conversion

turndown - HTML to Markdown converter written in JavaScript
html-to-markdown - Go library to convert HTML to Markdown with support for entire websites
markdowner - Advanced HTML to Markdown conversion tool
pandoc - Universal document converter supporting HTML to Markdown

Web Scraping

scrape-it - Node.js scraper with a clean API

Screenshot API Services

Commercial Services

ScreenshotOne - Developer-focused screenshot API with advanced features
ScrapFly - Screenshot API with antibot protection and rotating proxies
ScreenshotAPI.net - High-quality screenshot API with retina support
ApiFlash - Chrome-based screenshot API with S3 integration
Scrapingdog - Cost-effective screenshot and scraping solution
URLBox - Website screenshot API

Free/Open Services

site-shot.com - Free website screenshot service
pikwy.com - Website thumbnail and screenshot generator
screenshotmachine.com - Website screenshot service
screenshot.guru - Simple screenshot service

HTML to Markdown Services

urltomarkdown.com - Convert URLs to Markdown format
CaptureKit - API for HTML to Markdown conversion

Alternative Tools

MarkItDown - Microsoft's open-source tool for converting various file formats to Markdown
html-to-markdown (Python) - Rust-powered Python library for HTML to Markdown conversion

License

UNLICENSED
