An AI-ready search and fetch tool for arXiv papers, designed for both humans and AI agents.
- Search papers by free-text query.
- Fetch paper details by arXiv ID.
- Formatted JSON output, including `description_paragraphs` (extracted from the PDF).
- Pagination support via the `--limit` option.
- Date filtering with `--before` and `--after`.
- Raw PDF download with the `--raw` flag.
- Headless mode by default; use `--head` to show the browser.
- Model Context Protocol (MCP) support for integrating with AI agents.
- Robust formatting: uses structured JSON for easy machine consumption.
Linux & macOS:

```bash
curl -fsSL https://raw.githubusercontent.com/sonesuke/arxiv-cli/main/install.sh | bash
```

Note: On Linux, this installs to `~/.local/bin` without requiring `sudo`. Make sure `~/.local/bin` is in your `PATH`.

Windows (PowerShell):

```powershell
irm https://raw.githubusercontent.com/sonesuke/arxiv-cli/main/install.ps1 | iex
```

If you have Rust installed, you can build from source:

```bash
cargo install --path .
```

arxiv-cli supports the Model Context Protocol, allowing AI agents (such as Claude Desktop) to search and fetch papers directly.
| Tool Name | Description | Parameters |
|---|---|---|
| `search_papers` | Search arXiv for papers matching a free-text query. | `query` (required), `limit`, `before`, `after` |
| `fetch_paper` | Fetch details (metadata and PDF text) of a specific paper. | `paper_id` (required, e.g., `"2512.04518"`) |
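Under the hood, an MCP client invokes these tools with a JSON-RPC 2.0 `tools/call` request, written as a single line of JSON to the server's stdin. The sketch below builds such a request for `search_papers` using only the Rust standard library. The helper names are hypothetical, the argument types (a string `query` and an integer `limit`) are assumptions rather than the server's published schema, and a real client must first complete MCP's `initialize` handshake (it can also discover each tool's exact input schema via `tools/list`).

```rust
/// Escape a string for embedding in a JSON string literal (minimal, std-only).
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters use the \uXXXX form.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Hypothetical helper: build a newline-terminated JSON-RPC `tools/call`
/// request for the `search_papers` tool. Sending `limit` as an integer is an
/// assumption about the tool's input schema.
fn search_papers_request(id: u64, query: &str, limit: u32) -> String {
    format!(
        "{{\"jsonrpc\":\"2.0\",\"id\":{},\"method\":\"tools/call\",\"params\":{{\"name\":\"search_papers\",\"arguments\":{{\"query\":\"{}\",\"limit\":{}}}}}}}\n",
        id,
        json_escape(query),
        limit
    )
}
```

The MCP stdio transport delimits messages with newlines, which is why the request ends in `\n` and why the query is escaped so it cannot contain a raw newline. In practice a client such as Claude Desktop performs this exchange for you.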
To start the MCP server over stdio:

```bash
arxiv-cli mcp
```

Add this to your `claude_desktop_config.json`:

```json
{
  "mcpServers": {
    "arxiv-cli": {
      "command": "/path/to/arxiv-cli",
      "args": ["mcp"]
    }
  }
}
```

| Command | Description | Example |
|---|---|---|
| `search` | Search for papers matching a query. | `arxiv-cli search --query "LLM" --limit 10` |
| `fetch` | Fetch a single paper's metadata and text. | `arxiv-cli fetch 2512.04518` |
| `config` | Manage configuration settings. | `arxiv-cli config list` |
| `mcp` | Start the MCP server over stdio. | `arxiv-cli mcp` |
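Because `search` prints structured JSON, the CLI is also easy to drive from other programs. As a minimal sketch (the helpers `search_command` and `run_search` are hypothetical, and `arxiv-cli` is assumed to be on your `PATH`), a Rust program could assemble a `search` invocation from the documented flags using `std::process::Command`:

```rust
use std::process::Command;

/// Hypothetical helper: build an `arxiv-cli search` invocation using only the
/// documented flags. Optional filters are passed only when provided.
fn search_command(query: &str, limit: Option<u32>, after: Option<&str>, before: Option<&str>) -> Command {
    let mut cmd = Command::new("arxiv-cli");
    cmd.arg("search").arg("--query").arg(query);
    if let Some(n) = limit {
        cmd.arg("--limit").arg(n.to_string());
    }
    if let Some(date) = after {
        cmd.arg("--after").arg(date);
    }
    if let Some(date) = before {
        cmd.arg("--before").arg(date);
    }
    cmd
}

/// Run the command and return its stdout (the JSON results) as a String.
/// A non-zero exit status is turned into an error carrying stderr.
fn run_search(mut cmd: Command) -> std::io::Result<String> {
    let output = cmd.output()?;
    if !output.status.success() {
        return Err(std::io::Error::new(
            std::io::ErrorKind::Other,
            String::from_utf8_lossy(&output.stderr).into_owned(),
        ));
    }
    Ok(String::from_utf8_lossy(&output.stdout).into_owned())
}
```

For example, `run_search(search_command("LLM", Some(10), None, None))` would return the raw JSON text, which you could then parse with a JSON library such as `serde_json`.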
Search for papers matching a query:

```bash
arxiv-cli search --query "LLM" --limit 10
```

Filter by submission date with `--after` and `--before`:

```bash
# Papers submitted after 2024-01-01
arxiv-cli search --query "machine learning" --after "2024-01-01"

# Papers submitted between 2023-01-01 and 2023-12-31
arxiv-cli search --query "blockchain" --after "2023-01-01" --before "2023-12-31"
```

Fetch a single paper's metadata and extracted text:
```bash
arxiv-cli fetch 2512.04518
```

Download the PDF file directly to stdout with `--raw`:

```bash
arxiv-cli fetch 2512.04518 --raw > paper.pdf
```

Show the browser window with `--head` (useful for debugging):

```bash
arxiv-cli search --query "AI" --head
```

This tool relies on a compatible Chrome/Chromium installation for scraping.

Config file location:
- macOS: `~/Library/Application Support/com.sonesuke.arxiv-cli/config.json`
- Linux: `~/.config/arxiv-cli/config.json`
- Windows: `C:\Users\{User}\AppData\Roaming\sonesuke\arxiv-cli\config\config.json`
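Each location is the platform's usual per-user configuration directory. A minimal, std-only sketch of that mapping follows; the function names are hypothetical, the code hard-codes the paths listed above rather than reproducing whatever directory-lookup logic arxiv-cli actually uses, and it does not consider environment overrides such as `XDG_CONFIG_HOME`. To see the real path on your machine, run `arxiv-cli config path`.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper: map an OS name (as reported by `std::env::consts::OS`)
/// plus the home and roaming AppData directories to the documented config.json path.
fn config_path_for(os: &str, home: Option<&Path>, appdata: Option<&Path>) -> Option<PathBuf> {
    match os {
        "macos" => home.map(|h| {
            h.join("Library")
                .join("Application Support")
                .join("com.sonesuke.arxiv-cli")
                .join("config.json")
        }),
        "linux" => home.map(|h| h.join(".config").join("arxiv-cli").join("config.json")),
        // %APPDATA% is normally C:\Users\{User}\AppData\Roaming.
        "windows" => appdata.map(|d| {
            d.join("sonesuke").join("arxiv-cli").join("config").join("config.json")
        }),
        _ => None,
    }
}

/// Resolve the path for the current platform from the HOME / APPDATA variables.
fn default_config_path() -> Option<PathBuf> {
    let home = std::env::var_os("HOME").map(PathBuf::from);
    let appdata = std::env::var_os("APPDATA").map(PathBuf::from);
    config_path_for(std::env::consts::OS, home.as_deref(), appdata.as_deref())
}
```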
You can also manage the configuration from the CLI:

```bash
# List current configuration
arxiv-cli config list

# Set a value
arxiv-cli config set headless false
arxiv-cli config set browser_path "/usr/bin/google-chrome"

# Get a value
arxiv-cli config get headless

# Show config file path
arxiv-cli config path
```

For Docker/devcontainer environments, you may need to pass additional Chrome flags via `chrome_args` in `config.json`:
```json
{
  "browser_path": "/usr/bin/google-chrome",
  "chrome_args": [
    "--no-sandbox",
    "--disable-setuid-sandbox",
    "--disable-gpu"
  ]
}
```

Note: When the `CI` environment variable is set, the following flags are added automatically: `--disable-gpu`, `--no-sandbox`, `--disable-setuid-sandbox`.
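The CI behavior described in the note amounts to merging a fixed set of flags into the configured `chrome_args`. The sketch below illustrates that idea with hypothetical function names. Two details are assumptions rather than documented behavior: flags already present are not added a second time, and `CI` counts as set whenever the variable exists, whatever its value.

```rust
/// Flags the README says are added when the `CI` environment variable is set.
const CI_FLAGS: [&str; 3] = ["--disable-gpu", "--no-sandbox", "--disable-setuid-sandbox"];

/// Hypothetical helper: combine user-configured `chrome_args` with the CI flags.
/// Skipping duplicates is an assumption about the real behavior.
fn effective_chrome_args(configured: &[String], in_ci: bool) -> Vec<String> {
    let mut args: Vec<String> = configured.to_vec();
    if in_ci {
        for flag in CI_FLAGS {
            if !args.iter().any(|a| a == flag) {
                args.push(flag.to_string());
            }
        }
    }
    args
}

/// Treat CI as active whenever the `CI` variable exists (assumption).
fn running_in_ci() -> bool {
    std::env::var_os("CI").is_some()
}
```

Calling `effective_chrome_args(&configured, running_in_ci())` would then yield the final argument list passed to Chrome.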
- Stack: Rust, Clap, custom CDP client (`tokio-tungstenite`), Serde, Reqwest, `pdf-extract`, `mcp-sdk-rs`.
- Search scraping: uses a custom Chrome DevTools Protocol (CDP) client to handle search results that are loaded dynamically via JavaScript.
- PDF extraction: downloads the PDF and extracts text using `pdf-extract`, splitting it into structured paragraphs (`description_paragraphs`).
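The README does not specify how extracted PDF text is split into `description_paragraphs`. One common heuristic, sketched here purely as an illustration (the function name and the rule are assumptions, not the tool's actual algorithm), treats blank lines as paragraph boundaries and joins the wrapped lines within a paragraph with spaces:

```rust
/// Hypothetical heuristic: split raw extracted text into paragraphs.
/// Blank (or whitespace-only) lines end a paragraph; the remaining lines of a
/// paragraph are trimmed and joined with single spaces.
fn split_paragraphs(text: &str) -> Vec<String> {
    let mut paragraphs = Vec::new();
    let mut current: Vec<&str> = Vec::new();
    for line in text.lines() {
        let trimmed = line.trim();
        if trimmed.is_empty() {
            // Paragraph boundary: flush whatever has accumulated.
            if !current.is_empty() {
                paragraphs.push(current.join(" "));
                current.clear();
            }
        } else {
            current.push(trimmed);
        }
    }
    // Flush the final paragraph if the text does not end with a blank line.
    if !current.is_empty() {
        paragraphs.push(current.join(" "));
    }
    paragraphs
}
```

Real PDF text usually needs more care (words hyphenated at line ends, running headers and footers, multi-column layouts), so treat this as a starting point rather than a faithful description of `pdf-extract` output handling.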
License: MIT