arXiv CLI - AI-ready

An AI-ready search and fetch tool for arXiv papers, designed for both humans and AI agents.

Features

Search papers by free-text query.
Fetch paper details by arXiv ID.
Formatted JSON output including description_paragraphs (extracted from PDF).
Pagination support via the --limit option.
Date filtering with --before and --after.
Raw PDF download with --raw flag.
Headless mode by default; use --head to show the browser.
Model Context Protocol (MCP) support to integrate with AI agents.
Robust formatting: Uses structured JSON for easy machine consumption.

Installation

Easy Install (Recommended)

Linux & macOS:

curl -fsSL https://raw.githubusercontent.com/sonesuke/arxiv-cli/main/install.sh | bash

Note: On Linux, this installs to ~/.local/bin without requiring sudo. Make sure ~/.local/bin is in your PATH.

Windows (PowerShell):

irm https://raw.githubusercontent.com/sonesuke/arxiv-cli/main/install.ps1 | iex

From Source (Cargo)

If you have Rust installed, you can clone the repository and build from source:

git clone https://github.com/sonesuke/arxiv-cli.git
cd arxiv-cli
cargo install --path .

Model Context Protocol (MCP)

arxiv-cli supports the Model Context Protocol, allowing AI agents and MCP clients such as Claude Desktop to search and fetch papers directly.

Available Tools

Tool Name	Description	Parameters
`search_papers`	Search arXiv for papers matching a free-text query.	`query` (required), `limit`, `before`, `after`
`fetch_paper`	Fetch details (metadata & PDF text) of a specific paper.	`paper_id` (required, e.g., "2512.04518")

Usage

To start the MCP server over stdio:

arxiv-cli mcp
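
For a client other than Claude Desktop, the server can be driven directly over stdio. MCP messages are JSON-RPC 2.0 objects sent one per line, and a client must complete the `initialize` handshake before calling tools. A tool invocation is a `tools/call` request that carries the tool `name` and its `arguments`. The sketch below builds such a request for `search_papers` using only the standard library. The helper names are hypothetical. Sending `limit` as a JSON number and the dates as `YYYY-MM-DD` strings are assumptions based on the CLI flags, not on the server's published tool schema.

```rust
/// Escape a string for use inside a JSON string literal (minimal subset).
fn json_escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters must be \u-escaped in JSON.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Hypothetical helper: build one newline-terminated JSON-RPC 2.0
/// `tools/call` request for the `search_papers` tool. Optional arguments
/// are omitted from the JSON when `None`.
fn search_papers_request(
    id: u64,
    query: &str,
    limit: Option<u32>,
    after: Option<&str>,
    before: Option<&str>,
) -> String {
    let mut args = vec![format!("\"query\":\"{}\"", json_escape(query))];
    if let Some(n) = limit {
        // Assumption: `limit` is accepted as a JSON number.
        args.push(format!("\"limit\":{}", n));
    }
    if let Some(d) = after {
        args.push(format!("\"after\":\"{}\"", json_escape(d)));
    }
    if let Some(d) = before {
        args.push(format!("\"before\":\"{}\"", json_escape(d)));
    }
    format!(
        "{{\"jsonrpc\":\"2.0\",\"id\":{},\"method\":\"tools/call\",\"params\":{{\"name\":\"search_papers\",\"arguments\":{{{}}}}}}}\n",
        id,
        args.join(",")
    )
}
```

For example, search_papers_request(1, "LLM", Some(5), None, None) produces {"jsonrpc":"2.0","id":1,"method":"tools/call","params":{"name":"search_papers","arguments":{"query":"LLM","limit":5}}} followed by a newline, which a client could write to the server's stdin once initialization is complete.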

Configuration for Claude Desktop

Add this to your claude_desktop_config.json, replacing /path/to/arxiv-cli with the absolute path to the installed binary:

{
  "mcpServers": {
    "arxiv-cli": {
      "command": "/path/to/arxiv-cli",
      "args": ["mcp"]
    }
  }
}

CLI Usage

CLI Commands

Command	Description	Example
`search`	Search for papers matching a query.	`arxiv-cli search --query "LLM" --limit 10`
`fetch`	Fetch a single paper's metadata and text.	`arxiv-cli fetch 2512.04518`
`config`	Manage configuration settings.	`arxiv-cli config list`
`mcp`	Start the MCP server over stdio.	`arxiv-cli mcp`

Search by query

Search for papers matching a query.

arxiv-cli search --query "LLM" --limit 10

Filter by date

# Papers submitted after 2024-01-01
arxiv-cli search --query "machine learning" --after "2024-01-01"

# Papers submitted between 2023-01-01 and 2023-12-31
arxiv-cli search --query "blockchain" --after "2023-01-01" --before "2023-12-31"
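
The --after/--before pair defines a date window. The sketch below shows one way such a window check can be written, assuming YYYY-MM-DD inputs and inclusive bounds. Whether arxiv-cli treats the bounds as inclusive, and whether it filters locally or passes the dates on to arXiv's search, is not stated here, so treat both helpers as hypothetical.

```rust
/// Parse a strict "YYYY-MM-DD" string into a (year, month, day) tuple.
/// Month and day ranges are checked loosely: the day is not validated
/// against the length of the month.
fn parse_ymd(s: &str) -> Option<(u32, u32, u32)> {
    // Reject anything other than ASCII digits and dashes (e.g. "2024/01/01", "+1").
    if !s.bytes().all(|b| b.is_ascii_digit() || b == b'-') {
        return None;
    }
    let parts: Vec<&str> = s.split('-').collect();
    if parts.len() != 3 || parts[0].len() != 4 || parts[1].len() != 2 || parts[2].len() != 2 {
        return None;
    }
    let year: u32 = parts[0].parse().ok()?;
    let month: u32 = parts[1].parse().ok()?;
    let day: u32 = parts[2].parse().ok()?;
    if !(1..=12).contains(&month) || !(1..=31).contains(&day) {
        return None;
    }
    Some((year, month, day))
}

/// Hypothetical window check: Some(true) if `submitted` lies within the
/// optional [after, before] range (bounds assumed inclusive), None if any
/// of the dates fails to parse.
fn in_date_window(submitted: &str, after: Option<&str>, before: Option<&str>) -> Option<bool> {
    let date = parse_ymd(submitted)?;
    let after_ok = match after {
        Some(a) => date >= parse_ymd(a)?,
        None => true,
    };
    let before_ok = match before {
        Some(b) => date <= parse_ymd(b)?,
        None => true,
    };
    Some(after_ok && before_ok)
}
```

Comparing (year, month, day) tuples is enough for a range check because Rust orders tuples lexicographically, so no calendar arithmetic is needed.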

Fetch paper details

Fetch a single paper's metadata and extracted text.

arxiv-cli fetch 2512.04518

Fetch raw PDF

Write the PDF file directly to stdout; redirect the output to save it.

arxiv-cli fetch 2512.04518 --raw > paper.pdf
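
Because --raw sends binary data to stdout, it can be worth checking that the redirected file really is a PDF and not, for example, an error page. Well-formed PDF files begin with the %PDF- header, so a minimal check only needs the first five bytes. The function names below are hypothetical and not part of arxiv-cli.

```rust
use std::fs::File;
use std::io::{self, Read};
use std::path::Path;

/// True if `bytes` begins with the `%PDF-` header that opens a PDF file.
fn looks_like_pdf(bytes: &[u8]) -> bool {
    bytes.starts_with(b"%PDF-")
}

/// Read at most the first five bytes of `path` (e.g. `paper.pdf`) and check the header.
fn file_looks_like_pdf(path: &Path) -> io::Result<bool> {
    let mut header = Vec::with_capacity(5);
    // `take(5)` caps the read, so large files are not loaded into memory.
    File::open(path)?.take(5).read_to_end(&mut header)?;
    Ok(looks_like_pdf(&header))
}
```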

Show the browser window

Disable headless mode for a single run; this is useful for debugging.

arxiv-cli search --query "AI" --head

Configuration

This tool requires a compatible Chrome or Chromium installation, which it uses for scraping. The config file is located at:

macOS: ~/Library/Application Support/com.sonesuke.arxiv-cli/config.json
Linux: ~/.config/arxiv-cli/config.json
Windows: C:\Users\{User}\AppData\Roaming\sonesuke\arxiv-cli\config\config.json
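
These locations follow the same per-platform layout as ProjectDirs::config_dir() in the `directories` crate, though this README does not say which crate arxiv-cli uses. The std-only sketch below reproduces the listed paths for illustration. The helper is hypothetical, it ignores overrides such as XDG_CONFIG_HOME on Linux, and `arxiv-cli config path` remains the authoritative answer.

```rust
use std::path::{Path, PathBuf};

/// Hypothetical helper reproducing the config-file locations listed above.
/// `os` takes the values of `std::env::consts::OS`; `home` is the user's home
/// directory and `appdata` is the Windows roaming AppData folder.
/// Overrides such as `XDG_CONFIG_HOME` on Linux are deliberately ignored.
fn config_file_path(os: &str, home: &Path, appdata: Option<&Path>) -> Option<PathBuf> {
    match os {
        "macos" => Some(
            home.join("Library")
                .join("Application Support")
                .join("com.sonesuke.arxiv-cli")
                .join("config.json"),
        ),
        "linux" => Some(home.join(".config").join("arxiv-cli").join("config.json")),
        // Windows needs the AppData directory; without it there is no answer.
        "windows" => appdata.map(|dir| {
            dir.join("sonesuke")
                .join("arxiv-cli")
                .join("config")
                .join("config.json")
        }),
        _ => None,
    }
}
```

In a real program, `os` would come from std::env::consts::OS, which reports "macos", "linux" or "windows" on those platforms.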

Manage Configuration

You can manage the configuration via CLI:

# List current configuration
arxiv-cli config list

# Set a value
arxiv-cli config set headless false
arxiv-cli config set browser_path "/usr/bin/google-chrome"

# Get a value
arxiv-cli config get headless

# Show config file path
arxiv-cli config path

Chrome Arguments

For Docker/devcontainer environments, you may need to pass additional Chrome flags:

{
  "browser_path": "/usr/bin/google-chrome",
  "chrome_args": [
    "--no-sandbox",
    "--disable-setuid-sandbox",
    "--disable-gpu"
  ]
}

Note: When the CI environment variable is set, the following flags are automatically added:

--disable-gpu
--no-sandbox
--disable-setuid-sandbox
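
One plausible way to merge these flags is sketched below. User-configured chrome_args come first, and the three CI flags are appended only when CI is set. Skipping flags that are already configured is an assumption made here to avoid passing them twice. The helper name is hypothetical, and the sketch is not taken from arxiv-cli's source.

```rust
/// Flags the README lists as added automatically when `CI` is set.
const CI_FLAGS: [&str; 3] = ["--disable-gpu", "--no-sandbox", "--disable-setuid-sandbox"];

/// Hypothetical merge of configured `chrome_args` with the CI flags.
/// Configured flags keep their order; CI flags already present are skipped.
fn effective_chrome_args(configured: &[String], ci_set: bool) -> Vec<String> {
    let mut args: Vec<String> = configured.to_vec();
    if ci_set {
        for &flag in CI_FLAGS.iter() {
            if !args.iter().any(|a| a.as_str() == flag) {
                args.push(flag.to_string());
            }
        }
    }
    args
}
```

A caller would pass std::env::var_os("CI").is_some() as `ci_set`. This matches the "when the CI environment variable is set" wording regardless of the variable's value.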

Implementation Details

Stack: Rust, Clap, Custom CDP Client (tokio-tungstenite), Serde, Reqwest, PDF-Extract, mcp-sdk-rs.
Search Scraping: Uses a custom Chrome DevTools Protocol (CDP) client to handle search results that are loaded dynamically via JavaScript.
PDF Extraction: Downloads the PDF and extracts text using pdf-extract, splitting it into structured paragraphs (description_paragraphs).
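
The exact rules arxiv-cli uses to build description_paragraphs are not documented here. As a rough illustration only, a common heuristic for text extracted from PDFs works in three steps. It treats blank lines as paragraph breaks, joins wrapped lines with spaces, and re-joins words hyphenated across a line break. The function below is a hypothetical, std-only version of that heuristic. It operates on text that has already been extracted.

```rust
/// Hypothetical paragraph splitter for text extracted from a PDF.
/// Blank lines separate paragraphs; wrapped lines are joined with a space;
/// a trailing '-' is treated as a word broken across lines and re-joined.
fn split_paragraphs(text: &str) -> Vec<String> {
    let mut paragraphs = Vec::new();
    let mut current = String::new();
    for line in text.lines() {
        let line = line.trim();
        if line.is_empty() {
            // A blank line closes the paragraph being built, if any.
            if !current.is_empty() {
                paragraphs.push(std::mem::take(&mut current));
            }
            continue;
        }
        if current.ends_with('-') {
            current.pop(); // "mod-" + "els" -> "models"
        } else if !current.is_empty() {
            current.push(' ');
        }
        current.push_str(line);
    }
    if !current.is_empty() {
        paragraphs.push(current);
    }
    paragraphs
}
```

The dehyphenation step is a tradeoff. It repairs splits like mod-/els, but it will also merge a genuine compound such as state-of-the-/art into state-of-theart.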

License

MIT
