An AI-ready search and fetch tool for arXiv papers, designed for both humans and AI agents.

sonesuke/arxiv-cli

arXiv CLI - AI-ready

Features

  • Search papers by free-text query.
  • Fetch paper details by arXiv ID.
  • Formatted JSON output including description_paragraphs (extracted from PDF).
  • Pagination support via the --limit option.
  • Date filtering with --before and --after.
  • Raw PDF download with --raw flag.
  • Headless mode by default; use --head to show the browser.
  • Model Context Protocol (MCP) support to integrate with AI agents.
  • Robust formatting: Uses structured JSON for easy machine consumption.

Installation

Easy Install (Recommended)

Linux & macOS:

curl -fsSL https://raw.githubusercontent.com/sonesuke/arxiv-cli/main/install.sh | bash

Note: On Linux, this installs to ~/.local/bin without requiring sudo. Make sure ~/.local/bin is in your PATH.

Windows (PowerShell):

irm https://raw.githubusercontent.com/sonesuke/arxiv-cli/main/install.ps1 | iex

From Source (Cargo)

If you have Rust installed, you can build from source. Run this from the root of a local clone of the repository:

cargo install --path .

Model Context Protocol (MCP)

arxiv-cli supports the Model Context Protocol, allowing AI agents (like Claude Desktop) to search and fetch papers directly.

Available Tools

| Tool Name | Description | Parameters |
|---|---|---|
| search_papers | Search arXiv for papers matching a free-text query. | query (required), limit, before, after |
| fetch_paper | Fetch details (metadata & PDF text) of a specific paper. | paper_id (required, e.g., "2512.04518") |

Usage

To start the MCP server over stdio:

arxiv-cli mcp
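Once the server is running, a client talks to it with JSON-RPC 2.0 messages, one per line on stdin. As a rough sketch of what a tool invocation looks like on the wire, the snippet below builds a `tools/call` request for the tools listed above. The helper name `tools_call_request` is hypothetical, and a real client must complete the MCP `initialize` handshake before sending this message; MCP client libraries normally handle both steps.

```python
import json


def tools_call_request(request_id, tool_name, arguments):
    """Build a JSON-RPC 2.0 `tools/call` request as one line of JSON.

    Hypothetical helper for illustration; it is not part of arxiv-cli.
    """
    message = {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    # json.dumps emits no newlines by default, so the result is a single line.
    return json.dumps(message)


# Tool and parameter names are taken from the "Available Tools" table.
print(tools_call_request(1, "search_papers", {"query": "LLM", "limit": 5}))
```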

Configuration for Claude Desktop

Add this to your claude_desktop_config.json:

{
  "mcpServers": {
    "arxiv-cli": {
      "command": "/path/to/arxiv-cli",
      "args": ["mcp"]
    }
  }
}
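If claude_desktop_config.json already lists other servers, the entry has to be merged in rather than pasted over the file. The following is a minimal sketch of such a merge; the function names `add_mcp_server` and `update_config_file` are hypothetical, and the path you pass in is whatever location your Claude Desktop installation uses.

```python
import json
from pathlib import Path


def add_mcp_server(config, name, command, args):
    """Return a copy of `config` with one entry added under "mcpServers".

    Existing servers and unrelated top-level keys are preserved.
    """
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers[name] = {"command": command, "args": list(args)}
    merged["mcpServers"] = servers
    return merged


def update_config_file(path, name, command, args):
    """Read the JSON config at `path` (if present), merge, and write it back."""
    path = Path(path)
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    merged = add_mcp_server(config, name, command, args)
    path.write_text(json.dumps(merged, indent=2) + "\n", encoding="utf-8")
```

For example, `update_config_file("claude_desktop_config.json", "arxiv-cli", "/path/to/arxiv-cli", ["mcp"])` would produce the configuration shown above when the file starts out empty or missing.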

CLI Usage

CLI Commands

| Command | Description | Example |
|---|---|---|
| search | Search for papers matching a query. | arxiv-cli search --query "LLM" --limit 10 |
| fetch | Fetch a single paper's metadata and text. | arxiv-cli fetch 2512.04518 |
| config | Manage configuration settings. | arxiv-cli config list |
| mcp | Start the MCP server over stdio. | arxiv-cli mcp |

Search by query

Search for papers matching a query.

arxiv-cli search --query "LLM" --limit 10

Filter by date

# Papers submitted after 2024-01-01
arxiv-cli search --query "machine learning" --after "2024-01-01"

# Papers submitted between 2023-01-01 and 2023-12-31
arxiv-cli search --query "blockchain" --after "2023-01-01" --before "2023-12-31"
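For scripted use, the same commands can be assembled and run from a program. The sketch below is one possible wrapper: `build_search_args` and `search` are hypothetical names, only the flags documented above are used, and the shape of the returned JSON is not assumed beyond it being valid JSON on stdout.

```python
import json
import subprocess


def build_search_args(query, limit=None, before=None, after=None, binary="arxiv-cli"):
    """Assemble the argv list for `arxiv-cli search` from optional filters."""
    args = [binary, "search", "--query", query]
    if limit is not None:
        args += ["--limit", str(limit)]
    if after is not None:
        args += ["--after", after]
    if before is not None:
        args += ["--before", before]
    return args


def search(query, **options):
    """Run the search and parse stdout as JSON (requires arxiv-cli on PATH)."""
    completed = subprocess.run(
        build_search_args(query, **options),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return json.loads(completed.stdout)


# Mirrors the date-range example above without running the binary.
print(build_search_args("blockchain", after="2023-01-01", before="2023-12-31"))
```

Passing the arguments as a list avoids shell quoting problems with queries that contain spaces.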

Fetch paper details

Fetch a single paper's metadata and extracted text.

arxiv-cli fetch 2512.04518

Fetch raw PDF

Download the PDF file directly to stdout.

arxiv-cli fetch 2512.04518 --raw > paper.pdf
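Because the PDF arrives on stdout, an error message could end up redirected into paper.pdf by mistake. A small guard, sketched below, checks for the `%PDF-` header that well-formed PDF files begin with before writing anything to disk. The names `looks_like_pdf` and `fetch_raw_pdf` are hypothetical and not part of arxiv-cli.

```python
import subprocess


def looks_like_pdf(data):
    """Return True if `data` begins with the standard PDF header bytes."""
    return data.startswith(b"%PDF-")


def fetch_raw_pdf(paper_id, destination, binary="arxiv-cli"):
    """Run `arxiv-cli fetch <id> --raw` and save stdout only if it is a PDF."""
    completed = subprocess.run(
        [binary, "fetch", paper_id, "--raw"],
        capture_output=True,  # bytes, since text mode is not requested
        check=True,
    )
    if not looks_like_pdf(completed.stdout):
        raise ValueError(f"output for {paper_id} does not look like a PDF")
    with open(destination, "wb") as handle:
        handle.write(completed.stdout)
    return len(completed.stdout)
```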

Show the browser window

Pass --head to run with a visible browser window instead of headless mode. This is useful for debugging.

arxiv-cli search --query "AI" --head

Configuration

This tool relies on a compatible Chrome/Chromium installation for scraping. The config file is stored at:

  • macOS: ~/Library/Application Support/com.sonesuke.arxiv-cli/config.json
  • Linux: ~/.config/arxiv-cli/config.json
  • Windows: C:\Users\{User}\AppData\Roaming\sonesuke\arxiv-cli\config\config.json
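As an illustration, the locations listed above can be expressed as a small lookup. This is only a sketch: `default_config_path` is a hypothetical helper, it ignores any environment overrides the tool may honor, and `arxiv-cli config path` remains the authoritative way to find the file.

```python
from pathlib import PurePosixPath, PureWindowsPath


def default_config_path(system, home):
    """Map a platform name and home directory to the documented config path.

    `system` uses the values returned by platform.system().
    """
    if system == "Darwin":
        return (PurePosixPath(home) / "Library" / "Application Support"
                / "com.sonesuke.arxiv-cli" / "config.json")
    if system == "Linux":
        return PurePosixPath(home) / ".config" / "arxiv-cli" / "config.json"
    if system == "Windows":
        return (PureWindowsPath(home) / "AppData" / "Roaming"
                / "sonesuke" / "arxiv-cli" / "config" / "config.json")
    raise ValueError(f"unsupported platform: {system}")


print(default_config_path("Linux", "/home/alice"))
```

On a real machine the arguments would typically come from `platform.system()` and `str(Path.home())`.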

Manage Configuration

You can manage the configuration via CLI:

# List current configuration
arxiv-cli config list

# Set a value
arxiv-cli config set headless false
arxiv-cli config set browser_path "/usr/bin/google-chrome"

# Get a value
arxiv-cli config get headless

# Show config file path
arxiv-cli config path

Chrome Arguments

For Docker/devcontainer environments, you may need to pass additional Chrome flags:

{
  "browser_path": "/usr/bin/google-chrome",
  "chrome_args": [
    "--no-sandbox",
    "--disable-setuid-sandbox",
    "--disable-gpu"
  ]
}

Note: When the CI environment variable is set, the following flags are automatically added:

  • --disable-gpu
  • --no-sandbox
  • --disable-setuid-sandbox
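The interaction between chrome_args and the CI variable can be pictured as follows. This is a sketch of the behavior described above, not the tool's actual code: the function name, the ordering of the flags, the de-duplication, and treating an empty CI value as "set" are all assumptions.

```python
# Flags the README says are added automatically when CI is set.
CI_FLAGS = ["--disable-gpu", "--no-sandbox", "--disable-setuid-sandbox"]


def effective_chrome_args(configured_args, environ):
    """Combine configured chrome_args with the CI flags when CI is present.

    Assumption: flags already present in the config are not repeated.
    """
    args = list(configured_args)
    if "CI" in environ:
        for flag in CI_FLAGS:
            if flag not in args:
                args.append(flag)
    return args


print(effective_chrome_args(["--no-sandbox"], {"CI": "true"}))
```

In practice this means a CI job usually needs no chrome_args entry for these three flags, while a local Docker or devcontainer setup without CI set does.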

Implementation Details

  • Stack: Rust, Clap, Custom CDP Client (tokio-tungstenite), Serde, Reqwest, PDF-Extract, mcp-sdk-rs.
  • Search Scraping: Uses a custom Chrome DevTools Protocol (CDP) client to handle dynamic search results loaded via JavaScript.
  • PDF Extraction: Downloads the PDF and extracts text using pdf-extract, splitting it into structured paragraphs (description_paragraphs).
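To give a feel for the paragraph-splitting step, here is a deliberately simple sketch that splits extracted text on blank lines and rejoins wrapped lines. The real implementation is in Rust and its heuristics are not documented here, so `split_paragraphs` is a hypothetical stand-in and its rules are assumptions.

```python
import re


def split_paragraphs(text):
    """Split extracted text into paragraphs at blank lines.

    Lines inside a paragraph are joined with single spaces, and empty
    blocks are dropped.
    """
    blocks = re.split(r"\n\s*\n", text)
    return [" ".join(block.split()) for block in blocks if block.strip()]


sample = "First line\nwraps here.\n\n\nSecond paragraph.\n"
print(split_paragraphs(sample))
```

PDF text extraction often produces irregular line breaks, so a production splitter would likely need extra rules for headings, hyphenation, and page furniture.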

License

MIT
