native-devtools-mcp

Give your AI agent "eyes" and "hands" for native desktop applications.

A Model Context Protocol (MCP) server that provides Computer Use capabilities: screenshots, OCR, input simulation, and window management.

Features • Installation • For AI Agents • Permissions

macOS	Windows

🔍 Search Keywords

MCP, Model Context Protocol, computer use, desktop automation, UI automation, RPA, screenshots, OCR, screen reading, mouse, keyboard, macOS, Windows, native-devtools-mcp.

🚀 Features

👀 Computer Vision: Capture screenshots of screens, windows, or specific regions. Includes built-in OCR (text recognition) to "read" the screen.
🖱️ Input Simulation: Click, drag, scroll, and type text naturally. Supports global coordinates and window-relative actions.
🪟 Window Management: List open windows, find applications, and bring them to focus.
🧩 Template Matching: Find non-text UI elements (icons, shapes) using load_image + find_image, returning precise click coordinates.
🔒 Local & Private: 100% local execution. No screenshots or data are ever sent to external servers.
🔌 Dual-Mode Interaction:
1. Visual/Native: Works with any app via screenshots & coordinates (Universal).
2. AppDebugKit: Deep integration for supported apps to inspect the UI tree (DOM-like structure).

🤖 For AI Agents (LLMs)

This MCP server is designed to be highly discoverable and usable by AI models (Claude, Gemini, GPT).

📄 Read AGENTS.md: A compact, token-optimized technical reference designed specifically for ingestion by LLMs. It contains intent definitions, schema examples, and reasoning patterns.

Core Capabilities for System Prompts:

take_screenshot: The "eyes". Returns images + layout metadata + text locations (OCR).
click / type_text: The "hands". Interacts with the system based on visual feedback.
find_text: A shortcut to find text on screen and get its coordinates immediately.
load_image / find_image: Template matching for non-text UI elements (icons, shapes), returning screen coordinates for clicking.

📦 Installation (macOS + Windows)

The install steps are identical on macOS and Windows.

Option 1: Run with `npx` (no install needed)

npx -y native-devtools-mcp

Option 2: Global install

npm install -g native-devtools-mcp

Option 3: Build from source (Rust)

Click to expand build instructions

git clone https://github.com/sh3ll3x3c/native-devtools-mcp
cd native-devtools-mcp
cargo build --release
# Binary: ./target/release/native-devtools-mcp

⚙️ Configuration

macOS Configuration

Claude Desktop config file: ~/Library/Application Support/Claude/claude_desktop_config.json

Claude Desktop requires the signed app bundle (npx/npm will not work due to Gatekeeper):

Download NativeDevtools-X.X.X.dmg from GitHub Releases
Open the DMG and drag NativeDevtools.app to /Applications
Configure Claude Desktop:

{
  "mcpServers": {
    "native-devtools": {
      "command": "/Applications/NativeDevtools.app/Contents/MacOS/native-devtools-mcp"
    }
  }
}

Restart Claude Desktop - it will prompt for Screen Recording and Accessibility permissions for NativeDevtools

Note: Claude Code (CLI) can use either the signed app or npx - both work.

Windows Configuration

Claude Desktop config file: %APPDATA%\Claude\claude_desktop_config.json

Configuration JSON (Windows and macOS CLI)

For Windows (or macOS with Claude Code CLI):

{
  "mcpServers": {
    "native-devtools": {
      "command": "npx",
      "args": ["-y", "native-devtools-mcp"]
    }
  }
}

Note: Requires Node.js 18+ installed.

For Claude Code (CLI) Users

To avoid approving every single tool call (clicks, screenshots), you can add this wildcard permission to your project's settings or global config:

File: .claude/settings.local.json (or similar)

{
  "permissions": {
    "allow": ["mcp__native-devtools__*"]
  }
}

🔍 Two Approaches to Interaction

We provide two ways for agents to interact, allowing them to choose the best tool for the job.

1. The "Visual" Approach (Universal)

Best for: 99% of apps (Electron, Qt, Games, Browsers).

How it works: The agent takes a screenshot, analyzes it visually (or uses OCR), and clicks at coordinates.
Tools: take_screenshot, find_text, click, type_text (plus load_image / find_image for icons and shapes).
Example: "Click the button that looks like a gear icon." → use find_image with a gear template.

2. The "Structural" Approach (AppDebugKit)

Best for: Apps specifically instrumented with our AppDebugKit library (mostly for developers testing their own apps).

How it works: The agent connects to a debug port and queries the UI tree (like HTML DOM).
Tools: app_connect, app_query, app_click.
Example: app_click(element_id="submit-button").

🧩 Template Matching (find_image)

Use find_image when the target is not text (icons, toggles, custom controls) and OCR or find_text cannot identify it.

Typical flow:

take_screenshot(app_name="MyApp") → screenshot_id
load_image(path="/path/to/icon.png") → template_id
find_image(screenshot_id="...", template_id="...") → matches with screen_x/screen_y
click(x=..., y=...)

Fast vs Accurate:

fast (default): uses downscaling and early-exit for speed.
accurate: uses full-resolution, wider scale search, and smaller stride for thorough matching.

Optional inputs like mask_id, search_region, scales, and rotations can improve precision and performance.

🏗️ Architecture

graph TD
    Client[Claude / LLM Client] <-->|JSON-RPC 2.0| Server[native-devtools-mcp]
    Server -->|Direct API| Sys[System APIs]
    Server -->|WebSocket| Debug[AppDebugKit]

    subgraph "Your Machine"
        Sys -->|Screen/OCR| macOS[CoreGraphics / Vision]
        Sys -->|Input| Win[Win32 / SendInput]
        Debug -.->|Inspect| App[Target App]
    end

🔧 Technical Details (Under the Hood)

OS	Feature	API Used
macOS	Screenshots	`screencapture` (CLI)
	Input	`CGEvent` (CoreGraphics)
	OCR	`VNRecognizeTextRequest` (Vision Framework)
Windows	Screenshots	`BitBlt` (GDI)
	Input	`SendInput` (Win32)
	OCR	`Windows.Media.Ocr` (WinRT)

Screenshot Coordinate Precision

Screenshots include metadata for accurate coordinate conversion:

screenshot_origin_x/y: Screen-space origin of the captured area (in points)
screenshot_scale: Display scale factor (e.g., 2.0 for Retina displays)
screenshot_pixel_width/height: Actual pixel dimensions of the image
screenshot_window_id: Window ID (for window captures)

Coordinate conversion:

screen_x = screenshot_origin_x + (pixel_x / screenshot_scale)
screen_y = screenshot_origin_y + (pixel_y / screenshot_scale)

Implementation notes:

Window captures (macOS): Uses screencapture -o which excludes window shadow. The captured image dimensions match kCGWindowBounds × scale exactly, ensuring click coordinates derived from screenshots land on intended UI elements.
Region captures: Origin coordinates are aligned to integers to match the actual captured area.

🛡️ Privacy, Safety & Best Practices

🔒 Privacy First

100% Local: All processing (screenshots, OCR, logic) happens on your device.
No Cloud: Images are never uploaded to any third-party server by this tool.
Open Source: You can inspect the code to verify exactly what it does.

⚠️ Operational Safety

Hands Off: When the agent is "driving" (clicking/typing), do not move your mouse or type.
- Why? Real hardware inputs can conflict with the simulated ones, causing clicks to land in the wrong place.
Focus Matters: Ensure the window you want the agent to use is visible. If a popup steals focus, the agent might type into the wrong window unless it checks first.

🔐 Required Permissions (macOS)

On macOS, you must grant permissions to the host application (e.g., Terminal, VS Code, Claude Desktop) to allow screen recording and input control.

Screen Recording: Required for take_screenshot.
- System Settings > Privacy & Security > Screen Recording
Accessibility: Required for click, type_text, scroll.
- System Settings > Privacy & Security > Accessibility

Restart Required: After granting permissions, you must fully quit and restart the host application.

🪟 Windows Notes

Works out of the box on Windows 10/11.

Uses standard Win32 APIs (GDI, SendInput).
OCR uses the built-in Windows Media OCR engine (offline).
Note: Cannot interact with "Run as Administrator" windows unless the MCP server itself is also running as Administrator.

Name		Name	Last commit message	Last commit date
Latest commit History 80 Commits
.github/workflows		.github/workflows
.vscode		.vscode
AppDebugKit		AppDebugKit
TestApp		TestApp
benches		benches
npm		npm
packaging/macos/NativeDevtools.app/Contents		packaging/macos/NativeDevtools.app/Contents
scripts		scripts
src		src
tests		tests
.gitignore		.gitignore
AGENTS.md		AGENTS.md
Cargo.lock		Cargo.lock
Cargo.toml		Cargo.toml
LICENSE		LICENSE
README.md		README.md
demo.gif		demo.gif
windows-demo-1.gif		windows-demo-1.gif

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

native-devtools-mcp

🔍 Search Keywords

🚀 Features

🤖 For AI Agents (LLMs)

📦 Installation (macOS + Windows)

Option 1: Run with `npx` (no install needed)

Option 2: Global install

Option 3: Build from source (Rust)

⚙️ Configuration

macOS Configuration

Windows Configuration

Configuration JSON (Windows and macOS CLI)

For Claude Code (CLI) Users

🔍 Two Approaches to Interaction

1. The "Visual" Approach (Universal)

2. The "Structural" Approach (AppDebugKit)

🧩 Template Matching (find_image)

🏗️ Architecture

Screenshot Coordinate Precision

🛡️ Privacy, Safety & Best Practices

🔒 Privacy First

⚠️ Operational Safety

🔐 Required Permissions (macOS)

🪟 Windows Notes

📜 License

About

Uh oh!

Releases 12

Packages

Uh oh!

Contributors 3

Uh oh!

Languages

License

sh3ll3x3c/native-devtools-mcp

Folders and files

Latest commit

History

Repository files navigation

native-devtools-mcp

🔍 Search Keywords

🚀 Features

🤖 For AI Agents (LLMs)

📦 Installation (macOS + Windows)

Option 1: Run with npx (no install needed)

Option 2: Global install

Option 3: Build from source (Rust)

⚙️ Configuration

macOS Configuration

Windows Configuration

Configuration JSON (Windows and macOS CLI)

For Claude Code (CLI) Users

🔍 Two Approaches to Interaction

1. The "Visual" Approach (Universal)

2. The "Structural" Approach (AppDebugKit)

🧩 Template Matching (find_image)

🏗️ Architecture

Screenshot Coordinate Precision

🛡️ Privacy, Safety & Best Practices

🔒 Privacy First

⚠️ Operational Safety

🔐 Required Permissions (macOS)

🪟 Windows Notes

📜 License

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases 12

Packages 0

Uh oh!

Contributors 3

Uh oh!

Languages

Option 1: Run with `npx` (no install needed)

Packages