The platform for evaluations and RL environments for agents.
Wrap real software as environments, run benchmarks, and train with RL – locally or at scale.
📅 Hop on a call or email us at 📧 founders@hud.ai
HUD is an open-source platform for evaluating and training agents:
- Turn any piece of software into an RL environment.
- Run reproducible benchmarks and submit to public leaderboards (see the example below).
- Train agents with RL (GRPO) on top of those environments with any RL provider.
- Inspect every tool call, observation, and reward in real time on hud.ai.
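For example, running an agent against a public benchmark looks roughly like the sketch below. This is a minimal sketch of the flow, not a definitive reference: the `run_dataset` helper, its parameters, and the `ClaudeAgent` class are assumed names for illustration, so check hud-evals/hud-python for the actual SDK entry points.

```python
# Minimal sketch of an evaluation run. `run_dataset`, its arguments, and
# `ClaudeAgent` are assumed names for illustration; see hud-evals/hud-python
# for the real SDK surface.
import asyncio

from hud.agents import ClaudeAgent      # assumed import path
from hud.datasets import run_dataset    # assumed import path

async def main():
    results = await run_dataset(
        name="sheetbench-demo",
        dataset="hud-evals/SheetBench-50",  # public benchmark on hud.ai
        agent_class=ClaudeAgent,
    )
    print(results)  # tool calls, observations, and rewards also stream to hud.ai

asyncio.run(main())
```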
The core Python SDK lives in hud-evals/hud-python.
We love contributions!
Check the open issues and pull requests in hud-python to get involved.
- 🧱 MCP environment skeleton – wrap any system as an MCP server so any agent can call it (see the sketch after this list).
- 📡 Live telemetry – traces for every trajectory and tool call, streamed to hud.ai.
- 📊 Public benchmarks & leaderboards – OSWorld-Verified, SheetBench-50, and more at hud.ai/leaderboards.
- 🌐 Cloud browsers – integrations for browser-based environments (AnchorBrowser, Steel, BrowserBase, etc.).
- 🔁 Hot-reload dev loop – `hud dev` for iterating on environments without rebuilds.
- 🧪 One-click RL – `hud rl` to train GRPO policies on any HUD dataset.
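To make the MCP environment skeleton concrete, here is a minimal sketch of wrapping a toy piece of software as an MCP server. It uses `FastMCP` from the official `mcp` Python SDK rather than HUD's own scaffolding, and the `counter-env` tools are hypothetical, so treat it as the general shape of an environment rather than the exact HUD template.

```python
# Sketch of wrapping software as an MCP server, using FastMCP from the
# official `mcp` Python SDK. The environment and its tools are hypothetical.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("counter-env")  # hypothetical toy environment
state = {"count": 0}

@mcp.tool()
def increment() -> int:
    """Act on the wrapped software: bump the counter."""
    state["count"] += 1
    return state["count"]

@mcp.tool()
def evaluate(target: int) -> float:
    """Reward signal: 1.0 once the counter reaches the target."""
    return 1.0 if state["count"] >= target else 0.0

if __name__ == "__main__":
    mcp.run()  # serves the tools over stdio so any MCP agent can call them
```

Once a server like this exists, `hud dev` can hot-reload it while you iterate, and any MCP-capable agent can call its tools.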