HUD

The platform for evaluations and RL environments for agents.
Wrap real software as environments, run benchmarks, and train with RL – locally or at scale.

Are you an enterprise building agents?

📅 Hop on a call or email us at 📧 founders@hud.ai


⚡️ Hello there, we are HUD!

HUD is an open-source platform for evaluating and training agents:

  • Turn any piece of software into an RL environment.
  • Run reproducible benchmarks and submit to public leaderboards.
  • Train agents with RL (GRPO) on top of those environments with any RL provider.
  • Inspect every tool call, observation, and reward in real time on hud.ai.
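As a rough illustration of the GRPO idea mentioned above: several rollouts of the same task are scored, and each reward is normalized against its own group rather than against a learned value baseline. The sketch below is a minimal, framework-free version of that normalization step only. The function name, the `eps` constant, the use of the population standard deviation, and the sample rewards are all assumptions for illustration, not part of the HUD SDK.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the mean and std of its own group.

    `rewards` holds one scalar reward per rollout of the same task.
    `eps` (an assumed constant) avoids division by zero when every
    rollout in the group received the same reward.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; implementations differ here
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four rollouts of one task (1.0 = success).
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(advantages)
```

Successful rollouts end up with positive advantages and failed ones with negative advantages, which is the signal the policy update then uses. When all rollouts in a group score the same, every advantage is zero and the group contributes no gradient.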

The core Python SDK lives in hud-evals/hud-python.

We love contributions!

Check our open hud-python issues and open pull requests for more information.


🧩 Highlights

  • 🧱 MCP environment skeleton – wrap any system as an MCP server so any agent can call it.
  • 📡 Live telemetry – traces for every trajectory and tool call, streamed to hud.ai.
  • 📊 Public benchmarks & leaderboards – OSWorld-Verified, SheetBench-50, and more at hud.ai/leaderboards.
  • 🌐 Cloud browsers – integrations for browser-based environments (AnchorBrowser, Steel, BrowserBase, etc.).
  • 🔁 Hot-reload dev loop – hud dev for iterating on environments without rebuilds.
  • 🧪 One-click RL – hud rl to train GRPO policies on any HUD dataset.
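To make the MCP environment highlight more concrete: MCP is built on JSON-RPC 2.0, and an agent invokes an environment's tool by sending a tools/call request. The sketch below only constructs such a message by hand. It does not use the HUD SDK or an MCP client library, and the tool name "click" and its arguments are hypothetical.

```python
import json


def make_tool_call(request_id, tool_name, arguments):
    """Build a JSON-RPC 2.0 request that asks an MCP server to run a tool."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Hypothetical tool exposed by a wrapped desktop or browser environment.
request = make_tool_call(1, "click", {"x": 120, "y": 48})
print(json.dumps(request, indent=2))
```

Because every environment speaks the same request shape, an agent that can emit this message can in principle drive any system wrapped as an MCP server.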
