Tau2 like bench with STRIPS planner as verifier and world simulator

mpdmanash/tau3_bench

Tau3 Benchmark

Commands to replicate the results.

  • uv sync
  • run_tau3bench.sh

Command for a single run.

  • uv run inspect eval tau3bench_task.py

Command to view results.

  • uv run inspect view --log-dir <log-directory>
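
To illustrate what using a STRIPS model as both world simulator and verifier can look like, here is a minimal Python sketch. It is not taken from this repository: the `Action` class, the `apply` and `verify` functions, and the order/refund facts are all hypothetical names chosen for the example. A state is a set of facts, an action applies only when its preconditions hold, and a plan is accepted only if every step is applicable and the final state contains the goal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """A STRIPS-style action (hypothetical structure, for illustration only)."""
    name: str
    preconditions: frozenset
    add_effects: frozenset
    delete_effects: frozenset


def apply(state, action):
    """World-simulator step: return the next state, or None if the action is not applicable."""
    if not action.preconditions <= state:
        return None
    return (state - action.delete_effects) | action.add_effects


def verify(initial, plan, goal):
    """Verifier: simulate the plan from the initial state and check that the goal holds at the end."""
    state = initial
    for action in plan:
        state = apply(state, action)
        if state is None:
            return False  # a precondition was violated mid-plan
    return goal <= state


# Hypothetical customer-service domain: an order must be cancelled before it can be refunded.
cancel_order = Action(
    name="cancel_order",
    preconditions=frozenset({"order_pending"}),
    add_effects=frozenset({"order_cancelled"}),
    delete_effects=frozenset({"order_pending"}),
)
issue_refund = Action(
    name="issue_refund",
    preconditions=frozenset({"order_cancelled"}),
    add_effects=frozenset({"refund_issued"}),
    delete_effects=frozenset(),
)

initial_state = frozenset({"order_pending"})
goal = frozenset({"refund_issued"})

valid_plan = [cancel_order, issue_refund]
invalid_plan = [issue_refund]  # skips the cancellation step

print(verify(initial_state, valid_plan, goal))
print(verify(initial_state, invalid_plan, goal))
```

In this sketch the same `apply` function serves both roles: called step by step it simulates how the world changes after each agent action, and called over a whole plan by `verify` it checks whether a sequence of actions is valid and reaches the goal.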
