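# Tau3 Benchmark

Commands to replicate the results:

```bash
uv sync
run_tau3bench.sh
```

Command for a single run:

```bash
uv run inspect eval tau3bench_task.py
```

Command to view the results (replace `<log-directory>` with the directory containing the evaluation logs):

```bash
uv run inspect view --log-dir <log-directory>
```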