From c9e3c59d5bd2798e58c050827287d1b903be04e2 Mon Sep 17 00:00:00 2001
From: Claude
Date: Tue, 30 Dec 2025 09:08:46 -0500
Subject: [PATCH] Fix typos and errors in README

- Remove stray text 'massiveaxe' from heading
- Fix runbooks example to reference correct _a_ script
- Fix broken Qwen model link to point to Hugging Face
---
 README.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/README.md b/README.md
index 03fa4d2..7b1c1b0 100644
--- a/README.md
+++ b/README.md
@@ -20,7 +20,7 @@ pip install -r requirements.txt
 python solve_agent.py [options]
 ```
 
-### Required Argumentsmassiveaxe
+### Required Arguments
 
 - `problems_dir`: Directory containing `.md` problem files
 
@@ -82,12 +82,12 @@ Each final submission is written to its own markdown file in the following forma
 ## Runbooks
 
 ```bash
-./runbooks/run_putnam_2025_b_nomos-1.sh # Putnam 2025 A problems
+./runbooks/run_putnam_2025_a_nomos-1.sh # Putnam 2025 A problems
 ./runbooks/run_putnam_2025_b_nomos-1.sh # Putnam 2025 B problems
 ```
 
 ## Results
 
-When run on the Putnam 2025 with the [NousResearch/Nomos-1](https://huggingface.co/NousResearch/nomos-1) model, this reasoning harness achieves a score of **87/120** as graded by a human expert. Below we show a problem-wise comparison with [Qwen3/Qwen](Qwen/Qwen3-30B-A3B-Thinking-2507), which scores 24/120 under the same conditions.
+When run on the Putnam 2025 with the [NousResearch/Nomos-1](https://huggingface.co/NousResearch/nomos-1) model, this reasoning harness achieves a score of **87/120** as graded by a human expert. Below we show a problem-wise comparison with [Qwen/Qwen3-30B-A3B-Thinking-2507](https://huggingface.co/Qwen/Qwen3-30B-A3B-Thinking-2507), which scores 24/120 under the same conditions.

image