Achievement: 88.23% accuracy on GSM8K with a single 1.5B-parameter model (DeepSeek-R1-Distill-Qwen-1.5B), using dynamic test-time computation strategies and no fine-tuning.
This repository contains the official implementation and technical report for DTTC (Dynamic Test-Time Computing).
DTTC is a lightweight framework designed to enhance the reasoning capabilities of small language models (SLMs). By introducing a Dynamic Parameter Pool (DPP), Ambiguity Statement Mapping (ASM), and Time-enhanced Penalty Decoding (TPD), we substantially narrow the gap between 1.5B models and larger state-of-the-art LLMs.
Read the Technical Report (PDF)
Experiments were conducted on a single RTX 4090D (24GB).
| Benchmark | Baseline (CoT) | DTTC (Ours) | Improvement (absolute) |
|---|---|---|---|
| GSM8K | 76.04% | 88.23% | +12.19% |
| SVAMP | 85.33% | 94.67% | +9.34% |
| ASDiv | 93.92% | 98.68% | +4.76% |
| AQUA | 63.39% | 81.49% | +18.10% |
The framework consists of three synergistic components:

**Dynamic Parameter Pool (DPP):**
- Instead of a fixed temperature, we use a pool of diverse decoding configurations (e.g., balanced $T=0.6$, deterministic $T=0.5$, exploratory $T=0.8$).
- The system dynamically switches strategies via a queue-based mechanism until a valid answer format is generated.
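
The queue-based switching can be sketched roughly as follows. This is a minimal illustration, not the code in `inference.py`: the three temperatures come from the description above, while `solve_with_pool`, `extract_answer`, the `generate` callback, the retry budget, and the assumption that a "valid answer format" means a `\boxed{...}` final answer are all hypothetical.

```python
import re
from collections import deque
from typing import Callable, Optional, Tuple

# Temperatures are taken from the README; the names and the
# single-field config layout are illustrative assumptions.
PARAMETER_POOL = [
    {"name": "balanced", "temperature": 0.6},
    {"name": "deterministic", "temperature": 0.5},
    {"name": "exploratory", "temperature": 0.8},
]

# Assumption: a valid answer is a \boxed{...} expression in the output.
ANSWER_PATTERN = re.compile(r"\\boxed\{([^{}]+)\}")


def extract_answer(text: str) -> Optional[str]:
    """Return the boxed answer, or None if the format is invalid."""
    match = ANSWER_PATTERN.search(text)
    return match.group(1).strip() if match else None


def solve_with_pool(
    question: str,
    generate: Callable[[str, float], str],
    max_attempts: int = 6,
) -> Tuple[Optional[str], Optional[str]]:
    """Cycle through the pool until one config yields a valid answer."""
    queue = deque(PARAMETER_POOL)
    for _ in range(max_attempts):
        config = queue.popleft()
        output = generate(question, config["temperature"])
        answer = extract_answer(output)
        if answer is not None:
            return answer, config["name"]
        queue.append(config)  # failed config goes to the back of the queue
    return None, None


def demo_generate(question: str, temperature: float) -> str:
    """Stand-in for a model call: only T >= 0.8 produces a boxed answer."""
    return r"So the total is \boxed{42}" if temperature >= 0.8 else "I am not sure."
```

With the stand-in generator, `solve_with_pool("q", demo_generate)` fails on the first two configurations and succeeds on the exploratory one. In a real setup, `generate` would wrap the model's sampling call.
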

**Ambiguity Statement Mapping (ASM):**
- A robust regex-based module to handle linguistic ambiguities in math word problems (e.g., "3 sprints 3 times a week").
- See `inference.py` for the extensive pattern matching rules.
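
To give a feel for this kind of regex-based mapping, here is a small sketch. The single rule and its rewritten wording are hypothetical and are not taken from `inference.py`; they only show how an ambiguous phrase like the example above could be made explicit before it reaches the model.

```python
import re

# Hypothetical rule: "<N> <noun> <M> times a <period>" is rewritten so that
# N is clearly a per-occasion count and M is the number of occasions.
AMBIGUITY_RULES = [
    (
        re.compile(r"\b(\d+) (\w+) (\d+) times a (day|week|month|year)\b"),
        r"\1 \2 on each of \3 occasions per \4",
    ),
]


def clarify(question: str) -> str:
    """Apply every rewrite rule in order and return the clarified question."""
    for pattern, replacement in AMBIGUITY_RULES:
        question = pattern.sub(replacement, question)
    return question


print(clarify("James decides to run 3 sprints 3 times a week."))
```

Sentences that match no rule pass through unchanged, so the mapping is safe to apply to every input question.
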

**Time-enhanced Penalty Decoding (TPD):**
- A custom `LogitsProcessor` that suppresses thought-switching tokens (e.g., "Alternatively", "However") during early generation stages to enforce reasoning consistency.
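
The core of such a processor can be sketched without any dependencies. The function below is an illustration under stated assumptions: the linear decay schedule, the `window` and `max_penalty` defaults, and the name `tpd_penalize` are hypothetical, and plain Python lists stand in for the score tensor. In Hugging Face Transformers, the same logic would live in the `__call__(input_ids, scores)` method of a `LogitsProcessor` subclass.

```python
from typing import List, Set


def tpd_penalize(
    scores: List[float],
    step: int,
    switch_token_ids: Set[int],
    window: int = 200,
    max_penalty: float = 5.0,
) -> List[float]:
    """Lower the logits of thought-switching tokens early in generation.

    scores: next-token logits, indexed by token id.
    step: number of tokens generated so far.
    switch_token_ids: ids of tokens such as "Alternatively" or "However".
    The penalty starts at max_penalty and decays linearly to zero at `window`
    (an assumed schedule, chosen only for illustration).
    """
    if step >= window:
        return list(scores)  # past the early stage: leave logits untouched
    penalty = max_penalty * (1.0 - step / window)
    return [
        score - penalty if token_id in switch_token_ids else score
        for token_id, score in enumerate(scores)
    ]
```

Subtracting from the logit rather than setting it to negative infinity keeps the token available if the model strongly prefers it, which is one possible reading of "suppresses" here.
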
- A custom