Harder Is Better: Boosting Mathematical Reasoning via Difficulty-Aware GRPO and Multi-Aspect Question Reformulation
Yanqi Dai1,2, Yuxiang Ji3, Xiao Zhang4, Yong Wang2†, Guanhua Chen3, Xiangxiang Chu2, Zhiwu Lu1
1Gaoling School of Artificial Intelligence, Renmin University of China
2AMAP, Alibaba Group
3Xiamen University
4Dalian University of Technology
†Project lead.
- [Jan 31, 2026]: 🛠️ Code and augmented data are released.
- [Jan 29, 2026]: 🔥 Our paper was released on arXiv and HuggingFace, and became the #1 Paper of the Day in HuggingFace Daily Papers.
- [Jan 26, 2026]: 🎉 Our paper was accepted to ICLR 2026.
We propose MathForge, a dual framework that improves mathematical reasoning by targeting harder questions from both the algorithmic and the data perspective. It comprises a Difficulty-Aware Group Policy Optimization (DGPO) algorithm and a Multi-Aspect Question Reformulation (MQR) strategy. Overall, MathForge forms a synergistic loop: MQR expands the data frontier, and DGPO effectively learns from the augmented data.
Algorithmically, the widely used Group Relative Policy Optimization (GRPO) suffers from an implicit imbalance: the magnitude of policy updates is lower for harder questions. DGPO first rectifies this implicit imbalance via difficulty-balanced group advantage estimation (DGAE), and then further prioritizes harder questions through difficulty-aware question-level weighting (DQW).
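To make the intuition concrete, here is a minimal NumPy sketch of the two ideas for binary rewards. The specific balancing term, the weighting function, and `alpha` below are illustrative assumptions of ours, not the paper's exact DGAE/DQW formulas:

```python
import numpy as np

def grpo_advantages(rewards: np.ndarray) -> np.ndarray:
    """Standard GRPO: normalize rewards within one rollout group."""
    return (rewards - rewards.mean()) / (rewards.std() + 1e-8)

def dgpo_advantages_sketch(rewards: np.ndarray, alpha: float = 1.0) -> np.ndarray:
    """Hypothetical sketch of DGAE + DQW for binary (0/1) rewards."""
    p = rewards.mean()  # empirical pass rate; small p = hard question
    adv = grpo_advantages(rewards)
    # DGAE (sketch): with binary rewards, a GRPO group's total |advantage|
    # mass scales with sqrt(p * (1 - p)), so hard questions drive smaller
    # updates; dividing it back out balances magnitudes across difficulty.
    adv = adv / (np.sqrt(p * (1.0 - p)) + 1e-8)
    # DQW (sketch): additionally upweight harder questions (lower pass rate).
    return (1.0 - p) ** alpha * adv

# Example: a hard question where 1 of 8 rollouts is correct.
print(dgpo_advantages_sketch(np.array([1.0, 0, 0, 0, 0, 0, 0, 0])))
```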
Data-wise, existing augmentation approaches primarily rephrase questions to enhance diversity, without systematically increasing intrinsic difficulty. MQR instead reformulates questions along multiple aspects to increase difficulty while preserving the original gold answer.
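As a rough illustration of the idea (not the paper's actual MQR instructions), one could ask a strong LLM for an answer-preserving, harder variant of a question through an OpenAI-compatible API. The prompt wording, aspect list, and model name below are all hypothetical:

```python
from openai import OpenAI

client = OpenAI()  # assumes an OpenAI-compatible endpoint is configured

# Hypothetical reformulation aspects, for illustration only.
ASPECTS = ["insert an extra reasoning step", "obscure a given quantity",
           "embed the question in a longer scenario"]

def reformulate(question: str, gold_answer: str, aspect: str) -> str:
    """Request a harder variant that keeps the same gold answer."""
    prompt = (
        f"Rewrite this math question to be more difficult by the following "
        f"change: {aspect}. The rewritten question must remain well-posed "
        f"and must still have the exact answer {gold_answer}.\n\n"
        f"Question: {question}"
    )
    resp = client.chat.completions.create(
        model="gpt-4o",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content
```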
The main comparative results with Qwen2.5-Math-7B trained on the MATH dataset are presented in the following table, demonstrating the effectiveness of DGPO, MQR, and the overall MathForge framework.
| Methods | AIME24 | AIME25 | AMC23 | MATH500 | Minerva | Olympiad | Avg. (Δ vs. GRPO) |
|---|---|---|---|---|---|---|---|
| Base Model | 12.19 | 4.79 | 35.23 | 48.60 | 15.07 | 16.33 | 22.04 |
| GRPO | 20.94 | 8.44 | 58.98 | 72.20 | 27.76 | 37.33 | 37.61 |
| Dr.GRPO | 21.04 | 8.23 | 58.59 | 72.05 | 28.58 | 35.89 | 37.40 (−0.21) |
| GPG | 21.98 | 9.06 | 59.61 | 72.05 | 27.21 | 37.67 | 37.93 (+0.32) |
| DAPO | 21.25 | 8.75 | 58.20 | 72.70 | 29.50 | 37.22 | 37.94 (+0.33) |
| GSPO | 19.38 | 8.33 | 60.16 | 73.00 | 28.12 | 37.26 | 37.71 (+0.10) |
| GRPO-AD | 21.56 | 9.48 | 59.06 | 73.25 | 29.14 | 37.07 | 38.26 (+0.65) |
| DGPO | 23.85 | 10.21 | 61.02 | 74.25 | 31.07 | 38.33 | 39.79 (+2.18) |
| MQR | 25.00 | 11.77 | 59.38 | 77.85 | 31.43 | 40.81 | 41.04 (+3.43) |
| MathForge | 24.58 | 12.60 | 59.84 | 79.95 | 33.36 | 42.67 | 42.17 (+4.56) |
You can find the datasets constructed in this work at the following links (a loading sketch follows the list):
- MathForge_MATH-augmented: We augmented the training questions of the MATH dataset using our proposed MQR strategy, resulting in a dataset 4 times the size of the original training set.
- MathForge_GEOQA-R1V-revised: We revised the GEOQA_R1V_Train_8K dataset by correcting unit errors in the original gold answers, reformatting the data, and randomly splitting it into training and test sets.
- YanqiDai/MathForge_NuminaMath-CoT-sample80k: We randomly sampled 80k examples from the NuminaMath-CoT dataset for supervised fine-tuning of DeepSeek-Math-7B.
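If the first two datasets are hosted under the same HuggingFace namespace as the third (an assumption; the release links above are authoritative), they can be loaded with the `datasets` library:

```python
from datasets import load_dataset

# Repo id and split name are assumptions, following the naming of the
# sample80k dataset listed above; check the release links for exact ids.
ds = load_dataset("YanqiDai/MathForge_MATH-augmented", split="train")
print(len(ds), ds[0])
```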
Create a conda environment with the required dependencies:
```bash
conda create -n mathforge python=3.10
conda activate mathforge
pip install torch==2.6.0 torchvision==0.21.0 torchaudio==2.6.0
pip install vllm==0.8.5.post1
pip install flash-attn==2.8.2 --no-build-isolation
```

Clone this repository and install open-r1 and trl from our modified branches:
```bash
git clone https://github.com/AMAP-ML/MathForge.git
# install open-r1
cd MathForge
pip install -e ".[dev]"
# install trl==0.20.0
cd trl-0.20.0
pip install -e .
```

Please refer to the scripts in the scripts_mathforge folder for training various models using GRPO, DGPO, or MathForge (DGPO + MQR).
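Before launching a run, you can sanity-check the environment; a minimal check (assuming a CUDA-capable GPU is visible) is:

```python
# Minimal environment check: CUDA-enabled torch and an importable vllm.
import torch
import vllm  # noqa: F401

print(torch.__version__, torch.cuda.is_available())
```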
To quickly start training, you can use the following command as an example:
```bash
bash scripts_mathforge/Qwen2.5-7B_MATH/run_mathforge.sh
```

For mathematical reasoning evaluation, we recommend using the Lighteval toolkit. You can use the following command to evaluate a trained model on multiple mathematical benchmarks, including AIME24, AIME25, AMC23, MATH500, Minerva, and Olympiad:
```bash
CUDA_VISIBLE_DEVICES=0 bash eval/evaluate_math.sh <model_path>
```

For geometric reasoning evaluation on the GEOQA dataset, you can use the following command:
```bash
CUDA_VISIBLE_DEVICES=0 bash eval/evaluate_geoqa.sh <model_path>
```

This work builds upon several open-source projects, including Open-R1, TRL, R1-V, MATH, and Lighteval. We are grateful to these projects.
If you find MathForge useful for your research and applications, please cite using this BibTeX:
```bibtex
@article{dai2026harder,
  title={Harder is better: Boosting mathematical reasoning via difficulty-aware {GRPO} and multi-aspect question reformulation},
  author={Dai, Yanqi and Ji, Yuxiang and Zhang, Xiao and Wang, Yong and Chu, Xiangxiang and Lu, Zhiwu},
  journal={arXiv preprint arXiv:2601.20614},
  year={2026}
}
```

