
Code and data for "Subtle Errors in Reasoning: Preference Learning via Error-injected Self-editing" (ACL 2025)


kaishxu/RISE


RISE

This repository contains the official implementation of "Subtle Errors in Reasoning: Preference Learning via Error-injected Self-editing" (ACL 2025).

🎯 Overview

RISE improves the reasoning of large language models through preference learning. Its key idea is error-injected self-editing: the model edits its own correct solutions to introduce subtle errors, and each correct solution paired with its error-injected copy becomes a preference example. Training on these pairs helps the model learn to identify and correct subtle reasoning errors.
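To make the idea concrete, here is a minimal Python sketch of what one such preference pair looks like, assuming a solution is split into steps and the error is injected into a single step. The class and function names are hypothetical, and the hand-written `edited_step` only stands in for the edit that RISE obtains from the model itself.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class PreferencePair:
    """One preference example: the correct solution is preferred over its edited copy."""

    prompt: str
    chosen: str  # original, correct solution
    rejected: str  # same solution with one step edited to contain a subtle error


def make_error_injected_pair(
    prompt: str, steps: list[str], step_index: int, edited_step: str
) -> PreferencePair:
    """Build a pair whose two responses differ only in the step at `step_index`.

    Hypothetical helper: `edited_step` is supplied by hand here, whereas in RISE
    the edit is generated by the model through a self-editing prompt.
    """
    if not 0 <= step_index < len(steps):
        raise IndexError("step_index out of range")
    if edited_step == steps[step_index]:
        raise ValueError("the edited step must differ from the original")
    edited = list(steps)
    edited[step_index] = edited_step
    return PreferencePair(prompt, "\n".join(steps), "\n".join(edited))


pair = make_error_injected_pair(
    prompt="What is 3 * (4 + 5)?",
    steps=["4 + 5 = 9", "3 * 9 = 27"],
    step_index=1,
    edited_step="3 * 9 = 24",  # a one-token arithmetic slip
)
print(pair.chosen)
print(pair.rejected)
```

Because the two responses share everything except the edited step, the only signal separating them is the subtle error itself.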

Models

| Model Name | HF Checkpoint | Size | License |
| --- | --- | --- | --- |
| RISE-Qwen2-7B | 🤗 kaishxu/RISE-Qwen2-7B | 7B | Qwen2 |

🚀 Quick Start

Prerequisites

Basic Usage

1. Prepare Training Data:

```bash
# Set data path
export data_path="./data/train"

# Generate self-sample prompts
python construct_self_sample_prompts.py \
    --model_name qwen2 \
    --save_prompt_path $data_path/self-sample-qwen2.jsonl

# Run sampling
bash scripts/sampling.sh

# Construct self-editing prompts
python construct_self_editing_prompts.py \
    --model_name qwen2 \
    --sample_folder_path $data_path/self-sample-qwen2-completion \
    --save_sample_path $data_path/self-sample-qwen2-dpo.json \
    --save_prompt_path $data_path/self-editing-qwen2-prompt.jsonl
```

2. Generate Self-editing Completions:

```bash
# Uses $data_path exported in the first step; point --model at your local Qwen2-7B-Instruct checkpoint
python inference.py \
    --model /path/to/Qwen2-7B-Instruct \
    --data_file $data_path/self-editing-qwen2-prompt.jsonl \
    --save_path $data_path/self-editing-qwen2-completion.json \
    --tensor_parallel_size 1 \
    --batch_size 10000
```

3. Create Training Data:

```bash
# Combine the self-editing prompts and completions into DPO preference samples
python construct_dpo_samples.py \
    --prompt_path $data_path/self-editing-qwen2-prompt.jsonl \
    --completion_path $data_path/self-editing-qwen2-completion.json \
    --chosen_sample_path $data_path/self-sample-qwen2-dpo-step.json \
    --full_sample_path $data_path/self-sample-qwen2-dpo.json \
    --save_sample_path $data_path/self-editing-qwen2-dpo.json
```

4. Train the Model:

```bash
bash scripts/train.sh
```

5. Evaluate the Model:

```bash
bash scripts/eval.sh
```
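The Create Training Data step joins each self-editing prompt with the model's completion to form preference samples. Below is a minimal sketch of that join, assuming hypothetical field names (`question`, `solution`), completions aligned with prompts by position, and an assumed filter that drops edits which left the solution unchanged. The actual schema and filtering in construct_dpo_samples.py may differ.

```python
from __future__ import annotations

import json
from pathlib import Path


def read_jsonl(path: Path) -> list[dict]:
    """Read a JSON Lines file: one JSON object per non-empty line."""
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def build_dpo_records(prompt_rows: list[dict], completions: list[str]) -> list[dict]:
    """Pair each self-editing prompt row with the model's edited solution.

    Hypothetical schema: each row stores the question under "question" and the
    original correct solution under "solution"; completions[i] answers prompt_rows[i].
    """
    if len(prompt_rows) != len(completions):
        raise ValueError("prompt and completion counts differ")
    records = []
    for row, edited in zip(prompt_rows, completions):
        edited = edited.strip()
        # Assumed filter: an empty or unchanged edit carries no preference signal.
        if not edited or edited == row["solution"].strip():
            continue
        records.append(
            {"prompt": row["question"], "chosen": row["solution"], "rejected": edited}
        )
    return records


def write_json(records: list[dict], path: Path) -> None:
    """Save the records as a single JSON array."""
    path.write_text(json.dumps(records, ensure_ascii=False, indent=2), encoding="utf-8")


rows = [
    {"question": "1 + 1 = ?", "solution": "1 + 1 = 2"},
    {"question": "2 * 3 = ?", "solution": "2 * 3 = 6"},
]
edits = ["1 + 1 = 3", "2 * 3 = 6"]  # the second edit left the solution unchanged
print(build_dpo_records(rows, edits))
# → [{'prompt': '1 + 1 = ?', 'chosen': '1 + 1 = 2', 'rejected': '1 + 1 = 3'}]
```

Checking the counts up front makes a misaligned prompt/completion pair fail loudly instead of silently producing mismatched samples.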
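The file names in the Train the Model step point to a DPO-style objective. For reference, here is the standard per-pair DPO loss, -log σ(β[(log π(chosen) - log π_ref(chosen)) - (log π(rejected) - log π_ref(rejected))]), written from summed response log-probabilities. This is a sketch of plain DPO, not necessarily the exact objective or β used by scripts/train.sh.

```python
import math


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def dpo_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    beta: float = 0.1,  # hypothetical value; check scripts/train.sh for the real setting
) -> float:
    """Standard DPO loss for one preference pair."""
    # How much more the policy favors each response than the frozen reference model does.
    chosen_ratio = policy_chosen_logp - ref_chosen_logp
    rejected_ratio = policy_rejected_logp - ref_rejected_logp
    return -log_sigmoid(beta * (chosen_ratio - rejected_ratio))


# When the policy equals the reference, the loss is log 2 (≈ 0.693).
print(dpo_loss(-10.0, -12.0, -10.0, -12.0))
# Raising the chosen log-prob and lowering the rejected one reduces it (≈ 0.313).
print(dpo_loss(-5.0, -15.0, -10.0, -10.0))
```

For error-injected pairs, the chosen and rejected responses share most tokens, so most of the log-probability difference, and hence the gradient, comes from the few edited tokens.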

📊 Evaluation

The project evaluates models on mathematical reasoning tasks. To run the evaluation script directly on a trained model:

```bash
python eval_math.py \
    --model /path/to/trained/model \
    --data_path /path/to/test/data \
    --prompt qwen2-boxed \
    --save_path results.json
```
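The `qwen2-boxed` prompt name suggests the model is asked to put its final answer in `\boxed{}`. Under that assumption, scoring comes down to extracting the last boxed answer and comparing it with the reference. The sketch below uses hypothetical function names (not eval_math.py's API) and deliberately light string normalization; practical math graders usually also check symbolic equivalence.

```python
from __future__ import annotations

from typing import Optional


def last_boxed_answer(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...} in `text`, honoring nested braces."""
    start = text.rfind("\\boxed{")
    if start == -1:
        return None
    i = start + len("\\boxed{")
    depth = 1
    for j in range(i, len(text)):
        if text[j] == "{":
            depth += 1
        elif text[j] == "}":
            depth -= 1
            if depth == 0:
                return text[i:j].strip()
    return None  # unbalanced braces: treat as no answer


def normalize(answer: str) -> str:
    """Light normalization: drop spaces and a trailing period."""
    return answer.replace(" ", "").rstrip(".")


def accuracy(completions: list[str], references: list[str]) -> float:
    """Fraction of completions whose boxed answer matches the reference string."""
    if len(completions) != len(references):
        raise ValueError("completion and reference counts differ")
    if not completions:
        return 0.0
    correct = 0
    for completion, reference in zip(completions, references):
        predicted = last_boxed_answer(completion)
        if predicted is not None and normalize(predicted) == normalize(reference):
            correct += 1
    return correct / len(completions)


print(last_boxed_answer(r"So the result is \boxed{\frac{1}{2}}."))  # → \frac{1}{2}
print(accuracy([r"... \boxed{27}", "no final answer"], ["27", "5"]))  # → 0.5
```

Counting braces rather than using a regular expression keeps nested LaTeX such as `\frac{1}{2}` intact.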

🤝 Thanks

Our training data is modified from Step-DPO. Thanks for their great work!
