Question on the difference of training sets for Base vs. LRM

Hi, thanks for sharing this great work!

In the paper, you mention ~35k training examples for the Base models (Qwen2.5-7B/14B) and another ~35k for the LRM (QwQ-32B). Could you clarify the main differences between the two sets (sources, difficulty, filtering, etc.)?

This would help us understand whether the performance differences come mainly from the training recipe or also from the data distribution.

Thanks!

Question on the difference of training sets for Base vs. LRM #25

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Question on the difference of training sets for Base vs. LRM #25

Description

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions