Hi, thanks for sharing this great work!
In the paper, you mention ~35k training examples for the Base models (Qwen2.5-7B/14B) and another ~35k for the LRM (QwQ-32B). Could you please clarify the main distinctions between the two sets (sources, difficulty, filtering, etc.)?
This would help us better understand whether the performance differences come mainly from the training recipe or also from the data distribution.
Thanks!