Hello, thanks for the great work. I have a question about the experimental setup and the fairness of the comparisons.
In your paper, UniVG-R1 is trained on the newly constructed CoT dataset, while the baseline models (e.g., Migician and Qwen2-VL) are evaluated directly without being trained on this dataset. This may give UniVG-R1 a data advantage.
I would just like to know whether this is common practice in this field.