Hi, this is really interesting work! I have a question about the impact of modifying the RoPE in the pretrained Flux-Kontext model.
Since you modified the RoPE to incorporate depth-aware 3D coordinates and hierarchical resolutions, does this significantly alter the model's behavior compared to the pretrained model? I'm wondering whether extensive retraining is needed for the model to adapt to the new RoPE design. Could you share training details such as GPU hours and the number of training iterations?
Thank you for your time!