Hi, thanks for sharing the PyTorch implementation! I'm curious how you chose the distribution used to sample variable masking ratios. In the paper, you mention 'a truncated Gaussian distribution centered at 0.55, left truncated by 0.5, and right truncated by 1.' What is the motivation for this distribution? Why not use the cosine schedule, as MaskGIT does? Thank you!
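To make the comparison concrete, here is a minimal sketch of the two samplers as I understand them. The standard deviation of 0.25 is my own assumption, because the quoted passage only gives the center and the truncation bounds. The cosine version follows MaskGIT's training recipe of drawing r uniformly from [0, 1) and setting the ratio to cos(πr/2). The function names are just for illustration.

```python
import math
import random
import statistics


def truncated_gaussian_ratio(rng: random.Random, mu: float = 0.55, sigma: float = 0.25,
                             low: float = 0.5, high: float = 1.0) -> float:
    """Sample a masking ratio from N(mu, sigma^2) restricted to [low, high].

    Rejection sampling: redraw until the value falls inside the bounds.
    sigma=0.25 is an assumed value, not taken from the quoted passage.
    """
    while True:
        ratio = rng.gauss(mu, sigma)
        if low <= ratio <= high:
            return ratio


def cosine_ratio(rng: random.Random) -> float:
    """Sample a masking ratio MaskGIT-style: r ~ U[0, 1), ratio = cos(pi * r / 2)."""
    return math.cos(math.pi * rng.random() / 2)


if __name__ == "__main__":
    rng = random.Random(0)
    tg = [truncated_gaussian_ratio(rng) for _ in range(10_000)]
    cs = [cosine_ratio(rng) for _ in range(10_000)]
    # Summarize the range and average masking ratio produced by each sampler.
    for name, xs in (("truncated Gaussian", tg), ("cosine", cs)):
        print(f"{name:>18}: min={min(xs):.3f} mean={statistics.mean(xs):.3f} max={max(xs):.3f}")
```

As far as I can tell, the main practical difference is that the truncated Gaussian never masks fewer than half of the tokens, whereas the cosine schedule occasionally produces ratios close to 0.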