Hi,
The idea behind autoregressive models (PixelCNN, PixelCNN++, ...) is that pixels are generated sequentially, each one conditioned only on the previously generated pixels. This is the so-called causality constraint.
With upsampling and downsampling, I think this constraint is no longer satisfied.
If that is the case, the up- and downsampling layers in your network could be the cause.
Please correct me if I am wrong.
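
To make the concern concrete, here is a rough 1D sketch of how I would check it. Everything in it is my own assumption and not taken from your code: `causal_conv` stands in for a masked convolution, `down_up` is a naive average-pool followed by nearest-neighbour upsampling, and `violates_causality` perturbs one input at a time and looks at whether any output at the same or an earlier position changes.

```python
import numpy as np

# Hypothetical filter weights, chosen only for illustration.
w = np.array([0.5, -1.0, 2.0])


def causal_conv(x, w):
    """Strictly causal 1D convolution: output[i] depends only on x[i-k..i-1]."""
    k = len(w)
    padded = np.concatenate([np.zeros(k), x])  # left-pad so no current/future input is seen
    return np.array([padded[i:i + k] @ w for i in range(len(x))])


def down_up(x):
    """Naive resampling: average-pool by 2, then nearest-neighbour upsample by 2."""
    pooled = x.reshape(-1, 2).mean(axis=1)  # each pooled value mixes x[2m] and x[2m+1]
    return np.repeat(pooled, 2)


def violates_causality(f, n=8, eps=1.0):
    """True if perturbing x[j] changes any output[i] with i <= j."""
    rng = np.random.default_rng(0)
    x = rng.normal(size=n)
    base = f(x)
    for j in range(n):
        x2 = x.copy()
        x2[j] += eps
        changed = ~np.isclose(f(x2), base)
        if changed[:j + 1].any():  # outputs at positions <= j must not see x[j]
            return True
    return False


print("causal conv leaks:", violates_causality(lambda x: causal_conv(x, w)))
print("naive down/up leaks:", violates_causality(down_up))
print("conv then down/up leaks:",
      violates_causality(lambda x: down_up(causal_conv(x, w))))
```

With these toy definitions, the plain causal convolution passes the check, while the naive pooling (alone or stacked on top of the causal convolution) does not, because each pooled value mixes a position with its right-hand neighbour. This only shows that unshifted resampling can leak; whether the layers in your network do so depends on how they are padded or shifted.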