Hi, I don't know if wrapping the dataset creation inside the training function is a good idea... One possible issue is that when using multiple GPUs, the MNIST dataset is downloaded twice (once per spawned process)... Would it be better to create the Dataset object in the main function, pass it into the train function, and create the DistributedSampler inside? Thanks for your help