Call set_epoch on DistributedSampler

Hi,

thanks for the excellent example of using DistributedDataParallel in PyTorch; it is very easy to understand and is much better than the PyTorch docs.

One important bit that is missing is making the gradient descent truly stochastic in the distributed case. According to the [PyTorch docs](https://pytorch.org/docs/stable/data.html#torch.utils.data.distributed.DistributedSampler), `set_epoch` must be called on the sampler at the start of every epoch to achieve this. Otherwise, the data points will be sampled in the same order in every epoch, with no reshuffling between epochs (remember, `DataLoader` is constructed with `shuffle=False`, so the sampler is the only source of shuffling). I have also discovered that it is very important to set the epoch to the same value in each worker; otherwise there is a chance that some data points will be visited multiple times, and others not at all.
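
A minimal sketch of the proposed change, not the tutorial's actual code. The toy dataset, `WORLD_SIZE`, and the helpers `make_sampler` and `indices_for` are hypothetical names introduced for illustration. `num_replicas` and `rank` are passed explicitly so the snippet runs in a single process without an initialized process group; in real DDP code they would normally be inferred from it.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler

# Hypothetical toy dataset: 8 samples whose value equals their index.
dataset = TensorDataset(torch.arange(8))
WORLD_SIZE = 2  # assumed number of workers


def make_sampler(rank):
    # Explicit num_replicas/rank: no process group is required for this demo.
    return DistributedSampler(
        dataset, num_replicas=WORLD_SIZE, rank=rank, shuffle=True, seed=0
    )


def indices_for(rank, epoch):
    # Indices a given worker would visit in a given epoch.
    sampler = make_sampler(rank)
    sampler.set_epoch(epoch)
    return list(sampler)


# Without set_epoch, the same sampler yields the same order on every pass.
stale = make_sampler(rank=0)
first_pass, second_pass = list(stale), list(stale)

# Proposed pattern: call set_epoch at the start of each epoch.
sampler = make_sampler(rank=0)
loader = DataLoader(dataset, batch_size=2, sampler=sampler)  # shuffle stays False
orders = []
for epoch in range(3):
    sampler.set_epoch(epoch)  # must be the same value on every worker
    orders.append([int(x) for (batch,) in loader for x in batch])

# Same epoch on both workers: the two shards together cover every index once.
covered = sorted(indices_for(0, epoch=1) + indices_for(1, epoch=1))

# Mismatched epochs: each worker slices a different permutation, so the
# combined shards may contain duplicates and miss some indices.
mismatched = sorted(indices_for(0, epoch=1) + indices_for(1, epoch=2))

print("no set_epoch, two passes:", first_pass, second_pass)
print("with set_epoch, per epoch:", orders)
print("same epoch on both ranks:", covered)
print("mismatched epochs:", mismatched)
```

Using the loop variable `epoch` as the argument is the simplest way to guarantee that every worker passes the same value, since all workers run the same loop.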

I hope all this makes sense. I think that future readers will benefit from the addition I am proposing. Once again, thanks for the excellent doc.
