Hi, I'm trying to train with sympointv1, but I'm seeing a difference in training results between single-GPU and multi-GPU runs. On 8×A100 the loss drops effectively, but on a single GPU with batch_size set to 16 I don't get the same training behavior. How should I train with a single card?
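For reference, here is roughly what I'm considering to approximate the multi-GPU effective batch size on one card with gradient accumulation (plus possibly rescaling the learning rate). This is only a minimal sketch, not the actual SymPoint training loop: the model, dataloader, loss, and hyperparameters below are placeholders.

```python
# Minimal sketch (NOT the SymPoint code): approximate the effective batch size
# of an 8-GPU run on a single GPU by accumulating gradients over 8 micro-batches.
# DummyModel/dataset/hyperparameters are placeholders for illustration only.
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

device = "cuda" if torch.cuda.is_available() else "cpu"

model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 10)).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
criterion = nn.CrossEntropyLoss()

# Placeholder data; in practice this would be the SymPoint dataloader.
dataset = TensorDataset(torch.randn(512, 32), torch.randint(0, 10, (512,)))
loader = DataLoader(dataset, batch_size=16, shuffle=True)

accum_steps = 8  # e.g. 8 GPUs x per-GPU batch -> accumulate 8 micro-batches

model.train()
optimizer.zero_grad()
for step, (x, y) in enumerate(loader):
    x, y = x.to(device), y.to(device)
    # Divide by accum_steps so the accumulated gradient matches the average
    # over the larger effective batch.
    loss = criterion(model(x), y) / accum_steps
    loss.backward()
    if (step + 1) % accum_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```

Would something along these lines be the right way to match the 8-GPU setup, or does the config need other changes (learning rate, schedule, BatchNorm behavior) for single-card training?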