
Conversation

@henry-berger

  • Line 53: If training is resumed from a 25k-epoch checkpoint with a total budget of 100k epochs, the current code displays a progress bar that runs from 0 to 100k. That gives an inaccurate estimate of the remaining time, because only 75k epochs are actually left to run. This edit makes the progress bar run from 0 to `epochs - start_epoch`, which in this example is 0 to 75k.

  • Line 56: When resuming from a checkpoint other than 0, the starting epoch will almost always satisfy both conditions `not epoch % epochs_til_checkpoint` and `epoch`, so the current code saves a checkpoint before doing any training. Among other things, this overwrites the training-losses file stored with the starting checkpoint, replacing it with an empty file. Replacing the condition `epoch` with `epoch != start_epoch` fixes that problem. (Both fixes are shown in the sketch below.)

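The snippet below is a minimal, self-contained sketch of the two changes, not the actual training script: the names `epochs`, `start_epoch`, and `epochs_til_checkpoint` come from the notes above, while the `train` function, the tqdm progress bar, and the `save_checkpoint` stub are assumptions standing in for the surrounding code.

```python
# Minimal sketch of the resumed-training loop discussed above.
from tqdm import tqdm


def save_checkpoint(epoch):
    # Stand-in for the real checkpoint/loss-file writer (hypothetical helper).
    print(f"saving checkpoint at epoch {epoch}")


def train(epochs, start_epoch=0, epochs_til_checkpoint=1000):
    # Line 53 fix: size the progress bar by the epochs that remain rather
    # than the total budget, so the ETA is accurate when resuming
    # (e.g. 75k remaining epochs instead of 100k).
    with tqdm(total=epochs - start_epoch) as pbar:
        for epoch in range(start_epoch, epochs):
            # Line 56 fix: `epoch != start_epoch` replaces the old truthiness
            # test `and epoch`, which only suppressed the save at epoch 0.
            # Without it, resuming at a multiple of epochs_til_checkpoint
            # would immediately re-save and wipe the stored losses file.
            if not epoch % epochs_til_checkpoint and epoch != start_epoch:
                save_checkpoint(epoch)

            # ... forward/backward pass, optimizer step, loss logging ...
            pbar.update(1)


if __name__ == "__main__":
    train(epochs=10, start_epoch=4, epochs_til_checkpoint=2)
```

In the example run, the bar counts only the 6 remaining epochs (4 through 9), and checkpoints are written at epochs 6 and 8 but not at the resumed epoch 4.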