Add hook for reporting and recording variables #12
Cutting-n-Pasting the three commit messages here in lieu of description
@ekuznetsov139, this PR essentially adds the `_LogSessionRunHook` class from your PR #11 (albeit with some tweaks). Please review. I will also be filing another PR soon (within a day or two) which pulls in all the AMP-related changes from that PR. Once both of those PRs are merged, PR #11 can be updated to be XLA-specific.

@c0redumb @micmelesse @xdgarrido please review and merge.
Add a --num_report_steps option to specify reporting frequency.
Currently, `global_step/sec` and `examples/sec` are displayed (and recorded via the summary writer) after every step.

The `--num_report_steps=N` option allows the user to specify the frequency (i.e. every N steps) with which this information is displayed and recorded.

Enable printing and recording of throughput + loss on a periodic basis
This commit adds the ability to report (i.e. display to stdout) the following information on a periodic basis
The frequency of the reporting is specified via the `--num_report_steps` option.

Currently only the `throughput` and `total_loss` values get recorded (to the trace events file meant for TensorBoard consumption).

Note that `throughput` is the same as `examples/sec` and `total_loss` is the same as `loss`, both of which are already reported and recorded via the `TPUEstimator` implementation.

The `LogSessionRunHook` class is based on a similar class in the NVBERT implementation. It can easily be enhanced to report and record other variables.

Disable the log messages from being printed twice.
Currently, every message output via `tf.compat.v1.logging.info` gets printed twice. Setting the `propagate` flag on the logger to `False` prevents this, so each message is printed only once.

This makes the output log file more readable.
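
The effect of the `propagate` flag can be illustrated with Python's standard `logging` module alone. The sketch below is not the code from this PR: the logger name `demo_framework`, the helper `count_emissions`, and the sample message are hypothetical stand-ins, and it assumes the duplication comes from a message being handled both by the logger's own handler and by a handler attached to the root logger. In TensorFlow, the corresponding logger object can be obtained with `tf.get_logger()`.

```python
import io
import logging


def count_emissions(propagate):
    """Log one message and return how many times it reaches the output stream."""
    stream = io.StringIO()

    # Handler on the root logger (stands in for a handler installed elsewhere,
    # e.g. by another library or by the application itself).
    root = logging.getLogger()
    root_handler = logging.StreamHandler(stream)
    root.addHandler(root_handler)

    # Hypothetical framework logger that also has its own handler.
    logger = logging.getLogger("demo_framework")
    logger.setLevel(logging.INFO)
    own_handler = logging.StreamHandler(stream)
    logger.addHandler(own_handler)

    # With propagate=True the record is passed on to the root handler as well.
    logger.propagate = propagate

    try:
        logger.info("step 100 done")
    finally:
        # Restore the global logging state so repeated calls are independent.
        logger.removeHandler(own_handler)
        root.removeHandler(root_handler)
        logger.propagate = True

    return stream.getvalue().count("step 100 done")


if __name__ == "__main__":
    print(count_emissions(propagate=True))   # prints 2
    print(count_emissions(propagate=False))  # prints 1
```

With `propagate` left at its default of `True`, the record is emitted by both handlers; setting it to `False` stops the record at the logger's own handler, which matches the single-line behaviour described above.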