Conversation

@ekuznetsov139

  • This PR adds support for XLA. Specify the command-line flag --use_xla=1 or --use_xla=2 to enable it. With --use_xla=1, XLA is used to fuse several specific subgraphs such as AdamWeightDecayOptimizer. With --use_xla=2, TF tries to fuse the entire graph to the maximum extent possible.
  • It enables AMP via the flag --use_fp16=True (superseding the branch https://github.com/ROCmSoftwarePlatform/bert/tree/enable_AMP).
  • Alternatively, it enables manual fp16 via the flag --manual_fp16=True. This code was lifted straight from NVBERT and has not been tested.
  • It adds continuous logging, so the loss can be monitored in real time (also lifted from NVBERT).
  • It adjusts the evaluation logic.

It has been tested with Horovod + XLA + Adam, with and without fp16, and it appears to work correctly. With 8x MI50, sequence length 128, batch size 10, and 1M total steps (125K per GPU), the final loss is 2.179 +/- 0.003 with fp32 and 2.202 with fp16.

A very recent build of TF may be needed for Horovod and XLA to work together.
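For context, here is a minimal sketch of how flags like --use_xla and --use_fp16 are typically wired into a TF1-style training script. This is not the PR's actual code; the function names and surrounding structure are assumptions, and only the TF APIs shown (global_jit_level, jit_scope, and the mixed-precision graph rewrite) are standard.

```python
# Sketch only: illustrates the usual plumbing for XLA and AMP flags in a
# TF1-style script. Not the code from this PR.
import tensorflow.compat.v1 as tf

def make_session_config(use_xla):
    """Session config for the --use_xla=2 case: auto-cluster the whole graph."""
    config = tf.ConfigProto(allow_soft_placement=True)
    if use_xla == 2:
        # Let TF/XLA auto-cluster and fuse as much of the graph as possible.
        config.graph_options.optimizer_options.global_jit_level = (
            tf.OptimizerOptions.ON_2)
    return config

def build_train_op(grads_and_vars, learning_rate, use_xla, use_fp16):
    """Optimizer path for --use_xla=1 (fuse one subgraph) and --use_fp16 (AMP)."""
    opt = tf.train.AdamOptimizer(learning_rate=learning_rate)
    if use_fp16:
        # Automatic mixed precision: TF rewrites eligible ops to fp16 and adds
        # dynamic loss scaling, while keeping fp32 master weights.
        opt = tf.train.experimental.enable_mixed_precision_graph_rewrite(opt)
    if use_xla == 1:
        # Compile only the optimizer update subgraph with XLA, leaving the
        # rest of the graph untouched.
        with tf.xla.experimental.jit_scope():
            return opt.apply_gradients(grads_and_vars)
    return opt.apply_gradients(grads_and_vars)
```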

Adding support for AMP (FP16)
Adding logging