
Question: TensorFlow BERT fails to run when NCCL is specified #160

@Chenjingliang1

The report at the link below mentions that TensorFlow's BERT does not support NCCL:
https://github.com/Oneflow-Inc/DLPerf/blob/master/reports/dlperf_benchmark_test_report_v1_cn.md

However, this link does give benchmark results: https://github.com/Oneflow-Inc/DLPerf/tree/master/TensorFlow/bert#%E5%A4%9A%E6%9C%BA
Also, at https://github.com/Oneflow-Inc/DLPerf/blob/master/TensorFlow/bert/scripts/single_node_train.sh#L64 I see that @YongtaoShi committed a change that adds the NCCL configuration.

How did you eventually get it to run successfully? I am hitting the same problem: once NCCL is specified, it no longer runs properly.

Thanks!
