Benchmarking of cuDNN convolutions is bugged when Tensor Cores are used. #219

The cuDNN function `cudnnFindConvolutionForwardAlgorithmEx` (used in https://github.com/zdevito/ATen/blob/master/aten/src/ATen/native/cudnn/Conv.cpp#L481) searches for an optimal algorithm that satisfies the provided workspace constraint by varying not only the algorithm type but also the math type. The current ATen code, however, fixes the math type to `CUDNN_TENSOR_OP_MATH` whenever fp16 is used (see https://github.com/zdevito/ATen/blob/master/aten/src/ATen/cudnn/Descriptors.h#L199), irrespective of the math type of the optimal algorithm returned by `cudnnFindConvolutionForwardAlgorithmEx`. The worst consequence of this oversight is that the workspace allocated under the wrong math type assumption violates the actual memory constraint estimated with `getMaxWorkspaceSize`, resulting in a `CUDA Out Of Memory` error.

To solve this issue, one should consider caching the `cudnnConvolutionFwdAlgoPerf_t` structures rather than only the `cudnnConvolutionFwdAlgo_t` values. That way, the correct math type (the `mathType` field of the perf structure) is available and can be properly set before computing the actual workspace size and before running the convolution.
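The caching idea above can be sketched without linking against cuDNN. In the snippet below, `MathType`, `FwdAlgo`, `AlgoPerf`, `BenchmarkCache`, and both helper functions are hypothetical stand-ins, not ATen's actual code; `AlgoPerf` only mirrors the `algo`, `mathType`, and `memory` fields of `cudnnConvolutionFwdAlgoPerf_t`, and the string cache key is an assumption made for brevity.

```cpp
#include <cstddef>
#include <map>
#include <string>

// Stand-ins for the cuDNN enums (hypothetical; real code would use
// cudnnMathType_t and cudnnConvolutionFwdAlgo_t).
enum class MathType { Default, TensorOp };
enum class FwdAlgo { ImplicitGemm, ImplicitPrecompGemm, Winograd };

// Mirrors the fields of cudnnConvolutionFwdAlgoPerf_t that matter here.
struct AlgoPerf {
    FwdAlgo algo;
    MathType mathType;   // math type the benchmark actually selected
    std::size_t memory;  // workspace bytes reported for (algo, mathType)
};

// Cache that stores the whole benchmark result instead of only the algorithm.
// The key is assumed to be a string describing the convolution problem.
class BenchmarkCache {
public:
    void insert(const std::string& key, const AlgoPerf& perf) {
        cache_[key] = perf;
    }

    // Returns true and fills *out on a hit; returns false on a miss.
    bool find(const std::string& key, AlgoPerf* out) const {
        std::map<std::string, AlgoPerf>::const_iterator it = cache_.find(key);
        if (it == cache_.end()) return false;
        *out = it->second;
        return true;
    }

private:
    std::map<std::string, AlgoPerf> cache_;
};

// Behaviour described in the issue: the math type follows the data type only.
MathType mathTypeFromDataType(bool isHalf) {
    return isHalf ? MathType::TensorOp : MathType::Default;
}

// Proposed behaviour: the math type follows the cached benchmark result.
MathType mathTypeForLaunch(const AlgoPerf& perf) {
    return perf.mathType;
}
```

With this layout, a cache hit yields the algorithm, its math type, and its workspace size together, so the workspace query and the convolution launch can use the same math type the benchmark selected. For an fp16 problem whose best result used the default math type, `mathTypeFromDataType(true)` and `mathTypeForLaunch(perf)` disagree, which is the mismatch described above.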

The issue is not limited to the forward pass; it potentially also affects `BwdData` and `BwdFilter`.
