Issue summary
I am training GoogLeNet (v1) on the ImageNet dataset with hipCaffe. For single-GPU (one MI25) training I use a batch size of 128. When I switch to multi-GPU training on four MI25s, the total GPU memory is 4x larger (16 GB x 4), so a batch size of 512 images/batch (128 images/batch/card) should fit. However, in my tests the batch size cannot be increased at all: even 192 (a multiple of 64) fails with "error: 'hipErrorMemoryAllocation' (1002)".
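For reference, the batch size is set in the data layer of the training prototxt. This is a minimal sketch of the relevant layer, assuming the stock models/bvlc_googlenet/train_val.prototxt layout (transform_param and LMDB paths abridged; they may differ from my exact files):

```
# models/bvlc_googlenet/train_val.prototxt -- TRAIN phase data layer (abridged)
layer {
  name: "data"
  type: "Data"
  top: "data"
  top: "label"
  include { phase: TRAIN }
  data_param {
    source: "examples/imagenet/ilsvrc12_train_lmdb"  # path to my ImageNet LMDB
    backend: LMDB
    batch_size: 128  # works on one MI25; larger values on 4x MI25 hit hipErrorMemoryAllocation
  }
}
```

As far as I know, in upstream BVLC Caffe the prototxt batch_size is per GPU, so running with `-gpu 0,1,2,3` should give an effective batch of 4x this value; I assume hipCaffe inherits that behavior.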
Since the batch size is stuck at 128, by rough math the training time on the four MI25 cards will be about 3 to 3.5x longer than the training time on a 4x P100 system (batch_size = 512).
Are there any environment parameters I should set before training that would help enlarge the batch size for multi-GPU training?
I cross-checked on one of my NVIDIA 4x P100 servers, where the batch size can be increased as more cards are used. The batch sizes mentioned above are based on my experience training the same network on the same dataset with NVIDIA P100 (16 GB) and V100 (16 GB) GPUs.
Steps to reproduce
Use the bvlc_googlenet training network shipped under the hipCaffe installation path, with the ImageNet dataset downloaded from the official ImageNet website.
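Roughly the commands I run (a sketch; paths follow the standard Caffe layout under the hipCaffe install and may need adjusting):

```
# Single MI25: trains fine with batch_size 128 in train_val.prototxt
./build/tools/caffe train --solver=models/bvlc_googlenet/solver.prototxt --gpu=0

# 4x MI25: same solver/network; raising batch_size (even to 192) fails with
#   error: 'hipErrorMemoryAllocation' (1002)
./build/tools/caffe train --solver=models/bvlc_googlenet/solver.prototxt --gpu=0,1,2,3
```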
Your system configuration
Operating system: Ubuntu 16.04.3
Compiler: gcc version 5.4.0 20160609 (Ubuntu 5.4.0-6ubuntu1~16.04.5)
CUDA version (if applicable):
CUDNN version (if applicable):
BLAS: USE_ROCBLAS := 1
Python or MATLAB version (for pycaffe and matcaffe respectively): 2.7.12
Other:
miopen-hip 1.1.4
miopengemm 1.1.5
rocm-libs 1.6.180
Server: Inventec P47
GPU: AMD MI25 x4
CPU: AMD EPYC 7601 x2
Memory: 512GB