Hi WEST team,
I have been having a problem running the GPU version of WEST wstat.x on NERSC Perlmutter. Small systems run without problems, but I cannot get a single calculation to finish for big systems. No matter which parallel settings I try, it crashes at some point with a segmentation fault.
I attached my test files for a 192-atom ZnO supercell, run with different parallel settings and npdep=2496.
"Compile_west_gpu_v1.sh" and "Compile_west_gpu_v2.sh" are the two compilation scripts I tried.
"ZnO_wstat_2496/N_ni_" are the parallel tests, where N is the number of GPUs and ni is the -ni parallel setting for wstat.x.
"ZnO_wstat_2496/slurm.out.reports" collects the Slurm error messages. Whenever a run does not fail on a memory issue, the segmentation fault always appears.
"ZnO_wstat_2496/wstat.out.reports" records where each wstat.out ends. Some end right at the start, some at 70%.
Besides the 4x4x3 ZnO supercell, I also tested other systems, such as a 161-atom VB- in hBN. A similar issue appears there.
Do you have any idea how to solve this problem?