Segmentation fault using gpu version for large system #7

@ShiminZhang21

Hi WEST team,

I have been having a problem running the GPU version of WEST wstat.x on NERSC Perlmutter. Small systems run without problems, but I cannot get through a single calculation for large systems. No matter which parallelization settings I try, the run crashes at some point with a segmentation fault.

I have attached my test files for a 192-atom ZnO supercell, run with different parallelization settings and npdep=2496.
"Compile_west_gpu_v1.sh" and "Compile_west_gpu_v2.sh" are the two compilation scripts I tried.
"ZnO_wstat_2496/N_ni_" are the parallelization tests, where N = number of GPUs and ni = the -ni parallel setting for wstat.x.
"ZnO_wstat_2496/slurm.out.reports" is a summary of the Slurm error messages. Whenever there is no memory issue, the segmentation fault always appears.
"ZnO_wstat_2496/wstat.out.reports" is a summary of where each wstat.out ends. Some end right at the start, some at 70%.

Besides the ZnO 4x4x3 supercell, I also tested other systems, such as a 161-atom VB- in hBN system. A similar issue appears there.

Do you have any idea how to solve this problem?

Seg_fault.zip
