Description
Summary
When using Kokkos with GPU acceleration and comm device, i.e. running the dynamics on the GPU, LAMMPS produces unstable MD with incorrect energies and a kinetic energy that blows up. The same simulation runs correctly with comm host (running the dynamics on the CPU) or without the KOKKOS package.
This might be an issue in the original Kokkos implementation, but I did not find any similar issues (open or closed) in their repository. I also have not tried to reproduce this with vanilla LAMMPS using an LJ potential.
Environment
- LAMMPS version: installed via conda; consult the metatensor.install file for a reproduction of this environment.
- CUDA version: 12.6
- OS: Ubuntu 20.04
Problem details
Incorrect dynamics with comm device:
lmp -k on g 1 -sf kk -pk kokkos newton on neigh half comm device -in in.LGPS_kokkos
- Kinetic energy explodes (temperature rises from 300 K to over 1500 K within 1 ps)
- Potential energy drifts (~200 eV difference)
Correct dynamics with comm host:
lmp -k on g 1 -sf kk -pk kokkos newton on neigh half comm host -in in.LGPS_kokkos
- Stable temperature evolution
- Correct potential energies (identical to the CPU-only run)
- Nearly numerically identical to the non-Kokkos simulation
Correct dynamics with CPU:
lmp -in in.LGPS_CPU
- This is the control simulation; it gives the same result as the comm host run.
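To put a number on how far the comm device run diverges from the comm host (or CPU) run, one could compare the thermo output of two log files step by step. The following is a minimal Python sketch, assuming both inputs use the same thermo_style and that the columns of interest are named Temp and PotEng; those column names and the helpers parse_thermo and max_abs_difference are hypothetical and not part of LAMMPS.

```python
import sys
from pathlib import Path


def parse_thermo(text):
    """Collect thermo columns from every run block in a LAMMPS log.

    Assumes every run block uses the same thermo_style (same header).
    """
    columns = {}
    header = None
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue
        if tokens[0] == "Step":
            # Start of a thermo block: remember the column names.
            header = tokens
            for name in header:
                columns.setdefault(name, [])
            continue
        if tokens[:2] == ["Loop", "time"]:
            header = None  # end of the current thermo block
            continue
        if header is None or len(tokens) != len(header):
            continue
        try:
            values = [float(t) for t in tokens]
        except ValueError:
            continue  # skip warnings or other non-numeric lines
        for name, value in zip(header, values):
            columns[name].append(value)
    return columns


def max_abs_difference(ref, test, column):
    """Largest |test - ref| for `column` over the steps both logs share."""
    ref_by_step = dict(zip(ref["Step"], ref[column]))
    diffs = [
        abs(value - ref_by_step[step])
        for step, value in zip(test["Step"], test[column])
        if step in ref_by_step
    ]
    if not diffs:
        raise ValueError("the two logs share no thermo steps")
    return max(diffs)


if __name__ == "__main__" and len(sys.argv) == 3:
    ref = parse_thermo(Path(sys.argv[1]).read_text())
    test = parse_thermo(Path(sys.argv[2]).read_text())
    for col in ("Temp", "PotEng"):  # hypothetical column names
        if col in ref and col in test:
            print(f"{col}: max |diff| = {max_abs_difference(ref, test, col):.4g}")
```

Saved as, say, compare_thermo.py (a hypothetical name), it would be called with the reference log first and the suspect log second. Aligning on Step rather than on row index keeps the comparison valid even if one run writes thermo output more often or stops early.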
To reproduce
I have attached the installation instructions to recreate the environment, along with the input/output files for the three types of simulation: non-Kokkos CPU, Kokkos with the dynamics on the CPU (comm host), and Kokkos with the dynamics on the GPU (comm device).
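The three runs can be driven from one script so that each writes its own log file for later comparison. This is a minimal Python sketch, assuming lmp is on the PATH and the attached input files are in the working directory; the log file names and the helpers build_commands and run_all are hypothetical. It relies only on the command-line switches already used above plus the standard LAMMPS -log switch.

```python
import shlex
import subprocess

# Kokkos switches copied from the commands in this report; only "comm" varies.
KOKKOS_ARGS = "-k on g 1 -sf kk -pk kokkos newton on neigh half comm {comm}"

# (label, extra switches, input file) for each of the three variants.
VARIANTS = [
    ("cpu", "", "in.LGPS_CPU"),
    ("kokkos_comm_host", KOKKOS_ARGS.format(comm="host"), "in.LGPS_kokkos"),
    ("kokkos_comm_device", KOKKOS_ARGS.format(comm="device"), "in.LGPS_kokkos"),
]


def build_commands(lmp="lmp"):
    """Return (label, argv) pairs; each run writes a hypothetical log.<label>.lammps."""
    commands = []
    for label, extra, infile in VARIANTS:
        argv = [lmp] + shlex.split(extra) + ["-in", infile, "-log", f"log.{label}.lammps"]
        commands.append((label, argv))
    return commands


def run_all(lmp="lmp", dry_run=True):
    """Print each command; only execute it when dry_run is False."""
    for label, argv in build_commands(lmp):
        print(f"{label}: {shlex.join(argv)}")
        if not dry_run:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    run_all(dry_run=True)
```

Defaulting to a dry run makes it easy to check the exact command lines before launching anything on the GPU; calling run_all(dry_run=False) would execute them in sequence.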
Final note
I personally do not need to run the dynamics on the GPU (with comm device), or even to use Kokkos at all, but I think this bug/behaviour is worth documenting.
I also observed the same behaviour on ALPS, where I compiled LAMMPS myself following the instructions in the metatomic documentation.