
@laraPPr
Collaborator

@laraPPr laraPPr commented Dec 12, 2025

No description provided.

@laraPPr laraPPr added the 2025.06-software.eessi.io 2025.06 version of software.eessi.io label Dec 12, 2025
@laraPPr
Collaborator Author

laraPPr commented Dec 12, 2025

bot: build repo:eessi.io-2025.06-software instance:eessi-bot-mc-aws for:arch=aarch64/neoverse_n1

@eessi-bot-aws

eessi-bot-aws bot commented Dec 12, 2025

New job on instance eessi-bot-mc-aws for repository eessi.io-2025.06-software
Building on: neoverse_n1
Building for: aarch64/neoverse_n1
Job dir: /project/def-users/SHARED/jobs/2025.12/pr_1335/112381

date job status comment
Dec 12 12:55:18 UTC 2025 submitted job id 112381 awaits release by job manager
Dec 12 12:56:11 UTC 2025 released job awaits launch by Slurm scheduler
Dec 12 13:06:14 UTC 2025 running job 112381 is running
Dec 12 14:02:23 UTC 2025 finished
😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-112381.out
✅ no message matching FATAL:
❌ found message matching ERROR:
❌ found message matching FAILED:
❌ found message matching required modules missing:
❌ no message matching No missing installations
✅ found message matching .tar.* created!
Artefacts
eessi-2025.06-software-linux-aarch64-neoverse_n1-17655480150.tar.zst size: 44 MiB (46437129 bytes)
entries: 1832
modules under 2025.06/software/linux/aarch64/neoverse_n1/modules/all
archspec/0.2.5-GCCcore-13.3.0.lua
kim-api/2.4.1-GCC-13.3.0.lua
MDI/1.4.26-gompi-2024a.lua
mpi4py/4.0.1-gompi-2024a.lua
PLUMED/2.9.3-foss-2024a.lua
ScaFaCoS/1.0.4-foss-2024a.lua
tbb/2021.13.0-GCCcore-13.3.0.lua
Voro++/0.4.6-GCCcore-13.3.0.lua
xxd/9.1.1275-GCCcore-13.3.0.lua
software under 2025.06/software/linux/aarch64/neoverse_n1/software
archspec/0.2.5-GCCcore-13.3.0
kim-api/2.4.1-GCC-13.3.0
MDI/1.4.26-gompi-2024a
mpi4py/4.0.1-gompi-2024a
PLUMED/2.9.3-foss-2024a
ScaFaCoS/1.0.4-foss-2024a
tbb/2021.13.0-GCCcore-13.3.0
Voro++/0.4.6-GCCcore-13.3.0
xxd/9.1.1275-GCCcore-13.3.0
reprod directories under 2025.06/software/linux/aarch64/neoverse_n1/reprod
archspec/0.2.5-GCCcore-13.3.0/20251212_130702UTC
kim-api/2.4.1-GCC-13.3.0/20251212_130934UTC
MDI/1.4.26-gompi-2024a/20251212_131023UTC
mpi4py/4.0.1-gompi-2024a/20251212_132422UTC
PLUMED/2.9.3-foss-2024a/20251212_131747UTC
ScaFaCoS/1.0.4-foss-2024a/20251212_133954UTC
tbb/2021.13.0-GCCcore-13.3.0/20251212_132813UTC
Voro++/0.4.6-GCCcore-13.3.0/20251212_130719UTC
xxd/9.1.1275-GCCcore-13.3.0/20251212_131032UTC
other under 2025.06/software/linux/aarch64/neoverse_n1
no other files in tarball
Dec 12 14:02:23 UTC 2025 test result
😁 SUCCESS (click triangle for details)
ReFrame Summary
[ OK ] (1/4) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_allreduce %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a %scale=1_node %device_type=cpu /e4bf9965 @BotBuildTests:aarch64_neoverse_n1+default
P: latency: 1.93 us (r:0, l:None, u:None)
[ OK ] (2/4) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_alltoall %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a %scale=1_node %device_type=cpu /3da4890b @BotBuildTests:aarch64_neoverse_n1+default
P: latency: 5.64 us (r:0, l:None, u:None)
[ OK ] (3/4) EESSI_OSU_pt2pt_CPU %benchmark_info=mpi.pt2pt.osu_latency %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a %scale=1_node /3255009a @BotBuildTests:aarch64_neoverse_n1+default
P: latency: 0.29 us (r:0, l:None, u:None)
[ OK ] (4/4) EESSI_OSU_pt2pt_CPU %benchmark_info=mpi.pt2pt.osu_bw %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a %scale=1_node /59f4b331 @BotBuildTests:aarch64_neoverse_n1+default
P: bandwidth: 16029.27 MB/s (r:0, l:None, u:None)
[ PASSED ] Ran 4/4 test case(s) from 4 check(s) (0 failure(s), 0 skipped, 0 aborted)
Details
✅ job output file slurm-112381.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

@laraPPr
Collaborator Author

laraPPr commented Dec 12, 2025

bot: build repo:eessi.io-2025.06-software instance:eessi-bot-mc-aws for:arch=x86_64/amd/zen4

@eessi-bot-aws

eessi-bot-aws bot commented Dec 12, 2025

New job on instance eessi-bot-mc-aws for repository eessi.io-2025.06-software
Building on: amd-zen4
Building for: x86_64/amd/zen4
Job dir: /project/def-users/SHARED/jobs/2025.12/pr_1335/112382

date job status comment
Dec 12 14:03:38 UTC 2025 submitted job id 112382 awaits release by job manager
Dec 12 14:04:26 UTC 2025 released job awaits launch by Slurm scheduler
Dec 12 14:10:29 UTC 2025 running job 112382 is running
Dec 12 15:00:34 UTC 2025 finished
😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-112382.out
✅ no message matching FATAL:
❌ found message matching ERROR:
❌ found message matching FAILED:
❌ found message matching required modules missing:
❌ no message matching No missing installations
✅ found message matching .tar.* created!
Artefacts
eessi-2025.06-software-linux-x86_64-amd-zen4-17655515380.tar.zst size: 45 MiB (48205632 bytes)
entries: 1832
modules under 2025.06/software/linux/x86_64/amd/zen4/modules/all
archspec/0.2.5-GCCcore-13.3.0.lua
kim-api/2.4.1-GCC-13.3.0.lua
MDI/1.4.26-gompi-2024a.lua
mpi4py/4.0.1-gompi-2024a.lua
PLUMED/2.9.3-foss-2024a.lua
ScaFaCoS/1.0.4-foss-2024a.lua
tbb/2021.13.0-GCCcore-13.3.0.lua
Voro++/0.4.6-GCCcore-13.3.0.lua
xxd/9.1.1275-GCCcore-13.3.0.lua
software under 2025.06/software/linux/x86_64/amd/zen4/software
archspec/0.2.5-GCCcore-13.3.0
kim-api/2.4.1-GCC-13.3.0
MDI/1.4.26-gompi-2024a
mpi4py/4.0.1-gompi-2024a
PLUMED/2.9.3-foss-2024a
ScaFaCoS/1.0.4-foss-2024a
tbb/2021.13.0-GCCcore-13.3.0
Voro++/0.4.6-GCCcore-13.3.0
xxd/9.1.1275-GCCcore-13.3.0
reprod directories under 2025.06/software/linux/x86_64/amd/zen4/reprod
archspec/0.2.5-GCCcore-13.3.0/20251212_141126UTC
kim-api/2.4.1-GCC-13.3.0/20251212_141343UTC
MDI/1.4.26-gompi-2024a/20251212_141434UTC
mpi4py/4.0.1-gompi-2024a/20251212_142525UTC
PLUMED/2.9.3-foss-2024a/20251212_142308UTC
ScaFaCoS/1.0.4-foss-2024a/20251212_143815UTC
tbb/2021.13.0-GCCcore-13.3.0/20251212_142831UTC
Voro++/0.4.6-GCCcore-13.3.0/20251212_141139UTC
xxd/9.1.1275-GCCcore-13.3.0/20251212_141439UTC
other under 2025.06/software/linux/x86_64/amd/zen4
no other files in tarball
Dec 12 15:00:34 UTC 2025 test result
😁 SUCCESS (click triangle for details)
ReFrame Summary
[ OK ] (1/4) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_allreduce %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a %scale=1_node %device_type=cpu /e4bf9965 @BotBuildTests:x86_64_amd_zen4+default
P: latency: 1.41 us (r:0, l:None, u:None)
[ OK ] (2/4) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_alltoall %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a %scale=1_node %device_type=cpu /3da4890b @BotBuildTests:x86_64_amd_zen4+default
P: latency: 3.29 us (r:0, l:None, u:None)
[ OK ] (3/4) EESSI_OSU_pt2pt_CPU %benchmark_info=mpi.pt2pt.osu_latency %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a %scale=1_node /3255009a @BotBuildTests:x86_64_amd_zen4+default
P: latency: 0.17 us (r:0, l:None, u:None)
[ OK ] (4/4) EESSI_OSU_pt2pt_CPU %benchmark_info=mpi.pt2pt.osu_bw %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a %scale=1_node /59f4b331 @BotBuildTests:x86_64_amd_zen4+default
P: bandwidth: 14176.12 MB/s (r:0, l:None, u:None)
[ PASSED ] Ran 4/4 test case(s) from 4 check(s) (0 failure(s), 0 skipped, 0 aborted)
Details
✅ job output file slurm-112382.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

@laraPPr
Collaborator Author

laraPPr commented Dec 12, 2025

@lorisercole We have tested this up and down, but clearly not enough. It failed during the make step with:

/cvmfs/software.eessi.io/versions/2025.06/compat/linux/aarch64/usr/bin/ld: /home/larappr/eessi/versions/2025.06/software/linux/aarch64/neoverse_n1/software/ScaFaCoS/1.0.4-foss-2024a/lib64/libfcs_fmm.so.0: undefined reference to `armci_die'

/cvmfs/software.eessi.io/versions/2025.06/compat/linux/aarch64/usr/bin/ld: /home/larappr/eessi/versions/2025.06/software/linux/aarch64/neoverse_n1/software/ScaFaCoS/1.0.4-foss-2024a/lib64/libfcs_fmm.so.0: undefined reference to `atomic_fetch_and_add'

collect2: error: ld returned 1 exit status
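A quick way to see which symbols libfcs_fmm.so.0 expects another library to provide is to list its undefined dynamic symbols with GNU nm (nm -D --undefined-only). The helper below is a minimal sketch written for this thread, not part of EasyBuild or EESSI; the function names are hypothetical and it assumes GNU binutils' nm is on PATH.

```python
import subprocess
from typing import List


def undefined_symbols(nm_output: str) -> List[str]:
    """Return the names nm marks as undefined ('U') in its text output.

    Undefined entries have no address column, so after splitting on
    whitespace such a line is just ["U", "<name>"].
    """
    symbols = []
    for line in nm_output.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] == "U":
            symbols.append(parts[1])
    return symbols


def library_undefined_symbols(path: str) -> List[str]:
    # Hypothetical convenience wrapper: -D reads the dynamic symbol table,
    # --undefined-only restricts the listing to unresolved symbols.
    result = subprocess.run(
        ["nm", "-D", "--undefined-only", path],
        capture_output=True,
        text=True,
        check=True,
    )
    return undefined_symbols(result.stdout)
```

Pointed at the libfcs_fmm.so.0 from the log above, the result would be expected to contain armci_die and atomic_fetch_and_add; the library that is meant to define them then has to end up on the LAMMPS link line.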

@lorisercole

@laraPPr is this only a problem with ScaFaCoS on aarch64?

@laraPPr
Copy link
Collaborator Author

laraPPr commented Dec 15, 2025

The error above is on aarch64, and it looks Arm-specific. I'll have to check what happened with the zen4 build.

@laraPPr
Copy link
Collaborator Author

laraPPr commented Dec 15, 2025

With zen4, the sanity checks failed:

 l=lammps(); l.file("/cvmfs/software.eessi.io/versions/2025.06/software/linux/x86_64/amd/zen4/software/LAMMPS/22Jul2025-foss-2024a-kokkos/examples/atm/in.atm"); l.finalize()'' ...

  >> result for command 'cd /tmp/eb-_zihpb_x/eb-bxtvzvor/tmpgxc01u41 && mpirun -n 1 python -c 'from lammps import lammps; l=lammps(); l.file("/cvmfs/software.eessi.io/versions/2025.06/software/linux/x86_64/amd/zen4/software/LAMMPS/22Jul2025-foss-2024a-kokkos/examples/atm/in.atm"); l.finalize()'': FAILED

  >> running command 'cd /tmp/eb-_zihpb_x/eb-bxtvzvor/tmpgxc01u41 && mpirun -n 1 python -c 'from lammps import lammps; l=lammps(); l.file("/cvmfs/software.eessi.io/versions/2025.06/software/linux/x86_64/amd/zen4/software/LAMMPS/22Jul2025-foss-2024a-kokkos/examples/balance/in.balance"); l.finalize()'' ...

  >> result for command 'cd /tmp/eb-_zihpb_x/eb-bxtvzvor/tmpgxc01u41 && mpirun -n 1 python -c 'from lammps import lammps; l=lammps(); l.file("/cvmfs/software.eessi.io/versions/2025.06/software/linux/x86_64/amd/zen4/software/LAMMPS/22Jul2025-foss-2024a-kokkos/examples/balance/in.balance"); l.finalize()'': FAILED

  >> running command 'cd /tmp/eb-_zihpb_x/eb-bxtvzvor/tmpgxc01u41 && mpirun -n 1 python -c 'from lammps import lammps; l=lammps(); l.file("/cvmfs/software.eessi.io/versions/2025.06/software/linux/x86_64/amd/zen4/software/LAMMPS/22Jul2025-foss-2024a-kokkos/examples/colloid/in.colloid"); l.finalize()'' ...

  >> result for command 'cd /tmp/eb-_zihpb_x/eb-bxtvzvor/tmpgxc01u41 && mpirun -n 1 python -c 'from lammps import lammps; l=lammps(); l.file("/cvmfs/software.eessi.io/versions/2025.06/software/linux/x86_64/amd/zen4/software/LAMMPS/22Jul2025-foss-2024a-kokkos/examples/colloid/in.colloid"); l.finalize()'': FAILED

  >> running command 'cd /tmp/eb-_zihpb_x/eb-bxtvzvor/tmpgxc01u41 && mpirun -n 1 python -c 'from lammps import lammps; l=lammps(); l.file("/cvmfs/software.eessi.io/versions/2025.06/software/linux/x86_64/amd/zen4/software/LAMMPS/22Jul2025-foss-2024a-kokkos/examples/crack/in.crack"); l.finalize()'' ...

  >> result for command 'cd /tmp/eb-_zihpb_x/eb-bxtvzvor/tmpgxc01u41 && mpirun -n 1 python -c 'from lammps import lammps; l=lammps(); l.file("/cvmfs/software.eessi.io/versions/2025.06/software/linux/x86_64/amd/zen4/software/LAMMPS/22Jul2025-foss-2024a-kokkos/examples/crack/in.crack"); l.finalize()'': FAILED

  >> running command 'cd /tmp/eb-_zihpb_x/eb-bxtvzvor/tmpgxc01u41 && mpirun -n 1 python -c 'from lammps import lammps; l=lammps(); l.file("/cvmfs/software.eessi.io/versions/2025.06/software/linux/x86_64/amd/zen4/software/LAMMPS/22Jul2025-foss-2024a-kokkos/examples/dipole/in.dipole"); l.finalize()'' ...

  >> result for command 'cd /tmp/eb-_zihpb_x/eb-bxtvzvor/tmpgxc01u41 && mpirun -n 1 python -c 'from lammps import lammps; l=lammps(); l.file("/cvmfs/software.eessi.io/versions/2025.06/software/linux/x86_64/amd/zen4/software/LAMMPS/22Jul2025-foss-2024a-kokkos/examples/dipole/in.dipole"); l.finalize()'': FAILED

  >> running command 'cd /tmp/eb-_zihpb_x/eb-bxtvzvor/tmpgxc01u41 && mpirun -n 1 python -c 'from lammps import lammps; l=lammps(); l.file("/cvmfs/software.eessi.io/versions/2025.06/software/linux/x86_64/amd/zen4/software/LAMMPS/22Jul2025-foss-2024a-kokkos/examples/friction/in.friction"); l.finalize()'' ...

  >> result for command 'cd /tmp/eb-_zihpb_x/eb-bxtvzvor/tmpgxc01u41 && mpirun -n 1 python -c 'from lammps import lammps; l=lammps(); l.file("/cvmfs/software.eessi.io/versions/2025.06/software/linux/x86_64/amd/zen4/software/LAMMPS/22Jul2025-foss-2024a-kokkos/examples/friction/in.friction"); l.finalize()'': FAILED

  >> running command 'cd /tmp/eb-_zihpb_x/eb-bxtvzvor/tmpgxc01u41 && mpirun -n 1 python -c 'from lammps import lammps; l=lammps(); l.file("/cvmfs/software.eessi.io/versions/2025.06/software/linux/x86_64/amd/zen4/software/LAMMPS/22Jul2025-foss-2024a-kokkos/examples/hugoniostat/in.hugoniostat"); l.finalize()'' ...

  >> result for command 'cd /tmp/eb-_zihpb_x/eb-bxtvzvor/tmpgxc01u41 && mpirun -n 1 python -c 'from lammps import lammps; l=lammps(); l.file("/cvmfs/software.eessi.io/versions/2025.06/software/linux/x86_64/amd/zen4/software/LAMMPS/22Jul2025-foss-2024a-kokkos/examples/hugoniostat/in.hugoniostat"); l.finalize()'': FAILED

  >> running command 'cd /tmp/eb-_zihpb_x/eb-bxtvzvor/tmpgxc01u41 && mpirun -n 1 python -c 'from lammps import lammps; l=lammps(); l.file("/cvmfs/software.eessi.io/versions/2025.06/software/linux/x86_64/amd/zen4/software/LAMMPS/22Jul2025-foss-2024a-kokkos/examples/indent/in.indent"); l.finalize()'' ...

  >> result for command 'cd /tmp/eb-_zihpb_x/eb-bxtvzvor/tmpgxc01u41 && mpirun -n 1 python -c 'from lammps import lammps; l=lammps(); l.file("/cvmfs/software.eessi.io/versions/2025.06/software/linux/x86_64/amd/zen4/software/LAMMPS/22Jul2025-foss-2024a-kokkos/examples/indent/in.indent"); l.finalize()'': FAILED

  >> running command 'cd /tmp/eb-_zihpb_x/eb-bxtvzvor/tmpgxc01u41 && mpirun -n 1 python -c 'from lammps import lammps; l=lammps(); l.file("/cvmfs/software.eessi.io/versions/2025.06/software/linux/x86_64/amd/zen4/software/LAMMPS/22Jul2025-foss-2024a-kokkos/examples/melt/in.melt"); l.finalize()'' ...

  >> result for command 'cd /tmp/eb-_zihpb_x/eb-bxtvzvor/tmpgxc01u41 && mpirun -n 1 python -c 'from lammps import lammps; l=lammps(); l.file("/cvmfs/software.eessi.io/versions/2025.06/software/linux/x86_64/amd/zen4/software/LAMMPS/22Jul2025-foss-2024a-kokkos/examples/melt/in.melt"); l.finalize()'': FAILED

@laraPPr
Copy link
Collaborator Author

laraPPr commented Dec 15, 2025

[x86-64-amd-zen4-node1:192127:0:192127] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x10)

==== backtrace (tid: 192127) ====

 0  /cvmfs/software.eessi.io/versions/2025.06/software/linux/x86_64/amd/zen4/software/UCX/1.16.0-GCCcore-13.3.0/lib64/libucs.so.0(ucs_handle_error+0x2b4) [0x153af4a23ba4]

 1  /cvmfs/software.eessi.io/versions/2025.06/software/linux/x86_64/amd/zen4/software/UCX/1.16.0-GCCcore-13.3.0/lib64/libucs.so.0(+0x2bd79) [0x153af4a23d79]

 2  /cvmfs/software.eessi.io/versions/2025.06/software/linux/x86_64/amd/zen4/software/UCX/1.16.0-GCCcore-13.3.0/lib64/libucs.so.0(+0x2bf2a) [0x153af4a23f2a]

 3  /cvmfs/software.eessi.io/versions/2025.06/compat/linux/x86_64/lib/../lib64/libc.so.6(+0x3be10) [0x153b03ad2e10]

 4  /cvmfs/software.eessi.io/versions/2025.06/software/linux/x86_64/amd/zen4/software/Python/3.12.3-GCCcore-13.3.0/lib/libpython3.12.so.1.0(Py_FinalizeEx+0x41) [0x153b03fe5551]

 5  /cvmfs/software.eessi.io/versions/2025.06/software/linux/x86_64/amd/zen4/software/libffi/3.4.5-GCCcore-13.3.0/lib64/libffi.so.8(+0x709a) [0x153b0341509a]

 6  /cvmfs/software.eessi.io/versions/2025.06/software/linux/x86_64/amd/zen4/software/libffi/3.4.5-GCCcore-13.3.0/lib64/libffi.so.8(+0x65f5) [0x153b034145f5]

 7  /cvmfs/software.eessi.io/versions/2025.06/software/linux/x86_64/amd/zen4/software/libffi/3.4.5-GCCcore-13.3.0/lib64/libffi.so.8(ffi_call+0xbd) [0x153b03414c7d]

 8  /cvmfs/software.eessi.io/versions/2025.06/software/linux/x86_64/amd/zen4/software/Python/3.12.3-GCCcore-13.3.0/lib/python3.12/lib-dynload/_ctypes.cpython-312-x86_64-linux-gnu.so(+0x987d) [0x153b0342387d]

 9  /cvmfs/software.eessi.io/versions/2025.06/software/linux/x86_64/amd/zen4/software/Python/3.12.3-GCCcore-13.3.0/lib/python3.12/lib-dynload/_ctypes.cpython-312-x86_64-linux-gnu.so(+0x9188) [0x153b03423188]

10  /cvmfs/software.eessi.io/versions/2025.06/software/linux/x86_64/amd/zen4/software/Python/3.12.3-GCCcore-13.3.0/lib/libpython3.12.so.1.0(_PyObject_MakeTpCall+0x6b) [0x153b03f4593b]

11  /cvmfs/software.eessi.io/versions/2025.06/software/linux/x86_64/amd/zen4/software/Python/3.12.3-GCCcore-13.3.0/lib/libpython3.12.so.1.0(+0x10e9f2) [0x153b03e6c9f2]

12  /cvmfs/software.eessi.io/versions/2025.06/software/linux/x86_64/amd/zen4/software/Python/3.12.3-GCCcore-13.3.0/lib/libpython3.12.so.1.0(PyEval_EvalCode+0xa9) [0x153b03fc47a9]

13  /cvmfs/software.eessi.io/versions/2025.06/software/linux/x86_64/amd/zen4/software/Python/3.12.3-GCCcore-13.3.0/lib/libpython3.12.so.1.0(+0x28a777) [0x153b03fe8777]

14  /cvmfs/software.eessi.io/versions/2025.06/software/linux/x86_64/amd/zen4/software/Python/3.12.3-GCCcore-13.3.0/lib/libpython3.12.so.1.0(+0x2854ab) [0x153b03fe34ab]

15  /cvmfs/software.eessi.io/versions/2025.06/software/linux/x86_64/amd/zen4/software/Python/3.12.3-GCCcore-13.3.0/lib/libpython3.12.so.1.0(PyRun_StringFlags+0x69) [0x153b03fd6479]

16  /cvmfs/software.eessi.io/versions/2025.06/software/linux/x86_64/amd/zen4/software/Python/3.12.3-GCCcore-13.3.0/lib/libpython3.12.so.1.0(PyRun_SimpleStringFlags+0x3c) [0x153b03fd62dc]

17  /cvmfs/software.eessi.io/versions/2025.06/software/linux/x86_64/amd/zen4/software/Python/3.12.3-GCCcore-13.3.0/lib/libpython3.12.so.1.0(Py_RunMain+0x444) [0x153b03ff3d74]

18  /cvmfs/software.eessi.io/versions/2025.06/software/linux/x86_64/amd/zen4/software/Python/3.12.3-GCCcore-13.3.0/lib/libpython3.12.so.1.0(Py_BytesMain+0x27) [0x153b03fadae7]

19  /cvmfs/software.eessi.io/versions/2025.06/compat/linux/x86_64/lib/../lib64/libc.so.6(+0x26c3a) [0x153b03abdc3a]

20  /cvmfs/software.eessi.io/versions/2025.06/compat/linux/x86_64/lib/../lib64/libc.so.6(__libc_start_main+0x85) [0x153b03abdcf5]

21  python(_start+0x21) [0x401061]

=================================

--------------------------------------------------------------------------

prterun noticed that process rank 0 with PID 192127 on node x86-64-amd-zen4-node1 exited on

signal 11 (Segmentation fault).

--------------------------------------------------------------------------

)

@ocaisa
Member

ocaisa commented Dec 15, 2025

IIRC ScaFaCoS fails to build on Arm; I thought at some point we made it an x86-only dependency. I tracked down #384 but can't see anything in the hooks anymore.

@lorisercole

IIRC ScaFaCoS fails to build on Arm; I thought at some point we made it an x86-only dependency. I tracked down #384 but can't see anything in the hooks anymore.

We removed the x86-only condition in the latest PR; see this comment:
easybuilders/easybuild-easyconfigs#23719 (comment)
I could not test it myself, though.

@lorisercole

@laraPPr for the zen4 builds, is that everything we can see from the log files? It's not clear to me whether MPI managed to start LAMMPS and it failed at the end of the run, or whether it crashed immediately (in which case it would look more like an MPI-related issue).
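Assuming the output of the failing sanity-check command is captured, one rough way to tell these two cases apart is to look for the "Loop time of ..." line that LAMMPS prints when a run command finishes. The function below is a hypothetical triage helper sketched for this thread, not something the LAMMPS easyblock does.

```python
def classify_lammps_failure(returncode: int, output: str) -> str:
    """Rough triage of a LAMMPS sanity-check command from its exit code and log.

    Assumption: LAMMPS prints "Loop time of ..." after each completed run
    command, so its presence means the simulation itself got that far.
    """
    if returncode == 0:
        return "ok"
    if "Loop time of" in output:
        # The physics finished; the crash happened afterwards (e.g. at finalize).
        return "crashed after a run completed"
    # No run finished: startup (e.g. MPI initialisation) or mid-run failure.
    return "crashed before any run completed"
```

A nonzero exit code together with a "Loop time of" line points towards a crash during shutdown rather than an MPI startup problem.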

@laraPPr
Collaborator Author

laraPPr commented Dec 15, 2025

For zen4 we can't see more, but I'm building it interactively to check what goes wrong.

@laraPPr
Copy link
Collaborator Author

laraPPr commented Dec 15, 2025

It does not crash immediately when I run the sanity check, so I think LAMMPS is actually running.

@laraPPr
Copy link
Collaborator Author

laraPPr commented Dec 15, 2025

@lorisercole I just built it while ignoring the sanity check and tested it. It seems to be a problem with the Python bindings, because without Python the tests run without any problems. I ran lmp -in in.lj and mpirun -n lmp -in in.lj.

When running mpirun -n 1 python -c 'from lammps import lammps; l=lammps(); l.file("in.lj"); l.finalize()', the run finishes and the results look OK, but it still ends with the segmentation fault.

@lorisercole

When running mpirun -n 1 python -c 'from lammps import lammps; l=lammps(); l.file("in.lj"); l.finalize()', the run finishes and the results look OK, but it still ends with the segmentation fault.

The segfault is caused by the l.finalize() call.
This is something I fixed in the latest LAMMPS easyblock PR:
https://github.com/easybuilders/easybuild-easyblocks/blob/bef765e964e70c6bdcad32f53b5ec24b4ebb28ee/easybuild/easyblocks/l/lammps.py#L769-L771

…24a.yml

Co-authored-by: Loris Ercole <30901257+lorisercole@users.noreply.github.com>
@laraPPr
Collaborator Author

laraPPr commented Dec 16, 2025

@lorisercole Can you check whether the last change is still in the easyblock, or whether the merge sync removed it? I took the last commit of the PR.

@laraPPr
Collaborator Author

laraPPr commented Dec 16, 2025

bot: build repo:eessi.io-2025.06-software instance:eessi-bot-mc-aws for:arch=x86_64/amd/zen4

@eessi-bot-aws

eessi-bot-aws bot commented Dec 16, 2025

New job on instance eessi-bot-mc-aws for repository eessi.io-2025.06-software
Building on: amd-zen4
Building for: x86_64/amd/zen4
Job dir: /project/def-users/SHARED/jobs/2025.12/pr_1335/113483

date job status comment
Dec 16 11:22:44 UTC 2025 submitted job id 113483 awaits release by job manager
Dec 16 11:23:30 UTC 2025 released job awaits launch by Slurm scheduler
Dec 16 11:29:41 UTC 2025 running job 113483 is running
Dec 16 11:32:49 UTC 2025 finished
😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-113483.out
✅ no message matching FATAL:
❌ found message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
❌ no message matching No missing installations
✅ found message matching .tar.* created!
Artefacts
eessi-2025.06-software-linux-x86_64-amd-zen4-17658846710.tar.zst size: 0 MiB (22 bytes)
entries: 0
modules under 2025.06/software/linux/x86_64/amd/zen4/modules/all
no module files in tarball
software under 2025.06/software/linux/x86_64/amd/zen4/software
no software packages in tarball
reprod directories under 2025.06/software/linux/x86_64/amd/zen4/reprod
no reprod directories in tarball
other under 2025.06/software/linux/x86_64/amd/zen4
no other files in tarball
Dec 16 11:32:49 UTC 2025 test result
😁 SUCCESS (click triangle for details)
ReFrame Summary
[ OK ] (1/4) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_allreduce %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a %scale=1_node %device_type=cpu /e4bf9965 @BotBuildTests:x86_64_amd_zen4+default
P: latency: 3.84 us (r:0, l:None, u:None)
[ OK ] (2/4) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_alltoall %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a %scale=1_node %device_type=cpu /3da4890b @BotBuildTests:x86_64_amd_zen4+default
P: latency: 3.22 us (r:0, l:None, u:None)
[ OK ] (3/4) EESSI_OSU_pt2pt_CPU %benchmark_info=mpi.pt2pt.osu_latency %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a %scale=1_node /3255009a @BotBuildTests:x86_64_amd_zen4+default
P: latency: 0.16 us (r:0, l:None, u:None)
[ OK ] (4/4) EESSI_OSU_pt2pt_CPU %benchmark_info=mpi.pt2pt.osu_bw %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a %scale=1_node /59f4b331 @BotBuildTests:x86_64_amd_zen4+default
P: bandwidth: 14520.71 MB/s (r:0, l:None, u:None)
[ PASSED ] Ran 4/4 test case(s) from 4 check(s) (0 failure(s), 0 skipped, 0 aborted)
Details
✅ job output file slurm-113483.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

# See https://github.com/easybuilders/easybuild-easyconfigs/pull/23719
from-commit: 379c0f01a29109a3c3db1e1807838f50074af143
# See https://github.com/easybuilders/easybuild-easyblocks/pull/3894
include-easyblocks-from-commit: 433d723ceb5492ab22bf45566e40296d01db87c2
Collaborator Author


I was too quick in applying this change. I also cannot find this commit in the PR: https://github.com/easybuilders/easybuild-easyblocks/pull/3894/commits


It's the final merge commit
easybuilders/easybuild-easyblocks@433d723

@laraPPr
Collaborator Author

laraPPr commented Dec 16, 2025

bot: build repo:eessi.io-2025.06-software instance:eessi-bot-mc-aws for:arch=x86_64/amd/zen4

@eessi-bot-aws

eessi-bot-aws bot commented Dec 16, 2025

New job on instance eessi-bot-mc-aws for repository eessi.io-2025.06-software
Building on: amd-zen4
Building for: x86_64/amd/zen4
Job dir: /project/def-users/SHARED/jobs/2025.12/pr_1335/113484

date job status comment
Dec 16 12:12:09 UTC 2025 submitted job id 113484 awaits release by job manager
Dec 16 12:12:44 UTC 2025 released job awaits launch by Slurm scheduler
Dec 16 12:13:50 UTC 2025 running job 113484 is running
Dec 16 12:16:58 UTC 2025 finished
😢 FAILURE (click triangle for details)
Details
✅ job output file slurm-113484.out
✅ no message matching FATAL:
❌ found message matching ERROR:
✅ no message matching FAILED:
✅ no message matching required modules missing:
❌ no message matching No missing installations
✅ found message matching .tar.* created!
Artefacts
eessi-2025.06-software-linux-x86_64-amd-zen4-17658873190.tar.zst size: 0 MiB (22 bytes)
entries: 0
modules under 2025.06/software/linux/x86_64/amd/zen4/modules/all
no module files in tarball
software under 2025.06/software/linux/x86_64/amd/zen4/software
no software packages in tarball
reprod directories under 2025.06/software/linux/x86_64/amd/zen4/reprod
no reprod directories in tarball
other under 2025.06/software/linux/x86_64/amd/zen4
no other files in tarball
Dec 16 12:16:58 UTC 2025 test result
😁 SUCCESS (click triangle for details)
ReFrame Summary
[ OK ] (1/4) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_allreduce %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a %scale=1_node %device_type=cpu /e4bf9965 @BotBuildTests:x86_64_amd_zen4+default
P: latency: 1.42 us (r:0, l:None, u:None)
[ OK ] (2/4) EESSI_OSU_coll %benchmark_info=mpi.collective.osu_alltoall %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a %scale=1_node %device_type=cpu /3da4890b @BotBuildTests:x86_64_amd_zen4+default
P: latency: 3.19 us (r:0, l:None, u:None)
[ OK ] (3/4) EESSI_OSU_pt2pt_CPU %benchmark_info=mpi.pt2pt.osu_latency %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a %scale=1_node /3255009a @BotBuildTests:x86_64_amd_zen4+default
P: latency: 0.15 us (r:0, l:None, u:None)
[ OK ] (4/4) EESSI_OSU_pt2pt_CPU %benchmark_info=mpi.pt2pt.osu_bw %module_name=OSU-Micro-Benchmarks/7.5-gompi-2025a %scale=1_node /59f4b331 @BotBuildTests:x86_64_amd_zen4+default
P: bandwidth: 14511.6 MB/s (r:0, l:None, u:None)
[ PASSED ] Ran 4/4 test case(s) from 4 check(s) (0 failure(s), 0 skipped, 0 aborted)
Details
✅ job output file slurm-113484.out
❌ found message matching ERROR:
✅ no message matching [\s*FAILED\s*].*Ran .* test case

@laraPPr
Collaborator Author

laraPPr commented Dec 16, 2025

We'll have to wait for the next EasyBuild release

@laraPPr
Collaborator Author

laraPPr commented Dec 16, 2025

@laraPPr is this only a problem with ScaFaCoS on aarch64?

Had a little chat with an AI friend, and it suggested trying to build with -DLAMMPS_EXTRA_LIBS="-latomic -larmci". I don't think it is a ScaFaCoS problem, since ScaFaCoS itself built and passed its sanity check. I think I'll test this in dev.eessi.io.

@laraPPr
Copy link
Collaborator Author

laraPPr commented Dec 17, 2025

Ah, there is an open issue on this: scafacos/scafacos#42

@laraPPr
Collaborator Author

laraPPr commented Dec 17, 2025

So, two options:

  1. Resolve this problem in ScaFaCoS
  2. Go back to removing ScaFaCoS as a dependency (then LAMMPS builds its own ScaFaCoS)

@laraPPr
Collaborator Author

laraPPr commented Dec 17, 2025

Ah no, the internal ScaFaCoS is not built on Arm.
ARM:

{EESSI 2023.06} [larappr@aarch64-neoverse-n1-node1 2Aug2023_update2-foss-2023a-kokkos]$ mpirun -n 1 lmp -h | grep SCAFACOS

cascadelake:

{EESSI 2023.06} [vsc00000@nod4004 ]$ mpirun -n 1 lmp -h | grep SCAFACOS
RIGID SCAFACOS SHOCK SMTBQ SPH SPIN SRD TALLY UEF VORONOI VTK YAFF

@laraPPr
Copy link
Collaborator Author

laraPPr commented Dec 17, 2025

Maybe we should add a sanity check that parses general_packages and checks the lmp -h output to see whether all of them are included?
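A minimal sketch of what such a check could look like, assuming the lmp -h output lists the enabled packages as whitespace-separated upper-case names under an "Installed packages:" header, ending at the first blank line after the names (as the grep SCAFACOS output above suggests). The function names are hypothetical and this is not existing easyblock code; general_packages would be the list from the easyconfig.

```python
from typing import Iterable, List, Set


def installed_packages(lmp_help: str) -> Set[str]:
    """Collect the package names listed under 'Installed packages:' in lmp -h output."""
    lines = lmp_help.splitlines()
    for start, line in enumerate(lines):
        if line.strip().startswith("Installed packages:"):
            break
    else:
        return set()  # header not found: treat as "nothing installed"
    found: Set[str] = set()
    for line in lines[start + 1:]:
        if not line.strip():
            if found:
                break  # blank line after the names ends the section
            continue  # skip blank lines directly after the header
        found.update(line.split())
    return found


def missing_packages(general_packages: Iterable[str], lmp_help: str) -> List[str]:
    """Return (sorted) packages requested in the easyconfig but absent from lmp -h."""
    present = installed_packages(lmp_help)
    return sorted({p.upper() for p in general_packages} - present)
```

On a build host the help text could be obtained with subprocess.run(["lmp", "-h"], capture_output=True, text=True).stdout; a non-empty result from missing_packages would then flag cases like the missing SCAFACOS on Arm.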

@lorisercole

lorisercole commented Dec 17, 2025

Ah no, the internal ScaFaCoS is not built on Arm.
ARM:

{EESSI 2023.06} [larappr@aarch64-neoverse-n1-node1 2Aug2023_update2-foss-2023a-kokkos]$ mpirun -n 1 lmp -h | grep SCAFACOS

cascadelake:

{EESSI 2023.06} [vsc00000@nod4004 ]$ mpirun -n 1 lmp -h | grep SCAFACOS
RIGID SCAFACOS SHOCK SMTBQ SPH SPIN SRD TALLY UEF VORONOI VTK YAFF

Of course it's not built on Arm: that old easyconfig (2Aug2023_update2-foss-2023a-kokkos) still had the if ARCH == 'x86_64' conditional:
https://github.com/easybuilders/easybuild-easyconfigs/blob/9d5add8ea92501ba9562cd5b298ed82ea8a1cec7/easybuild/easyconfigs/l/LAMMPS/LAMMPS-2Aug2023_update2-foss-2023a-kokkos-CUDA-12.1.1.eb#L67-L72

You should check the easyconfigs that were modified by the #23719 PR:
https://github.com/easybuilders/easybuild-easyconfigs/pull/23719/changes

  • LAMMPS-29Aug2024-foss-2023b-kokkos.eb
  • LAMMPS-29Aug2024_update2-foss-2023a-kokkos.eb
  • LAMMPS-29Aug2024_update2-foss-2023b-kokkos-CUDA-12.4.0.eb
  • LAMMPS-29Aug2024_update2-foss-2024a-kokkos.eb
  • LAMMPS-29Aug2024_update2-foss-2024a-kokkos-CUDA-12.6.0.eb
  • LAMMPS-22Jul2025-foss-2024a-kokkos.eb
  • LAMMPS-22Jul2025-foss-2024a-kokkos-CUDA-12.6.0.eb

Maybe we should add a sanity check that parses general_packages and checks the lmp -h output to see whether all of them are included?

I think LAMMPS' own tests already check that all the desired packages are present.


Labels

2025.06-software.eessi.io 2025.06 version of software.eessi.io


3 participants