Description
This is the serious error reported in FLAME-HPC/xparser#14, which is an issue in libmboard (and not in xparser).
Using the attached minimal example (it is really minimal: a Sender agent sends a message to a Receiver agent), build and run like this:
xparser -p -f bl_model.xml ## with or without the -f flag
make
mpirun -np 2 ./main 2000 its/0.xml -r
The simulation hangs (stalls/deadlocks) at seemingly random iterations with full CPU usage. Sometimes the simulation finishes, sometimes not. If I use at least four processes (mpirun -np 4 --oversubscribe ./main 2000 its/0.xml -r), the stall occurs in every simulation run, again at random iterations.
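To put a number on how often the stall occurs, the run can be repeated under a time limit. The following is only a sketch: the helper name count_hangs is hypothetical, it assumes the timeout utility from GNU coreutils (which exits with status 124 when the limit is reached), and the limit has to be chosen longer than a healthy run takes, which is not measured here.

```shell
# count_hangs RUNS LIMIT CMD...
# Hypothetical helper: runs CMD up to RUNS times, each under `timeout LIMIT`,
# and reports how many runs were killed because they exceeded the limit.
count_hangs() {
  runs=$1
  limit=$2
  shift 2
  hangs=0
  i=1
  while [ "$i" -le "$runs" ]; do
    # GNU timeout exits with status 124 when the command hit the time limit
    timeout "$limit" "$@" >/dev/null 2>&1
    status=$?
    if [ "$status" -eq 124 ]; then
      hangs=$((hangs + 1))
    fi
    i=$((i + 1))
  done
  echo "$hangs/$runs runs timed out"
}
```

For example, `count_hangs 20 300 mpirun -np 2 ./main 2000 its/0.xml -r` would repeat the two-process run 20 times with an assumed limit of 300 seconds per run.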
I tracked the error to the MB_SetAccessMode function and, within it, to (I think) the MPI_Allgather call here.
I can replicate this behaviour with both OpenMPI 3.0.1 and MPICH 3.2.1 (both using the MPI 3.0 Standard).
I haven't found a workaround for this problem, so I am unable to run any simulations at the moment, which makes this a quite serious error (at least for me). Any help is greatly appreciated!