libmboard hangs in random iterations #7

@zauster

This is the serious error reported in FLAME-HPC/xparser#14. It is an issue in libmboard, not in xparser.

Using the attached minimal example (it is really minimal: a Sender agent sends a message to a Receiver agent), build and run like this:

xparser -p -f bl_model.xml     ## with or without the -f flag
make
mpirun -np 2 ./main 2000 its/0.xml -r

The run hangs (stalls or deadlocks) at seemingly random iterations with full CPU usage. Sometimes the simulation finishes, sometimes it does not. With at least four processes (mpirun -np 4 --oversubscribe ./main 2000 its/0.xml -r), the stalls occur in every simulation run, again at random iterations.
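Since the hang is intermittent, one rough way to quantify how often it occurs is to repeat the run under a time limit and count the timeouts. The sketch below is only an illustration: the helper name count_hangs and the 300-second limit are assumptions, not part of the original report. It relies on the GNU coreutils timeout command, which exits with status 124 when the limit is reached.

```shell
# Count how many of RUNS executions of a command exceed LIMIT seconds.
# A run that hits the limit is treated as a hang.
# Usage: count_hangs RUNS LIMIT_SECONDS COMMAND [ARGS...]
count_hangs() {
    runs=$1
    limit=$2
    shift 2
    hangs=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        status=0
        # GNU timeout sends SIGTERM at the limit and then exits with 124.
        timeout "$limit" "$@" >/dev/null 2>&1 || status=$?
        if [ "$status" -eq 124 ]; then
            hangs=$((hangs + 1))
        fi
        i=$((i + 1))
    done
    echo "$hangs"
}

# Example call (assumes the model is already built; the 300-second
# limit is a guess and should exceed the duration of a normal run):
# count_hangs 20 300 mpirun -np 2 ./main 2000 its/0.xml -r
```

The function prints the number of runs that timed out, so comparing the result for -np 2 and -np 4 gives a crude hang rate for each configuration. A run that fails for another reason is not counted as a hang.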

I tracked the problem to the MB_SetAccessMode function and, within it, to (I think) the MPI_Allgather call.

I can replicate this behaviour with both OpenMPI 3.0.1 and MPICH 3.2.1 (both using the MPI 3.0 Standard).

I haven't found a workaround for this problem, so I am currently unable to run any simulations. That makes this a serious error (at least for me). Any help is greatly appreciated!
