[19.1] Problems with DMP

  • Operating System
    Ubuntu 18.04, with stock OpenMPI 2.1.1

  • MFiX version
    19.1

  • GUI or text (source code)
    GUI, straight from the Anaconda repository

  • Detailed description of the issue.
    When attempting to run any of the tutorial cases with a DMP-configured solver, the following error message is given:

    [pk:32466] mca_base_component_repository_open: unable to open mca_patcher_overwrite: /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi/mca_patcher_overwrite.so: undefined symbol: mca_patcher_base_patch_t_class (ignored)
    [pk:32466] mca_base_component_repository_open: unable to open mca_shmem_mmap: /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi/mca_shmem_mmap.so: undefined symbol: opal_show_help (ignored)
    [pk:32466] mca_base_component_repository_open: unable to open mca_shmem_sysv: /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi/mca_shmem_sysv.so: undefined symbol: opal_show_help (ignored)
    [pk:32466] mca_base_component_repository_open: unable to open mca_shmem_posix: /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi/mca_shmem_posix.so: undefined symbol: opal_shmem_base_framework (ignored)
    --------------------------------------------------------------------------
    It looks like opal_init failed for some reason; your parallel process is
    likely to abort.  There are many reasons that a parallel process can
    fail during opal_init; some of which are due to configuration or
    environment problems.  This failure appears to be an internal failure;
    here's some additional information (which may only be relevant to an
    Open MPI developer):
      opal_shmem_base_select failed
      --> Returned value -1 instead of OPAL_SUCCESS
    --------------------------------------------------------------------------
    --------------------------------------------------------------------------
    It looks like orte_init failed for some reason; your parallel process is
    likely to abort.  There are many reasons that a parallel process can
    fail during orte_init; some of which are due to configuration or
    environment problems.  This failure appears to be an internal failure;
    here's some additional information (which may only be relevant to an
    Open MPI developer):
      opal_init failed
      --> Returned value Error (-1) instead of ORTE_SUCCESS
    --------------------------------------------------------------------------
    --------------------------------------------------------------------------
    It looks like MPI_INIT failed for some reason; your parallel process is
    likely to abort.  There are many reasons that a parallel process can
    fail during MPI_INIT; some of which are due to configuration or environment
    problems.  This failure appears to be an internal failure; here's some
    additional information (which may only be relevant to an Open MPI
    developer):
      ompi_mpi_init: ompi_rte_init failed
      --> Returned "Error" (-1) instead of "Success" (0)
    --------------------------------------------------------------------------
    *** An error occurred in MPI_Init
    *** on a NULL communicator
    *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
    ***    and potentially your MPI job)
    

This happens for single-phase, TFM, DPM, and PIC cases; I tested several of the tutorial cases.

The generated mfixsolver.sh script looks suspicious:

#!/bin/sh

env LD_LIBRARY_PATH=${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}"/home/paul/miniconda3/envs/mfix-19.1/lib/libpython3.7m.so" PYTHONPATH="/home/paul/vortex_shedding_fld_2d":"":${PYTHONPATH:+:$PYTHONPATH} /home/paul/miniconda3/envs/mfix-19.1/bin/python3 -m mfixgui.pymfix "$@"
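
The LD_LIBRARY_PATH entry points at the libpython3.7m.so file itself rather than at a library directory, and PYTHONPATH picks up an empty component. What I tried as a fix (my own guess at the intended form, not an official script) was roughly:

#!/bin/sh

# Point LD_LIBRARY_PATH at the environment's lib directory instead of the .so
# file, and drop the empty PYTHONPATH entry.
env LD_LIBRARY_PATH=${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}"/home/paul/miniconda3/envs/mfix-19.1/lib" PYTHONPATH="/home/paul/vortex_shedding_fld_2d"${PYTHONPATH:+:$PYTHONPATH} /home/paul/miniconda3/envs/mfix-19.1/bin/python3 -m mfixgui.pymfix "$@"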

Fixing this does not resolve the issue, though. Any ideas?

Dear MFiX team,
I have the same problem as paulkieckhefen under Windows 10.
Kind regards
Timo

@paulkieckhefen looks like a compile issue. Can you compile a serial version? Did this combination work for you with 18.1.5?

@timo_du_mala, we don't support DMP (parallel) runs on Windows. I suspect you have a different issue?

I built openmpi/2.1.1 and gcc/7.3.0 on my CentOS 7 machine and used them to compile MFiX 19.1. The solver works fine for me in parallel. There must be something different about the Ubuntu package.

FYI, building your own compilers is easy with Spack.
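
Roughly the steps I used (treat this as a sketch; exact syntax depends on your Spack version):

# Build the compiler, register it with Spack, then build Open MPI with it.
spack install gcc@7.3.0
spack compiler find $(spack location -i gcc@7.3.0)
spack install openmpi@2.1.1 %gcc@7.3.0
# Put the MPI wrappers (mpicc/mpifort) on the PATH before building the MFiX solver.
spack load openmpi@2.1.1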

@onlyjus, yes, I just tried it and it works fine - both for DMP in 18.1.5 and for serial in 19.1.

I had a look at the mfixsolver script generated by 18.1.5; it contained:

#!/bin/sh
env LD_PRELOAD=libmpi.so LD_LIBRARY_PATH=${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}...etc

Adding the LD_PRELOAD to the script generated by 19.1 fixed the issue for me. Thank you for your suggestion!
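
For anyone else hitting this, the patched 19.1 wrapper ends up looking roughly like this (the conda environment and case-directory paths are specific to my machine):

#!/bin/sh

# LD_PRELOAD=libmpi.so loads Open MPI up front so the dlopen'ed MCA plugins
# (mca_patcher_*, mca_shmem_*) can resolve its symbols - the undefined-symbol
# errors quoted above.
env LD_PRELOAD=libmpi.so LD_LIBRARY_PATH=${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}"/home/paul/miniconda3/envs/mfix-19.1/lib" PYTHONPATH="/home/paul/vortex_shedding_fld_2d"${PYTHONPATH:+:$PYTHONPATH} /home/paul/miniconda3/envs/mfix-19.1/bin/python3 -m mfixgui.pymfix "$@"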