Screenshots of the error are included in the attached zip folder. The simulations are run from the command line on HPC resources. The error occurs with both TFM and DEM, and for both the tutorials and the provided .mfx file.
Attempts to fix the issue
The first and most important thing to note is that this simulation has been run successfully on one HPC cluster, but does not run on a different (the current) HPC cluster. This makes me believe it has to do with how the loaded modules were used to compile the solver; however, the compile itself completes without errors. I have tried both versions 22.2.1 and 21.4 of MFiX (the same error occurs for both). For modules, I load openmpi, gcc, conda, and cmake.

I have also tried to run two different tutorial simulations (DEM 3D hopper and drum), both of which produce the same error, so I have concluded the .mfx file is not the issue. This .mfx file also calls for UDFs and a modified solids conductivity, which requires source code modifications; I deleted both the UDFs and the modified source files to rule them out, still no luck.

As I said, I suspect the modules/compiling, but I do not know what to try next. Thank you in advance for your time.
Attach project files Silica_Error.zip (50.3 KB)
I hope I have attached the correct files. When running through the command line, I do not know how to create a bug report, so I think I attached everything. I would be happy to provide more if necessary.
Have you tried to run it in serial on the cluster that does not work?
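For reference, a serial (non-MPI) build can be configured by leaving out the MPI flag — a minimal sketch, assuming the source tree sits at ~/MFiX_Source/mfix-22.2.1 and gfortran is on the PATH (adjust both to your setup):

```shell
# Configure and build a serial MFiX solver (no MPI); paths are assumptions
cmake ~/MFiX_Source/mfix-22.2.1 \
      -DCMAKE_BUILD_TYPE=Release \
      -DCMAKE_Fortran_COMPILER=gfortran
make -j4
```

If the serial solver runs fine, that points even more strongly at the MPI stack rather than at the MFiX build itself.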
Looks like you are using a spack built gcc-8.4: /nopt/nrel/apps/base/2020-05-12/spack/opt/spack/linux-centos7-x86_64/gcc-4.8.5/gcc-8.4.0
and openmpi 4.1.0: /nopt/nrel/apps/openmpi/4.1.0-gcc-8.4.0-j15/bin/mpif90
When I use spack, the openmpi build is also built with spack. Paths just look a little goofy to me.
Sorry, I do not know much about these modules (or modules in general). I just know which ones I normally have to load, using module load openmpi, etc. Maybe it was the order I loaded them in? Or maybe I only need to load openmpi and gcc gets pulled in as a dependency?
From the screenshot, it looks like we’re never getting past mpi_init - the run is failing at startup, before any real MFiX-specific code executes - so this is potentially a problem with your MPI setup.
I would try to build and run a trivial MPI “Hello world” program - for example:
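A minimal sketch of such a check (the file name and compile/run commands are illustrative, assuming the OpenMPI wrappers are on your PATH):

```c
/* hello_mpi.c -- sanity check that MPI_Init works at all.
 * Compile and run, e.g.:
 *   mpicc hello_mpi.c -o hello_mpi
 *   mpirun -np 4 ./hello_mpi
 */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);               /* the call the MFiX run never gets past */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    printf("Hello from rank %d of %d\n", rank, size);
    MPI_Finalize();
    return 0;
}
```

If this hangs or aborts the same way MFiX does, the problem is in the openmpi module on the cluster, not in MFiX.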
Here is my compile command line I am using. Do you see a problem with this maybe? cmake ~/MFiX_Source/mfix-22.2.1 -DCMAKE_BUILD_TYPE=Release -DENABLE_MPI=1 -DCMAKE_Fortran_COMPILER=mpif90 -DCMAKE_Fortran_FLAGS="-O2"
Update: MFiX is now working in parallel if I use Intel modules. Specifically, I loaded in comp-intel/2020.1.217, intel-mpi/2020.1.217, and cmake/3.18.2. With these modules, I had to modify the compile line to: cmake ~/MFiX_Source/mfix-22.2.1 -DCMAKE_BUILD_TYPE=Release -DENABLE_MPI=1
Comment: I believe it is an issue with the version of gcc or openmpi. On the previous cluster I was using gcc/6.3.0 and openmpi/2.1.6, which ran the simulations fine. On the current cluster, the versions available are gcc/8.4.0 and openmpi/4.1.0, which do not seem to work. A colleague of mine tested gcc/7.5 and said it works, so maybe versions higher than 7.5 have an issue?
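One quick way to confirm which toolchain an MPI wrapper actually drives (these are standard OpenMPI/GCC options, shown here as a diagnostic sketch):

```shell
mpif90 --version   # reports the underlying Fortran compiler and its version
mpif90 -show       # OpenMPI: echoes the real compile command the wrapper runs
ompi_info | head   # summary of how this OpenMPI itself was built
```

Comparing this output between the old and new clusters would show exactly which gcc/openmpi pairing differs.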
Thank you for everyone’s help. I found a workaround, but maybe this will spark a conversation and someone will find another solution.