CFD-MFiX MPI_Wait error after restart_1

When attempting to restart the simulation using ‘restart_1’, I get the following error:

Initial DES Particle array size: 6146
Message 1010: Read in data from .RES file for TIME =   5.1000
Time step number (NSTEP) =  52403
   Compressible: IJK_P_g remaining undefined.
Resizing DES MPI buffers:     1.5 MB  (+202.9%)
[ins006:4083590] *** An error occurred in MPI_Wait
[ins006:4083590] *** reported by process [4088528897,25]
[ins006:4083590] *** on communicator MPI_COMM_WORLD
[ins006:4083590] *** MPI_ERR_TRUNCATE: message truncated
[ins006:4083590] *** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
[ins006:4083590] ***    and potentially your MPI job)

Rerunning the job sometimes gives a different error instead:

A request was made to bind that would require binding
processes to more cpus than are available in your allocation:

   Application:     ./mfixsolver
   #processes:      80
   Mapping policy:  BYCORE
   Binding policy:  CORE

You can override this protection by adding the "overload-allowed"
option to your binding directive.

The simulation uses DEM with air as the fluid. UDFs usr0_des.f and usr3_des.f are used.

Other info:
This simulation runs until the time-out, but restarting it fails. I also run otherwise identical simulations that differ only in using variable particle densities. Interestingly, about half of those restart with no issue, while the other half return one of the two errors above. The files used are attached (.zip).

I tried:

  • Rebuilding the solver - did not help
  • Excluding partially occupied nodes - did not help
  • Using a different MPI module - did not help
  • A similar post suggested the issue is with DLB_DT, but that keyword does not exist in my .mfx file
  • Used ‘restart_2’ - did not help

Edit: updated mfixdmp.slurm file in .zip

MFiX_forum 2.zip (45.5 MB)

  1. Does your run restart with restart_1 if you remove the dynamic load balance line from your .mfx file?
  2. Did you try deleting the load partitioning file that MFiX created and then restarting?
  3. Do you really need DLB anyway? Are your particles all moving from one side of the domain to the other?

  1. I do not believe I have a DLB line in my .mfx file, though the simulation may use it by default?
  2. I am not sure I have such a file. Please see the screenshot showing the files in the simulation directory.
  3. No, I do not need DLB, but I could not find a place to enable or disable it.

Note: I tried manually including the following lines in the mfx file:

dmp_particles_balancing = .false.
des_mpi_buffer_factor = 4.0

However, dmp_particles_balancing is not a recognized variable. I use version 23.2.

It does not look like dynamic load balancing is enabled in this file.

Manually setting des_buff_resize_factor = 4.0 solved the issue.
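
For anyone who lands here with the same MPI_ERR_TRUNCATE on restart, the change amounts to one keyword line in the .mfx file. This is only a sketch of what that looks like, assuming version 23.2 as above. The value 4.0 is simply what worked in this case, not a tuned or recommended setting, and placing the line next to the other DES keywords is my assumption rather than something the thread confirms.

```
# Sketch only: the keyword and value are taken from the post above.
# Placement next to the other DES keywords is an assumption.
des_buff_resize_factor = 4.0
```

The log above shows the DES MPI buffers being resized right before the truncated message, which is consistent with a larger resize factor helping, though the thread does not establish the exact cause.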
