Restart (Resume previous MFIX run) Issue with DMP Solver on Ubuntu 22.04.5 LTS

Description:

I am experiencing an issue with the DMP solver in MFIX on a virtual machine running Ubuntu 22.04.5 LTS, hosted on Windows 7. Initially, I can compile the DMP solver and run the GSP tutorial case without any problems. However, if I interrupt the simulation and attempt to restart the case, I encounter errors that seem related to parallel processing.

Details:

  • System: Ubuntu 22.04.5 LTS (Virtual Machine on Windows 7)
  • MFIX Version: 24.3.1
  • Issue: Restarting a case after interruption causes errors in parallel runs.
  • Observations:
    • The problem occurs only with parallel runs.
    • Single-core computations do not have restart issues.
    • DEM & SQP parallel computations and restarts function correctly.
  • Attachment: Included is a GSP case that fails to restart.
    fb_gsp1_2024-11-27T151009.321571.zip (47.4 MB)

Any guidance or suggestions would be greatly appreciated.

Thank you!

Thanks for the bug report, I will look into this. Can you tell me a little more about how you are running this? Are you resuming after the run reaches tstop, are you pressing “stop” or otherwise killing the process, or are you using the “pause” button in the MFiX GUI? And what parameters, if any, are you changing before resuming?

Thanks.

Thank you for looking into this issue.

I am running the case using the MFIX GUI. I first use the “pause” button, then “stop,” and finally attempt to “resume” the case. I have not changed any simulation parameters, including the number of cores used for the DMP run.

I initially discovered this issue while running my own GSP case and then confirmed it by testing the tutorial GSP case, where the same problem occurred.

Please let me know if you need any more information.

Thank you!

I don’t have a solution yet but here are a few comments:

  1. It has nothing to do with the virtual machine. I get the same result running on Linux.

  2. If I pause the simulation via the pause button, then resume by pressing the run button, without stopping it, the job resumes without problems.

  3. If I stop the job then try running with “Resume previous MFiX run” selected, I also get a crash. So pausing is not the root problem.

  4. I get the same crash if I stop the simulation by writing an MFIX.STOP file in the run directory, so the problem is not the way the GUI stops the job (which uses a different mechanism, via the HTTP interface).

I will let you know as we make progress on this issue. Thank you very much for the bug report.

– Charles

Adding -mca mpi_abort_print_stack 1 to the mpirun flags gives the following stack trace:

[x280:17514] [6] func:/tmp/fb_gsp1_2024-11-27T151009.321571/mfixsolver_dmp.so(__mpi_utility_MOD_bcast_1i+0x195) [0x7f16f69e8b9e]

[x280:17514] [7] func:/tmp/fb_gsp1_2024-11-27T151009.321571/mfixsolver_dmp.so(__read_res1_des_MOD_read_res_parray_gsp_1i+0x16e) [0x7f16f6941367]

[x280:17514] [8] func:/tmp/fb_gsp1_2024-11-27T151009.321571/mfixsolver_dmp.so(__read_res0_des_mod_MOD_read_res0_des+0x10a1) [0x7f16f693ae06]

[x280:17514] [9] func:/tmp/fb_gsp1_2024-11-27T151009.321571/mfixsolver_dmp.so(__make_arrays_des_mod_MOD_make_arrays_des+0x11dc) [0x7f16f68b5c4d]

[x280:17514] [10] func:/tmp/fb_gsp1_2024-11-27T151009.321571/mfixsolver_dmp.so(__main_MOD_initialize+0xc8f) [0x7f16f65b07a1]

[x280:17514] [11] func:/tmp/fb_gsp1_2024-11-27T151009.321571/mfixsolver_dmp.so(run_mfix_+0x2e2) [0x7f16f65b149c]

[x280:17514] [12] func:/tmp/fb_gsp1_2024-11-27T151009.321571/mfixsolver_dmp.so(__main_MOD_run_mfix0+0x4f) [0x7f16f65b0d8a]

Thank you for the update.

Hi,
The restart problem is that only the main rank knows the exact number of GSP particles. We will include this bug fix in the next release.

There is a simple way to fix this if you are familiar with UDFs. Let me know if you want to try it, and I can show you how.

Hi,

Thank you for your response. I would appreciate it if you could guide me on how to fix the bug. Should I add an additional UDF file to the project directory? Please let me know the steps involved.

Thank you!

You should find an Editor tab in the bottom-left corner; click it.
Then, on the left side, you will find the MFiX source tree with a search bar underneath.
Type read_res0_des.f and you should see the file in the directory tree. Click that file and copy it to the project directory for editing.

Then at line 25, add: use mpi_utility, only: global_all_max
Then at line 122, add: CALL GLOBAL_ALL_MAX(NGluedParticles)

Save and build the solver; this should fix the restart issue. Please let me know if the issue persists.
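
For anyone wondering why a single GLOBAL_ALL_MAX call can be enough, here is a minimal sketch of the idea in plain Python. It does not use MPI or any MFiX code: each list entry stands in for one rank, and all_max is a hypothetical stand-in that assumes GLOBAL_ALL_MAX behaves like an all-reduce with the MAX operation, as its name suggests. The particle count of 1500 and the zeros on the other ranks are made-up illustration values, not taken from the attached case.

```python
# Pure-Python simulation of the restart problem; no MPI is involved.
# Each list entry represents the value held by one rank.


def counts_after_restart(n_ranks, n_on_root):
    """Hypothetical state after reading the restart file:
    only rank 0 knows the glued-particle count (others assumed to hold 0)."""
    return [n_on_root if rank == 0 else 0 for rank in range(n_ranks)]


def all_max(values):
    """Stand-in for an all-reduce with MAX: every rank gets the global maximum."""
    global_max = max(values)
    return [global_max for _ in values]


def ranks_agree(values):
    """True when every rank holds the same value."""
    return len(set(values)) == 1


before = counts_after_restart(n_ranks=4, n_on_root=1500)
after = all_max(before)

print("before:", before, "agree:", ranks_agree(before))
print("after: ", after, "agree:", ranks_agree(after))
```

In this toy model the ranks disagree on the count before the reduction and agree afterwards. That would fit the crash showing up under bcast_1i in the stack trace, if the broadcast there depends on every rank knowing the same particle count, but that last step is my reading rather than something confirmed in the thread.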

Alternatively, wait a few weeks until MFiX 24.4 is released; this bugfix will be included.

Thanks for the instructions. I made the changes and the issue is resolved. Appreciate your help!

Thanks. I’ll update to MFiX 24.4 when it’s released.