DEM-based fluidized bed behaves abnormally with SMP

Ethan_MFIX · April 10, 2025, 12:33pm

Hello everyone. I ran a pseudo-2D fluidized bed in MFiX 24.4.1, once with the SMP solver and once with the serial solver. With SMP, the particles fail to fluidize and show anomalies that do not appear in the serial run. The first video is the SMP case and the second is the serial case. I’ve attached both case files.
case.zip (654.2 KB)

Ethan_MFIX · April 11, 2025, 1:47pm

I’m very confused. I repeated the calculations using SMP on Windows and found the results to be consistent with serial. On Linux the problem above occurs, and the more SMP threads I use, the more the particle motion is hindered.

cgw · April 15, 2025, 2:32pm

There’s something very odd in the files you posted. I’m not sure it explains the problem at all, but serial/mfix.dat has no Solids section, and many of the keys are at the bottom of the file in the Undocumented keys section. Do you have any idea why this might be the case? I’ve never seen this before. All of the keys except OMP_NUM_THREADS and tstop have the same values in both files, but something went wrong when serial/mfix.dat was saved. I see you are running on a VM; maybe this has something to do with it? Can you tell us more about your setup (what is the OS of the host, and of the VM)?

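Also, please explain the anomalies you are referring to. I’m sorry if I’m missing something, but the two videos you posted above seem quite similar (although not identical). We don’t expect identical results in SMP because of order-of-operations differences: floating-point addition is not associative, so summing the same terms in a different order gives slightly different results.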

Ethan_MFIX · April 15, 2025, 2:47pm

Sorry, Charles, I sent the copy of the file I was testing at the time. I am using VMware, and my system is Ubuntu 22.04. To show the difference between serial and SMP, I’ve attached another simulation run with 14 threads.

case2.zip (683.0 KB)

cgw · April 15, 2025, 2:49pm

OK, that definitely looks odd. You are running an Ubuntu instance on a Windows machine, is that correct? And do you have any idea why all those keys were “Undocumented”? It’s not the main problem we’re chasing, but I’d like to know why that happened.

Ethan_MFIX · April 15, 2025, 2:52pm

Yes, Charles, I am running Ubuntu on Windows with VMware. As for the “Undocumented” keys, I think the reason may be that I keep switching between MFiX versions.

Ethan_MFIX · April 15, 2025, 2:58pm

That said, I often see .mfx files with “Undocumented” keys when using MFiX, and they disappear when I close, reopen, and save the file.

cgw · April 15, 2025, 3:10pm

OK, thanks. That’s very strange; I’ve never seen it before. It might have to do with your VM setup.

Ethan_MFIX · April 15, 2025, 3:21pm

Thanks, Charles. Could I ask whether you tested it on a Linux system? If it ran correctly for you, I can try testing it in a VM on someone else’s computer.

cgw · April 15, 2025, 3:53pm

I have no idea what’s causing the “Undocumented” keys but let’s put that aside for now.
I ran your case on a Linux system with MFiX 25.1, using the serial solver and the SMP solver with 16 threads. I can reproduce your results:
Serial:

SMP:

I also note that despite using 16 threads there is only a very slight speedup:
Serial:

SMP:

In serial mode we reach t = 10 s in about 600 s of wall-clock time; with SMP we reach t = 10 s in 550 s. Not much of a speedup.
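For a rough sense of scale, the two timings can be turned into a speedup S and a parallel efficiency E (the symbols are introduced here for illustration). This is a back-of-the-envelope calculation that uses only the quoted wall-clock times and the 16 threads:

```latex
S_{\mathrm{SMP}} = \frac{T_{\mathrm{serial}}}{T_{\mathrm{SMP}}}
                 = \frac{600\ \mathrm{s}}{550\ \mathrm{s}} \approx 1.09,
\qquad
E_{\mathrm{SMP}} = \frac{S_{\mathrm{SMP}}}{16} \approx 0.068
```

In other words, each thread delivers under 7% of what an ideally scaling thread would.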

In the SMP case, perf top shows that we are spending most of our time waiting:

We need to look into the reasons for the different behavior, but in the meantime I’d advise that you simply not use SMP, as there seems to be little advantage in doing so.

I ran the case with a DMP configuration of 4x1x1 and got a slightly better speedup (and no anomalies):

This is about a 2X speedup over the serial case. If you have access to a Linux host you could try running with DMP. I’m not sure how well it will work on a Linux VM on a Windows host, but you could try it.
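Taking “about 2X” at face value, the same back-of-the-envelope arithmetic for the 4x1x1 DMP run (4 MPI ranks) gives the following. The DMP wall-clock time here is implied by the 2X figure, not a separately measured number:

```latex
S_{\mathrm{DMP}} \approx 2
\;\Rightarrow\;
T_{\mathrm{DMP}} \approx \frac{600\ \mathrm{s}}{2} = 300\ \mathrm{s},
\qquad
E_{\mathrm{DMP}} = \frac{S_{\mathrm{DMP}}}{4} \approx 0.5
```

So 4 ranks run at roughly 50% efficiency. By comparison, 16 SMP threads give a speedup of only 600/550 ≈ 1.09.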

Ethan_MFIX · April 15, 2025, 4:23pm

Thanks, Charles. I ran the simulation using DMP and got the same results as serial. So what causes such a big difference between the SMP and serial results? I think the SMP results on Linux are anomalous; using SMP on Windows (not in a VM), I get the same results as serial.

cgw · April 15, 2025, 5:32pm

I agree, there’s an anomaly with SMP on Linux. It might take us a while to resolve this. Thanks for the report.

Ethan_MFIX · April 16, 2025, 1:16pm

Thanks, Charles. I’m looking forward to the results of your investigation.

cgw · April 17, 2025, 12:35am

I found the problem and have a fix. There was a missing omp atomic directive in one of the MFiX source files.
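For readers unfamiliar with the directive: `omp atomic` makes a read-modify-write on a shared memory location indivisible. The sketch below is a hypothetical C illustration, not the actual MFiX code. comp_mean_fields1.f is Fortran, where the equivalent directive is `!$omp atomic`, and the patched line isn't shown in this thread. Going by the file name, it assumes a particle-to-grid accumulation in which particles handled by different threads can add into the same grid cell. The function and argument names are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical particle-to-grid accumulation (not MFiX code).
 * cell_of[p]  : index of the grid cell that contains particle p
 * vol[p]      : volume of particle p
 * cell_vol[c] : output, total particle volume in cell c
 * Compile with -fopenmp to run the loop in parallel; without it the
 * pragmas are ignored and the loop runs serially with the same result. */
void accumulate_cell_volume(long n_particles, const int *cell_of,
                            const double *vol, double *cell_vol,
                            int n_cells)
{
    for (int c = 0; c < n_cells; ++c)
        cell_vol[c] = 0.0;

    #pragma omp parallel for
    for (long p = 0; p < n_particles; ++p) {
        int c = cell_of[p];               /* private to each iteration */
        assert(c >= 0 && c < n_cells);    /* particle must map to a valid cell */

        /* Several particles (possibly on different threads) can share the
         * same cell. Without "omp atomic", two threads may both read the
         * old cell_vol[c], add their own vol[p], and write back, so one
         * contribution is silently lost. */
        #pragma omp atomic
        cell_vol[c] += vol[p];
    }
}
```

Even with the directive, the order in which threads add into a cell can change from run to run, so the SMP sums match serial only up to round-off. That is the order-of-operations effect mentioned earlier in the thread, as opposed to the lost updates caused by the missing directive.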

Replace

$CONDA_PREFIX/share/mfix/src/model/des/comp_mean_fields1.f

with the attached file.

Then rebuild the default SMP solver:

$ cd $CONDA_PREFIX/bin
$ build_mfixsolver --smp -j

You will also need to rebuild any custom SMP solvers in project directories.

On Windows, use %CONDA_PREFIX% instead of $CONDA_PREFIX and replace slashes with backslashes.

I suspect your results on Windows were not 100% correct, but you may not have noticed. Windows uses GFortran 5, while Linux uses GFortran 13, which optimizes more aggressively. With the older compiler the bug is still present, but its effect is less obvious.

comp_mean_fields1.f (3.5 KB)

Thanks again for the bug report.

Ethan_MFIX · April 17, 2025, 12:54am

Thanks very much, Charles, I’ll give it a try!