Can SMP be used to accelerate simulation speed in MFIX-DEM?

GL1 · September 26, 2023, 2:50pm

Hello, researchers. Can SMP be used for parallel computation in DEM simulations that include chemical reactions? I have seen some posts saying yes, but the user manual clearly states no. I am new to parallel computing and eager to learn. I look forward to your detailed explanations and guidance.
Best wishes!

cgw · September 26, 2023, 9:14pm

SMP can always be used but the benefit may not be as large as you hope for - e.g. 4 threads will not necessarily give 4X speedup - some parts of the code benefit more from SMP than others. Even if the chemical reactions do not take advantage of SMP, the fluid solver will. So there will be speedups in some parts of the code but not others. This is highly empirical, I encourage you to try it and see what results you get. Please post back here and share what you learn, thanks!
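
One common way to reason about "some parts benefit, others do not" is Amdahl's law. It is not mentioned above and is used here only as an illustrative model: if a fraction p of the run time parallelizes perfectly and the rest stays serial, the best possible speedup on n threads is 1 / ((1 - p) + p / n). The value p = 0.5 in this sketch is a made-up number, not a measurement of MFiX.

```python
def amdahl_speedup(parallel_fraction: float, threads: int) -> float:
    """Upper bound on speedup when only part of the work runs in parallel."""
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / threads)


# Hypothetical case: half of the run time benefits from SMP (assumed, not measured).
for n in (1, 2, 4, 8):
    print(n, round(amdahl_speedup(0.5, n), 2))
```

Even in this idealized model the curve flattens quickly. Real runs also pay thread synchronization costs that the model ignores, so measured speedups can be lower still.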

– Charles

GL1 · September 27, 2023, 10:35am

Thank you very much for your reply. I ran a test today and found something interesting. For the DEM example (including chemical reactions), I used SMP for parallel computing on Windows 10 and found that the calculation rate decreased as the number of cores increased. What is the reason for this?
With chemical reactions:
10 cores:

SMP turned off:

Without chemical reactions:
10 cores:

SMP turned off:

As can be seen, with or without chemical reactions, the calculation rate decreases as the number of cores increases. Why?
2dd_2023-09-27T183505.625318.zip (60.1 MB)

cgw · September 27, 2023, 12:27pm

Looks like SMP-related overheads are dominating any gains from parallelization.

cgw · September 27, 2023, 12:48pm

How many cores does your system have?

GL1 · September 27, 2023, 1:32pm

Thanks for your reply. My system has a total of 12 cores.

cgw · September 27, 2023, 2:02pm

I ran the case on my system using 1 to 8 SMP threads (I have 8 cores available).
I set the stop time to 0.5 seconds and repeated each run 3 times.

Here’s the raw data:
note.txt (3.1 KB)

Here’s a summary - the numbers shown are the averages for the 3 runs:

Threads  Realtime  CPU      Speedup
-------  --------  ---      -------
1        33.85     34.15    1
2        23.40     46.97    1.44
3        19.09     57.38    1.77
4        19.11     76.50    1.77
5        21.17     81.58    1.59
6        23.79     106.21   1.44
7        22.03     110.20   1.54
8        23.66     127.87   1.43

plot

As you can see, throwing more threads at the problem does not give better results; there seems to be an optimum around 3 or 4 threads. The total CPU time goes up, but that’s because it’s the sum of the times for each thread. The problem is that the threads spend a lot of time doing “busy work”, that is, synchronizing operations with other threads. One quickly reaches a point of diminishing returns.

It is known that SMP works better on Linux than Windows. Linux will also give you the opportunity to use DMP, which generally gives better results than SMP. Again this is highly empirical and your results may vary.

GL1 · September 27, 2023, 2:34pm

Thank you for your detailed answer. I have another question about the picture above. What specific code can I type to obtain its content?
Best wishes!

cgw · September 27, 2023, 2:49pm

I made that table by repeatedly running the simulation with different SMP settings and collecting the results. (I ran each setting 3 times and averaged the results.)
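
A procedure like this can be scripted. The following is a minimal sketch, not the script used for the table: it assumes the solver is an OpenMP program that honors the standard `OMP_NUM_THREADS` environment variable, and `time_command` is a hypothetical helper name. The command shown is a harmless placeholder; substitute whatever command launches your solver on your case.

```python
import os
import subprocess
import sys
import time


def time_command(cmd, threads, repeats=3):
    """Run cmd `repeats` times with OMP_NUM_THREADS=threads; return mean wall-clock time."""
    # Copy the current environment and override only the thread count.
    env = dict(os.environ, OMP_NUM_THREADS=str(threads))
    elapsed = []
    for _ in range(repeats):
        start = time.perf_counter()
        subprocess.run(cmd, env=env, check=True, stdout=subprocess.DEVNULL)
        elapsed.append(time.perf_counter() - start)
    return sum(elapsed) / len(elapsed)


# Placeholder command (assumption): replace with your own solver invocation.
cmd = [sys.executable, "-c", "pass"]
for n in (1, 2, 4):
    print(n, round(time_command(cmd, n), 3))
```

Averaging several runs per setting, as described above, smooths out run-to-run noise from other processes on the machine.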

The “Speedup” is just the run time for 1 thread divided by the run time for N threads.
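
As a small worked example, the speedup column can be recomputed from the averaged times in the summary table. The sketch below also prints two quantities that are not in the original table and are added here only for illustration: parallel efficiency (speedup divided by thread count) and the CPU-to-realtime ratio. Because the inputs are rounded averages, the recomputed speedups may differ slightly from the posted column.

```python
# (threads, mean realtime, mean CPU time) copied from the summary table
runs = [
    (1, 33.85, 34.15),
    (2, 23.40, 46.97),
    (3, 19.09, 57.38),
    (4, 19.11, 76.50),
    (5, 21.17, 81.58),
    (6, 23.79, 106.21),
    (7, 22.03, 110.20),
    (8, 23.66, 127.87),
]


def speedup(t_one_thread, t_n_threads):
    """Run time on 1 thread divided by run time on N threads."""
    return t_one_thread / t_n_threads


baseline = runs[0][1]
for threads, realtime, cpu in runs:
    s = speedup(baseline, realtime)
    efficiency = s / threads   # 1.0 would mean perfect scaling
    cpu_ratio = cpu / realtime  # roughly how many threads were busy on average
    print(f"{threads:2d}  speedup={s:.2f}  efficiency={efficiency:.2f}  cpu/realtime={cpu_ratio:.2f}")
```

A CPU-to-realtime ratio that keeps growing while the speedup stays flat suggests the extra threads are busy without contributing useful work.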

GL1 · September 27, 2023, 3:01pm

Thanks for your answer. I understand now.
Best wishes!