A few comments:
- Reducing the particle diameter by a factor of 5 increases the number of particles by a factor of 125. MFiX 25.3 (released today) shows the particle count in the dashboard view:
Diameter 0.005:
Diameter 0.001:
The number of particle/particle collisions to check scales (roughly) with the square of the number of particles, so an increase from 4 minutes to 1 day is not unreasonable (a factor of approximately 325).
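The 125x figure is just volume scaling: for a fixed total solids volume, the particle count goes as the inverse cube of the diameter. A quick sanity check (nothing MFiX-specific, just the two diameters above):

    # For a fixed total solids volume, particle count N scales as 1/d**3
    d_old = 0.005   # original particle diameter (m)
    d_new = 0.001   # reduced particle diameter (m)
    print((d_old / d_new) ** 3)   # 125.0 -- 5x smaller diameter means 125x more particles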
- Throwing more CPUs at the problem doesn’t always make it go faster.
I don’t have quite the system you do - I have a 32-core AMD Ryzen.
Running with the larger particle size, in serial mode I get:
Simulation start time = 0.0000s
Simulation time reached = 1.002s
Elapsed real time = 4.314m
Time spent in I/O = 0.4849s
and with 16-core DMP:
Simulation start time = 0.0000s
Simulation time reached = 1.000s
Elapsed real time = 2.754m
Time spent in I/O = 1.105s
So 16-core DMP only got me a speedup of 1.56x.
(For comparison, 16-core SMP resulted in a speedup of 1.92x.)
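For reference, these speedup numbers are roughly the ratio of elapsed wall time per simulated second, serial vs. parallel (the two runs stopped at slightly different simulation times):

    # Wall-clock minutes per simulated second for each run (numbers from above)
    serial_cost = 4.314 / 1.002   # serial
    dmp16_cost  = 2.754 / 1.000   # 16-core DMP
    print(round(serial_cost / dmp16_cost, 2))   # -> 1.56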
Looking at “perf top” during the run:
Serial:
8.14% mfixsolver.so [.] __calc_force_dem_mod_MOD_calc_force_dem
7.02% mfixsolver.so [.] __calc_collision_wall_MOD_calc_dem_force_with_wall_stl
5.98% mfixsolver.so [.] __leqsol_MOD_leq_matvec
4.79% mfixsolver.so [.] __des_drag_gp_mod_MOD_des_drag_gp
4.76% mfixsolver.so [.] __leqsol_MOD_dot_product_par
2.81% mfixsolver.so [.] __wrap_pow
2.80% mfixsolver.so [.] __particles_in_cell_mod_MOD_particles_in_cell
2.68% mfixsolver.so [.] __drag_gs_mod_MOD_drag_syam_obrien
2.67% mfixsolver.so [.] remove_collision.0.constprop.0.isra.0
2.56% mfixsolver.so [.] __drag_gs_des1_mod_MOD_drag_gs_des1
2.47% libc.so.6 [.] memset
2.45% libm.so.6 [.] pow
2.01% mfixsolver.so [.] __cfnewvalues_mod_MOD_cfnewvalues
1.96% mfixsolver.so [.] __leq_bicgs_mod_MOD_leq_bicgs0
1.83% mfixsolver.so [.] __discretelement_MOD_cross
1.58% mfixsolver.so [.] __get_stl_data_mod_MOD_move_is_stl
1.45% mfixsolver.so [.] __desgrid_MOD_desgrid_neigh_build
16-core DMP:
27.62% libopen-pal.so.80.0.5 [.] mca_btl_sm_component_progress
6.51% mfixsolver_dmp.so [.] __leqsol_MOD_leq_matvec
3.66% libopen-pal.so.80.0.5 [.] opal_progress
3.57% libmpi.so.40.40.7 [.] mca_part_persist_progress
2.90% mfixsolver_dmp.so [.] __get_stl_data_mod_MOD_move_is_stl
2.43% libc.so.6 [.] memset
2.29% mfixsolver_dmp.so [.] __particles_in_cell_mod_MOD_particles_in_cell
1.96% mfixsolver_dmp.so [.] __calc_collision_wall_MOD_calc_dem_force_with_wall_stl
1.94% libopen-pal.so.80.0.5 [.] opal_timer_linux_get_cycles_sys_timer
1.66% mfixsolver_dmp.so [.] __leq_bicgs_mod_MOD_leq_bicgs0
1.52% libc.so.6 [.] memmove
1.45% mfixsolver_dmp.so [.] __desgrid_MOD_desgrid_pic
1.34% mfixsolver_dmp.so [.] __calc_force_dem_mod_MOD_calc_force_dem
1.25% [kernel] [k] _copy_to_iter
1.16% mfixsolver_dmp.so [.] __leqsol_MOD_dot_product_par
All of the mca_* and opal_* functions are DMP (MPI communication) overhead, which is taking over 30% of total CPU time here (roughly 37%, if you add up the libopen-pal and libmpi entries).
Now with the smaller particle size, there are more particles crossing between DMP nodes and even more bookkeeping overhead:
56.18% libopen-pal.so.80.0.5 [.] mca_btl_sm_component_progress
7.58% libopen-pal.so.80.0.5 [.] opal_progress
7.42% libmpi.so.40.40.7 [.] mca_part_persist_progress
3.92% libopen-pal.so.80.0.5 [.] opal_timer_linux_get_cycles_sys_timer
2.70% mfixsolver_dmp.so [.] __calc_force_dem_mod_MOD_calc_force_dem
2.25% libmpi.so.40.40.7 [.] ompi_request_default_wait
1.82% mfixsolver_dmp.so [.] __desgrid_MOD_desgrid_neigh_build
1.63% mfixsolver_dmp.so [.] __desgrid_MOD_desgrid_pic
1.59% mca_btl_smcuda.so [.] mca_btl_smcuda_component_progress
1.58% mfixsolver_dmp.so [.] __drag_gs_des1_mod_MOD_drag_gs_des1
1.29% libc.so.6 [.] memset
1.02% libm.so.6 [.] pow
As you can see, we're now spending more time in DMP communication overhead (roughly three quarters of the samples) than in the solver itself.
- I notice that you are using 120 cores on a 64-physical-core system via hyperthreading. This is not always the most efficient, since the hyperthreads share CPU cache, memory bandwidth, etc. You may want to compare results with and without hyperthreading.
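For example (a rough sketch assuming the Open MPI mpirun launcher and standard Linux tools; adjust for your setup), you can check whether SMT/hyperthreading is enabled and then pin one rank per physical core for comparison:

    lscpu | grep -i 'thread(s) per core'    # "2" means hyperthreading/SMT is on
    mpirun -np 64 --bind-to core <solver>   # one MPI rank per physical core, to compare against the 120-way run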
If you are interested in performance analysis, I suggest you familiarize yourself with the perf tool.
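A minimal way to get started (standard perf usage, not specific to MFiX):

    perf top                    # live, system-wide view of the hottest functions
    perf record -a -g sleep 30  # record 30 s of system-wide samples with call graphs
    perf report                 # browse the recorded profile afterwards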
– Charles