Hello, everyone. I’m running a simple SQP simulation of particles settling from the top. However, I’ve noticed that as time progresses, the process sometimes gets stuck at a particular time step for a long time, causing the particle settling process to become extremely slow. Is there any way to resolve this?
2.mfx (10.5 KB)
SQP is known to be slow, especially for particles like yours with sharp corners. More CPUs might help: I got a modest, but not impressive, speedup with 8 cores in SMP mode.
Serial: (simtime vs. realtime plot)
SMP: (simtime vs. realtime plot)
In each case, however, there are noticeable horizontal segments on the simtime-vs-realtime plots, which show the process stalling for long stretches of time, although it seems to recover.
I used `perf record`/`perf report` to try to get an idea of what's going on during these periods.
Here’s a report from a period where the simulation is proceeding normally:
```
# Overhead  Command          Shared Object   Symbol
# ........  ...............  ..............  ...................................................................
#
    20.47%  mfixsolver: 2.m  mfixsolver.so   [.] __wrap_pow
     8.12%  mfixsolver: 2.m  mfixsolver.so   [.] __sq_contact_newton_dpmethod_mod_MOD_func_dp_a
     8.00%  mfixsolver: 2.m  mfixsolver.so   [.] __sq_equivalent_radius_mod_MOD_sq_gradient
     6.71%  mfixsolver: 2.m  mfixsolver.so   [.] __sq_properties_mod_MOD_sq_inout_f
     4.32%  mfixsolver: 2.m  mfixsolver.so   [.] __sq_rotation_mod_MOD_qrotate
     2.69%  mfixsolver: 2.m  mfixsolver.so   [.] __calc_collision_wall_MOD_calc_dem_force_with_wall_stl
     2.26%  mfixsolver: 2.m  mfixsolver.so   [.] __sq_rotation_mod_MOD_qrotationmatrix
     1.67%  mfixsolver: 2.m  mfixsolver.so   [.] __wrap_pow@plt
     1.66%  mfixsolver: 2.m  mfixsolver.so   [.] __sq_contact_newton_dpmethod_mod_MOD_sq_contact_newton_dp_a
     1.59%  mfixsolver: 2.m  mfixsolver.so   [.] __sq_rotation_mod_MOD_qmatrixrotatevector
     1.54%  mfixsolver: 2.m  mfixsolver.so   [.] __sq_contact_newton_dpmethod_mod_MOD_func_dp_b
     1.04%  mfixsolver: 2.m  mfixsolver.so   [.] __sq_calc_force_mod_MOD_calc_force_superdem
     0.89%  mfixsolver: 2.m  mfixsolver.so   [.] remove_collision.0.constprop.0.isra.0
     0.84%  mfixsolver: 2.m  mfixsolver.so   [.] __cfnewvalues_mod_MOD_superdem_cfnewvalues
     0.83%  mfixsolver: 2.m  libmvec.so.1    [.] 0x000000000000c569
     0.76%  mfixsolver: 2.m  mfixsolver.so   [.] __sq_equivalent_radius_mod_MOD_sq_hessian
```
This is what I'm used to seeing for SQP cases: computing the exponentials in the shape function (`__wrap_pow`) is the hottest function, using about 20% of total CPU time.
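For context on why `pow` dominates: the superquadric inside-outside function has the general form F = (|x/a|^(2/e2) + |y/b|^(2/e2))^(e2/e1) + |z/c|^(2/e1) - 1, and every evaluation costs several non-integer `pow()` calls. Here is a minimal Python sketch of that standard form (the actual `sq_inout_f` implementation in MFiX may differ in details such as how the blockiness parameters are stored):

```python
def sq_inout(x, y, z, a, b, c, e1, e2):
    """Standard superquadric inside-outside function.

    F < 0: point is inside the particle; F > 0: outside; F = 0: on the surface.
    Each call needs several non-integer pow() evaluations, which is why a pow
    wrapper shows up as the hottest symbol in SQP profiles.
    """
    t = abs(x / a) ** (2.0 / e2) + abs(y / b) ** (2.0 / e2)
    return t ** (e2 / e1) + abs(z / c) ** (2.0 / e1) - 1.0

# Sanity check with a unit sphere (a=b=c=1, e1=e2=1):
print(sq_inout(1.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0))  # surface point → 0.0
print(sq_inout(0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0))  # center → -1.0
```

Small e1/e2 (sharp corners, flat faces) makes the exponents large and the surface nearly discontinuous in curvature, which is also why contact detection struggles with such shapes.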
Now here’s what we get during one of the stall periods:
```
# Overhead  Command          Shared Object   Symbol
# ........  ...............  ..............  ...............................................................
#
    92.36%  mfixsolver: 2.m  mfixsolver.so   [.] __gmres_general_MOD_inverse_uptri_matrix
     0.96%  mfixsolver: 2.m  mfixsolver.so   [.] __gmres_general_MOD_gmres
     0.88%  mfixsolver: 2.m  mfixsolver.so   [.] __gmres_general_MOD_matvecn
     0.57%  mfixsolver: 2.m  mfixsolver.so   [.] __gmres_general_MOD_apply_givens_rotation
```
We're spending a whopping 92% of the time in a single function, `inverse_uptri_matrix`.
It's possible that optimizing this function (as we optimized the pow function) might improve performance. But it would be good to know why we are spending so much time in GMRES in the first place. Some convergence criteria might be too tight.
The original author of this code, Xi Gao @gaoxi, is no longer at NETL, but he may have a comment on this. I'll try to take a closer look when I get some time, but I can't promise anything; I'm quite busy with other things right now. But this is interesting…
Thanks for the report, and good luck with your work.
– Charles
I agree with Charles! First, avoid roundness parameters that produce sharp corners. Second, some convergence criteria might be too tight; the appropriate criteria can differ for particles of different sizes and shapes. So try adjusting these parameters in the source code and see how they affect the cost of your case.
Thank you very much for your answers, Charles and Gao! I’m wondering what convergence criteria can be adjusted for DEM without fluids. I’ve only found two places where adjustments can be made, but I’m unsure which ones are applicable to DEM. Or do you have any suggestions regarding the adjustment of convergence criteria?
Unfortunately the convergence criteria in the Numerics pane are for the fluid and DEM solvers. The convergence criteria for the SQP contact detection algorithm are not adjustable as of now. (I'll see about adding this in a future release.) The SQP code uses its own GMRES implementation in the file `sq_gmres.f`, and the convergence criteria are all hard-coded there.
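To illustrate why those hard-coded tolerances matter: in any iterative solver, the stopping threshold directly sets the iteration count, and hence the CPU cost. A full GMRES is too long to sketch here, so this Python toy shows the same trade-off with a simple Jacobi iteration (a stand-in for illustration only, not the MFiX solver):

```python
def jacobi(A, b, tol, max_iter=10000):
    """Jacobi iteration for A x = b on a small dense system.

    The only point of this sketch: a tighter stopping tolerance means
    more iterations for essentially the same answer.
    Returns (x, iterations_used).
    """
    n = len(b)
    x = [0.0] * n
    for it in range(1, max_iter + 1):
        # Update each unknown from the previous iterate (classic Jacobi sweep).
        x = [(b[i] - sum(A[i][j] * x[j] for j in range(n) if j != i)) / A[i][i]
             for i in range(n)]
        # Stop when the max residual component drops below tol.
        resid = max(abs(b[i] - sum(A[i][j] * x[j] for j in range(n)))
                    for i in range(n))
        if resid < tol:
            return x, it
    return x, max_iter

# Diagonally dominant test system (Jacobi is guaranteed to converge here).
A = [[4.0, 1.0, 0.0],
     [1.0, 4.0, 1.0],
     [0.0, 1.0, 4.0]]
b = [1.0, 2.0, 3.0]
_, it_tight = jacobi(A, b, 1e-10)
_, it_loose = jacobi(A, b, 1e-8)
print(it_tight, it_loose)  # the looser tolerance stops several iterations earlier
```

The same principle applies to the hard-coded `threshold` in `sq_gmres.f`: loosening it trades a little accuracy in the contact solve for fewer Krylov iterations per call.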
I did a little experiment and was able to get your case to run in half the time:
There are still some flat spots in the “modified” run but they are shorter, as can be seen more clearly in this closeup of the plot:
Here are the changes I made.
- Compile `sq_gmres.f` with a higher level of compiler optimization. To do this you will have to edit the file `CMakeLists.txt` in the MFiX source directory. This file cannot be copied to the work directory like Fortran files; you have to edit the master copy. It is in `$CONDA_PREFIX/share/mfix/src`.
Add these lines:
```cmake
set_source_files_properties(${MFIX_SRC}/des/sq_gmres.f PROPERTIES
  COMPILE_OPTIONS "${CFLAGS};-O3;-ggdb;${MARCH}")
```
You can add these right below the similar setting for `xpow.c` (around line 295).
- Make the following changes in `model/des/sq_contact_detection_newton.f`. You can either edit the master copy or copy this file to your project directory. If you copy it to the project directory you will still have the original code for comparison.
```diff
diff --git a/model/des/sq_contact_detection_newton.f b/model/des/sq_contact_detection_newton.f
index 1f09d3f5d..1addf15d3 100644
--- a/model/des/sq_contact_detection_newton.f
+++ b/model/des/sq_contact_detection_newton.f
@@ -84,7 +84,7 @@ CONTAINS
 INTEGER,PARAMETER :: MAX_ITER = 100
 DOUBLE PRECISION,PARAMETER :: g = (sqrt(5.0) - 1.0) / 2.0 ! golden ratio
- DOUBLE PRECISION,PARAMETER :: tol2 = 1e-7**2 !squared tolerance for golden mean
+ DOUBLE PRECISION,PARAMETER :: tol2 = 1e-5**2 !squared tolerance for golden mean
 DOUBLE PRECISION :: AAA(4), BBB(4), X1(4), X2(4), F1, F2, DELX(4)
 DOUBLE PRECISION :: funca1(4), funca2(4), funca01(4), funca02(4), FUNCA03(4)
 DOUBLE PRECISION :: X_A01(4), X_A02(4), X_A03(4)
@@ -249,7 +249,7 @@ CONTAINS
 else
 ! solve linear equation using gmres_general
 m = 1000
- threshold = 1E-10
+ threshold = 1E-8
 X_A_DELTA(:) = 0.0
 call gmres(FJACOBIA, B, X_A_DELTA, 4,m, threshold, X_AA,c)
 endif
@@ -424,7 +424,7 @@ CONTAINS
 INTEGER,PARAMETER :: MAX_ITER=100
 DOUBLE PRECISION,PARAMETER :: G = (sqrt(5.0) - 1.0) / 2.0 ! golden ratio
- DOUBLE PRECISION,PARAMETER :: TOL2 = 1E-7 ** 2 !squared tolerance for golden mean
+ DOUBLE PRECISION,PARAMETER :: TOL2 = 1E-5 ** 2 !squared tolerance for golden mean
 DOUBLE PRECISION :: AAA(4), BBB(4), X1(4), X2(4), F1, F2, DELX(4)
 DOUBLE PRECISION :: funcb1(4), funcb2(4), FUNCB01(4), FUNCB02(4), funcb03(4)
 DOUBLE PRECISION :: X_B01(4), X_B02(4), X_B03(4)
```
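For a sense of what loosening the golden-mean tolerance buys: golden-section search shrinks the bracketing interval by the golden ratio each iteration, so relaxing the tolerance by two orders of magnitude saves roughly log(100)/log(1/g) ≈ 10 iterations per search. A Python sketch of the idea (note the MFiX code uses a squared tolerance `tol2` on its own convergence quantity; this toy uses the interval width directly):

```python
import math

def golden_min(f, a, b, tol):
    """Golden-section minimization on [a, b]; returns (x_min, iterations).

    The interval shrinks by the golden ratio g each step, so the iteration
    count scales like log((b - a) / tol) / log(1 / g).
    """
    g = (math.sqrt(5.0) - 1.0) / 2.0  # golden ratio, as in the MFiX source
    x1, x2 = b - g * (b - a), a + g * (b - a)
    f1, f2 = f(x1), f(x2)
    it = 0
    while (b - a) > tol:
        it += 1
        if f1 < f2:            # minimum lies in [a, x2]
            b, x2, f2 = x2, x1, f1
            x1 = b - g * (b - a)
            f1 = f(x1)
        else:                  # minimum lies in [x1, b]
            a, x1, f1 = x1, x2, f2
            x2 = a + g * (b - a)
            f2 = f(x2)
    return 0.5 * (a + b), it

f = lambda x: (x - 0.3) ** 2   # toy unimodal objective
_, it_tight = golden_min(f, 0.0, 1.0, 1e-7)
_, it_loose = golden_min(f, 0.0, 1.0, 1e-5)
print(it_tight - it_loose)  # roughly log(100)/log(1/g) ≈ 10 fewer iterations
```

Since this search runs inside the per-contact Newton algorithm, those saved iterations multiply across every particle pair at every DEM step.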
Then build a custom solver for your project.
We have an MFiX release coming out soon (25.3), but I do not feel comfortable adding these changes to the release without more testing. You can help us test this.
I suspect there are further optimizations possible; we may not need to compute the inverse matrix at all.
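In a typical GMRES implementation, the small upper-triangular least-squares system can be solved by back-substitution, which never forms an explicit inverse. I haven't confirmed how `inverse_uptri_matrix` in `sq_gmres.f` works internally, but if it builds the inverse explicitly, the alternative looks like this Python sketch (illustrative only):

```python
def back_substitute(R, y):
    """Solve R x = y for upper-triangular R by back-substitution.

    Gives the same x as forming inv(R) and multiplying, but in O(n^2)
    work per solve instead of O(n^3), with no explicit inverse matrix.
    """
    n = len(y)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        # Subtract contributions of already-solved unknowns, then divide
        # by the diagonal entry.
        s = sum(R[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (y[i] - s) / R[i][i]
    return x

R = [[2.0, 1.0, 0.0],
     [0.0, 3.0, 1.0],
     [0.0, 0.0, 4.0]]
print(back_substitute(R, [4.0, 8.0, 8.0]))  # → [1.0, 2.0, 2.0]
```

Given that 92% of the stall time is in that one function, this kind of change could be significant, but it needs validation against the existing results.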
And this condition requiring the determinant to be greater than 1 has never been clear to me (@gaoxi, any comment?):
```fortran
if (dabs(detA) > 1.0d0) then
   call matinv4(FJACOBIA, FJACOBIA_inv)
   call matvec4(FJACOBIA_inv, B, X_AA)
else
   ! solve linear equation using gmres_general
   m = 1000
   threshold = 1E-8
   X_A_DELTA(:) = 0.0
   call gmres(FJACOBIA, B, X_A_DELTA, 4, m, threshold, X_AA, c)
endif
```
Why are we using the GMRES method when |det(A)| <= 1? And why is the cutoff equal to 1?
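For what it's worth, the determinant is a poor proxy for conditioning: it scales with the product of the singular values, while conditioning depends on their ratio. Diagonal matrices make this easy to see analytically; here is an illustrative Python check (toy matrices, nothing from MFiX):

```python
def det_diag(d):
    """Determinant of a diagonal matrix: the product of its diagonal entries."""
    p = 1.0
    for v in d:
        p *= v
    return p

def cond_diag(d):
    """2-norm condition number of a diagonal matrix: max|d_i| / min|d_i|."""
    mags = [abs(v) for v in d]
    return max(mags) / min(mags)

# Perfectly conditioned, yet |det| <= 1: uniform scaling by 0.5.
good = [0.5, 0.5, 0.5, 0.5]
# Badly conditioned, yet |det| > 1: wildly mixed scales (powers of 2, so exact).
bad = [2048.0, 1024.0, 1.0 / 1024.0, 1.0 / 1024.0]

print(det_diag(good), cond_diag(good))  # → 0.0625 1.0
print(det_diag(bad), cond_diag(bad))    # → 2.0 2097152.0
```

With the current branch, `good` (condition number 1) would be sent to GMRES while `bad` (condition number over 2 million) would be inverted directly, which seems backwards if the cutoff is meant to guard against ill-conditioning.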
It's also possible there's a better way to solve the linear system (BiCGSTAB, etc.). We could allow a choice of linear solvers here, as we do for the fluid solver.
So there's probably quite a bit we could do to speed this code up. But we have to be careful not to introduce excessive errors.
With the modified code, I notice a tendency for the particles to continue to move after they have settled. This seems a bit non-physical (kinetic energy seemingly coming from nowhere); you can see it at around the 6 s mark in this movie clip:
However I see very similar things happening with the unmodified code, so I’m not convinced that this is due to my changes:
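One way to quantify the "kinetic energy from nowhere" effect for either code version is to track the total translational kinetic energy of the bed over time from your post-processed output. A hedged sketch: the function below is hypothetical, and the masses and velocities are assumed to come from your own reading of the particle output (e.g. the VTP files), not from any MFiX API:

```python
def total_kinetic_energy(masses, velocities):
    """Total translational KE = sum over particles of 0.5 * m * |v|^2.

    After settling, this should decay toward zero; a persistent floor or a
    rebound suggests numerical energy injection from the contact resolution.
    """
    return sum(0.5 * m * (vx * vx + vy * vy + vz * vz)
               for m, (vx, vy, vz) in zip(masses, velocities))

# Hypothetical snapshot: two particles, one moving, one at rest.
print(total_kinetic_energy([2.0, 2.0],
                           [(0.5, 0.0, 0.0), (0.0, 0.0, 0.0)]))  # → 0.25
```

Plotting this quantity for the modified and unmodified runs would show whether the looser tolerances actually make the residual jiggling worse, or whether it is a pre-existing artifact.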
Thanks for your interest in MFiX and for helping us test this code. Please let us know if you find anything interesting. We may be able to find some improvements for the next release.
– Charles
Thank you very much for your reply, Charles. Regarding this section of code, should it be added this way?
```cmake
set_source_files_properties(${MFIX_SRC}/des/sq_gmres.f PROPERTIES
  COMPILE_OPTIONS "${CFLAGS};-O3;-ggdb;${MARCH}")
```
I’m sorry, I didn’t quite understand the first change you mentioned.
The cutoff at |det(A)| = 1 is not a mathematically rigorous threshold but rather a practical heuristic to avoid numerical instability. The code is making the assumption that:
- |det(A)| > 1: The matrix is “well-behaved enough” for direct inversion
- |det(A)| ≤ 1: The matrix might be near-singular or ill-conditioned, so use the more robust iterative solver
That looks correct. Does the solver compile after these changes?
A near-singular matrix would have a determinant close to 0. I think perhaps we could change the cutoff from 1 to a lower value. I suppose this is an experimental question.