Solver stops without any error

majidkhalil · May 23, 2023, 6:45am

Dear Expert!
I am running a DEM-SQP simulation. The simulation stops at 0.01 s without generating any error message.

The bug report is attached for insight.
sqp54soft_2023-05-23T114139.989965.zip (75.7 KB)

cgw · May 23, 2023, 12:43pm

Backtrace for this error:
#0 cfnewvalues_mod_MOD_superdem_cfnewvalues
at des/cfnewvalues.f:355

This looks like the same issue as the thread “Float overflow at des/cfnewvalues.f:539 running SQP simulation”.

majidkhalil · May 23, 2023, 1:52pm

I turned off all float overflow checks and rebuilt the solver as described in Float overflow at des/cfnewvalues.f:539 running SQP simulation - #3 by cgw, but I am still facing the same problem.
Please guide me more precisely on how to overcome this problem, as my thesis is stuck!

cgw · May 23, 2023, 3:57pm

Ok. I can’t guarantee anything but we will try to get to the bottom of this!

I turned off overflow checking and ran the case. I haven’t gotten to the crash yet, but I noticed something interesting:

There is a sudden change in the slope of the simulation time curve: some sort of phase transition at about t=0.0085 (real time 410 s), where the case starts running MUCH more slowly and the estimated time to completion just keeps increasing.

Running perf top while in this state shows that we are spending an exceptional amount of time in the inverse_uptri_matrix routine. This is not normal, and it is why the simulation is going so slowly. So we have to figure out what changed at t=0.0085.

cgw · May 23, 2023, 3:58pm

Velocity exceeds limit:   200.00
in cell: I =   36   J =    3   K =    2
 Epg =  0.97496     Ug =   0.0000     Vg =   233.82     Wg =  -260.01
To change the limit, adjust the scale factor MAX_INLET_VEL_FAC.
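
To make the message above easier to read, here is a minimal Python sketch that flags which gas velocity components reported for that cell are larger in magnitude than the printed limit of 200.00. It assumes a simple per-component magnitude comparison, which may not be how the solver itself performs the check; the function name `components_over_limit` is hypothetical.

```python
def components_over_limit(velocity, limit):
    """Return the names of the components whose magnitude exceeds `limit`.

    Assumption: a plain per-component comparison, used only to read the log;
    this is not taken from the solver source.
    """
    return [name for name, value in velocity.items() if abs(value) > limit]


# Values copied from the solver message for cell I=36, J=3, K=2.
cell = {"Ug": 0.0, "Vg": 233.82, "Wg": -260.01}
print(components_over_limit(cell, 200.0))
```

Under that assumption, both Vg and Wg are over the limit in this cell while Ug is not.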

cgw · May 23, 2023, 4:19pm

I just got a solver crash, with overflow checks turned off:

#0 gmres_general_MOD_inverse_uptri_matrix
        at des/sq_gmres.f:207
#1 gmres_general_MOD_gmres
        at des/sq_gmres.f:117
#2 sq_contact_newton_dpmethod_mod_MOD_sq_contact_newton_dp_a
        at des/sq_contact_detection_newton.f:254
#3 sq_calc_force_mod_MOD_calc_force_superdem
        at des/sq_calc_force_superdem.f:341
#4 des_time_march_MOD_des_time_step
        at des/des_time_march.f:192
#5 run_dem
        at mfix.f:211
#6 run_mfix
        at mfix.f:146
#7 main_MOD_run_mfix0
        at main.f:79

It’s failing in the matrix inversion routine which has been gobbling up so much time. Note that this is a segmentation fault - accessing memory out of bounds - so this is not controlled by the FPE trapping code. There’s no recovering from this type of error, it’s always fatal.

Using gdb to examine the core:

mfix:) pwd
/tmp/sqp54soft_2023-05-23T114139.989965

mfix:) file core
core: ELF 64-bit LSB core file, x86-64, version 1 (SYSV), SVR4-style, from '/home/cgw/mambaforge/envs/mfix-git/bin/python -m mfix_solver.pymfix -d /tmp/sqp', real uid: 103, effective uid: 103, real gid: 1000, effective gid: 1000, execfn: '/home/cgw/mambaforge/envs/mfix-git/bin/python', platform: 'x86_64'

mfix:) gdb python core
GNU gdb (GDB) 11.2
...
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007f3e0a388e54 in gmres_general::inverse_uptri_matrix 
...
(gdb) list 207
202	     do i=1,n
203	        do j=1,n
204	           if(i<j) then
205	           summ=0.0d0
206	           do k=i,j-1
207	           summ=summ+inverse_A(i,k)*A(k,j)
208	           enddo
209	           if (A(j,j)<=0.0d0 .and. dabs(A(j,j))<1e-30 ) then
210	
211	           inverse_A(j,j)=-summ/(-1e-30)
(gdb) whatis A
type = real(kind=8) (1001,1001)
(gdb) whatis inverse_A
type = real(kind=8) (1001,1001)
(gdb) p i
$6 = 1
(gdb) p j
$7 = 1001
(gdb) p k
$8 = <optimized out>

It’s a little hard to see what’s out of range, since A and inverse_A are both 1001x1001 and i and j are within bounds. We can’t see the value of k but the loop runs from i(1) to j-1 (1000) so that should be in bounds too.
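
The triple loop in the listing has the shape of the usual substitution scheme for inverting an upper-triangular matrix. The Python sketch below is a generic version of that scheme, not the MFiX routine: the function names are hypothetical and it contains no special handling of small diagonal entries. It also counts the multiply-adds in the innermost loop, which grow as roughly n³/6 and may help explain why a 1001x1001 inversion dominates the profile.

```python
def inverse_upper_triangular(A):
    """Invert an upper-triangular matrix (list of lists), one row at a time.

    For i < j: inv[i][j] = -(sum_{k=i}^{j-1} inv[i][k] * A[k][j]) / A[j][j]
    Generic textbook scheme; assumes every diagonal entry is nonzero.
    """
    n = len(A)
    inv = [[0.0] * n for _ in range(n)]
    for i in range(n):
        inv[i][i] = 1.0 / A[i][i]
    for i in range(n):
        for j in range(i + 1, n):
            summ = 0.0
            for k in range(i, j):
                summ += inv[i][k] * A[k][j]
            inv[i][j] = -summ / A[j][j]
    return inv


def inner_loop_count(n):
    """Number of multiply-adds in the innermost loop: n(n-1)(n+1)/6."""
    return n * (n - 1) * (n + 1) // 6


well_conditioned = inverse_upper_triangular([[2.0, 1.0], [0.0, 4.0]])
near_singular = inverse_upper_triangular([[1.0, 1.0], [0.0, 1e-30]])

print(well_conditioned)          # [[0.5, -0.125], [0.0, 0.25]]
print(near_singular[0][1])       # enormous magnitude: tiny diagonal entry
print(inner_loop_count(1001))    # 167167000
```

A diagonal entry close to zero makes the off-diagonal entries of the inverse enormous, and a single call at n = 1001 already costs about 1.7e8 multiply-adds in the inner loop alone.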

Next step is to build a “debug” version of the solver.

gaoxi · May 25, 2023, 2:43am

There are some issues with your settings.

majidkhalil · May 25, 2023, 3:39am

Hello @cgw, have you tried the debug version of the solver?

cgw · May 25, 2023, 11:01am

@majidkhalil , we have two support threads going with significant overlap. Let’s close this thread and continue the discussion at Overlap between two superquadric particles is too large!

Thanks.
