Dear Expert!
I am running a DEM-SQP simulation. It stops at 0.01 sec without generating any error.
The bug report is attached for insight.
sqp54soft_2023-05-23T114139.989965.zip (75.7 KB)
Backtrace for this error:
#0 cfnewvalues_mod_MOD_superdem_cfnewvalues
at des/cfnewvalues.f:355
This looks like the same issue as Float overflow at des/cfnewvalues.f:539 running SQP simulation - #3 by cgw. I turned off all float overflow checks and rebuilt the solver as suggested there, but I am still facing the same problem.
Kindly guide me more precisely on how to overcome this problem, as my thesis is stuck!
Ok. I can’t guarantee anything but we will try to get to the bottom of this!
I turned off overflow checking and ran the case. I haven't gotten to the crash yet, but I noticed something interesting:
There is a sudden change in the slope of the simulation time curve: some sort of phase transition at about t=0.0085 (real time 410 s), where the case starts running MUCH more slowly and the estimated time to completion just keeps increasing.
Running perf top while in this state shows that we are spending an exceptional amount of time in the inverse_uptri_matrix routine. This is not normal, and it is why the simulation is going so slowly. So we have to figure out what changed at t=0.0085.
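For reference, a quick way to do this is to attach perf to the running solver process by PID; the process-name match below is just an assumption based on how the solver is launched here (python -m mfix_solver.pymfix):
perf top -p $(pgrep -f mfix_solver.pymfix)
This samples the hottest functions live, which is how the time spent in inverse_uptri_matrix shows up.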
Velocity exceeds limit: 200.00
in cell: I = 36 J = 3 K = 2
Epg = 0.97496 Ug = 0.0000 Vg = 233.82 Wg = -260.01
To change the limit, adjust the scale factor MAX_INLET_VEL_FAC.
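As an aside, if that limit really does need to be relaxed, MAX_INLET_VEL_FAC can be set in the project file; the value here is just an illustration:
max_inlet_vel_fac = 2.0
Raising it only loosens the check; it does not address whatever is driving the gas velocity up.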
I just got a solver crash, with overflow checks turned off:
#0 gmres_general_MOD_inverse_uptri_matrix
at des/sq_gmres.f:207
#1 gmres_general_MOD_gmres
at des/sq_gmres.f:117
#2 sq_contact_newton_dpmethod_mod_MOD_sq_contact_newton_dp_a
at des/sq_contact_detection_newton.f:254
#3 sq_calc_force_mod_MOD_calc_force_superdem
at des/sq_calc_force_superdem.f:341
#4 des_time_march_MOD_des_time_step
at des/des_time_march.f:192
#5 run_dem
at mfix.f:211
#6 run_mfix
at mfix.f:146
#7 main_MOD_run_mfix0
at main.f:79
It’s failing in the matrix inversion routine which has been gobbling up so much time. Note that this is a segmentation fault (accessing memory out of bounds), so it is not controlled by the FPE trapping code. There’s no recovering from this type of error; it’s always fatal.
Using gdb to examine the core:
mfix:) pwd
/tmp/sqp54soft_2023-05-23T114139.989965
mfix:) file core
core: ELF 64-bit LSB core file, x86-64, version 1 (SYSV), SVR4-style, from '/home/cgw/mambaforge/envs/mfix-git/bin/python -m mfix_solver.pymfix -d /tmp/sqp', real uid: 103, effective uid: 103, real gid: 1000, effective gid: 1000, execfn: '/home/cgw/mambaforge/envs/mfix-git/bin/python', platform: 'x86_64'
mfix:) gdb python core
GNU gdb (GDB) 11.2
...
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x00007f3e0a388e54 in gmres_general::inverse_uptri_matrix
...
(gdb) list 207
202 do i=1,n
203 do j=1,n
204 if(i<j) then
205 summ=0.0d0
206 do k=i,j-1
207 summ=summ+inverse_A(i,k)*A(k,j)
208 enddo
209 if (A(j,j)<=0.0d0 .and. dabs(A(j,j))<1e-30 ) then
210
211 inverse_A(j,j)=-summ/(-1e-30)
(gdb) whatis A
type = real(kind=8) (1001,1001)
(gdb) whatis inverse_A
type = real(kind=8) (1001,1001)
(gdb) p i
$6 = 1
(gdb) p j
$7 = 1001
(gdb) p k
$8 = <optimized out>
It’s a little hard to see what’s out of range, since A and inverse_A are both 1001x1001 and i and j are within bounds. We can’t see the value of k, but the loop runs from i (1) to j-1 (1000), so that should be in bounds too.
Next step is to build a “debug” version of the solver.
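In practice that means recompiling with debug symbols, no optimization, and run-time array bounds checking. A sketch, assuming the standard build_mfixsolver script and a GNU Fortran compiler (the exact flags and invocation may differ on your setup):
build_mfixsolver --batch FCFLAGS="-g -O0 -fcheck=all -fbacktrace"
With -fcheck=all, the first out-of-bounds array reference aborts with the array name, the offending index and the source line, which should tell us exactly what inverse_uptri_matrix is doing wrong.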
@majidkhalil, we have two support threads going with significant overlap. Let’s close this thread and continue the discussion at Overlap between two superquadric particles is too large!
Thanks.