BUG REPORT
Type of issue
solver crash and solver non-convergence, etc.
Description
Hello MFIX experts! I recently tried to simulate a cylindrical fluidized bed and submitted the job to our cluster. However, I keep getting a floating-point error in DGTSV.f. I am not sure whether it is caused by the numerical settings, the parallelization (I used DMP), or a bug. The error is:
Time = 2.20003 Dt = 0.10000E-03
Nit P0 U0 V0 P1 U1 V1 Max res
1 7.7E-04 4.9E-05 2.4E-04 3.9E-08 3.6E-04 3.8E-03 V1
[h14r2n04:7587 :0:7587] Caught signal 8 (Floating point exception: floating-point divide by zero)
/work/home/acisnr5rv0/software/mfix/mfix-22.1.1/model/DGTSV.f: [ __dgtsv_mod_MOD_dgtsv() ]
…
187 TMP = DU(1:n-1)*DL(1:n-1)
188 do 1 I = 2, N
189 ! GG(i) = 1.0 / (D(i) - DL(i-1)*GG(i-1)*DU(i-1))
==> 190 GG(i) = 1.0 / (D(i) - GG(i-1)*TMP(i-1))
191 1 continue
192
193 YY(1) = GG(1) * B(1,1)
==== backtrace (tid: 7587) ====
0 0x00000000004eb3fb __dgtsv_mod_MOD_dgtsv() /work/home/acisnr5rv0/software/mfix/mfix-22.1.1/model/DGTSV.f:190
1 0x000000000044638a __leqsol_MOD_leq_jksweep() /work/home/acisnr5rv0/software/mfix/mfix-22.1.1/model/leqsol_mod.f:896
2 0x000000000044782f __leqsol_MOD_leq_msolve() /work/home/acisnr5rv0/software/mfix/mfix-22.1.1/model/leqsol_mod.f:324
3 0x00000000006d5dba __leq_bicgs_mod_MOD_leq_bicgs0() /work/home/acisnr5rv0/software/mfix/mfix-22.1.1/model/leq_bicgs.f:392
4 0x00000000006d699c __leq_bicgs_mod_MOD_leq_bicgs() /work/home/acisnr5rv0/software/mfix/mfix-22.1.1/model/leq_bicgs.f:91
5 0x00000000004a2567 __solve_lin_eq_mod_MOD_solve_lin_eq() /work/home/acisnr5rv0/software/mfix/mfix-22.1.1/model/solve_lin_eq.f:165
6 0x00000000004a6ff0 w_m_star() /work/home/acisnr5rv0/software/mfix/mfix-22.1.1/model/solve_vel_star.f:687
7 0x00000000004a6ff0 _solve_vel_star_mod_MOD_solve_vel_star() /work/home/acisnr5rv0/software/mfix/mfix-22.1.1/model/solve_vel_star.f:138
8 0x0000000000443624 iterate_MOD_do_iteration() /work/home/acisnr5rv0/software/mfix/mfix-22.1.1/model/iterate.f:255
9 0x000000000040314d run_fluid() /work/home/acisnr5rv0/software/mfix/mfix-22.1.1/model/mfix.f:188
10 0x000000000040314d run_mfix() /work/home/acisnr5rv0/software/mfix/mfix-22.1.1/model/mfix.f:142
11 0x00000000004038dd MAIN() /work/home/acisnr5rv0/software/mfix/mfix-22.1.1/model/mfix.f:298
12 0x0000000000402aa1 main() /work/home/acisnr5rv0/software/mfix/mfix-22.1.1/model/mfix.f:269
13 0x00000000000223d5 __libc_start_main() ???:0
14 0x0000000000402ad1 _start() ???:0
Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.
Backtrace for this error:
#0 0x2ba43ca47b9a in ???
#1 0x2ba43ca46dc3 in ???
#2 0x2ba43d56c5cf in ???
#3 0x4eb3fb in __dgtsv_mod_MOD_dgtsv
at /work/home/acisnr5rv0/software/mfix/mfix-22.1.1/model/DGTSV.f:190
#4 0x446389 in __leqsol_MOD_leq_jksweep
at /work/home/acisnr5rv0/software/mfix/mfix-22.1.1/model/leqsol_mod.f:896
#5 0x44782e in __leqsol_MOD_leq_msolve
at /work/home/acisnr5rv0/software/mfix/mfix-22.1.1/model/leqsol_mod.f:324
#6 0x6d5db9 in __leq_bicgs_mod_MOD_leq_bicgs0
at /work/home/acisnr5rv0/software/mfix/mfix-22.1.1/model/leq_bicgs.f:392
#7 0x6d699b in __leq_bicgs_mod_MOD_leq_bicgs
at /work/home/acisnr5rv0/software/mfix/mfix-22.1.1/model/leq_bicgs.f:91
#8 0x4a2566 in __solve_lin_eq_mod_MOD_solve_lin_eq
at /work/home/acisnr5rv0/software/mfix/mfix-22.1.1/model/solve_lin_eq.f:165
#9 0x4a6fef in w_m_star
at /work/home/acisnr5rv0/software/mfix/mfix-22.1.1/model/solve_vel_star.f:687
#10 0x4a6fef in __solve_vel_star_mod_MOD_solve_vel_star
at /work/home/acisnr5rv0/software/mfix/mfix-22.1.1/model/solve_vel_star.f:138
#11 0x443623 in _iterate_MOD_do_iteration
at /work/home/acisnr5rv0/software/mfix/mfix-22.1.1/model/iterate.f:255
#12 0x40314c in run_fluid
at /work/home/acisnr5rv0/software/mfix/mfix-22.1.1/model/mfix.f:188
#13 0x40314c in run_mfix
at /work/home/acisnr5rv0/software/mfix/mfix-22.1.1/model/mfix.f:142
#14 0x4038dc in mfix
at /work/home/acisnr5rv0/software/mfix/mfix-22.1.1/model/mfix.f:298
#15 0x402aa0 in main
at /work/home/acisnr5rv0/software/mfix/mfix-22.1.1/model/mfix.f:269
Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
mpirun noticed that process rank 27 with PID 0 on node h14r2n04 exited on signal 8 (Floating point exception).
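From the excerpt, the failing statement is line 190 of DGTSV.f, `GG(i) = 1.0 / (D(i) - GG(i-1)*TMP(i-1))` with `TMP = DU*DL`. It looks like the forward sweep of a tridiagonal solve, so the divide by zero would occur when `D(i) - GG(i-1)*DU(i-1)*DL(i-1)` is exactly zero for some row. Below is a minimal Python sketch of that recurrence, not MFIX code. The function name `first_zero_pivot` and the starting value `GG(1) = 1/D(1)` are my assumptions, since the excerpt does not show how `GG(1)` is set.

```python
def first_zero_pivot(dl, d, du):
    """Mirror the GG recurrence from the DGTSV.f excerpt.

    dl, du: sub- and super-diagonal (length n-1); d: main diagonal (length n).
    Returns the 1-based row index whose denominator is exactly zero,
    or None if every denominator is nonzero.
    """
    if d[0] == 0.0:
        return 1
    gg = 1.0 / d[0]  # ASSUMPTION: GG(1) = 1/D(1), not shown in the excerpt
    for i in range(1, len(d)):
        # Same denominator as line 190: D(i) - GG(i-1)*TMP(i-1), TMP = DU*DL
        denom = d[i] - gg * du[i - 1] * dl[i - 1]
        if denom == 0.0:
            return i + 1  # convert to Fortran-style 1-based row index
        gg = 1.0 / denom
    return None


# A 2x2 system [[1, 1], [1, 1]] is singular, so the second denominator vanishes.
singular_row = first_zero_pivot([1.0], [1.0, 1.0], [1.0])

# A diagonally dominant system never produces a zero denominator.
healthy_row = first_zero_pivot([-1.0, -1.0], [4.0, 4.0, 4.0], [-1.0, -1.0])
```

If this reading is right, the crash would mean the W-momentum matrix handed to the line-sweep preconditioner on rank 27 contains a row with a zero (or exactly cancelling) diagonal, but I have not confirmed this against the actual matrix.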
Could somebody help me resolve this error? Many thanks! The .mfx file is attached below.
case1_tfm.mfx (45.9 KB)