FPE at des_thermo_cond_mod.f:295, incomplete stack trace

GL1 · August 31, 2023, 7:41am

Please select the most relevant MFiX category: | Installation | How to | Bug report | Share | for this topic.
Dear developers, hello. At present, I am simulating biomass pyrolysis (simply setting a working condition and adding a chemical reaction), but I have encountered some problems (in the Solids interface, turning on the Thermal Cond. function and running it for 2 steps will result in an error message: Solver crash! The MFix solvent has terminated unexpectedly. Error information: float in cal operation in (null) at (null): 0; If the thermal conduction function is not turned on, the calculation example can run normally. I cannot solve this problem.
I will appreciate it if this problem can be solved.
delete_2023-08-31T153552.188859.zip (68.4 MB)

Uploading: 2aca8ec8385a2fe01317d650e7d5b4f.png…

cgw · August 31, 2023, 4:10pm

Hi. There are a few things going on here that I don’t completely understand.
First off, the stack trace says (null):0 where there should be a filename - when I run this case I get the following backtrace:

Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.

Backtrace for this error:
#0 radius
  at des/des_thermo_cond_mod.f:295
#1 des_thermo_cond_MOD_des_conduction
  at des/des_thermo_cond_mod.f:158
#2 calc_force_dem_mod_MOD_calc_force_dem
  at des/calc_force_dem.f:206
#3 des_time_march_MOD_des_time_step
  at des/des_time_march.f:194
#4 run_dem
  at mfix.f:211
#5 run_mfix
  at mfix.f:146
#6 main_MOD_run_mfix0
  at main.f:84

I’m not sure why you are not getting the full error report. You can try building the solver in Debug mode.

Here’s the offending code.

284	! Calculate
285	      VALUE = (R1**2 - R2**2 + CENTER_DIST_CORR**2)/(2.0d0 * R1 * CENTER_DIST_CORR)
286	! Check to ensure that VALUE is less than or equal to one. If VALUE is
287	! greater than one, the triangle inequality has been violated. Therefore
288	! there is no intersection between the fluid lens surrounding the larger
289	! particle and the surface of the smaller particle.
290	! Thus, there is no particle-fluid-particle heat transfer.
291	      IF( VALUE .GT. 1.0d0) THEN
292	         RADIUS = -1.0d0
293	      ELSE
294	! Calculate beta (Law of cosines)
295	         BETA = ACOS( VALUE )

The error is at line 295. The variable value is equal to -2.908 so it’s impossible to take the arccos of that.

Now, how did value get negative?

(gdb) print center_dist_corr 
 -0.00012723204740919605

This causes the denominator of the fraction at line 285 to be negative.

125	         IF(CENTER_DIST < (MAX_RAD + MIN_RAD)) THEN
126	! Correct particle overlap to account for artificial softening
127	            OLAP_Sim = MAX_RAD + MIN_RAD - CENTER_DIST
128	            IF(DO_AREA_CORRECTION)THEN
129	               IF(Im.ge.Jm)THEN
130	                  OLAP_Actual = CORRECT_OLAP(OLAP_Sim,Jm,Im)
131	               ELSE
132	                  OLAP_Actual = CORRECT_OLAP(OLAP_Sim,Im,Jm)
133	               ENDIF
134	               CENTER_DIST_CORR = MAX_RAD + MIN_RAD - OLAP_Actual

(gdb) p max_rad
 0.0020000000000000009
(gdb) p min_rad
 0.0020000000000000009
(gdb) p center_dist
 0.0039993857351290022

So center_dist is (slightly) less than max_rad + min_rad and we are executing the “Correct particle overlap” code, which is doing something wrong …

(gdb) p olap_sim
 6.1426487099960903e-07
(gdb) p olap_actual
 0.0041272320474091979
(gdb) p max_rad + min_rad - olap_actual
 -0.00012723204740919605

There should probably be some checks in the code to make sure this distance never gets negative.

jeff.dietiker · August 31, 2023, 4:52pm

First, make sure you use the grid based neighbor search, this will be faster than N-square, but this is not related to your issue.

The Young’s moduli you are using for the conduction are too small. You should use values that are representative of the actual material, probably in the order of 10 GPa to 100 GPa. This is different from the Youngs moduli you may use with the Hertzain collision model which are typically smaller that the actual material property (otherwise the DEM time step will likely be prohibitively small).

GL1 · September 1, 2023, 7:07am

Thank you very much for your answer and feedback. This problem has been resolved.
Best wishes.

GL1 · September 1, 2023, 7:09am

Thank you very much for your answer and feedback. This problem has been resolved.
Best wishes.

GL1 · September 6, 2023, 11:55am

Dear cgw, excuse me, how did you obtain the error code? Could you please tell me the steps to obtain it? Recently, I have been trying to find a way to obtain error codes, but no matter how I operate, I cannot do it. If you have the time to provide detailed steps, I would really appreciate it.
Best wishes!

cgw · September 6, 2023, 2:52pm

I’m running on Linux and using gdb to examine core dumps.

GL1 · September 7, 2023, 12:53am

[quote=“GL1, post:10, topic:4829, full:true”]
Dear cgw, excuse me. Could you please provide specific debugging screenshots or processes? I really don’t understand this operation, and no one around me can operate this process. I really don’t have a solution now. If this problem can be solved, I would greatly appreciate it.
Best wishes!

GL1 · September 8, 2023, 10:45am

Dear CGW, excuse me. According to the debugging results you provided, I tried on the Linux system, but it cannot be fully replicated. Due to my limited abilities, I really can’t come up with any solutions. The main question is why I can’t obtain the values shown in the figure below.

My own operation process is shown in the following figure:First, run Mfixsolver, and then type (gdb) p max_ Rad(), but there are no results or values. May I ask why this is? Or rather, how did you operate it? I am truly powerless and eager to learn this knowledge and content. Thank you again. If this issue can be resolved, I would greatly appreciate it.

Also, I have a question. Is the error code shown in the following figure obtained in the Mfix software or in the terminal interface?

Finally, thanks for your help again.
Best wishes.

cgw · September 8, 2023, 2:42pm

Line 291 should be changed to

       IF(VALUE .LT. -1.0d0 .or. VALUE .GT. 1.0d0) THEN

I’ll follow up with some notes about debugging with GDB

GL1 · September 10, 2023, 3:00pm

Thank you very much for your reply. I am very eager to learn about gdb debugging, and I kindly request that you follow up and update as soon as possible. Thank you again.
Best wishes!

cgw · September 11, 2023, 5:12pm

Hi. First off, It’s still not clear to me why the GUI is showing null:0 instead of the correct line number. I’d like to get to the bottom of that.

As promised, a few notes on GDB.

I suggest you try to find a GDB tutorial, I can offer some tips but I can’t teach you everything about GDB. The commands you need most are “where”, “up”, “down”, “list”, “print” and “whatis”. You can type “help” in GDB for the built-in help system which is not bad but presumes a bit of background knowledge.

Also note that debugging generally works better on Linux. On Linux you can do post-mortem debugging via “core” files, which don’t exist on Windows. On Windows you have to start the mfixsolver inside gdb, which it looks like you have figured out already.

The reason p min_rad is not working in your gdb session example is that you are on the wrong stack frame. If you type where at the (gdb) prompt it will show you the stack. Variables are only accessible in the stack frame where they live. In this example, the stack looks something like this:

(gdb) where
#0  0x00007f8a6861f3ae in feraiseexcept () from /lib64/libm.so.6
#1  0x00007f8a68621750 in acosf64 () from /lib64/libm.so.6
#2  0x00007f8a185d71cd in des_conduction::radius (r2=0.0020000000000000009, 
    r1=<optimized out>)
    at /home/cgw/mambaforge/envs/mfix-23.2/share/mfix/src/model/des/des_thermo_cond_mod.f:295
#3  0x00007f8a185d7a9f in des_thermo_cond::des_conduction (i=4, j=473, 
    center_dist=0.0039993857351258745, im=<optimized out>, 
    iijk=<optimized out>)
    at /home/cgw/mambaforge/envs/mfix-23.2/share/mfix/src/model/des/des_thermo_cond_mod.f:158

so the bottom of the stack is the library function feraiseexcept which is responsible for floating point exceptions. That function does not know about min_rad, etc. To see the variables which are part of des_thermo_conduction.f we need to go up 2 levels:

(gdb) up
#1  0x00007f8a68621750 in acosf64 () from /lib64/libm.so.6
(gdb) up
#2  0x00007f8a185d71cd in des_conduction::radius (r2=0.0020000000000000009, r1=<optimized out>) at /home/cgw/mambaforge/envs/mfix-23.2/share/mfix/src/model/des/des_thermo_cond_mod.f:295
295	         BETA = ACOS( VALUE )

Now we have a new problem because the term value seems to have a special meaning which blocks us accessing the variable:

(gdb) p value
Attempt to use a type name as an expression
(gdb) whatis value
type = Type, C_Union :: value   # ???

So now we look at more code to see how value is computed:

(gdb) list 270
265	!......................................................................!
266	! Subroutine Procedure: FUNCTION RADIUS                                !
267	!                                                                      !
268	! Propose: Calculate the center line radius between two particles.     !
269	! Depending on what is given as input, this function calculates:       !
270	!   1) radius of contact area between two particles                    !
271	!   2) radius delineating particle-fluid-particle region.              !
272	!......................................................................!
273	      DOUBLE PRECISION FUNCTION RADIUS(R1, R2)
274	
(gdb)   # hit return for more
275	      IMPLICIT NONE
276	
277	! Radius values
278	      DOUBLE PRECISION, INTENT(IN) :: R1, R2
279	! Distance between particle centers
280	      DOUBLE PRECISION BETA
281	! Generic value
282	      DOUBLE PRECISION VALUE
283	
284	! Calculate
(gdb)  # hit return for more
285	      VALUE = (R1**2 - R2**2 + CENTER_DIST_CORR**2)/(2.0d0 * R1 * CENTER_DIST_CORR)

Ok, here’s the computation of value. GDB can evaluate expressions, so let’s try to reproduce the computation:

(gdb) p (R1**2 - R2**2 + CENTER_DIST_CORR**2)/(2.0d0 * R1 * CENTER_DIST_CORR)
No symbol "CENTER_DIST_CORR" in current context.

Now we’re seeing a new issue, which is the fact that the compiler optimizer pass will eliminate some intermediate variables thereby making them inaccessible. So we rebuild the solver in Debug mode which turns the optimizer off:

Now, we’re in business!

(gdb) p (R1**2 - R2**2 + CENTER_DIST_CORR**2)/(2.0d0 * R1 * CENTER_DIST_CORR)
$2 = -2.9083823068238357

Hope this helps,

– Charles

GL1 · September 15, 2023, 2:48pm

Thank you for your reply. Your reply has been very helpful to me.
Best wishes!