Unphysical field variables detected on one or more processes

Hello all,

I am attempting to set up a CFD-DEM (CGP) to simulate biomass pyrolysis in a fluidized bed reactor with a relatively large cross-sectional area. Consequently, I am inputting heat via a constant flux across the reactor wall. The problem however is that the temperature gradient value (ratio of experimental heat flux to the product of wall area and gas thermal conductivity) causes the simulation to fail. See the error message below. Values lower than 1e5 K/m typically runs well but heat input is not enough to maintain the desired reactor temperature.

I was wondering if anyone has dealt with a similar setup and has some tips to getting the heat flux to the desired level without crashing the simulation.

#################################################################
ERROR check_data_30.f:299
Unphysical field variables detected on one or more processes.
<<<<<#################################################################

Note: The following floating-point exceptions are signalling: IEEE_UNDERFLOW_FLAG IEEE_DENORMAL
ERROR STOP 1
Note: The following floating-point exceptions are signalling: IEEE_UNDERFLOW_FLAG IEEE_DENORMAL
ERROR STOP 1
Note: The following floating-point exceptions are signalling: IEEE_OVERFLOW_FLAG IEEE_UNDERFLOW_FLAG IEEE_DENORMAL
ERROR STOP 1
Note: The following floating-point exceptions are signalling: IEEE_OVERFLOW_FLAG IEEE_UNDERFLOW_FLAG IEEE_DENORMAL
ERROR STOP 1
Note: The following floating-point exceptions are signalling: IEEE_OVERFLOW_FLAG IEEE_UNDERFLOW_FLAG IEEE_DENORMAL
ERROR STOP 1

Hi Oluwafemi -

Please attach your model (and any other required input files) if you can. Use the “Submit bug report” menu item on the main MFiX menu to create a zipfile containing everything.

I’m interested to know what platform you are running on and how you are building the solver. The IEEE_OVERFLOW exceptions should be triggering a stack trace (which makes it easier to see where this is happening).

– Charles

Hi Oluwafemi, thanks for the files. This is very helpful and interesting for a number of reasons. This model has a large number (44) of chemical reactions, more than we’re used to seeing. It’s a good test case.

I notice that you did not use the MFiX GUI to create this case. I loaded it into the GUI and found a small error that caused the monitor definitions to be clobbered. I fixed that, it will be in the upcoming 21.3 release, then you should be able to try this case in the GUI if you like.

More importantly, I am able to reproduce this error, but it takes quite a long time - almost 5 hours on my 8-core i7-8650U CPU @ 1.90GHz.

Building and running with the flags

-ffpe-trap=invalid,zero,overflow,underflow,denormal -g -O2 -funsafe-math-optimizations

I was able to get a stack trace as follows:

Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.

Backtrace for this error:
#0  0x7f4d4848e6b0 in ???
#1  0x7f4d4848d865 in ???
#2  0x7f4d4815078f in ???
#3  0x55e03811fdd7 in __calc_collision_wall_MOD_calc_dem_force_with_wall_stl
	at /home/cgw/Work/NETL/mfix/model/des/calc_collision_wall_mod.f:354
#4  0x55e03812260e in __calc_force_dem_mod_MOD_calc_force_dem
	at /home/cgw/Work/NETL/mfix/model/des/calc_force_dem.f:94
#5  0x55e037e7293c in __des_time_march_MOD_des_time_step
	at /home/cgw/Work/NETL/mfix/model/des/des_time_march.f:182
#6  0x55e037e3b3ec in run_dem
	at /home/cgw/Work/NETL/mfix/model/mfix.f:211
#7  0x55e037e3b3ec in run_mfix_
	at /home/cgw/Work/NETL/mfix/model/mfix.f:146
#8  0x55e037e3b897 in mfix
	at /home/cgw/Work/NETL/mfix/model/mfix.f:298
#9  0x55e037e306d2 in main
	at /home/cgw/Work/NETL/mfix/model/mfix.f:269

The offending line is:

354	            MAG_OVERLAP_T = sqrt(DOT_PRODUCT(OVERLAP_T, OVERLAP_T))

where OVERLAP_T is

(gdb) p OVERLAP_T
$1 = (-1.2327053750412488e-08, 1.7556845290419608e-156, 1.2969799118356008e-10)

I don’t quite see how yet, but somehow we seem to be taking the square root of a negative quantity. Of course the dot product of any vector with itself should be non-negative but we might be running into an issue with numeric precision here.

I’m continuing to debug this, but wanted to let you know the current status. Since it takes a long time to reach the error, it might be a while until we get to the bottom of this.

– Charles

1 Like

Maybe this is just an underflow coming from squaring the y-component of the overlap (1.76E-156).

1 Like

Good point Jeff, I may be on the wrong track here. Continuing to investigate.

– Charles

1 Like

Thank you, Charles and Jeff, for your help on this.

Charles, do you need me to do anything in the meantime?

Current hypothesis is that the IEEE_UNDERFLOW_FLAG and IEEE_DENORMAL exceptions are not the root cause of the problem… these are being reported because mfixsolver calls ERROR STOP and any floating-point exceptions that occurred get reported in that case, even though they were non-fatal. This can be controlled with the -ffpe-summary compiler flag, we should probably use this to prevent these spurious messages.

Looking at check_data_30.f, where the “Unphysical field variables” message is printed

   SUBROUTINE CHECK_PHYSICAL_BOUNDS(REPORT_SPECIES)
      
! Gas viscosity must be non-negative.
            CALL CHECK_NOTLESSTHASSTHAN(MU_G(IJK), ZERO, 'MU_G')
! Mixture molecular weight must be positive.
            CALL CHECK_GREATERTHAN(MW_MIX_G(IJK), ZERO, 'MW_MIX_G')
! Verify thermodynamic properties when solving energy equations:
            IF (ENERGY_EQ) THEN
! Gas conductivity must be non-negative.
               CALL CHECK_NOTLESSTHAN(K_G(IJK), ZERO, 'K_G')
! Gas phase specific heat must be positive.
               CALL CHECK_GREATERTHAN(C_PG(IJK), ZERO, 'C_PG')
! Verify that the gas phase temperature is within the bounds.
               CALL CHECK_GREATERTHAN(T_G(IJK), TMIN, 'T_G')
               CALL CHECK_LESSTHAN(T_G(IJK), TMAX, 'T_G')

it continues like this with dozens of checks, and if any have failed, we print the message you reported:

         write (err_msg,"('Unphysical field variables detected on one or more processes.')")
         call log_error()

It would of course be extremely helpful to know which of these many checks was the failing one. Looking at the definitions for CHECK_LESSTHAN, etc, they contain:

         IF (field_value < limit_value) CALL REPORT_ERROR &
            (IER, error_message, I, J, K, field_value, '<', limit_value, label)

there should be a message like:

Fatal Error: Unphysical field variables detected.

with the name and IJK index and value of the offending variable. But in your mfixsolver output, I see no such message. So I’m taking a closer look at the code which reports these out-of-bounds errors.

  1. All of the messages about IEE_UNDERFLOW, IEEE_DENORMAL, etc are irrelevant… it’s not an FPE problem. I will work to remove these spurious messages in the next release, as they are confusing and caused me to waste some time looking in the wrong direction.

  2. The problem is that the reactor is getting too hot. I think the output could make this more obvious. The overall log AUTOTHERMAL.LOG just reports

    Unphysical field variables detected on one or more processes.

without further details, but there are also files

AUTOTHERMAL_2.LOG AUTOTHERMAL_3.LOG AUTOTHERMAL_6.LOG AUTOTHERMAL_7.LOG

written by each compute node, and they contain more info:

x280:( grep -i unphy AUTOTHERMAL*LOG
AUTOTHERMAL_2.LOG:Fatal Error: Unphysical field variables detected.
AUTOTHERMAL_3.LOG:Fatal Error: Unphysical field variables detected.
AUTOTHERMAL_6.LOG:Fatal Error: Unphysical field variables detected.
AUTOTHERMAL_7.LOG:Fatal Error: Unphysical field variables detected.

and then looking in the numbered logs, we see

Fatal Error: Unphysical field variables detected.
     I      J      K
     2    167      2      T_G =    4000.     >=   4000.
     3    167      2      T_G =    4000.     >=   4000.
     4    167      2      T_G =    4000.     >=   4000.
     5    167      2      T_G =    4000.     >=   4000.
     6    167      2      T_G =    4000.     >=   4000.
     7    167      2      T_G =    4000.     >=   4000.
     8    167      2      T_G =    4000.     >=   4000.
     9    167      2      T_G =    4000.     >=   4000.
     ....

– Charles

Thank you, Charles.

This makes sense to me. However, my primary issue is unresolved. Is there a medium for quick 1:1 meetings to resolve issues?

Oluwafemi - please see my latest message, posted earlier this morning. I believe that your primary issue is that the gas temperature T_G is exceeding a hard-coded limit:

mfix/model/tolerance_mod.f

! Upper bound for temperatures
      DOUBLE PRECISION, PARAMETER :: TMAX = 4000.D0

You can try raising this, but I think something is running away in your model if the temperature is getting this high.

– Charles

Have you tried to run without your modified des_reaction_model.f to see if this could be the source of your issue?

Just to confirm, I created a monitor for max t_g over the entire domain. You can see how quickly the temperature is rising (over 20K/sec). I did not run it for the full 4 hours, but it’s not hard to see where this is going.

– Charles

I think the answer is actually already included in the original question. You are imposing a 1 Million degree per meter temperature gradient along the top wall (bc_c_t_g(4) = 1000000.0) which is causing excessive local heating of the gas phase.

You may want to consider using a constant reactor wall temperature (BC#5) to maintain the desired temperature. It is currently set as adiabatic wall.

1 Like

Thank you, Jeff. I will try setting up to test the case without a modified des_reaction_model.f and keep you posted.

That makes sense. I think the issue may be with my boundary condition. I will try it with fixed temperature like Jeff has suggested below.

Yes, that is where the problem is coming from because I have been able to run the case successful under adiabatic conditions. I am setting up now to try the case with constant wall temperature.

Just to confirm, the constant flux is (heat flux Ă· Wall area Ă· fluid thermal conductivity)?

Also, are you considering adding capability to just specify the heat flux directly in J/s since the thermal conductivity of fluid around the wall is not constant?

The input BC_C_T_G in the GUI under “constant flux” is the coefficient C in
d(T_g)/dn + Hw (T_g - Tw_g) = C . When Hw is zero, C is just the temperature gradient (K/m).

If your flux F is in Watt, then C = F / (Area x conductivity)
If your flux q is in W/m^2 then C = q / conductivity.

We have not considered inputting the flux in Watts, but we can if there is a high demand.

1 Like

Hi Jeff,

following with the topic, I’ve been digging in the forum to check tips to implement a constant heat flux boundary condition. I think it can be really useful to do it either in W or W/m2, since conductvity will vary through the simulation. Besides, it will facilitate its input.

Thanks!

1 Like

Please create a new topic for further followup on this. The discussion has strayed far from “Unphysical field variables”.