Simulation terminated almost immediately

jack7z · October 16, 2023, 8:03pm

Hello all,

I tried to run a simulation of adsorption and desorption at the same time. However, the simulation kept terminating right after I started it, with the error below. Is it complaining that some reaction is too fast or some number is too small? How do I interpret it so that I can figure out which part of the simulation is causing the problem? Thanks!

td_jelly_roll (7).mfx (21.7 KB)
usr_rates (12).f (3.0 KB)

Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.

Backtrace for this error:
#0 0x14c78fd61b4f in ???
#1 0x402e6f in usr_rates_
at /projectnb/ryanlab/jackz/test/AdDe50C/0.08249/usr_rates.f:88
#2 0x53748e in rrates0
at /projectnb/ryanlab/jack7z/.conda/envs/mfix-23.3/share/mfix/src/model/rrates0.f:181
#3 0x414896 in __calc_coeff_mod_MOD_calc_rrate
at /projectnb/ryanlab/jack7z/.conda/envs/mfix-23.3/share/mfix/src/model/calc_coeff.f:394
#4 0x414c27 in __calc_coeff_mod_MOD_calc_rrate
at /projectnb/ryanlab/jack7z/.conda/envs/mfix-23.3/share/mfix/src/model/calc_coeff.f:385
#5 0x414c27 in __calc_coeff_mod_MOD_calc_coeff_all
at /projectnb/ryanlab/jack7z/.conda/envs/mfix-23.3/share/mfix/src/model/calc_coeff.f:282
#6 0x40ce5f in _step_MOD_time_step_init
at /projectnb/ryanlab/jack7z/.conda/envs/mfix-23.3/share/mfix/src/model/time_step.f:89
#7 0x403586 in run_fluid
at /projectnb/ryanlab/jack7z/.conda/envs/mfix-23.3/share/mfix/src/model/mfix.f:186
#8 0x403586 in run_mfix
at /projectnb/ryanlab/jack7z/.conda/envs/mfix-23.3/share/mfix/src/model/mfix.f:145
#9 0x403c89 in mfix
at /projectnb/ryanlab/jack7z/.conda/envs/mfix-23.3/share/mfix/src/model/mfix.f:323
#10 0x402cb0 in main
at /projectnb/ryanlab/jack7z/.conda/envs/mfix-23.3/share/mfix/src/model/mfix.f:293

Primary job terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.

mpirun noticed that process rank 16 with PID 0 on node scc-yj3 exited on signal 8 (Floating point exception).

jack7z · October 23, 2023, 3:03pm

Bumping my question above. I also tried setting the second reaction rate to 0, which should be equivalent to having only one reaction. With only reaction 1 the simulation ran just fine, but adding a second reaction with a rate of 0 produced the “Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation” error again. Does anyone know what might be the cause? Thanks! I attached my code below.
td_jelly_roll (7).mfx (21.7 KB)
usr_rates (13).f (3.0 KB)

jeff.dietiker · October 23, 2023, 7:36pm

I am not able to reproduce this with the files you attached. However, on line 88 of usr_rates.f you are using c_CO2, which is not defined if x_CO2 is zero (or less than c_Limiter). Most of the time this value may happen to be zero, but you can also get a garbage value (you can add a print statement above line 88 to check). You need to either set c_CO2=0.0 when X_s(IJK, 1,ML) <= c_Limiter or always compute c_CO2.
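For illustration, the pattern described above could look like the minimal standalone sketch below. It is not the actual usr_rates.f: the names rho_g, mw_co2 and the function conc, and all numeric values, are assumptions made up for the example. The point is only that the concentration gets a value on both branches of the limiter test.

```fortran
program guard_concentration
   implicit none
   integer, parameter :: dp = kind(1.0d0)
   ! Hypothetical stand-ins for values that usr_rates would get from the solver
   real(dp), parameter :: c_limiter = 1.0d-7   ! assumed threshold
   real(dp), parameter :: mw_co2 = 44.01d0     ! molecular weight of CO2, g/mol
   real(dp) :: rho_g, c_co2

   rho_g = 1.2d-3   ! assumed gas density, for the example only

   ! Mass fraction below the limiter: the result is a defined zero
   c_co2 = conc(rho_g, 0.0d0, mw_co2)
   print *, 'c_CO2 with x_CO2 = 0.0 :', c_co2

   ! Mass fraction above the limiter: the result is computed normally
   c_co2 = conc(rho_g, 0.1d0, mw_co2)
   print *, 'c_CO2 with x_CO2 = 0.1 :', c_co2

contains

   ! Returns a molar concentration that is defined on every code path
   pure function conc(rho, x, mw) result(c)
      real(dp), intent(in) :: rho, x, mw
      real(dp) :: c
      if (x > c_limiter) then
         c = rho * x / mw
      else
         c = 0.0d0   ! explicit value instead of leaving c undefined
      end if
   end function conc

end program guard_concentration
```

The same effect can be had by assigning 0.0 to the variable before the IF block, which avoids the need for an ELSE branch.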

jack7z · October 31, 2023, 6:14pm

Thanks Jeff! I revised line 88 and added an else statement that sets the c_CO2 concentration to 0 if x_CO2 is smaller than the limit.

Now the code runs with a reaction rate of 0 for reaction 2. But with a non-zero reaction rate, it still terminates after simulating several seconds of the reactions and gives me the error “Program received signal SIGFPE: Floating-point exception - erroneous arithmetic operation.”

Is there a way to see what’s causing the erroneous arithmetic operation? Thanks!

td_jelly_roll (7).mfx (21.7 KB)
usr_rates (13).f (3.0 KB)

jeff.dietiker · October 31, 2023, 7:51pm

You should get a backtrace of where the error is occurring similar to what you posted at the top. You can also build the solver with debug flags (select “Debug” in the build type).

jack7z · November 1, 2023, 3:19pm

Thank you Jeff! I found that the line causing the problem was
#1 0x402e6f in usr_rates_
at /projectnb/ryanlab/jackz/test/AdDe50C/_0.10029/usr_rates.f:91

That line said
IF(c_ML > c_limiter) then
RATES(Reaction_1) = k*c_CO2*c_ML**2

I guess it ran into a problem because c_CO2 is 0? But I thought that if a reaction rate is 0, the reaction would just not happen. Why is there an erroneous arithmetic operation?

I pasted the full error below as well. Also, how do I run the debug build from the terminal? What command would I use?

Thank you so much for your help!

Backtrace for this error:
#0 0x152602468b4f in ???
#1 0x402e6f in usr_rates_
at /projectnb/ryanlab/jackz/test/AdDe50C/0.10029/usr_rates.f:91
#2 0x53749e in rrates0
at /projectnb/ryanlab/jack7z/.conda/envs/mfix-23.3/share/mfix/src/model/rrates0.f:181
#3 0x4148a6 in __calc_coeff_mod_MOD_calc_rrate
at /projectnb/ryanlab/jack7z/.conda/envs/mfix-23.3/share/mfix/src/model/calc_coeff.f:394
#4 0x414c37 in __calc_coeff_mod_MOD_calc_rrate
at /projectnb/ryanlab/jack7z/.conda/envs/mfix-23.3/share/mfix/src/model/calc_coeff.f:385
#5 0x414c37 in __calc_coeff_mod_MOD_calc_coeff_all
at /projectnb/ryanlab/jack7z/.conda/envs/mfix-23.3/share/mfix/src/model/calc_coeff.f:282
#6 0x40ce6f in _step_MOD_time_step_init
at /projectnb/ryanlab/jack7z/.conda/envs/mfix-23.3/share/mfix/src/model/time_step.f:89
#7 0x403596 in run_fluid
at /projectnb/ryanlab/jack7z/.conda/envs/mfix-23.3/share/mfix/src/model/mfix.f:186
#8 0x403596 in run_mfix
at /projectnb/ryanlab/jack7z/.conda/envs/mfix-23.3/share/mfix/src/model/mfix.f:145
#9 0x403c99 in mfix
at /projectnb/ryanlab/jack7z/.conda/envs/mfix-23.3/share/mfix/src/model/mfix.f:323
#10 0x402cb0 in main
at /projectnb/ryanlab/jack7z/.conda/envs/mfix-23.3/share/mfix/src/model/mfix.f:293

jeff.dietiker · November 1, 2023, 5:55pm

I think you may have the same problem with c_ML, which is not defined if the molar concentration of microlith is below the c_limiter threshold. It is best to make sure all variables are initialized to avoid unexpected results. You are also using c_RM, but it is not defined.
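As a rough sketch of "initialize everything first", the standalone program below sets every local to zero before any conditional logic, so a skipped branch can never leave a value undefined. The rate form k*c_CO2*c_ML**2 is assumed from the snippet quoted earlier in the thread; the subroutine name reaction_rate, the conversion factor to_conc, and the numbers are hypothetical and only there to make the example run.

```fortran
program init_before_use
   implicit none
   integer, parameter :: dp = kind(1.0d0)
   real(dp), parameter :: c_limiter = 1.0d-7   ! assumed threshold
   real(dp), parameter :: to_conc = 1.0d-3     ! hypothetical mass-fraction-to-concentration factor
   real(dp) :: rate

   ! No CO2 present: the rate is a clean zero, not garbage
   call reaction_rate(k=2.0d0, x_co2=0.0d0, x_ml=0.3d0, rate=rate)
   print *, 'rate without CO2 :', rate

   ! Both species present: the rate is computed normally
   call reaction_rate(k=2.0d0, x_co2=0.1d0, x_ml=0.3d0, rate=rate)
   print *, 'rate with CO2    :', rate

contains

   subroutine reaction_rate(k, x_co2, x_ml, rate)
      real(dp), intent(in)  :: k, x_co2, x_ml
      real(dp), intent(out) :: rate
      real(dp) :: c_co2, c_ml

      ! Give every local and output a defined value up front
      c_co2 = 0.0_dp
      c_ml  = 0.0_dp
      rate  = 0.0_dp

      ! Overwrite the defaults only where the limiter test passes
      if (x_co2 > c_limiter) c_co2 = x_co2 * to_conc
      if (x_ml  > c_limiter) c_ml  = x_ml  * to_conc

      if (c_ml > c_limiter) rate = k * c_co2 * c_ml**2
   end subroutine reaction_rate

end program init_before_use
```

With this layout, a concentration of exactly zero simply yields a zero rate. A floating-point exception on that line is then more likely to point to an input that was never assigned than to the zero itself.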

Once you build with debug flags, you can use the custom solver, and it will generally provide more information when it fails. This is not a debugging tool per se (you can’t pause or inspect variables).
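Since variables cannot be inspected that way, one low-tech option is to check and print the inputs just before the line that fails. The sketch below is a standalone illustration, not part of MFiX: the helper names is_sane and report and the bound c_max are hypothetical, and the NaN is created on purpose to stand in for an uninitialized value. It relies on the standard ieee_arithmetic intrinsic module.

```fortran
program check_inputs
   use, intrinsic :: ieee_arithmetic, only: ieee_is_finite, ieee_value, ieee_quiet_nan
   implicit none
   integer, parameter :: dp = kind(1.0d0)
   real(dp), parameter :: c_max = 1.0d3   ! hypothetical upper bound for a plausible concentration
   real(dp) :: good, bad

   good = 2.5d-5
   bad  = ieee_value(1.0_dp, ieee_quiet_nan)   ! deliberately bad value for the demo

   call report('c_CO2', good)
   call report('c_ML', bad)

contains

   ! True only for finite, non-negative values below the assumed bound
   logical function is_sane(c)
      real(dp), intent(in) :: c
      is_sane = .false.
      if (ieee_is_finite(c)) is_sane = (c >= 0.0_dp .and. c <= c_max)
   end function is_sane

   ! Prints the value together with a verdict, so the log shows what went in
   subroutine report(name, c)
      character(len=*), intent(in) :: name
      real(dp), intent(in) :: c
      if (is_sane(c)) then
         print '(a,a,es12.4)', name, ' looks fine:    ', c
      else
         print '(a,a,es12.4)', name, ' is suspicious: ', c
      end if
   end subroutine report

end program check_inputs
```

In a real rate routine the check would go before the arithmetic, because a build that traps floating-point exceptions stops at the first bad operation and never reaches a check placed after it.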
