1. I do not believe that the incorrect walltime and remaining-time estimates will affect your simulation results in any way; these numbers are only reported so that you can monitor the progress of the simulation. We will get this fixed, but for now it’s safe to ignore.
2. The `Unphysical field variables` message is a real problem. MFiX has terminated because some quantity has gone out of its legal range, for example the temperature leaving the range `tmin:tmax` (50:4000 K by default), or some pressure or density going negative. You attached the `mfx` file but not the job logs. Use `Submit bug report` from the main menu to create a full bug report that includes the job output. Better yet, examine the `.LOG` files yourself and see if you can figure out which variable it is. Also look at the “10 messages” reported in the popup; one of them might name the field variable in question.
3. The `Solver crash!` message is inaccurate: the solver aborted deliberately when the `check_data` function found unphysical values. However, it looks like the process does not exit cleanly when this happens, which triggers a spate of `MPI_FINALIZE` warnings. These are ugly but can also be safely ignored (I will try to clean up this exit path if I can). The only real issue here is #2 above, `Unphysical field variables`; this is what is causing your simulation to terminate.
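To make the `Unphysical field variables` failure mode described above a bit more concrete, here is a minimal sketch of that kind of test on a single cell’s state. It is not MFiX’s `check_data`: the module, function and argument names are hypothetical, and only the default 50:4000 K temperature bounds come from the text above. Treating the bounds as inclusive is also an assumption.

```fortran
! Hypothetical sketch of a per-cell physicality test.
! NOT MFiX's check_data: the names and structure here are illustrative only.
module physical_check
  use, intrinsic :: iso_fortran_env, only: real64
  implicit none
  ! Default temperature bounds quoted in the post (K)
  real(real64), parameter :: tmin_default = 50.0_real64
  real(real64), parameter :: tmax_default = 4000.0_real64
contains
  ! Returns .true. when tmin <= temp <= tmax (inclusive bounds are an
  ! assumption) and neither pressure nor density is negative.
  ! A NaN in temp, pres or dens also fails, since comparisons with NaN are false.
  logical function is_physical(temp, pres, dens, tmin, tmax)
    real(real64), intent(in) :: temp, pres, dens, tmin, tmax
    is_physical = temp >= tmin .and. temp <= tmax .and. &
                  pres >= 0.0_real64 .and. dens >= 0.0_real64
  end function is_physical
end module physical_check
```

For instance, `is_physical(4500.0_real64, 101325.0_real64, 1.2_real64, tmin_default, tmax_default)` returns `.false.` because 4500 K is above the default `tmax`.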
I started reviewing the time-keeping code, and I think it’s fair to say that some of it is starting to show its age:
```fortran
double precision recursive function wall_time()
  implicit none
  INTEGER(KIND=8), SAVE :: COUNT_OLD=0, WRAP=0, COUNT_START=0
  LOGICAL, SAVE :: FIRST_CALL=.true.
  INTEGER(KIND=8) CLOCK_CYCLE_COUNT, CLOCK_CYCLES_PER_SECOND
  ! max number of cycles; after which count is reset to 0
  INTEGER(KIND=8) CLOCK_CYCLE_COUNT_MAX
  CALL SYSTEM_CLOCK(CLOCK_CYCLE_COUNT, CLOCK_CYCLES_PER_SECOND, CLOCK_CYCLE_COUNT_MAX)
  IF (FIRST_CALL) THEN
    FIRST_CALL = .false.
    COUNT_START = CLOCK_CYCLE_COUNT
  END IF
  IF(COUNT_OLD .GT. CLOCK_CYCLE_COUNT) THEN
    ! This is unlikely. 64-bit INTEGER and 100 MHz CLOCK_CYCLES_PER_SECOND would mean 300 years until WRAP is incremented.
    WRAP = WRAP + 1
  ENDIF
  COUNT_OLD = CLOCK_CYCLE_COUNT
  WALL_TIME = DBLE(CLOCK_CYCLE_COUNT - COUNT_START)/DBLE(CLOCK_CYCLES_PER_SECOND) &
       + DBLE(WRAP) * DBLE(CLOCK_CYCLE_COUNT_MAX)/DBLE(CLOCK_CYCLES_PER_SECOND)
end function wall_time
```
`SYSTEM_CLOCK` is a slightly outmoded way of getting the time. Here is a reference, SYSTEM_CLOCK (The GNU Fortran Compiler):
Determines the COUNT of a processor clock since an unspecified time in the past modulo COUNT_MAX, COUNT_RATE determines the number of clock ticks per second. If the platform supports a monotonic clock, that clock is used and can, depending on the platform clock implementation, provide up to nanosecond resolution. If a monotonic clock is not available, the implementation falls back to a realtime clock.
`CALL SYSTEM_CLOCK([COUNT, COUNT_RATE, COUNT_MAX])`

Arguments:
- `COUNT` (Optional) shall be a scalar of type `INTEGER` with `INTENT(OUT)`.
- `COUNT_RATE` (Optional) shall be a scalar of type `INTEGER` or `REAL`, with `INTENT(OUT)`.
- `COUNT_MAX` (Optional) shall be a scalar of type `INTEGER` with `INTENT(OUT)`.
A small test program (using 64-bit INTEGERs) reports:

```
Count 16555615857496
Max   9223372036854775807
Hz    1000000000
```
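The test program itself was not posted. A minimal sketch of one, assuming gfortran and the `int64`/`real64` kinds from `iso_fortran_env` (the module and subroutine names are made up for illustration), could look like this:

```fortran
! Hypothetical probe (names made up for illustration): query SYSTEM_CLOCK
! with 64-bit integer arguments and print the values as in the output above.
module clock_probe
  use, intrinsic :: iso_fortran_env, only: int64, real64
  implicit none
contains
  subroutine read_clock(ticks, rate, cmax)
    integer(int64), intent(out) :: ticks, rate, cmax
    ! In gfortran the tick rate depends on the argument kind; 64-bit
    ! arguments typically give micro- or nanosecond resolution.
    call system_clock(ticks, rate, cmax)
  end subroutine read_clock

  subroutine print_clock()
    integer(int64) :: ticks, rate, cmax
    call read_clock(ticks, rate, cmax)
    print '(a, i0)', 'Count ', ticks
    print '(a, i0)', 'Max   ', cmax
    print '(a, i0)', 'Hz    ', rate
    ! A rate of 0 means no clock is available
    if (rate > 0_int64) then
      ! Span of the counter in seconds before it wraps modulo COUNT_MAX
      print '(a, es12.4)', 'Wrap period (s) ', real(cmax, real64) / real(rate, real64)
    end if
  end subroutine print_clock
end module clock_probe
```

The `Count` value will differ on every run, and the `Max` and `Hz` values depend on the compiler, the argument kind and the platform, so other machines may not reproduce the numbers above exactly.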
Now, that Max number looks suspiciously close to the walltime number in your screenshot once we divide by Hz: your reported walltime is 9223373246.
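Spelling the comparison out, and assuming the machine running the solver has the same 1 GHz `COUNT_RATE` and 64-bit `COUNT_MAX` as the test above and that `WRAP` was incremented once, subtracting the wrap term of `wall_time` from the reported value leaves only the genuinely elapsed time:

$$
\frac{\text{COUNT\_MAX}}{\text{COUNT\_RATE}} = \frac{9223372036854775807}{10^{9}\ \text{s}^{-1}} \approx 9223372036.85\ \text{s}
$$

$$
9223373246\ \text{s} - 9223372036.85\ \text{s} \approx 1209\ \text{s} \approx 20\ \text{min}
$$

That is, about twenty minutes of real run time plus one spurious wrap.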
Using the `units` program for convenience:

```
bash$ units
You have: (9223372036854775807/1000000000) sec
You want: year
        * 292.27727
```
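The `units` result is simply the wrap period expressed in years (GNU `units` takes a year to be about 365.24 days, roughly $3.156\times10^{7}$ s). For comparison, the 100 MHz rate mentioned in the code comment would give a period about ten times longer than the "300 years" the comment states:

$$
T_{\text{wrap}} = \frac{\text{COUNT\_MAX}}{\text{COUNT\_RATE}} = \frac{2^{63}-1}{10^{9}\ \text{s}^{-1}} \approx 9.223\times10^{9}\ \text{s} \approx 292\ \text{yr}
$$

$$
\frac{2^{63}-1}{10^{8}\ \text{s}^{-1}} \approx 9.223\times10^{10}\ \text{s} \approx 2923\ \text{yr}
$$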
The wraparound time is on the order of 300 years, as the comment in the code indicates. So the overflow check at line 73 is purely theoretical: it should never trigger at all, and one could argue that it shouldn’t even be there, since there’s no way to test it (and now it’s misbehaving!).
```
73    IF(COUNT_OLD .GT. CLOCK_CYCLE_COUNT) THEN
74      ! This is unlikely. 64-bit INTEGER and 100 MHz CLOCK_CYCLES_PER_SECOND would mean 300 years until WRAP is incremented.
75      WRAP = WRAP + 1
76    ENDIF
77    COUNT_OLD = CLOCK_CYCLE_COUNT
```
Note that if the system clock is non-monotonic by even the tiniest amount, so that `CLOCK_CYCLE_COUNT` goes down between two calls, then `WRAP` will increment and the reported time will leap forward by 292 years, exactly as we are seeing in your case! It’s also possible that there is some sort of race condition involving `COUNT_OLD`. Either way, it’s clear that this flawed wraparound check is at the root of the problem.
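The mechanism can be reproduced without waiting for a real clock glitch. This sketch (module and function names are hypothetical) replays the same `COUNT_OLD`/`WRAP` bookkeeping as `wall_time` on a list of made-up clock readings instead of calling `SYSTEM_CLOCK`:

```fortran
! Hypothetical module: replays the wall_time bookkeeping on a given
! sequence of clock readings instead of calling SYSTEM_CLOCK, so the
! effect of a backward step can be shown deterministically.
module wrap_demo
  use, intrinsic :: iso_fortran_env, only: int64, real64
  implicit none
contains
  ! Time wall_time would report after the last reading in COUNTS.
  ! Assumes COUNTS holds at least one reading; counts(1) plays COUNT_START.
  function replay_wall_time(counts, rate, cmax) result(t)
    integer(int64), intent(in) :: counts(:)   ! successive SYSTEM_CLOCK counts
    integer(int64), intent(in) :: rate, cmax  ! COUNT_RATE and COUNT_MAX
    real(real64) :: t
    integer(int64) :: count_old, wrap
    integer :: i
    count_old = 0_int64
    wrap = 0_int64
    do i = 1, size(counts)
      ! Same test as line 73: any decrease is treated as a wraparound
      if (count_old > counts(i)) wrap = wrap + 1
      count_old = counts(i)
    end do
    t = real(counts(size(counts)) - counts(1), real64) / real(rate, real64) &
        + real(wrap, real64) * real(cmax, real64) / real(rate, real64)
  end function replay_wall_time
end module wrap_demo
```

With readings at 10 s, 11 s and 12 s (at 1 GHz) the replay reports 2 s; moving the second reading a single tick before the first makes it report about $9.22\times10^{9}$ s, i.e. the 292-year leap.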
Since the `SYSTEM_CLOCK` subroutine has various issues, I plan on replacing this code with something a little more modern, where you don’t have to worry about the system clock rate, overflows, an unspecified starting time, etc.
In the meantime, if you like, open the file `time_cpu_mod.f`, comment out line 75 with a `!`, and rebuild the solver. This should fix your jumpy-clock problem.
OK, I will try what you suggest. Thank you very much for your answer!