Time issue during running the simulation (REQUESTED CPU TIME LIMIT REACHED)

Hello @jeff.dietiker @cgw ,

I have a query. First of all, thanks a lot for your suggestion. I have installed Linux and can now use the cores available on my computer to run the simulation.
Anyway, my questions are:

  1. I set a run time of 5 s, but the simulation stopped early, at 1.455 s. Also, the simulation didn’t even run for 2 days, which is the requested CPU time. I don’t see any convergence error. What could be the reason, and how can I solve it?
  2. How do I check the expected run time of the simulation when it’s about to start?

Can I turn off ENABLE MAX WALL TIME, and would that resolve the error?

I will be grateful if you could reply.

Looks like the solver thinks it ran for 2 days


If it truly did not run for two days, there might be a bug in that code. You can simply turn off ENABLE MAX WALL TIME. That option exists for jobs running in a queue with a time limit, where you want to make sure the solver saves a clean restart file before the queue kills your job.
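To illustrate what a wall-time limit is for, here is a minimal Python sketch of a loop that stops once a wall-clock budget is used up, so a restart file can be written before a queue kills the job. This is not the MFiX implementation: `run_with_wall_limit`, `step`, `buffer_s`, and `clock` are all hypothetical names chosen for the example.

```python
import time

def run_with_wall_limit(step, limit_s, buffer_s=0.0, clock=time.monotonic):
    """Call step() until the wall-clock budget (limit_s - buffer_s) is used up.

    Hypothetical helper, not MFiX code. Returns the number of steps completed.
    time.monotonic measures real elapsed time, not CPU time.
    """
    start = clock()
    steps = 0
    while clock() - start < limit_s - buffer_s:
        step()
        steps += 1
    # A real solver would write its restart file here, while there is
    # still time left before the queue's own limit.
    return steps

if __name__ == "__main__":
    # Each "time step" just sleeps briefly; stop after about 0.2 s of real time.
    n = run_with_wall_limit(lambda: time.sleep(0.01), limit_s=0.2)
    print(f"completed {n} steps before the wall-time limit")
```

The key design point is that the check uses a wall clock. Checking the same limit against a CPU-time clock would stop a multi-threaded run too early.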

A) Congratulations on getting Linux installed and running.

B) The ‘Dashboard’ pane shows remaining/estimated time while a job is running.

C) In general, if you have crashes and want help here, please attach all .LOG files (use the Submit bug report menu item to create a ZIP file with all required inputs). There’s not a lot we can do from a screenshot.

D) However, I think I see what’s going on:
The final message in the MFiX console says “REQUESTED CPU TIME LIMIT REACHED”, and I can see you have the wall time limit set to 48 h. The full message reads:

REQUESTED CPU TIME LIMIT REACHED
Total CPU time used = 1.999 days
Total wall time used = 3.497 h

There’s a mistake here: wall time and CPU time are getting confused. “Wall time” refers to real time, as you would see from a clock on the wall (this is an old-fashioned term, we should just call it ‘Real time’). CPU time refers to how much processor work is getting done. On a time-sharing system, where you only get a slice of the CPU, CPU time can progress slower than real time. And when using multiple cores, CPU time can progress faster than real time, as in your case! You used 2 days of CPU time in 3.5 hours. There is clearly a bug in the MFiX time-keeping code where it confuses CPU time with real time. I will fix this for the 22.4 release.
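As a quick check on those numbers, the arithmetic can be written out in a few lines of Python. The two figures are copied from the console output above; the variable names are just for illustration.

```python
# Figures copied from the console output quoted above.
cpu_time_h = 1.999 * 24.0   # "Total CPU time used = 1.999 days", in hours
wall_time_h = 3.497         # "Total wall time used = 3.497 h"

# CPU hours consumed per hour of real (wall-clock) time.
cpu_per_wall = cpu_time_h / wall_time_h

print(f"CPU time  : {cpu_time_h:.2f} h")
print(f"Wall time : {wall_time_h:.2f} h")
print(f"Ratio     : {cpu_per_wall:.1f} CPU-hours per wall-clock hour")
```

The ratio comes out at roughly 13.7, which only makes sense if many cores were busy at once. A 48 h limit compared against the CPU figure is therefore reached after about 3.5 h of real time.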

In the meantime, as Justin suggests, simply turn off Enable max wall time. And in the future, feel free to experiment with settings like this. If in doubt, try it out.

– Charles


Investigating a little more:

I ran the hopper_dem_3d tutorial with a 60 second time limit

The serial (single process) solver behaves as expected - the job terminates after a minute, and at the end of the run we see:

=============== REQUESTED CPU TIME LIMIT REACHED =
Batch Wall Time:       60.00 s
Elapsed Wall Time:     61.30 s
Term Buffer:            0.00 s
=============== REQUESTED CPU TIME LIMIT REACHED =
    NITs/SEC = 1796.
 Total CPU used = 58.95 s
 Total CPU IO used = 1.080 s
 Total wall time used = 59.84 s

For a DMP run, we get (2x2x2=8 processes)

=============== REQUESTED CPU TIME LIMIT REACHED =
Batch Wall Time:       60.00 s
Elapsed Wall Time:     62.03 s
Term Buffer:            0.00 s
=============== REQUESTED CPU TIME LIMIT REACHED =
    NITs/SEC = 1732.
 Total CPU used = 55.64 s
 Total CPU IO used = 4.403 s
 Total wall time used = 62.04 s

I verified via top that 8 processes were running locally, so the CPU time should really be higher than the reported 55 sec. Nonetheless the time limit worked correctly.

For an SMP run (8 threads), the job ended after about 15 seconds, reporting


=============== REQUESTED CPU TIME LIMIT REACHED =
Batch Wall Time:       60.00 s
Elapsed Wall Time:     61.65 s
Term Buffer:            0.00 s
=============== REQUESTED CPU TIME LIMIT REACHED =
    NITs/SEC = 6553.
 Total CPU used = 59.15 s
 Total CPU IO used = 0.9256 s
 Total wall time used = 16.44 s

In this case, we see that CPU time and wall (real) time are being confused somewhere. The first Elapsed Wall Time message incorrectly states 61.65 s, but the second Total wall time used message is correct.
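Putting the three runs side by side makes the pattern easier to see. The short Python snippet below computes the CPU-to-wall ratio from the Total CPU used and Total wall time used lines quoted above. Reading a high ratio as "the limit was checked against a CPU clock" is an inference from these numbers, not something confirmed in the solver source.

```python
# (CPU seconds, wall seconds) taken from the "Total CPU used" and
# "Total wall time used" lines of the three runs above.
runs = {
    "serial": (58.95, 59.84),
    "DMP (8 processes)": (55.64, 62.04),
    "SMP (8 threads)": (59.15, 16.44),
}

# CPU seconds accumulated per second of real time, for each run.
ratios = {name: cpu / wall for name, (cpu, wall) in runs.items()}

for name, ratio in ratios.items():
    print(f"{name:18s} CPU/wall = {ratio:.2f}")
```

The serial and DMP runs come out close to 1, while the SMP run is about 3.6: it had accumulated nearly 60 s of CPU time after only about 16 s of real time, which matches the early termination.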

This will be corrected for the 22.4 release. In the meantime, probably best not to use Enable max wall time for SMP jobs.

– Charles


Thanks for the reply @cgw. I will follow your instructions.