HPC task scheduled for 20 hours "completed" without error in about 4 hours

Dear All,

I am using an HPC cluster to run a TFM task. I assigned 20 hours for the task, and it “completed” without error in about 4 hours, with a reported wall time of around 2 days. What is the reason for this, and how can I fix it? Thank you in advance.


Hi Ju

I assume you are running in DMP mode. If the wall time reaches 2 days in 4 hours of actual time, it looks like the timekeeping code may be confusing CPU time with wall time. I’ll look into it.
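To illustrate the difference between the two clocks, here is a minimal Python sketch (not the solver's actual timekeeping code). The 12-rank figure is purely hypothetical, chosen only to show how CPU time summed over ranks could turn 4 hours of real time into 48 hours; the rank count of the job in question is not known here.

```python
import time

def busy_work(n):
    """Burn a little CPU so process_time() has something to count."""
    total = 0
    for i in range(n):
        total += i * i
    return total

wall_start = time.perf_counter()   # wall clock: real elapsed time
cpu_start = time.process_time()    # CPU time used by this process only

busy_work(200_000)
time.sleep(0.2)                    # sleeping advances wall time, not CPU time

wall_elapsed = time.perf_counter() - wall_start
cpu_elapsed = time.process_time() - cpu_start
print(f"wall: {wall_elapsed:.3f} s, cpu: {cpu_elapsed:.3f} s")

def summed_cpu_hours(wall_hours, n_ranks):
    """CPU hours accumulated if every rank is busy for the whole run."""
    return wall_hours * n_ranks

# Hypothetical: 12 ranks (an assumption, not taken from this thread).
print(summed_cpu_hours(4, 12), "hours")
```

If a limit meant for wall time is compared against a summed CPU figure like this, a parallel run would hit the limit long before the real clock does.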

You should be able to work around this by setting the keyword BATCH_WALLCLOCK (labeled “Wall time limit” in the GUI) to a higher value. If you do not set it, it defaults to 172800 s, which is 2 days, and that is the limit you ran into. Check the ‘Enable max wall time’ checkbox and set the wall time limit to a much higher value.
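Since the limit is given in seconds, a small conversion helper may be handy when picking a new value. This is only a sketch: `hours_to_seconds` and the 10-day figure are illustrative choices, not recommended settings.

```python
SECONDS_PER_HOUR = 3600
DEFAULT_LIMIT_S = 172800  # default quoted above (2 days)

def hours_to_seconds(hours):
    """Convert a duration in hours to whole seconds."""
    return int(hours * SECONDS_PER_HOUR)

scheduled_s = hours_to_seconds(20)   # the 20-hour allocation
generous_s = hours_to_seconds(240)   # 10 days, an arbitrary "much higher" value

print("scheduled run:", scheduled_s, "s")
print("default limit:", DEFAULT_LIMIT_S, "s =", DEFAULT_LIMIT_S // SECONDS_PER_HOUR, "h")
print("example higher limit:", generous_s, "s")
```

The 20-hour allocation is 72000 s, well under the default, so the limit should only trigger if elapsed time is being overcounted.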

– Charles
