I am running an MPI job on a cluster using Slurm, which has a wall-time limit of 24 h. I need the MFiX run to continue as restart_1, i.e. to resubmit itself every time it reaches the wall-time limit. Does anyone have a suggestion on how to achieve that?
Resubmitting the job from within the *.sh script is the solution. Note that this will start the next run even if the simulation crashed. You also need to make sure restart_1 is already set in the *.mfx file used by the *.sh script; otherwise an extra step is needed to change new to restart_1.
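A minimal sketch of a self-resubmitting Slurm script, in case it helps. The file names run_mfix.sh and case.mfx, the job name, the rank count and the mpirun launch line are all placeholders I made up for illustration, so adjust them to your own setup. The afterany dependency is what releases the next job however the current one ends, including a crash.

```shell
#!/bin/bash
#SBATCH --job-name=mfix_chain   # hypothetical job name
#SBATCH --ntasks=16             # hypothetical MPI rank count
#SBATCH --time=24:00:00         # the cluster's 24 h wall-time limit

# Hypothetical names: adjust to your own case.
SCRIPT_NAME="run_mfix.sh"       # this file, as saved in the submit directory
PROJECT_FILE="case.mfx"         # assumed to already use restart_1

# Build the sbatch flag that chains the next job to job id $1.
# "afterany" releases the next job whatever the exit state of this one.
dependency_flag() {
    printf -- '--dependency=afterany:%s' "$1"
}

main() {
    cd "${SLURM_SUBMIT_DIR}" || exit 1
    # Queue the next segment first, so it is already pending if this job
    # is killed at the wall-time limit.
    sbatch "$(dependency_flag "${SLURM_JOB_ID}")" "${SCRIPT_NAME}"
    # Launch line is an assumption; use whatever your solver build needs.
    mpirun -np "${SLURM_NTASKS}" ./mfixsolver -f "${PROJECT_FILE}"
}

# Only do anything when running inside a Slurm job.
if [ -n "${SLURM_JOB_ID:-}" ]; then
    main
fi
```

Submit it once with sbatch run_mfix.sh. The chain never stops by itself, so once the simulation is finished (or has crashed) the pending job has to be removed with scancel.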
If the queue time limit is 24 hours, please make sure you are using the max wall time feature in the run pane as shown below or set the following keywords in the .mfx file:
This will make sure the run terminates cleanly 10 minutes before the 24-hour time limit is reached. A restart file will be written and MFiX will exit. This is done 1) to prevent corrupting the restart file if MFiX happens to be terminated by the queue system exactly while the restart file is being written, and 2) to have a restart file with the latest results, independently of the restart file write frequency.
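As a rough sketch of the arithmetic, the snippet below converts a 24 h limit and a 10 min buffer to seconds and prints the corresponding lines for the .mfx file. The keyword names chk_batchq_end, batch_wallclock and term_buffer, and the assumption that both times are given in seconds, are from my memory of the MFiX keyword list, so please check them against the documentation for your version.

```shell
#!/bin/bash
# Queue limit and safety margin taken from the discussion above.
wall_hours=24
buffer_minutes=10

# Assumption: both keywords expect values in seconds.
batch_wallclock=$(( wall_hours * 3600 ))
term_buffer=$(( buffer_minutes * 60 ))

# Print the lines to paste into the .mfx file (keyword names are assumed).
printf 'chk_batchq_end  = .True.\n'
printf 'batch_wallclock = %d\n' "${batch_wallclock}"
printf 'term_buffer     = %d\n' "${term_buffer}"
```

With these values MFiX would stop at batch_wallclock minus term_buffer, i.e. after 23 h 50 min.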