How to use SBATCH/SLURM with the "add queue template" feature in the GUI solver?

Dear All,

I have already compiled the SMP and DMP solvers successfully. I have the following questions.

  1. On our cluster, all tasks (or jobs) must be submitted through a batch system called SLURM, and the DMP tasks need to be run with the commands shown below.

  2. How do I combine SLURM with the commands mpirun -np 4 ./mfixsolver -f DES_FB1.mfx NODESI=2 NODESJ=2 and OMP_NUM_THREADS=4 ./mfixsolver -f DES_FB1.mfx, and with the “add queue template” feature in the GUI?

  3. How do I submit DMP and SMP tasks through the GUI?

  4. What would happen if I run the default solver while assigning multiple cores and nodes to the job? Will the default solver only use a single node, core, and thread?

Thank you,

Best,

We actually use SLURM on our machine (Joule) as well. There is a feature in the run dialog that you can use to write and submit jobs to your queueing system (the GUI needs to be running on that same system, i.e. you can’t submit jobs from your local laptop to your HPC using the GUI). There is a section in the documentation that describes this queue template, which allows you to customize the widgets, here: 8.1. GUI Reference — MFiX 21.3.2 documentation

This will write a .qsubmit_script that is used to actually submit the job to the queue:

sbatch .qsubmit_script

The Joule template that is included looks like this (you’ll have to modify the queue list and modules for your system):

#!/bin/bash -l
## CONFIG
# Special values
# SCRIPT - the path of this script, after replacement in the run directory
# PROJECT_NAME - name of the opened project
# JOB_ID - the job id extracted using job_id_regex
# COMMAND - the command to run mfix
# MFIX_HOME - the path to the mfix directory

[options]
name: Joule
job_id_regex: (\d+)
status_regex: ([rqw])
submit: sbatch ${SCRIPT}
delete: scancel ${JOB_ID}
status: squeue -j ${JOB_ID}

[JOB_NAME]
widget: lineedit
label: Job Name
value: ${PROJECT_NAME}
help: The name of the job.

[CORES]
widget: spinbox
label: Number of Cores
min_value: 1
max_value: 9999
value: 40
help: The number of cores to request.

[QUEUE]
widget: combobox
label: Queue
value: general
items: general|bigmem|shared|gpu
help: The Queue to submit to.

[LONG]
widget: checkbox
label: Long job
value: false
true:  #SBATCH --qos=long
help: Specify the job as long.


[MODULES]
widget: listwidget
label: Modules
items: gnu/6.5.0 openmpi/3.1.3_gnu6.5 |
       gnu/8.2.0 openmpi/4.0.1_gnu8.2 |
       gnu/8.4.0 openmpi/4.0.3_gnu8.4 |
       gnu/9.3.0 openmpi/4.0.4_gnu9.3
help: Select the modules that need to be loaded.

## END CONFIG
## The name for the job.
#SBATCH --job-name=${JOB_NAME}
##
## Number of cores to request
#SBATCH --tasks=${CORES}
##
## Queue Name
#SBATCH --partition=${QUEUE}
${LONG}

##Load Modules
module load ${MODULES}

##Run the job
${COMMAND}
  1. With SLURM, anything that doesn’t have #SBATCH in front of it is just executed in the terminal. So, just call MFiX the normal way, but with srun in front (see the sketch after this list):

srun mpirun -np 4 ./mfixsolver -f DES_FB1.mfx NODESI=2 NODESJ=2

  2. The default solver is serial, so using more than one core will do you no good.
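
As an illustration of point 1, here is a minimal stand-alone batch script in the spirit of the template above. It is only a sketch: the partition name, module versions and core count are assumptions to be replaced with values from your own cluster, and (as discussed later in this thread) current MFiX versions read the decomposition keys NODESI/NODESJ/NODESK from the .mfx file rather than the command line.

#!/bin/bash -l
#SBATCH --job-name=DES_FB1          ## job name shown by squeue
#SBATCH --ntasks=4                  ## one task per DMP process
#SBATCH --partition=general         ## assumed partition name; pick one from your cluster

## Assumed modules; load the compiler/MPI stack the solver was built with
module load gnu/9.3.0 openmpi/4.0.4_gnu9.3

## DMP run (the decomposition is read from DES_FB1.mfx)
srun mpirun -np 4 ./mfixsolver -f DES_FB1.mfx

## For an SMP solver, set the thread count instead, e.g.:
## export OMP_NUM_THREADS=4
## srun ./mfixsolver -f DES_FB1.mfx

You would submit this script with sbatch, just like the GUI-generated .qsubmit_script.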

Thank you Justin. To follow up: I can only visualize and interact with the GUI under the Sinteract mode, in which I am assigned a single specific core and am told that I cannot run DMP (multiple cores) in this mode, and I cannot visualize and interact with the GUI on the main cluster. I want to know:

  1. Can I submit the task to the queue from the command line?

  2. What is the difference between the .x and .run files shown below?

I was told that if I run “srun mpirun” directly on the main cluster without configuring an SBATCH file, it may clog the compute nodes to some extent and cause simulation jobs to be killed.

As shown in pic1 below, for MPI I have to save the script as a .run file, which refers to a .x file. What is the meaning of .x?


As shown in pic2 for a typical sbatch script, why is there no line referring to a .x file in that .run file?

Thank you in advance.

  1. Yes. That is what sbatch myrunscript is (see the example below).
  2. That .x file is just an example executable (mycode.x).
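
For example, assuming the GUI has already written a .qsubmit_script in the run directory (the job id below is illustrative; use the one that sbatch prints), submitting and monitoring from the command line looks like this:

sbatch .qsubmit_script    ## prints e.g. "Submitted batch job 123456"
squeue -j 123456          ## check whether the job is still queued or running
scancel 123456            ## cancel the job if needed

These are the same sbatch, squeue and scancel commands listed in the [options] section of the template above.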

I don’t know anything about your cluster, so I encourage you to work with the cluster admins.

Thank you Justin. I loaded the necessary modules, but when I ran the above command directly, the following error occurred.


Why are NODESI=2 NODESJ=2 not recognized by mfixsolver? Thank you.

The NODESI, NODESJ and NODESK keys are set in the .mfx file, not on the command line (in a previous version they were passed on the command line, but this is no longer supported).

Thank you Charles. Currently I set up the parameters and configuration on my local Ubuntu machine, then transfer the project to the computing cluster and either run the .run script or call srun directly, like:

‘srun mpirun -np 4 ./mfixsolver -f DES_FB1.mfx NODESI=2 NODESJ=2’

So I want to know how I can set the NODES keys in the GUI so that these parameters are embedded in the .mfx file and I can directly run “srun mpirun” on the main cluster. In that case, would the command become “srun mpirun -np 4 ./mfixsolver -f DES_FB1.mfx”, without the NODES keys?

Thank you in advance.

Hi Ju -

The NODES keys are written into the MFiX file when you click the “Run” button and the run popup appears. But since you are not doing the run locally, this is not happening. And you can only set these keys if the local solver has DMP enabled. You could build a local DMP solver, but this is probably overkill - the simplest thing to do is just to edit the file outside of MFiX (with any text editor) and add the nodesi = lines to the section with the keywords (before any THERMO_DATA or #!MFIX_GUI lines).

They may already be present in the file with the value set to 1; in that case just change 1 to the values you want.
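
As a sketch of what that edit looks like (the rest of the .mfx file will of course differ), the keyword section would contain lines such as:

nodesi = 2
nodesj = 2
nodesk = 1

With these values the product nodesi * nodesj * nodesk is 4, which should match the -np 4 passed to mpirun.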

And yes, you are correct about the format of the srun command.

– Charles

Thank you Charles. This time the task can be run either with “srun” directly or in SBATCH mode. It has been running for 10 minutes, but I still cannot see the generated intermediate files like .msh, .vtk, etc. I wonder whether, under the SLURM settings, the job would keep a running status and not be killed even if it encountered some failure and stopped unexpectedly. Thank you in advance.

If in doubt, look at the log file (CHAMBER_DELICATE_STRUCTURE1.LOG)
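
For example, from the run directory on the cluster you can follow the log and check whether SLURM still considers the job to be running (the job id is illustrative, and sacct is only available if accounting is enabled on your cluster):

tail -f CHAMBER_DELICATE_STRUCTURE1.LOG          ## follow the solver log as it is written
squeue -j 123456                                 ## is the job still listed by SLURM?
sacct -j 123456 --format=JobID,State,Elapsed     ## job state, also after it leaves the queue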

Thank you Charles, the .LOG is as follows.


It seems that the mesh has been generated and saved, but I cannot find it in the project directory with “ls -h”.