How do I combine SLURM with the DMP command mpirun -np 4 ./mfixsolver -f DES_FB1.mfx NODESI=2 NODESJ=2, the SMP command OMP_NUM_THREADS=4 ./mfixsolver -f DES_FB1.mfx, and the "add queue template" feature in the GUI?
What would happen if I ran the default solver while assigning multiple cores and nodes to the job? Would the default solver only use a single node, core, and thread?
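For context, a minimal hand-written SLURM batch script wrapping those two launch modes might look like the sketch below. This is only an illustration under assumptions: the partition name and resource counts are placeholders, and (as noted in the replies further down) the NODESI/NODESJ/NODESK keys belong in the .mfx file rather than on the command line. The GUI's queue template, discussed below, generates an equivalent script for you.
#!/bin/bash
#SBATCH --job-name=DES_FB1
#SBATCH --ntasks=4               # 4 MPI ranks for a DMP run
#SBATCH --partition=general      # placeholder partition name
# DMP (MPI) run; nodesi/nodesj/nodesk are read from DES_FB1.mfx
mpirun -np 4 ./mfixsolver -f DES_FB1.mfx

# SMP (OpenMP) alternative: request 1 task with 4 CPUs instead
# (#SBATCH --ntasks=1 / #SBATCH --cpus-per-task=4) and run:
# export OMP_NUM_THREADS=4
# ./mfixsolver -f DES_FB1.mfx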
We actually use SLURM on our machine (Joule) as well. There is a feature in the run dialog that you can use to write and submit jobs to your queueing system (The GUI needs to be running on that same system, i.e. you can’t submit jobs from your local laptop to your HPC using the GUI). There is a section in the documentation that describes this queue template, which allows you to customize the widgets, here: 8.1. GUI Reference — MFiX 21.3.2 documentation
This will write a .qsubmit_script that is used to actually submit the job to the queue:
sbatch .qsubmit_script
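Once submitted, the job can be checked or cancelled with the standard SLURM commands that the template below wires into its status and delete entries, for example (using whatever job id sbatch reports; 123456 is a placeholder):
squeue -j 123456     # show the job's queue status
scancel 123456       # cancel the job if needed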
The Joule template that is included looks like this (you’ll have to modify the queue list, and modules for your system):
#!/bin/bash -l
## CONFIG
# Special values
# SCRIPT - the path of this script, after replacement in the run directory
# PROJECT_NAME - name of the opened project
# JOB_ID - the job id extracted using job_id_regex
# COMMAND - the command to run mfix
# MFIX_HOME - the path to the mfix directory
[options]
name: Joule
job_id_regex: (\d+)
status_regex: ([rqw])
submit: sbatch ${SCRIPT}
delete: scancel ${JOB_ID}
status: squeue -j ${JOB_ID}
[JOB_NAME]
widget: lineedit
label: Job Name
value: ${PROJECT_NAME}
help: The name of the job.
[CORES]
widget: spinbox
label: Number of Cores
min_value: 1
max_value: 9999
value: 40
help: The number of cores to request.
[QUEUE]
widget: combobox
label: Queue
value: general
items: general|bigmem|shared|gpu
help: The Queue to submit to.
[LONG]
widget: checkbox
label: Long job
value: false
true: #SBATCH --qos=long
help: Specify the job as long.
[MODULES]
widget: listwidget
label: Modules
items: gnu/6.5.0 openmpi/3.1.3_gnu6.5 |
gnu/8.2.0 openmpi/4.0.1_gnu8.2 |
gnu/8.4.0 openmpi/4.0.3_gnu8.4 |
gnu/9.3.0 openmpi/4.0.4_gnu9.3
help: Select the modules that need to be loaded.
## END CONFIG
## The name for the job.
#SBATCH --job-name=${JOB_NAME}
##
## Number of cores to request
#SBATCH --tasks=${CORES}
##
## Queue Name
#SBATCH --partition=${QUEUE}
${LONG}
##Load Modules
module load ${MODULES}
##Run the job
${COMMAND}
With SLURM, anything that doesn't have #SBATCH in front is just executed as a normal shell command. So just call MFiX the usual way, but with srun in front.
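For illustration, after the GUI substitutes the widget values, the generated .qsubmit_script might reduce to something like the sketch below. The job name, core count, partition, and module names are placeholder values, and ${COMMAND} is whatever solver invocation the run dialog builds (here assumed to be an mpirun DMP launch):
#!/bin/bash -l
#SBATCH --job-name=DES_FB1
#SBATCH --tasks=4
#SBATCH --partition=general
module load gnu/9.3.0 openmpi/4.0.4_gnu9.3
# run the solver through the scheduler
srun mpirun -np 4 ./mfixsolver -f DES_FB1.mfx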
Thank you Justin. To follow up: I can only visualize and interact with the GUI in Sinteract mode, where I am assigned a single core and told that I cannot run DMP (multiple cores); I cannot visualize or interact with the GUI on the main cluster. I want to know:
Can I submit the job to the queue from the command line?
What is the difference between the .x and .run files shown below?
I was told that if I run "srun mpirun" directly on the main cluster without configuring an SBATCH file, it may clog the compute resources to some extent and get simulation jobs killed.
As shown in pic1 below, for MPI I have to save the script as a .run file, which refers to a .x file. What does .x mean?
The NODESI, NODESJ and NODESK keys are set in the .mfx file, not on the command line (in a previous version these were passed on the command line, but this is no longer supported).
Thank you Charles. Currently I set up the parameters and configuration on my local Ubuntu machine, then transfer the project to the computing cluster and run it via the .run script or directly srun the .x file, like this:
So I want to know how I can set the NODES keys in the GUI so that these parameters are embedded in the .mfx file and I can directly run "srun mpirun" on the main cluster. In that case, would the command become "srun mpirun -np 4 ./mfixsolver -f DES_FB1.mfx", without the NODES keys?
The NODES keys are written into the .mfx file when you click the "Run" button and the run popup appears. But since you are not running locally, this does not happen. Also, you can only set these keys if the local solver has DMP enabled. You could build a local DMP solver, but that is probably overkill - the simplest thing is to edit the file outside of MFiX (in any text editor) and add the nodesi = lines to the section with the keywords (before any THERMO_DATA or #!MFIX_GUI lines).
They may already be present in the file with the value set to 1; in that case, just change 1 to the values you want.
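For example, for a 4-rank run split 2x2 in the x and y directions, the keyword section of DES_FB1.mfx would contain lines like the following (values are illustrative and must sit above any THERMO_DATA or #!MFIX_GUI section):
nodesi = 2    # domain decomposition in the x direction
nodesj = 2    # decomposition in the y direction
nodesk = 1    # decomposition in the z direction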
And yes, you are correct about the format of the srun command.
Thank you Charles. This time the job runs either with "srun" directly or in SBATCH mode. It has been running for 10 minutes, but I still cannot see the generated intermediate files such as .msh, .vtk, etc. I wonder whether, under the SLURM settings, the job could keep a running status without being killed even if it encountered a failure and stopped unexpectedly. Thank you in advance.
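A few standard SLURM commands can help distinguish a job that is genuinely still computing from one that has stalled but kept a RUNNING state (the job id and file names below are placeholders; the .LOG and VTK files are the usual MFiX run-directory outputs):
squeue -j 123456                                        # queue state as SLURM sees it
sacct -j 123456 --format=JobID,State,Elapsed,ExitCode   # accounting record, including exit code
ls -lt                                                  # are the .LOG / .vtu / .vtp files being updated?
tail -f slurm-123456.out                                # solver stdout/stderr captured by sbatch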