Queue Submission¶
The Queue node provides a way to automatically construct submission scripts, submit jobs to a queueing system such as Slurm, and check the status of the jobs. The node has been constructed to be flexible, allowing for complete customization. This node is demonstrated in the Ex. 6: Generic model submission example.

To quickly get started, there are templates that can be loaded by selecting a template from the Load template drop-down list. The options can then be edited on the Options, Commands, and Script tabs.
Each directory provided by the directories terminal is treated as a single job. The Finished directories terminal is a list of the directories whose jobs have exited the queue. The Finished mask provides a boolean list of the same size as the directories input, where an entry is True if the corresponding job has exited the queue and False if the job is still running or still queued.
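For example, with three runs (hypothetical paths) where only the first job has left the queue, the terminals would contain something like:

directories:          /path/to/run_001  /path/to/run_002  /path/to/run_003
Finished directories: /path/to/run_001
Finished mask:        True  False  False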
The jobs can be submitted manually by pressing the Submit button or automatically by running the sheet.
Note
Just because a job is listed as Done in the job table and appears in the Finished directories terminal does not mean that the job completed successfully; it simply means that the job is no longer in the queue.
Jobs¶
The Jobs tab lists the Job ID from the queue system, the Queue the job has been submitted to, the Status, and the path to the job location.
The button will check the status of the jobs when pressed and, while it remains checked, will automatically recheck the status every second. The status checks stop when the button is unchecked or when all of the jobs have finished.
Options¶
Submission options are controlled on the Options tab. Queues to submit to can be selected and added. If multiple queues are selected, the jobs will be distributed evenly among them. A job name can be provided in the Job name field.
This node allows multiple runs to be packaged into a single job submission, which can be executed either sequentially (serial) or concurrently (parallel). If multiple runs are used, the Run CMD will be written multiple times in the submission script (replacing the ${cmd} variable), so the Run CMD needs to support multiple runs. For Slurm jobs, this should include using srun and specifying the run directory with --chdir.
Runs per job specifies the number of runs in a single job. If the runs are to be executed concurrently, select the concurrent checkbox. When the concurrent checkbox is selected, an & is appended to each Run CMD and a wait is placed after all of the run commands.
For example, using a Run CMD of srun --chdir=${cwd} python run.py, 4 Runs per job, and the concurrent checkbox selected will replace the ${cmd} variable in the job script with:
srun --chdir=/path/to/run_001 python run.py &
srun --chdir=/path/to/run_002 python run.py &
srun --chdir=/path/to/run_003 python run.py &
srun --chdir=/path/to/run_004 python run.py &
wait
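If the concurrent checkbox is not selected, the ${cmd} variable would instead be replaced with the same commands written one after another, without the trailing & or the final wait, so the runs execute sequentially:

srun --chdir=/path/to/run_001 python run.py
srun --chdir=/path/to/run_002 python run.py
srun --chdir=/path/to/run_003 python run.py
srun --chdir=/path/to/run_004 python run.py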
Since some queues have user limits on the number of jobs that can be submitted, this limit can be entered in the Maximum jobs field. This prevents the Queue node from submitting too many jobs: if there are more jobs than allowed, the Queue node will wait until jobs have finished before submitting additional ones. This check and submission of jobs occurs when the button is pressed.
Note
The variables that can be used in the Script are described in the Script section below.
Commands¶
The Commands tab is where the queue-manager-specific submission and status commands are provided, along with regular expressions that are used to extract information from the stdout of the commands.
Slurm example¶
For Slurm, the submission command is sbatch <submission_script>, so the Submission command would be sbatch. The node will add the correct path to the submission script automatically. The sbatch command returns the job id via stdout, which looks like:
Submitted batch job 123456
The job id can be extracted with a simple regular expression that looks for an integer, which is entered in the Job ID regex field:
(\d+)
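As an illustration only (not the node's internal implementation), the same extraction can be reproduced from the shell with bash's regex matching; note that bash's extended regular expressions use [0-9] in place of \d:

# Minimal sketch: capture the job id from example sbatch output
output="Submitted batch job 123456"    # example output, as shown above
re='([0-9]+)'
if [[ $output =~ $re ]]; then
    job_id="${BASH_REMATCH[1]}"        # first captured group
    echo "Job ID: $job_id"             # prints: Job ID: 123456
fi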
Similarly, the job status can be checked by calling squeue -j <job_id>, so the Status command field would be squeue -j ${job_id}. The squeue command returns the status, along with other information, to stdout:
> squeue -j 123456
JOBID PARTITION NAME USER ST TIME NODES NODELIST(REASON)
123456 general my_job myname R 2:23:43 10 n[0945-0954]
> squeue -j 12
slurm_load_jobs error: Invalid job id specified
The job status can be one of:

R - Job is running on compute nodes
PD - Job is waiting for compute nodes
CG - Job is completing
The job status can be extracted from the stdout with the following regular expression:
\s(R|PD|CG)\s
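Purely as an illustration, the same pattern can be tested in bash, where [[:space:]] plays the role of \s:

# Minimal sketch: pull the status code out of a line of squeue output
line="123456   general   my_job  myname  R       2:23:43     10 n[0945-0954]"
re='[[:space:]](R|PD|CG)[[:space:]]'
if [[ $line =~ $re ]]; then
    status="${BASH_REMATCH[1]}"        # "R" for this example line
fi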
Script¶
The last tab provides a text editor where the submission script is edited. The script can be completely customized and will be used as the submission script after replacing the variable tags, ${variable}. The scripts are saved in the current working directory, named queue_submit.script###### where the # symbols are replaced with a 6-character hash.
The following variables can be used throughout the script:

${job_name} - Job name as specified on the Options tab
${queue} - Queue or partition name as specified on the Options tab
${cwd} - Current working directory of the job (or the parent directory of the runs if more than one run is packaged into the job)
${cmd} - The actual run command as specified on the Options tab
Example Slurm submission script
#!/bin/bash -l
## The name for the job.
#SBATCH --job-name=${job_name}
##
## Number of cores to request (each node has 40 cores)
#SBATCH --tasks=40
##
## Queue Name (general, bigmem, gpu)
#SBATCH --partition=${queue}
##
## Working directory
#SBATCH --chdir=${cwd}
## Load Modules (run "module avail" for list)
module load anaconda
## Run the job
${cmd}
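After the variable tags are replaced, the script above would look something like the following, assuming (hypothetically) a job name of my_job, the general queue, a run directory of /path/to/run_001, and a single run per job:

#!/bin/bash -l
#SBATCH --job-name=my_job
#SBATCH --tasks=40
#SBATCH --partition=general
#SBATCH --chdir=/path/to/run_001
module load anaconda
srun --chdir=/path/to/run_001 python run.py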