.. _hpc-queue:


Queue Submission
================

The ``Queue`` node provides a way to automatically construct submission scripts,
submit jobs to a queueing system such as `Slurm`_, and check the status of the
job. The node has been constructed to be flexible, allowing for complete
customization. This node is demonstrated in the :ref:`sma-ex6` example.

To get started quickly, select one of the provided `templates` from the
``Load template`` drop-down list. The loaded options can then be edited on the
``Options``, ``Commands``, and ``Script`` tabs.

Each directory provided by the ``directories`` terminal is treated as a single
job. The ``Finished directories`` terminal is a list of the directories whose
jobs have exited the queue. The ``Finished mask`` terminal provides a boolean
list of the same size as the ``directories`` input, where an entry is ``True``
if the job has exited the queue and ``False`` if the job is still running or
still queued.
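
The relationship between these terminals can be illustrated with a small
Python sketch (the variable names here are hypothetical, not the node's
internals):

```python
# Hypothetical data illustrating the terminals described above.
directories = ["/runs/run_001", "/runs/run_002", "/runs/run_003"]
finished_directories = ["/runs/run_002"]  # jobs that have exited the queue

# One entry per input directory: True once the job has exited the queue
# (whether or not it completed successfully).
finished_mask = [d in finished_directories for d in directories]
print(finished_mask)  # [False, True, False]
```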

The jobs can be manually submitted by pressing the ``Submit`` button or
automatically submitted by running the sheet.

.. Note::

   Just because a job is listed as ``Done`` in the job table and appears in
   the ``Finished directories`` terminal does not mean that the job completed
   successfully; it simply means that the job is no longer in the queue.

Jobs
----

The ``Jobs`` tab lists the ``Job ID`` from the queue system, the ``Queue`` the
job has been submitted to, the ``Status``, and the path to the job location.

The |refresh| button checks the status of the jobs when pressed and, while it
remains checked, automatically rechecks the status every second. Status checks
stop when the |refresh| button is unchecked or when all of the jobs are
finished.

.. |refresh| image:: ../../../nodeworks/images/refresh.svg


.. figure:: ./images/queue_jobs.png
   :align: center
   :scale: 75 %

Options
-------

Submission options are controlled on the ``Options`` tab. Queues to submit to
can be selected and added. If multiple queues are selected, the jobs will be
distributed evenly among them. A job name can be provided in the ``Job name``
field.

This node allows for packaging multiple runs into a single job submission which
can either be run sequentially (serial) or concurrently (parallel). If multiple
runs will be used, then the ``Run CMD`` will be written multiple times in the
submission script (replacing the ``${cmd}`` variable). This ``Run CMD`` needs
to support multiple runs. For Slurm jobs, this should include using ``srun`` and
specifying the run directory with ``--chdir``.

``Runs per job`` specifies the number of runs in a single job. If the jobs are
to be run concurrently, select the ``concurrent`` checkbox. If the
``concurrent`` checkbox is selected, an ``&`` will be appended to each
``Run CMD`` and a ``wait`` will be placed after all the run commands.

For example, using a ``Run CMD`` of ``srun --chdir=${cwd} python run.py``, 4
``Runs per job``, and the ``concurrent`` checkbox selected will replace the
``${cmd}`` variable in the job script with:

.. code-block::

  srun --chdir=/path/to/run_001 python run.py &
  srun --chdir=/path/to/run_002 python run.py &
  srun --chdir=/path/to/run_003 python run.py &
  srun --chdir=/path/to/run_004 python run.py &
  wait
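
The expansion above can be sketched in Python (a hypothetical helper, not the
node's actual implementation):

```python
def build_cmd_block(run_cmd, run_dirs, concurrent):
    """Expand a templated Run CMD once per run directory.

    ``run_cmd`` contains the ``${cwd}`` placeholder; with ``concurrent``
    set, each line is backgrounded with ``&`` and a final ``wait`` joins
    the runs, as in the example above.
    """
    lines = [run_cmd.replace("${cwd}", d) for d in run_dirs]
    if concurrent:
        lines = [line + " &" for line in lines]
        lines.append("wait")
    return "\n".join(lines)

dirs = ["/path/to/run_%03d" % i for i in range(1, 5)]
print(build_cmd_block("srun --chdir=${cwd} python run.py", dirs, concurrent=True))
```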

Since some queues have user limits on the number of jobs that can be submitted,
this limit can be entered in the ``Maximum jobs`` field. This will prevent
the ``Queue`` node from submitting too many jobs. If there are more jobs than
allowed, the ``Queue`` node will wait until jobs have finished before submitting
additional jobs. This check and submission of jobs occurs when the |refresh|
button is pressed.
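
The throttling logic amounts to filling only the free slots under the limit; a
minimal sketch (not the node's actual code) might look like:

```python
def jobs_to_submit(pending_dirs, jobs_in_queue, maximum_jobs):
    """Return the directories that can be submitted now without
    exceeding the user's queue limit (illustrative sketch only)."""
    free_slots = max(0, maximum_jobs - jobs_in_queue)
    return pending_dirs[:free_slots]

print(jobs_to_submit(["run_001", "run_002", "run_003"],
                     jobs_in_queue=9, maximum_jobs=10))  # ['run_001']
```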

.. note::

   The variables used in the ``Script`` are displayed in ``[]``.

.. figure:: ./images/queue_options.png
   :align: center
   :scale: 75 %


Commands
--------

The ``Commands`` tab is where the queue manager specific submission and status
commands are provided along with regular expressions that are used to extract
information from the ``stdout`` of the commands.

Slurm example
+++++++++++++

For `Slurm`_ the submission command is ``sbatch <submission_script>``, so the
``Submission command`` would be ``sbatch``. The node will add the correct path
to the submission script automatically. The ``sbatch`` command returns the job
id via ``stdout``, which looks like:

.. code-block::

   Submitted batch job 123456

The job id can be extracted with a simple regular expression that looks for an
integer, which is entered in the ``Job ID regex`` field:

.. code-block::

   (\d+)
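
Applied in Python, the regex pulls the job id out of the ``sbatch`` output (a
quick check, assuming Python ``re`` semantics match the node's):

```python
import re

stdout = "Submitted batch job 123456"
match = re.search(r"(\d+)", stdout)  # the Job ID regex from above
job_id = match.group(1)
print(job_id)  # 123456
```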

Similarly, a job status can be checked by calling ``squeue -j <job_id>``, so the
``Status command`` field would be ``squeue -j ${job_id}``. The ``squeue``
command returns the status, along with other information, to ``stdout``:

.. code-block::

   > squeue -j 123456
   JOBID   PARTITION       NAME      USER ST       TIME     NODES NODELIST(REASON)
   123456    general     my_job    myname  R    2:23:43        10 n[0945-0954]

   > squeue -j 12
   slurm_load_jobs error: Invalid job id specified

The job status (the ``ST`` column) can be one of:

* ``R`` - Job is running on compute nodes
* ``PD`` - Job is waiting on compute nodes
* ``CG`` - Job is completing

The job status can be extracted from the ``stdout`` with the following regular
expression:

.. code-block::

   \s(R|PD|CG)\s
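
Checking this regex against the two ``squeue`` outputs shown above (again
assuming Python ``re`` semantics):

```python
import re

# A running-job line and an error line from the squeue output above
running = "123456    general     my_job    myname  R    2:23:43        10 n[0945-0954]"
gone = "slurm_load_jobs error: Invalid job id specified"

status_re = re.compile(r"\s(R|PD|CG)\s")
print(status_re.search(running).group(1))  # R
print(status_re.search(gone))              # None -> job has left the queue
```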

.. figure:: ./images/queue_cmds.png
   :align: center
   :scale: 75 %

Script
------

The last tab provides a text editor where the submission script can be edited.
This script is fully customizable and is used as the submission script after
the variable tags, ``${variable}``, are replaced. The scripts are saved in the
current working directory, named ``queue_submit.script######``, where the
``#`` symbols are replaced with a six-character hash.
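
Illustratively, a six-character suffix like the ``######`` placeholder could
be produced as follows (the node's actual hashing scheme is not documented
here):

```python
import uuid

# Illustrative only: one way to generate a 6-character suffix like the
# ``######`` placeholder described above.
suffix = uuid.uuid4().hex[:6]
script_name = "queue_submit.script" + suffix
print(script_name)
```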

The following variables can be used throughout the script:

* ``${job_name}`` - Job name as specified on the ``Options`` tab
* ``${queue}`` - Queue or partition name as specified on the ``Options`` tab
* ``${cwd}`` - Current working directory of the job (or the parent directory
  of the runs if packaging more than one run in a job)
* ``${cmd}`` - The run command as specified on the ``Options`` tab
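
Because the tags use shell-style ``${variable}`` syntax, the replacement step
can be illustrated with Python's ``string.Template`` (a sketch, not the node's
actual substitution mechanism):

```python
from string import Template

script = Template(
    "#!/bin/bash -l\n"
    "#SBATCH --job-name=${job_name}\n"
    "#SBATCH --partition=${queue}\n"
    "#SBATCH --chdir=${cwd}\n"
    "${cmd}\n"
)
rendered = script.substitute(
    job_name="my_job",
    queue="general",
    cwd="/path/to/run_001",
    cmd="srun python run.py",
)
print(rendered)
```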


An example Slurm submission script:

.. code-block::

   #!/bin/bash -l

   ## The name for the job.
   #SBATCH --job-name=${job_name}
   ##
   ## Number of cores to request (each node has 40 cores)
   #SBATCH --tasks=40
   ##
   ## Queue Name (general, bigmem, gpu)
   #SBATCH --partition=${queue}
   ##
   ## Working directory
   #SBATCH --chdir=${cwd}

   ## Load Modules (run "module avail" for list)
   module load anaconda

   ## Run the job
   ${cmd}

.. figure:: ./images/queue_script.png
   :align: center
   :scale: 75 %

.. _Slurm: https://slurm.schedmd.com/overview.html