Ex. 6: Generic model submission
This example demonstrates how to use the Generic Model Creator and Queue Submission nodes to generate and submit runs of any model that uses text-based input files on an HPC system.
Note
This example was run on NETL’s Joule HPC, which uses a Slurm-based queueing system. It should work on other systems, but the queue commands may need to be changed to match the queue manager in use.
Step 1: Setup base directory
For this example, we will create a simple Python script that reads two input values (\(x\) and \(y\)) from a text file, evaluates the quadratic bowl function (\(z = x^2 + y^2\)), writes the resulting value (\(z\)) to another text file, and finally waits a random amount of time.
First, create a new directory, such as nodeworks_ex6, to contain the files.
Next, create and open a file called run.py, which will be our “model”. Copy and paste the following Python code:
import numpy as np
import time
import random
# read inputs
matrix = np.atleast_2d(np.loadtxt('./sample.txt'))
# evaluate function
rsp = np.sum(matrix**2, axis=1)
# wait random amount of time
time.sleep(random.randint(10, 30)) # seconds
# write result
np.savetxt('response.txt', rsp)
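As a quick sanity check of what the script computes, the following is a minimal sketch of the same sum-of-squares evaluation for a single row of inputs, written with the standard library only. The helper name quadratic_bowl and the input values are hypothetical and chosen purely for illustration; run.py itself uses NumPy as shown above.

```python
def quadratic_bowl(row):
    """Return the sum of squares of one row of inputs (z = x^2 + y^2 for two inputs)."""
    return sum(value ** 2 for value in row)


# One row as it might appear in a filled-in sample.txt (illustrative values)
row = [float(token) for token in "0.5 -1.0".split()]
z = quadratic_bowl(row)
print(z)  # 1.25
```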
Next, create a file named sample.txt in the same directory. This file will be our text-based model input file, in which the Generic Model Creator node will replace the variables ${x} and ${y} with the sample values generated by the Design of Experiments node. Copy and paste the following text:
${x} ${y}
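To illustrate what replacing the variables means, here is a minimal sketch that uses Python's string.Template, which happens to use the same ${name} placeholder syntax. This is an assumption made for illustration only: this example does not describe how the Generic Model Creator node performs the replacement internally, and the sample values are illustrative.

```python
from string import Template

# Contents of sample.txt from this example
template_text = "${x} ${y}\n"

# Illustrative sample values for the two variables
sample = {"x": -0.4622, "y": 0.9629}

# Substitute each ${name} placeholder with its sample value
filled = Template(template_text).substitute(sample)
print(filled)
```

The filled-in text is a single row of two numbers, which is the form that np.loadtxt in run.py expects.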
Make sure you save the files and have a directory structure that looks like this:
nodeworks_ex6
├── run.py
└── sample.txt
Step 2: Setup the nodes
Open Nodeworks, create a new sheet, and add a Design of Experiments node. On the variables tab, create a new variable by pressing the button. Change the variable name from x1 to just x, matching the first variable in the sample.txt file. Next, change the range of the variable, replacing the 0 in the from field with -1. Follow the same process to add another variable: press the button, change the variable name from x2 to y, and replace the 0 in the from field with -1.
Next, make the samples by going to the Design tab, selecting latin hypercube as the Method, and changing the number of Samples from the default 10 to 20. Finally, generate the samples by pressing the Build button.
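For readers unfamiliar with the method, the following is a minimal sketch of basic Latin hypercube sampling using only the standard library. It is not the Design of Experiments node's implementation; the function name latin_hypercube, the seed, and the upper bound of 1 for both variables are assumptions (the steps above only set the lower bound to -1).

```python
import random


def latin_hypercube(n_samples, bounds, seed=None):
    """Draw n_samples points so that each variable has one point per equal-width stratum."""
    rng = random.Random(seed)
    columns = []
    for low, high in bounds:
        # One uniform draw inside each of the n_samples strata of [0, 1)
        unit = [(i + rng.random()) / n_samples for i in range(n_samples)]
        # Shuffle so the strata of different variables are paired at random
        rng.shuffle(unit)
        columns.append([low + u * (high - low) for u in unit])
    # Transpose the per-variable columns into rows of (x, y)
    return [list(row) for row in zip(*columns)]


# 20 samples of x and y, assuming both range from -1 to 1
samples = latin_hypercube(20, [(-1.0, 1.0), (-1.0, 1.0)], seed=0)
for x, y in samples[:3]:
    print(x, y)
```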
Now, add a Generic Model Creator node to the sheet and connect the DOE Matrix terminal from the Design of Experiments node to the DOE Matrix terminal on the Generic Model Creator node. Select the Source directory by clicking the button and browsing to the directory created in Step 1 (nodeworks_ex6). The File extensions to copy and File extensions to replace lists will be populated with all the file extensions in the Source directory. In this example, you should only see .py and .txt. In the File extensions to copy list, check the .py extension. In the File extensions to replace list, check the .txt extension. The Export directory will automatically be set to the Source directory, and we will leave the default Directory prefix as sim_.
The Generic Model Creator node is now set up to create a new directory for each sample in the DOE matrix, copying the run.py file into each new directory and copying the sample.txt file as well, while replacing the ${x} and ${y} variables with the corresponding sample values. Press the Create directories button to actually create the directories. The project directory should now look like:
nodeworks_ex6
├── run.py
├── sample.txt
├── sim_000000
| ├── run.py
| ├── sample_dict.json
| └── sample.txt
├── sim_000001
| ├── run.py
| ├── sample_dict.json
| └── sample.txt
etc.
Note
The sample_dict.json file contains the variables and values used to replace the variable names in the files with the selected extensions. It is a JSON file that will look something like:
{"x": -0.4622, "y": 0.9629}
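The directory creation described above can be sketched roughly as follows. This is a hypothetical stand-in, not the node's actual code: the function create_run_dirs, its parameters, and the use of string.Template are assumptions, and the demo runs in a temporary directory with a placeholder run.py.

```python
import json
import shutil
import tempfile
from pathlib import Path
from string import Template


def create_run_dirs(source_dir, samples, copy_ext=(".py",), replace_ext=(".txt",), prefix="sim_"):
    """Create one run directory per sample, copying or templating files by extension."""
    source_dir = Path(source_dir)
    # List the source files once, before any run directories are created
    files = [p for p in source_dir.iterdir() if p.is_file()]
    created = []
    for index, sample in enumerate(samples):
        run_dir = source_dir / f"{prefix}{index:06d}"
        run_dir.mkdir(exist_ok=True)
        for path in files:
            if path.suffix in copy_ext:
                # Copied unchanged (run.py in this example)
                shutil.copy(path, run_dir / path.name)
            elif path.suffix in replace_ext:
                # Copied with ${name} placeholders replaced (sample.txt in this example)
                text = Template(path.read_text()).safe_substitute(sample)
                (run_dir / path.name).write_text(text)
        # Record the values used for this run
        (run_dir / "sample_dict.json").write_text(json.dumps(sample))
        created.append(run_dir)
    return created


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    (base / "run.py").write_text("print('model placeholder')\n")
    (base / "sample.txt").write_text("${x} ${y}\n")
    dirs = create_run_dirs(base, [{"x": -0.4622, "y": 0.9629}, {"x": 0.1, "y": -0.3}])
    dir_names = [d.name for d in dirs]
    first_input = (dirs[0] / "sample.txt").read_text()
    first_dict = json.loads((dirs[0] / "sample_dict.json").read_text())

print(dir_names)
print(first_input)
```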
Now add a Queue Submission node to the sheet and connect the directories terminal of the Generic Model Creator node to the directories terminal of the Queue Submission node. To auto-populate the fields with Joule-specific commands, regular expressions, and queue script, open the Load template drop-down and select the Joule (slurm) template. Select the general partition from the Queue list. The Run CMD field is where we enter the actual command to run our model, run.py. For this specific case, we will use the srun command and set the working directory so we can run multiple simulations on the same node. Enter the following in the Run CMD field:
srun --chdir=${cwd} python run.py
Next, change the Runs per job from the default of 1 to 5 and select the concurrent check box. This will copy the Run CMD 5 times in the same queue submission script. The concurrent check box tells the node to append an & to each run command and add a wait after the run commands, allowing the simulations to run concurrently on the same node.
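To make the effect of these two settings concrete, here is a rough sketch of how the run commands for one job could be assembled. The functions build_job_script and chunk are hypothetical, the header is a placeholder for the Joule template (whose contents are not shown in this example), and the substitution of ${cwd} with each run directory is an assumption about how the node fills in the command.

```python
def chunk(items, size):
    """Split a list into consecutive groups of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def build_job_script(run_dirs, run_cmd, concurrent=True, header="#!/bin/bash"):
    """Repeat the run command once per directory, optionally backgrounding each one."""
    lines = [header]
    for run_dir in run_dirs:
        # Assumption: ${cwd} stands for the run directory of each simulation
        cmd = run_cmd.replace("${cwd}", run_dir)
        lines.append((cmd + " &") if concurrent else cmd)
    if concurrent:
        # Keep the job alive until every backgrounded command has finished
        lines.append("wait")
    return "\n".join(lines) + "\n"


# 20 run directories grouped 5 per job gives 4 job scripts
run_dirs = [f"nodeworks_ex6/sim_{i:06d}" for i in range(20)]
scripts = [
    build_job_script(group, "srun --chdir=${cwd} python run.py")
    for group in chunk(run_dirs, 5)
]
print(scripts[0])
```

Without the trailing wait, the shell would reach the end of the script while the backgrounded commands were still running, and the job would end before the simulations completed.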
To actually submit the job scripts to the queue, go to the Jobs tab and press the Submit button. The Jobs table will be populated with information about the job for each individual simulation (run directory), even though some of the simulations were submitted together in the same queue script. Toggle the button to enable the queue node to check the status of the jobs.
As jobs are finished, they will be added to the finished directories terminal of the Queue Submission node. To read the resulting response.txt written by the simulation, add a Code node. Enter dirs in the arguments field and hit Enter on the keyboard to generate a new terminal. Connect the finished directories terminal on the Queue Submission node to the dirs terminal on the Code node. Finally, copy and paste the following Python code into the Code node:
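import numpy as np
import os
# read the response written by each finished simulation
rsp = []
for d in dirs:
    rsp_f = os.path.join(d, 'response.txt')
    rsp.append(np.loadtxt(rsp_f))
# pass the responses to the returnOut terminal
returnOut = rsp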
We now have everything in place to connect the samples generated by the Design of Experiments node and the response from the Code node into the Response Surface node. However, since some of the simulations may not have finished yet, we need to filter out the samples that do not have a response. The Queue Submission node has a finished mask terminal that provides a list of booleans that can be used by the Sample Filter node.
Add a Sample Filter node to the sheet and connect the Design of Experiments node’s DOE Matrix terminal to the Sample Filter node’s samples terminal. Next, connect the Queue Submission node’s finished mask terminal to the Sample Filter node’s mask terminal.
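Conceptually, the mask keeps only the samples whose simulations have finished, so that the samples line up with the responses read by the Code node. A minimal sketch of that filtering follows, with a hypothetical function name and illustrative values; it is not the Sample Filter node's code.

```python
def filter_samples(samples, mask):
    """Keep the samples whose matching mask entry is True."""
    if len(samples) != len(mask):
        raise ValueError("samples and mask must have the same length")
    return [sample for sample, finished in zip(samples, mask) if finished]


# Illustrative values: four samples, of which the first and third have finished
samples = [[-0.46, 0.96], [0.10, -0.30], [0.75, 0.20], [-0.90, -0.55]]
finished_mask = [True, False, True, False]

filtered = filter_samples(samples, finished_mask)
print(filtered)  # [[-0.46, 0.96], [0.75, 0.2]]
```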
Finally, add a Response Surface node to the sheet. Connect the returnOut terminal from the Code node to the Response Surface node’s matrix/response terminal and connect the filtered samples terminal of the Sample Filter node to the matrix/response terminal of the Response Surface node. Run the sheet to populate the Response Surface node with the samples and response.