Ex. 6: Generic model submission

This example demonstrates the use of the Generic Model Creator and Queue Submission nodes to generate and submit runs of any model with text-based input files on an HPC.

Note

This example was run on NETL’s Joule HPC, which uses a Slurm-based queueing system. It should work on other systems, but the queue commands may need to be changed to match the queue manager in use.

Step 1: Set up the base directory

For this example, we will create a simple Python script that reads two input values ($x$ and $y$) from a text file, evaluates the quadratic bowl function ($z = x^2 + y^2$), waits a random amount of time, and finally writes the resulting value ($z$) to another text file.

First, create a new directory, such as nodeworks_ex6, to contain the files. Next, create and open a file called run.py, which will be our “model”. Copy and paste the following Python code:

import numpy as np
import time
import random

# read inputs
matrix = np.atleast_2d(np.loadtxt('./sample.txt'))

# evaluate function
rsp = np.sum(matrix**2, axis=1)

# wait random amount of time
time.sleep(random.randint(10, 30))  # seconds

# write result
np.savetxt('response.txt', rsp)

Next, create a file named sample.txt in the same directory. This file will be our text-based model input file, in which the Generic Model Creator node will replace the variables ${x} and ${y} with the sample values generated by the Design of Experiments node. Copy and paste the following text:

${x} ${y}
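The ${x} placeholder syntax happens to match Python's standard string.Template, so the effect of the replacement on a single run can be sketched as below. This is only an illustration, not how Nodeworks necessarily performs the substitution, and the sample values are made up for the example.

```python
from string import Template

# Contents of sample.txt from this example.
template_text = "${x} ${y}\n"

# Hypothetical sample values for one row of the DOE matrix.
sample = {"x": -0.4622, "y": 0.9629}

# substitute() raises KeyError if a placeholder has no matching value.
filled = Template(template_text).substitute(sample)

# Parse the filled file the way the model would and evaluate z = x^2 + y^2.
x, y = (float(v) for v in filled.split())
z = x**2 + y**2

print(filled.strip())
print(z)
```

After substitution the file contains only numbers, which is why run.py can read it with np.loadtxt.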

Make sure you save the files and have a directory structure that looks like this:

nodeworks_ex6
├── run.py
└── sample.txt

Step 2: Set up the nodes

Open Nodeworks, create a new sheet, and add a Design of Experiments node. On the variables tab, create a new variable by pressing the add button. Change the variable name from x1 to just x, matching the first variable in the sample.txt file. Next, change the range of the variable, replacing the 0 in the from field with -1. Follow the same process to add another variable: press the add button, change the variable name from x2 to y, and replace the 0 in the from field with -1.

Next, make the samples by going to the Design tab, selecting Latin hypercube as the Method, and changing the number of Samples from the default 10 to 20. Finally, generate the samples by pressing the Build button.
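For readers unfamiliar with Latin hypercube sampling, the idea can be sketched in a few lines of standard-library Python. This is a minimal illustration of the technique, not the Design of Experiments node's implementation. The function name latin_hypercube is hypothetical, and the upper bound of 1 for both variables is an assumption, since the text only sets the lower bound.

```python
import random


def latin_hypercube(n_samples, bounds, seed=None):
    """Return n_samples points with one point per stratum in each dimension."""
    rng = random.Random(seed)
    columns = []
    for low, high in bounds:
        # Split [low, high) into n_samples equal strata and draw one point in each.
        width = (high - low) / n_samples
        column = [low + (i + rng.random()) * width for i in range(n_samples)]
        # Shuffle so that strata are paired randomly across dimensions.
        rng.shuffle(column)
        columns.append(column)
    # Transpose the per-variable columns into a list of sample rows.
    return [list(row) for row in zip(*columns)]


# 20 samples of (x, y), each variable assumed to range from -1 to 1.
samples = latin_hypercube(20, [(-1.0, 1.0), (-1.0, 1.0)], seed=0)
print(len(samples), samples[0])
```

Each variable's range is covered evenly because every one of the 20 strata contains exactly one sample.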

Now, add a Generic Model Creator node to the sheet and connect the DOE Matrix terminal from the Design of Experiments node to the DOE Matrix terminal on the Generic Model Creator node. Select the Source directory by clicking the open button and browsing to the directory created in Step 1 (nodeworks_ex6). The File extensions to copy and File extensions to replace lists will be populated with all the file extensions in the Source directory. In this example, you should only see .py and .txt. In the File extensions to copy list, check the .py extension. In the File extensions to replace list, check the .txt extension. The Export directory will automatically be set to the Source directory and we will leave the default Directory prefix as sim_.

The Generic Model Creator node is now set up to create a new directory for each sample in the DOE matrix, copying the run.py file into each new directory and copying the sample.txt file as well, replacing the ${x} and ${y} variables with the corresponding sample values. Press the Create directories button to actually create the directories. The project directory should now look like:

nodeworks_ex6
├── run.py
├── sample.txt
├── sim_000000
│   ├── run.py
│   ├── sample_dict.json
│   └── sample.txt
├── sim_000001
│   ├── run.py
│   ├── sample_dict.json
│   └── sample.txt
etc.

Note

The sample_dict.json file contains the variables and values used to replace the variable names in files with the selected extensions. It is a JSON file that will look something like:

{"x": -0.4622, "y": 0.9629}
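The directory layout above could be reproduced by hand with a script along the following lines. This is a rough sketch of the behavior described in this step, not the node's actual code. The helper name create_run_dirs is hypothetical, and the copy and replace choices are hard-coded to run.py and sample.txt rather than driven by file extension.

```python
import json
import shutil
import tempfile
from pathlib import Path
from string import Template


def create_run_dirs(source, samples, prefix="sim_"):
    """Create one run directory per sample inside `source` (illustrative only)."""
    source = Path(source)
    run_dirs = []
    for i, sample in enumerate(samples):
        run_dir = source / f"{prefix}{i:06d}"
        run_dir.mkdir(exist_ok=True)
        # Copy the model script verbatim (the ".py" extension in this example).
        shutil.copy(source / "run.py", run_dir / "run.py")
        # Substitute ${x} and ${y} in the input file (the ".txt" extension).
        text = Template((source / "sample.txt").read_text()).substitute(sample)
        (run_dir / "sample.txt").write_text(text)
        # Record the values used, mirroring sample_dict.json.
        (run_dir / "sample_dict.json").write_text(json.dumps(sample))
        run_dirs.append(run_dir)
    return run_dirs


# Demonstrate on a throwaway copy of the Step 1 layout with made-up samples.
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    (base / "run.py").write_text("# model script placeholder\n")
    (base / "sample.txt").write_text("${x} ${y}\n")
    dirs = create_run_dirs(base, [{"x": -0.4622, "y": 0.9629}, {"x": 0.1, "y": -0.3}])
    names = [d.name for d in dirs]
    first_input = (dirs[0] / "sample.txt").read_text()

print(names)
```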

Now add a Queue Submission node to the sheet and connect the directories terminal of the Generic Model Creator node to the directories terminal of the Queue Submission node. To auto-populate the fields with Joule-specific commands, regular expressions, and queue script, open the Load template drop-down and select the Joule (slurm) template. Select the general partition from the Queue list. The Run CMD field is where we enter the actual command to run our model, run.py. For this specific case, we will use the srun command and set the working directory so we can run multiple simulations on the same node. Enter the following in the Run CMD field:

srun --chdir=${cwd} python run.py

Next, change the Runs per job from the default of 1 to 5 and select the concurrent check box. This will copy the Run CMD 5 times into the same queue submission script. The concurrent check box tells the node to append an & to each run command and to add a wait after the run commands, allowing the simulations to run concurrently on the same node.
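To make the effect of these two settings concrete, the run-command portion of one job script can be sketched as follows. This is an illustration only: the function build_run_block is hypothetical, the scheduler header from the Joule template is omitted, and it is an assumption that ${cwd} is filled with each run's directory.

```python
def build_run_block(run_cmd, run_dirs, concurrent=True):
    """Return the run-command lines for one job script (illustrative only)."""
    lines = []
    for run_dir in run_dirs:
        # Fill the ${cwd} placeholder with this run's directory (assumed behavior).
        cmd = run_cmd.replace("${cwd}", run_dir)
        # A trailing & sends the command to the background in the shell.
        lines.append(f"{cmd} &" if concurrent else cmd)
    if concurrent:
        # wait blocks until every background command has exited.
        lines.append("wait")
    return "\n".join(lines)


# Five runs per job, matching the Runs per job setting in this example.
run_dirs = [f"sim_{i:06d}" for i in range(5)]
block = build_run_block("srun --chdir=${cwd} python run.py", run_dirs)
print(block)
```

Without the final wait, the script would exit as soon as the last command was launched, and the scheduler could end the job while the simulations were still running.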

To actually submit the job scripts to the queue, go to the Jobs tab and press the Submit button. The Jobs table will be populated with information about the job for each individual simulation (run directory) even though some of the simulations were submitted together in the same queue script. Toggle the refresh button to enable the queue node to check the status of the jobs.

As jobs are finished, they will be added to the finished directories terminal of the Queue Submission node. To read the resulting response.txt written by the simulation, add a Code node. Enter dirs in the arguments field and hit Enter on the keyboard to generate a new terminal. Connect the finished directories terminal on the Queue Submission node to the dirs terminal on the Code node. Finally, copy and paste the following Python code into the Code node:

import numpy as np
import os

rsp = []
for d in dirs:
    rsp_f = os.path.join(d, 'response.txt')
    rsp.append(np.loadtxt(rsp_f))

returnOut = rsp

We now have everything in place to connect the samples generated by the Design of Experiments node and the response from the Code node into the Response Surface node. However, since all the simulations may not have run yet, we need to filter out the samples that do not have a response. The Queue Submission node has a finished mask that provides a list of booleans that can be used by the Sample Filter node.

Add a Sample Filter node to the sheet and connect the Design of Experiments node’s DOE Matrix terminal to the Sample Filter node’s samples terminal. Next, connect the Queue Submission node’s finished mask terminal to the Sample Filter node’s mask terminal.
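Conceptually, the filtering step keeps only the rows of the DOE matrix whose mask entry is true. A minimal sketch with made-up values (the variable names are illustrative, not the node's internals):

```python
from itertools import compress

# Hypothetical DOE samples (x, y) and the matching finished mask.
samples = [[-0.4622, 0.9629], [0.1, -0.3], [0.75, 0.2]]
finished_mask = [True, False, True]

# Keep only the samples whose simulation has finished.
filtered = list(compress(samples, finished_mask))
print(filtered)
```

The filtered samples then have the same length as the list of responses read from the finished directories.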

Finally, add a Response Surface node to the sheet. Connect the returnOut terminal from the Code node to the Response Surface node’s matrix/response terminal and connect the filtered samples terminal of the Sample Filter node to the matrix/response terminal of the Response Surface node. Run the sheet to populate the Response Surface node with the samples and response.