How to estimate the total core hours a task will require when using a computing cluster?

Dear All,

How do I estimate the total core hours a task will require when configuring and using a computing cluster? And how do I determine the following parameters?

#SBATCH --nodes
#SBATCH --ntasks
#SBATCH --cpus-per-task
#SBATCH --mem

Thank you.

Ju,

I would read your university’s documentation on how to use Slurm directives. How many nodes and tasks you need will depend entirely on your cluster’s settings and the computational overhead of your simulation. For example, if your job requires NODESI * NODESJ * NODESK = 48 cores and your university cluster has 16 cores/node, you would need to set --ntasks=48 and --nodes=3 or more. Some clusters have a testing partition (e.g. --partition=shas-testing) that allows you to run a job for a short period and estimate the total time required from that.
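To make that example concrete, here is a minimal sketch of the resource directives for that hypothetical 48-core job on 16-core nodes (the time limit and partition name are placeholders you would replace with values from your own cluster’s documentation). Once you know the wall-clock time from a test run, the total core hours are roughly ntasks multiplied by the elapsed wall time, e.g. 48 cores running for 2 hours is about 96 core hours.

    # 48 cores total (NODESI*NODESJ*NODESK), 16 cores per node -> 3 nodes
    #SBATCH --nodes=3
    #SBATCH --ntasks=48
    # wall-clock limit; refine this from a short test run
    #SBATCH --time=01:00:00
    # example testing partition mentioned above
    #SBATCH --partition=shas-testing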

I have never set --cpus-per-task or --mem before; I would check whether you really need to change these from their defaults.


Thank you, Julia. Here is the page describing how to use SBATCH on SCITAS:

https://scitas-data.epfl.ch/confluence/display/DOC/Using+the+clusters

And I have built the DMP and SMP solvers with the command build_mfixsolver [--dmp]/[--smp]. I have tested the SMP solver in Sinteract mode with the command OMP_NUM_THREADS=4 ./mfixsolver -f DES_FB1.mfx. I want to know how to use SBATCH to configure DMP/SMP tasks now that the SMP and DMP solvers have been built. Can I directly run commands such as “mpirun -np 4 ./mfixsolver -f DES_FB1.mfx NODESI=2 NODESJ=2” under /home/username for a DMP task? And how can I combine “srun .x” with the above tasks? Thank you in advance for your answers.

Ju,

If your cluster documentation does not indicate how many cores/node are on the cluster partition you have access to, you will need to reach out to the research computing staff/cluster admins at your university.
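If you do have terminal access, one quick way to check this yourself is to query the partitions with sinfo. This is a sketch assuming a fairly standard Slurm setup, not anything specific to your cluster:

    # list each partition with its node count, CPUs per node, and memory per node (MB)
    sinfo -o "%P %D %c %m"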

It sounds like you may be confusing a few concepts, so I will try to clarify below. When you compile the solver, you create an mfixsolver executable that is now “waiting” to be run on whatever computing resources you allocate to it. srun (Slurm’s own launcher) and mpirun (the MPI library’s launcher) are both commands for launching MPI applications, so I don’t know what you mean by “how to combine srun .x with the above tasks”: mpirun is a separate command from srun. You use either srun or mpirun, but not both (although, according to my cluster’s documentation, mpirun seems to be preferred over srun if you have the option).
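To make the distinction concrete, inside a job script you would pick one launcher or the other for the same executable. Here is a sketch using the processor counts from this thread; whether plain srun works for MPI jobs depends on how MPI and Slurm were built together on your cluster, so check your site documentation:

    # Option A: the MPI library's launcher
    mpirun -np 4 ./mfixsolver -f DES_FB1.mfx NODESI=2 NODESJ=2

    # Option B: Slurm's launcher, which takes the task count from the #SBATCH allocation
    srun ./mfixsolver -f DES_FB1.mfx NODESI=2 NODESJ=2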

The general workflow is to build your solver → create a Linux bash script to submit the job to your cluster with #SBATCH directives → submit the Linux bash script.

As an example, let’s say you’re sitting in /home/username/ and you’ve successfully compiled your MFIX solver for DMP with 4 processors. Now you have an mfixsolver executable waiting to be run. Assuming you have logged into a node capable of submitting job scripts, you need to:

  1. Create a Linux bash script (let’s call this “sbatchrun.sh”, but you can name it whatever you want) with the #SBATCH directives for allocating computing resources and a line that runs the solver (mpirun -np 4 ./mfixsolver -f DES_FB1.mfx NODESI=2 NODESJ=2). If you need 4 processors for this job, one of those #SBATCH lines should say something like “#SBATCH --ntasks=4”. A minimal example script is sketched just after this list.
  2. Submit the Linux bash script from the command line by typing the command “sbatch sbatchrun.sh”. This will submit your shell script to the Slurm Workload Manager, which will queue and then run the job.
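Here is a minimal sketch of such a script for the 4-processor DMP example; the job name, output file name, and time limit are placeholders, and you may also need a --partition line depending on your cluster:

    #!/bin/bash
    # placeholder job name and log file (%j expands to the Slurm job ID)
    #SBATCH --job-name=DES_FB1
    #SBATCH --output=slurm-%j.out
    # one MPI task per DMP process (NODESI*NODESJ = 2*2 = 4); they fit on one node here
    #SBATCH --nodes=1
    #SBATCH --ntasks=4
    # wall-clock limit; estimate this from a short test run
    #SBATCH --time=02:00:00

    # run the DMP solver built with: build_mfixsolver --dmp
    mpirun -np 4 ./mfixsolver -f DES_FB1.mfx NODESI=2 NODESJ=2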

If you don’t submit a shell script with #SBATCH directives and instead just type “mpirun -np 4 ./mfixsolver -f DES_FB1.mfx NODESI=2 NODESJ=2” directly on the command line, I think it will try to run the job on the node your terminal is connected to. Even if this command runs successfully, it is not good practice, because you are bypassing the Slurm scheduler and probably running directly on a login node or on computing resources not allocated for running jobs.


Thank you, Julia. I want to know: if I set the time limit in the SBATCH “.run” file and the job times out before the simulation is finished, can I resume the simulation later with the same command/solver? Can I run the .mfx file with another command/solver without deleting the intermediate files? And is the .RES file used to resume a project unique to each kind of solver? Thank you in advance.

Ju,

Assuming your simulation has run long enough to produce at least one restart file (res_dt < stop time), you can restart a simulation by changing the run_type to ‘RESTART_1’ as explained in the MFIX documentation (8.3.1. Run Control — MFiX 21.3.2 documentation). So, if your simulation stops and you want to restart it, edit run_type in the .mfx file and then resubmit the SBATCH file (“sbatch sbatchrun.sh”).
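For reference, the edit is a one-line change in the project file. A minimal sketch, assuming your project file is the DES_FB1.mfx from earlier in this thread (comment lines starting with # are just annotations here):

    # in DES_FB1.mfx, change the run type before resubmitting the job
    run_type = 'restart_1'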

As far as I know, .RES files contain simulation data, so yes, they will be unique to your compilation settings and simulation run. I would not copy and paste them into another directory to restart an unrelated simulation.

Since .mfx files are just input files (text files), you can move them between directories and reuse them for additional simulations. For example, if you wanted to build two solvers with different settings, you could copy the .mfx file from one directory to the other. But anything that contains binary data or data specific to your simulation (e.g. the mfixsolver executable, VTK/SP* outputs, .RES files, etc.) I would advise against putting in an unrelated simulation directory. Even if it runs, you could break some links or get odd results if you try to restart a simulation with another simulation’s data.


Thank you, Julia. I did the restart as follows.

  1. I copied the project .mfx file, changed run_type to “restart_1” in the new .mfx file, and put it in the same folder as the original .mfx file.
  2. I created a new .sh sbatch script, kept the target folder the same as the original one, and changed the .mfx project file to the new one.
  3. I checked the two Slurm .out files and found that both began at t = 0.0000 s and stopped at similar end times, as shown below. That means that with the same core hours they accomplished similar workloads.
    [Screenshot 1: Slurm .out file of the new run]

    [Screenshot 2: Slurm .out file of the continued run]

    But at the same time, the .vtu output files were generated with consecutive sequence numbers, as follows (the new run began around 16:00 and the restart began around 22:00).

I want to know whether it is normal that the simulation time in the restart Slurm .out file began at 0.0000 s, i.e. whether it represents relative progress only and in reality the run actually continued from the last step of the previous run.

Thank you in advance.