.. _Chap:InputsMultigrid:
Multigrid Inputs
================
The following inputs can be set directly in the AMReX solver classes but we
set them via the MFiX-Exa routines because we may want different inputs for the
different solvers called by MFiX-Exa.
These control the nodal projection and must be preceded by "mfix":
+-------------------------+-----------------------------------------------------------------------+-------------+--------------+
| | Description | Type | Default |
+=========================+=======================================================================+=============+==============+
| mg_verbose | Verbosity of multigrid solver in nodal projection | Int | 0 |
+-------------------------+-----------------------------------------------------------------------+-------------+--------------+
| mg_cg_verbose | Verbosity of BiCGStab solver in nodal projection | Int | 0 |
+-------------------------+-----------------------------------------------------------------------+-------------+--------------+
| mg_rtol | Relative tolerance in nodal projection | Real | 1.e-11 |
+-------------------------+-----------------------------------------------------------------------+-------------+--------------+
| mg_atol | Absolute tolerance in nodal projection | Real | 1.e-14 |
+-------------------------+-----------------------------------------------------------------------+-------------+--------------+
| mg_maxiter | Maximum number of iterations in the nodal projection | Int | |
+-------------------------+-----------------------------------------------------------------------+-------------+--------------+
| mg_cg_maxiter | Maximum number of iterations in the nodal projection | Int | |
| | bottom solver if using bicg, cg, bicgcg or cgbicg | | |
+-------------------------+-----------------------------------------------------------------------+-------------+--------------+
| mg_max_coarsening_level | Maximum number of coarser levels to allow in the nodal projection     | Int         |              |
| | If set to 0, the bottom solver will be called at the current level | | |
+-------------------------+-----------------------------------------------------------------------+-------------+--------------+
| bottom_solver_type | Which bottom solver to use in the nodal projection | String | bicgcg |
|                         | Options are bicgstab, cg, bicgcg, cgbicg, smoother or hypre           |             |              |
+-------------------------+-----------------------------------------------------------------------+-------------+--------------+
These control the MAC projection and must be preceded by "mac":
+-------------------------+-----------------------------------------------------------------------+-------------+--------------+
| | Description | Type | Default |
+=========================+=======================================================================+=============+==============+
| mg_verbose | Verbosity of multigrid solver in MAC projection | Int | 0 |
+-------------------------+-----------------------------------------------------------------------+-------------+--------------+
| mg_cg_verbose | Verbosity of BiCGStab solver in MAC projection | Int | 0 |
+-------------------------+-----------------------------------------------------------------------+-------------+--------------+
| mg_rtol | Relative tolerance in MAC projection | Real | 1.e-11 |
+-------------------------+-----------------------------------------------------------------------+-------------+--------------+
| mg_atol | Absolute tolerance in MAC projection | Real | 1.e-14 |
+-------------------------+-----------------------------------------------------------------------+-------------+--------------+
| mg_maxiter | Maximum number of iterations in the MAC projection | Int | |
+-------------------------+-----------------------------------------------------------------------+-------------+--------------+
| mg_cg_maxiter | Maximum number of iterations in the MAC projection | Int | |
| | bottom solver if using bicg, cg, bicgcg or cgbicg | | |
+-------------------------+-----------------------------------------------------------------------+-------------+--------------+
| mg_max_coarsening_level | Maximum number of coarser levels to allow in the MAC projection       | Int         |              |
| | If set to 0, the bottom solver will be called at the current level | | |
+-------------------------+-----------------------------------------------------------------------+-------------+--------------+
| bottom_solver_type | Which bottom solver to use in the MAC projection | String | bicgcg |
|                         | Options are bicgstab, cg, bicgcg, cgbicg, smoother or hypre           |             |              |
+-------------------------+-----------------------------------------------------------------------+-------------+--------------+
These control the diffusion solver and must be preceded by "diff":
+----------------------+-----------------------------------------------------------------------+-------------+--------------+
| | Description | Type | Default |
+======================+=======================================================================+=============+==============+
| mg_verbose | Verbosity of linear solver for diffusion solve | Int | 0 |
+----------------------+-----------------------------------------------------------------------+-------------+--------------+
| mg_cg_verbose | Verbosity of BiCGStab solver in diffusion solve | Int | 0 |
+----------------------+-----------------------------------------------------------------------+-------------+--------------+
| mg_rtol | Relative tolerance in diffusion solve | Real | 1.e-11 |
+----------------------+-----------------------------------------------------------------------+-------------+--------------+
| mg_atol | Absolute tolerance in diffusion solve | Real | 1.e-14 |
+----------------------+-----------------------------------------------------------------------+-------------+--------------+
| bottom_solver_type | Which bottom solver to use in the diffusion solve | String | bicgcg |
|                      | Options are bicgstab, cg, bicgcg, cgbicg, smoother or hypre           |             |              |
+----------------------+-----------------------------------------------------------------------+-------------+--------------+
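
As an illustration of how the three prefixes keep the solver settings separate, an inputs file might contain lines like the following. The parameter names are taken from the tables above; the values are placeholders chosen for this example, not recommended settings.

.. code-block:: none

   # nodal projection (prefix "mfix")
   mfix.mg_verbose         = 1
   mfix.mg_rtol            = 1.e-11
   mfix.mg_atol            = 1.e-14
   mfix.bottom_solver_type = bicgcg

   # MAC projection (prefix "mac")
   mac.mg_verbose          = 1
   mac.mg_rtol             = 1.e-10
   mac.mg_maxiter          = 200

   # diffusion solve (prefix "diff")
   diff.mg_rtol            = 1.e-11
   diff.bottom_solver_type = bicgstab

Because each solver reads its own prefix, loosening the tolerance of one solve (here the MAC projection) leaves the others at their own values.
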
.. _Chap:InputsPlotfiles:
Plotfiles and Other Output
==========================
The following inputs must be preceded by "amr" and control the frequency and naming of plotfile output,
whether the EB geometry or level set is written out, and whether the particles are written out in ASCII
format (for debugging).
+---------------------+-----------------------------------------------------------------------+-------------+-----------+
| | Description | Type | Default |
+=====================+=======================================================================+=============+===========+
| plot_int | Frequency of plotfile output; | Int | -1 |
| | if -1 then no plotfiles will be written | | |
+---------------------+-----------------------------------------------------------------------+-------------+-----------+
| plotfile_on_restart | Should we write a plotfile when we restart (only used if plot_int>0) | Bool | False |
+---------------------+-----------------------------------------------------------------------+-------------+-----------+
| plot_file | Prefix to use for plotfile output | String | plt |
+---------------------+-----------------------------------------------------------------------+-------------+-----------+
| write_ls | Should we write a plotfile holding the level set and volfrac? | Bool | False |
|                     | If true, it is written only once, after initialization or restart     |             |           |
+---------------------+-----------------------------------------------------------------------+-------------+-----------+
| write_eb_surface | Should we write out the EB geometry in vtp format | Bool | False |
|                     | If true, it is written only once, after initialization or restart     |             |           |
+---------------------+-----------------------------------------------------------------------+-------------+-----------+
| par_ascii_file | Prefix to use for ascii particle output | String | par |
+---------------------+-----------------------------------------------------------------------+-------------+-----------+
| par_ascii_int | Frequency of ascii particle output; | Int | -1 |
|                     | if -1 then no ASCII particle files will be written                    |             |           |
+---------------------+-----------------------------------------------------------------------+-------------+-----------+
The following inputs must be preceded by "amr" and control what variables will be written in plotfiles.
+---------------------+-----------------------------------------------------------------------+-------------+-----------+
| | Description | Type | Default |
+=====================+=======================================================================+=============+===========+
| plt_regtest | Save all variables to plot file (overrides all other IO flags) | Int | 0 |
+---------------------+-----------------------------------------------------------------------+-------------+-----------+
| plt_vel_g | Save fluid velocity data to plot file | Int | 1 |
+---------------------+-----------------------------------------------------------------------+-------------+-----------+
| plt_ep_g | Save fluid volume fraction to plot file | Int | 1 |
+---------------------+-----------------------------------------------------------------------+-------------+-----------+
| plt_p_g | Save fluid pressure to plot file | Int | 0 |
+---------------------+-----------------------------------------------------------------------+-------------+-----------+
| plt_ro_g | Save fluid density to plot file | Int | 0 |
+---------------------+-----------------------------------------------------------------------+-------------+-----------+
| plt_mu_g | Save fluid viscosity to plot file | Int | 0 |
+---------------------+-----------------------------------------------------------------------+-------------+-----------+
| plt_diveu | Save div(ep_g . u) to plot file | Int | 0 |
+---------------------+-----------------------------------------------------------------------+-------------+-----------+
| plt_volfrac | Save Eulerian grid volume fraction (from cut cells) to plot file | Int | 0 |
+---------------------+-----------------------------------------------------------------------+-------------+-----------+
| plt_gradp_g         | Save gradient of pressure field to plot file                          | Int         | 0         |
+---------------------+-----------------------------------------------------------------------+-------------+-----------+
| plt_vort | Save vorticity to plot file | Int | 0 |
+---------------------+-----------------------------------------------------------------------+-------------+-----------+
| plt_vel_p | Save particle velocity to plot file | Int | 1 |
+---------------------+-----------------------------------------------------------------------+-------------+-----------+
| plt_radius | Save particle radius to plot file | Int | 0 |
+---------------------+-----------------------------------------------------------------------+-------------+-----------+
| plt_volume | Save particle volume to plot file | Int | 0 |
+---------------------+-----------------------------------------------------------------------+-------------+-----------+
| plt_mass | Save particle mass to plot file | Int | 0 |
+---------------------+-----------------------------------------------------------------------+-------------+-----------+
| plt_ro_p | Save particle density to plot file | Int | 0 |
+---------------------+-----------------------------------------------------------------------+-------------+-----------+
| plt_omoi            | Save (one divided by the) particle moment of inertia to plot file     | Int         | 0         |
+---------------------+-----------------------------------------------------------------------+-------------+-----------+
| plt_omega_p | Save particle angular velocity to plot file | Int | 0 |
+---------------------+-----------------------------------------------------------------------+-------------+-----------+
| plt_drag_p | Save particle drag force to plot file | Int | 0 |
+---------------------+-----------------------------------------------------------------------+-------------+-----------+
| plt_phase | Save particle type to plot file | Int | 0 |
+---------------------+-----------------------------------------------------------------------+-------------+-----------+
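
A short sketch of how these plotfile controls might be combined in an inputs file is shown below. All names come from the two tables above; the values (an output interval of 100 steps and the selected variables) are assumptions made for the example.

.. code-block:: none

   amr.plot_int            = 100     # write a plotfile every 100 steps
   amr.plot_file           = plt     # plotfile prefix
   amr.plotfile_on_restart = true
   amr.write_eb_surface    = true    # written once, after initialization or restart

   # fluid variables
   amr.plt_vel_g           = 1
   amr.plt_p_g             = 1

   # particle variables
   amr.plt_vel_p           = 1
   amr.plt_radius          = 1

   amr.par_ascii_int       = -1      # no ASCII particle output
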
Problem Definition
==================
The following inputs must be preceded by "amr."
+-------------------+-----------------------------------------------------------------------+-------------+-----------+
| | Description | Type | Default |
+===================+=======================================================================+=============+===========+
| n_cell | Number of cells at level 0 in each coordinate direction | Int Int Int | None |
+-------------------+-----------------------------------------------------------------------+-------------+-----------+
| max_level | Maximum level of refinement allowed (0 when single-level) | Int | None |
+-------------------+-----------------------------------------------------------------------+-------------+-----------+
The following inputs must be preceded by "geometry."
+-----------------+-----------------------------------------------------------------------+-------------+-----------+
| | Description | Type | Default |
+=================+=======================================================================+=============+===========+
| coord_sys | 0 for Cartesian | Int | 0 |
+-----------------+-----------------------------------------------------------------------+-------------+-----------+
| is_periodic | 1 for true, 0 for false (one value for each coordinate direction) | Ints | 0 0 0 |
+-----------------+-----------------------------------------------------------------------+-------------+-----------+
| prob_lo | Low corner of physical domain (physical not index space) | Reals | None |
+-----------------+-----------------------------------------------------------------------+-------------+-----------+
| prob_hi | High corner of physical domain (physical not index space) | Reals | None |
+-----------------+-----------------------------------------------------------------------+-------------+-----------+
The following inputs must be preceded by "mfix."
+----------------------+-------------------------------------------------------------------------+----------+-----------+
|                      | Description                                                             | Type     | Default   |
+======================+=========================================================================+==========+===========+
| geometry             | Which type of EB geometry are we using?                                 | String   |           |
+----------------------+-------------------------------------------------------------------------+----------+-----------+
| levelset__refinement | Refinement factor of levelset resolution relative to level 0 resolution | Int      | 1         |
+----------------------+-------------------------------------------------------------------------+----------+-----------+
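
For illustration, a single-level problem on a unit cube could be described with entries such as the ones below. The parameter names are those listed in the tables above; the domain size, cell counts, periodicity and refinement factor are hypothetical values picked for the example.

.. code-block:: none

   amr.n_cell           = 32 32 32       # cells at level 0 in x, y, z
   amr.max_level        = 0              # single-level run

   geometry.coord_sys   = 0              # Cartesian
   geometry.is_periodic = 1 1 0          # periodic in x and y, not in z
   geometry.prob_lo     = 0.0 0.0 0.0    # physical coordinates, not index space
   geometry.prob_hi     = 1.0 1.0 1.0

   mfix.levelset__refinement = 2
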
Basic EB walls can be specified by inputs preceded by "xlo", "xhi", "ylo", "yhi", "zlo", and "zhi".
+--------------------+---------------------------------------------------------------------------+-------------+-----------+
| | Description | Type | Default |
+====================+===========================================================================+=============+===========+
| type | Used to define boundary type. Available options include: | String | None |
| | | | |
| | * 'pi' or 'pressure_inflow' | | |
| | * 'po' or 'pressure_outflow' | | |
| | * 'mi' or 'mass_inflow' | | |
| | * 'nsw' or 'no_slip_wall' | | |
+--------------------+---------------------------------------------------------------------------+-------------+-----------+
| pressure | Sets boundary pressure for pressure inflows, outflows and mass inflows | Real | None |
+--------------------+---------------------------------------------------------------------------+-------------+-----------+
| velocity | Sets boundary velocity for mass inflows | Real | None |
+--------------------+---------------------------------------------------------------------------+-------------+-----------+
| location | Specifies an offset from the domain boundary for no-slip walls | Real | None |
+--------------------+---------------------------------------------------------------------------+-------------+-----------+
To specify multiple mass inflows (e.g., a jet and a uniform background flow), provide multiple velocities for the region and define the physical extents of each sub-region. The first velocity is applied to the entire flow plane. Subsequent velocities are applied in order to the specified sub-regions. If multiple sub-regions overlap, the velocity of the last specified region is used. An example of a uniform mass inflow with a square jet centered at (y, z) = (0.5, 0.5) is given below.
.. code-block:: none

   xlo.type = "mi"
   xlo.velocity = 0.01 0.10
   xlo.ylo = 0.25
   xlo.yhi = 0.75
   xlo.zlo = 0.25
   xlo.zhi = 0.75
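
The other boundary types in the table follow the same pattern. The fragment below is a sketch only: the pressure value and the wall offset are made-up numbers, and the sign convention of ``location`` is not stated in this section, so check it before relying on it.

.. code-block:: none

   xhi.type     = "po"         # pressure outflow on the high-x face
   xhi.pressure = 101325.0     # hypothetical boundary pressure

   ylo.type     = "nsw"        # no-slip wall on the low-y face
   ylo.location = 0.05         # hypothetical offset from the domain boundary
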
.. _sec:InputsTimeStepping:
Time Stepping
=============
The following inputs must be preceded by "amr." If both are specified, both criteria
are used and the simulation will stop when the first criterion is hit. In the case of unsteady flow,
the simulation will stop when either the number of steps reaches max_step or time reaches stop_time.
In the case of steady-state flow, the simulation will stop when either the tolerance (difference between
subsequent steps) is reached or the number of iterations reaches the maximum number specified.
+------------------+-----------------------------------------------------------------------+-------------+-----------+
| | Description | Type | Default |
+==================+=======================================================================+=============+===========+
| max_step | Maximum number of time steps to take | Int | -1 |
+------------------+-----------------------------------------------------------------------+-------------+-----------+
| stop_time | Maximum time to reach | Real | -1.0 |
+------------------+-----------------------------------------------------------------------+-------------+-----------+
The following inputs must be preceded by "mfix."
+----------------------+-----------------------------------------------------------------------+-------------+--------------+
| | Description | Type | Default |
+======================+=======================================================================+=============+==============+
| fixed_dt | Should we use a fixed timestep? | Int | 0 |
+----------------------+-----------------------------------------------------------------------+-------------+--------------+
| dt_min | Abort if dt gets smaller than this value | Real | 1.e-6 |
+----------------------+-----------------------------------------------------------------------+-------------+--------------+
| dt_max | Maximum value of dt if calculating with cfl | Real | 1.e14 |
+----------------------+-----------------------------------------------------------------------+-------------+--------------+
| cfl | CFL constraint (dt < cfl * dx / u) if fixed_dt not 1 | Real | 0.5 |
+----------------------+-----------------------------------------------------------------------+-------------+--------------+
The following inputs must be preceded by "mfix" and are only relevant if running a problem to steady state.
Currently, "steady_state" is set to true if "dt" is undefined in mfix.dat.
+-----------------------+-----------------------------------------------------------------------+-------------+------------+
| | Description | Type | Default |
+=======================+=======================================================================+=============+============+
| steady_state | Are we running a steady-state calculation? | Int | 0 |
+-----------------------+-----------------------------------------------------------------------+-------------+------------+
| steady_state_tol | Tolerance for checking if we have reached steady state | Real | None |
| | | | |
|                       | (Must be set if steady_state = 1)                                     |             |            |
+-----------------------+-----------------------------------------------------------------------+-------------+------------+
| steady_state_max_iter | Maximum number of allowed iterations to converge to steady state | Int | 100000000 |
+-----------------------+-----------------------------------------------------------------------+-------------+------------+
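
As a sketch of how the stopping and time-step controls fit together, an unsteady run might use entries like the first group below, while a run to steady state would also set the tolerance and iteration cap in the second group. The names are from the tables above; every value is a placeholder. ``mfix.steady_state`` is left out on purpose, since the text above says it is currently determined from mfix.dat.

.. code-block:: none

   # unsteady run: stop at whichever limit is reached first
   amr.max_step  = 1000
   amr.stop_time = 2.0

   mfix.cfl      = 0.5
   mfix.dt_min   = 1.e-6     # abort if dt falls below this
   mfix.dt_max   = 1.e-2     # cap on the CFL-based dt

   # steady-state run: convergence tolerance and iteration limit
   mfix.steady_state_tol      = 1.e-5
   mfix.steady_state_max_iter = 10000
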
Setting the Time Step
---------------------
There are several ways that the inputs are used to determine what time step
is used in the evolution of the fluid-particle system in MFiX-Exa.
1) In a pure particle case, :cpp:`mfix.fixed_dt`, if specified, is used only to determine the frequency
of outputs; it has no effect on the "dtsolid" used in the particle evaluation. If you do not specify a positive
value of :cpp:`mfix.fixed_dt` then the code will abort:
.. highlight:: c++

::

   amrex::Abort::0::If running particle-only must specify fixed_dt in the inputs file !!!

The particle time step "dtsolid" is determined by computing the collision time "tcoll" from particle properties,
then setting "dtsolid" to be "tcoll / 50".
2) In a pure fluid case, there are two options:

* If you want to fix the dt, simply set :cpp:`mfix.fixed_dt = XXX` and the fluid time
  step will always be that number.
* If you want to let the code determine the appropriate time step using the advective CFL
  condition, then set :cpp:`mfix.cfl = 0.7` for example, and the fluid time step will
  be computed to be dt = 0.7 * dx / max(vel).
* If dt as computed in the compute_dt routine is smaller than the user-specified
  :cpp:`mfix.dt_min` then the code will abort:

  .. highlight:: c++

  ::

     amrex::Abort::0::Current dt is smaller than dt_min !!!

* If dt as computed in the compute_dt routine is larger than the user-specified
  :cpp:`mfix.dt_max` then dt will be set to the minimum of its computed value and dt_max.
* Note that the cfl defaults to 0.5 so it does not have to be set in the inputs file. If neither
  :cpp:`mfix.cfl` nor :cpp:`mfix.fixed_dt` is set, then the default value of cfl will be used.
  If :cpp:`mfix.fixed_dt` is set, then it will override the cfl option whether
  :cpp:`mfix.cfl` is set or not.

These options apply to steady state calculations as well as unsteady runs.

3) In a coupled particle-fluid case, dt is determined as in the pure-fluid case. The particle
time step "subdt" is first computed as in the particle-only case ("dtsolid"), then adjusted
so that an integer number of particle steps fits into a single fluid time step.
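
A worked example of the coupled case may help. All numbers are invented for illustration: dx = 0.01, max(vel) = 2.0, :cpp:`mfix.cfl = 0.7`, and a collision time tcoll = 1.2e-3. The text above only says that a whole number of particle steps must fit into one fluid step; rounding the step count up (so that "subdt" never exceeds "dtsolid") is an assumption of this sketch, not a statement about the code.

.. math::

   \Delta t = \mathrm{cfl}\,\frac{\Delta x}{\max |u|} = 0.7 \times \frac{0.01}{2.0} = 3.5 \times 10^{-3}

.. math::

   \Delta t_{\mathrm{solid}} = \frac{t_{\mathrm{coll}}}{50} = \frac{1.2 \times 10^{-3}}{50} = 2.4 \times 10^{-5}

.. math::

   n = \left\lceil \frac{\Delta t}{\Delta t_{\mathrm{solid}}} \right\rceil = \lceil 145.83 \rceil = 146,
   \qquad
   \mathrm{subdt} = \frac{\Delta t}{n} \approx 2.397 \times 10^{-5}

Here the fluid dt lies between the default ``dt_min`` (1.e-6) and ``dt_max`` (1.e14), so neither the abort nor the cap is triggered.
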
.. _Chap:InputsVerbosity:
Verbosity
=========
The following inputs must be preceded by "mfix."
+----------------------+-----------------------------------------------------------------------+-------------+--------------+
| | Description | Type | Default |
+======================+=======================================================================+=============+==============+
| verbose | Verbosity in MFiX-Exa routines | Int | 0 |
+----------------------+-----------------------------------------------------------------------+-------------+--------------+
| ooo_debug | If true then print the name of the routine we are in | Bool | False |
+----------------------+-----------------------------------------------------------------------+-------------+--------------+
.. _Chap:Inputs:
Run-time Inputs
===============
.. toctree::
   :maxdepth: 1

   InputsProblemDefinition
   InputsDrag
   InputsTimeStepping
   InputsInitialization
   InputsLoadBalancing
   InputsMultigrid
   InputsPlotFiles
   InputsCheckpoint
   InputsMonitors
   InputsVerbosity

- Plotfile format supported by AmrVis, VisIt, ParaView, and yt.
MFiX-Exa is being developed at NETL and LBNL as part of the U.S. Department of Energy's
Exascale Computing Project (ECP).
MFiX-Exa heavily leverages AMReX (see https://amrex-codes.github.io/) which is also supported by
ECP as part of the AMReX Co-Design Center.
.. role:: cpp(code)
   :language: c++

.. role:: fortran(code)
   :language: fortran

.. _sec:load_balancing:
Load Balancing
--------------
The process of load balancing is typically independent of the process of grid creation;
the inputs to load balancing are a given set of grids with a set of weights
assigned to each grid. (The exception to this is the KD-tree approach in which the
grid creation process is governed by trying to balance the work in each grid.)
Single-level load balancing algorithms are sequentially applied to each AMR level independently,
and the resulting distributions are mapped onto the ranks taking into account the weights
already assigned to them (the heaviest set of grids is assigned to the least loaded rank).
Options supported by AMReX include:
- Knapsack: the default weight of a grid in the knapsack algorithm is the number of grid cells,
  but AMReX supports the option to pass an array of weights (one per grid) or alternatively
  to pass in a MultiFab of weights per cell, which is used to compute the weight per grid.
- SFC: enumerate grids with a space-filling Z-Morton curve, then partition the
  resulting ordering across ranks in a way that balances the load.
- Round-robin: sort grids and assign them to ranks in round-robin fashion; specifically,
  FAB *i* is owned by CPU *i*%N where N is the total number of MPI ranks.
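
To make the round-robin rule concrete, take a hypothetical case of 10 sorted grids distributed over N = 4 MPI ranks (both numbers are chosen only for illustration):

.. math::

   \mathrm{rank}(i) = i \bmod N, \qquad N = 4

.. code-block:: none

   grids 0, 4, 8  ->  rank 0
   grids 1, 5, 9  ->  rank 1
   grids 2, 6     ->  rank 2
   grids 3, 7     ->  rank 3

The grid counts per rank differ by at most one, but the rule ignores grid weights, which is what the knapsack and SFC options account for.
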
Gridding and Load Balancing
===========================
MFiX-Exa has a great deal of flexibility when it comes to how to decompose the
computational domain into individual rectangular grids, and how to distribute
those grids to MPI ranks. There can be grids of different sizes,
more than one grid per MPI rank, and different strategies for distributing the grids to MPI ranks.
We use the phrase "load balancing" here to refer to the combined process
of grid creation (and re-creation when regridding) and distribution of grids to MPI ranks.
See :ref:`sec:grid_creation` for how grids are created, i.e. how the :cpp:`BoxArray` on which
:cpp:`MultiFabs` will be built is defined at each level.
See :ref:`sec:load_balancing` for the strategies AMReX supports for distributing
grids to MPI ranks, i.e. defining the :cpp:`DistributionMapping` with which
:cpp:`MultiFabs` at that level will be built.
MFiX-Exa also allows for the "dual grid approach", in which mesh and particle data are allocated
on different box layouts with different mappings to MPI ranks. This option is enabled
by setting :cpp:`amr.dual_grid = 1` in the inputs file.
See :ref:`sec:dual_grid` for more about this approach.
When running on multicore machines with OpenMP, we can also control the distribution of
work by setting the size of grid tiles (by defining :cpp:`fabarray_mfiter.tile_size`), and if relevant, of
particle tiles (by defining :cpp:`particle.tile_size`). We can also specify the strategy for assigning
tiles to OpenMP threads. See :ref:`sec:basics:mfiter:tiling` for more about tiling.
.. toctree::
   :maxdepth: 1

   GridCreation
   DualGrid
   LoadBalancing

.. _Chap:NightlyTesting:
Nightly Tests
=============
The following regression tests are run nightly with MFiX-Exa. The plotfiles generated in each night's test
are compared with the benchmark plotfiles using the AMReX :cpp:`fcompare` utility to compare the mesh data
and :cpp:`particle_compare` to compare the particle data.
The results of these tests can be found at https://ccse.lbl.gov/pub/RegressionTesting/MFIX-Exa/
Below, Ng = number of grids, Npa = number of particles, and Np = number of MPI ranks.
"Auto" means the particles were generated automatically with the random number
generator; if "Auto" is not specified, the particle data were read in from "particle_input.dat".
These first tests have both fluid and particles and are run in rectangular geometries;
all tests except DEM06 use drag type "BVK2".
"NSW" means "No Slip Wall" and "Per" is "periodic."
"MI/PO" refers to Mass Inflow at the low end of the domain and Pressure Outflow at the high end.
+-------------------+----+--------+------+-------+----+----+----------------------+
| Test | nx | bc_x | EB | Npa | Ng | Np | What does this test? |
| | ny | bc_y | | | | | |
| | nz | bc_z | | | | | |
+===================+====+========+======+=======+====+====+======================+
| BENCH01 | 32 | Per | None | 5005 | 1 | 1 | Triply periodic |
| Size0001 | 32 | Per | | | | | |
| | 32 | Per | | | | | |
+-------------------+----+--------+------+-------+----+----+----------------------+
| BENCH01 | 64 | Per | None | 40040 | 8 | 4 | Replicate |
| Size0001 | 64 | Per | | | | | |
| replicate | 64 | Per | | | | | |
+-------------------+----+--------+------+-------+----+----+----------------------+
| BENCH01 | 32 | Per | None | 5005 | 8 | 4 | Restart |
| Size0001 | 32 | Per | | | | | |
| restart | 32 | Per | | | | | |
+-------------------+----+--------+------+-------+----+----+----------------------+
| BENCH02 | 10 | Per | None | 1611 | 1 | 1 | Mixed NSW / Per |
| Size0001 | 10 | NSW | | | | | |
| | 10 | Per | | | | | |
+-------------------+----+--------+------+-------+----+----+----------------------+
| BENCH02 | 10 | NSW | None | 1611 | 1 | 1 | NSW on all faces |
| Size0001 | 10 | NSW | | | | | |
| walls | 10 | NSW | | | | | |
+-------------------+----+--------+------+-------+----+----+----------------------+
| BENCH03 | 4 | Per | None | 2500 | 1 | 1 | Mixed MI/PO + Per |
| Size0001 | 50 | MI/PO | | | | | |
| | 4 | Per | | | | | |
+-------------------+----+--------+------+-------+----+----+----------------------+
| BENCH04 | 4 | Per | None | 224 | 1 | 1 | Triply periodic |
| Size0001 | 50 | Per | | | | | |
| | 4 | Per | | | | | |
+-------------------+----+--------+------+-------+----+----+----------------------+
| DEM06 | 5 | Per | None | 1 | 10 | 4 | Single particle |
| z multiple | 5 | Per | | | | | falling in fluid |
| | 50 | MI/PO | | | | | (user_drag) |
+-------------------+----+--------+------+-------+----+----+----------------------+
This second set of tests has both fluid and particles and is run in cylindrical geometries
interior to the domain boundaries; these tests also use drag type "BVK2". Here "IGN" means
those domain boundaries should be ignored because they are outside the EB boundary.
+-------------------+----+-------+------+--------+----+----+----------------------+
| Test              | nx | bc_x  | EB   | Npa    | Ng | Np | What does this test? |
|                   | ny | bc_y  |      |        |    |    |                      |
|                   | nz | bc_z  |      |        |    |    |                      |
+===================+====+=======+======+========+====+====+======================+
| BENCH05           | 40 | MI/PO | Cyl  | 7949   | 4  | 4  | EB in parallel       |
| Size0008          | 10 | IGN   |      | Auto   |    |    |                      |
|                   | 10 | IGN   |      |        |    |    |                      |
+-------------------+----+-------+------+--------+----+----+----------------------+
| BENCH05           | 40 | MI/PO | Cyl  | 7968   | 4  | 1  | EB in serial         |
| Size0008          | 10 | IGN   |      | Auto   |    |    |                      |
| serial            | 10 | IGN   |      |        |    |    |                      |
+-------------------+----+-------+------+--------+----+----+----------------------+
| BENCH05           | 40 | MI/PO | Cyl  | 36672  | 16 | 4  | Regrid & dual grid   |
| Size0008          | 20 | IGN   |      | Auto   |    |    |                      |
| medium            | 20 | IGN   |      |        |    |    |                      |
+-------------------+----+-------+------+--------+----+----+----------------------+
| BENCH05           | 40 | MI/PO | Cyl  | 157106 | 16 | 4  | Regrid & dual grid   |
| Size0008          | 40 | IGN   |      | Auto   |    |    |                      |
| wide              | 40 | IGN   |      |        |    |    |                      |
+-------------------+----+-------+------+--------+----+----+----------------------+
| BENCH06           | 40 | Per   | Cyl  | 627    | 4  | 1  | EB                   |
| Size0008          | 10 | IGN   |      | Auto   |    |    | with periodic        |
| serial            | 10 | IGN   |      |        |    |    | serial               |
+-------------------+----+-------+------+--------+----+----+----------------------+
| BENCH06           | 40 | Per   | Cyl  | 624    | 4  | 4  | EB                   |
| Size0008          | 10 | IGN   |      | Auto   |    |    | with periodic        |
|                   | 10 | IGN   |      |        |    |    | parallel             |
+-------------------+----+-------+------+--------+----+----+----------------------+
This third set of tests is particles-only in rectangular geometries.
+-------------------+----+-------+------+--------+----+----+----------------------+
| Test              | nx | bc_x  | EB   | Npa    | Ng | Np | What does this test? |
|                   | ny | bc_y  |      |        |    |    |                      |
|                   | nz | bc_z  |      |        |    |    |                      |
+===================+====+=======+======+========+====+====+======================+
| DEM01             | 4  | NSW   | None | 1      | 1  | 1  | Particle only        |
| x single          | 4  | NSW   |      |        |    |    |                      |
|                   | 4  | NSW   |      |        |    |    |                      |
+-------------------+----+-------+------+--------+----+----+----------------------+
| DEM03             | 5  | Per   | None | 2      | 1  | 1  | Particles only       |
| z single          | 5  | Per   |      |        |    |    |                      |
|                   | 2  | NSW   |      |        |    |    |                      |
+-------------------+----+-------+------+--------+----+----+----------------------+
| DEM04             | 4  | NSW   | None | 1      | 1  | 1  | Particles only       |
| z single          | 4  | Per   |      |        |    |    |                      |
|                   | 4  | Per   |      |        |    |    |                      |
+-------------------+----+-------+------+--------+----+----+----------------------+
Particles on GPUs
==========================
The particle components of MFIX-Exa are a natural candidate for offloading to the GPU.
The particle kernels are compute-intensive and can in principle be processed asynchronously with parts of the fluid advance.
The core components of the particle method in MFIX-Exa are:
- Neighbor List Construction
- Particle-Particle Collisions
- Particle-Wall Collisions
Of these operations, the neighbor list construction requires the most care.
A neighbor list is a pre-computed list of all the neighbors a given particle can interact with over the next *n* timesteps.
Neighbor lists are usually constructed by binning the particles by an interaction distance,
and then performing the N\ :sup:`2` distance check only on the particles in neighboring bins. In detail, the CPU version of the neighbor list algorithm is as follows:
- For each tile on each level, loop over the particles, identifying the bin each one belongs to.
- Add the particle to a linked list for the cell that `owns` it.
- For each cell, loop over all the particles, and then loop over all potential collision partners in the neighboring cells.
- If a collision partner is close enough, add it to that particle's neighbor list.
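The four steps above can be sketched in plain C++ as follows. This is a minimal, serial illustration and not the MFIX-Exa implementation: the `Particle` struct, `bin_of`, and `build_neighbor_list` are hypothetical names, the domain is assumed to be a cube starting at the origin with `nbins` bins per direction, and the bin size is assumed equal to the interaction distance `cutoff`. For brevity the outer loop runs over particles rather than over cells, which visits the same candidate pairs.

.. code-block:: cpp

   #include <algorithm>
   #include <array>
   #include <vector>

   // Hypothetical minimal particle: position only.
   struct Particle { std::array<double, 3> pos; };

   // Map a coordinate to a bin index in [0, nbins).
   inline int bin_of(double x, double bin_size, int nbins)
   {
       return std::clamp(static_cast<int>(x / bin_size), 0, nbins - 1);
   }

   std::vector<std::vector<int>>
   build_neighbor_list(const std::vector<Particle>& particles, double cutoff, int nbins)
   {
       const int np = static_cast<int>(particles.size());
       auto cell_id = [nbins](int ix, int iy, int iz) { return (iz * nbins + iy) * nbins + ix; };

       // Steps 1-2: bin each particle and push it onto its cell's linked list.
       std::vector<int> head(nbins * nbins * nbins, -1);  // first particle in each cell
       std::vector<int> next(np, -1);                     // next particle in the same cell
       std::vector<std::array<int, 3>> bin(np);
       for (int i = 0; i < np; ++i) {
           for (int d = 0; d < 3; ++d) bin[i][d] = bin_of(particles[i].pos[d], cutoff, nbins);
           const int c = cell_id(bin[i][0], bin[i][1], bin[i][2]);
           next[i] = head[c];
           head[c] = i;
       }

       // Steps 3-4: distance-check candidates in the 27 surrounding cells only.
       std::vector<std::vector<int>> nbors(np);
       for (int i = 0; i < np; ++i) {
           for (int dz = -1; dz <= 1; ++dz)
           for (int dy = -1; dy <= 1; ++dy)
           for (int dx = -1; dx <= 1; ++dx) {
               const int jx = bin[i][0] + dx, jy = bin[i][1] + dy, jz = bin[i][2] + dz;
               if (jx < 0 || jy < 0 || jz < 0 || jx >= nbins || jy >= nbins || jz >= nbins) continue;
               for (int j = head[cell_id(jx, jy, jz)]; j != -1; j = next[j]) {
                   if (j == i) continue;
                   double r2 = 0.0;
                   for (int d = 0; d < 3; ++d) {
                       const double diff = particles[i].pos[d] - particles[j].pos[d];
                       r2 += diff * diff;
                   }
                   if (r2 <= cutoff * cutoff) nbors[i].push_back(j);
               }
           }
       }
       return nbors;
   }

Because the bin size equals the cutoff, any pair within the interaction distance must sit in the same or an adjacent bin, so restricting the check to the surrounding cells loses no neighbors.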
To port this algorithm to the GPU, we use the parallel algorithms library Thrust, distributed as part of the CUDA Toolkit. Thrust provides parallel sorting, searching, and prefix-sum algorithms that are particularly useful in porting particle algorithms. To construct the neighbor list on the GPU, we follow the basic approach used by Cabana, a product of the Particle Co-Design Center:
- Sort the particles on each grid by bin, using a parallel counting sort. We use Thrust's `exclusive_scan` function to implement the prefix sum phase of the sort, and hand-coded kernels for the rest. This step does not actually rearrange the particle data; rather, we compute the permutation that would put the particles in order.
- Once the particles are sorted by bin, we can loop over the particles in neighboring bins. We make two passes over the particles. First, we launch a kernel to count the number of collision partners for each particle.
- Then, we sum these numbers and allocate space for our neighbor list.
- Finally, we make another pass over the particles, putting them into the list at the appropriate place.
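The GPU steps above can be mimicked serially to show the data flow. In this sketch, `std::exclusive_scan` from the C++17 standard library stands in for Thrust's `exclusive_scan`, plain loops stand in for the hand-coded kernels, and positions are one-dimensional to keep it short. All names (`NeighborList`, `build_neighbor_list_1d`, `offsets`, `list`) are hypothetical; this is not the MFIX-Exa or Cabana implementation.

.. code-block:: cpp

   #include <algorithm>
   #include <cmath>
   #include <numeric>
   #include <vector>

   struct NeighborList {
       std::vector<int> offsets;  // np + 1 entries; neighbors of i are list[offsets[i] .. offsets[i+1])
       std::vector<int> list;     // flat array of neighbor indices
   };

   // Assumes nbins >= 1 and a bin size equal to the cutoff.
   NeighborList build_neighbor_list_1d(const std::vector<double>& x, double cutoff, int nbins)
   {
       const int np = static_cast<int>(x.size());

       // Counting sort by bin: histogram, exclusive scan, then scatter into a permutation.
       std::vector<int> bin(np), count(nbins + 1, 0), bin_start(nbins + 1);
       for (int i = 0; i < np; ++i) {
           bin[i] = std::clamp(static_cast<int>(x[i] / cutoff), 0, nbins - 1);
           ++count[bin[i]];
       }
       std::exclusive_scan(count.begin(), count.end(), bin_start.begin(), 0);
       std::vector<int> cursor(bin_start.begin(), bin_start.end() - 1), perm(np);
       for (int i = 0; i < np; ++i) perm[cursor[bin[i]]++] = i;  // particle data stays in place

       // Visit every particle j != i within the cutoff, looking only in adjacent bins.
       auto for_each_neighbor = [&](int i, auto&& visit) {
           const int lo = std::max(bin[i] - 1, 0), hi = std::min(bin[i] + 1, nbins - 1);
           for (int k = bin_start[lo]; k < bin_start[hi + 1]; ++k) {
               const int j = perm[k];
               if (j != i && std::abs(x[i] - x[j]) <= cutoff) visit(j);
           }
       };

       // Pass 1: count neighbors, then scan the counts to get each particle's offset.
       NeighborList nl;
       std::vector<int> nnbr(np + 1, 0);
       for (int i = 0; i < np; ++i) for_each_neighbor(i, [&](int) { ++nnbr[i]; });
       nl.offsets.resize(np + 1);
       std::exclusive_scan(nnbr.begin(), nnbr.end(), nl.offsets.begin(), 0);

       // Pass 2: allocate once, then write each neighbor into its slot.
       nl.list.resize(nl.offsets[np]);
       for (int i = 0; i < np; ++i) {
           int slot = nl.offsets[i];
           for_each_neighbor(i, [&](int j) { nl.list[slot++] = j; });
       }
       return nl;
   }

Counting first and filling second means the list is allocated exactly once at its final size, and every particle knows its own write position, which is what lets each pass run as an independent kernel.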
Note that we build a *full* neighbor list, meaning that if particle *i* appears in particle *j*'s list, then particle *j* also appears in particle *i*'s list. This simplifies the force-computation step when using these lists, since the forces and torques for a given particle can be updated without atomics.
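To illustrate why a full list avoids atomics, here is a small serial C++ sketch. The function name, the one-dimensional positions, and the linear spring-like force with stiffness `k` are all hypothetical and chosen only for illustration. The point is that iteration `i` writes only to `force[i]`, so one GPU thread per particle would not race with any other.

.. code-block:: cpp

   #include <cstddef>
   #include <vector>

   // nbors[i] holds the full neighbor list of particle i (each pair is stored twice).
   std::vector<double> forces_from_full_list(const std::vector<double>& x,
                                             const std::vector<std::vector<int>>& nbors,
                                             double k)
   {
       std::vector<double> force(x.size(), 0.0);
       for (std::size_t i = 0; i < x.size(); ++i) {
           for (int j : nbors[i]) {
               // Only force[i] is written here; the (j, i) contribution is
               // added when the outer loop reaches j.
               force[i] += k * (x[i] - x[j]);
           }
       }
       return force;
   }

With a half list, each pair would be visited once and both `force[i]` and `force[j]` would have to be updated, which requires atomic adds when pairs are processed in parallel.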
The final on-grid neighbor list data structure consists of two arrays. First, we have the neighbor list itself, stored as a big, 1D array of particle indices. Then, we have an `offsets` array that stores, for each particle, where in the neighbor list array to look. The details of this data structure have been hidden inside an iterator, so that user code can look like:
.. code-block:: cpp

   // Loop over the neighbor list and compute the forces.
   AMREX_FOR_1D ( np, i,
   {
       ParticleType& p1 = pstruct[i];

       // Reset the acceleration components of this particle.
       p1.rdata(PIdx::ax) = 0.0;
       p1.rdata(PIdx::ay) = 0.0;
       p1.rdata(PIdx::az) = 0.0;

       for (const auto& p2 : nbor_data.getNeighbors(i))
       {
           // Separation between p1 and its neighbor p2.
           Real dx = p1.pos(0) - p2.pos(0);
           Real dy = p1.pos(1) - p2.pos(1);
           Real dz = p1.pos(2) - p2.pos(2);
           ...
       }
   });

Note that, because of our use of managed memory to store the particle data and the neighbor list, the above code will work when compiled for either CPU or GPU.
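One way to picture the two-array layout and the iterator that hides it is the following C++ sketch. It is an illustration only: `NeighborRange` and this `NeighborData` are hypothetical types written for this example, and unlike the real iterator (which yields particles, as in the loop above) this version yields plain particle indices.

.. code-block:: cpp

   #include <vector>

   // Hypothetical view over one particle's slice of the flat neighbor array.
   struct NeighborRange {
       const int* first;
       const int* last;
       const int* begin() const { return first; }
       const int* end() const { return last; }
   };

   // Hypothetical container for the two arrays described in the text.
   struct NeighborData {
       std::vector<int> offsets;  // np + 1 entries: where each particle's neighbors start
       std::vector<int> list;     // all neighbor indices, stored back to back

       NeighborRange getNeighbors(int i) const
       {
           return { list.data() + offsets[i], list.data() + offsets[i + 1] };
       }
   };

   // Example traversal: sum the indices of particle i's neighbors.
   int sum_neighbor_indices(const NeighborData& nd, int i)
   {
       int sum = 0;
       for (int j : nd.getNeighbors(i)) sum += j;
       return sum;
   }

Returning a lightweight range with `begin()` and `end()` is what allows a range-based `for` loop in user code without exposing the offsets array.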
The above algorithm deals with constructing a neighbor list for the particles on a single grid. When domain decomposition is used, one must also make copies of particles on adjacent grids, potentially performing the necessary MPI communication for grids associated with other processes. The routines `fillNeighbors`, which computes which particles need to be ghosted to which grid, and `updateNeighbors`, which copies up-to-date data for particles that have already been ghosted, have also been offloaded to the GPU, using techniques similar to AMReX's `Redistribute` routine. The important thing for users is that calling these functions does not trigger copying data off the GPU.
Once the neighbor list has been constructed, collisions with both particles and walls can easily be processed.
We have created a branch of MFIX-Exa that runs with GPU support. As of this writing, the following operations have been offloaded:
- Neighbor particles / neighbor list construction
- Particle-particle collisions
- Particle-wall collisions
- PIC Deposition (used in putting the drag force and solids volume fraction on the grid)
......@@ -26,6 +26,4 @@ particle-particle, particle-fluid, and particle-wall interactions.
ParticleBasics
ParticleFluid
ParticleWalls
ParticlesOnGpus
Computing slopes
~~~~~~~~~~~~~~~~~~~~~~~~~~~
Slopes are computed one direction at a time for each scalar and each
velocity component.
We use the second-order Monotonized Central (MC)
limiter (van Leer, 1977). The scheme is described below for the u-velocity.
The limiter computes the slope at cell "i" by combining the left, central
and right u-variation "du":
.. code:: shell

   du_l = u(i) - u(i-1)               = left variation
   du_c = 0.5 * ( u(i+1) - u(i-1) )   = central (unlimited) variation
   du_r = u(i+1) - u(i)               = right variation

Finally, the u-variation at cell "i" is given by:

.. code:: shell

   du(i) = sign(du_c) * min(2|du_l|, |du_c|, 2|du_r|)    if du_l*du_r > 0
   du(i) = 0                                             otherwise

The above procedure is applied direction by direction.
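The limiter above translates directly into code. The following C++ sketch returns the limited variation from the three neighboring values; the function name `mc_slope` and its three-argument interface are our own and are not MFIX-Exa routines.

.. code-block:: cpp

   #include <algorithm>
   #include <cmath>

   // um, u0, up are u(i-1), u(i), u(i+1) along the direction being processed.
   double mc_slope(double um, double u0, double up)
   {
       const double du_l = u0 - um;          // left variation
       const double du_c = 0.5 * (up - um);  // central (unlimited) variation
       const double du_r = up - u0;          // right variation

       // Local extremum or flat region: the left and right variations do not
       // share a sign, so the slope is set to zero.
       if (du_l * du_r <= 0.0) return 0.0;

       const double lim = std::min({2.0 * std::abs(du_l), std::abs(du_c), 2.0 * std::abs(du_r)});
       return std::copysign(lim, du_c);
   }

For smooth, monotone data the central variation is the smallest of the three candidates and is returned unchanged, which is what keeps the scheme second order away from extrema.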
BOUNDARY CONDITIONS
When periodic or Neumann BCs are imposed, the scheme can be applied
without any change since the ghost cells at the boundary are filled
by either periodicity or extrapolation.
For Dirichlet BCs in the transverse direction, the scheme can again
be applied as is since the velocity is known at the first ghost cell
outside the domain.
However, for Dirichlet BCs in the longitudinal direction, the velocity
is not known outside the domain since the BC is applied directly at the first
valid node, which lies on the boundary itself. Therefore, the scheme must be
rearranged as follows to use ONLY values from inside the domain.
For a left boundary (i=0), the u-variations are:
.. code:: shell

   du_l = 0                                  Don't use values on the left
   du_c = -1.5*u(0) + 2*u(1) - 0.5*u(2)      2nd order right-biased
   du_r = u(1) - u(0)                        Right variation

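As a quick check of the right-biased stencil for `du_c`, one can expand the two interior values in a Taylor series about the boundary node. This is our own sketch and not part of the original derivation; it assumes a uniform node spacing :math:`h` (a symbol introduced here) and writes :math:`u'(0)` and :math:`u''(0)` for the derivatives at node 0:

.. math::

   u(1) = u(0) + h\,u'(0) + \tfrac{h^2}{2}\,u''(0) + O(h^3), \qquad
   u(2) = u(0) + 2h\,u'(0) + 2h^2\,u''(0) + O(h^3)

   -\tfrac{3}{2}\,u(0) + 2\,u(1) - \tfrac{1}{2}\,u(2) = h\,u'(0) + O(h^3)

The :math:`u(0)` and :math:`u''(0)` terms cancel, so `du_c` divided by :math:`h` approximates the slope at the boundary with an :math:`O(h^2)` error while using only values from inside the domain.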
......@@ -12,17 +12,7 @@ Directory overview
+---------------+--------------------------------------------------+
| exec          | Directory for building with gmake (optional)     |
+---------------+--------------------------------------------------+
| exec_cc       | Directory for building with gmake (optional)     |
+---------------+--------------------------------------------------+
| src_des       | Source files for particle-only operations        |
+---------------+--------------------------------------------------+
| src_ebs       | Source files for EB-only operations              |
+---------------+--------------------------------------------------+
| src_staggered | Source files for SIMPLE and projection algorithm |
|               | with face-centered velocity components           |
+---------------+--------------------------------------------------+
| src_cc        | Source files for projection algorithms with      |
|               | cell-centered velocity components                |
+---------------+--------------------------------------------------+
| src           | Source files                                     |
+---------------+--------------------------------------------------+
| tests         | Regression tests (see tests/README.md)           |
+---------------+--------------------------------------------------+
......@@ -8,10 +8,3 @@
max-width: 100%;
overflow: visible;
}
/* rtd_theme currently colours code-blocks in the pygments style (ie. green).
* This overrides this design choice with white. Note: future versions of the
* sphinx_rtd_theme might not need this HACK. */
.highlight {
background: #ffffff;
}
......@@ -22,9 +22,14 @@ the master branch at the beginning of each month.
Introduction
GettingStarted
Inputs
Fluids
Particles
Inputs_Chapter
ManagingGridHierarchy_Chapter
Fluids_Chapter
Particles_Chapter
EB
CITests
NightlyTests
Debugging
Notice
------