.. _sma-doe:


Design of Experiments
=====================

The ``Design of Experiments`` is used to choose the sampling points to be used as
input parameters for full model evaluation. It is important in this stage to
consider the potential range of interest and ensure that the sampling space
completely covers this range so that the resulting surrogate model is analyzed
within its range of support.


.. _sma-doe-variables:

Variables
---------

.. figure:: ./images/doe_variables.png
   :align: center


Variables are added in the first ``Variables`` tab by clicking on the |add| symbol
above the table. Variables can be removed by highlighting them in the
table and clicking on the |rm| symbol. Information about the variable properties is entered
and edited in the entry fields below the table. Some of the properties are specific to the
variable ``type``, as discussed below. All entered variables are listed and summarized in the table.

Variable properties:

  *  ``variable``: used to name and identify the variables.
  *  ``arg(s)``: defines the variable argument, i.e., the integer index if the variable
     is an array or vector.
  *  ``units``: specify variable units for plotting.
  *  ``type``: ``Double Precision`` real-valued variables.

     -  ``link``: specifies the variable as a function of another variable. **NOTE**
        that a linked variable is no longer an independent variable, which can be important
        when sampling randomly. Linking is useful when dependent variables need
        to be set consistently with one independent random variable. For example, the
        figure below shows a two-variable array whose entries must sum to one, like the
        volume fractions of a two-phase mixture.
     -  ``from``: lower bound of the parameter space in this variable's dimension.
     -  ``to``: upper bound of the parameter space in this variable's dimension.
     -  ``levels``: defines the number of evenly spaced sampling points between ``from``
        and ``to``, inclusive. **NOTE** that this is only used when
        the factorial design is selected on the ``Design`` tab *and* the
        ``Use variable specific levels`` option is checked.

  *  ``type``: ``Integer`` integer-valued variables have the same properties as
     ``Double Precision`` variables. **NOTE** that integers are treated as
     reals during the calculation of the sampling points and then rounded. Care should be
     taken to avoid unwanted repeated samples.
  *  ``type``: ``String`` text-based variables. Values can be added and removed with the ``+``
     and ``-`` symbols, activated by checking the box, and named by double-clicking on
     the entry space to the right of the checkbox. **NOTE** that ``String``
     variables are not yet fully implemented. Specifically, downstream nodes cannot
     handle string variables. For now, it is recommended that users convert strings into
     integer or real variables.
  *  ``type``: ``Logical`` variables only take integer values of 1 or 0 for True or False.
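
The effect of ``levels`` and of integer rounding can be illustrated with a minimal
NumPy sketch. This is not the Nodeworks implementation: the helper name
``level_points`` is hypothetical, and ``numpy.rint`` is only an assumed stand-in for
whatever rounding rule the node applies.

.. code-block:: python

   import numpy as np

   def level_points(lower, upper, levels):
       """Evenly spaced points between lower and upper, both ends included."""
       return np.linspace(lower, upper, levels)

   # Seven levels on [1, 3], treated as a real variable.
   real_points = level_points(1.0, 3.0, 7)

   # The same points after rounding for an Integer variable (assumed rule).
   int_points = np.rint(real_points).astype(int)

   # Count how many distinct integer values survive the rounding.
   n_unique = np.unique(int_points).size
   print(real_points)
   print(int_points, n_unique)

Here seven requested levels collapse onto three distinct integers, which is the kind
of unwanted repetition the note above warns about. Choosing ``levels`` no larger than
the number of integers in the range avoids it.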


.. figure:: ./images/doe_variables_linked.png
   :align: center

   A design where two variables are linked.
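
The linked-variable idea can be sketched outside the GUI as follows. The variable
names, bounds and seed are illustrative assumptions, not values taken from Nodeworks;
the point is only that the second variable is computed from the first rather than
sampled.

.. code-block:: python

   import numpy as np

   rng = np.random.default_rng(seed=0)  # fixed seed so the sketch is repeatable

   # Independent variable: volume fraction of phase 1, sampled on [0.2, 0.8].
   frac_1 = rng.uniform(0.2, 0.8, size=10)

   # Linked variable: phase 2 fills the remaining volume, so it is not sampled.
   frac_2 = 1.0 - frac_1

   # Each row is one sample; the two columns always sum to one.
   design = np.column_stack([frac_1, frac_2])
   print(design.sum(axis=1))

Because ``frac_2`` is fully determined by ``frac_1``, the design has only one
independent dimension even though two variables appear in the table.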


.. _sma-doe-design:

Design
------

The ``Design`` tab is where the samples are actually constructed. Generally speaking, there
are four ways to construct a sampling scheme: ordered, sequential (sub-random),
pseudo-random, and importing a design constructed previously or by a different code.
Available ``Method`` options include:

Ordered designs:

  *  :ref:`appx-smathry-doe-fact`
  *  :ref:`appx-smathry-doe-cov`
  *  :ref:`appx-smathry-doe-ccd`

Sequential designs:

  *  :ref:`appx-smathry-doe-sobol`
  *  :ref:`appx-smathry-doe-hammersly`
  *  :ref:`appx-smathry-doe-halton`

Random designs:

  *  :ref:`appx-smathry-doe-mcs`
  *  :ref:`appx-smathry-doe-lhs`


Previously generated designs:

  *   The ``Import`` button is used to load a design saved locally in
      comma-separated values (``csv``) format.


Additional sampling properties:

  *  ``Samples`` specifies the total number of samples drawn.
  *  ``Repeat`` specifies the number of times the samples are repeated. This
     can be useful when generating designs for *actual* experiments and for
     non-deterministic simulations.
  *  ``Randomize sample order`` shuffles the samples. This applies to the table as well as
     the order in which the samples are transferred to downstream nodes.
  *  ``Build`` generates the samples. It is also used to re-generate the design if properties
     are adjusted.
  *  ``Export`` exports the samples to a comma-separated values (``csv``) file.


Some other method-specific properties:

  *  ``Randomize`` toggles between setting the seed of the pseudorandom number generator
     from the given ``seed`` or from the clock. **NOTE** that ``Randomize`` should be used with
     caution, and the accepted design should be saved with ``Export``, because the design will
     be re-generated and changed when the sheet is run.
  *  ``Levels`` sets a constant number of sampling intervals to be used with all variables
     in the ``Factorial`` design.
  *  Alternatively, users may check ``Use variable specific levels``, in which case the
     intervals for each variable are taken from the ``levels`` value entered previously in the
     ``Variables`` tab.
  *  ``Face`` in the ``central composite`` design specifies ``circumscribed``,
     ``inscribed`` or ``faced``.
  *  ``Alpha`` in the ``central composite`` design specifies ``orthogonal``
     or ``rotatable``.
  *  ``Optimize`` in the ``latin hypercube`` design allows selection of an optimization
     technique to improve the space filling of the design. See :ref:`appx-smathry-doe-lhs`
     for more.
  *  ``Iterations`` in the ``latin hypercube`` design sets the number of
     iterations used by the ``Optimize`` technique.
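
The difference between a constant ``Levels`` value and variable-specific levels can
be sketched with a small full factorial generator. The function ``full_factorial``
and the bounds are hypothetical, and the sketch assumes that a level count means the
number of evenly spaced points per variable, as described for ``levels`` in the
``Variables`` tab; it is not the code used by the node.

.. code-block:: python

   import itertools
   import numpy as np

   def full_factorial(bounds, levels):
       """Full factorial design over evenly spaced points.

       bounds: list of (from, to) pairs, one per variable.
       levels: one int shared by all variables, or one int per variable.
       """
       if isinstance(levels, int):
           levels = [levels] * len(bounds)
       axes = [np.linspace(lo, hi, n) for (lo, hi), n in zip(bounds, levels)]
       # Every combination of one point from each axis is one sample.
       return np.array(list(itertools.product(*axes)))

   bounds = [(0.0, 1.0), (10.0, 20.0)]
   same_levels = full_factorial(bounds, 3)      # 3 x 3 = 9 samples
   own_levels = full_factorial(bounds, [2, 4])  # 2 x 4 = 8 samples
   print(same_levels.shape, own_levels.shape)

The sample count of a factorial design is the product of the per-variable level
counts, so it grows quickly with the number of variables.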

.. figure:: ./images/doe_methods.png
   :align: center

   Example of designs with 30 samples.
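
For comparison outside the GUI, similar 30-sample designs can be generated with
``scipy.stats.qmc``. This is a sketch under stated assumptions: it is not claimed
that Nodeworks uses these generators internally, and the bounds and seed value are
illustrative. It also shows why a fixed seed matters: the same seed reproduces the
same design, whereas a clock-based seed would not.

.. code-block:: python

   import numpy as np
   from scipy.stats import qmc

   lower, upper = [0.0, 10.0], [1.0, 20.0]  # illustrative from/to bounds
   n_samples, seed = 30, 42

   def scaled(unit_sample):
       """Map points from the unit square onto the variable bounds."""
       return qmc.scale(unit_sample, lower, upper)

   # Sequential (sub-random) design: deterministic Halton sequence.
   halton = scaled(qmc.Halton(d=2, scramble=False).random(n_samples))

   # Random designs: Latin hypercube and plain Monte Carlo, both seeded.
   lhs = scaled(qmc.LatinHypercube(d=2, seed=seed).random(n_samples))
   mcs = scaled(np.random.default_rng(seed).random((n_samples, 2)))

   # Rebuilding with the same seed gives an identical design.
   lhs_again = scaled(qmc.LatinHypercube(d=2, seed=seed).random(n_samples))
   print(np.array_equal(lhs, lhs_again))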


.. _sma-doe-plot:

Plot
----

The samples are plotted in a 2D scatter plot. The plotted variables are chosen by selecting the
``Y Axis`` and ``X Axis`` variables from the dropdown lists. The plot can be customized
and saved using the buttons below the plot. By default, all samples are shown, with included
points in blue and excluded points in red. Included and/or excluded points can be
hidden by right-clicking on the plot and unchecking the corresponding option.


.. _sma-doe-quality:

Quality
-------

The ``Quality`` tab analyzes the spatial quality of the design. The ``Minimum Distance``,
``Maximum Distance`` and their ``Ratio (Max/Min)`` are calculated using
`scipy.spatial.distance.pdist <https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.pdist.html>`_
and reported. Several ``Distance Metric`` options are available from the **scipy** library:


  *  `Bray-Curtis <https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.braycurtis.html>`_
  *  `Canberra <https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.canberra.html>`_
  *  `Chebyshev <https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.chebyshev.html>`_
  *  `City Block (Manhattan) <https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.cityblock.html>`_
  *  `Correlation <https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.correlation.html>`_
  *  `Cosine <https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.cosine.html>`_
  *  `Dice <https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.dice.html>`_
  *  `Euclidean <https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.euclidean.html>`_
  *  `Hamming <https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.hamming.html>`_
  *  `Jaccard <https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.jaccard.html>`_
  *  `Kulsinski <https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.kulsinski.html>`_
  *  `Mahalanobis <https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.mahalanobis.html>`_
  *  ``matching`` - Synonym for ``Hamming``
  *  `Minkowski <https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.minkowski.html>`_
  *  `Rogers-Tanimoto <https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.rogerstanimoto.html>`_
  *  `Russell-Rao <https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.russellrao.html>`_
  *  `Standardized Euclidean <https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.seuclidean.html>`_
  *  `Sokal-Michener <https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.sokalmichener.html>`_
  *  `Sokal-Sneath <https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.sokalsneath.html>`_
  *  `Squared Euclidean <https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.sqeuclidean.html>`_
  *  `Yule <https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.yule.html>`_
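
The three distance values can be reproduced directly with ``pdist``. The sketch
below uses four illustrative samples and the Euclidean metric; any preprocessing the
``Quality`` tab may apply to the samples first (such as scaling) is not shown here
and is not assumed.

.. code-block:: python

   import numpy as np
   from scipy.spatial.distance import pdist

   # Four samples at the corners of the unit square (illustrative data).
   samples = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])

   # Condensed vector of all pairwise distances between samples.
   distances = pdist(samples, metric="euclidean")

   d_min, d_max = distances.min(), distances.max()
   ratio = d_max / d_min
   print(d_min, d_max, ratio)

A ratio close to one means the samples are spaced about equally far apart, while a
large ratio indicates that some samples are clustered relative to the extent of the
design.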

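The wrap-around ``L2-Discrepancy`` from Eq. 5 of [Fang2001b]_ is also calculated for the
design:

.. math::

    WD^2(D) = -\left(\frac{4}{3}\right)^K + \frac{1}{N^2} \sum_{i,j=1}^{N} \prod_{k=1}^{K}
    \left[\frac{3}{2} - |x_k^i - x_k^j| \left(1 - |x_k^i - x_k^j|\right)\right]

where :math:`N` is the number of samples, :math:`K` is the number of variables, and
:math:`x_k^i` is the value of variable :math:`k` in sample :math:`i`, scaled to the
unit interval. Lower values indicate a more uniform design.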
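
The discrepancy formula can be evaluated with a few lines of NumPy. This is a
minimal sketch, not the code used by the ``Quality`` tab; the function name is
hypothetical and the samples are assumed to be already scaled to the unit hypercube.

.. code-block:: python

   import numpy as np

   def wraparound_l2_discrepancy(samples):
       """Squared wrap-around L2-discrepancy of an (N, K) array in [0, 1]^K."""
       x = np.asarray(samples, dtype=float)
       n, k = x.shape
       # Absolute coordinate differences for every pair of samples: shape (N, N, K).
       diff = np.abs(x[:, None, :] - x[None, :, :])
       # Product over variables, one term per ordered pair (i, j).
       pair_terms = np.prod(1.5 - diff * (1.0 - diff), axis=2)
       return -(4.0 / 3.0) ** k + pair_terms.sum() / n**2

   # Two samples of one variable at 0.25 and 0.75.
   wd2 = wraparound_l2_discrepancy([[0.25], [0.75]])
   print(wd2)

For this two-point design the result is 1/24, compared with 1/6 for a single point,
so the more evenly spread design scores lower. As a cross-check,
``scipy.stats.qmc.discrepancy`` with ``method="WD"`` evaluates the same quantity.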

References
++++++++++

.. [Fang2001b] K.T. Fang and C.X. Ma, "Wrap-Around L2-Discrepancy of Random
   Sampling, Latin Hypercube, and Uniform Designs," Journal of Complexity,
   vol. 17, pp. 608-624, 2001.

.. icon images
.. |add| image:: ../../../nodeworks/images/add.svg
.. |rm| image:: ../../../nodeworks/images/remove.svg