.. _sma-rsm:

Response Surface
================

After the full model has been run and the responses collected at the sampled points, the
responses can be interpolated into a ``Response Surface``. The response surface maps
continuous input parameters to a continuous (or at least piecewise continuous)
output (response), thus acting as a surrogate model for the full model evaluation.
The basic functionality of the ``Response Surface`` node is outlined below and used
extensively in the :ref:`sma-ex`.


.. _sma-rsm-data:

Data
----

.. figure:: ./images/rsm_data.png
   :align: center


In the ``Data`` tab the required parameter input and corresponding full model response values
are entered into the node for the construction of the response surface. There are two ways in
which the required data can be input. In a typical workflow, the input matrix (typically from
a ``Design of Experiments`` node) and the corresponding model response (typically from
a ``code`` node) are connected to the ``matrix/response`` terminal as shown in the figure above.
Alternatively, if the data have been generated externally, or exported from Nodeworks nodes and
combined into a single table, the ``Import`` button at the top of the table can be used to
read in a ``csv`` file.


Once the input and response data have been loaded into the ``Data`` tab,
they can be sorted by any individual input variable or by the (full model) response. The table can
be returned to its original index order by right-clicking and selecting ``clear sort``.
The right-click menu can also be used to ``Exclude`` or ``Include`` specific
entries.


.. _sma-rsm-model:

Model
-----

Several response surface modeling choices are available in the ``Model`` tab, drawing from
the scikit-learn and SciPy libraries. Current modeling options include:

*  `gaussian process <http://scikit-learn.org/stable/modules/generated/sklearn.gaussian_process.GaussianProcessRegressor.html>`_
*  `polynomial <http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.PolynomialFeatures.html>`_
*  `multilayer perceptron <http://scikit-learn.org/stable/modules/generated/sklearn.neural_network.MLPRegressor.html>`_
*  `support vector machine <http://scikit-learn.org/stable/modules/generated/sklearn.svm.SVR.html>`_
*  `decision tree <http://scikit-learn.org/stable/modules/generated/sklearn.tree.DecisionTreeRegressor.html>`_
*  `random forest <http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html>`_
*  `nearest <https://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.NearestNDInterpolator.html>`_
*  `linear <https://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.LinearNDInterpolator.html>`_
*  `cubic <https://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.CloughTocher2DInterpolator.html>`_
*  `radial basis function <https://docs.scipy.org/doc/scipy/reference/generated/scipy.interpolate.Rbf.html>`_
*  `MARS <http://contrib.scikit-learn.org/py-earth/>`_

With so many different modeling options, determining which particular model is best, or even
which sub-options of a given model are best, can be a challenging task. Several features within
the ``Response Surface`` node have been included to help with model selection.
The following section covers the :ref:`sma-rsm-error` tab, which provides a graphical representation
of the surrogate model error, i.e., the difference between the actual (full) model response
and the response surface at the corresponding input values. The table in the ``Model`` tab
shows several quantitative :ref:`appx-smathry-rsmerr`:

  *  ``MSE``   - :ref:`appx-smathry-rsmerr-mse`
  *  ``R^2``   - :ref:`appx-smathry-rsmerr-rsq`
  *  ``L_inf`` - :ref:`appx-smathry-rsmerr-linf`
  *  ``L_1``   - :ref:`appx-smathry-rsmerr-l2`
  *  ``L_2``   - :ref:`appx-smathry-rsmerr-l2`

The error metrics operate on either the full dataset or the out-of-sample subset held out for
`cross validation <https://en.wikipedia.org/wiki/Cross-validation_(statistics)>`_.
In cross validation, a subset of the dataset is withheld from the fitting of the
response surface. By default, 10 percent of the dataset is withheld as
``Cross validation points``. When cross validation is active, the :ref:`appx-smathry-rsmerr`
apply only to the withheld points. Without cross validation, the :ref:`appx-smathry-rsmerr`
apply to the whole dataset.
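
As a rough illustration of what hold-out cross validation measures, the sketch below withholds
10 percent of a synthetic dataset, fits a Gaussian process to the remainder, and evaluates the
metrics on the withheld points only. It is not the node's implementation: the test function,
the kernel, the random seeds, and the unnormalized forms of the ``L_1``, ``L_2`` and ``L_inf``
norms are assumptions made for this example. The definitions used by the node are those given
in :ref:`appx-smathry-rsmerr`.

.. code-block:: python

   import numpy as np
   from sklearn.gaussian_process import GaussianProcessRegressor
   from sklearn.gaussian_process.kernels import RBF
   from sklearn.metrics import mean_squared_error, r2_score
   from sklearn.model_selection import train_test_split

   # Synthetic stand-in for the sampled inputs and full model responses (assumption).
   rng = np.random.default_rng(0)
   X = rng.uniform(-1.0, 1.0, size=(40, 2))
   y = np.sin(3.0 * X[:, 0]) + X[:, 1] ** 2

   # Withhold 10 percent of the samples as cross validation points.
   X_fit, X_cv, y_fit, y_cv = train_test_split(X, y, test_size=0.10, random_state=0)

   # Fit the surrogate to the remaining 90 percent only.
   model = GaussianProcessRegressor(kernel=RBF(length_scale=1.0), normalize_y=True)
   model.fit(X_fit, y_fit)

   # Evaluate the error metrics on the withheld points only.
   pred = model.predict(X_cv)
   resid = y_cv - pred
   metrics = {
       "MSE": mean_squared_error(y_cv, pred),
       "R^2": r2_score(y_cv, pred),
       "L_inf": float(np.max(np.abs(resid))),      # largest absolute error
       "L_1": float(np.sum(np.abs(resid))),        # unnormalized (assumption)
       "L_2": float(np.sqrt(np.sum(resid ** 2))),  # unnormalized (assumption)
   }
   for name, value in metrics.items():
       print(f"{name:6s} {value: .4e}")

Replacing ``X_cv`` and ``y_cv`` with ``X`` and ``y`` in the evaluation step (and fitting to all of
``X`` and ``y``) gives the corresponding full-dataset metrics.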


.. figure:: ./images/rsm_model.png
   :align: center


The figure above shows several ``Model`` surrogate candidates
without cross validation on the left and with 25% cross validation on the right. Without cross
validation, four models show zero error (to double precision). This is expected, because these
interpolators are designed to agree with the model at the sampled points. But how do these
surrogates perform over the entire design space, i.e., in between the sampled points? One way
to test this is with cross validation. When the models are fit to only 75% of the data, we see
that the radial basis function (RBF) and Gaussian process (GP) models have
better predictive accuracy on the 25% hold-out data. The cross validation points can also be
used to fine-tune model parameters. After activating new models or changing model options, the
error metrics can be reassessed by clicking the ``Refit Model(s)`` button. The ``Refit Model(s)``
button can also be used without changing any modeling options to simply redraw the
``Cross validation points``, which are drawn at random from the full dataset.
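
The distinction between in-sample and hold-out error can be reproduced outside the node with a
small sketch. The example below assumes a synthetic test function and uses two of the SciPy
interpolators linked above (``NearestNDInterpolator`` and the legacy ``Rbf`` class with its
default settings). The helper names ``fit_models`` and ``max_abs_error`` are hypothetical and
exist only for this illustration.

.. code-block:: python

   import numpy as np
   from scipy.interpolate import NearestNDInterpolator, Rbf

   # Synthetic stand-in for the sampled inputs and full model responses (assumption).
   rng = np.random.default_rng(1)
   X = rng.uniform(-1.0, 1.0, size=(40, 2))
   y = np.sin(3.0 * X[:, 0]) + X[:, 1] ** 2

   def fit_models(X_fit, y_fit):
       """Fit two interpolating surrogates; return callables taking an (n, 2) array."""
       nearest = NearestNDInterpolator(X_fit, y_fit)
       rbf = Rbf(X_fit[:, 0], X_fit[:, 1], y_fit)
       return {
           "nearest": lambda P: nearest(P),
           "rbf": lambda P: rbf(P[:, 0], P[:, 1]),
       }

   def max_abs_error(predict, X_eval, y_eval):
       """Largest absolute difference between the data and the surrogate."""
       return float(np.max(np.abs(y_eval - predict(X_eval))))

   # No cross validation: fit to and evaluate on every sample.
   in_sample = {name: max_abs_error(f, X, y) for name, f in fit_models(X, y).items()}

   # 25% cross validation: draw 10 of the 40 samples at random and withhold them.
   idx = rng.permutation(len(X))
   cv, fit = idx[:10], idx[10:]
   models = fit_models(X[fit], y[fit])
   hold_out = {name: max_abs_error(f, X[cv], y[cv]) for name, f in models.items()}

   print("in-sample:", in_sample)
   print("hold-out: ", hold_out)

Both interpolators reproduce the sampled responses (the nearest-neighbor model exactly, the RBF
model to within round-off), so only the hold-out errors say anything about behavior in between
the sampled points.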


.. _sma-rsm-error:

Error
-----

.. figure:: ./images/rsm_error.png
   :align: center


The difference between the surrogate model(s) and the data is visualized in the
``Error`` tab. If more than one surrogate model has been fit, they can be selected
from the ``Model`` dropdown menu. The discrepancy for a given model can be viewed
in three different forms selectable from the ``Plot`` dropdown menu:

  * ``parity`` plot
  * ``error`` plot
  * error ``histogram``
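
The quantities behind these three views can be sketched with NumPy alone. The example below is
illustrative only: the quadratic polynomial surrogate, the synthetic data, the sign convention
(surrogate minus data), and the number of histogram bins are assumptions, not a description of
how the node computes its plots.

.. code-block:: python

   import numpy as np

   # Synthetic one-dimensional data standing in for the full model (assumption).
   rng = np.random.default_rng(2)
   x = rng.uniform(0.0, 1.0, size=50)
   actual = np.exp(x)

   # A simple quadratic least-squares surrogate.
   coeffs = np.polyfit(x, actual, deg=2)
   predicted = np.polyval(coeffs, x)

   # Parity plot: surrogate prediction against the actual response.
   # Points on the diagonal correspond to zero error.
   parity = np.column_stack([actual, predicted])

   # Error plot: signed discrepancy at each sample (sign convention is an assumption).
   error = predicted - actual

   # Error histogram: distribution of the discrepancies.
   counts, bin_edges = np.histogram(error, bins=10)

   print("largest absolute error:", np.max(np.abs(error)))
   print("histogram counts:", counts)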

As in the ``Plot`` tab of this and the :ref:`sma-doe` nodes, points can be highlighted
by holding select and dragging the cursor. Highlighted points can be excluded
(and included if previously excluded) from the fit of the response surface models.

.. wdfTODO actually discuss the select highlight feature in the DOE plot node


.. warning::
    After assessing the performance of different surrogate models with the
    :ref:`appx-smathry-rsmerr` and the ``Error`` plots, make sure to
    return to the ``Model`` tab, set the ``Cross validation points`` to 0,
    and click the ``Refit Model(s)`` button to refit the surrogate to the
    full dataset before use in downstream analysis nodes.


.. _sma-rsm-plot:

Plot
----

The different models can be visualized as either a ``3D`` or a ``contour`` plot,
as shown below, in the ``Plot`` tab. If more than two input variables are used, the
variables used for the ``X Axis`` and ``Y Axis`` become selectable from dropdown
menus below the plot. Response data points are selectable and can be excluded from
the fit or shown in the ``Data`` table from the right-click menu.


.. figure:: ./images/rsm_plot.png
   :align: center
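
A comparable surface can be generated outside the node by evaluating a surrogate on a regular
grid over two chosen inputs. The sketch below assumes a synthetic three-input dataset and
SciPy's legacy ``Rbf`` interpolator. Holding the remaining input at its sample mean is also an
assumption made for this example, not a statement of how the node treats the inputs that are
not on the axes.

.. code-block:: python

   import numpy as np
   from scipy.interpolate import Rbf

   # Synthetic three-input dataset standing in for the sampled responses (assumption).
   rng = np.random.default_rng(3)
   X = rng.uniform(0.0, 1.0, size=(60, 3))
   y = X[:, 0] ** 2 + np.sin(2.0 * X[:, 1]) + 0.5 * X[:, 2]
   surrogate = Rbf(X[:, 0], X[:, 1], X[:, 2], y)

   # Choose which inputs go on the plot axes; the third is held fixed.
   x_axis, y_axis, fixed = 0, 1, 2
   gx = np.linspace(X[:, x_axis].min(), X[:, x_axis].max(), 25)
   gy = np.linspace(X[:, y_axis].min(), X[:, y_axis].max(), 25)
   GX, GY = np.meshgrid(gx, gy)

   # Assemble the grid coordinates in the surrogate's input order.
   grid = [None, None, None]
   grid[x_axis] = GX
   grid[y_axis] = GY
   grid[fixed] = np.full_like(GX, X[:, fixed].mean())  # fixed at the mean (assumption)

   # Surrogate response on the grid, with the same shape as GX and GY.
   Z = surrogate(*grid)
   print(Z.shape)

The arrays ``GX``, ``GY`` and ``Z`` can then be passed to a plotting routine such as
Matplotlib's ``contourf`` or ``plot_surface``.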


.. _sma-rsm-output:

Output
------

Located at the bottom of the node, below all of the tabs, is the ``Output model`` selection.
If the models become computationally intensive, it is recommended that all but the desired
model be deselected in the ``Model`` tab. However, if more than one surrogate model is needed
for analysis in downstream nodes, the different models can be toggled with
the ``Output model`` dropdown.