Multivariate Adaptive Regression Spline
In statistics, multivariate adaptive regression splines (MARS) is a form of regression analysis introduced
by Jerome H. Friedman in 1991.[1] It is a non-parametric regression technique and can be seen as an
extension of linear models that automatically models nonlinearities and interactions between variables.
The term "MARS" is trademarked and licensed to Salford Systems. In order to avoid trademark
infringements, many open-source implementations of MARS are called "Earth".[2][3]
The basics
This section introduces MARS using a few examples. We start with a set of data: a matrix of input variables
x, and a vector of the observed responses y, with a response for each row in x. For example, the data could
be:
x y
10.5 16.4
10.7 18.8
10.8 19.7
... ...
20.6 77.0
Here there is only one independent variable, so the x matrix is just a single column. Given these
measurements, we would like to build a model which predicts the expected y for a given x:

$$\hat{y} = \hat{f}(x)$$

A linear model for the above data is

$$\hat{y} = -37 + 5.1x$$

The hat on the $\hat{y}$ indicates that $\hat{y}$ is estimated from the data. The figure on the right shows a plot of this function: a line giving the predicted $\hat{y}$ versus x, with the original values of y shown as red dots.
The data at the extremes of x suggests that the relationship between y and x may be non-linear. We thus turn to MARS to automatically build a model that takes the non-linearity into account. MARS software constructs a model from the given x and y as follows:

$$\hat{y} = 25 + 6.1 \max(0, x - 13) - 3.1 \max(0, 13 - x)$$

The figure on the right shows a plot of this function: the predicted $\hat{y}$ versus x, with the original values of y once again shown as red dots. The predicted response is now a better fit to the original y values.
MARS has automatically produced a kink in the predicted y to take into account non-linearity. The kink is produced by hinge functions. The hinge functions are the expressions starting with $\max$ (where $\max(a, b)$ is $a$ if $a > b$, else $b$). Hinge functions are described in more detail below.
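As a concrete illustration, here is a minimal Python sketch that evaluates the model above directly; the coefficients are simply transcribed from the fitted expression shown earlier, not computed by this code:

import numpy as np

def hinge(z):
    # Hinge function: max(0, z), applied elementwise.
    return np.maximum(0, z)

def simple_mars_model(x):
    # The one-variable MARS model from the example above;
    # the knot at x = 13 produces the kink in the prediction.
    return 25 + 6.1 * hinge(x - 13) - 3.1 * hinge(13 - x)

x = np.array([10.5, 10.7, 10.8, 20.6])
print(simple_mars_model(x))  # predicted y at the sample inputs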
In this simple example, we can easily see from the plot that y has
a non-linear relationship with x (and might perhaps guess that y
varies with the square of x). However, in general there will be
multiple independent variables, and the relationship between y
and these variables will be unclear and not easily visible by
plotting. We can use MARS to discover that non-linear
relationship.
[Figure: A simple MARS model of the same data]
An example MARS expression with multiple variables is

$$\begin{aligned}
\mathrm{ozone} = {} & 5.2 \\
& + 0.93 \max(0, \mathrm{temp} - 58) \\
& - 0.64 \max(0, \mathrm{temp} - 68) \\
& - 0.046 \max(0, 234 - \mathrm{ibt}) \\
& - 0.016 \max(0, \mathrm{wind} - 7)\,\max(0, 200 - \mathrm{vis})
\end{aligned}$$

This expression models air pollution (the ozone level) as a function of the temperature and a few other variables. Note that the last term in the formula incorporates an interaction between wind and vis.
The MARS model
MARS builds models of the form

$$\hat{f}(x) = \sum_{i=1}^{k} c_i B_i(x)$$

The model is a weighted sum of basis functions $B_i(x)$. Each $c_i$ is a constant coefficient. For example, each line in the formula for ozone above is one basis function multiplied by its coefficient.
Each basis function $B_i(x)$ takes one of the following three forms:

1) a constant 1. There is just one such term, the intercept. In the ozone formula above, the intercept term is 5.2.

2) a hinge function. A hinge function has the form $\max(0, x - \text{constant})$ or $\max(0, \text{constant} - x)$. MARS automatically selects variables and values of those variables for the knots of the hinge functions. Examples of such basis functions can be seen in the middle three lines of the ozone formula.

3) a product of two or more hinge functions. These basis functions can model interaction between two or more variables. An example is the last line of the ozone formula.
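To make the three forms concrete, here is a hedged Python sketch that evaluates a model like the ozone expression above as a weighted sum of basis functions; the function name and structure are illustrative, with coefficients transcribed from the example:

import numpy as np

def hinge(z):
    return np.maximum(0, z)  # as in the earlier sketch

def ozone_model(temp, ibt, wind, vis):
    # A weighted sum of basis functions: a constant intercept (form 1),
    # single hinge functions (form 2), and a product of two hinge
    # functions modelling a wind-visibility interaction (form 3).
    return (5.2
            + 0.93 * hinge(temp - 58)
            - 0.64 * hinge(temp - 68)
            - 0.046 * hinge(234 - ibt)
            - 0.016 * hinge(wind - 7) * hinge(200 - vis))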
Hinge functions
A key part of MARS models is the use of hinge functions, which take the form

$$\max(0, x - c)$$

or

$$\max(0, c - x)$$

where $c$ is a constant, called the knot. A hinge function is zero for part of its range, so it can be used to partition the data into disjoint regions, each of which can be treated independently. Thus, for example, the mirrored pair of hinge functions in the expression

$$6.1 \max(0, x - 13) - 3.1 \max(0, 13 - x)$$

creates the piecewise linear graph shown for the simple MARS model in the previous section.
One might assume that only piecewise linear functions can be formed from hinge functions, but hinge
functions can be multiplied together to form non-linear functions.
Hinge functions are also called ramp, hockey stick, or rectifier functions. Instead of the $\max$ notation used in this article, hinge functions are often represented by $[\pm(x_i - c)]_+$, where $[\,\cdot\,]_+$ means take the positive part.
The model building process
MARS builds a model in two phases: the forward pass and the backward pass. This two-stage approach is the same as that used by recursive partitioning trees.

The forward pass
MARS starts with a model which consists of just the intercept term (which is the mean of the response values).
MARS then repeatedly adds basis functions in pairs to the model. At each step it finds the pair of basis
functions that gives the maximum reduction in sum-of-squares residual error (it is a greedy algorithm). The
two basis functions in the pair are identical except that a different side of a mirrored hinge function is used
for each function. Each new basis function consists of a term already in the model (which could perhaps be
the intercept term) multiplied by a new hinge function. A hinge function is defined by a variable and a knot,
so to add a new basis function, MARS must search over all combinations of the following:
1) existing terms (called parent terms in this context)
2) all variables (to select one for the new basis function)
3) all values of each variable (for the knot of the new hinge function).
To calculate the coefficient of each term, MARS applies a linear regression over the terms.
This process of adding terms continues until the change in residual error is too small to continue or until the
maximum number of terms is reached. The maximum number of terms is specified by the user before
model building starts.
The search at each step is done in a brute-force fashion, but a key aspect of MARS is that, because of the nature of hinge functions, the search can be done relatively quickly using a fast least-squares update technique. In fact, the search is not quite brute force: it can be sped up with a heuristic that reduces the number of parent terms to consider at each step ("Fast MARS"[4]).
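A naive but runnable Python sketch of the forward pass follows. It refits the whole model from scratch for every candidate rather than using the fast least-squares update technique, so it illustrates the logic only; all names are illustrative, and the stopping criterion is simplified to a term limit:

import numpy as np

def hinge(z):
    return np.maximum(0, z)

def forward_pass(X, y, max_terms=11):
    # Each term maps the data matrix X to one column of basis values.
    terms = [lambda X: np.ones(len(X))]  # start with just the intercept
    while len(terms) < max_terms:
        best_rss, best_pair = np.inf, None
        B = np.column_stack([t(X) for t in terms])
        for parent in terms:                       # 1) parent terms
            for var in range(X.shape[1]):          # 2) variables
                for knot in np.unique(X[:, var]):  # 3) knot values
                    # Candidate pair: parent times a mirrored hinge pair.
                    t1 = lambda X, p=parent, v=var, k=knot: p(X) * hinge(X[:, v] - k)
                    t2 = lambda X, p=parent, v=var, k=knot: p(X) * hinge(k - X[:, v])
                    Bc = np.column_stack([B, t1(X), t2(X)])
                    coef, *_ = np.linalg.lstsq(Bc, y, rcond=None)
                    rss = np.sum((y - Bc @ coef) ** 2)
                    if rss < best_rss:
                        best_rss, best_pair = rss, [t1, t2]
        terms += best_pair
    return terms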
The backward pass
The forward pass usually builds an overfit model. (An overfit model has a good fit to the data used to build the model, but will not generalize well to new data.) To build a model with better generalization ability, the backward pass prunes the model. It removes terms one by one, deleting the least effective term at each step until it finds the best submodel. Model subsets are compared using the generalized cross validation (GCV) criterion described below.
The backward pass has an advantage over the forward pass: at any step it can choose any term to delete,
whereas the forward pass at each step can only see the next pair of terms.
The forward pass adds terms in pairs, but the backward pass typically discards one side of the pair, so terms are often not seen in pairs in the final model. A paired hinge can be seen in the equation for $\hat{y}$ in the first MARS example above; there are no complete pairs retained in the ozone example.
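A minimal Python sketch of this pruning loop follows, assuming each model is a list of basis-function terms as in the forward-pass sketch above, and that criterion is a model-scoring function such as the GCV sketched in the next section (lower is better):

def backward_pass(terms, X, y, criterion):
    # Delete the least effective term at each step (never the intercept,
    # kept at index 0), remembering the best-scoring submodel seen.
    current = list(terms)
    best_score, best_model = criterion(current, X, y), current
    while len(current) > 1:
        subsets = [current[:i] + current[i + 1:] for i in range(1, len(current))]
        current = min(subsets, key=lambda s: criterion(s, X, y))
        score = criterion(current, X, y)
        if score < best_score:
            best_score, best_model = score, current
    return best_model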
Generalized cross validation
The backward pass uses generalized cross validation (GCV) to compare the performance of model subsets in order to choose the best subset: lower values of GCV are better. The GCV is a form of regularization: it trades off goodness-of-fit against model complexity.
(We want to estimate how well a model performs on new data, not on the training data. Such new data is
usually not available at the time of model building, so instead we use GCV to estimate what performance
would be on new data. The raw residual sum-of-squares (RSS) on the training data is inadequate for
comparing models, because the RSS always increases as MARS terms are dropped. In other words, if the
RSS were used to compare models, the backward pass would always choose the largest model—but the
largest model typically does not have the best generalization performance.)
The formula for the GCV is

$$\mathrm{GCV} = \frac{\mathrm{RSS}}{N \cdot \left(1 - \dfrac{\mathrm{EffectiveNumberOfParameters}}{N}\right)^{2}}$$

where RSS is the residual sum-of-squares measured on the training data and N is the number of observations (the number of rows in the x matrix).

The EffectiveNumberOfParameters is defined in the MARS context as

$$\mathrm{EffectiveNumberOfParameters} = \mathrm{NumberOfMarsTerms} + \mathrm{penalty} \cdot \frac{\mathrm{NumberOfMarsTerms} - 1}{2}$$

where penalty is about 2 or 3 (the MARS software allows the user to preset penalty).

Note that

$$\frac{\mathrm{NumberOfMarsTerms} - 1}{2}$$

is the number of hinge-function knots, so the formula penalizes the addition of knots. Thus the GCV
formula adjusts (i.e. increases) the training RSS to take into account the flexibility of the model. We
penalize flexibility because models that are too flexible will model the specific realization of noise in the
data instead of just the systematic structure of the data.
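Transcribed directly into Python, the formula looks as follows; the parameter names mirror the formula above, and this is a sketch rather than any particular package's API:

def gcv(rss, n_obs, n_terms, penalty=3):
    # rss:     residual sum-of-squares on the training data
    # n_obs:   number of observations N (rows in the x matrix)
    # n_terms: number of MARS terms, including the intercept
    # penalty: typically about 2 or 3
    effective_params = n_terms + penalty * (n_terms - 1) / 2
    return rss / (n_obs * (1 - effective_params / n_obs) ** 2)

A small wrapper that fits coefficients by least squares and passes the resulting RSS to gcv() can serve as the criterion in the pruning sketch above.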
Generalized cross validation is so named because it uses a formula to approximate the error that would be determined by leave-one-out validation. It is just an approximation, but it works well in practice. GCV was introduced by Craven and Wahba and extended by Friedman for MARS.
Constraints
One constraint has already been mentioned: the user can specify the maximum number of terms in the
forward pass.
A further constraint can be placed on the forward pass by specifying a maximum allowable degree of
interaction. Typically only one or two degrees of interaction are allowed, but higher degrees can be used
when the data warrants it. The maximum degree of interaction in the first MARS example above is one (i.e.
no interactions or an additive model); in the ozone example it is two.
Other constraints on the forward pass are possible. For example, the user can specify that interactions are
allowed only for certain input variables. Such constraints could make sense because of knowledge of the
process that generated the data.
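In the py-earth package listed under External links, for example, these constraints correspond to the max_terms and max_degree arguments; treat the following as a sketch assuming that API, with toy data invented for illustration:

import numpy as np
from pyearth import Earth  # py-earth, see External links below

# Toy data: a response with a kink, as in the first example.
rng = np.random.default_rng(0)
X = rng.uniform(10, 21, size=(100, 1))
y = 25 + 6.1 * np.maximum(0, X[:, 0] - 13) + rng.normal(0, 1, 100)

# Cap the forward pass at 21 terms and allow at most two-way interactions.
model = Earth(max_terms=21, max_degree=2)
model.fit(X, y)
print(model.summary())  # per-term basis functions and coefficients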
See also
Linear regression
Local regression
Rational function modeling
Segmented regression
Spline interpolation
Spline regression
References
1. Friedman, J. H. (1991). "Multivariate Adaptive Regression Splines". The Annals of Statistics. 19 (1): 1–67. doi:10.1214/aos/1176347963. JSTOR 2241837. MR 1091842. Zbl 0765.62064.
2. CRAN Package earth (https://cran.r-project.org/web/packages/earth/index.html)
3. Earth – Multivariate adaptive regression splines in Orange (Python machine learning library) (http://orange.biolab.si/blog/2011/12/20/earth-multivariate-adaptive-regression-splines/)
4. Friedman, J. H. (1993). Fast MARS. Stanford University Department of Statistics, Technical Report 110.
5. Kuhn, Max; Johnson, Kjell (2013). Applied Predictive Modeling. New York, NY: Springer. doi:10.1007/978-1-4614-6849-3. ISBN 9781461468486.
6. Friedman, Jerome H. (1993). "Estimating Functions of Mixed Ordinal and Categorical Variables Using Adaptive Splines". In Stephan Morgenthaler; Elvezio Ronchetti; Werner Stahel (eds.). New Directions in Statistical Data Analysis and Robustness. Birkhauser.
7. Friedman, Jerome H. (1991-06-01). "Estimating Functions of Mixed Ordinal and Categorical Variables Using Adaptive Splines" (https://apps.dtic.mil/sti/citations/ADA590939). DTIC. Retrieved 2022-04-11.
8. Denison, D. G. T.; Mallick, B. K.; Smith, A. F. M. (1998). "Bayesian MARS" (https://link.springer.com/content/pdf/10.1023/A:1008824606259.pdf). Statistics and Computing. 8 (4): 337–346. doi:10.1023/A:1008824606259. ISSN 1573-1375.
Further reading
Hastie T., Tibshirani R., and Friedman J.H. (2009) The Elements of Statistical Learning (http://www-stat.stanford.edu/~tibs/ElemStatLearn), 2nd edition. Springer, ISBN 978-0-387-84857-0 (has a section on MARS)
Faraway J. (2005) Extending the Linear Model with R (http://www.maths.bath.ac.uk/~jjf23), CRC, ISBN 978-1-58488-424-8 (has an example using MARS with R)
Heping Zhang and Burton H. Singer (2010) Recursive Partitioning and Applications (https://www.amazon.com/Recursive-Partitioning-Applications-Springer-Statistics/dp/1441968237), 2nd edition. Springer, ISBN 978-1-4419-6823-4 (has a chapter on MARS and discusses some tweaks to the algorithm)
Denison D.G.T., Holmes C.C., Mallick B.K., and Smith A.F.M. (2004) Bayesian Methods for Nonlinear Classification and Regression (http://www.stat.tamu.edu/~bmallick/wileybook/book_code.html), Wiley, ISBN 978-0-471-49036-4
Berk R.A. (2008) Statistical learning from a regression perspective, Springer, ISBN 978-0-387-77500-5
External links
Several free and commercial software packages are available for fitting MARS-type models.
Free software
R packages:
earth function in the earth (https://cran.r-project.org/web/packages/earth/index.html) package
mars function in the mda (https://cran.r-project.org/web/packages/mda/index.html) package
polymars function in the polspline (https://cran.r-project.org/web/packages/polspline/index.html) package. Not Friedman's MARS.
bass function in the BASS (https://cran.r-project.org/web/packages/BASS/index.html) package for Bayesian MARS.
Matlab code:
ARESLab: Adaptive Regression Splines toolbox for Matlab (http://www.cs.rtu.lv/jekabsons/regression.html)
Code (https://web.stat.tamu.edu/~bmallick/wileybook/book_code.html) from the book Bayesian Methods for Nonlinear Classification and Regression for Bayesian MARS.
Python:
Earth – Multivariate adaptive regression splines (http://orange.biolab.si/blog/2011/12/20/earth-multivariate-adaptive-regression-splines/)
py-earth (https://github.com/jcrudy/py-earth/)
pyBASS (https://github.com/lanl/pyBASS) for Bayesian MARS.
Commercial software