Methods for Scalar-on-Function Regression
Int Stat Rev. Author manuscript; available in PMC 2018 August 01.
4Research School of Finance, Actuarial Studies and Statistics, Australian National University
5New York State Psychiatric Institute
Abstract
Recent years have seen an explosion of activity in the field of functional data analysis (FDA), in
which curves, spectra, images, etc. are considered as basic functional data units. A central problem
in FDA is how to fit regression models with scalar responses and functional data points as
predictors. We review some of the main approaches to this problem, categorizing the basic model
types as linear, nonlinear and nonparametric. We discuss publicly available software packages, and
illustrate some of the procedures by application to a functional magnetic resonance imaging
dataset.
Keywords
functional additive model; functional generalized linear model; functional linear model; functional
polynomial regression; functional single-index model; nonparametric functional regression
1 Introduction
Regression with functional data is perhaps the most thoroughly researched topic within the
broader literature on functional data analysis (FDA). It is common (e.g., Ramsay and
Silverman, 2005; Reiss et al., 2010) to classify functional regression models into three
categories according to the role played by the functional data in each model: scalar-on-function, function-on-scalar, and function-on-function regression. This review concerns the first of these, scalar-on-function regression (SoFR).
2011; Huang et al., 2013), climate science (Ferraty et al., 2005; Baíllo and Grané, 2009), and
many others. We refer the reader to Morris (2015) for a recent review of functional
regression in general, and to Wang et al. (2015) for a broad overview of FDA.
In cataloguing the many variants of SoFR, we have attempted to cast a wide net. A major
contribution of this review is our attempt not merely to describe many approaches in what
has become a vast literature, but to distill a coherent organization of these methods. To keep
the scope somewhat manageable, we do not attempt to survey the functional classification
literature. We acknowledge, however, that classification and regression are quite closely
related—especially insofar as functional logistic regression, a special case of the functional
generalized linear models considered below in Section 5.3, can be viewed as a classification
method. Our emphasis is more methodological than theoretical, but for brevity we omit a
number of important methodological issues such as confidence bands, goodness-of-fit
diagnostics, outlier detection and robustness.
Before model fitting, it may be necessary to apply registration or feature alignment, or, if the grid points differ across observations, to interpolate to a dense common grid. Measurement error is expected to be low in some (e.g., chemometric) applications, but when it is not negligible it can have important effects on the regression relation. Some methods (e.g., James, 2002) account explicitly for such error. Here, in order to keep the focus on the various regression models, we shall mostly assume functional data observed on a common dense grid with negligible error.
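To make the common-grid assumption concrete, the following minimal sketch (Python/NumPy, with hypothetical variable names and simulated curves) linearly interpolates curves observed on differing grids onto a dense common grid; real applications may instead require smoothing or registration as noted above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical raw data: each curve observed on its own irregular grid.
raw_grids = [np.sort(rng.uniform(0, 1, size=30)) for _ in range(5)]
raw_curves = [np.sin(2 * np.pi * g) + 0.01 * rng.standard_normal(g.size) for g in raw_grids]

# Dense common grid on which all curves will be represented.
t_common = np.linspace(0, 1, 101)

# Linear interpolation onto the common grid (np.interp holds endpoint
# values constant outside each curve's observed range).
X = np.vstack([np.interp(t_common, g, f) for g, f in zip(raw_grids, raw_curves)])
print(X.shape)  # (5, 101): one row per curve on the common grid
```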
The functional linear model (FLM) is a natural extension of multiple linear regression to
allow for functional predictors. Many techniques have been developed to fit this model, and
we review these in Section 2. Nonlinear extensions of this basic approach are presented in
Section 3. In Section 4 we discuss nonparametric approaches to SoFR, which are based on
distances among the predictor functions. For simplicity of exposition, these three sections
consider only the most basic and most common scenario: a single functional predictor and
one real-valued scalar response. Generalizations and extensions, including the incorporation of scalar covariates, multiple functional predictors, generalized responses, and repeated
observations, are reviewed in Section 5. Section 6 presents some ideas on how to choose
among the many methods. Available software for SoFR is described in Section 7, and an
application to brain imaging data appears in Section 8. Some concluding discussion is
provided in Section 9.
In the simplest setting, we observe scalar responses y1, …, yn and functional predictors x1(·), …, xn(·) defined on a domain 𝒯, and posit the model
$$y_i = \alpha + \int_{\mathcal T} x_i(t)\,\beta(t)\,dt + \varepsilon_i, \qquad (1)$$
where β(·) is the coefficient function and the errors εi are iid with mean zero and constant variance σ².
The coefficient function β(·) has a natural interpretation: locations t with the largest |β(t)| are
most influential on the response. In order to enforce some regularity in the estimate, a
common general approach to fitting model (1) is to expand β(·) (and possibly the functional
predictors as well) in terms of a set of basis functions. Basis functions can be categorized as
either (i) a priori fixed bases, most often splines or wavelets; or (ii) data-driven bases, most
often derived by functional principal component analysis or functional partial least squares.
The next two subsections discuss these two broad alternatives.
Writing β(t) = b(t)ᵀγ for a vector b(t) of fixed basis functions and a coefficient vector γ, one may estimate γ by minimizing a penalized least squares criterion of the form
$$\sum_{i=1}^{n}\left\{ y_i - \alpha - \int x_i(t)\, b(t)^{\mathsf T}\gamma\, dt \right\}^2 + \lambda\, P(\gamma) \qquad (2)$$
for some λ > 0 and some penalty function P(·). The estimate of the coefficient function is thus β̂(t) = b(t)ᵀγ̂, where γ̂ minimizes (2).
In spline approaches (e.g., Hastie and Mallows, 1993; Marx and Eilers, 1999; Cardot et al., 2003), the penalty is generally a quadratic form γᵀLγ which measures the roughness of β(t) = b(t)ᵀγ, and hence (2) is a generalized ridge regression problem.
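The following sketch illustrates the penalized basis-expansion estimate in (2) on simulated data, using a Gaussian bump basis as a stand-in for B-splines and a squared second-difference penalty, so that the fit is the generalized ridge regression just described. All names are hypothetical and the snippet is meant only to convey the structure of the computation, not any of the cited implementations.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 100, 101                       # n curves observed at p grid points
t = np.linspace(0, 1, p)
dt = t[1] - t[0]

# Simulated functional predictors (random walks) and a smooth true beta
X = np.cumsum(rng.standard_normal((n, p)), axis=1) * np.sqrt(dt)
beta_true = np.sin(2 * np.pi * t)
y = X @ beta_true * dt + 0.1 * rng.standard_normal(n)

# Basis expansion beta(t) = b(t)' gamma; Gaussian bumps stand in for B-splines
K = 20
centers = np.linspace(0, 1, K)
B = np.exp(-0.5 * ((t[:, None] - centers[None, :]) / 0.05) ** 2)   # p x K

# Design matrix: Z[i, k] approximates the integral of x_i(t) b_k(t) dt
Z = X @ B * dt

# Quadratic roughness penalty: squared second differences of the coefficients
D = np.diff(np.eye(K), n=2, axis=0)
L = D.T @ D

lam = 0.1
gamma_hat = np.linalg.solve(Z.T @ Z + lam * L, Z.T @ y)   # generalized ridge fit
beta_hat = B @ gamma_hat                                   # estimated coefficient function
```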
While splines and roughness penalties are a natural choice when the coefficient function is
expected to be smooth, in some applications β(·) may be irregular, with features such as
spikes or discontinuities. Wavelet bases (e.g., Ogden, 1997), which provide sparse
representations for irregular functions, have received some attention in recent years. In the
framework of (2), Zhao et al. (2012) propose the wavelet-domain lasso, which combines
wavelet basis functions b(·) with the ℓ1 penalty in (2). Other sparsity
penalties for wavelet-domain SoFR are considered by Zhao et al. (2015) and Reiss et al.
(2015). Not all sparse approaches rely on wavelet bases; see, for example, James et al.
(2009) and Lee and Park (2011).
More flexible, albeit potentially more complex, models can be built by replacing the penalty
with an explicit prior structure in a fully Bayesian framework. Spline approaches of this type
are developed by Crainiceanu and Goldsmith (2010) and Goldsmith et al. (2011); wavelet
approaches based on Bayesian variable selection include those of Brown et al. (2001) and
Malloy et al. (2010).
Expanding the functional predictors in terms of the leading eigenfunctions φ1(·), …, φA(·) of their covariance operator,
$$x_i(t) \approx \mu(t) + \sum_{j=1}^{A} x_{ij}\, \phi_j(t), \qquad (3)$$
and the coefficient function using the same basis, the integral in (1) becomes a finite sum, so that the model reduces to the multiple regression
$$y_i = \alpha + \sum_{j=1}^{A} \beta_j\, x_{ij} + \varepsilon_i \qquad (4)$$
(Cardot et al., 1999). In practice, the eigenfunctions are estimated by functional principal
component analysis (FPCA; Rice and Silverman, 1991; Silverman, 1996; Yao et al., 2005)
and treated as fixed in subsequent analysis.
The number A of retained components acts as a tuning parameter that controls the shape and
smoothness of β(·). Ways to choose A as a part of the regression analysis include explained
variability, bootstrapping (Hall and Vial, 2006), information criteria (Yao et al., 2005; Li et
al., 2013), and cross-validation (Hosseini-Nasab, 2013).
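A minimal sketch of functional principal component regression along the lines of (3)–(4), with A chosen by a proportion-of-variance-explained rule (one of the simpler options mentioned above). Riemann approximations replace integrals, the data are simulated, and all names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 100, 101
t = np.linspace(0, 1, p)
dt = t[1] - t[0]
X = np.cumsum(rng.standard_normal((n, p)), axis=1) * np.sqrt(dt)
y = X @ np.sin(2 * np.pi * t) * dt + 0.1 * rng.standard_normal(n)

# FPCA via the SVD of the centered data matrix
mu = X.mean(axis=0)
Xc = X - mu
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
eigvals = s ** 2 / n * dt              # eigenvalues of the empirical covariance operator
phi = Vt / np.sqrt(dt)                 # rows approximate orthonormal eigenfunctions
scores = Xc @ phi.T * dt               # FPC scores x_ij as in (3)

# Choose A so that, say, 99% of the predictor variability is explained
A = int(np.searchsorted(np.cumsum(eigvals) / eigvals.sum(), 0.99)) + 1

# Regress y on the leading A scores, as in (4), then rebuild beta(t)
Zd = np.column_stack([np.ones(n), scores[:, :A]])
coef = np.linalg.lstsq(Zd, y, rcond=None)[0]
beta_hat = phi[:A].T @ coef[1:]        # estimated coefficient function
```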
Data-driven bases other than functional principal components can also be utilized, such as
functional partial least squares (FPLS; Preda and Saporta, 2005; Escabias et al., 2007; Reiss
and Ogden, 2007; Aguilera et al., 2010; Delaigle and Hall, 2012) and functional sliced
inverse regression (Ferré and Yao, 2003, 2005).
The functional quadratic regression model of Yao and Müller (2010) can be expressed as
$$y_i = \alpha + \int x_i(t)\,\beta(t)\,dt + \int\!\!\int x_i(s)\,x_i(t)\,\gamma(s,t)\,ds\,dt + \varepsilon_i. \qquad (5)$$
Here we have both linear and quadratic coefficient functions, β(t) and γ(s, t); when the latter
is zero, (5) reduces to the FLM (1). By expressing elements of (5) in terms of functional
principal components, responses can be regressed on the principal component scores.
Adding higher-order interaction terms results in more general functional polynomial
regression models.
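As a sketch of this last point, the snippet below (hypothetical names, simulated data) represents the predictors by a few FPC scores and regresses the response on the scores together with their pairwise products, which is one way of expressing the linear and quadratic terms of (5) in finite-dimensional form.

```python
import numpy as np
from itertools import combinations_with_replacement

rng = np.random.default_rng(3)
n, p, A = 100, 101, 3
t = np.linspace(0, 1, p)
dt = t[1] - t[0]
X = np.cumsum(rng.standard_normal((n, p)), axis=1) * np.sqrt(dt)
y = (X @ np.sin(2 * np.pi * t) * dt) ** 2 + 0.1 * rng.standard_normal(n)

# Leading FPC scores of the centered curves, as in (3)
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ (Vt[:A] / np.sqrt(dt)).T * dt                 # n x A

# Quadratic model (5) in score form: linear terms plus all pairwise products
quad = np.column_stack([scores[:, j] * scores[:, k]
                        for j, k in combinations_with_replacement(range(A), 2)])
design = np.column_stack([np.ones(n), scores, quad])
coef = np.linalg.lstsq(design, y, rcond=None)[0]
fitted = design @ coef
```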
We begin by presenting the functional version of the single-index model (Stoker, 1986):
$$y_i = h\!\left(\int x_i(t)\,\beta(t)\,dt\right) + \varepsilon_i, \qquad (6)$$
which extends the FLM by allowing the function h(·) to be any smooth function defined on
the real line. Fitting this model requires estimation of both the coefficient function β(·) and
the unspecified function h(·) and is typically accomplished in an iterative way. For given
β(·), h(·) can be estimated using splines, kernels or any technique for estimating a smooth
function; for given h(·), β(·) can be estimated in a similar fashion; and the process is iterated
until convergence. Several of the methods described in Section 2 for estimating an FLM
have been combined with a spline method for estimating h(·) in (6) (e.g., Eilers et al., 2009).
Alternatively, a kernel estimator can be used for h(·) (e.g., Ait-Säidi et al., 2008; Ferraty et
al., 2011).
More generally, one may allow multiple indices, each with its own link function:
$$y_i = \sum_{j=1}^{p} h_j\!\left(\int x_i(t)\,\beta_j(t)\,dt\right) + \varepsilon_i. \qquad (7)$$
Models of this kind, which extend projection pursuit regression to the functional predictor
case, are developed by James and Silverman (2005), Chen et al. (2011), and Ferraty et al.
(2013).
Setting βj(t) = ϕj(t), the jth FPC basis function, reduces (7) to yi = Σj hj(xij) + εi, where
we use xij to denote FPC scores as in (3). Müller and Yao (2008) refer to this as a
“functional additive model,” generalizing the FLM of Section 2.2 which reduced to the
multiple regression model (4) with respect to the FPC scores. An extension of this method,
incorporating a sparsity-inducing penalty on the additive components, is proposed by Zhu et
al. (2014).
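The additive-in-scores idea can be sketched crudely as follows: each additive component hj is approximated here by a cubic polynomial in the corresponding FPC score, rather than by the nonparametric smoothers used in the cited work; names are hypothetical and the data simulated.

```python
import numpy as np

rng = np.random.default_rng(4)
n, p, A = 200, 101, 3
t = np.linspace(0, 1, p)
dt = t[1] - t[0]
X = np.cumsum(rng.standard_normal((n, p)), axis=1) * np.sqrt(dt)

# FPC scores x_ij of the centered curves
Xc = X - X.mean(axis=0)
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
scores = Xc @ (Vt[:A] / np.sqrt(dt)).T * dt

# Response depending nonlinearly, but additively, on the scores
y = np.sin(scores[:, 0]) + scores[:, 1] ** 2 + 0.1 * rng.standard_normal(n)

# Additive fit: each component h_j approximated by a cubic polynomial in x_ij
features = [np.ones(n)]
for j in range(A):
    for d in (1, 2, 3):
        features.append(scores[:, j] ** d)
Z = np.column_stack(features)
coef = np.linalg.lstsq(Z, y, rcond=None)[0]
fitted = Z @ coef
```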
Müller et al. (2013) and McLean et al. (2014) propose the model
$$y_i = \alpha + \int f\big(x_i(s), s\big)\,ds + \varepsilon_i, \qquad (8)$$
where f(·, ·) is a smooth bivariate function that can be estimated by penalized tensor product B-splines. As an aid to interpretation, note that if sℓ = s0 + ℓΔs, ℓ = 1, …, L, with the grid points s1, …, sL spanning the domain of the functional predictor, then for large L, (8) implies
$$y_i \approx \alpha + \sum_{\ell=1}^{L} f\big(x_i(s_\ell), s_\ell\big)\,\Delta s + \varepsilon_i = \alpha + \sum_{\ell=1}^{L} g_\ell\big(x_i(s_\ell)\big) + \varepsilon_i,$$
where gℓ(x) = f(x, sℓ)Δs. The expression at right shows that (8) is the limit (as L → ∞) of an
additive model—or in the generalized linear extension considered by McLean et al. (2014),
of a generalized additive model (Hastie and Tibshirani, 1990; Wood, 2006). Hence McLean
et al. (2014) employ the term “functional generalized additive model.”
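A rough sketch of the structure of (8): the integral is approximated by a Riemann sum over grid points, f(x, s) is expanded in a tensor product of Gaussian bump bases in the x and s directions (standing in for the penalized tensor-product B-splines of McLean et al., 2014), and the coefficients are estimated by ridge-penalized least squares. Names are hypothetical and the data simulated.

```python
import numpy as np

rng = np.random.default_rng(5)
n, p = 150, 81
s = np.linspace(0, 1, p)
ds = s[1] - s[0]
X = np.cumsum(rng.standard_normal((n, p)), axis=1) * np.sqrt(ds)
y = (np.cos(X) * s).sum(axis=1) * ds + 0.1 * rng.standard_normal(n)   # true f(x, s) = s cos(x)

# Tensor-product bump bases in the x and s directions
kx, ks = 8, 8
cx = np.linspace(X.min(), X.max(), kx)
cs = np.linspace(0, 1, ks)
Bx = np.exp(-0.5 * ((X[:, :, None] - cx) / (0.5 * (cx[1] - cx[0]))) ** 2)   # n x p x kx
Bs = np.exp(-0.5 * ((s[:, None] - cs) / (0.5 * (cs[1] - cs[0]))) ** 2)      # p x ks

# Features: Riemann-sum approximation of the integral of b_u(x_i(s)) b_v(s) ds
Z = np.einsum('ilu,lv->iuv', Bx, Bs).reshape(n, kx * ks) * ds

lam = 1e-2
theta = np.linalg.solve(Z.T @ Z + lam * np.eye(kx * ks), Z.T @ y)   # ridge-penalized fit
fitted = Z @ theta
```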
A still more flexible approach, emphasized by Ferraty and Vieu (2006), leaves the regression operator entirely unspecified, i.e.,
$$y_i = m(x_i) + \varepsilon_i \qquad (9)$$
for some operator m mapping predictor functions to the real line. Note that mathematically the FLM can also be formulated as
an operator, but as one that is linear—whereas Ferraty and Vieu (2006) focus primarily on
nonlinear operators m. (For further discussion of the terms nonlinear and nonparametric
SoFR, see Section 9.)
A natural estimator of m is the functional Nadaraya–Watson (NW; Nadaraya, 1964; Watson, 1964) kernel estimator
$$\hat m(x) = \frac{\sum_{i=1}^{n} K\{d(x, x_i)/h\}\, y_i}{\sum_{i=1}^{n} K\{d(x, x_i)/h\}}, \qquad (10)$$
where K(·) is a kernel function, which we define as a function supported and decreasing on [0, ∞); h > 0 is a bandwidth; and d(·, ·) is a semi-metric. Here we define a semi-metric on the space of predictor functions as a non-negative, symmetric function d(·, ·) that satisfies the triangle inequality, but for which d(f1, f2) = 0 does not imply f1 = f2. (Such a function is often called a “pseudo-metric”; our terminology has the advantage of implying that a semi-norm ∥·∥ on that space induces a semi-metric d(f1, f2) = ∥f1 − f2∥.) Smaller values of d(x, xi) imply larger K{d(x, xi)/h} and thus larger weight assigned to yi.
Ideally, the bandwidth h should strike a good balance between the squared bias of m̂(x) (which increases with h) and its variance (which decreases as h increases) (Ferraty et al., 2007).
2007). Rachdi and Vieu (2007) consider a functional cross-validation method for bandwidth
selection, and prove its asymptotic optimality. Shang (2013, 2014a,b, 2015) and Zhang et al.
(2014) propose a Bayesian method for simultaneously selecting the bandwidth and the
unknown error density, and show that it attains greater estimation accuracy than functional
cross-validation.
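The following sketch computes the functional NW estimator (10) with the L2 distance as semi-metric and a Gaussian-type kernel, choosing the bandwidth by leave-one-out cross-validation over a small grid, in the spirit (but not the detail) of the methods just cited. Names are hypothetical and the data simulated.

```python
import numpy as np

rng = np.random.default_rng(6)
n, p = 120, 101
t = np.linspace(0, 1, p)
dt = t[1] - t[0]
X = np.cumsum(rng.standard_normal((n, p)), axis=1) * np.sqrt(dt)
y = np.sqrt((X ** 2).sum(axis=1) * dt) + 0.1 * rng.standard_normal(n)   # a nonlinear operator

# Pairwise L2 distances between curves (one simple choice of semi-metric)
D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2) * dt)

# Leave-one-out cross-validation over a small grid of bandwidths
grid = np.quantile(D[D > 0], [0.05, 0.1, 0.2, 0.3, 0.5])
cv = []
for h in grid:
    K = np.exp(-0.5 * (D / h) ** 2)      # Gaussian-type kernel, decreasing on [0, inf)
    np.fill_diagonal(K, 0.0)             # leave each curve out of its own prediction
    pred = (K @ y) / K.sum(axis=1)
    cv.append(np.mean((y - pred) ** 2))
h_opt = grid[int(np.argmin(cv))]

# NW prediction (10) at the first curve, using the selected bandwidth
w = np.exp(-0.5 * (D[0] / h_opt) ** 2)
print(h_opt, (w @ y) / w.sum())
```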
Observe that if the fixed bandwidth h in (10) is replaced by hk(x), the kth-smallest of the
distances d(x, x1), …, d(x, xn), then we instead have a functional version of (weighted) k-
nearest neighbors regression (Burba et al., 2009).
The performance of the functional NW estimator can depend crucially on the chosen
semi-metric (Geenens, 2011). Optimal selection of the semi-metric is discussed by Ferraty
and Vieu (2006, Chapters 3 and 13) and is addressed using marginal likelihood by Shang
(2015).
For smooth functional data, it may be appropriate to use the derivative-based semi-metric
$$d_q^{\mathrm{deriv}}(f_1, f_2) = \left[\int \left\{f_1^{(q)}(t) - f_2^{(q)}(t)\right\}^2 dt\right]^{1/2},$$
where f^(q) denotes the qth derivative. For non-smooth functional data, it may be preferable to adopt a semi-metric based on FPCA truncated at A components,
$$d_A^{\mathrm{FPCA}}(f_1, f_2) = \left[\sum_{j=1}^{A}\left\{\int \big(f_1(t) - f_2(t)\big)\,\phi_j(t)\,dt\right\}^2\right]^{1/2} = \left[\int \big\{\hat f_1(t) - \hat f_2(t)\big\}^2 dt\right]^{1/2},$$
where the last expression uses truncated expansions f̂1 and f̂2 defined as in (3).
on functional partial least squares components (Preda and Saporta, 2005; Reiss and Ogden,
2007) can be defined analogously. Chung et al. (2014) introduced a semi-metric based on
thresholded wavelet coefficients of the functional data objects.
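For concreteness, here is a small sketch of the two semi-metrics just described, computed on a common dense grid with finite differences standing in for derivatives; names are hypothetical and the data simulated.

```python
import numpy as np

rng = np.random.default_rng(7)
p = 101
t = np.linspace(0, 1, p)
dt = t[1] - t[0]
X = np.cumsum(rng.standard_normal((20, p)), axis=1) * np.sqrt(dt)   # a small sample of curves

def d_deriv(f1, f2, q=1):
    """Semi-metric based on the q-th derivative, via finite differences."""
    g = np.diff(f1 - f2, n=q) / dt ** q
    return np.sqrt((g ** 2).sum() * dt)

def d_fpca(f1, f2, sample, A=3):
    """Semi-metric based on the leading A FPCs estimated from `sample`."""
    Xc = sample - sample.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    phi = Vt[:A] / np.sqrt(dt)                        # estimated eigenfunctions
    proj = ((f1 - f2) * phi).sum(axis=1) * dt         # projections of f1 - f2 onto each FPC
    return np.sqrt((proj ** 2).sum())

print(d_deriv(X[0], X[1], q=1), d_fpca(X[0], X[1], X, A=3))
```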
Given a set of possible semi-metrics with no a priori preference for any particular option,
one can select the one that minimizes a prediction error criterion such as a cross-validation
score; more generally one can adopt an ensemble predictor (see Section 6.2.2 and Fuchs et
al., 2015).
As an alternative to this “local constant” estimator, Baíllo and Grané (2009) consider a
functional analogue of local polynomial smoothing (Fan and Gijbels, 1996), specifically a
local (functional) linear approximation, m(xi) ≈ a + ⟨b, xi − x⟩ for xi near x, fitted by weighted least squares with kernel weights K{d(x, xi)/h}; the fitted intercept â yields the functional local linear estimate m̂(x) = â. Barrientos-Marin et al.
(2010) propose a compromise between the NW (local constant) and local linear estimators,
while Boj et al. (2010) offer a formulation based on more general distances.
A different nonparametric approach begins with a positive definite kernel k(·, ·) defined on pairs of predictor functions. Any such k(·, ·) (not to be confused with the univariate kernel function K(·) of (10)) defines a reproducing kernel Hilbert space (RKHS) ℋk of maps from the predictor space to the real line, equipped with an inner product ⟨·, ·⟩k and norm ∥·∥k. Preda (2007) considers more general loss functions, but for squared error loss his proposed estimate of m in (9) is
$$\hat m = \mathop{\arg\min}_{f \in \mathcal H_k}\; \sum_{i=1}^{n}\{y_i - f(x_i)\}^2 + \lambda \|f\|_k^2, \qquad (11)$$
where λ is a non-negative regularization parameter, as in criterion (2) for the FLM. A key RKHS result, the representer theorem (Kimeldorf and Wahba, 1971; Schölkopf et al., 2001), ensures that the minimizer of (11) can be written as a finite combination of kernel evaluations, m̂(·) = Σi ci k(xi, ·), so that (11) reduces to a finite-dimensional problem in the coefficients c1, …, cn.
Note that this nonparametric formulation is distinct from the RKHS approach of Cai and
Yuan (2012) to the functional linear model (1)—in which the coefficient function β(·), rather
than the (generally nonlinear) map , is viewed as an element of an RKHS—as
well as from the RKHS method of Zhu et al. (2014).
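A minimal kernel ridge regression sketch in the spirit of (11): the kernel is taken, for illustration only, to be a Gaussian function of the L2 distance between curves, and the representer-theorem coefficients are obtained by solving a single linear system. This is not Preda's (2007) construction; names are hypothetical and the data simulated.

```python
import numpy as np

rng = np.random.default_rng(8)
n, p = 100, 101
t = np.linspace(0, 1, p)
dt = t[1] - t[0]
X = np.cumsum(rng.standard_normal((n, p)), axis=1) * np.sqrt(dt)
y = np.tanh(X @ np.sin(2 * np.pi * t) * dt) + 0.1 * rng.standard_normal(n)

def gram(curves_a, curves_b, sigma=1.0):
    """Gaussian kernel in the L2 distance between curves (an assumed choice)."""
    d2 = ((curves_a[:, None, :] - curves_b[None, :, :]) ** 2).sum(axis=2) * dt
    return np.exp(-d2 / (2 * sigma ** 2))

lam = 1e-2
K = gram(X, X)
c = np.linalg.solve(K + lam * np.eye(n), y)    # representer-theorem coefficients

# Predictions at new curves: m_hat(x) = sum_i c_i k(x_i, x)
X_new = np.cumsum(rng.standard_normal((5, p)), axis=1) * np.sqrt(dt)
pred = gram(X_new, X) @ c
print(pred)
```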
As a further link between the (reproducing) kernel approach of Preda (2007) and other
nonparametric approaches (such as the NW estimator) that are based on (semi-metric)
distances among functions, we note that there is a well-known duality between kernels and
distances (e.g., Faraway, 2012, p. 410). Since both kernels and distances can be defined for
more general data types than functional data, the nonparametric FDA paradigm is readily
extensible to “object-oriented” data analysis (Marron and Alonso, 2014).
Real applications frequently call for features beyond the basic setup of a single functional predictor and a continuous scalar response: additional scalar covariates, multiple functional predictors, and models appropriate for responses that arise from a general exponential-family distribution. In this section we describe some of these generalizations and extensions.
Aneiros-Pérez and Vieu (2006, 2008) study “semi-functional partial linear” models
$$y_i = \mathbf z_i^{\mathsf T}\boldsymbol\theta + m(x_i) + \varepsilon_i, \qquad (12)$$
which include linear effects of scalar covariates zi, estimated using weighted least squares, and an effect of the functional predictor xi, estimated nonparametrically via NW weights.
A number of recent papers have considered the situation in which the ith observation includes multiple functional predictors xi1(·), …, xiR(·), possibly with different domains 𝒯1, …, 𝒯R. The FLM (1) extends naturally to the multiple functional regression model
$$y_i = \alpha + \sum_{r=1}^{R} \int_{\mathcal T_r} x_{ir}(t)\,\beta_r(t)\,dt + \varepsilon_i.$$
One can also consider two types of functional interaction terms. An interaction between a
scalar and a functional predictor (e.g., McKeague and Qian, 2014) is formally similar to
another functional predictor, whereas an interaction between two functional predictors (e.g.,
Yang et al., 2013) resembles a functional quadratic term as in (5).
The “functional additive regression” model of Fan et al. (2015) extends the functional
single-index model (6) to the case of multiple predictors.
Relatedly, Febrero-Bande and González-Manteiga (2013) propose the model
$$y_i = \eta\!\left(\sum_{r=1}^{R} m_r(x_{ir})\right) + \varepsilon_i, \qquad (13)$$
where η can be a known function, or estimated nonparametrically; the mr(·)'s are nonlinear partial functions of the functional predictors xi1, …, xiR. Model (13), like model (8), is referred to by the authors as a
“functional generalized additive model.” Lian (2011) studied the “functional partial linear
regression” model
$$y_i = \alpha + \int x_{i1}(t)\,\beta(t)\,dt + m(x_{i2}) + \varepsilon_i, \qquad (14)$$
which combines linear and nonparametric functional terms. Note that both predictors in (14)
are functional, whereas the linear terms in the “semi-functional” model (12) are scalars.
In all models considered to this point, the response variable has been a continuous, real-
valued scalar. In many practical applications the response is discrete, such as a binary
outcome indicating the presence or absence of a disease. Many of the above methods have
been generalized to allow responses with exponential-family distributions, including both
linear (Marx and Eilers, 1999; James, 2002; Müller and Stadtmüller, 2005; Reiss and Ogden,
2010; Goldsmith et al., 2011; Aguilera-Morillo et al., 2013) and nonlinear (James and
Silverman, 2005; McLean et al., 2014) models. For a single functional predictor and no
Within the linear SoFR model framework, Marx and Eilers (2005) extend their penalized
spline regression to handle higher dimensional signals by expressing the signals in terms of a
tensor B-spline basis and applying both a “row” and a “column” penalty. Reiss and Ogden
(2010) consider two-dimensional brain images as predictors, expressing them in terms of
their eigenimages and enforcing smoothness via radially symmetric penalization. Holan et
al. (2010) and Holan et al. (2012) reduce the dimensionality of the problem by projecting the
images on their (2D) principal components. Guillas and Lai (2010) apply penalized bivariate
spline methods and consider two-dimensional functions on irregular regions. Zhou et al.
(2013) and Zhou and Li (2014) propose methods that exploit the matrix or tensor structure
of image predictors; Huang et al. (2013) and Goldsmith et al. (2014) develop Bayesian
regression approaches for three-dimensional images; and Wang et al. (2014) and Reiss et al.
(2015) describe wavelet-based methods.
Our stated goal in this paper has been to describe and classify major areas of research in
SoFR, and we acknowledge that any attempt to list all possible variants of SoFR would be
futile. In this subsection we very briefly mention a few models that do not fit neatly into the
major paradigms discussed in Sections 2 through 4 or in their direct extensions in Sections
5.1 through 5.5, knowing this list is incomplete.
5.6.1 Other non-iid settings—This review has focused on iid data pairs (xi, yi), with the
exception of Section 5.4. Other departures from the iid assumption have received some
attention in the SoFR literature. For example, Delaigle et al. (2009) considered
heteroscedastic error variance, while Ferraty et al. (2005) studied α-mixing data pairs.
5.6.2 Mixture regression—A data set may be divided into latent classes, such that each
class has a different regression relationship of the form (1). This is the model considered by
Yao et al. (2011), who represent the predictors in terms of their functional principal
components and apply a multivariate mixture regression model fitting technique. Ciarleglio
and Ogden (2016) consider sparse mixture regression in the wavelet domain.
5.6.3 Point impact models—In some situations it may be expected that only one point,
or several points, along the function will be relevant to predicting the outcome. The model
(1) could be adapted to reflect this by replacing the coefficient function β by a Dirac delta
function at some point θ. This is the “point impact” model considered by Lindquist and
McKeague (2009) and McKeague and Sen (2010), who consider various methods for
selecting one or more of these points. Ferraty et al. (2010) consider the same situation but
within a nonparametric setting.
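The basic idea of a single impact point can be sketched by scanning candidate time points and retaining the one whose value xi(θ) best predicts the response in a simple linear regression; this is only a caricature of the estimators cited above, with hypothetical names and simulated data.

```python
import numpy as np

rng = np.random.default_rng(9)
n, p = 200, 101
t = np.linspace(0, 1, p)
X = np.cumsum(rng.standard_normal((n, p)), axis=1) * np.sqrt(t[1] - t[0])
y = 2.0 * X[:, 62] + 0.2 * rng.standard_normal(n)     # true impact point t[62] = 0.62

# Scan candidate impact points; keep the one with smallest residual sum of squares
rss = np.empty(p)
for k in range(p):
    Z = np.column_stack([np.ones(n), X[:, k]])
    coef, res, *_ = np.linalg.lstsq(Z, y, rcond=None)
    rss[k] = res[0] if res.size else ((y - Z @ coef) ** 2).sum()
theta_hat = t[int(np.argmin(rss))]
print("estimated impact point:", theta_hat)
```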
A second type of derivative, studied by Hall et al. (2009), is the functional (Gâteaux)
derivative of the operator m in the nonparametric model (9). Roughly speaking, for a given predictor function X, the functional derivative is a linear operator mX such that for a small increment ΔX we have m(X + ΔX) ≈ m(X) + mX(ΔX). In the linear case (1), the functional derivative is given by the coefficient or slope function, i.e., mX(g) = ∫ β(t) g(t) dt for all X and all g. In the nonparametric case, functional derivatives allow one to estimate
functional gradients, which are in effect locally varying slopes. Müller and Yao (2010)
simplify the study of functional derivatives and gradients by imposing the additive structure, in terms of FPC scores, of the functional additive model discussed above (Müller and Yao, 2008).
5.6.5 Conditional quantiles and mode—Up to now we have been concerned with
modeling the mean of y (or a transformation thereof, in the GLM case), conditional on
functional predictors. Cardot et al. (2005) propose instead to estimate a given quantile of y,
conditional on the functional predictor xi, by minimizing a penalized criterion that is similar to (2), but with the
squared error loss in the first term replaced by the “check function” used in ordinary quantile
regression (Koenker and Bassett, 1978). Chen and Müller (2012), on the other hand,
estimate the entire conditional distribution of y by fitting functional binary GLMs with I(y ≤
y0) (where I(·) denotes an indicator) as response, for a range of values of y0. Quantiles can
then be inferred by inverting the conditional distribution function. Ferraty et al. (2005) also
estimate the entire conditional distribution, but adopt a nonparametric estimator of a
weighted-average form reminiscent of (10). That paper and a number of subsequent ones
have studied applications in the geosciences, such as an extreme value analysis of ozone
concentration (Quintela-del-Río and Francisco-Fernández, 2011). The mode of the
nonparametrically estimated conditional distribution serves as an estimate of the conditional
mode of y, whose convergence rate is derived by Ferraty et al. (2005).
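A small sketch of the weighted-average idea: the conditional distribution function is estimated by NW-type weights applied to indicators I(yi ≤ y0), and a conditional quantile is obtained by inverting it. This simplified version is not any of the cited estimators; names are hypothetical, the semi-metric is the L2 distance, and the bandwidth is fixed crudely.

```python
import numpy as np

rng = np.random.default_rng(10)
n, p = 200, 101
t = np.linspace(0, 1, p)
dt = t[1] - t[0]
X = np.cumsum(rng.standard_normal((n, p)), axis=1) * np.sqrt(dt)
y = X @ np.sin(2 * np.pi * t) * dt + 0.2 * rng.standard_normal(n)

def conditional_quantile(x_new, X, y, tau=0.5, h=1.0):
    """Invert an NW-weighted estimate of the conditional distribution function."""
    d = np.sqrt(((X - x_new) ** 2).sum(axis=1) * dt)   # L2 semi-metric to each sample curve
    w = np.exp(-0.5 * (d / h) ** 2)
    w = w / w.sum()
    order = np.argsort(y)
    cdf = np.cumsum(w[order])                          # estimated F(y | x_new) at the sorted y's
    idx = min(int(np.searchsorted(cdf, tau)), len(y) - 1)
    return y[order][idx]

d0 = np.sqrt(((X - X[0]) ** 2).sum(axis=1) * dt)
print(conditional_quantile(X[0], X, y, tau=0.5, h=np.median(d0[d0 > 0])))
```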
Aside from asymptotic properties, choice of a method may be guided by the kind of
interpretation desired, which may in turn depend on the application. As noted in Section 2,
the FLM offers a coefficient function, which has an intuitive interpretation. Nonlinear and
especially nonparametric model results are less interpretable in this sense. But for
applications in which one is interested only in accurate prediction, this advantage of the
FLM is immaterial.
Regarding the FLM, we noted in Section 2.1 that smoothness of the data is an important
factor when choosing among a priori basis types, such as splines vs. wavelets. Among data-
driven basis approaches to the FLM (Section 2.2), those that rely on a more parsimonious set of components are often preferred; in this regard, FPLS has an advantage over functional principal component regression (FPCR), as emphasized by Delaigle and Hall (2012). On
the other hand, if one is interested only in the coefficient function, not in contributions of the
different components, then the relative simplicity of FPCR is an advantage over FPLS. A
key advantage of FPCR over spline or wavelet methods is that it is more readily applied
when the functional predictors are sampled not densely but sparsely and/or irregularly
(longitudinal data).
We tend to view a wide variety of methods as effective in at least some settings, but one
method for which we have limited enthusiasm is selecting a “best subset” among a large set
of FPCs, as opposed to regressing on the leading FPCs. The leading FPCs offer an optimal approximation of the predictors in the sense of Eckart and Young (1936); if one is not selecting the leading FPCs, then it is not clear why the FPC basis is the right one to use at all. Another
basis that is by construction relevant for explaining y, such as FPLS components, may be
more appropriate. This view finds support in the empirical results of Febrero-Bande et al.
(2015).
Relying on one’s a priori preference is unlikely to be the best strategy for choosing how to
perform SoFR. Next we discuss two ways to let the data help determine the best approach.
A formal hypothesis test can help to decide whether a simpler specification suffices, rather than resorting to functional regression, or whether a simple functional model should give way to a more flexible one. McLean et al. (2015) treat the linear model (1) as the null, to be tested versus the additive model (8), while García-Portugués et al. (2014)
consider testing against a more general alternative. Horváth and Kokoszka (2012) and Zhang
et al. (2014) investigate hypothesis testing procedures to choose the polynomial order in
functional polynomial models such as the quadratic model (5). Delsol et al. (2011) propose a
test for the null hypothesis that m in the nonparametric model (9) belongs to a given family
of operators.
6.2.2 Ensemble predictors—In most practical cases, there are not just two plausible
models—a null and an alternative—but many reasonable options for performing SoFR, and
it is impossible to know in advance which model and estimation strategy will work best for a
given data set. Because of this, Goldsmith and Scheipl (2014) extended the idea of model
stacking (Wolpert, 1992) (or superlearning, van der Laan et al., 2007) to SoFR. A large
collection of estimators is applied to the data set of interest and are evaluated for prediction
accuracy using cross-validation. These estimators are combined into an “ensemble”
predictor based on their individual performance. These authors found that multiple
approaches yielded dramatically different relative performance across several example data
sets—underlining the value of trying a variety of approaches to SoFR when possible.
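A toy sketch of the stacking idea: several base SoFR fits (here, FPCR with different numbers of components) produce cross-validated predictions, which are then combined with non-negative weights normalized to sum to one. The weight-fitting step uses scipy.optimize.nnls; this is an illustration of the general strategy, not Goldsmith and Scheipl's (2014) implementation, and all names and settings are hypothetical.

```python
import numpy as np
from scipy.optimize import nnls

rng = np.random.default_rng(11)
n, p = 150, 101
t = np.linspace(0, 1, p)
dt = t[1] - t[0]
X = np.cumsum(rng.standard_normal((n, p)), axis=1) * np.sqrt(dt)
y = X @ np.sin(2 * np.pi * t) * dt + 0.2 * rng.standard_normal(n)

def fpcr_predict(Xtr, ytr, Xte, A):
    """Toy base learner: functional principal component regression with A components."""
    mu = Xtr.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xtr - mu, full_matrices=False)
    phi = Vt[:A] / np.sqrt(dt)
    Ztr = np.column_stack([np.ones(len(Xtr)), (Xtr - mu) @ phi.T * dt])
    Zte = np.column_stack([np.ones(len(Xte)), (Xte - mu) @ phi.T * dt])
    coef = np.linalg.lstsq(Ztr, ytr, rcond=None)[0]
    return Zte @ coef

learners = [lambda Xtr, ytr, Xte: fpcr_predict(Xtr, ytr, Xte, A=2),
            lambda Xtr, ytr, Xte: fpcr_predict(Xtr, ytr, Xte, A=6)]

# Cross-validated predictions from each base learner
folds = np.array_split(rng.permutation(n), 5)
P = np.zeros((n, len(learners)))
for idx in folds:
    tr = np.setdiff1d(np.arange(n), idx)
    for j, fit in enumerate(learners):
        P[idx, j] = fit(X[tr], y[tr], X[idx])

# Non-negative stacking weights, normalized to sum to one
w, _ = nnls(P, y)
w = w / w.sum()
ensemble_pred = P @ w
print("stacking weights:", w)
```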
In R (R Core Team, 2015), the fda package (Ramsay et al., 2009) fits linear models in which the response, the predictor, or both are functional. The fda.usc package (Febrero-Bande and Oviedo de la Fuente,
2012) implements an extensive range of parametric and nonparametric functional regression
methods, including those studied by Ramsay and Silverman (2005), Ferraty and Vieu (2006)
and Febrero-Bande and González-Manteiga (2013). The refund package (Huang et al.,
2015) implements penalized functional regression, including several variants of the FLM
and the additive model (8), and allows for multiple functional predictors and scalar
covariates, as well as generalized linear models. Optimal smoothness selection relies on the
mgcv package of Wood (2006). The mgcv package itself is one of the most flexible and user-
friendly packages for SoFR, allowing for the whole GLASS structure of Eilers and Marx
(2002) plus random effects; see section 5.2 of Wood (2011), which incidentally was the
second paper to use the term “scalar-on-function regression”. In mgcv, functional predictor
terms are treated as just one instance of “linear functional” terms (cf. the “general spline
problem” of Wahba, 1990). The refund.wave package (Huo et al., 2014), a spinoff of
refund, implements scalar-on-function and scalar-on-image regression in the wavelet
domain.
In MATLAB (MathWorks, Natick, MA), the PACE package implements a wide variety of
methods by Müller, Wang, Yao, and co-authors, including a versatile collection of functional
principal component-based regression models for dense and sparsely sampled functional
data. From a Bayesian viewpoint, Crainiceanu and Goldsmith (2010) have developed tools
for functional generalized linear models using WinBUGS (Lunn et al., 2000).
Our work has made us aware of the expanding opportunities to apply SoFR in cutting-edge
biomedical research, a trend that we expect will accelerate in the coming years. For such
applications to be feasible, one must typically move beyond the basic model (1) and
incorporate scalar covariates and/or multiple functional predictors, allow for non-iid data,
and consider hypothesis tests for the coefficient function(s). The following example
illustrates how FLMs can incorporate some of these features (thanks to flexible software
implementation, as advocated in Section 6.1), and can shed light on an interesting scientific question.
For an example in which an ensemble of SoFR methods (Section 6.2.2) is applied to several data sets, we refer the reader to Goldsmith and Scheipl (2014).
The example concerns pain data in which, on each trial, a participant receives a hot or a warm stimulus, reports a pain score, and has the lateral cerebellum BOLD signal xij(·) recorded over the trial (see Figure 1(a)). We fit the mixed model
$$y_{ij} = \beta_0 + \delta\, h_{ij} + \alpha_i + \int_{\mathcal T} x_{ij}(t)\,\beta(t)\,dt + \varepsilon_{ij}, \qquad (15)$$
in which yij is the log pain score for the ith participant's jth trial, hij is an indicator for a hot stimulus, the αi's are iid normally distributed random intercepts, 𝒯 is the time interval of the trial, and the εij's are iid normally distributed errors with mean zero. Unsurprisingly, the
expected difference in log pain score between hot and warm trials is found to be highly significant (p < 2·10⁻¹⁶). The coefficient function is also very significantly
nonzero, based on the modified Wald test of Wood (2013) or the restricted likelihood ratio test of Swihart et al. (2014).
In particular, Figure 1(b) shows that the estimated coefficient function β̂(·) is clearly positive for the same time points that evince a warm/hot discrepancy in Figure 1(a). One could venture a “functional collinearity” explanation for this, in view of the high correlation (0.28) between the indicator variable and the fitted functional predictor term ∫ xij(t) β̂(t) dt. In other words, one might suspect that the highly positive values of β̂(t) around the 20- to 28-second range are just an artifact of the higher BOLD signal in that range for hot versus warm trials. But that seems incorrect, since β̂(·) looks very similar even when we fit separate models with only the warm or only the hot
trials (Figure 1(c)). A more cogent explanation is that brain activity detected by the BOLD signal partially mediates the painful effect of the hot stimulus, a causal interpretation along the lines of the functional mediation analysis of Lindquist (2012).
The SoFR implementations of Wood (2011) in the R package mgcv and of Goldsmith et al.
(2011) in the refund package—which were used to create Figure 1(b) and (c), respectively—
make it routine to include scalar covariates and random effects, and to test the significance
of the coefficient function, as we have done here. It is also straightforward to test for a
“scalar-by-function” interaction between stimulus type and BOLD signal (found to be non-
significant); or to include multiple BOLD signal predictors. Besides the lateral cerebellum,
the data set includes such functional predictors for 20 other pain-relevant regions. Several of
these have significant effects on pain, even adjusting for the lateral cerebellum signal, but the
associated increments in explained variance are quite small.
9 Discussion
This paper has emphasized methodological and practical aspects of some widely used SoFR
models. Just a few brief words will have to suffice regarding asymptotic issues. For
functional (generalized) linear models (see Cardot and Sarda, 2011, for a survey),
convergence rates have been studied both for estimation of the coefficient function (e.g.,
Müller and Stadtmüller, 2005; Hall and Horowitz, 2007; Dou et al., 2012) and for prediction
of the response (e.g., Cai and Hall, 2006; Crambes et al., 2009); see Hsing and Eubank
(2015) for a succinct treatment, and Reimherr (2015) for a recent contribution on the key
role played by eigenvalues of the covariance operator. A few recent papers (e.g., Wang et al.,
2014; Brunel and Roche, 2015) have derived non-asymptotic error bounds for FLMs. In
nonparametric FDA, only prediction error is considered, with small-ball probability playing
a central role. Ferraty and Vieu (2011) review both pointwise and uniform convergence
results. A detailed analysis of the functional NW estimator’s asymptotic properties was
recently provided by Geenens (2015).
Regarding the term nonparametric in the FDA context, Ferraty et al. (2005) explain: “We use
the terminology functional nonparametric, where the word functional refers to the infinite
dimensionality of the data and where the word nonparametric refers to the infinite
dimensionality of the model.” But some would argue that, since the coefficient function in
(1) lies in an infinite-dimensional space, this nomenclature makes even the functional linear
model “nonparametric”. While a fully satisfying definition may be elusive, we find it most
helpful to think of nonparametric SoFR as an analogue of ordinary nonparametric
regression, i.e., as extending the nonparametric model (9) to the case of functional
predictors.
Our use of the term nonlinear in Section 3 can likewise be questioned, since that term
applies equally well to nonparametric SoFR. But we could find no better term to encompass
models that are not linear but that impose more structure than the general model (9). Note
that in non-functional statistics as well, nonlinear usually refers to models that have some
parametric structure, as opposed to leaving the mean completely unspecified.
Other problems of nomenclature have arisen as methods for SoFR have proliferated. We
have seen that “functional (generalized) additive models” may be additive with respect to
FPC scores (Müller and Yao, 2008; Zhu et al., 2014); points along a functional predictor
domain (Müller et al., 2013; McLean et al., 2014); or multiple functional predictors with
either nonlinear (Fan et al., 2015) or nonparametric (Febrero-Bande and González-Manteiga,
2013) effects. Likewise, authors have considered every combination of scalar linear (SL),
functional linear (FL) and functional nonparametric (FNP) terms, so that “partial linear”
models may refer to any two of these three: SL+FL (Shin, 2009), SL+FNP (Aneiros-Pérez
and Vieu, 2008), or FL+FNP (Lian, 2011). Finally, some of the nonlinear approaches of
Section 3 are sometimes termed “nonparametric” since they incorporate general smooth link
functions. One of our aims here has been to reduce terminological confusion.
We hope that, by distilling some key ideas and approaches from what is now a sprawling
literature, we have provided readers with useful guidance for implementing scalar-on-
function regression in the growing number of domains in which it can be applied.
Acknowledgments
We thank the Co-Editor-in-Chief, Prof. Marc Hallin, and the Associate Editor and referees, whose feedback enabled
us to improve the manuscript significantly. We also thank Prof. Martin Lindquist for providing the pain data, whose
collection was supported by the U.S. National Institutes of Health through grants R01MH076136-06 and
R01DA035484-01, and Pei-Shien Wu for assistance with the bibliography. Philip Reiss’ work was supported in part
by grant 1R01MH095836-01A1 from the National Institute of Mental Health. Jeff Goldsmith's work was supported
in part by grants R01HL123407 from the National Heart, Lung, and Blood Institute and R21EB018917 from the
National Institute of Biomedical Imaging and Bioengineering. Han Lin Shang’s work was supported in part by a
Research School Grant from the ANU College of Business and Economics. Todd Ogden's work was supported in
part by grant 5R01MH099003 from the National Institute of Mental Health.
References
Aguilera AM, Escabias M, Preda C, Saporta G. Using basis expansions for estimating functional PLS
regression: applications with chemometric data. Chemometrics and Intelligent Laboratory Systems.
2010; 104(2):289–305.
Aguilera-Morillo MC, Aguilera AM, Escabias M, Valderrama MJ. Penalized spline approaches for
functional logit regression. Test. 2013; 22(2):251–277.
Ait-Säidi A, Ferraty F, Kassa R, Vieu P. Cross-validated estimations in the single-functional index
model. Statistics. 2008; 42(6):475–494.
Amato U, Antoniadis A, De Feis I. Dimension reduction in functional regression with applications.
Computational Statistics & Data Analysis. 2006; 50(9):2422–2446.
Aneiros-Pérez, G., Cao, R., Vilar-Fernández, JM., Muñoz-San-Roque, A. Recent Advances in
Functional Data Analysis and Related Topics. Springer-Verlag; Berlin: 2011. Functional prediction
for the residual demand in electricity spot markets.
Aneiros-Pérez G, Vieu P. Semi-functional partial linear regression. Statistics and Probability Letters.
2006; 76(11):1102–1110.
Aneiros-Pérez G, Vieu P. Nonparametric time series prediction: A semi-functional partial linear
modeling. Journal of Multivariate Analysis. 2008; 99(5):834–857.
Araki Y, Kawaguchi A, Yamashita F. Regularized logistic discrimination with basis expansions for the
early detection of Alzheimer’s disease based on three-dimensional MRI data. Advances in Data
Analysis and Classification. 2013; 7(1):109–119.
Baíllo A, Grané A. Local linear regression for functional predictor and scalar response. Journal of
Multivariate Analysis. 2009; 100(1):102–111.
Barrientos-Marin J, Ferraty F, Vieu P. Locally modelled regression and functional data. Journal of
Nonparametric Statistics. 2010; 22(5):617–632.
Boj E, Delicado P, Fortiana J. Distance-based local linear regression for functional predictors.
Cai TT, Yuan M. Minimax and adaptive prediction for functional linear regression. Journal of the
American Statistical Association. 2012; 107:1201–1216.
Cardot H, Crambes C, Sarda P. Quantile regression when the covariates are functions. Journal of
Nonparametric Statistics. 2005; 17(7):841–856.
Cardot H, Ferraty F, Mas A, Sarda P. Testing hypotheses in the functional linear model. Scandinavian
Journal of Statistics. 2003; 30(1):241–255.
Cardot H, Ferraty F, Sarda P. Functional linear model. Statistics and Probability Letters. 1999; 45(1):
11–22.
Cardot H, Ferraty F, Sarda P. Spline estimators for the functional linear model. Statistica Sinica. 2003;
13:571–591.
Cardot, H., Sarda, P. Functional linear regression. In: Ferraty, F., Romain, Y., editors. The Oxford
Handbook of Functional Data Analysis. Oxford University Press; New York: 2011. p. 21-46.
Chen D, Hall P, Müller H-G. Single and multiple index functional regression models with
nonparametric link. Annals of Statistics. 2011; 39(3):1720–1747.
Chen K, Müller H-G. Conditional quantile analysis when covariates are functions, with application to
growth data. Journal of the Royal Statistical Society, Series B. 2012; 74(1):67–89.
Chung C, Chen Y, Ogden RT. Functional data classification: A wavelet approach. Computational
Statistics. 2014; 14:1497–1513.
Ciarleglio A, Ogden RT. Wavelet-based scalar-on-function finite mixture regression models.
Computational Statistics and Data Analysis. 2016; 93:86–96. [PubMed: 26512156]
Crainiceanu CM, Goldsmith J. Bayesian functional data analysis using WinBUGS. Journal of
Statistical Software. 2010; 32:1–33.
Crainiceanu CM, Staicu AM, Di C-Z. Generalized multilevel functional regression. Journal of the
American Statistical Association. 2009; 104:1550–1561. [PubMed: 20625442]
Crambes C, Kneip A, Sarda P. Smoothing splines estimators for functional linear regression. Annals of
Statistics. 2009; 37(1):35–72.
Craven P, Wahba G. Smoothing noisy data with spline functions: estimating the correct degree of
smoothing by the method of generalized cross-validation. Numerische Mathematik. 1979; 31(4):
377–403.
Cuevas A. A partial overview of the theory of statistics with functional data. Journal of Statistical
Faraway JJ. Backscoring in principal coordinates analysis. Journal of Computational and Graphical
Statistics. 2012; 21(2):394–412.
Ferré L, Yao AF. Functional sliced inverse regression analysis. Statistics. 2003; 37(6):475–488.
Ferré L, Yao A-F. Smoothed functional inverse regression. Statistica Sinica. 2005; 15:665–683.
Fuchs K, Gertheiss J, Tutz G. Nearest neighbor ensembles for functional data with interpretable feature
selection. Chemometrics and Intelligent Laboratory Systems. 2015; 146:186–197.
García-Portugués E, González-Manteiga W, Febrero-Bande M. A goodness-of-fit test for the
functional linear model with scalar responses. Journal of Computational and Graphical Statistics.
2014; 23(3):761–778.
Geenens G. Curse of dimensionality and related issues in nonparametric functional regression.
Statistics Surveys. 2011; 5:30–43.
Geenens G. Moments, errors, asymptotic normality and large deviation principle in nonparametric
functional regression. Statistics & Probability Letters. 2015; 107:369–377.
Gertheiss J, Goldsmith J, Crainiceanu C, Greven S. Longitudinal scalar-on-functions regression with
application to tractography data. Biostatistics. 2013; 14(3):447–461. [PubMed: 23292804]
Gertheiss J, Maity A, Staicu A-M. Variable selection in generalized functional linear models. Stat.
Goldsmith J, Wand MP, Crainiceanu CM. Functional regression via variational Bayes. Electronic
Journal of Statistics. 2011; 5:572–602. [PubMed: 22163061]
Goutis C. Second-derivative functional regression with applications to near infra-red spectroscopy.
Journal of the Royal Statistical Society, Series B. 1998; 60(1):103–114.
Greven S, Crainiceanu CM, Caffo B, Reich D. Longitudinal functional principal component analysis.
Electronic Journal of Statistics. 2010; 4:1022–1054. [PubMed: 21743825]
Gu, C. Smoothing Spline ANOVA Models. 2nd ed.. Springer; New York: 2013.
Guillas S, Lai M-J. Bivariate splines for spatial functional regression models. Journal of
Nonparametric Statistics. 2010; 22(4):477–497.
Hall P, Horowitz JL. Methodology and convergence rates for functional linear regression. Annals of
Statistics. 2007; 35(1):70–91.
Hall P, Müller H-G, Yao F. Estimation of functional derivatives. Annals of Statistics. 2009; 37(6A):
3307–3329.
Hall P, Vial C. Assessing the finite dimensionality of functional data. Journal of the Royal Statistical
Lei J. Adaptive global testing for functional linear models. Journal of the American Statistical
Association. 2014; 109:624–634.
Li Y, Wang N, Carroll RJ. Selecting the number of principal components in functional data. Journal of
the American Statistical Association. 2013; 108:1284–1294.
Lian H. Functional partial linear model. Journal of Nonparametric Statistics. 2011; 23(1):115–128.
Lian H. Shrinkage estimation and selection for multiple functional regression. Statistica Sinica. 2013;
23:51–74.
Lindquist M, McKeague I. Logistic regression with Brownian-like predictors. Journal of the American
Statistical Association. 2009; 104:1575–1585.
Lindquist MA. Functional causal mediation analysis with an application to brain connectivity. Journal
of the American Statistical Association. 2012; 107:1297–1309. [PubMed: 25076802]
Lunn DJ, Thomas A, Best N, Spiegelhalter D. WinBUGS—A Bayesian modelling framework:
concepts, structure, and extensibility. Statistics and Computing. 2000; 10(4):325–337.
Malloy E, Morris J, Adar S, Suh H, Gold D, Coull B. Wavelet-based functional linear mixed models:
an application to measurement error-corrected distributed lag models. Biostatistics. 2010; 11(3):
432–452. [PubMed: 20156988]
Marron JS, Alonso AM. Overview of object oriented data analysis. Biometrical Journal. 2014; 56(5):
732–753. [PubMed: 24421177]
Marx BD, Eilers PHC. Generalized linear regression on sampled signals and curves: a P-spline
approach. Technometrics. 1999; 41(1):1–13.
Marx BD, Eilers PHC. Multidimensional penalized signal regression. Technometrics. 2005; 47(1):12–
22.
McKeague IW, Qian M. Estimation of treatment policies based on functional predictors. Statistica
Sinica. 2014; 24(3):1461. [PubMed: 25165416]
McKeague IW, Sen B. Fractals with point impact in functional linear regression. Annals of Statistics.
2010; 38:2559–2586. [PubMed: 23785219]
McLean MW, Hooker G, Ruppert D. Restricted likelihood ratio tests for linearity in scalar-on-function
regression. Statistics and Computing. 2015; 25(5):997–1008.
McLean MW, Hooker G, Staicu A-M, Scheipl F, Ruppert D. Functional generalized additive models.
Journal of Computational and Graphical Statistics. 2014; 23(1):249–269. [PubMed: 24729671]
Morris JS. Functional regression. Annual Review of Statistics and Its Application. 2015; 2:321–359.
Müller H-G, Stadtmüller U. Generalized functional linear models. Annals of Statistics. 2005; 33(2):
774–805.
Müller H-G, Wu Y, Yao F. Continuously additive models for nonlinear functional regression.
Biometrika. 2013; 100(3):607–622.
Müller H-G, Yao F. Functional additive models. Journal of the American Statistical Association. 2008;
103:1534–1544.
Müller H-G, Yao F. Additive modelling of functional gradients. Biometrika. 2010; 97(4):791–805.
Nadaraya EA. On estimating regression. Theory of Probability & Its Applications. 1964; 9(1):141–
142.
Ogden, RT. Essential Wavelets for Statistical Applications and Data Analysis. Birkhauser; Boston:
1997.
Preda C. Regression models for functional data by reproducing kernel Hilbert spaces methods. Journal
of Statistical Planning and Inference. 2007; 137(3):829–840.
Preda C, Saporta G. PLS regression on a stochastic process. Computational Statistics & Data Analysis.
2005; 48(1):149–158.
Quintela-del-Río A, Francisco-Fernández M. Nonparametric functional data estimation applied to
ozone data: prediction and extreme value analysis. Chemosphere. 2011; 82(6):800–808.
[PubMed: 21144549]
R Core Team. R: a language and environment for statistical computing. R Foundation for Statistical
Computing; Vienna, Austria: 2015.
Rachdi M, Vieu P. Nonparametric regression for functional data: automatic smoothing parameter
selection. Journal of Statistical Planning and Inference. 2007; 137(9):2784–2801.
Ramsay, JO., Hooker, G., Graves, S. Functional Data Analysis with R and MATLAB. Springer; New
York: 2009.
Ramsay, JO., Silverman, BW. Functional Data Analysis. 2nd ed.. Springer; New York: 2005.
Randolph TW, Harezlak J, Feng Z. Structured penalties for functional linear models—partially
empirical eigenvectors for regression. Electronic Journal of Statistics. 2012; 6:323–353.
[PubMed: 22639702]
Ratcliffe SJ, Leader LR, Heller GZ. Functional data analysis with application to periodically
stimulated fetal heart rate data: I. Functional regression. Statistics in Medicine. 2002; 21(8):
1103–1114. [PubMed: 11933036]
Reimherr M. Functional regression with repeated eigenvalues. Statistics & Probability Letters. 2015;
107:62–70.
Reiss PT, Huang L, Mennes M. Fast function-on-scalar regression with penalized basis expansions.
International Journal of Biostatistics. 2010; 6 Article 28.
Reiss PT, Huo L, Zhao Y, Kelly C, Ogden RT. Wavelet-domain regression and predictive inference in
psychiatric neuroimaging. Annals of Applied Statistics. 2015; 9(2):1076–1101. [PubMed:
27330652]
Reiss PT, Ogden RT. Functional principal component regression and functional partial least squares.
Journal of the American Statistical Association. 2007; 102:984–996.
Reiss PT, Ogden RT. Smoothing parameter selection for a class of semiparametric linear models.
Journal of the Royal Statistical Society: Series B. 2009; 71(2):505–523.
Reiss PT, Ogden RT. Functional generalized linear models with images as predictors. Biometrics.
2010; 66(1):61–69. [PubMed: 19432766]
Rice J, Silverman B. Estimating the mean and covariance structure nonparametrically when the data
are curves. Journal of the Royal Statistical Society, Series B. 1991; 53:233–243.
Ruppert, D., Wand, MP., Carroll, RJ. Semiparametric Regression. Cambridge University Press;
Cambridge: 2003.
Schölkopf, B., Herbrich, R., Smola, AJ. A generalized representer theorem. In: Helmbold, D.,
Williamson, B., editors. Volume 2111 of Lecture Notes in Artificial Intelligence; Computational
Learning Theory: 14th Annual Conference on Computational Learning Theory, COLT 2001 and
5th European Conference on Computational Learning Theory, EuroCOLT 2001; Berlin and
Silverman BW. Smoothed functional principal components analysis by choice of norm. Annals of
Statistics. 1996; 24(1):1–24.
Stoker TM. Consistent estimation of scaled coefficients. Econometrica. 1986; 54(6):1461–1481.
Swihart BJ, Goldsmith J, Crainiceanu CM. Restricted likelihood ratio tests for functional effects in the
functional linear model. Technometrics. 2014; 56:483–493.
van der Laan MJ, Polley EC, Hubbard AE. Super learner. Statistical Applications in Genetics and
Molecular Biology. 2007; 6(25)
Wahba, G. Spline Models for Observational Data. Society for Industrial and Applied Mathematics;
Philadelphia: 1990.
Wang, J-L., Chiou, J-M., Müller, H-G. Review of functional data analysis. 2015. arXiv preprint arXiv:
1507.05135
Wang X, Nan B, Zhu J, Koeppe R, Alzheimer’s Disease Neuroimaging Initiative. Regularized 3D
functional regression for brain image data via Haar wavelets. Annals of Applied Statistics. 2014;
8(2):1045–1064. [PubMed: 26082826]
Watson GS. Smooth regression analysis. Sankhya A. 1964; 26:359–372.
Wolpert DH. Stacked generalization. Neural Networks. 1992; 5:241–259.
Wood, SN. Generalized Additive Models: An Introduction with R. Chapman & Hall; London: 2006.
Wood SN. Fast stable restricted maximum likelihood and marginal likelihood estimation of
semiparametric generalized linear models. Journal of the Royal Statistical Society, Series B.
2011; 73(1):3–36.
Wood SN. On p-values for smooth components of an extended generalized additive model.
Biometrika. 2013; 100(1):221–228.
Yang W-H, Wikle CK, Holan SH, Wildhaber ML. Ecological prediction with nonlinear multivariate
Zhao Y, Ogden RT, Reiss PT. Wavelet-based LASSO in functional linear regression. Journal of
Computational and Graphical Statistics. 2012; 21(3):600–617. [PubMed: 23794794]
Zhou H, Li L. Regularized matrix regression. Journal of the Royal Statistical Society, Series B. 2014;
76(2):463–483.
Zhou H, Li L, Zhu H. Tensor regression with applications in neuroimaging data analysis. Journal of the
American Statistical Association. 2013; 108:540–552. [PubMed: 24791032]
Zhu H, Vannucci M, Cox DD. A Bayesian hierarchical model for classification with selection of
functional predictors. Biometrics. 2010; 66(2):463–473. [PubMed: 19508236]
Zhu H, Yao F, Zhang HH. Structured functional additive regression in reproducing kernel Hilbert
spaces. Journal of the Royal Statistical Society, Series B. 2014; 76(3):581–603.
Figure 1.
(a) Overall mean lateral cerebellum BOLD signal at each of the 23 time points, with
approximate 95% confidence intervals, for warm- and hot-stimulus trials. (b) Estimated
coefficient function, with approximate pointwise 95% confidence intervals, for model (15).
(c) Coefficient function estimate for the full data set (as in (b)) compared with those obtained
with only the hot or only the warm trials.