
Statistica Sinica 23 (2013), 1347-1371

doi:http://dx.doi.org/10.5705/ss.2012.024

NONPARAMETRIC KERNEL REGRESSION WITH MULTIPLE PREDICTORS AND MULTIPLE SHAPE CONSTRAINTS

Pang Du, Christopher F. Parmeter and Jeffrey S. Racine

Virginia Tech, University of Miami and McMaster University

Abstract: Nonparametric smoothing under shape constraints has recently received much well-deserved attention. Powerful methods have been proposed for imposing a single shape constraint such as monotonicity and concavity on univariate functions. In this paper, we extend the monotone kernel regression method in Hall and Huang (2001) to the multivariate and multi-constraint setting. We impose equality and/or inequality constraints on a nonparametric kernel regression model and its derivatives. A bootstrap procedure is also proposed for testing the validity of the constraints. Consistency of our constrained kernel estimator is provided through an asymptotic analysis of its relationship with the unconstrained estimator. Theoretical underpinnings for the bootstrap procedure are also provided. Illustrative Monte Carlo results are presented and an application is considered.

Key words and phrases: Hypothesis testing, multivariate kernel estimation, nonparametric regression, shape restrictions.

1. Introduction
Imposing shape constraints on a regression model is a necessary component
of sound applied data analysis. For example, imposing monotonicity and con-
cavity constraints is often required in a range of application domains. Early
developments in the absence of smoothness restrictions can be found in Robert-
son, Wright, and Dykstra (1988) and the references therein. When data are
believed to have nonlinear structure in addition to obeying shape constraints,
nonparametric smoothing methods such as kernel smoothing and splines are of-
ten used. For example, restricted kernel smoothing has been considered by Muk-
erjee (1988), Mammen (1991), Hall and Huang (2001), Braun and Hall (2001),
Hall and Kang (2005), Birke and Dette (2007), and Carroll, Delaigle, and Hall
(2011), among others; restricted spline smoothing has been studied by Wright
and Wegman (1980), Ramsay (1988), Mammen and Thomas-Agnan (1999), Pal
and Woodroofe (2007), and Wang and Shen (2010), to name but a few. An ex-
ample of restricted estimation by other smoothing methods is Xu and Phillips
(2012), where positivity is imposed on the estimation of a conditional covariance function by empirical likelihood.
As pointed out in the comprehensive review by Mammen et al. (2001), a com-
mon limitation of these approaches is that they only consider shape constraints
on univariate functions and often only one constraint at a time is entertained.
Extensions to either multivariate functions or multiple constraints have received
far less attention. For example, Dette and Scheder (2006) consider the problem
of estimating a multivariate regression function that is strictly monotone in all
directions by successively applying one-dimensional isotonization procedures to
an initial unconstrained kernel regression estimator, while Birke and Pilz (2009)
impose monotonicity and convexity in the kernel estimation of a one-dimensional call price function.
In practice, however, many applications call for imposing multiple shape
constraints on multivariate functions. Examples include dose-response studies
where multiple drug combinations are applied and the response function is often
assumed to be monotone in the amount of each drug, or economic studies where
the response function must satisfy coordinate-wise monotonicity, coordinate-wise
concavity, and constant returns to scale that essentially require first-order partial
derivatives from a double-log model to sum to one. For such problems, Gallant
(1982) and Gallant and Golub (1984) proposed a series-based estimator called
the Flexible Fourier Form (Gallant (1981)) whose coefficients can be restricted
to impose the relevant shape constraints. Although the method can handle mul-
tiple constraints, it is hard to incorporate certain common constraints such as
monotonicity. Villalobos and Wahba (1987) proposed a constrained thin-plate
spline estimator and applied it to the posterior probability estimation in classi-
fication problems where the probability function must lie between 0 and 1. But
the asymptotic properties of their estimator are not well understood. Matzkin
(1991, 1992) studied constrained maximum likelihood estimators using interpo-
lation, but these estimators are not smooth and are hard to generalize beyond
coordinate-wise monotonicity and concavity. Mammen et al. (2001) discussed
extension of their projection framework for constrained smoothing to nonpara-
metric additive models, which can be considered as a special case of the general
constraints that we consider below. Also, none of the aforementioned approaches
for multiple constraints propose hypothesis testing procedures for the constraints
themselves.
In this paper, we propose a kernel smoothing method that can handle mul-
tiple general shape constraints for multivariate functions. Our method can be
considered as a generalization of Hall and Huang (2001) where monotone regres-
sion was considered for a general class of kernel smoothers, as well as a partial
generalization of Carroll, Delaigle, and Hall (2011) where the monotonicity con-
straint is replaced by a general shape constraint with the complication of mea-
surement errors. Similar to Hall and Huang (2001), our estimator is constructed
by introducing weights for each response data point that can dampen or magnify
the impact of any observation. In order to deliver an estimate satisfying the
shape constraints, the weights are selected to minimize their distance from the
uniform weights of the unconstrained estimator while obeying the constraints. In
Hall and Huang (2001), this distance was measured by a power divergence met-
ric introduced in Cressie and Read (1984); this has a rather complicated form
and is hard to generalize. Instead, we resort to the well-known l2 metric that is
much simpler and has all the desired properties of the power divergence metric
(Theorem 1). Under certain conditions, we generalize the consistency results of
Hall and Huang (2001, Thm. 4.3) to our multivariate and multiple constraint
setting. In essence, when the shape constraints are non-binding (i.e., strictly
satisfied) on the domain, the restricted estimator is asymptotically and numer-
ically equivalent to the unrestricted estimator. When the shape constraints are
non-binding everywhere except for a certain area of measure 0, the restricted and
unrestricted estimators are still very close to each other except in a neighborhood
of the binding area.
Besides the multivariate and multi-constraint extension, we also propose a
bootstrap procedure to test the validity of the shape constraints. Inference for
shape restrictions in nonparametric regression settings has attracted much at-
tention in the past decade. For example, Hall and Heckman (2000) developed
a bootstrap test for monotonicity whose test statistic is formulated around the
slope estimates of linear regression models fitted over small intervals. Ghosal,
Sen, and van der Vaart (2000) introduced a statistic based on a U-process measur-
ing the discordance between predictor and response. Both of these approaches
are specifically designed for monotonicity constraints and it is difficult to ex-
tend them to our multivariate and multi-constraint setting. Yatchew and Härdle
(2006) employed a residual-based test to check for monotonicity and convexity
simultaneously in a regression setting, but their result is for univariate functions
and requires that the constraints be strictly satisfied. Our bootstrap procedure
originates from Hall et al. (2001), which tested the monotonicity of hazard func-
tions but provided no theoretical justification for the procedure. Although the
test statistic is difficult to analyze asymptotically, we are able to provide asymp-
totic results for its implementation when shape constraints are satisfied on a
sufficiently dense grid of points. The derivation of our result takes advantage of
the simple form of the l2 metric, a simplicity that may not be available under the power divergence metric; this is a promising artifact of the l2 metric we adopt. Another ap-
pealing aspect of our method is that it only involves quadratic programming and
can be readily implemented using standard sequential quadratic programming software.
The rest of this paper proceeds as follows. Section 2 proposes a general non-
parametric regression estimator in the presence of linear shape constraints and
presents a simple test of the validity of the constraints. Section 3 investigates
the theoretical properties of the proposed method, including the existence and
consistency of the constrained estimator as well as the asymptotic behavior of the
test statistic. Section 4 considers examples treating monotonicity/concavity and
global concavity. Section 5 considers a number of simulated applications, exam-
ines the finite-sample performance of the proposed test, and presents an empir-
ical application involving technical efficiency on Indonesian rice farms. Section
6 presents some discussion and concluding remarks. The proofs of the theorems
are presented in the online Supplementary Material.

2. A Constrained Kernel Regression Estimator


In the following two subsections we outline the mechanics of our estimation
and inference procedures before launching into theoretical properties and proofs.
Theoretical properties are then outlined in Section 3 while the detailed proofs
are relegated to the online Supplementary Material.

2.1. The estimator


In what follows we let {Yi, Xi}_{i=1}^n denote sample pairs of response and explanatory variables, where Yi is a scalar, Xi is of dimension r, and n denotes the sample size. The goal is to estimate the unknown mean response g(x) ≡ E(Y|X = x) from the regression model Yi = g(Xi) + εi, subject to constraints on g^(s)(x), where s is an r-vector corresponding to the dimension of x and the εi's are independent and identically distributed errors with zero mean and variance σ². In what follows, the elements of s represent the order of the partial derivative corresponding to each element of x, so for s = (s1, s2, . . . , sr),

g^(s)(x) = ∂^{s1+···+sr} g(x) / (∂x1^{s1} · · · ∂xr^{sr}).
Given an estimate ĝ(x), suppose one wishes to impose constraints on ĝ(x) of the form

l(x) ≤ ĝ^(s)(x) ≤ u(x)   (2.1)

for arbitrary l(·), u(·), and s. For some applications, s = (0, . . . , 0, 1, 0, . . . , 0) would be of particular interest, say for example when the partial derivative represents a budget share and therefore must lie in [0, 1]; s = (0, 0, . . . , 0) might be of interest when an outcome must be bounded. Additional constraints that could be imposed in this framework are (log-) supermodularity, (log-) convexity, and quasiconvexity, all of which focus on second order or cross-partial derivatives. Alternatively, equality rather than inequality constraints might be required.

The two-sided constraint (2.1) can be considered as a special case of multiple simultaneous one-sided constraints. Hence for general purposes, we consider restrictions of the form

∑_{s∈Sk} αs,k ĝ^(s)(x) − ck(x) ≥ 0,   k = 1, . . . , T,   (2.2)

where T is the number of restrictions and, in each restriction, the sum is taken over all vectors in Sk that correspond to the constraints, with the αs,k a set of constants used to generate them. Note that (2.2) could be further generalized to contain more sophisticated constraints such as global concavity/convexity or homogeneity of degree R (Euler's theorem) by allowing the αs,k to be functions of the covariates. See Section 4.2 for one such generalization. In what follows we presume, without loss of generality, that for all s, αs,k ≥ 0 and ck(x) ≡ 0, since the ck(x)'s are known functions. The approach we describe is quite general; it admits arbitrary combinations of constraints subject to the obvious caveat that the constraints must be internally consistent.
Standard kernel regression smoothers can be written as linear combinations of the response Yi,

ĝ(x) = ∑_{i=1}^n Ai(x)Yi,   (2.3)

where Ai(x) is a local weighting matrix. This includes the Nadaraya-Watson estimator (Nadaraya (1965), Watson (1964)), the Priestley-Chao estimator (Priestley and Chao (1972)), the Gasser-Müller estimator (Gasser and Müller (1979)), and the local polynomial estimator (Fan (1992)), among others. Following Hall and Huang (2001), we consider a generalization of (2.3) to

n
ĝ(x|p) = pi Ai (x)Yi , (2.4)
i=1
∑ (s)
and where ĝ (s) (x|p) = ni=1 pi Ai (x)Yi .
As an example, we use (2.4) to generate an unrestricted Nadaraya-Watson es-

timator. Here we take pi = 1/n, i = 1, . . . , n, and set Ai (x) = nKh (Xi , x)/ nj=1
Kh (Xj , x), where Kh (·) is a product kernel and h is a vector of bandwidths; see
Racine and Li (2004) for details. When pi ̸= 1/n for some i, we have a restricted
Nadaraya-Watson estimator; the selection of p satisfying particular restrictions
is discussed below.
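To fix ideas, a minimal R sketch of the weighted estimator (2.4) at a single evaluation point follows. The Epanechnikov product kernel and the helper names (epan, A.weights, ghat) are illustrative choices, not from the paper.

```r
## Sketch: restricted/unrestricted Nadaraya-Watson fit from (2.4).
## X is an n x r matrix, Y a length-n vector, h a length-r bandwidth
## vector, x a single length-r evaluation point.
epan <- function(u) 0.75 * (1 - u^2) * (abs(u) <= 1)

A.weights <- function(x, X, h) {
  ## product kernel K_h(X_i, x), observation by observation
  U <- sweep(sweep(X, 2, x), 2, h, "/")
  K <- apply(U, 1, function(u) prod(epan(u)))
  nrow(X) * K / sum(K)     # A_i(x) = n K_h(X_i, x) / sum_j K_h(X_j, x)
}

## uniform weights p_i = 1/n recover the unrestricted estimator
ghat <- function(x, X, Y, h, p = rep(1 / nrow(X), nrow(X))) {
  sum(p * A.weights(x, X, h) * Y)
}
```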
Let pu be the n-vector of uniform weights and let p be the vector of
weights to be selected. To impose our constraints, we choose p to minimize
some distance measure from p to pu as proposed by Hall and Huang (2001).
Whereas Hall and Huang (2001) consider probability weights and distance measures suitable for probability weights (e.g., Hellinger), we allow for both positive and negative weights while retaining ∑_i pi = 1, and so require alternative distance measures.
We also forgo the power divergence metric of Cressie and Read (1984) that was used by Hall and Huang (2001), since it is only valid for probability weights. Instead we use the l2 metric D(p) = (pu − p)′(pu − p), which has a number of appealing features in this context, as will be seen. Our problem then is to select weights p that minimize D(p) subject to l(x) ≤ ĝ^(s)(x|p) ≤ u(x), and perhaps additional constraints of a similar form; this can be cast as a general nonlinear programming problem. Theoretical underpinnings of the constrained estimator are provided in Theorems 1 and 2 in Section 3. The explicit form of the quadratic programming problem is presented just before Theorem 3 in Section 3, where that form is needed for the theoretical development. Section 4 describes setup and implementation details for two common types of constraints: coordinate-wise monotonicity/concavity constraints and global concavity.

2.2. Hypothesis testing for shape constraints


In this section, we propose a test for the validity of arbitrary shape con-
straints using D(p̂) as the test statistic. It is a bootstrap testing procedure that
is simple to implement and extends the monotonicity testing procedure in Hall
et al. (2001) to our multivariate and multiple constraints setting. Theoretical
underpinnings are provided in Theorem 3 in Section 3.
This bootstrap approach involves estimating the constrained regression function ĝ(x|p) based on the sample realizations {Yi, Xi} and then rejecting H0 if the observed value of D(p̂) is too large. We use a resampling approach for generating the null distribution of D(p̂), which involves generating resamples for y drawn from the constrained model via iid residual resampling (i.e., conditional on the sample {Xi}), which we denote {Yi*, Xi} (for non-iid data the dependent wild bootstrap residual resampling scheme could be used instead; see Shao (2010)). These resamples are generated under H0, hence we recompute ĝ(x|p) for the bootstrap sample {Yi*, Xi}, denoted ĝ(x|p*), which then yields D(p*). We repeat this process B times. Finally, we compute the empirical P value, PB, the proportion of the B bootstrap resamples D(p*) that exceed D(p̂),

PB = 1 − F̂(D(p̂)) = (1/B) ∑_{j=1}^B I(D(pj*) > D(p̂)),

where I(·) is the indicator function and F̂(D(p̂)) is the empirical distribution function of the bootstrap statistics; one rejects the null hypothesis if PB is less than α, the level of the test.
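A schematic R version of this procedure follows; constrained_fit() is a hypothetical routine returning the restricted weights ($p) and fitted values ($fit), and everything else mirrors the steps above.

```r
## Schematic bootstrap test of Section 2.2. constrained_fit() is
## hypothetical: it solves the QP and returns list(p = ..., fit = ...).
D <- function(p) sum((1 / length(p) - p)^2)   # l2 distance from uniform

shape_test <- function(X, Y, B = 99) {
  fit  <- constrained_fit(X, Y)               # restricted fit on the data
  stat <- D(fit$p)
  if (stat == 0) return(1)                    # non-binding: do not reject H0
  eps  <- Y - fit$fit
  eps  <- eps - mean(eps)                     # centre before iid resampling
  stat.boot <- replicate(B, {
    Ystar <- fit$fit + sample(eps, replace = TRUE)   # resample under H0
    D(constrained_fit(X, Ystar)$p)
  })
  mean(stat.boot > stat)                      # empirical P value PB
}
```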

We note three situations here that can occur in practice.

(i) Impose non-binding constraints (they are ‘correct’ de facto).
(ii) Impose binding constraints that are correct.
(iii) Impose binding constraints that are incorrect.

If one encounters (i) in practice, then D(p̂) = 0, and we recommend following the advice of Hall et al. (2001, p.609): “For those datasets with D(p̂) = 0, no further bootstrapping is necessary [. . . ] and so the conclusion (for that dataset) must be to not reject H0.”

3. Theoretical Properties of the Estimator and Test Statistic


In the following subsections we consider the theoretical properties of the estimator and test procedure. We denote the domain of interest by J ≡ [a, b] = ∏_{i=1}^r [ai, bi]. To simplify the notation, we define a differential operator m ↦ m^D such that m^D(x) is a length-T vector with kth entry ∑_{s∈Sk} αs,k m^(s)(x). We take |s| = ∑_{i=1}^r si as the order of a derivative vector s = (s1, . . . , sr), and say a derivative s1 has a higher order than s2 if |s1| > |s2|. Let S = ∪_{k=1}^T Sk and dS be the derivative of “maximum order” among all the derivatives in S; for simplicity, we drop the subscript S from dS. With the ck(x)'s set to 0, we plug (2.4) into (2.2) to yield

∑_{i=1}^n pi Ai^D(x)Yi ≥ 0.   (3.1)

3.1. Existence of the constrained estimator


The theorem here shows the existence of a set of weights that satisfy the
constraints in (3.1). Its proof is in Section S1 of the online Supplementary Ma-
terial.

Theorem 1. Assume that the set {1, . . . , n} contains a sequence {i1, . . . , ik} with the following properties.

(i) For each k, A_{ik}^D(x) is strictly positive and continuous on an open set O_{ik} ⊂ R^r, and vanishes on R^r \ O_{ik}.
(ii) Every x ∈ J is contained in at least one open set O_{ik}.
(iii) For 1 ≤ i ≤ n, Ai^D(x) is continuous on (−∞, ∞)^r.

Then there exists a vector p = (p1, . . . , pn) such that the constraints are satisfied for all x ∈ J.

Conditions (i) and (ii) ensure the existence of an open cover of the domain J by the open sets Oi on which Ai^D is positively supported for some i. We note that the above conditions are sufficient but not necessary for the existence of a set of weights that satisfy the constraints for all x ∈ J. For example, if sgn A_{jn}^D(x) = 1 ∀x ∈ J for some sequence jn in {1, . . . , n} and sgn A_{ln}^D(x) = −1 ∀x ∈ J for another sequence ln in {1, . . . , n}, then for those observations that switch signs, pi may be set equal to zero, while p_{jn} > 0 and p_{ln} < 0 is sufficient to ensure the existence of a set of p's satisfying the constraints.

3.2. Consistency of the constrained estimator


Here we detail the consistency of our constrained estimator. To begin, define a hyperplane subset of J to be a subset of the form S = {x0k} × ∏_{i≠k} [ai, bi] for some 1 ≤ k ≤ r and some x0k ∈ [ak, bk]. We call S an interior hyperplane subset if x0k ∈ (ak, bk). For what follows, g(·) (or g^D(·)) is the true conditional mean (or its derivative), p̂ is the optimal weight vector satisfying the constraints, ĝ(·|p̂) (or ĝ^D(·|p̂)) is the constrained estimator defined in (2.4), and g̃(·) (or g̃^D(·)) is the unconstrained estimator defined in (2.3).
Assumption A1.

(i) The sample Xi either form a regularly spaced grid on a compact set I ≡ [c, e] = ∏_{i=1}^r [ci, ei] or constitute independent random draws from a distribution whose density f is continuous and nonvanishing on I; the εi are independent and identically distributed with zero mean and variance σ², and are independent of the Xi; the kernel function K(·) is a symmetric, compactly supported density such that K^D is Hölder-continuous on J ⊂ I.
(ii) E(|εi|^t) is bounded for sufficiently large t > 0.
(iii) g^D is continuous on J. For random Xi's, their density function f is also continuous on J.
(iv) The bandwidth associated with each explanatory variable, hj, satisfies hj ∝ n^{−1/(3r+2|d|)}, 1 ≤ j ≤ r, where |d| is the maximum order of the derivative vector d.
Assumption A1(i) is standard in the kernel regression literature. Assumption A1(ii) is a sufficient condition required for the application of a strong approximation result that we invoke in Lemma S2.2 in the online Supplementary Material, while Assumption A1(iii) assures the requisite smoothness of f^D and g^D (f is the design density). Note that the bandwidth rate in Assumption A1(iv) is generally higher than the standard optimal rate n^{−1/(r+4)}. However, this is not surprising for our restricted problem. The optimal rate only guarantees the convergence of our unrestricted function estimator g̃. But the restricted problem also requires the convergence of the derivative g̃^D, which often needs a higher bandwidth rate. In the single-predictor monotone regression problem considered in Hall and Huang (2001), this rate happens to coincide with the optimal rate n^{−1/5}. Furthermore, when the bandwidths all share the same rate, one can rescale each component of x to ensure a uniform bandwidth h ∝ n^{−1/(3r+2|d|)} for all components. This simplification is made without loss of generality. Thus we use h^r rather than ∏_{j=1}^r hj for notational simplicity.
Theorem 2. Suppose that Assumption A1(i)−(iv) holds.

(i) If g^D > 0 on J then, with probability 1, p̂i = 1/n, 1 ≤ i ≤ n, for all sufficiently large n and ĝ^D(·|p̂) = g̃^D on J for all sufficiently large n. Hence, ĝ(·|p̂) = g̃ on J for all sufficiently large n.
(ii) Suppose that g^D > 0 except on an interior hyperplane subset X0 ⊂ J where we have g^D(x0) = 0, ∀x0 ∈ X0. Also, for any x0 ∈ X0, suppose that g^D has second order continuous derivatives in the neighborhood of x0 with ∂g^D(x0)/∂x = 0 and ∂²g^D(x0)/∂x∂x^T nonsingular; then |ĝ(·|p̂) − g̃| = Op(h^{|d|+(r+1)/2}) uniformly on J.
(iii) Under the conditions in (ii), there exist random variables Θ = Θ(n) and Z1 = Z1(n) ≥ 0 satisfying Θ = Op(h^{|d|+r+1}) and Z1 = Op(1), such that ĝ(x|p̂) = (1+Θ)g̃(x) uniformly for x ∈ J with inf_{x0∈X0} |x − x0| > Z1 h^{(r+1)/4}.
In Theorem 2, part (i) suggests that when the constraint is strictly satisfied by the true function, the constrained estimator ĝ(·|p̂) and the unconstrained estimator g̃ are essentially the same and thus share the same rate of convergence. Part (ii) gives the order of the difference between ĝ(·|p̂) and g̃ when g^D = 0 on an interior hyperplane; note that the order in (ii) indicates a convergence rate for ĝ(·|p̂) different from that of g̃ in such a case. Part (iii) is concerned with the asymptotic behavior of the weights p̂ in such a case. Also note that the results extend easily to the case g^D ≤ 0 with a switch of sign in g.

The proof of Theorem 2 is relegated to the online Supplementary Material. Theorem 2 is the multivariate, multi-constraint, hyperplane-subset generalization of the univariate, single-constraint, single-point-violation setting considered in Hall and Huang (2001), having of necessity dispensed with probability weights and power divergence distance measures. The theory in Hall and Huang (2001) lays the foundation for the multivariate analogues in Theorem 2; part (iii) generalizes the rates obtained in their univariate setting (Hall and Huang (2001, Thm. 4.3(c))).
A further issue with the imposition of constraints is the choice of the distance metric used to select the optimal weights. Hall and Huang (2001) use the power divergence metric Dρ(p) = ρ^{−1}(1 − ρ)^{−1}{n − ∑_{i=1}^n (npi)^ρ}, which depends on a parameter ρ; they impose the condition that ρ lie between 0 and 1 for their technical arguments. In a sense, if 0 ≤ ρ ≤ 2 then the l1 and l2 norms can be viewed as limiting cases for the Hall and Huang analysis; see their equation (5.26). However, a key difference here is the relative ease with which the constraints can be implemented in practice if one forgoes power divergence and uses either a linear or quadratic program to solve for the optimal weights. Furthermore, in our proof of Theorem 3 the use of the l2 norm delivers simplifications that are not available when using the power divergence metric. Additionally, under the condition ∑_{i=1}^n pi = 1, D2(p) is equivalent to the l2 norm.

3.3. Asymptotic properties of D(p̂)


Let ψi(x) = Ai^D(x)Yi, i = 1, . . . , n, so that our minimization problem is

min_{p1,...,pn} ∑_{i=1}^n (n^{−1} − pi)²,  s.t. ∑_{i=1}^n pi = 1, ∑_{i=1}^n pi ψi(x) ≥ 0, ∀x.   (3.2)

In practice, this can be carried out by taking a fine grid (x1, . . . , xN), N large, and solving

min_{p1,...,pn} ∑_{i=1}^n (n^{−1} − pi)²,  s.t. ∑_{i=1}^n pi = 1, ∑_{i=1}^n pi ψi(xj) ≥ 0, 1 ≤ j ≤ N.   (3.3)
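The problem (3.3) maps directly onto quadprog::solve.QP in R, which minimizes (1/2)b′Db − d′b subject to A′b ≥ b0 with the first meq constraints treated as equalities. The sketch below is illustrative only; it assumes a single constraint (T = 1) and that Psi has already been computed as the N × n matrix with (j, i) entry ψi(xj).

```r
## Sketch: the quadratic program (3.3) via the quadprog package.
## Psi is assumed precomputed: an N x n matrix with (j, i) entry
## psi_i(x_j) = A_i^D(x_j) Y_i on the grid x_1, ..., x_N (T = 1 here;
## further constraints would simply stack additional rows).
library(quadprog)

solve_weights <- function(Psi) {
  n    <- ncol(Psi)
  ## (1/2) p'p - (1/n) 1'p has the same argmin as sum_i (1/n - p_i)^2
  Dmat <- diag(n)
  dvec <- rep(1 / n, n)
  ## columns of Amat are constraint vectors: sum p_i = 1, then grid rows
  Amat <- cbind(rep(1, n), t(Psi))
  bvec <- c(1, rep(0, nrow(Psi)))
  solve.QP(Dmat, dvec, Amat, bvec, meq = 1)$solution
}
```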

Assumption A2.

(i) N → ∞ as n → ∞ and N = O(n).
(ii) If dN = inf_{1≤j1≠j2≤N} |x_{j1} − x_{j2}|, then dN → 0 and h^{−1} dN → ∞.

Assumption A2(ii) requires that the minimum distance between grid points decrease at a rate slower than h, so that the correlation between derivative estimates at these grid points is zero when n is sufficiently large.
Let p̂i, i = 1, . . . , n, solve (3.3). The proof of the following theorem is in Section S3 of the online Supplementary Material.

Theorem 3. Suppose Assumptions A1(i)−(iv) and A2(i)−(ii) hold. Then, as n → ∞, we have

[n² σ_K² / (h^{2|d|+r} (∑_{j=1}^M g^D(xj*))²)] D(p̂) ∼ χ²(n),   (3.4)

where σ_K² = σ² ∫ [K^(d)(y)]² dy, and {x1*, . . . , xM*} ⊂ {x1, . . . , xN} are the slack points defined in Section S3 of the online Supplementary Material.
The diverging degrees of freedom in the asymptotic distribution here are no surprise, since both the null and alternative hypotheses are nonparametric and reside on infinite-dimensional parameter spaces. A similar phenomenon was observed by Fan, Zhang, and Zhang (2001) for their generalized likelihood ratio test. Theoretically, the asymptotic distribution of D(p̂) in Theorem 3 can be used to determine the p-value of our test. However, this may pose difficulties in practice: the asymptotic distribution may not be a good approximation for finite sample sizes, and the normalizing constant in (3.4) requires the determination of the slack points. A bootstrap approach, like that proposed in Section 2.2, is an alternative.
Under the power divergence metric, Carroll, Delaigle, and Hall (2011) showed the consistency of the hypothesis test of monotonicity using Dρ(p̂) as the test statistic, which implies consistency of the bootstrap version. A similar result here consists of two parts:

(i) If the true function g satisfies the shape constraint, then as n → ∞,

P{D(p̂) ≤ nϵ} → 1 for all ϵ > 0.   (3.5)

(ii) If the true function g does not satisfy the shape constraint on J, then

lim_{ϵ→0} lim inf_{n→∞} P{D(p̂) ≥ nϵ} = 1.   (3.6)
In particular, (i) can be proved by an argument similar to that for (iii) of Theorem 2. The proof of (ii) follows the steps listed in Carroll, Delaigle, and Hall (2011): provide an almost-sure lower bound on an integrated distance between a given constraint-violating function and all shape-constrained functions; use this to derive an almost-sure lower bound for the distance between the constrained estimator ĝ(·|p̂) and the unconstrained estimator g̃; show that the latter distance is of a lower order in probability than D(p̂). A formal treatment would be interesting but lies beyond the scope of this paper.
Even if the true function g does not satisfy the shape constraints, our bootstrap test procedure simulates resamples based on the constrained estimator ĝ(·|p̂), which does satisfy them. For each resample with its constrained estimator ĝ*(·|p*) we can show, similar to (3.5), that for all ϵ > 0, P{D(p*) ≤ nϵ | Data} → 1 in probability. This implies that the empirical critical point ξ̂α used in the test satisfies P(ξ̂α ≤ nϵ) → 1. Combining this with (3.6) yields the validity of our bootstrap test procedure, that is, P{D(p̂) > ξ̂α} → 1 as n → ∞.

4. Two Illustrations
In this section, we give two examples of the implementation of the quadratic programming outlined above for our method. The first example incorporates monotonicity and/or concavity in each dimension. The second example, on global concavity, takes up a generalized version of the constraints in (2.2) that is more challenging numerically. The constraints are enforced on the sample realizations, but it is straightforward to also enforce them on non-sample regions if desired.

4.1. Coordinate-wise monotonicity and concavity


To impose coordinate-wise monotonicity and concavity at each xi, we use, respectively, the constraints

∑_{j=1}^n pj Aj^(s)(xi)Yj ≥ 0, for s ∈ S1,   (4.1)

∑_{j=1}^n pj Aj^(s)(xi)Yj ≤ 0, for s ∈ S2,   (4.2)

where S1 = {(1, 0, . . . , 0), (0, 1, . . . , 0), . . . , (0, 0, . . . , 1)} and S2 = {(2, 0, . . . , 0), (0, 2, . . . , 0), . . . , (0, 0, . . . , 2)}. Enforcing either (4.1) or (4.2) at all data points involves rn conditions in total, with r the dimension of the covariates. Even with thousands of observations, the constraints are easy to construct and implement via quadratic programming procedures.
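In the notation of the QP sketch of Section 3.3, the rows corresponding to (4.1) and (4.2) can be stacked as follows; A.deriv() is a hypothetical helper returning the n-vector (A_1^(s)(x), . . . , A_n^(s)(x)).

```r
## Sketch: stack the rn monotonicity rows of (4.1) and rn concavity
## rows of (4.2) into one matrix Psi so every constraint reads >= 0.
## A.deriv(s, x, X, h) is hypothetical: the n derivative weights A_j^(s)(x).
build_Psi <- function(X, Y, h) {
  r <- ncol(X); n <- nrow(X)
  rows <- list()
  for (k in 1:r) {
    s1 <- replace(rep(0, r), k, 1)    # first-order partial, coordinate k
    s2 <- replace(rep(0, r), k, 2)    # second-order partial, coordinate k
    for (i in 1:n) {
      rows[[length(rows) + 1]] <-  A.deriv(s1, X[i, ], X, h) * Y  # (4.1)
      rows[[length(rows) + 1]] <- -A.deriv(s2, X[i, ], X, h) * Y  # (4.2), sign flipped
    }
  }
  do.call(rbind, rows)
}
```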

4.2. Global concavity


A popular, if challenging, shape constraint is global concavity in a multi-dimensional setting. In this section we highlight some results from nonlinear programming that can be utilized in such an implementation.

While there are a variety of ways to enforce concavity (convexity), we require that the constraints be linear in p, and so using the Hessian matrix does not fit into our framework. Instead, we use the Afriat condition (Afriat (1967)), which states that a function g is (globally) concave if and only if

g(z) − g(x) ≤ (∂g(x)/∂x)′ (z − x), ∀z, x.

To impose this first-order condition at a given point xi, our constraint set is

∑_{j=1}^n pj [ (∂Aj(xi)/∂x)′ (xℓ − xi) − {Aj(xℓ) − Aj(xi)} ] Yj ≥ 0, ∀ 1 ≤ i ≠ ℓ ≤ n.   (4.3)

Hence, enforcing concavity at all points results in n(n − 1) constraints overall. Note that the number of constraints does not depend on the dimension r of the covariates, indicating that the program scales well with respect to r. However, the computational burden can be overwhelming when n is large.
To address such computational issues, we briefly describe the constraint generation approach (Dantzig, Fulkerson, and Johnson (1954, 1959)), which can be used to construct and enforce constraints more efficiently; a schematic version appears below. Rather than imposing concavity at all sample realizations, impose concavity on some sizable subset of the observations, and then check which observations do not satisfy global concavity. Call this set V. Take the observations in V, add them to the original set of observations where concavity is enforced, and re-solve the quadratic program. Repeat the procedure until the subset of observations is large enough that imposing concavity on these points is sufficient to ensure concavity at all points. The approach is widespread; an excellent example is in Lee et al. (2012). A large literature on constraint complexity bounds for linear and quadratic programming problems exists; see Potra and Wright (2000) for an in-depth review.
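A schematic version of the loop just described; fit_with_constraints() and concavity_violations() are hypothetical helpers (the former solves the QP imposing (4.3) only on an active index set, the latter returns the indices where the Afriat inequalities fail).

```r
## Schematic constraint generation for global concavity (Section 4.2).
constraint_generation <- function(X, Y, h, n.init = 50) {
  active <- sample(nrow(X), min(n.init, nrow(X)))  # sizable initial subset
  repeat {
    fit <- fit_with_constraints(X, Y, h, active)   # QP on current subset
    V   <- concavity_violations(fit, X)            # the violating set V
    if (length(V) == 0) break                      # concave everywhere: done
    active <- union(active, V)                     # add violators, re-solve
  }
  fit
}
```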
For the illustrative constraints we consider, the necessary (in)equalities are linear in p, and so the problem can be solved using standard quadratic programming methods and off-the-shelf software, for example the quadprog package in R.

5. Numerical Properties and an Application


We demonstrate the flexibility and simplicity of the proposed method through a series of numerical studies, and consider a data set on which we impose the economic constraint known as ‘constant returns to scale’.

5.1. Visualization of bivariate estimates


In what follows we simulate data from two nonlinear bivariate relationships
and consider imposing a range of restrictions. We demonstrate the method by
imposing restrictions on the surface and also on its first and second partial deriva-
tives using the locally constant kernel estimator.

5.1.1. Imposing range restrictions on the surface and its derivatives


Consider the bivariate surface

Yi = sin(√(X1i² + X2i²)) / √(X1i² + X2i²) + εi,   i = 1, . . . , n,   (5.1)

where x1 and x2 are independent draws from the uniform [−5, 5]. We draw n = 10,000 observations from this DGP with ε ∼ N(0, σ²) and σ = 0.1. The large sample size speaks to the feasibility of the approach in moderate/large sample settings. Our simulations with sample sizes in the hundreds resulted in reasonable although slightly rougher estimates. Figure 1 shows the unrestricted regression estimate with bandwidths chosen via least squares cross-validation.
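A minimal sketch of a draw from this DGP (the seed is arbitrary, not from the paper):

```r
## Draw n observations from the sinc-type surface in (5.1).
set.seed(1)
n  <- 10000
X1 <- runif(n, -5, 5); X2 <- runif(n, -5, 5)
rr <- sqrt(X1^2 + X2^2)
Y  <- sin(rr) / rr + rnorm(n, sd = 0.1)
```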
We first imposed the constraint that the regression function lies in the range [0, 0.5]; a plot of the restricted surface appears in Figure 2. We next imposed the constraint that the first derivatives with respect to both x1 and x2 lie in the range [−0.1, 0.1]; a plot of the restricted surface appears in Figure 3.

Figure 1. Unrestricted kernel estimate of (5.1).

Figure 2. Restricted kernel estimate of (5.1) with the restriction 0 ≤ ĝ(x|p) ≤ 0.5.

Figure 3. Restricted kernel estimate of (5.1) with the restrictions −0.1 ≤ ∂ĝ(x|p)/∂x1 ≤ 0.1, −0.1 ≤ ∂ĝ(x|p)/∂x2 ≤ 0.1.

5.1.2. Imposing true constraints on a shape-constrained function


Consider then the bivariate surface

Yi = (X1i X2i)^{0.4} + εi,   i = 1, . . . , n,   (5.2)

where x1 and x2 are independent draws from the uniform [1, 10]. Figure 4 shows the surface. We drew n = 100 observations with ε ∼ N(0, σ²) and σ = 0.7. This DGP is positive, monotonic in both x1 and x2, globally concave, symmetric in x1 and x2, and homogeneous of degree 0.8. Any or all of these constraints could be imposed on the estimated surface. Here we imposed negativity of the second derivatives with respect to both x1 and x2 on a grid of 250 points equally spaced over the support of X.

Figure 5 presents the unrestricted local constant kernel regression estimate with bandwidths chosen via least squares cross-validation. Figure 6 presents the restricted local constant kernel regression estimate. Here the bumps that were present in the unrestricted estimator have been removed by the enforcement of the constraints.

5.2. Inference with the constrained estimator


5.2.1. Testing inequality restrictions
Consider testing the inequality restriction H0 : g(x) ≥ 0 versus H1 : g(x) < 0
when
Yi = g(Xi ) + εi = Xi + εi , (5.3)

Figure 4. Actual unknown DGP in (5.2).

Figure 5. Unrestricted kernel estimate of (5.2).

where Xi is uniform [−a, 4 − a], a is a parameter that determines both whether the constraint is binding (a > 0) or not (a = 0) and the length of the interval over which the constraint binds, and ε ∼ N(0, 1/4).
We constructed power curves based on M = 1, 000 Monte Carlo replications,
and we computed B = 99 bootstrap replications. The power curves correspond-
ing to α = 0.05 appear in Figure 7. This reveals that the empirical rejection
frequencies are in line with nominal size while power increases with n.
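For concreteness, one cell of such a power study might look as follows, reusing the hypothetical shape_test() sketched after Section 2.2.

```r
## Sketch: empirical rejection frequency for one (n, a) cell of the
## power study based on (5.3); shape_test() is the hypothetical
## bootstrap test sketched earlier.
power_cell <- function(n, a, M = 1000, B = 99, alpha = 0.05) {
  mean(replicate(M, {
    X <- runif(n, -a, 4 - a)
    Y <- X + rnorm(n, sd = 1 / 2)     # eps ~ N(0, 1/4)
    shape_test(X, Y, B = B) < alpha   # reject H0 if PB < alpha
  }))
}
```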

Figure 6. Restricted kernel estimate of (5.2) with the restrictions ∂²ĝ(x|p)/∂x1² ≤ 0, ∂²ĝ(x|p)/∂x2² ≤ 0.

Figure 7. Power curves for α = 0.05 for sample sizes n = (25, 50, 75, 100)
based upon the DGP given in (5.3). The solid horizontal line represents the
test’s nominal level (α).

It is known (Andrews (2000)) that standard bootstrap inference procedures may not be consistent when inequality constraints are involved (see also Galindo-Garre and Vermunt (2004)). For example, one approach for dealing with size distortions in such instances is the ‘double-bootstrap’ (van de Schoot, Hoijtink, and Deković (2010)). Carroll, Delaigle, and Hall (2011) use calibration, not a double-bootstrap, and attain excellent finite sample properties for their bootstrap inequality test, while Andrews (2000) suggests a number of alternatives including subsampling (Politis and Romano (1994)), among others. Though there are no discernible size distortions present in the power curves summarized in Figure 7, it is prudent to verify results based on the simple bootstrap against these alternatives.

5.2.2. Testing equality restrictions


We consider testing the restriction that a nonparametric model g(x) has a specific parametric functional form, starting with

Yi = g(Xi1, Xi2) + εi = 1 + Xi1 + Xi2² + εi,

where the Xij, j = 1, 2, are uniform [−2, 2] and ε ∼ N(0, 1/2).


When we generated data from this DGP and imposed the correct model as a restriction, we could assess the test's rejection frequencies under H0; when we generated data from this DGP and imposed an incorrect model, one that is linear in the variables, we could assess the test's power.
We conducted M = 1,000 Monte Carlo replications from our DGP, and took B = 99 bootstrap replications. Results are in Table 1 in the form of empirical rejection frequencies for nominal sizes α = (0.10, 0.05, 0.01), for samples of size n = (25, 50, 75, 100, 200). Table 1 indicates that the test's empirical rejection frequency is in line with nominal size while power increases with n.

5.3. An empirical assessment of implementation issues


We conducted simulations to gauge several implementation issues for the
constrained estimator. We focused on how long it takes the quadratic program
to solve the problem based on a fixed grid of points, how the method performs
across different bandwidths, and how the method performs across alternative
nonparametric methods. For all simulations we took n = (100, 200, 300, 400, 500)
observations drawn from the DGP in Lee et al. (2012),

Yi = (Xi1 Xi2 Xi3 Xi4)^{0.2} + εi,   (5.4)

where X·j ∼ U[1, 10] for 1 ≤ j ≤ 4 and ε is normal with mean zero and variance
0.49. For the simulations we focused on enforcing coordinate-wise monotonicity.

Table 1. Test for correct parametric functional form. Values are empirical
rejection frequencies over the M = 1, 000 Monte Carlo replications.
n α = 0.10 α = 0.05 α = 0.01
Size
25 0.100 0.049 0.010
50 0.074 0.043 0.011
75 0.086 0.034 0.008
100 0.069 0.031 0.006
200 0.093 0.044 0.007
Power
25 0.391 0.246 0.112
50 0.820 0.665 0.356
75 0.887 0.802 0.590
100 0.923 0.849 0.669
200 0.987 0.970 0.903

Table 2. Performance of imposing constraints on a rough grid in a 4-dimensional space. The numbers are the median percentages of the observations that ended up satisfying the constraints and the median execution times for the quadratic program solver in seconds.
n    % satisfied    Time (secs)
100  99.00          0.42
200  99.50          0.64
300  99.42          0.93
400  99.38          1.40
500  99.28          1.97

5.3.1. Enforcing constraints over a grid versus sample realizations


Table 2 gives the percentage of observations for which the constraints were satisfied when we enforced the constraints over an equi-spaced grid of 5 points on [1, 10] in each dimension (a total of 5⁴ = 625 grid points). We give median run times and the median percentage of observations where the constraints were satisfied, using the optimal weights determined by the grid points. In all the simulations the percentage of observations satisfying monotonicity was over 99%, suggesting that by imposing constraints on a rough grid one can still achieve monotonicity almost everywhere.

5.3.2. An assessment of bandwidth selection


We used least squares cross-validation to determine optimal bandwidths, then enforced the constraints using these bandwidths, the cross-validated bandwidths divided by 2, and the cross-validated bandwidths multiplied by 2. We compared performance of the estimators based on the median ratio of average squared error, ASE = n^{−1} ∑_{i=1}^n (ĝ(Xi) − g(Xi))², taken at the observations.

Table 3. Bandwidth selection performance. The values are median ratios of ASEs for the restricted estimators with cross-validated smoothing (ASE2), undersmoothing (ASE3), and oversmoothing (ASE4) (ASE1 is unrestricted with cross-validated smoothing). Numbers greater than one indicate superior performance of the restricted cross-validated estimator.
n    ASE1/ASE2    ASE3/ASE2    ASE4/ASE2
100  1.30         2.23         2.34
200  1.28         2.44         2.99
300  1.31         2.58         3.32
400  1.31         2.65         3.57
500  1.29         2.76         3.58

Table 4. Comparison of kernel versus B-spline and unrestricted versus restricted estimation. The numbers are the median ratios of the ASEs: unrestricted (ASE1) and restricted (ASE3) local constant kernel methods; unrestricted (ASE2) and restricted (ASE4) B-splines.
n    ASE1/ASE3    ASE2/ASE4    ASE3/ASE4
100  1.30         1.11         1.28
200  1.28         1.00         1.18
300  1.31         1.03         1.23
400  1.31         1.01         1.25
500  1.29         1.02         1.34

As seen in Table 3, the constrained estimator with cross-validated bandwidths outperforms both the undersmoothed and the oversmoothed constrained estimator. We also see that generically imposing valid constraints results in improved in-sample fit of the unknown function.

5.3.3. Implementation with alternative smoothers


While the theory is provided for kernel estimators, the estimation procedure constrained by (2.2) is applicable to any (local) linear smoother. Here we compare implementations involving local constant kernel regression and regression B-splines (both with cross-validated smoothing parameter selection), and compare their ASEs. Table 4 gives the median ratios of the ASEs, comparing each unrestricted estimator against its restricted counterpart enforcing monotonicity in each dimension, and the two restricted estimators against each other. As expected, the restricted methods outperformed the unrestricted methods. Moreover, it appears that the unrestricted regression B-splines satisfy the constraints more often as the sample size increases (note that ASE2/ASE4 ≈ 1). We also see that the restricted regression B-splines outperformed the restricted local kernel method (ASE3/ASE4 > 1).

Table 5. Summary statistics for the data.


Variable Mean StdDev
log(rice) 6.9170 0.9144
log(seed) 2.4534 0.9295
log(urea) 4.0144 1.1039
log(TSP) 2.7470 1.4093
log(labor) 5.6835 0.8588
log(land) -1.1490 0.9073

5.4. Application: Imposing constant returns to scale for Indonesian rice farmers

We consider a data set studied by Horrace and Schmidt (2000), who analyzed technical efficiency for Indonesian rice farms. We examine the issue of returns to scale, focusing on data from the 1977 growing season, acknowledged to be a particularly wet one. 171 farmers were selected from six villages in the rice production area of the Cimanuk River Basin in West Java by the Center for Agro Economic Research, Ministry of Agriculture, Indonesia. Output was measured as kilograms (kg) of rice produced, with inputs of seed (kg), urea (kg), trisodium phosphate (TSP) (kg), labour (hours), and land (hectares). Table 5 presents summary statistics for the data. We use log transformations throughout.
Of interest here is whether or not the production technology exhibits constant returns to scale, that is, whether the sum of the first-order partial derivatives is one. Constant returns to scale implies that output increases by exactly the amount that all the inputs are increased: if all the inputs are doubled, then output doubles. Given the primitive nature of rice production, one expects constant returns to scale to be present in the underlying technology. The constraint set for imposing constant returns to scale can be written as

∑_{k=1}^5 (∂ĝ(xi)/∂xik) xik = 1, ∀i.   (5.5)

We impose this constraint for each observation, as opposed to over a grid. This
results in n = 171 total constraints.
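In the QP notation used earlier these are n equality rows; a sketch, again relying on the hypothetical derivative-weight helper A.deriv() and with X holding the five (log) inputs.

```r
## Sketch: constant-returns-to-scale equality rows from (5.5); each row
## r_i satisfies r_i' p = 1 and enters solve.QP as an equality (meq).
crs_rows <- function(X, Y, h) {
  r <- ncol(X); n <- nrow(X)
  t(sapply(1:n, function(i) {
    row <- rep(0, n)
    for (k in 1:r) {
      s   <- replace(rep(0, r), k, 1)              # first partial in x_k
      row <- row + X[i, k] * A.deriv(s, X[i, ], X, h) * Y
    }
    row
  }))
}
```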
We estimate the production function using a nonparametric local linear esti-
mator with least squares cross-validated bandwidth selection. Figure 8 presents
the unrestricted and restricted partial derivative sums for each observation (i.e.,
farm), where the restriction is that the sum of the partial derivatives equals one.
The horizontal line represents the restricted partial derivative sum (1.00) and the
points represent the unrestricted sums for each farm. An examination of Figure
8 reveals that the estimated returns to scale lie in the interval [0.98, 1.045].

Figure 8. The sum of the partial derivatives for observation i appears on the vertical axis, and each observation appears on the horizontal axis.

In order to test whether the restriction is valid we apply the test outlined in
Section 2.2. We conducted B = 999 bootstrap replications and tested the null
that the technology exhibits constant returns to scale. The empirical P value is
PB = 0.122, hence we fail to reject the null at all conventional levels. We are
encouraged by this nonparametric application as it involves a fairly large number
of predictors (five) and a fairly small number of observations (n = 171).

6. Discussion
We present a framework for imposing and testing the validity of conventional
constraints on the partial derivatives of a multivariate nonparametric kernel re-
gression function. The proposed approach covers imposing monotonicity and
concavity while delivering a seamless framework for general restricted nonpara-
metric kernel estimation and inference. Simulations are run and the method is
applied to a data set. An open implementation in the R language (R Core Team
(2012)) is available from the authors.
We note that our procedure is valid for a range of kernel estimators as well
as for estimation and testing in the presence of categorical data. Our constrained
smoothing approach can be used in a wide variety of settings. Future work on the
theoretical side could focus on the importance of the choice of distance metric, the
asymptotic behavior of the bootstrap testing procedure, and the relative merits
of the alternative data tilting methods that exist. These are subjects for future research.

Acknowledgements
We thank an associate editor and the referees for their insightful comments
that have significantly improved the paper. We would also like to thank but
not implicate Daniel Wikström for inspiring conversations and Li-Shan Huang
and Peter Hall for their insightful comments and suggestions. The research of
Du is supported by NSF DMS-1007126. Racine would like to gratefully acknowl-
edge support from Natural Sciences and Engineering Research Council of Canada
(www.nserc.ca), the Social Sciences and Humanities Research Council of Canada
(www.sshrc.ca), and the Shared Hierarchical Academic Research Computing Net-
work (www.sharcnet.ca).

References
Afriat, S. N. (1967). The construction of utility functions from expenditure data. Internat.
Econom. Rev. 8, 67-77.
Andrews, D. W. K. (2000). Inconsistency of the bootstrap when a parameter is on the boundary of
the parameter space. Econometrica 68, 399-405.
Birke, M. and Dette, H. (2007). Estimating a convex function in nonparametric regression.
Scand. J. Statist. 34, 384-404.
Birke, M. and Pilz, K. F. (2009). Nonparametric option pricing with no-arbitrage constraints.
J. Finan. Econom. 7, 53-76.
Braun, W. J. and Hall, P. (2001). Data sharpening for nonparametric inference subject to
constraints. J. Comput. Graph. Statist. 10, 786-806.
Carroll, R. J., Delaigle, A. and Hall, P. (2011). Testing and estimating shape-constrained non-
parametric density and regression in the presence of measurement error. J. Amer. Statist.
Assoc. 106, 191-202.
Cressie, N. A. C. and Read, T. R. C. (1984). Multinomial goodness-of-fit tests. J. Roy. Statist.
Soc. Ser. B 46, 440-464.
Csörgő, M. and Révész, P. (1981). Strong Approximations in Probability and Statistics. Academic
Press, New York.
Dantzig, G. B., Fulkerson, D. R. and Johnson, S. M. (1954). Solution of a large-scale traveling-
salesman problem. Operations Research 2, 393-410.
Dantzig, G. B., Fulkerson, D. R. and Johnson, S. M. (1959). On a linear-programming combi-
natorial approach to the traveling-salesman problem. Operations Research 7, 58-66.
Dette, H. and Scheder, R. (2006). Strictly monotone and smooth nonparametric regression for
two or more variables. Canad. J. Statist. 34, 535-561.
Fan, J. (1992). Design-adaptive nonparametric regression. J. Amer. Statist. Assoc. 87, 998-1004.
Fan, J., Zhang, C. and Zhang, J. (2001). Generalized likelihood ratio statistics and Wilks phe-
nomenon. Ann. Statist. 29, 153-193.
Galindo-Garre, F. and Vermunt, J. K. (2004). The order-restricted association model: two
estimation algorithms and issues in testing. Psychometrika 69, 641-654.
Gallant, A. R. (1981). On the bias in flexible functional forms and an essential unbiased form:
The fourier flexible form. J. Econometrics 15, 211-245.
Gallant, A. R. (1982). Unbiased determination of production technologies. J. Econometrics 20,
285-323.
Gallant, A. R. and Golub, G. H. (1984). Imposing curvature restrictions on flexible functional
forms. J. Econometrics 26, 295-321.
Gasser, T. and Müller, H.-G. (1979). Kernel estimation of regression functions. In Smoothing
Techniques for Curve Estimation, 23-68. Springer-Verlag, New York.
Gasser, T. and Müller, H.-G. (1984). Estimating regression functions and their derivatives by
the kernel method. Scand. J. Statist. 11, 171-185.
Ghosal, S., Sen, A. and van der Vaart, A. W. (2000). Testing monotonicity of regression. Ann.
Statist. 28, 1054-1082.
Hall, P. and Heckman, N. E. (2000). Testing monotonicity of a regression mean by calibrating
for linear functionals. Ann. Statist. 28, 20-39.
Hall, P. and Huang, H. (2001). Nonparametric kernel regression subject to monotonicity con-
straints. Ann. Statist. 29, 624-647.
Hall, P., Huang, H., Gifford, J. and Gijbels, I. (2001). Nonparametric estimation of hazard rate
under the constraint of monotonicity. J. Comput. Graph. Statist. 10, 592-614.
Hall, P. and Kang, K. H. (2005). Unimodal kernel density estimation by data sharpening. Statist.
Sinica 15, 73-98.
Horrace, W. and Schmidt, P. (2000). Multiple comparisons with the best, with economic appli-
cations. J. Appl. Econometrics 15, 1-26.
Komlós, J., Major, P. and Tusnády, G. (1975). An approximation of partial sums of independent
random variables and the sample distribution function, part I. Z. Wahrsch. Verw. Gebiete
32, 111-131.
Lee, C.-Y., Johnson, A. L., Moreno-Ceteno, E. and Kuosmanen, T. (2012). A more efficient al-
gorithm for convex nonparametric least squares. Texas A&M University Technical Report.
Mammen, E. (1991). Estimating a smooth monotone regression function. Ann. Statist. 19, 724-
740.
Mammen, E., Marron, J. S., Turlach, B. A., and Wand, M. P. (2001). A general projection
framework for constrained smoothing. Statist. Sci. 16, 232-248.
Mammen, E. and Thomas-Agnan, C. (1999). Smoothing splines and shape restrictions. Scand.
J. Statist. 26, 239-252.
Matzkin, R. L. (1991). Semiparametric estimation of monotone and concave utility functions
for polychotomous choice models. Econometrica 59, 1315-1327.
Matzkin, R. L. (1992). Nonparametric and distribution-free estimation of the binary choice and
the threshold-crossing models. Econometrica 60, 239-270.
Mukerjee, H. (1988). Monotone nonparametric regression. Ann. Statist. 16, 741-750.
Nadaraya, E. A. (1965). On nonparametric estimates of density functions and regression curves.
Theory Probab. Appl. 10, 186-190.
Pal, J. K. and Woodroofe, M. (2007). Large sample properties of shape restricted regression
estimators with smoothness adjustments. Statist. Sinica 17, 1601-1616.
Politis, D. N. and Romano, J. P. (1994). Large sample confidence regions based on subsamples
under minimal assumptions. Ann. Statist. 22, 2031-2050.
Potra, F. A. and Wright, S. J. (2000). Interior-point methods. J. Comput. Appl. Math. 124,
281-302.
Priestley, M. B. and Chao, M. T. (1972). Nonparametric function fitting. J. Roy. Statist. Soc.
Ser. B 34, 385-392.
R Core Team (2012). R: A Language and Environment for Statistical Computing. Vienna, Aus-
tria: R Foundation for Statistical Computing. ISBN 3-900051-07-0.
Racine, J. S. and Li, Q. (2004). Nonparametric estimation of regression functions with both
categorical and continuous data. J. Econometrics 119, 99-130.
Ramsay, J. O. (1988). Monotone regression splines in action (with comments). Statist. Sci. 3,
425-461.
Robertson, T., Wright, F. and Dykstra, R. (1988). Order Restricted Statistical Inference. John
Wiley, New York.
Shao, X. (2010). The dependent wild bootstrap. J. Amer. Statist. Assoc. 105, 218-235.
van de Schoot, R., Hoijtink, H. and Deković, M. (2010). Testing inequality constrained hypothe-
ses in SEM models. Structural Equation Modeling 17, 443-463.
Villalobos, M. and Wahba, G. (1987). Inequality-constrained multivariate smoothing splines
with application to the estimation of posterior probabilities. J. Amer. Statist. Assoc. 82,
239-248.
Wang, X. and Shen, J. (2010). A class of grouped Brunk estimators and penalized spline esti-
mators for monotone regression. Biometrika 97, 585-601.
Watson, G. S. (1964). Smooth regression analysis. Sankhyā Ser. A 26, 359-372.
Wright, I. W. and Wegman, E. J. (1980). Isotonic, convex and related splines. Ann. Statist. 8,
1023-1035.
Xu, K.-L. and Phillips, P. C. B. (2012). Tilted nonparametric estimation of volatility functions
with empirical applications. J. Bus. Econom. Statist. 29, 518-528.
Yatchew, A. and Härdle, W. (2006). Nonparametric state price density estimation using con-
strained least squares and the bootstrap. J. Econometrics 133, 579-599.

Department of Statistics, Virginia Tech, Blacksburg, VA 24061, U.S.A.
E-mail: pangdu@vt.edu
Department of Economics, University of Miami, Coral Gables, FL 33124, U.S.A.
E-mail: cparmeter@bus.miami.edu
Department of Economics, Department of Mathematics and Statistics (Graduate Program in
Statistics), McMaster University, Hamilton, Ontario L8S 4M4, Canada.
E-mail: racinej@mcmaster.ca

(Received January 2012; accepted September 2012)
