Nonparametric Kernel Regression With Multiple Predictors and Multiple Shape Constraints
doi:http://dx.doi.org/10.5705/ss.2012.024
Key words and phrases: Hypothesis testing, multivariate kernel estimation, nonparametric regression, shape restrictions.
1. Introduction
Imposing shape constraints on a regression model is a necessary component of sound applied data analysis. For example, imposing monotonicity and concavity constraints is often required in a range of application domains. Early developments in the absence of smoothness restrictions can be found in Robertson, Wright, and Dykstra (1988) and the references therein. When data are believed to have nonlinear structure in addition to obeying shape constraints, nonparametric smoothing methods such as kernel smoothing and splines are often used. For example, restricted kernel smoothing has been considered by Mukerjee (1988), Mammen (1991), Hall and Huang (2001), Braun and Hall (2001), Hall and Kang (2005), Birke and Dette (2007), and Carroll, Delaigle, and Hall (2011), among others; restricted spline smoothing has been studied by Wright and Wegman (1980), Ramsay (1988), Mammen and Thomas-Agnan (1999), Pal and Woodroofe (2007), and Wang and Shen (2010), to name but a few. An example of restricted estimation by other smoothing methods is Xu and Phillips (2012).
Our estimator can be viewed as a generalization of Carroll, Delaigle, and Hall (2011), where the monotonicity constraint is replaced by a general shape constraint, though without the complication of measurement error considered there. Similar to Hall and Huang (2001), our estimator is constructed by introducing a weight for each response data point that can dampen or magnify the impact of that observation. To deliver an estimate satisfying the shape constraints, the weights are selected to minimize their distance from the uniform weights of the unconstrained estimator while obeying the constraints. In Hall and Huang (2001), this distance was measured by the power divergence metric introduced in Cressie and Read (1984), which has a rather complicated form and is hard to generalize. Instead, we resort to the well-known $l_2$ metric, which is much simpler and retains all the desired properties of the power divergence metric (Theorem 1). Under certain conditions, we generalize the consistency results of Hall and Huang (2001, Thm. 4.3) to our multivariate, multiple-constraint setting. In essence, when the shape constraints are non-binding (i.e., strictly satisfied) on the domain, the restricted estimator is asymptotically and numerically equivalent to the unrestricted estimator. When the shape constraints are non-binding everywhere except on a set of measure zero, the restricted and unrestricted estimators remain close to each other except in a neighborhood of the binding set.
Besides the multivariate, multi-constraint extension, we also propose a bootstrap procedure to test the validity of the shape constraints. Inference for shape restrictions in nonparametric regression settings has attracted much attention in the past decade. For example, Hall and Heckman (2000) developed a bootstrap test for monotonicity whose test statistic is formulated around the slope estimates of linear regression models fitted over small intervals. Ghosal, Sen, and van der Vaart (2000) introduced a statistic based on a U-process measuring the discordance between predictor and response. Both of these approaches are specifically designed for monotonicity constraints, and it is difficult to extend them to our multivariate, multi-constraint setting. Yatchew and Härdle (2006) employed a residual-based test to check for monotonicity and convexity simultaneously in a regression setting, but their result is for univariate functions and requires that the constraints be strictly satisfied. Our bootstrap procedure originates from Hall et al. (2001), who tested the monotonicity of hazard functions but provided no theoretical justification for the procedure. Although the test statistic is difficult to analyze asymptotically, we are able to provide asymptotic results for its implementation when shape constraints are satisfied on a sufficiently dense grid of points. The derivation of our result takes advantage of the simple form of the $l_2$ metric, a simplification that may not be available for the power divergence metric and a further advantage of the metric we adopt. Another appealing aspect of our method is that it involves only quadratic programming, which can be solved with standard off-the-shelf software.
We consider shape constraints of the general form
$$\sum_{s \in S_k} \alpha_{s,k}\, g^{(s)}(x) \ge c_k(x), \qquad k = 1, \ldots, T, \tag{2.2}$$
where $T$ is the number of restrictions and, in each restriction, the sum is taken over all vectors in $S_k$ that correspond to the constraints, with $\alpha_{s,k}$ a set of constants used to generate them. Note that (2.2) could be generalized further to accommodate more sophisticated constraints, such as global concavity/convexity or homogeneity of degree R (Euler's theorem), by allowing the $\alpha_{s,k}$ to be functions of the covariates; see Section 4.2 for one such generalization. In what follows we presume, without loss of generality, that $\alpha_{s,k} \ge 0$ for all $s$ and $c_k(x) \equiv 0$, since the $c_k(x)$ are known functions. The approach we describe is quite general; it admits arbitrary combinations of constraints, subject to the obvious caveat that the constraints must be internally consistent.
Standard kernel regression smoothers can be written as linear combinations of the responses $Y_i$,
$$\hat g(x) = \sum_{i=1}^{n} A_i(x) Y_i, \tag{2.3}$$
where $A_i(x)$ is a local weighting matrix. This class includes the Nadaraya-Watson estimator (Nadaraya (1965), Watson (1964)), the Priestley-Chao estimator (Priestley and Chao (1972)), the Gasser-Müller estimator (Gasser and Müller (1979)), and the local polynomial estimator (Fan (1992)), among others. Following Hall and Huang (2001), we consider a generalization of (2.3) to
$$\hat g(x \mid p) = \sum_{i=1}^{n} p_i A_i(x) Y_i, \tag{2.4}$$
where $p = (p_1, \ldots, p_n)'$ is a vector of weights, and where $\hat g^{(s)}(x \mid p) = \sum_{i=1}^{n} p_i A_i^{(s)}(x) Y_i$.
As an example, we use (2.4) to generate an unrestricted Nadaraya-Watson estimator. Here we take $p_i = 1/n$, $i = 1, \ldots, n$, and set $A_i(x) = n K_h(X_i, x) / \sum_{j=1}^{n} K_h(X_j, x)$, where $K_h(\cdot)$ is a product kernel and $h$ is a vector of bandwidths; see Racine and Li (2004) for details. When $p_i \ne 1/n$ for some $i$, we have a restricted Nadaraya-Watson estimator; the selection of $p$ satisfying particular restrictions is discussed below.
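For concreteness, the following minimal R sketch (our own illustration, not the authors' implementation; the Gaussian product kernel, the bandwidth values, and all variable names are illustrative assumptions) computes the unrestricted Nadaraya-Watson estimate in exactly this weighted form:

    set.seed(1)
    n <- 100
    X <- cbind(runif(n, -5, 5), runif(n, -5, 5))  # two continuous predictors
    y <- sin(X[, 1]) + cos(X[, 2]) + rnorm(n, sd = 0.1)
    h <- c(0.8, 0.8)                              # bandwidth vector (assumed values)

    ## product kernel K_h(X_i, x) evaluated for all i at a single point x
    ## (multiplicative constants cancel in the ratio defining A_i(x))
    Kh <- function(X, x, h) apply(dnorm(sweep(sweep(X, 2, x), 2, h, "/")), 1, prod)

    ## g-hat(x | p) = sum_i p_i A_i(x) y_i with A_i(x) = n K_h(X_i, x) / sum_j K_h(X_j, x)
    ghat <- function(x, p) {
      k <- Kh(X, x, h)
      sum(p * (n * k / sum(k)) * y)
    }

    p_u <- rep(1 / n, n)   # uniform weights p_i = 1/n give the unrestricted estimator
    ghat(c(0, 0), p_u)

Replacing p_u with a non-uniform weight vector in ghat() yields a restricted estimate of the kind discussed next.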
Let $p_u$ be the $n$-vector of uniform weights and let $p$ be the vector of weights to be selected. To impose our constraints, we choose $p$ to minimize some distance measure from $p$ to $p_u$, as proposed by Hall and Huang (2001). Whereas Hall and Huang (2001) consider probability weights and distance measures suitable for probability weights (e.g., Hellinger), we allow for both positive and negative weights while retaining $\sum_i p_i = 1$, and so require alternative distance measures.
We also forgo the power divergence metric of Cressie and Read (1984) that was used by Hall and Huang (2001), since it is valid only for probability weights. Instead we use the $l_2$ metric $D(p) = (p_u - p)'(p_u - p)$, which has a number of appealing features in this context, as will be seen. Our problem then is to select weights $p$ that minimize $D(p)$ subject to $l(x) \le \hat g^{(s)}(x \mid p) \le u(x)$, and perhaps additional constraints of a similar form; this can be cast as a general nonlinear programming problem. Theoretical underpinnings of the constrained estimator are provided in Theorems 1 and 2 in Section 3. The explicit form of the quadratic programming problem is presented just before Theorem 3 in Section 3, where that form is needed for the theoretical development. Section 4 describes setup and implementation details for two common types of constraints: coordinate-wise monotonicity/concavity constraints and global concavity.
$$P_B = 1 - \hat F(D(\hat p)) = \frac{1}{B} \sum_{j=1}^{B} I\bigl(D(p_j^*) > D(\hat p)\bigr),$$
where $I(\cdot)$ is the indicator function and $\hat F$ is the empirical distribution function of the bootstrap statistics; one rejects the null hypothesis if $P_B$ is less than $\alpha$, the level of the test.
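Computationally the p-value is a one-liner. In the sketch below, Dhat (the observed statistic $D(\hat p)$) and Dstar (the vector of $B$ bootstrap statistics) are assumed to have been produced already by the constrained fitting procedure:

    ## PB = (1/B) sum_j I(D(p*_j) > D(p-hat)); reject the null if PB < alpha
    boot_pvalue <- function(Dhat, Dstar) mean(Dstar > Dhat)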
Theorem 1. Assume that the set $\{1, \ldots, n\}$ contains a sequence $\{i_1, \ldots, i_k\}$ with the following properties:
(i) for each $k$, $A^D_{i_k}(x)$ is strictly positive and continuous on an open set $O_{i_k} \subset \mathbb{R}^r$, and vanishes on $\mathbb{R}^r \setminus O_{i_k}$;
(ii) every $x \in J$ is contained in at least one open set $O_{i_k}$;
(iii) for $1 \le i \le n$, $A^D_i(x)$ is continuous on $(-\infty, \infty)^r$.
Then there exists a vector $p = (p_1, \ldots, p_n)$ such that the constraints are satisfied for all $x \in J$.
Conditions (i) and (ii) ensure the existence of an open cover of the domain $J$ by the open sets $O_i$ on which $A^D_i$ is positively supported for some $i$. We note that these conditions are sufficient but not necessary for the existence of a set of weights that satisfies the constraints for all $x \in J$. For example, if $\operatorname{sgn} A^D_{j_n}(x) = 1$ for all $x \in J$ for some sequence $j_n$ in $\{1, \ldots, n\}$, and $\operatorname{sgn} A^D_{l_n}(x) = -1$ for all $x \in J$ for another sequence $l_n$ in $\{1, \ldots, n\}$, then for those observations that switch signs, $p_i$ may be set equal to zero, while $p_{j_n} > 0$ and $p_{l_n} < 0$ is sufficient to ensure the existence of a set of $p$'s satisfying the constraints.
These can be viewed as limiting cases of the Hall and Huang analysis; see their equation (5.26). However, a key difference here is the relative ease with which the constraints can be implemented in practice if one forgoes power divergence and uses either a linear or quadratic program to solve for the optimal weights. Furthermore, in our proof of Theorem 3 the use of the $l_2$ norm delivers simplifications that are not available when using the power divergence metric. Additionally, under the condition $\sum_{i=1}^{n} p_i = 1$, $D_2(p)$ is equivalent to the $l_2$ norm.
$$\min_{p_1, \ldots, p_n} \sum_{i=1}^{n} (n^{-1} - p_i)^2 \quad \text{s.t.} \quad \sum_{i=1}^{n} p_i = 1, \quad \sum_{i=1}^{n} p_i \psi_i(x) \ge 0 \ \ \forall x. \tag{3.2}$$
In practice, this can be carried out by taking a fine grid $(x_1, \ldots, x_N)$, with $N$ large, and solving
$$\min_{p_1, \ldots, p_n} \sum_{i=1}^{n} (n^{-1} - p_i)^2 \quad \text{s.t.} \quad \sum_{i=1}^{n} p_i = 1, \quad \sum_{i=1}^{n} p_i \psi_i(x_j) \ge 0, \quad 1 \le j \le N. \tag{3.3}$$
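Because the objective in (3.3) is quadratic in $p$ and the constraints are linear, the problem maps directly onto a standard quadratic programming routine. The sketch below uses the quadprog package in R; the matrix Psi, whose $(j, i)$ entry is $\psi_i(x_j)$, depends on the kernel estimator and the chosen constraint and is assumed to be supplied by the user:

    library(quadprog)

    ## minimize sum_i (1/n - p_i)^2 = const + 0.5 p'(2I)p - (2/n) 1'p,
    ## subject to sum_i p_i = 1 (equality) and Psi %*% p >= 0 (grid constraints)
    solve_weights <- function(Psi) {
      n    <- ncol(Psi)
      Dmat <- 2 * diag(n)
      dvec <- rep(2 / n, n)
      Amat <- cbind(rep(1, n), t(Psi))  # solve.QP imposes t(Amat) %*% p >= bvec
      bvec <- c(1, rep(0, nrow(Psi)))
      solve.QP(Dmat, dvec, Amat, bvec, meq = 1)$solution  # meq = 1: first row is an equality
    }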
Assumption A2.
(i) $N \to \infty$ as $n \to \infty$ and $N = O(n)$.
(ii) If $d_N = \inf_{1 \le j_1 < j_2 \le N} |x_{j_1} - x_{j_2}|$, then $d_N \to 0$ and $h^{-1} d_N \to \infty$.
Assumption A2(ii) requires that the minimum distance between grid points decrease at a rate slower than $h$, so that the correlation between derivative estimates at these grid points is zero when $n$ is sufficiently large.
Let $\hat p_i$, $i = 1, \ldots, n$, solve (3.3). The proof of the following theorem is in Section S3 of the online Supplementary Material.
Theorem 3. Suppose Assumptions A1(i)-(iv) and A2(i)-(ii) hold. Then, as $n \to \infty$, we have
$$\frac{n^2 \sigma_K^2}{h^{2|d|+r} \bigl(\sum_{j=1}^{M} g^{D}(x_j^*)\bigr)^2}\, D(\hat p) \sim \chi^2(n), \tag{3.4}$$
where $\sigma_K^2 = \sigma^2 \int [K^{(d)}(y)]^2\, dy$ and $\{x_1^*, \ldots, x_M^*\} \subset \{x_1, \ldots, x_N\}$ are the slack points defined in Section S3 of the online Supplementary Material.
The diverging degrees of freedom in the asymptotic distribution here come as no surprise, since both the null and alternative hypotheses are nonparametric and reside in infinite-dimensional parameter spaces. A similar phenomenon was observed by Fan, Zhang, and Zhang (2001) for their generalized likelihood ratio test. In theory, the asymptotic distribution of $D(\hat p)$ in Theorem 3 can be used to determine the p-value of our test. However, this may pose difficulties in practice: the asymptotic distribution may not be a good approximation at finite sample sizes, and the normalizing constant in (3.4) requires the determination of the slack points. A bootstrap approach, like that proposed in Section 2.2, is an alternative.
Under the power divergence metric, Carroll, Delaigle, and Hall (2011) showed the consistency of the hypothesis test of monotonicity using $D_\rho(\hat p)$ as the test statistic, which implies consistency of the bootstrap version. A similar result here consists of two parts:
(i) If the true function g satisfies the shape constraint, then as n → ∞,
(ii) If the true function g does not satisfy the shape constraint on J , then
4. Two Illustrations
In this section, we give two examples of the implementation of the quadratic programming outlined above. The first example incorporates monotonicity and/or concavity in each dimension; the second example, on global concavity, takes up a generalized version of the constraints in (2.2). For the first example, the coordinate-wise constraints take the form
$$\hat g^{(s)}(x \mid p) \ge 0, \quad s \in S_1, \tag{4.1}$$
$$\hat g^{(s)}(x \mid p) \le 0, \quad s \in S_2, \tag{4.2}$$
where $S_1 = \{(1, 0, \ldots, 0), (0, 1, \ldots, 0), \ldots, (0, 0, \ldots, 1)\}$ and $S_2 = \{(2, 0, \ldots, 0), (0, 2, \ldots, 0), \ldots, (0, 0, \ldots, 2)\}$. Enforcing either (4.1) or (4.2) at all data points involves $rn$ conditions in total, with $r$ the dimension of the covariates. Even with thousands of observations, the constraints are easy to construct and implement via quadratic programming procedures.
An iterative strategy can be used to construct and enforce the constraints more efficiently. Rather than imposing concavity at all sample realizations, impose concavity on some sizable subset of observations, and then check which observations do not satisfy global concavity; call this set $V$. Take the observations in $V$ and add them to the original set of observations where concavity is enforced, then re-solve the quadratic program. Repeat the procedure until there is a subset of observations large enough that imposing concavity on these points is sufficient to ensure concavity at all points. The approach is widespread; an excellent example is Lee et al. (2012). A large literature on constraint complexity bounds for linear and quadratic programming problems exists; see Potra and Wright (2000) for an in-depth review.
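A minimal sketch of this iterative strategy follows; fit_qp and violations are assumed, user-supplied hooks (solve the quadratic program with concavity imposed at a working set of points, and report the points where the constraint still fails, respectively):

    ## start from a small working set S0, add violators, and re-solve until
    ## the constraint holds at every evaluation point
    impose_iteratively <- function(fit_qp, violations, S0) {
      S <- S0
      repeat {
        p <- fit_qp(S)       # constrained weights using the current working set
        V <- violations(p)   # indices of points where concavity still fails
        if (length(V) == 0) return(p)
        S <- union(S, V)     # augment the working set and re-solve
      }
    }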
For the illustrative constraints we consider, the necessary (in)equalities are linear in $p$ and can be solved using standard quadratic programming methods and off-the-shelf software, for example the quadprog package in R.
where $x_1$ and $x_2$ are independent draws from the uniform $[-5, 5]$. We draw $n = 10{,}000$ observations from this DGP with $\varepsilon \sim N(0, \sigma^2)$ and $\sigma = 0.1$. The large sample size speaks to the feasibility of the approach in moderate to large sample settings; our simulations with sample sizes in the hundreds produced reasonable, though slightly rougher, estimates. Figure 1 shows the unrestricted regression estimate with bandwidths chosen via least squares cross-validation.
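An unrestricted fit of this kind can be obtained with the np package in R. Since the DGP display itself is not reproduced above, the mean function below is a stand-in; only the design (uniform [-5,5] covariates, n = 10,000, sigma = 0.1) is taken from the text, and least squares cross-validation at this sample size is computationally demanding:

    library(np)
    n  <- 10000
    x1 <- runif(n, -5, 5); x2 <- runif(n, -5, 5)
    y  <- sin(x1) * cos(x2) / 4 + rnorm(n, sd = 0.1)   # stand-in for the paper's g
    bw  <- npregbw(y ~ x1 + x2, regtype = "lc", bwmethod = "cv.ls")
    fit <- npreg(bws = bw)   # unrestricted local constant (Nadaraya-Watson) estimate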
We first imposed the constraint that the regression function lie in the range $[0, 0.5]$; a plot of the restricted surface appears in Figure 2. We next imposed the constraint that the first derivatives with respect to both $x_1$ and $x_2$ lie in the range $[-0.1, 0.1]$; a plot of the restricted surface appears in Figure 3.
where $x_1$ and $x_2$ are independent draws from the uniform $[1, 10]$. Figure 4 shows the surface. We drew $n = 100$ observations with $\varepsilon \sim N(0, \sigma^2)$ and $\sigma = 0.7$. This DGP is positive, monotonic in both $x_1$ and $x_2$, globally concave, symmetric in $x_1$ and $x_2$, and homogeneous of degree 0.8. Any or all of these constraints could be imposed on the estimated surface. Here we imposed negativity of the second derivatives with respect to both $x_1$ and $x_2$ on a grid of 250 points equally spaced over the support of $X$.
Figure 5 presents the unrestricted local constant kernel regression estimate with bandwidths chosen via least squares cross-validation, and Figure 6 presents the restricted local constant kernel regression estimate. The bumps present in the unrestricted estimate have been removed by the enforcement of the constraints.
Figure 7. Power curves for $\alpha = 0.05$ for sample sizes $n = (25, 50, 75, 100)$ based upon the DGP given in (5.3). The solid horizontal line represents the test's nominal level ($\alpha$).
where X·j ∼ U[1, 10] for 1 ≤ j ≤ 4 and ε is normal with mean zero and variance
0.49. For the simulations we focused on enforcing coordinate-wise monotonicity.
Table 1. Test for correct parametric functional form. Values are empirical
rejection frequencies over the M = 1, 000 Monte Carlo replications.
n α = 0.10 α = 0.05 α = 0.01
Size
25 0.100 0.049 0.010
50 0.074 0.043 0.011
75 0.086 0.034 0.008
100 0.069 0.031 0.006
200 0.093 0.044 0.007
Power
25 0.391 0.246 0.112
50 0.820 0.665 0.356
75 0.887 0.802 0.590
100 0.923 0.849 0.669
200 0.987 0.970 0.903
We impose this constraint for each observation, as opposed to over a grid. This
results in n = 171 total constraints.
We estimate the production function using a nonparametric local linear estimator with least squares cross-validated bandwidth selection. Figure 8 presents the unrestricted and restricted partial derivative sums for each observation (i.e., farm), where the restriction is that the sum of the partial derivatives equals one. The horizontal line represents the restricted partial derivative sum (1.00), and the points represent the unrestricted sums for each farm. An examination of Figure 8 reveals that the estimated returns to scale lie in the interval [0.98, 1.045].
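To sketch how such a restriction enters the quadratic program of Section 3, note that the sum of the partial derivatives at an observation is linear in $p$, so constant returns to scale contributes one linear condition per farm. Here G is an assumed, user-constructed matrix whose $(k, i)$ entry is $Y_i$ times the sum over the inputs of the derivative weights $A_i^{(s)}(X_k)$, $s \in S_1$; for numerical feasibility the equality is relaxed to a narrow band around one:

    ## variant of the earlier solve_weights() sketch with the CRS condition
    ## imposed as 1 - tol <= (G p)_k <= 1 + tol at each observation k
    crs_weights <- function(G, tol = 1e-6) {
      n    <- ncol(G)
      Dmat <- 2 * diag(n)
      dvec <- rep(2 / n, n)
      Amat <- cbind(rep(1, n), t(G), -t(G))         # adding-up, then the CRS band
      bvec <- c(1, rep(1 - tol, nrow(G)), rep(-(1 + tol), nrow(G)))
      quadprog::solve.QP(Dmat, dvec, Amat, bvec, meq = 1)$solution
    }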
Figure 8. The sum of the partial derivatives for observation $i$ appears on the vertical axis, and each observation appears on the horizontal axis.
To test whether the restriction is valid, we apply the test outlined in Section 2.2. We conducted $B = 999$ bootstrap replications and tested the null that the technology exhibits constant returns to scale. The empirical p-value is $P_B = 0.122$; hence we fail to reject the null at all conventional levels. We are encouraged by this nonparametric application, as it involves a fairly large number of predictors (five) and a fairly small number of observations ($n = 171$).
6. Discussion
We present a framework for imposing and testing the validity of conventional constraints on the partial derivatives of a multivariate nonparametric kernel regression function. The proposed approach covers imposing monotonicity and concavity while delivering a seamless framework for general restricted nonparametric kernel estimation and inference. Simulations illustrate the finite-sample performance of the estimator and the test, and the method is applied to a production data set. An open implementation in the R language (R Core Team (2012)) is available from the authors.
We note that our procedure is valid for a range of kernel estimators, as well as for estimation and testing in the presence of categorical data, so our constrained smoothing approach can be used in a wide variety of settings. Future work on the theoretical side could focus on the importance of the choice of distance metric, the asymptotic behavior of the bootstrap testing procedure, and the relative merits of the alternative data tilting methods that exist.
Acknowledgements
We thank an associate editor and the referees for their insightful comments that have significantly improved the paper. We would also like to thank, but not implicate, Daniel Wikström for inspiring conversations, and Li-Shan Huang and Peter Hall for their insightful comments and suggestions. The research of Du is supported by NSF DMS-1007126. Racine gratefully acknowledges support from the Natural Sciences and Engineering Research Council of Canada (www.nserc.ca), the Social Sciences and Humanities Research Council of Canada (www.sshrc.ca), and the Shared Hierarchical Academic Research Computing Network (www.sharcnet.ca).
References
Afriat, S. N. (1967). The construction of utility functions from expenditure data. Internat.
Econom. Rev. 8, 67-77.
Andrews, D. W. K. (2000). Inconsistency of the bootstrap when a parameter is on the boundary of the parameter space. Econometrica 68, 399-405.
Birke, M. and Dette, H. (2007). Estimating a convex function in nonparametric regression.
Scand. J. Statist. 34, 384-404.
Birke, M. and Pilz, K. F. (2009). Nonparametric option pricing with no-arbitrage constraints.
J. Finan. Econom. 7, 53-76.
Braun, W. J. and Hall, P. (2001). Data sharpening for nonparametric inference subject to
constraints. J. Comput. Graph. Statist. 10, 786-806.
Carroll, R. J., Delaigle, A. and Hall, P. (2011). Testing and estimating shape-constrained non-
parametric density and regression in the presence of measurement error. J. Amer. Statist.
Assoc. 106, 191-202.
Cressie, N. A. C. and Read, T. R. C. (1984). Multinomial goodness-of-fit tests. J. Roy. Statist.
Soc. Ser. B 46, 440-464.
Csörgő, M. and Révész, P. (1981). Strong Approximations in Probability and Statistics. Academic
Press, New York.
Dantzig, G. B., Fulkerson, D. R. and Johnson, S. M. (1954). Solution of a large-scale traveling-
salesman problem. Operations Research 2, 393-410.
Dantzig, G. B., Fulkerson, D. R. and Johnson, S. M. (1959). On a linear-programming combi-
natorial approach to the traveling-salesman problem. Operations Research 7, 58-66.
Dette, H. and Scheder, R. (2006). Strictly monotone and smooth nonparametric regression for
two or more variables. Canad. J. Statist. 34, 535-561.
Fan, J. (1992). Design-adaptive nonparametric regression. J. Amer. Statist. Assoc. 87, 998-1004.
1370 PANG DU, CHRISTOPHER PARMETER AND JEFFREY RACINE
Fan, J., Zhang, C. and Zhang, J. (2001). Generalized likelihood ratio statistics and Wilks phe-
nomenon. Ann. Statist. 29, 153-193.
Galindo-Garre, F. and Vermunt, J. K. (2004). The order-restricted association model: two
estimation algorithms and issues in testing. Psychometrika 69, 641-654.
Gallant, A. R. (1981). On the bias in flexible functional forms and an essential unbiased form:
The fourier flexible form. J. Econometrics 15, 211-245.
Gallant, A. R. (1982). Unbiased determination of production technologies. J. Econometrics 20,
285-323.
Gallant, A. R. and Golub, G. H. (1984). Imposing curvature restrictions on flexible functional
forms. J. Econometrics 26, 295-321.
Gasser, T. and Müller, H.-G. (1979). Kernel estimation of regression functions. In Smoothing
Techniques for Curve Estimation, 23-68. Springer-Verlag, New York.
Gasser, T. and Müller, H.-G. (1984). Estimating regression functions and their derivatives by
the kernel method. Scand. J. Statist. 11, 171-185.
Ghosal, S., Sen, A. and van der Vaart, A. W. (2000). Testing monotonicity of regression. Ann.
Statist. 28, 1054-1082.
Hall, P. and Heckman, N. E. (2000). Testing monotonicity of a regression mean by calibrating
for linear functionals. Ann. Statist. 28, 20-39.
Hall, P. and Huang, H. (2001). Nonparametric kernel regression subject to monotonicity con-
straints. Ann. Statist. 29, 624-647.
Hall, P., Huang, H., Gifford, J. and Gijbels, I. (2001). Nonparametric estimation of hazard rate
under the constraint of monotonicity. J. Comput. Graph. Statist. 10, 592-614.
Hall, P. and Kang, K. H. (2005). Unimodal kernel density estimation by data sharpening. Statist.
Sinica 15, 73-98.
Horrace, W. and Schmidt, P. (2000). Multiple comparisons with the best, with economic appli-
cations. J. Appl. Econometrics 15, 1-26.
Komlós, J., Major, P. and Tusnády, G. (1975). An approximation of partial sums of independent
random variables and the sample distribution function, part I. Z. Wahrsch. Verw. Gebiete
32, 111-131.
Lee, C.-Y., Johnson, A. L., Moreno-Centeno, E. and Kuosmanen, T. (2012). A more efficient algorithm for convex nonparametric least squares. Texas A&M University Technical Report.
Mammen, E. (1991). Estimating a smooth monotone regression function. Ann. Statist. 19, 724-
740.
Mammen, E., Marron, J. S., Turlach, B. A., and Wand, M. P. (2001). A general projection
framework for constrained smoothing. Statist. Sci. 16, 232-248.
Mammen, E. and Thomas-Agnan, C. (1999). Smoothing splines and shape restrictions. Scand. J. Statist. 26, 239-252.
Matzkin, R. L. (1991). Semiparametric estimation of monotone and concave utility functions
for polychotomous choice models. Econometrica 59, 1315-1327.
Matzkin, R. L. (1992). Nonparametric and distribution-free estimation of the binary choice and
the threshold-crossing models. Econometrica 60, 239-270.
Mukerjee, H. (1988). Monotone nonparametric regression. Ann. Statist. 16, 741-750.
Nadaraya, E. A. (1965). On nonparametric estimates of density functions and regression curves.
Theory Probab. Appl. 10, 186-190.
KERNEL REGRESSION WITH SHAPE CONSTRAINTS 1371
Pal, J. K. and Woodroofe, M. (2007). Large sample properties of shape restricted regression
estimators with smoothness adjustments. Statist. Sinica 17, 1601-1616.
Politis, D. N. and Romano, J. P. (1994). Large sample confidence regions based on subsamples
under minimal assumptions. Ann. Statist. 22, 2031-2050.
Potra, F. A. and Wright, S. J. (2000). Interior-point methods. J. Comput. Appl. Math. 124,
281-302.
Priestley, M. B. and Chao, M. T. (1972). Nonparametric function fitting. J. Roy. Statist. Soc.
Ser. B 34, 385-392.
R Core Team (2012). R: A Language and Environment for Statistical Computing. Vienna, Aus-
tria: R Foundation for Statistical Computing. ISBN 3-900051-07-0.
Racine, J. S. and Li, Q. (2004). Nonparametric estimation of regression functions with both
categorical and continuous data. J. Econometrics 119, 99-130.
Ramsay, J. O. (1988). Monotone regression splines in action (with comments). Statist. Sci. 3,
425-461.
Robertson, T., Wright, F. and Dykstra, R. (1988). Order Restricted Statistical Inference. John
Wiley, New York.
Shao, X. (2010). The dependent wild bootstrap. J. Amer. Statist. Assoc. 105, 218-235.
van de Schoot, R., Hoijtink, H. and Deković, M. (2010). Testing inequality constrained hypothe-
ses in SEM models. Structural Equation Modeling 17, 443-463.
Villalobos, M. and Wahba, G. (1987). Inequality-constrained multivariate smoothing splines
with application to the estimation of posterior probabilities. J. Amer. Statist. Assoc. 82,
239-248.
Wang, X. and Shen, J. (2010). A class of grouped Brunk estimators and penalized spline esti-
mators for monotone regression. Biometrika 97, 585-601.
Watson, G. S. (1964). Smooth regression analysis. Sankhyā Ser. A 26, 359-372.
Wright, I. W. and Wegman, E. J. (1980). Isotonic, convex and related splines. Ann. Statist. 8,
1023-1035.
Xu, K.-L. and Phillips, P. C. B. (2012). Tilted nonparametric estimation of volatility functions
with empirical applications. J. Bus. Econom. Statist. 29, 518-528.
Yatchew, A. and Härdle, W. (2006). Nonparametric state price density estimation using con-
strained least squares and the bootstrap. J. Econometrics 133, 579-599.