Journal of Econometrics 189 (2015) 148–162

Contents lists available at ScienceDirect

Journal of Econometrics
journal homepage: www.elsevier.com/locate/jeconom

Smooth coefficient estimation of a seemingly unrelated regression


Daniel J. Henderson a,∗, Subal C. Kumbhakar b,c, Qi Li d,e, Christopher F. Parmeter f

a Department of Economics, Finance and Legal Studies, University of Alabama, Tuscaloosa, AL 35487-0224, United States
b Department of Economics, State University of New York at Binghamton, United States
c University of Stavanger Business School, Stavanger, Norway
d Department of Economics, Texas A&M University, United States
e ISEM, Capital University of Economics & Business, China
f Department of Economics, University of Miami, United States

Article info

Article history:
Received 5 June 2012
Received in revised form 12 March 2015
Accepted 12 July 2015
Available online 29 July 2015

JEL classification:
C14
C39

Keywords:
Semiparametric smooth coefficient model
System estimation
Bandwidth selection
Banking

Abstract

This paper proposes estimation and inference for the semiparametric smooth coefficient seemingly unrelated regression model. We discuss the imposition of cross-equation restrictions which are required by economic theory as well as methods for data driven bandwidth selection. A test of correct functional form for the entire system of equations is also constructed. Asymptotic and finite sample results are given. We illustrate our estimator by applying it to a cost system for US commercial banks. Our results show that most of the banks are operating under increasing returns to scale, but that returns to scale decrease with bank size.

© 2015 Elsevier B.V. All rights reserved.

∗ Corresponding author. Tel.: +1 205 348 8991; fax: +1 205 348 0186.
E-mail address: djhender@cba.ua.edu (D.J. Henderson).
http://dx.doi.org/10.1016/j.jeconom.2015.07.002
0304-4076/© 2015 Elsevier B.V. All rights reserved.

1. Introduction

Nonparametric methods are now quite popular among statisticians, econometricians and applied economists. However, a well-known criticism against the use of nonparametric models is the ‘curse of dimensionality’. In applied settings this is likely to be troubling as researchers typically have access to a potentially large number of explanatory variables. While one could employ dimension reduction methods such as projection pursuit (Huber, 1985) or engage in significance testing/automatic variable removal (Lavergne and Vuong, 2000; Hall et al., 2007), a common alternative is to use semiparametric methods. While not as flexible as their nonparametric counterparts, semiparametric methods can lessen the curse of dimensionality while not sacrificing too much flexibility for the problem at hand. Additionally, in some settings the use of semiparametric methods can allow easier implementation of an estimator that satisfies certain features of the given problem, imposing constraints for example.

Here we use the semiparametric smooth coefficient model (SPSCM) to illustrate this point. The SPSCM has its origins in econometrics dating back to the seminal work of Robinson (1989).¹ Currently it has seen renewed interest, most likely stemming from the fact that it is easily manipulated to mesh with a variety of econometric settings. Das (2005) and Cai et al. (2006) proposed using this estimator in an instrumental variable setting while Cai and Li (2008) proposed using the SPSCM to estimate a dynamic panel regression model. In an applied setting, Mamuneas et al. (2006) used the SPSCM to study the relationship between development and human capital.

¹ These methods were made popular in statistics when they were explored by Cleveland et al. (1991) and Hastie and Tibshirani (1993) where they are commonly referred to as varying coefficient models.

In this paper we develop a SPSCM for estimating a seemingly unrelated regression (SUR) model. There are several reasons why we choose to use the SPSCM for a SUR model. First, as mentioned previously, semiparametric methods lessen the curse of dimensionality, which is important in applied settings. Second, in typical applications of SUR models, cross-equation restrictions² are required and can be imposed via matrix arrangement when we use the SPSCM. Lastly, the SPSCM has a ‘conditionally’ parametric structure, which makes interpretation of results straightforward.

² The cross-equation coefficient restrictions we consider here are required by economic theory and are not debatable. However, the theory is more general and could be used to impose other restrictions of economic interest (e.g., constant and/or unitary returns to scale, separability, monotonicity, etc.).

Our discussion of semiparametric estimation in a system adds to the burgeoning literature in this field, which has seen recent interest (Welsh and Yee, 2006; Matzkin, 2008; Jun and Pinkse, 2009). Our semiparametric estimator is straightforward to implement and should prove valuable for modeling systems of equations when the functional form of the unknown responses is not immediately derived from economic theory and cross-equation coefficient restrictions are necessary to impose. In microeconomic theory, cross-equation coefficient restrictions are quite common. For example, in estimating consumer demand functions based on utility maximization with budget constraints (or minimizing cost for a given level of utility), the demand functions share the parameters in the utility function. Similarly, in production theory, conditional input demand functions for cost minimizing firms share the parameters of the production (cost) function. The (unconditional) input demand and output supply functions for profit maximizing firms depend on the production (profit) function parameters. We show how to incorporate these restrictions in a SUR system and that accounting for these cross-equation coefficient restrictions improves the asymptotic efficiency of the semiparametric smooth coefficient estimator.

In a similar vein, Orbe et al. (2003) develop a semiparametric time-varying smooth coefficient estimator for a system of equations (see also, Orbe et al., 2005). In much the same spirit that Li et al. (2002) generalize the work of Robinson (1989), our work here generalizes that of Orbe et al. (2003) who developed an estimator to impose various restrictions on (potentially) time-varying coefficients. No theoretical properties of the proposed estimator were provided although the method was demonstrated to work well in both practical and simulated settings. Moreover, the focus of Orbe et al. (2003) was on allowing for seasonality and trending for all coefficients in a system of equations, coupled with restrictions on the coefficients. Their estimator requires solving a recursion formula. In our setting, a closed form solution exists that does not become more difficult as the sample size grows. Further, we provide the asymptotic properties of our estimator as well as for our test of correct parametric specification of the SUR model. In addition, we propose a method for data driven bandwidth selection.

Alternatively, models with varying coefficients stemming from a system of equations which depend on unobserved heterogeneity have recently been proposed. Jun (2009) presents a triangular model with varying coefficients that depend upon unobserved heterogeneity as opposed to explanatory variables. Jun’s (2009) model stems from a non-separable triangular system that allows for a wide variety of heterogeneity as well as endogeneity.

A further benefit of the semiparametric system estimator within the production paradigm is the inclusion of non-traditional inputs or environmental variables (which will be illustrated in our empirical example). It is common to encounter key variables in an applied production setting which do not fit into a classic input/output analysis, but more than likely impact the production environment of the firm. Our semiparametric model can incorporate these variables directly into the smooth coefficients and, conditional on these variables, we have a consistent notion of the production environment. Another way to view the influence of these variables is that they change the production landscape in a manner that makes the model a standard parametric production model, but holding the levels of these variables fixed.

We apply our cross-sectional cost system method to US commercial banks in 2010. Since bank size is an important factor in the production environment, we use it as an argument for the smooth coefficients. We consider a single equation cost function with a SPSCM as well as a cost system with a SPSCM. Our results suggest that the impact of the production environment, as measured by bank size, depends on whether we use the cost share equations or focus exclusively on the single equation cost function. We find increasing returns for most banks, but our results show that returns-to-scale diminish with bank size. When we use the single equation cost function, the increasing returns hold for even very large banks, whereas for the system estimator, we cannot reject constant returns for the largest banks. This finding is potentially important as increasing returns is often used to justify bank mergers and in policy debates on regulations limiting the size of banks (especially after the recent financial crisis).

The remainder of the paper is organized as follows. Section 2 presents the SPSCM estimator for a SUR system and establishes the large sample theory. Section 3 provides a test of correct functional form for the entire SUR system. Section 4 provides finite sample results from a small Monte Carlo setup. Results from our empirical example are given in Section 5. Finally, Section 6 presents some concluding remarks and directions for future research.

2. Semiparametric smooth coefficient systems of equations

The general setup of a varying coefficient regression takes the form

$$y_i = x_i^T \beta(z_i) + u_i, \qquad i = 1, \ldots, n, \tag{1}$$

where $y_i$ is the response variable of unit $i$, $x_i$ is a $l \times 1$ vector of regressors, the superscript $T$ denotes transpose, $z_i$ is a vector of environmental variables of dimension $q$ and $u_i$ is an additive idiosyncratic error. One can envision the setup in (1) stemming from the translog cost function presented in (11) via a set of environmental variables that characterize the operating environment of the firms. For example, Feng and Serletis (2009) allow the parameters of their translog cost function to vary depending on the size category that each bank falls within. Asaftei and Parmeter (2010) note that the smooth coefficient model can be thought of as linear in parameters for a fixed value of $z$.

Li et al. (2002) discuss standard local-constant estimation of this model in the multivariate setting, prove its consistency and provide a test of functional form while Lee and Ullah (2001) study the local-linear version of this estimator. Other theoretical contributions include Cai et al. (2000a,b) who propose a one-step local maximum likelihood estimator for generalized linear models with varying coefficients, Cai et al. (2000a,b) who study the time-series properties of the varying coefficient model and show how many practical time series models can have smoothly varying coefficients, and Cai (2007) and Cai et al. (2009) who discuss the asymptotic properties of the local linear smooth coefficient model in the presence of non-stationarity. Further, Fan and Huang (2005) detail inference via profile likelihood estimation of varying coefficient models and show that a profile likelihood ratio test provides power gains over existing tests involving varying coefficients while Li and Racine (2010) study the theoretical and practical properties of the varying coefficient estimator in the mixed discrete–continuous data environment. As should be evident, the single equation varying coefficient regression estimator is well studied and has been shown to have suitable asymptotic properties across a range of models and assumptions.
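As a rough illustration of the single equation local-constant estimator for model (1), the sketch below computes $\hat\beta(z) = \left(\sum_i x_i x_i^T K_i(z)\right)^{-1} \sum_i x_i y_i K_i(z)$ at one evaluation point. It is a minimal sketch under stated assumptions rather than the authors' implementation: the function names, the Gaussian product kernel, the single common bandwidth `h`, and the simulated data are all hypothetical choices made for illustration, and only continuous $z$ is handled.

```python
import numpy as np


def gaussian_product_kernel(Z, z, h):
    """Kernel weights K_i(z) for continuous z (assumed Gaussian product kernel).

    Z : (n, q) array of environmental variables, z : (q,) evaluation point,
    h : scalar bandwidth shared by all q components (an assumption).
    """
    U = (Z - z) / h
    q = Z.shape[1]
    return np.exp(-0.5 * np.sum(U ** 2, axis=1)) / np.sqrt(2.0 * np.pi) ** q


def local_constant_scm(y, X, Z, z, h):
    """Local-constant smooth coefficient estimate of beta(z) in y_i = x_i' beta(z_i) + u_i."""
    k = gaussian_product_kernel(Z, z, h)   # (n,) kernel weights
    XtK = X.T * k                          # (l, n): column i of X' scaled by K_i(z)
    # Solve (sum_i x_i x_i' K_i) b = sum_i x_i y_i K_i instead of forming an inverse.
    return np.linalg.solve(XtK @ X, XtK @ y)


# Hypothetical simulated example: intercept and slope both vary smoothly with z.
rng = np.random.default_rng(0)
n = 500
Z = rng.uniform(0.0, 1.0, size=(n, 1))
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.column_stack([np.sin(2.0 * Z[:, 0]), 1.0 + Z[:, 0] ** 2])
y = np.sum(X * beta_true, axis=1) + 0.1 * rng.normal(size=n)

beta_hat = local_constant_scm(y, X, Z, z=np.array([0.5]), h=0.1)
print(beta_hat)  # estimate of (sin(1.0), 1.25) at z = 0.5, up to smoothing bias and noise
```

For a fixed $z$ this is just kernel-weighted least squares, which reflects the observation that the model is linear in parameters once $z$ is held fixed. In practice the bandwidth would be chosen by a data driven rule rather than fixed by hand.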
2.1. Seemingly unrelated regression of varying coefficients

We consider a semiparametric smooth coefficient version of the SUR model proposed by Zellner (1962) (which we denote as a SPSCM SUR). First, let

$$y_{si} = x_{si}^T \beta_s(z_{si}) + u_{si}, \qquad s = 1, \ldots, m; \; i = 1, \ldots, n, \tag{2}$$

where the subscript $s$ denotes observations from the $s$th equation for $s \in \{1, \ldots, m\}$, $x_{si}$ and $\beta_s(\cdot)$ are both of dimension $l_s \times 1$, $z_{si}$ is of dimension $q$, and $y_{si}$ and $u_{si}$ are scalars. The main interest of the paper is to obtain consistent estimates of the varying coefficient functions at an arbitrary point $z \in \mathbb{R}^q$.

For simplicity, we will consider the case that $l_1 = l_2 = \cdots = l_m = l$, and $z_{1i} = z_{2i} = \cdots = z_{mi} = z_i$ (the same $z_i$ variable appears in all $m$ equations). Since there are two subscripts $s$ and $i$, we define $\tilde y_i = (y_{1i}, y_{2i}, \ldots, y_{mi})^T$ as a $m \times 1$ vector of the dependent variable for the $i$th individual. Similarly, let $\tilde x_i$ be the $m \times (ml)$ matrix given by

$$\tilde x_i = \begin{pmatrix} x_{1i}^T & 0 & \cdots & 0 \\ 0 & x_{2i}^T & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & x_{mi}^T \end{pmatrix}, \tag{3}$$

which is the matrix of explanatory variables for the $i$th individual; $\beta(z_i) = (\beta_1(z_i)^T, \beta_2(z_i)^T, \ldots, \beta_m(z_i)^T)^T$ is the $(ml) \times 1$ vector of the varying coefficient function evaluated at $z_i$; and $\tilde u_i = (u_{1i}, u_{2i}, \ldots, u_{mi})^T$ is the $m \times 1$ vector of the error term for individual $i$. Below we first use the above notation to discuss some parametric model estimation; these will help us derive the semiparametric estimators.

If $\beta_{(ml) \times 1}$ is a vector of constant parameters, then the ordinary least squares (OLS) estimator of $\beta$ can be obtained as the solution of $b$ in the following minimization problem:

$$\min_b \sum_{i=1}^{n} \tilde u_i^T \tilde u_i = \min_b \sum_{i=1}^{n} [\tilde y_i - \tilde x_i b]^T [\tilde y_i - \tilde x_i b].$$

However, it is well known that the OLS estimator of $\beta$ is less efficient than the generalized least squares (GLS) estimator. Let $\Sigma = \mathrm{Var}(\tilde u_i) = \{\sigma_{ts}\}_{t,s=1,\ldots,m}$ be the $m \times m$ variance–covariance matrix of $\tilde u_i = (u_{1i}, \ldots, u_{mi})^T$. Then the GLS estimator of $\beta$ can be obtained by minimizing

$$\min_b \sum_{i=1}^{n} \tilde u_i^T \Sigma^{-1} \tilde u_i = \min_b \sum_{i=1}^{n} [\tilde y_i - \tilde x_i b]^T \Sigma^{-1} [\tilde y_i - \tilde x_i b].$$

For our semiparametric model, $\beta(z)$ is a function of $z$. We want to estimate $\beta(z)$ for a given point $z$. To achieve this goal, we need to use a kernel weight function $K_i(z) = K_{z_i, z}$ which gives more weight to observations (of $z_i$'s) that are closer to $z$. Hence, our semiparametric (OLS type) estimator of $\beta(z)$ is the solution of $b$ that minimizes

$$\sum_{i=1}^{n} [\tilde y_i - \tilde x_i b]^T [\tilde y_i - \tilde x_i b] K_i(z). \tag{4}$$

Similarly, the semiparametric GLS type estimator of $\beta(z)$ is the vector of $b$ that minimizes

$$\sum_{i=1}^{n} [\tilde y_i - \tilde x_i b]^T \Sigma^{-1} [\tilde y_i - \tilde x_i b] K_i(z). \tag{5}$$

If we use $\tilde\beta(z)$ and $\hat\beta(z)$ to denote the solutions of $b$ to (4) and (5), respectively, then it is easy to show that

$$\tilde\beta(z) = \left( \sum_{i=1}^{n} \tilde x_i^T \tilde x_i K_i(z) \right)^{-1} \left( \sum_{i=1}^{n} \tilde x_i^T \tilde y_i K_i(z) \right), \tag{6}$$

and

$$\hat\beta(z) = \left( \sum_{i=1}^{n} \tilde x_i^T \Sigma^{-1} \tilde x_i K_i(z) \right)^{-1} \left( \sum_{i=1}^{n} \tilde x_i^T \Sigma^{-1} \tilde y_i K_i(z) \right). \tag{7}$$

Let $\tilde Y = (\tilde y_1^T, \tilde y_2^T, \ldots, \tilde y_n^T)^T$ be the $(mn) \times 1$ vector of the dependent variable, and $\tilde X = (\tilde x_1^T, \tilde x_2^T, \ldots, \tilde x_n^T)^T$ be the $(mn) \times (ml)$ matrix of the explanatory variables, $\tilde\Gamma = I_n \otimes \Sigma$, and $\tilde{\mathcal{K}}(z) = K(z) \otimes I_m$, where $K(z) = \mathrm{Diag}(K_1(z), K_2(z), \ldots, K_n(z))$ is a $n \times n$ diagonal kernel weight matrix. With this notation, it is straightforward to show that $\hat\beta(z)$ defined in (7) can be written as

$$\hat\beta(z) = \left( \tilde X^T \tilde{\mathcal{K}}(z)^{1/2} \tilde\Gamma^{-1} \tilde{\mathcal{K}}(z)^{1/2} \tilde X \right)^{-1} \left( \tilde X^T \tilde{\mathcal{K}}(z)^{1/2} \tilde\Gamma^{-1} \tilde{\mathcal{K}}(z)^{1/2} \tilde Y \right). \tag{8}$$

The above method (notation) of stacking the data by grouping individual $i$'s data first ($m$ of them), then stacking individuals one by one to get the full data is convenient for deriving the semiparametric estimator of $\beta(z)$ defined by (7) or equivalently by (8), but it is not the commonly used way of stacking data. The more conventional way is to put all the data for the first equation first ($n$ observations for $s = 1$), followed by all the data for the second equation, and so on. Define a $(mn) \times 1$ vector of dependent variables $Y = (y_1^T, y_2^T, \ldots, y_m^T)^T$, where $y_s = (y_{s1}, \ldots, y_{sn})^T$ is the $n \times 1$ vector of the dependent variable for the $s$th equation, $s = 1, 2, \ldots, m$. Similarly, let $X_s$ be the $n \times l$ matrix of explanatory variables from the $s$th equation ($s = 1, \ldots, m$), and define the $(mn) \times (ml)$ explanatory variable matrix

$$X = \begin{pmatrix} X_1 & 0 & \cdots & 0 \\ 0 & X_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & X_m \end{pmatrix},$$

$\mathcal{K}(z) = I_m \otimes K(z)$ and $\Gamma = \Sigma \otimes I_n$. Then it can be shown that $\hat\beta(z)$ defined in (7) (or (8)) can also be written as

$$\hat\beta(z) = \left( X^T \mathcal{K}(z)^{1/2} \Gamma^{-1} \mathcal{K}(z)^{1/2} X \right)^{-1} \left( X^T \mathcal{K}(z)^{1/2} \Gamma^{-1} \mathcal{K}(z)^{1/2} Y \right), \tag{9}$$

where $\Gamma^{-1} = \Sigma^{-1} \otimes I_n$.

What separates this model from Zellner (1962) is the vector of regression coefficients. The $l \times 1$ parameter vector $\beta_s(z)$ is a function of $z$. In this sense the $s$th equation is an example of the SPSCM of Li et al. (2002). We note that we can potentially vary the elements of $z$ across $s$ for the semiparametric Zellner (1962) case, but we keep it fixed here for simplicity.³

³ There is no loss of generality in this approach as we could always redefine $z = (z_1, z_2, \ldots, z_m)$ where $z_s$ is the set of $z$ variables in equation $s$. However, knowing which $z_s$ enter which equation will mitigate the impact of the curse of dimensionality. We thank a referee for drawing this generalization to our attention.

We estimate the varying coefficient functions by the nonparametric kernel method. We allow for $z$ to contain both discrete and continuous components. Let $z = (z^c, z^d)$, where $z^c = (z_1^c, \ldots, z_{q_1}^c)$ and $z^d = (z_1^d, \ldots, z_{q_2}^d)$ are the continuous and discrete components of $z$, respectively ($q_1 \ge 1$, $q_2 \ge 0$ with $q_1 + q_2 = q$). Define product kernel functions $W_i(z^c) = \prod_{j=1}^{q_1} w((z_{ij}^c - z_j^c)/h)$ and $L_i(z^d) = \prod_{j=1}^{q_2} l(z_{ij}^d, z_j^d, \lambda)$, where $w(\cdot)$ and $l(\cdot)$ are univariate kernel functions. $h$ and $\lambda$ are smoothing parameters associated with $z^c$ and $z^d$, respectively. For example, one can use the Gaussian kernel for $w(\cdot)$, $l(z_{ij}^d, z_j^d, \lambda) = \mathbf{1}(z_{ij}^d = z_j^d) + \lambda \mathbf{1}(z_{ij}^d \ne z_j^d)$ if $z_j^d$ is an unordered discrete variable (see Racine and Li (2004)), and $l(z_{ij}^d, z_j^d, \lambda) = \mathbf{1}(z_{ij}^d = z_j^d) + \lambda^{|z_{ij}^d - z_j^d|} \mathbf{1}(z_{ij}^d \ne z_j^d)$ if $z_j^d$ is an ordered discrete variable, where $\mathbf{1}(A) = 1$ if $A$ holds true, and zero
otherwise (see eq. (10) in Ouyang et al. (2009)). The $n \times n$ diagonal kernel matrix defined earlier, $K(z) = \mathrm{diag}(K_1(z), \ldots, K_n(z))$, has its diagonal element given by $K_i(z) = W_i(z^c) L_i(z^d)$; as before, $\mathcal{K}(z) = I_m \otimes K(z)$ denotes the stacked kernel matrix.

Our estimation method allows for both discrete and continuous covariates $z$ (Li and Racine, 2010). Thus, we generalize Li et al. (2002) to both the SUR framework and to the mixed discrete and continuous nonparametric covariates case. Here we note that alternative versions of (9) exist (Lin and Carroll, 2000; Henderson and Ullah, 2005), but Welsh and Yee (2006) show that this version gives consistent estimates in a fully nonparametric SUR model.

Further extensions (likely with the same benefits and difficulties) can be made by considering a local-linear version of (9) following the method discussed in Lee and Ullah (2001). A local linear estimator requires more complicated notation and asymptotic analysis. Therefore, we do not pursue a local linear estimator in this paper.

The estimator defined in (9), however, is generally not operational because $\Gamma$ is typically unknown. A consistent estimate of $\Gamma$ can be obtained using a consistent estimate of $u$, which can be obtained by using the consistent system estimator that ignores the information in the variance–covariance matrix. In other words, this estimator is formed by setting $\Gamma = I_{mn}$. In this case our SPSCM SUR looks nearly identical to the original, single equation SPSCM estimator,

$$\tilde\beta(z) = \left( X^T \mathcal{K}(z) X \right)^{-1} \left( X^T \mathcal{K}(z) Y \right). \tag{10}$$

With this estimator in hand, we can obtain the $m \times 1$ vector of residuals as $\hat u_i = \tilde y_i - \tilde x_i \tilde\beta(z_i)$, and the estimate of the $m \times m$ matrix $\Sigma$ is calculated as $\hat\Sigma = n^{-1} \sum_{i=1}^{n} \hat u_i \hat u_i^T$. Hence, the feasible estimator of $\beta(z)$ can be obtained from $\hat\beta(z)$ by replacing $\Sigma^{-1}$ by $\hat\Sigma^{-1}$. Given that $\Sigma$ is a finite dimensional matrix and that $\hat\Sigma - \Sigma = o_p(1)$ (hence, $\hat\Sigma^{-1} - \Sigma^{-1} = o_p(1)$), the feasible estimator of $\beta(z)$ has the same asymptotic behavior as $\hat\beta(z)$ (that uses $\Sigma^{-1}$ in its definition). Therefore, for notational simplicity we will only consider $\hat\beta(z)$ in this paper.

The proposed semiparametric smooth coefficient generalization of Zellner’s (1962) SUR model offers several advantages over a fully parametric SUR model. From an economic perspective, if there are operating or environmental variables that characterize the underlying technology, then their omission will lead to biased and inconsistent estimates. Further, these variables’ impact on firm technology is in general poorly understood given that they do not act as traditional inputs; thus, the exact manner in which they enter the model is debatable and a semiparametric approach has obvious appeal. How these additional variables are selected is case specific and will need to be tailored for a given application. However, our theoretical results require that the $z$ variables are exogenous, thus assisting in the types of operating environments and managerial effects that can be measured and included. Econometrically, the inclusion of these additional variables poses little additional costs (e.g., computing time) as the model still retains the parametric structure of a SUR.⁴

⁴ In the case where a subset of the coefficients do not vary with $z$ it would be possible to construct a partially linear extension of our model. We leave this for future research.

2.2. Cross-equation coefficient restrictions

The SPSCM SUR estimator we have discussed so far may not be directly portable to the applied microeconomics setting where cross-equation restrictions need to hold. For example, in the theory of the firm, the focus is on estimation of the technology, which is often specified in terms of the dual cost/profit/revenue function. A common feature of these functions is that they satisfy derivative properties (Shephard’s/Hotelling’s lemma). These derivative properties require functional restrictions that are to be satisfied by the underlying technology. In a parametric model, these functional restrictions require cross-equational restrictions on the parameters.

As a concrete example, if firms minimize cost, the underlying production technology is specified in terms of a dual cost function, viz., $C = C(v, o)$ where $o$ is a vector of $L$ outputs, $v$ a vector of $J$ input prices and $X$ the corresponding vector of $J$ inputs. This cost function satisfies the following derivative property (Shephard’s lemma)

$$\frac{\partial C}{\partial v_j} = X_j, \qquad j = 1, 2, \ldots, J.$$

If we add the firm subscript $i$ and use a translog cost function,

$$\ln C_i = \beta_0 + \sum_{j=1}^{J} \beta_{vj} \ln v_{ji} + \sum_{t=1}^{L} \beta_{ot}\, o_{ti} + \frac{1}{2} \sum_{j=1}^{J} \sum_{k=1}^{J} \gamma_{jk} \ln v_{ji} \ln v_{ki} + \frac{1}{2} \sum_{t=1}^{L} \sum_{j=1}^{J} \delta_{ot} \ln o_{ti} \ln o_{ji} + \sum_{j=1}^{J} \sum_{t=1}^{L} \kappa_{jt} \ln v_{ji} \ln o_{ti}, \tag{11}$$

where $\beta \equiv (\beta_0, \beta_{v1}, \ldots, \beta_{vJ}, \beta_{o1}, \ldots, \beta_{oL})$, $\gamma \equiv (\gamma_{11}, \ldots, \gamma_{JJ})$, $\delta \equiv (\delta_{o1}, \ldots, \delta_{oL})$ and $\kappa \equiv (\kappa_{11}, \ldots, \kappa_{JL})$ are parameters to be estimated, then Shephard’s lemma delivers the following cost share equations

$$\frac{\partial \ln C_i}{\partial \ln v_{ji}} = \frac{v_{ji} X_{ji}}{C_i} = \beta_{vj} + \sum_{k=1}^{J} \gamma_{jk} \ln v_{ki} + \sum_{t=1}^{L} \kappa_{jt} \ln o_{ti}, \qquad j = 1, \ldots, J. \tag{12}$$

In this setting, the SUR system consists of the cost function in (11) and $J - 1$ of the cost share equations⁵ in (12). Note that none of the parameters in the cost share equations in (12) are new in the sense that they all appear in (11). That is, in using both (11) and (12) the parameters $\beta_{vj}$, $\gamma_{jk}$ and $\kappa_{jt}$ appear across the system of equations. Christensen and Greene (1976) used these restrictions in estimating a dual cost function in a fully parametric setting, given in (11) and (12).

⁵ We cannot use all $J$ share equations because the shares in (12) sum to unity, so the random disturbances corresponding to the share equations sum to zero, thus yielding a singular covariance matrix of errors. Barten (1969) has shown that full information maximum likelihood estimates of the parameters can be obtained by arbitrarily deleting any one cost share equation. Alternatively, this problem can also be avoided by normalizing the cost and input prices by one of the input prices such that only $J - 1$ share equations are left.

The cost system in (11) and (12) can be written in the form $y_{si} = x_{si}^T \beta_s(z_i) + u_{si}$ subject to a set of restrictions $R\beta(z_i) = r$. Specifically, our goal is to estimate the varying coefficient (vector) function $\beta(z)$ subject to the set of restrictions on the functional coefficients⁶

$$R\beta(z) = r, \tag{13}$$

where $R$ is the standard $(J \times ml)$ design matrix, $J$ is the total number of coefficient restrictions and $r$ is a $J \times 1$ vector. It should be noted that this method is not only useful for ensuring that the cross-equation coefficient restrictions hold, but it also leads to asymptotic and finite sample gains as the number of parameters being estimated can potentially be decreased by $J$ and more observations may potentially be used to estimate a given parameter.

⁶ As mentioned in the introduction, these restrictions are part and parcel of the models based on duality. Thus the model without these restrictions might be meaningless. However, one can consider other restrictions in the model that follow duality results. It is also possible to think of the model in more general terms (applications beyond duality) in which the unrestricted model might make sense and a key objective is to test the restrictions. In such a case the idea might be to examine efficiency gains from imposing the constraints. Our discussion in Section 4 follows this route.

Estimation of (2) subject to the cross-equation restrictions in (13) amounts to solving for $\beta(z)$ in minimizing (5) subject to (13). The Lagrangian for this problem is

$$\mathcal{L} = [Y - X\beta(z)]^T \mathcal{K}(z)^{1/2} \Gamma^{-1} \mathcal{K}(z)^{1/2} [Y - X\beta(z)] - 2\mu^T [R\beta(z) - r], \tag{14}$$

where $\mu$ is a $J \times 1$ vector of Lagrange multipliers. Taking the first-order conditions of (14) with respect to $\beta(z)$ and $\mu$ and solving for $\beta(z)$ leads to the estimator

$$\hat\beta^*(z) = \hat\beta(z) - \left( X^T \mathcal{K}(z)^{1/2} \Gamma^{-1} \mathcal{K}(z)^{1/2} X \right)^{-1} R^T \left[ R \left( X^T \mathcal{K}(z)^{1/2} \Gamma^{-1} \mathcal{K}(z)^{1/2} X \right)^{-1} R^T \right]^{-1} \left( R\hat\beta(z) - r \right),$$

which is itself a function of $\hat\beta(z)$, the unconstrained system estimator. In the standard case where $r$ is a vector of zeros, this estimator simplifies to

$$\hat\beta^*(z) = \left( I_{ml} - \left( X^T \mathcal{K}(z)^{1/2} \Gamma^{-1} \mathcal{K}(z)^{1/2} X \right)^{-1} R^T \left[ R \left( X^T \mathcal{K}(z)^{1/2} \Gamma^{-1} \mathcal{K}(z)^{1/2} X \right)^{-1} R^T \right]^{-1} R \right) \hat\beta(z).$$

To construct the estimate of $\Gamma$, we define the estimator which ignores the information in the variance–covariance matrix (for the case where $r = 0_J$) as

$$\tilde\beta^*(z) = \left( I_{ml} - \left( X^T \mathcal{K}(z) X \right)^{-1} R^T \left[ R \left( X^T \mathcal{K}(z) X \right)^{-1} R^T \right]^{-1} R \right) \tilde\beta(z).$$

The full construction of the variance–covariance matrix follows from the discussion in the previous sub-section.

2.3. Large sample properties

First, we write our estimator in (7) as

$$\hat\beta(z) = \beta(z) + [D_n(z)]^{-1} \{ A_n(z) + C_n(z) \},$$

where

$$D_n(z) = \frac{1}{n h^{q_1}} \sum_{i=1}^{n} \tilde x_i^T \Sigma^{-1} \tilde x_i K_i(z),$$

$$A_n(z) = \frac{1}{n h^{q_1}} \sum_{i=1}^{n} \tilde x_i^T \Sigma^{-1} \tilde x_i \left( \beta(z_i) - \beta(z) \right) K_i(z),$$

$$C_n(z) = \frac{1}{n h^{q_1}} \sum_{i=1}^{n} \tilde x_i^T \Sigma^{-1} \tilde u_i K_i(z).$$

In Appendix A we show that $D_n(z) = M(z) + o_p(1)$, where $M(z) = E[\tilde x_i^T \Sigma^{-1} \tilde x_i \mid z_i = z] f(z)$. The term $A_n(z)$ corresponds to bias terms with $A_n(z) = h^2 A_1(z) + \lambda A_2(z) + o_p(h^2 + \lambda + (n h^{q_1})^{-1/2})$, where $A_1(z)$ and $A_2(z)$ are finite constants (depending on $z$) and are defined in Appendix A.

We also show that $C_n(z)$ has zero mean and variance $(n h^{q_1})^{-1} [\nu_0 M(z) + o(1)]$. Moreover, by the Liapunov central limit theorem we have

$$\sqrt{n h^{q_1}}\, C_n(z) \xrightarrow{d} N(0, \nu_0 M(z)). \tag{15}$$

Let

$$\hat G(z) = \left[ X^T \mathcal{K}(z)^{1/2} \Gamma^{-1} \mathcal{K}(z)^{1/2} X \right]^{-1} R^T \left\{ R \left[ X^T \mathcal{K}(z)^{1/2} \Gamma^{-1} \mathcal{K}(z)^{1/2} X \right]^{-1} R^T \right\}^{-1} R,$$

$$J_n(z) = \left[ X^T \mathcal{K}(z)^{1/2} \Gamma^{-1} \mathcal{K}(z)^{1/2} X \right]^{-1} R^T \left\{ R \left[ X^T \mathcal{K}(z)^{1/2} \Gamma^{-1} \mathcal{K}(z)^{1/2} X \right]^{-1} R^T \right\}^{-1} r;$$

then we can, using the same ideas that we used to decompose $\hat\beta(z)$, show that our estimator which imposes the cross-equation coefficient restrictions is

$$\hat\beta^*(z) = [I - \hat G(z)] \{ \beta(z) + D_n(z)^{-1} [A_n(z) + C_n(z)] \} + J_n(z). \tag{16}$$

We now provide the assumptions under which we will develop the theoretical properties for our semiparametric estimator. These assumptions will also be used for our testing procedure which is based on an integrated squared difference statistic. We allow for $z$ to contain both continuous and discrete components. We write $z = (z^c, z^d)$, where $z^c$ and $z^d$ are the continuous and discrete components of $z$, respectively.

Assumption 2.1. (i) The data $\{\tilde x_i, \tilde y_i, z_i\}_{i=1}^{n}$ are independent and identically distributed (i.i.d.) as $(\tilde x_1, \tilde y_1, z_1)$. $E[\tilde y_i \mid \tilde x_i, z_i] = \tilde x_i \beta(z_i)$ almost everywhere, and $\tilde u_i = \tilde y_i - \tilde x_i \beta(z_i)$ possesses finite fourth moments. (ii) Let $f_z(z)$ denote the marginal density function of $z_i$ and let $f_J(\tilde x_i, z_i)$ represent the joint density function of $(\tilde x_i, z_i)$. $\beta(z)$ is three times continuously differentiable with respect to $z^c$ at all interior points $z^c \in \mathcal{Z}^c$, where $\mathcal{Z}^c$ is the support of $z_i^c$. $f_z(z)$ and $f_J(x, z)$ are both twice continuously differentiable with respect to $z^c$ for $z^c$ in the interior of $\mathcal{Z}^c$. (iii) $f_J(x, z)$ and $f_z(z)$ are bounded, and $\beta(z_i)$ and $(\tilde x_i, z_i)$ possess finite fourth moments.

Assumption 2.2. (i) $W(\cdot)$ is a product kernel $W(v) = \prod_{j=1}^{q_1} w(v_j)$. $w(\cdot)$ is a bounded symmetric (around zero) density function satisfying $\int w(v) v^4 \, dv < \infty$. (ii) As $n \to \infty$, $h \to 0$ and $n h^{q_1} \to \infty$.

Assumption 2.1 places very generic conditions on the data generating process underlying our smooth coefficient system. Further, 2.1(i) implies that $E(u_i \mid x_i, z_i) = 0$ while 2.1(ii) allows for very general forms of unknown conditional heteroscedasticity. Assumption 2.2 places standard conditions on the product kernel used for construction of the semiparametric smooth coefficient estimator. 2.2(i) suggests that we are using standard second-order kernel functions when smoothing. 2.2(ii) places the usual limit behavior on the bandwidths used for smoothing. As the sample size grows the bandwidth(s) need to decrease to eliminate the bias, yet they must decrease slowly enough that the variance component also decreases, the classic bias–variance trade-off. Here we are requiring the optimal decay of the bandwidth to balance squared bias and variance.

Summarizing what we have found above, we obtain the following result:

Theorem 2.1. Under Assumptions 2.1 and 2.2, for a fixed point $z = (z^c, z^d) \in \mathcal{Z}^c \times \mathcal{D}$ with $z^c$ an interior point of $\mathcal{Z}^c$, where $\mathcal{D}$ is the support of $z^d$, we have

$$\sqrt{n h^{q_1}} \left( \hat\beta^*(z) - [I - \hat G(z)] \beta(z) - J_n(z) - [I - \hat G(z)] D_n(z)^{-1} A_{h,\lambda}(z) \right) \xrightarrow{d} N(0, \Lambda(z)),$$

where $A_{h,\lambda}(z) = h^2 A_1(z) + \lambda A_2(z)$, $A_1(z)$ and $A_2(z)$ are defined in Appendix A, $\Lambda(z) = \nu_0 [I - G(z)] M(z)^{-1} [I - G^T(z)]$, $G(z) = M(z)^{-1} R^T [R M(z)^{-1} R^T]^{-1} R$ is the probability limit of $\hat G(z)$,
Note that $\nu_0 M(z)^{-1}$ is the asymptotic variance of (the unrestricted estimator) $\sqrt{nh^{q_1}}\,\hat\beta(z)$. The difference between the asymptotic variances of the unrestricted and the restricted estimators is

$$\nu_0 M(z)^{-1} - \Lambda(z) = \nu_0 M(z)^{-1} - \nu_0 [I - G(z)] M(z)^{-1} [I - G^T(z)],$$

where $G(z)$ is an asymmetric and idempotent matrix. By proposition 2 in Taylor (1976) we have that $\nu_0 M(z)^{-1} - \Lambda(z)$ is positive semi-definite. Thus, as expected, the restricted estimator is more efficient than the unrestricted estimator. This is an important result which extends the insights of Taylor (1976) to restricted estimation within a semiparametric context. Our main result here suggests that if cross-equation restrictions exist then they can be used in estimation to improve efficiency. We again want to reiterate that the restrictions we have in mind in this paper are required by theory, but these restrictions lead to gains nonetheless.

As an example, consider a simple case where $l = 1$, $r = 0$ and $R = (1, -1)$, which corresponds to the restriction $\beta_1(z) = \beta_2(z)$. We then have ($x_{si}^T = x_{si}$ since $x_{si}$ is a scalar in this example)

$$y_{si} = x_{si}\beta_s(z_{si}) + u_{si}, \quad i = 1, \ldots, n, \; s = 1, 2. \qquad (17)$$

If we assume that $\sigma_{12} = 0$, then $M(z)$ becomes a block diagonal matrix with

$$M(z) = \begin{pmatrix} M_{11}(z) & 0 \\ 0 & M_{22}(z) \end{pmatrix}.$$

To ease the analysis, we further assume that $M_{11}(z) = M_{22}(z) \equiv M_0(z)$. In this case it is easy to see that

$$I - G(z) = \frac{1}{2}\begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}.$$

Hence, we have

$$\hat\beta^*(z) \sim [I - G(z)]\hat\beta(z) = \frac{1}{2}\begin{pmatrix} \hat\beta_1(z) + \hat\beta_2(z) \\ \hat\beta_1(z) + \hat\beta_2(z) \end{pmatrix}$$

so that

$$\hat\beta_1^*(z) = \hat\beta_2^*(z) \sim \frac{1}{2}[\hat\beta_1(z) + \hat\beta_2(z)].$$

Hence,

$$\nu_0 [I - G(z)] M(z)^{-1} [I - G(z)] = \frac{\nu_0}{2M_0(z)}\begin{pmatrix} 1 & 1 \\ 1 & 1 \end{pmatrix}.$$

Let $\tilde\beta(z)$ denote the estimator based on (17). Under the assumption that $M_{11}(z) = M_{22}(z) \equiv M_0(z)$, the asymptotic variance of the unrestricted estimator is (for $s = 1, 2$)

$$\mathrm{Avar}[\tilde\beta_s(z)] \sim \frac{\nu_0}{nh} M_0(z)^{-1},$$

while that for the cross-equation coefficient equality estimator is

$$\mathrm{Avar}[\hat\beta_s^*(z)] \sim \frac{\nu_0}{2nh} M_0(z)^{-1}.$$

We observe that the asymptotic variance of the restricted estimator is half that of the unrestricted estimator. This is intuitive given that the restriction is such that we effectively have a sample of size $2n$ to estimate a single smooth coefficient whereas the unrestricted estimator has available $n$ observations to estimate each smooth coefficient. Note also that for the estimator which respects the cross-equation coefficient restrictions we have introduced asymptotic covariance between $\hat\beta_1^*(z)$ and $\hat\beta_2^*(z)$ whereas the unrestricted estimator did not have an asymptotic covariance.

Theorem 2.1 only considers a single point $z$. If one is interested in estimating $\beta(z)$ for finitely many different points, because it is known that nonparametric kernel estimators evaluated at different points (of $z$) are asymptotically independent, point-wise asymptotic results can be applied directly to each of the finitely many points.

2.4. Bandwidth selection

As with nonparametric estimation, the choice of smoothing parameters is imperative to the performance of semiparametric models. A common approach to obtaining bandwidths is to use a leave-one-out cross-validation routine. Cross-validation routines are an alternative to plug-in methods, which often require pilot bandwidths and rely on complicated asymptotic expressions. Here we use least-squares cross-validation (LSCV), which in the SUR setting selects bandwidths that minimize

$$CV(h, \lambda) = \frac{1}{mn} \sum_{j=1}^{n} \left[\tilde y_j - \tilde x_j \hat\beta_{-j}(z_j)\right]^T \left[\tilde y_j - \tilde x_j \hat\beta_{-j}(z_j)\right], \qquad (18)$$

where our vector of leave-one-out estimates of the smooth coefficients is expressed as

$$\hat\beta_{-j}(z_j) = \left[X_{-j}^T K_{-j}^{1/2}(z_j)\, \Gamma_{-j}^{-1} K_{-j}^{1/2}(z_j) X_{-j}\right]^{-1} X_{-j}^T K_{-j}^{1/2}(z_j)\, \Gamma_{-j}^{-1} K_{-j}^{1/2}(z_j) Y_{-j},$$

and the notation of subscript $-j$ implies that the $j$th row is removed from $X$ and $Y$, and the $j$th row and the $j$th column are removed from $K(z_j)$ and $\Gamma^{-1}$. This variant of cross-validation differs from that in the single regression setting. If we were to use $\Gamma = I$, then we could estimate the bandwidths for each equation separately since the smoothing of each equation would be independent from the remaining equations. However, when $\Gamma \neq I$ or when we have cross-equation restrictions, we must use the above formulation since this allows for both of these events.$^{7}$

$^{7}$ Additionally, in our setup here we have assumed that all coefficients in a given equation are smoothed equally. Alternatively, we could use the two-step smoothing approach of Fan and Zhang (1999) to allow each coefficient to be smoothed differently. In this setup we would obtain a preliminary set of bandwidths which under-smooth. We leave this as a topic for future research. Further, as noted by a referee, if we elected to allow each variable to be smoothed differently in each smooth coefficient, this will result in a high-dimension vector of bandwidths and it will be numerically more demanding to obtain this vector of bandwidths in practice.

Bandwidth selection in a systems setting offers several interesting alternatives to single equation bandwidth selection. First, if different $z$ variables appear in different equations, they will need separate bandwidths. Secondly, when we impose cross-equation coefficient restrictions, regardless of whether the bandwidths are identical or not, the coefficients will still satisfy the equalities across equations since the coefficients are determined via the system and not a particular equation. This also suggests that in certain settings, we could use information from one equation to assist with estimation of a smooth coefficient in another equation.

By following similar arguments as in Li and Racine (2010), one can show that the above cross-validation method works, i.e., it selects an $h$ that is asymptotically equivalent to an optimal $h$ that minimizes a weighted estimation mean squared error.

As an aside, an anonymous referee correctly points out that, if we have information that certain coefficients do not depend on $z$, then this information will assist in the estimation of the constant coefficients because the rate of convergence will be $\sqrt{n}$, i.e., there is no curse of dimensionality for the parametric components in a partially linear varying coefficient SUR model. Due to space limitations we do not consider a partially linear varying coefficient SUR model in this paper.

2.5. Large sample properties with only discrete environmental variables

Theorem 2.1 deals with a mixed continuous and discrete $z$ variable case with $z = (z^c, z^d)$, where $z^c$ is of dimension $q_1$, and
$z^d$ is of dimension $q_2$ ($1 \le q_1 \le q$ and $q_1 + q_2 = q$). It requires that $z$ contains at least one continuous component $z^c$. If $z = z^d$ only contains discrete components, then the result of Theorem 2.1 should be modified as follows:

$$\sqrt{n}\left(\hat\beta^*(z) - [I - \hat G(z)]\beta(z) - J_n(z)\right) \xrightarrow{d} N(0, \Lambda_d(z)), \qquad (19)$$

where $\hat G$ and $J_n$ are defined below Eq. (15) except that in the kernel function we remove the product kernel function associated with $z^c$, $\Lambda_d(z) = [I - G(z)] M_d(z)^{-1} [I - G^T(z)]$, $G(z) = M_d(z)^{-1} R^T [R M_d(z)^{-1} R^T]^{-1} R$ is the probability limit of $\hat G$, and $M_d(z) = f(z) E[\tilde x_i^T \Sigma^{-1} \tilde x_i \mid z_i = z]$. The proof of (19) follows similar steps as in the proof of Theorem 2.1 by realizing that $\lambda = O_p(n^{-1})$ (see Theorem 1 of Li et al. (2013)) and $\sqrt{n}\lambda = O_p(n^{-1/2})$ so that the leading bias is asymptotically negligible even after multiplying by a factor of $\sqrt{n}$. Therefore, we omit the proof of (19).

3. A test for correct functional form

Consider a parametric specification of our smooth coefficient SUR model:

$$y_{si} = x_{si}^T \beta_{s,0}(z_i) + u_{si}, \quad i = 1, \ldots, n, \; s = 1, \ldots, m, \qquad (20)$$

where $\beta_{s,0}(z)$ is a parametric function of $z$. For example, if we had a scalar $z$, we could have $\beta_s(z) = (\alpha_s + z\gamma_s, \beta_{0s}^T z)^T$. In this case we would have a standard SUR. Testing for correct functional form is prudent if model misspecification is of concern. In empirical applications, correctly specified parametric models are more efficient than their semiparametric counterparts. However, if the parametric model is misspecified then estimates based on it will be inconsistent. In what follows, we propose a statistic that can test if the fully parametric SUR is correctly specified against the SPSCM SUR. If interest hinges on a single equation within the SUR, the test in Li et al. (2002) can be deployed.

The null hypothesis that model (20) is correctly specified is $H_0$: $\beta(z) - \beta_0(z) = 0$, almost everywhere. The alternative hypothesis is $H_1$: $\beta(z) - \beta_0(z) \neq 0$ on a set with positive measure. Following Li et al. (2002), we use an integrated squared difference statistic as the basis for our test.$^{8}$ The integrated squared difference is defined as

$$I \overset{\mathrm{def}}{=} \int [\beta(z) - \beta_0(z)]^T [\beta(z) - \beta_0(z)]\, dz. \qquad (21)$$

$^{8}$ Other means to construct consistent model specification tests exist. See Bierens and Ploberger (1997) and Li and Wang (1998) for two alternative setups.

$I = 0$ under $H_0$ and $I > 0$ under $H_1$. To obtain a feasible test statistic we replace $\beta(z)$ and $\beta_0(z)$ with estimates.

We will replace $\beta(z)$ by $\tilde\beta(z)$ (defined in (10)) in (21) to obtain a feasible test statistic. However, given that the random denominator $\tilde D_n(z) = X^T K(z) X$ in $\tilde\beta(z)$ is not strictly bounded away from 0, obtaining the asymptotic distribution of $I$ is difficult. To avoid the random denominator issue we propose an alternative, weighted test statistic,

$$I_n = \int \left\{\tilde D_n(z)\left[\tilde\beta(z) - \hat\beta_0(z)\right]\right\}^T \left\{\tilde D_n(z)\left[\tilde\beta(z) - \hat\beta_0(z)\right]\right\} dz,$$

where $\hat\beta_0(z)$ is the estimator of $\beta_0(z)$ based on the parametric null model. After some further simplifications (as in Li et al., 2002), including the removal of a center term and the replacement of a convolution kernel function by a standard second order kernel function, we obtain a final test statistic given by

$$\hat I_n = \frac{1}{n^2 h^{q_1}} \sum_{i=1}^{n} \sum_{j \neq i} \hat u_{i,0}^T \tilde x_i \tilde x_j^T \hat u_{j,0} K_{z_i z_j}, \qquad (22)$$

where $\hat u_{i,0} = \tilde y_i - \tilde x_i \hat\beta_0(z_i)$ is the $m \times 1$ vector residual from the parametric null model, and $K_{z_i,z_j} = W_{z_i^c, z_j^c} L_{z_i^d, z_j^d}$ is the generalized product kernel introduced in Racine and Li (2004).

We now present asymptotic results for our proposed testing procedure.

Theorem 3.1. Provided Assumptions 2.1 and 2.2 hold,

(1) under $H_0$, $\hat J_n = n h^{q_1/2} \hat I_n / \hat\sigma_0 \to N(0, 1)$ in distribution, where

$$\hat\sigma_0^2 = \frac{2}{n^2 h^{q_1}} \sum_{i=1}^{n} \sum_{j \neq i} \left(\hat u_{i,0}^T \tilde x_i \tilde x_j^T \hat u_{j,0}\right)^2 K_{z_i,z_j}^2$$

is a consistent estimator of $\sigma_0^2 = 2\nu_0 E[f(z_i) P(z_i, z_i)]$, where $\nu_0 = \int W^2(v)\, dv$ and $P(z_i, z_j) = \mathrm{tr}\, E\left[\tilde x_i^T \Sigma \tilde x_j \tilde x_i^T \Sigma \tilde x_j \mid z_i, z_j\right]$.

(2) under $H_1$, $\mathrm{Prob}[\hat J_n > B_n] \to 1$ as $n \to \infty$, where $B_n$ is any non-stochastic sequence with $B_n = o(n h^{q_1/2})$.

We provide a sketch of the proof to Theorem 3.1 in Appendix A. Part (1) of Theorem 3.1 suggests a rescaled statistic that is asymptotically pivotal making bootstrapping inference valid, while part (2) shows that the test is consistent under departures from $H_0$.

Here we consider a four step bootstrap procedure to employ the test in practice. Given that we have a system of equations, the basic idea of the bootstrap here is similar to the panel data case (e.g. Henderson et al., 2008) where we randomly sample all the residuals for a particular cross-sectional unit with replacement, $\{\hat u_{it}\}_{t=1}^{T}$. For our wild bootstrap in the system of equations case, we assign the same (wild) bootstrap weight to each cross-sectional unit ($i$) across the $S$ equations. Our four step procedure is as follows:

(1) Compute the test statistic $\hat J_n$ for the original sample of $\{x_{si}\}$, $\{z_i\}$, $\{y_{si}\}$ for $s = 1, \ldots, m$ and $i = 1, \ldots, n$ and save the re-centered residuals from the null model $\hat u_{i,0} - \bar{\hat u}_0$, $i = 1, 2, \ldots, n$, where $\hat u_{i,0} = \tilde y_i - \tilde x_i \hat\beta_0(z_i)$ and $\bar{\hat u}_0 = n^{-1} \sum_{j=1}^{n} \hat u_{j,0}$.

(2) For each cross-sectional unit $i$, construct the (vector of) bootstrapped residuals $u_i^*$, where $u_i^* = \frac{1 - \sqrt{5}}{2}\left(\hat u_{i,0} - \bar{\hat u}_0\right)$ with probability $\frac{1 + \sqrt{5}}{2\sqrt{5}}$ and $u_i^* = \frac{1 + \sqrt{5}}{2}\left(\hat u_{i,0} - \bar{\hat u}_0\right)$ with probability $1 - \frac{1 + \sqrt{5}}{2\sqrt{5}}$. Construct the bootstrapped left-hand-side variable by adding the bootstrapped residuals to the fitted values under the null as $y_i^* = \tilde x_i \hat\beta_0(z_i) + u_i^*$. Call $\{\tilde x_1, \tilde x_2, \ldots, \tilde x_n\}$, $\{z_1, z_2, \ldots, z_n\}$ and $\{y_1^*, y_2^*, \ldots, y_n^*\}$ the bootstrap sample.

(3) Calculate $\hat J_n^*$, where $\hat J_n^*$ is calculated the same way as $\hat J_n$ except that $y_i$ and $\hat u_{i,0}$ are replaced by $y_i^*$ and $\hat u_{i,0}^* = y_i^* - \tilde x_i \hat\beta_0^*(z_i)$, where $\hat\beta_0^*(z_i)$ is the estimator of $\beta_0(z_i)$ based on the null model and using the bootstrap sample.

(4) Repeat steps (2)–(3) a large number ($B$) of times and then construct the sampling distribution of the bootstrapped test statistics. We reject the null that the parametric model is correctly specified if the estimated test statistic $\hat J_n$ is greater than the upper $\alpha$-percentile of the bootstrapped test statistics.

4. Simulations

While the conclusions from Theorem 2.1 suggest the restricted estimator should be efficient relative to the unrestricted estimator, we examine the finite sample performance of each of our estimators to determine the magnitude of such gains. Our setup is a two-equation model with two regressors and no intercept (for simplicity). Specifically,

$$y_{1i} = b_{11}(z_i) x_{1i,1} + b_{12}(z_i) x_{1i,2} + u_{1i}$$
$$y_{2i} = b_{21}(z_i) x_{2i,1} + b_{22}(z_i) x_{2i,2} + u_{2i}$$
for $i = 1, \ldots, n$. For our simulations we consider the case where $b_{12}(z) = b_{21}(z)$. If one views these as share equations then the $b_{12}(z) = b_{21}(z)$ restrictions are symmetry restrictions (equality of cross-partials or Young's theorem) which must be imposed. $u_{1i}$ is generated as i.i.d. $N(0, 1)$ while $u_{2i} = c\,u_{1i} + v_{2i}$ where $v_{2i}$ is also generated as i.i.d. $N(0, 1)$ and $c \in \{0, 0.25, 0.5, 1\}$. When $c = 0$ the two equations are unrelated. For $c \neq 0$ there exists cross-equation correlation and it is increasing in $c$. The regressors $x_{1i}$ and $x_{2i}$ are generated as i.i.d. $U[0, 2]$ and $U[0, 1]$, respectively. The nonparametric covariate $z_i$ is generated as i.i.d. $U[-3, 2]$. Finally, we assume the following functional forms for the varying coefficient functions

$$b_{11}(z) = 3z, \quad b_{12}(z) = b_{21}(z) = \sin(z), \quad b_{22}(z) = z^3.$$

We consider two different estimators for estimating $b_{11}(z)$, $b_{12}(z)$ and $b_{22}(z)$: one which ignores the cross-equation restrictions, and one which imposes the restrictions prior to estimation. For our first set of simulations where we compare the restricted and unrestricted estimators, we set $\Sigma = I$. We compare the average square error (ASE) of the conditional mean for each of the estimators. Additionally, we calculate the ASE for each smooth coefficient estimate. The ASE for each smooth coefficient is defined as

$$ASE(b_{st}) = n^{-1} \sum_{i=1}^{n} \left(\hat\beta_{st}(z_i) - \beta_{st}(z_i)\right)^2,$$

for $s = 1, 2$ and $t = 1, 2$. We let $n = 100$, 200 and 500 and set the number of Monte Carlo simulations equal to 1000. Table 1 gives the results from this exercise.

Table 1
Finite sample performance. Entries are the median ratio between the unrestricted and the restricted estimator. Entries greater than 1 indicate superior performance of the restricted estimator. The lower and upper deciles of the ratio of ASE across the 1000 Monte Carlo simulations are reported in parentheses beneath each estimate.

c      n     Function         b11(z)           b12(z)            b21(z)           b22(z)
0      100   1.02             1.25             4.102             1.52             1.04
             (0.996, 1.06)    (0.753, 2.04)    (1.075, 14.93)    (0.871, 2.64)    (0.921, 1.14)
0      200   1.02             1.23             3.692             1.43             1.02
             (1.002, 1.04)    (0.819, 1.82)    (1.134, 12.82)    (0.888, 2.30)    (0.928, 1.11)
0      500   1.01             1.14             3.728             1.39             1.01
             (1.004, 1.03)    (0.822, 1.58)    (1.304, 10.93)    (0.927, 2.09)    (0.949, 1.07)
0.25   100   1.02             1.27             4.003             1.58             1.04
             (0.997, 1.07)    (0.789, 1.96)    (1.045, 15.89)    (0.968, 2.71)    (0.940, 1.15)
0.25   200   1.02             1.21             3.745             1.50             1.02
             (1.002, 1.04)    (0.810, 1.76)    (1.117, 13.36)    (0.955, 2.48)    (0.940, 1.11)
0.25   500   1.01             1.16             3.615             1.46             1.01
             (1.004, 1.03)    (0.826, 1.58)    (1.195, 11.10)    (0.969, 2.17)    (0.948, 1.08)
0.50   100   1.02             1.30             4.371             1.65             1.05
             (0.999, 1.06)    (0.758, 2.06)    (1.173, 17.45)    (1.033, 2.74)    (0.933, 1.15)
0.50   200   1.02             1.21             4.084             1.60             1.03
             (1.003, 1.05)    (0.794, 1.81)    (1.178, 12.23)    (1.034, 2.58)    (0.941, 1.12)
0.50   500   1.01             1.15             3.547             1.53             1.02
             (1.005, 1.03)    (0.815, 1.58)    (1.143, 11.27)    (1.040, 2.36)    (0.950, 1.08)
1      100   1.03             1.28             3.800             1.74             1.05
             (1.000, 1.08)    (0.763, 2.14)    (1.017, 13.19)    (1.103, 2.77)    (0.942, 1.18)
1      200   1.02             1.21             3.686             1.74             1.04
             (1.004, 1.05)    (0.746, 1.89)    (0.978, 11.72)    (1.106, 2.73)    (0.947, 1.14)
1      500   1.02             1.17             3.173             1.66             1.02
             (1.005, 1.03)    (0.801, 1.63)    (1.056, 9.63)     (1.124, 2.42)    (0.949, 1.09)

Table 2 presents the ratio of ASEs for the conditional mean and the three unknown smooth coefficients comparing the restricted estimator which does not make use of the covariance structure across the two equations, against the estimator which does, $\Sigma = I_m$ vs. $\hat\Sigma$. We refer to the estimator which ignores the covariance structure as the naïve estimator. We do not focus on the performance of the unrestricted estimator since, as with the traditional parametric SUR, if the covariates are the same across equations, then feasible generalized least-squares accounting for the covariance structure is equivalent to equation by equation OLS estimation. In our case this means that the naïve unrestricted estimator will be identical to the unrestricted estimator which uses $\Sigma = \hat\Sigma$.

The results from the two tables suggest at least two things. First, Table 1 shows that finite sample gains only appear to accrue on coefficients that are common across the two equations. Second, Table 2 shows that there do not appear to be gains from estimating the two step estimator when there is no cross-equation correlation between the errors. Further, the fact that the elements of $\Gamma$ have to be estimated leads to some finite sample outcomes where the two-step restricted estimator performs worse than the naïve estimator. For example, for $c = 0.5$ and $n = 200$ we see that at the median the two-step restricted estimator provides a roughly 10% improvement in the global estimation of $b_{12}(z)$, and at the upper/lower deciles we witness a 35% decrease in improvement against a 71% improvement. These features taken together suggest that deploying the naïve restricted estimator is likely to provide solid finite sample results (see Lin and Carroll, 2000 for a similar result in the panel data literature). We also point out that the relative gain/loss in improvement at the upper/lower deciles for the other coefficients, and the unknown function itself, suggest roughly comparable trade-offs.

5. Application: Returns to scale in US banking

In Section 2.2 we discussed the cost system stemming from derivative properties of the underlying cost function. Here we provide an application of a cost system that is still quite popular in the literature and dates back to (at least) Christensen and Greene (1976). We estimate a cost system for US commercial banks, in which the first equation is the translog cost function and the remainder are the cost share equations derived from Shephard's lemma. As mentioned before, Shephard's lemma reinforces the optimality conditions used in deriving the cost function. In this sense its use does not impose any additional restrictions in the model. However, it implies some mathematical relations to be satisfied, viz., the integrability condition (integration of the share equations gives the cost function). Thus, all the parameters of the cost share equations come from the cost function and these restrictions are required when estimating the system. The so called 'unrestricted'
156 D.J. Henderson et al. / Journal of Econometrics 189 (2015) 148–162

Table 2
Finite sample performance. Entries are the median ratio between the naïve and the restricted estimator. Entries greater than 1 indicate superior performance of the restricted
estimator. The lower and upper decile of the ratio of ASE across the 1000 Monte Carlos simulations is reported in parentheses beneath each estimate.
Function b11 (z ) b12 (z ) b22 (z )

n = 100 0.998 0.993 1.04 1.01


(0.972, 1.02) (0.936, 1.04) (0.674, 1.56) (0.917, 1.10)
n = 200 1.000 0.994 1.05 1.01
c=0
(0.983, 1.01) (0.952, 1.03) (0.732, 1.40) (0.941, 1.07)
n = 500 1.000 0.998 1.03 1.00
(0.990, 1.01) (0.972, 1.02) (0.807, 1.28) (0.964, 1.04)
n = 100 1.000 0.991 1.05 1.01
(0.964, 1.03) (0.935, 1.04) (0.610, 1.64) (0.892, 1.13)
n = 200 1.001 0.994 1.08 1.01
c = 0.25
(0.974, 1.03) (0.954, 1.04) (0.685, 1.57) (0.922, 1.10)
n = 500 1.002 0.998 1.06 1.01
(0.985, 1.02) (0.976, 1.02) (0.792, 1.42) (0.952, 1.06)
n = 100 1.003 0.995 1.09 1.02
(0.959, 1.04) (0.928, 1.05) (0.596, 1.83) (0.869, 1.17)
n = 200 1.004 0.995 1.10 1.02
c = 0.50
(0.969, 1.03) (0.951, 1.03) (0.651, 1.71) (0.910, 1.12)
n = 500 1.002 0.997 1.09 1.01
(0.978, 1.03) (0.971, 1.02) (0.726, 1.65) (0.929, 1.08)
n = 100 1.009 0.989 1.23 1.04
(0.953, 1.07) (0.902, 1.07) (0.545, 2.42) (0.850, 1.25)
n = 200 1.007 0.994 1.24 1.03
c=1
(0.959, 1.06) (0.936, 1.06) (0.604, 2.32) (0.866, 1.20)
n = 500 1.005 0.993 1.23 1.01
(0.967, 1.04) (0.946, 1.04) (0.672, 2.17) (0.893, 1.14)

cost system might not be meaningful here because it does not im- among the largest banks. As such, a growing body of literature sug-
pose integrability conditions. Therefore, we do not (i) estimate the gests that large banks employ ‘‘hard’’ information-based technol-
‘unrestricted’ cost system in which the share equation parameters ogy (e.g. Berger et al., 2005) while smaller, commercial banks use
are treated as ‘free’ parameters (not related to those in the cost ‘‘soft’’ information based production technologies (Berger, 2003).
function) even if such a system can be defined and (ii) compare Further, evidence also suggests that banks serve/specialize in dif-
results with the ‘restricted’ cost system.9 ferent market segments depending upon their size. We follow the
The data used in our application come from the Reports of main empirical practice (see Berger and Mester, 2003; Berger et al.,
Income and Condition (Call Reports) published by the Federal 2005; Feng and Serletis, 2009, 2010 among others) in the applied
Reserve Bank of Chicago. Our sample consists of a random sample banking literature and use log(assets) to measure the size of a bank.
of 3112 commercial banks in the most recent year available (2010). Unlike previous papers partitioning banks by size in a arguably
Since banking outputs are services which cannot be stored, the ad hoc manner,11 we allow assets to affect technology in a
standard practice is to specify the production technology in terms completely flexible manner. The advantage of making all the
parameters a nonparametric function of bank size is that it is
of a dual cost function thereby meaning that banks minimize cost
not necessary to classify banks into some arbitrary number of
taking outputs as given. We use the standard input and output
categories (as in Feng and Serletis, 2009) and allow the coefficients
variables in the literature (see, for example Restrepo-Tobón and
to vary by size categories only. Further, we do not have to
Kumbhakar, 2012; Wheelock and Wilson, 2012 and references
specify how bank size enters into the smooth coefficient. Hence,
cited therein).10 Output and input variables for each year are
this allows for heterogeneity of any form with respect to bank
computed as the quarterly average of balance-sheet nominal
size, which is widely believed to exist, but the form of which is
values. The output variables are: Household and individual loans
unknown.
(y1 ), Real estate loans (y2 ), Loans to business and other institutions
(y3 ), Federal funds sold and securities purchased under agreements 5.1. The model
to resell (y4 ) and Other assets (y5 ). The input variables are: Labor
quantity (x1 ), Premises and fixed assets (x2 ), Purchased funds (x3 ), Our normalized translog cost function is
Interest-bearing transaction accounts (x4 ), and Non-transaction
4
accounts (x5 ). For each input xj , its price wj is obtained by dividing 
ln(Ci /w5i ) = α0 (z ) + αj (z ) ln(wji /w5i )
its total expenses by the corresponding input quantity.
j =1
We allow a bank’s technology to vary smoothly depending upon
5 5 
5
the size of the bank. The United States banking industry has seen  
numerous changes brought about by changing regulation, result- + γt (z ) ln yti + (1/2) γtt ′ (z ) ln yti ln yt ′ i
t =1 t =1 t ′ =1
ing in consolidation via merger and acquisition activities, a de-
4 
4
cline in commercial banks and increased concentration of assets 
+ (1/2) ηjj′ (z ) ln(wji /w5i ) ln(wj′ i /w5i )
j=1 j′ =1

9 However, there is nothing wrong in treating the model that satisfies the
economic theoretical restrictions as the ‘unrestricted model’ and consider some
special cases such as the one that imposes separability constraints, constant returns 11 While the Federal Financial Institutions Examination Council (FFIEC) provides
to scale constraints, etc., and call them as restricted models. In such cases we can standard asset size categories (<100 million, 100–300 million and >300 million),
compare between restricted and unrestricted models. there is no reason to believe these categories are set based on banks underlying
10 Table 3 presents summary statistics for our data. technology.
D.J. Henderson et al. / Journal of Econometrics 189 (2015) 148–162 157

Table 3 Table 4
Summary statistics for US banking data in 2010. Decile, quartile and mean estimates of RTS for both the parametric and
semiparametric models. RTSSUR is estimated returns to scale obtained from the
Min Max Mean SD
corresponding SUR model and RTSSIN is estimated returns to scale obtained from
C 241.188 34126.050 6629.340 4936.184 the corresponding cost function. D10 and D90 are the lower and upper deciles,
S1 0.175 0.817 0.490 0.090 respectively while Q25 , Q50 and Q75 are the lower quartile, median and upper
S2 0.019 0.330 0.120 0.040 quartile, respectively.
S3 0.000 0.424 0.050 0.049
D10 Q25 Q50 Q75 D90 Mean
S4 0.000 0.202 0.018 0.022
w1 2.964 108.694 65.760 13.609 Parametric
w2 0.011 2.282 0.292 0.270 RTSSUR 1.421 1.637 1.978 2.451 2.972 2.104
w3 0.000 1.149 0.031 0.034 RTSSIN 1.295 1.486 1.764 2.196 2.645 1.896
w4 0.000 0.028 0.005 0.004 Semiparametric
w5 0.001 0.038 0.016 0.004 RTSSUR 0.985 1.001 1.028 1.064 1.097 1.035
y1 272.037 70964.410 6738.084 7671.218 RTSSIN 1.004 1.021 1.049 1.084 1.120 1.056
y2 3984.411 512193.400 103649.916 89185.630
y3 1786.997 196452.400 36104.131 31350.075
y4 1964.557 270410.300 46110.637 41996.257 5.2. Parametric estimates
y5 284.819 30836.040 4300.835 4204.533
z 9.439 13.596 11.891 0.741 Before looking into the results for our SPSCM estimator, we feel
Notes: All variables are measured in thousands of dollars. z is log of assets where it prudent to examine the results from a flexible parametric model.
assets is measured in thousands of dollars. Here we consider both a single equation and system where our
z variable (ln (assets)) enters in full translog form (i.e., it enters
4 
5
 linearly, in quadratic form and interacts with each regressor). The
+ δjt (z ) ln(wji /w5i ) ln yti + ui , first two rows of Table 4 present extreme deciles, quartiles and
j =1 t =1
means for our estimated RTS across the two parametric approaches
where Ci is the total cost defined as the sum of costs of all five inputs (single equation and system). We find increasing returns at each
for bank i. The cost function is normalized (by w5i ) so that the linear decile and substantial heterogeneity across the sample. However,
homogeneity (in input prices) property is automatically satisfied. these results do not appear to be economically reasonable and
Symmetry (Young’s theorem) of the cost function requires impo- are much larger than those typically found in the literature
sition of the following restrictions: ηjj′ (z ) = ηj′ j (z ) and γtt ′ (z ) = (e.g. Hughes and Mester, 2013). We tried several other ways to
γt ′ t (z ). These restrictions are automatically satisfied by the above introduce z and these led to even larger values of RTS (these
normalized cost function. results are not reported but are available from the authors upon
The derivative conditions (Shephard’s lemma) give us the request).12
following four cost share equations (which will be used alongside The problem with the parametric approach is that the way in
our cost function in our cost system) which ln (assets) enters the equation(s) must be specified a pri-
ori and each of the approaches we tried led to results which did
4 5
  not appear reasonable from an economic point of view. That is
Sji = αj (z ) + ηjj′ (z ) ln(wj′ i /w5i ) + δjt (z ) ln yti + uji ,
a good enough justification for rejecting a model on economic
j ′ =1 t =1
grounds. However, this may sound judgmental and hence we ap-
for j ∈ {1, 2, 3, 4}. Note that Sji = wji xji /Ci is the cost share of plied our functional form test outlined in Section 3 to the afore-
input j for bank i. The fifth cost share equation (S5i ) is automatically mentioned parametric system estimator. Using a wild bootstrap
dropped (sum of cost shares equals unity) because we normalize approach specifically designed for a system of equations, we found
the cost function by w5i . The full five equation cost system requires our p-value to be equal to zero to four decimal places which favors
restrictions both across the share equations as well as between the semiparametric models. Hence, we spend the remainder of the
the share equations and the cost function, making it an excellent paper focusing on our semiparametric approach.
illustration of our smooth coefficient SUR estimator. (See Table 3.)
In almost all banking studies, the focus is on estimating returns- 5.3. Semiparametric estimates
to-scale (RTS), which is defined as the reciprocal of the sum of cost
elasticities. That is, if we define the sum of output elasticities as The third and fourth rows of Table 4 present extreme deciles,
5 ∂ ln Ci quartiles and means for our estimated RTS across the two semi-
Ecyi = t =1 ∂ ln yti , then RTSi = 1/Ecyi and scale economies are parametric approaches (single equation and system). We see from
often defined as (1 − Ecyi ). A positive value of scale economies the table that a potential first order dominance exists. Both mod-
(RTS > 1) means that for a one percentage increase in all outputs els provide reasonable estimates of RTS for the cross-section of US
cost is increased by less than one percent. Here, the presence of increasing RTS for a bank means that it is operating below its efficient scale size (RTS = 1). Because of this, policy analysts, regulators, and bankers want to know whether banks have scale economies or not, thereby implying whether banks can benefit from expansion. This information is often used to justify bank mergers and regulation. Thus, knowledge of the extent of scale economies is important to argue for or against control of bank size either as a policy in general or for a particular merger case, especially if it involves big banks. Since size is related to RTS, it is important to use it in a flexible manner in the cost function so that the RTS measure is fully flexible in terms of bank size. We feel that using size as the z variable makes the model much more flexible than arbitrarily grouping banks in terms of assets. Given that bank size (our z variable) enters the RTS function in a flexible manner, our RTS estimates are bank-specific (note that this would even be true with a Cobb–Douglas cost function).

banks in 2010. We find evidence of increasing RTS from both the SPSCM and SPSCM SUR models. The single equation SPSCM cost model shows increasing RTS for almost all banks, while the SPSCM SUR shows that roughly 25% of the banks operated under increasing RTS in 2010. This is consistent with some recent studies (e.g. Wheelock and Wilson, 2012), although we are not aware of any study with data up to 2010. In addition to finding a large number of banks with estimated RTS greater than 1 in the data, the absolute values are similar to those of recent studies (e.g. Hughes and Mester, 2013).

12 We have also estimated RTS from the SUR treating assets as discrete, using a framework similar to Feng and Serletis (2009, Table III). As mentioned before, the cutoffs deployed are arbitrary in nature and any misspecification in the appropriate cutoffs could impact estimation results. That being said, our estimated RTS are qualitatively similar and track closely those treating assets as continuous. These results are available upon request.
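As a rough illustration of how bank-specific RTS estimates like those discussed above can be summarized, the sketch below assumes the usual cost-function definition, RTS = 1 / Σ_j ∂ln C/∂ln y_j, which agrees with the reading that RTS > 1 means a one percent increase in outputs raises cost by less than one percent. The function names and the toy elasticities are hypothetical and are not taken from the paper's data.

```python
import numpy as np

def returns_to_scale(output_elasticities):
    """RTS_i = 1 / sum_j (d ln C_i / d ln y_ij), one value per bank.

    `output_elasticities` is an (n_banks, n_outputs) array of cost
    elasticities with respect to each output, assumed already estimated.
    """
    e = np.asarray(output_elasticities, dtype=float)
    return 1.0 / e.sum(axis=1)

def share_increasing(rts):
    """Fraction of banks whose RTS point estimate exceeds one."""
    return float(np.mean(np.asarray(rts) > 1.0))

# Toy example: three hypothetical banks, two outputs.
elasticities = np.array([[0.4, 0.5],   # elasticities sum below one: increasing RTS
                         [0.5, 0.5],   # elasticities sum to one: constant RTS
                         [0.6, 0.5]])  # elasticities sum above one: decreasing RTS
rts = returns_to_scale(elasticities)
print(rts, share_increasing(rts))
```

Because the elasticities would vary with bank size in a smooth coefficient model, the resulting RTS values are bank-specific even when the functional form in the outputs is simple.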
Fig. 1. Empirical cumulative distribution functions for estimated RTS for the system (SPSCM SUR) and single equation (SPSCM) estimators. The dashed line is for the SPSCM estimates while the solid line is for the SPSCM SUR estimates.

Fig. 1 plots the estimated CDFs of our estimates for RTS. Following Henderson and Maasoumi (2013), a test of first-order stochastic dominance fails to reject the null of first-order dominance (p-value = 1.000). Note that the percentage of banks operating under increasing RTS is found to be higher in the single equation SPSCM. Since the additional equations (cost shares) are implied by the cost model and add extra information without any extra parameters, we believe that the results from the SUR SPSCM are more reliable.

Even with a single z variable, in this case the logarithm of bank assets, it is hard to plot an exact relationship between estimated RTS and the smoothing variable. Fig. 2 plots the estimated RTS for each bank along with an estimated conditional mean using local-constant kernel regression (with 95% confidence bounds). We can see that both models suggest, on average, a decreasing relationship between RTS and bank assets. It also shows that scale economies of very large banks are non-existent. This is consistent with economic theory, which suggests that scale economies tend to decline with increases in size. Furthermore, we find that some of the largest banks in our sample have exhausted their scale economies (and are operating at their efficient scale size). This is in contrast to the Wheelock and Wilson (2012) study, which found increasing RTS even for the largest banks. Since they did not use a system, they have probably overestimated RTS, as our single equation SPSCM does. That being said, we should note that our data are more recent and we only used a cross-sectional (2010) data set whereas they used a panel (1984–2006).

The estimated SPSCM SUR cost system accommodated the derivative conditions (Shephard's lemma), which imposed parametric restrictions. Given that the cost system without these constraints does not make economic sense, we do not report the so called 'unrestricted' cost system.13 Instead, we estimated a single equation SPSCM with the cost function alone. Since the cost function contains all the parameters, we could estimate them consistently using a single equation (i.e., a SPSCM translog cost function). We view this single equation SPSCM as 'limited' in the sense that it does not take into account the cost share equations, thereby ignoring valuable information which was essentially costless (in the sense that the cost shares do not add any extra parameters). Therefore, the estimates from the single equation model will be less efficient compared to the full system.

13 These results are available upon request.

Fig. 3 presents 45° plots (Henderson et al., 2012) of the estimated RTS. This plot allows us to visualize the significance of the estimated RTS. For example, we have plotted the vertical and horizontal axes at 1, and any point estimate whose confidence bounds contain 1 is statistically indistinguishable from 1. We note that the RTS estimates obtained from the SUR have smaller bootstrap standard errors (using a wild bootstrap) for approximately 95% of all observations relative to the RTS estimates constructed using the smooth coefficient estimates from just the cost function. Fig. 4 plots only those estimates of RTS which are statistically different from 1. While it is not immediately obvious from either figure that the bootstrap standard errors are narrower for the system, consider the difference across Figs. 3 and 4 for estimated RTS less than one. Here it appears that the single equation estimates are much less precise (in the sense of testing against unitary RTS).

6. Conclusion

This paper has presented a semiparametric estimator for a seemingly unrelated regression that is straightforward to implement and that makes cross-equation restrictions easy to impose. The use of a semiparametric model lessens the curse of dimensionality relative to general implementation of a nonparametric model. This model is a generalization of Zellner (1962) and should allow for greater insight into economic analysis of systems of equations where economic theory does not provide ample support for a specific form for β(z).

We have further shown the asymptotic properties of this estimator. Our theoretical results suggest an asymptotic improvement when cross-equation restrictions are present, and a small scale Monte Carlo analysis demonstrated impressive finite sample gains. The fact that our estimator provides asymptotically more efficient estimates lends credence to the empirical importance of imposing cross-equation restrictions in a production setting. Moreover, the ease with which cross-equation restrictions can be incorporated into this estimator relative to a fully nonparametric approach makes it a desirable empirical tool.

Finally, as our estimator is motivated by economic theory, we showed how this estimator could be used to estimate a cost system. Using US commercial banking data, we estimated both a single equation cost function as well as a system approach consisting of the cost function and the cost share equations. We rejected a parametric version of the model with our theoretically justified functional form test with asymptotically valid bootstrap. We found more efficient estimation with the cost system, which requires cross-equation restrictions. Using a smooth coefficient model for each where bank size entered the coefficients nonparametrically, we found that returns-to-scale diminished with bank size. We also found evidence in the system that the largest banks exhibited constant returns-to-scale, but found increasing returns for the largest banks in the single equation model.

Acknowledgments

We would like to thank three referees and an associate editor for providing insightful comments that greatly improved the paper. We would also like to thank participants at the Applied and Theoretical Econometrics Workshop at the University of Colorado-Boulder, New York Camp Econometrics VII, Midwest Econometric Group (University of Chicago), University of Miami Finance Series and University of Padua for valuable comments. Li's research is partly supported by National Natural Science Foundation of China (Key Project, Grant # 71133001).
(a) System. (b) Single equation.

Fig. 2. RTS point estimates along with local constant conditional mean. The solid lines are the local constant estimates from the regression of estimated RTS on ln assets.
95% Confidence bands are presented as dotted lines. Bandwidths for each plot were selected via least-squares cross-validation.
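The smoothing described in this caption can be sketched as follows: a local-constant (Nadaraya-Watson) regression of estimated RTS on ln assets, with the bandwidth chosen by least-squares cross-validation over a grid. This is a minimal sketch under stated assumptions: the Gaussian kernel, the candidate grid, the function names, and the simulated data are all illustrative choices, and the confidence bands shown in the figure are not computed here.

```python
import numpy as np

def gaussian_kernel(u):
    return np.exp(-0.5 * u**2) / np.sqrt(2.0 * np.pi)

def local_constant(x, y, grid, h):
    """Nadaraya-Watson estimate of E[y | x] at each point of `grid`."""
    w = gaussian_kernel((grid[:, None] - x[None, :]) / h)
    return (w @ y) / w.sum(axis=1)

def lscv_bandwidth(x, y, candidates):
    """Return the candidate h with the smallest leave-one-out squared error."""
    best_h, best_cv = None, np.inf
    for h in candidates:
        w = gaussian_kernel((x[:, None] - x[None, :]) / h)
        np.fill_diagonal(w, 0.0)              # leave observation i out
        denom = w.sum(axis=1)
        if np.any(denom <= 0.0):              # some point has no neighbours
            continue
        cv = np.mean((y - (w @ y) / denom) ** 2)
        if cv < best_cv:
            best_h, best_cv = h, cv
    return best_h

# Simulated stand-in for (ln assets, estimated RTS); not the paper's data.
rng = np.random.default_rng(0)
ln_assets = rng.uniform(10.0, 20.0, size=200)
rts_hat = 1.2 - 0.02 * (ln_assets - 10.0) + rng.normal(0.0, 0.02, size=200)

candidates = np.linspace(0.2, 3.0, 15)
h_cv = lscv_bandwidth(ln_assets, rts_hat, candidates)
grid = np.linspace(ln_assets.min(), ln_assets.max(), 50)
fit = local_constant(ln_assets, rts_hat, grid, h_cv)
print(h_cv, fit[0], fit[-1])
```

A grid search is used only for transparency; a numerical optimizer over h would serve the same purpose.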

Appendix A. Proofs

Proof of Theorem 2.1. In the text it was shown that

\[ \hat\beta(z) = \beta(z) + D_n(z)^{-1}\{A_n(z) + C_n(z)\}, \tag{A.1} \]

where

\[ D_n(z) = \frac{1}{nh^{q_1}} \sum_{i=1}^{n} \tilde{x}_i^T \Sigma^{-1} \tilde{x}_i K_i(z), \]

\[ A_n(z) = \frac{1}{nh^{q_1}} \sum_{i=1}^{n} \tilde{x}_i^T \Sigma^{-1} \tilde{x}_i (\beta(z_i) - \beta(z)) K_i(z), \]

\[ C_n(z) = \frac{1}{nh^{q_1}} \sum_{i=1}^{n} \tilde{x}_i^T \Sigma^{-1} \tilde{u}_i K_i(z). \]

Note that $K_i(z) = W_i(z^c) L_i(z^d)$ is the $i$th diagonal element of $K(z)$. It is easy to show that $E(D_n) = M(z) + O_p(h^2 + \lambda + (nh^{q_1})^{-1/2})$, where $M(z) = f(z) E[\tilde{x}_i^T \Sigma^{-1} \tilde{x}_i \mid z_i = z]$, and that $Var(D_{n,j}) = O_p((nh^{q_1})^{-1}) = o_p(1)$, where $D_{n,j}$ is the $j$th column of $D_n$, $j = 1, \ldots, (ml)$. Hence, we have

\[ D_n(z) = M(z) + o_p(1). \tag{A.2} \]

Next, it is easy to show that $E(C_n(z)) = 0$ and $Var(C_n(z)) = (nh^{q_1})^{-1}(\nu_0 M(z) + o(1))$. Then by Liapunov's central limit theorem we know that

\[ \sqrt{nh^{q_1}}\, C_n(z) \xrightarrow{d} N(0, \nu_0 M(z)). \tag{A.3} \]

It can be easily shown that $A_n$ gives the leading bias terms.

\[
\begin{aligned}
E(A_n(z)) &= \frac{1}{h^{q_1}} E\left[ E(\tilde{x}_i^T \Sigma^{-1} \tilde{x}_i \mid z_i)(\beta(z_i) - \beta(z)) K_i(z) \right] \\
&= \frac{1}{h^{q_1}} \sum_{\tilde{z}^d \in D} \int M(\tilde{z})(\beta(\tilde{z}) - \beta(z)) W_{\tilde{z}^c, z^c} L_{\tilde{z}^d, z^d} \, d\tilde{z}^c \\
&= h^2 A_1(z) + \lambda A_2(z) + o(h^2 + \lambda),
\end{aligned}
\tag{A.4}
\]

where $D$ is the support of the discrete random variable $z^d$, $M(\tilde{z}) = f(\tilde{z}) E[\tilde{x}_i^T \Sigma^{-1} \tilde{x}_i \mid z_i = \tilde{z}]$, and
(a) SUR. (b) Single equation.

Fig. 3. RTS point estimates along with 95% bootstrap confidence intervals. Panel (a) presents point estimates of RTS (as circles) from the SPSCM SUR along with 95% bootstrap
confidence intervals (as triangles). Panel (b) presents point estimates of RTS (as circles) from the SPSCM along with 95% bootstrap confidence intervals (as triangles). The
vertical and horizontal dashed lines at 1 represent constant RTS, estimates in the upper right quadrant display increasing RTS while estimates in the lower left quadrant
display decreasing RTS.
\[ A_1(z) = \mu_2 \sum_{j=1}^{q_1} \left[ M_j(z)\beta_j(z) + M(z)\beta_{jj}(z)/2 \right], \]

\[ A_2(z) = \sum_{\tilde{z}^d \in D} I(\tilde{z}^d, z^d) M(z^c, \tilde{z}^d)(\beta(\tilde{z}^d, z^c) - \beta(z)), \tag{A.5} \]

where $\mu_2 = \int w(v) v^2 \, dv$, $I(\tilde{z}^d, z^d) = \sum_{j=1}^{q_2} I(\tilde{z}_j^d, z_j^d)$ with $I(\tilde{z}_j^d, z_j^d) = 1(\tilde{z}_j^d \neq z_j^d)$ if $z_j^d$ is an unordered discrete variable, and $I(\tilde{z}_j^d, z_j^d) = 1(|\tilde{z}_j^d - z_j^d| = 1)$ if $z_j^d$ is an ordered discrete variable. Also, for a $q_1$-dimensional $z^c = (z^c_{(1)}, \ldots, z^c_{(q_1)})$, we used the notation that $g_j(z) = \partial g(z)/\partial z^c_{(j)}$ and $g_{jj}(z) = \partial^2 g(z)/(\partial z^c_{(j)})^2$ denote the first order and second order derivative functions of $g(\cdot)$ with respect to $z^c_{(j)}$, $j = 1, \ldots, q_1$, where $g(z)$ is either $M(z)$ or $\beta(z)$.

It is straightforward to show that $Var(A_n(z)) = O(h^2 (nh^{q_1})^{-1}) = o((nh^{q_1})^{-1})$. Combining the above results we have shown that

\[ A_n(z) = h^2 A_1(z) + \lambda A_2(z) + o_p(h^2 + \lambda + (nh^{q_1})^{-1/2}). \tag{A.6} \]

By Liapunov's central limit theorem we know that

\[ D_n(z)^{-1} \sqrt{nh^{q_1}}\, C_n(z) \xrightarrow{d} M(z)^{-1} N(0, \nu_0 M(z)) = N(0, \nu_0 M(z)^{-1}). \]

Let $\hat{G}(z) = (X^T K(z) X)^{-1} R^T \{R[X^T K(z) X]^{-1} R^T\}^{-1} R$. By Eq. (16) we have

\[ \hat\beta^*(z) = [I - \hat{G}(z)]\{\beta(z) + D_n(z)^{-1}[A_{h,\lambda}(z) + C_n]\} + J_n(z) + o_p(h^2 + \lambda + (nh^{q_1})^{-1/2}), \]

where $A_{h,\lambda}(z) = h^2 A_1(z) + \lambda A_2(z)$. Note that $J_n(z)$ enters outside the braces, since $[I - \hat{G}(z)] J_n(z) = 0$.

Hence,

\[ \sqrt{nh^{q_1}} \left[ \hat\beta^*(z) - [I - \hat{G}(z)]\beta(z) - J_n(z) - [I - \hat{G}(z)] D_n(z)^{-1} A_{h,\lambda}(z) \right] \xrightarrow{d} N(0, \Lambda(z)), \]

where $G(z) = M(z)^{-1} R^T [R M(z)^{-1} R^T]^{-1} R$ is the probability limit of $\hat{G}(z)$ and $\Lambda(z) = \nu_0 [I - G(z)] M(z)^{-1} [I - G(z)]^T$. □

Consistent estimates of bias and variance terms

The leading bias and variance terms are $J_n(z) + [I - \hat{G}(z)] D_n(z)^{-1} A_{h,\lambda}(z)$ and $\Lambda(z) = \nu_0 [I - G(z)] M(z)^{-1} [I - G(z)]^T$, where $\hat{G}(z) = (X^T K(z) X)^{-1} R^T \{R[X^T K(z) X]^{-1} R^T\}^{-1} R$, $J_n(z) = (X^T K(z) X)^{-1} R^T \{R[X^T K(z) X]^{-1} R^T\}^{-1} r$, $D_n(z) = (nh^{q_1})^{-1} X^T K(z) X$, and $A_{h,\lambda}(z) = h^2 A_1(z) + \lambda A_2(z)$.
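To make the role of $\hat{G}(z)$ and $J_n(z)$ concrete, the following sketch computes a kernel-weighted least-squares estimate at one point $z$ and then imposes a linear restriction $R\beta = r$ through $\hat\beta^*(z) = [I - \hat{G}(z)]\hat\beta(z) + J_n(z)$. It is a simplified illustration, not the paper's implementation: a single weighted regression stands in for the stacked system, the Gaussian weights, bandwidth, restriction, and simulated data are assumptions, and all function and variable names are hypothetical.

```python
import numpy as np

def restricted_local_estimate(X, y, k, R, r):
    """Kernel-weighted estimate at one point z, then projection onto R beta = r.

    X: (n, p) regressors, y: (n,) response, k: (n,) kernel weights K_i(z).
    Returns (beta_hat, beta_star, G_hat, S_inv) with S = X' K(z) X.
    """
    S = X.T @ (k[:, None] * X)                  # X' K(z) X
    S_inv = np.linalg.inv(S)
    beta_hat = S_inv @ (X.T @ (k * y))          # unrestricted estimate
    W = S_inv @ R.T @ np.linalg.inv(R @ S_inv @ R.T)
    G_hat = W @ R                               # G-hat(z)
    J_n = W @ r                                 # J_n(z)
    beta_star = (np.eye(X.shape[1]) - G_hat) @ beta_hat + J_n
    return beta_hat, beta_star, G_hat, S_inv

# Simulated data in which the last two coefficients sum to one at every z.
rng = np.random.default_rng(0)
n = 300
z = rng.uniform(0.0, 1.0, n)
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
beta_true = np.column_stack([1.0 + z, 0.3 + 0.2 * z, 0.7 - 0.2 * z])
y = np.sum(X * beta_true, axis=1) + 0.1 * rng.normal(size=n)

z0, h = 0.5, 0.2                                # evaluation point and bandwidth
k = np.exp(-0.5 * ((z - z0) / h) ** 2)          # Gaussian kernel weights
R = np.array([[0.0, 1.0, 1.0]])                 # restriction: beta_2 + beta_3 = 1
r = np.array([1.0])

beta_hat, beta_star, G_hat, S_inv = restricted_local_estimate(X, y, k, R, r)
print(R @ beta_hat, R @ beta_star)
```

In this sketch the restricted estimate satisfies the restriction exactly by construction, and $\hat{G}$ is idempotent with $\hat{G} S^{-1} = S^{-1} \hat{G}^T$, which mirrors the properties of $G(z)$ and $M(z)^{-1}$ established in Appendix C.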
(a) SUR. (b) Single equation.

Fig. 4. RTS point estimates along with 95% bootstrap confidence intervals for estimates statistically different than 1. Panel (a) presents point estimates of RTS (as circles)
from the SPSCM SUR along with 95% bootstrap confidence intervals (as triangles). Panel (b) presents point estimates of RTS (as circles) from the SPSCM along with 95%
bootstrap confidence intervals (as triangles). The vertical and horizontal dashed lines at 1 represent constant RTS, estimates in the upper right quadrant display increasing
RTS while estimates in the lower left quadrant display decreasing RTS.
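The selection rule behind this figure can be sketched in a few lines: an estimate is treated as statistically different from 1 only when its 95% interval excludes 1. The function name and the toy bounds below are hypothetical, and the bootstrap that would produce the bounds is not shown.

```python
import numpy as np

def classify_rts(lower, upper):
    """Label each estimate from its confidence bounds relative to RTS = 1."""
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    labels = np.full(lower.shape, "constant", dtype=object)  # interval contains 1
    labels[lower > 1.0] = "increasing"                       # whole interval above 1
    labels[upper < 1.0] = "decreasing"                       # whole interval below 1
    return labels

# Three hypothetical banks with 95% bounds around their RTS estimates.
lower = np.array([1.03, 0.97, 0.90])
upper = np.array([1.13, 1.05, 0.99])
labels = classify_rts(lower, upper)
significant = labels != "constant"   # the subset a plot like Fig. 4 would keep
print(labels, significant)
```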

Let $B$ be any quantity that appears in the leading bias or leading variance of $\hat\beta(z)$. It is easy to show that the leading bias can be estimated by replacing $B$ by $\hat{B}$, where $\hat{B}$ is obtained by replacing unknown functions in $B$ by kernel estimators. For example, $\beta_j(z)$ and $\beta_{jj}(z)$ can be estimated by the local quadratic method, $M'(z)$ by the local linear method, and other functions by the local constant method. The leading variance can be estimated by $\hat\Lambda(z) = \nu_0 [I - \hat{G}](\hat{M}(z))^{-1}[I - \hat{G}]^T$, where $\hat{G}$ is defined a few lines above (16), and $\hat{M}(z)$ can be obtained by replacing $M(z)$ by its kernel estimator and replacing $\Sigma^{-1}$ by $(\hat\Sigma)^{-1}$, where $\hat\Sigma = n^{-1} \sum_{i=1}^{n} \hat{u}_i \hat{u}_i^T$, $\hat{u}_i = \tilde{y}_i - \tilde{x}_i \hat\beta(z_i)$.

Proof of Theorem 3.1. The proof of Theorem 3.1 closely follows that of Theorem 3.1 in Li et al. (2002) and so we only provide a sketch of the proof here.

Proof of Theorem 3.1(a). First, note that under $H_0$ the identity $\hat{u}_{i,0} = \tilde{y}_i - \tilde{x}_i \hat\beta_0(z_i) = \tilde{u}_i + \tilde{x}_i(\beta_0(z_i) - \hat\beta_0(z_i))$ holds. We have

\[ \hat{I}_n = I_{1n} + 2 I_{2n} + I_{3n}, \]

where

\[ I_{1n} = (n^2 h^{q_1})^{-1} \sum_{i=1}^{n} \sum_{j \neq i} \tilde{u}_i^T \tilde{x}_i \tilde{x}_j^T \tilde{u}_j K_{ij}, \]

\[ I_{2n} = (n^2 h^{q_1})^{-1} \sum_{i=1}^{n} \sum_{j \neq i} \tilde{u}_i^T \tilde{x}_i \tilde{x}_j^T \tilde{x}_j (\beta_0(z_j) - \hat\beta_0(z_j)) K_{ij}, \]

\[ I_{3n} = (n^2 h^{q_1})^{-1} \sum_{i=1}^{n} \sum_{j \neq i} (\hat\beta_0(z_i) - \beta_0(z_i))^T \tilde{x}_i^T \tilde{x}_i \tilde{x}_j^T \tilde{x}_j (\hat\beta_0(z_j) - \beta_0(z_j)) K_{ij}. \]

The term $nh^{q_1/2} I_{1n} \xrightarrow{d} N(0, \sigma_0^2)$ via a similar argument as found in the proof of Lemma 1 in Li and Wang (1998). Here we sketch the argument.

$I_{1n}$ can be written as the sum of second order, degenerate U-statistics. It is straightforward to show that this second order U-statistic has $E(nh^{q_1/2} I_{1n}) = 0$ and variance $\sigma_0^2 + o(1)$. With these two facts, using Hall's (1984) central limit theorem for degenerate U-statistics suggests that $nh^{q_1/2} I_{1n} \xrightarrow{d} N(0, \sigma_0^2)$. It is easy to show that $\hat\sigma_0^2 = \sigma_0^2 + o_p(1)$. Next, by the fact that $\hat\beta_0(z) - \beta_0(z) = O_p(n^{-1/2})$, similar arguments as in Li and Wang (1998) lead to $I_{2n} = O_p(n^{-1})$, $I_{3n} = O_p(n^{-1})$ and $\hat\sigma_0^2 = \sigma_0^2 + o_p(1)$. Therefore, we have $nh^{q_1/2} \hat{I}_n / \hat\sigma_0 = nh^{q_1/2} I_{1n} / \sigma_0 + o_p(1) \xrightarrow{d} N(0, 1)$, under $H_0$. □

Proof of Theorem 3.1(b). This proof follows directly from the results in Li and Wang (1998). First, one can show that $\hat{I}_n \xrightarrow{p} I > 0$ under $H_1$. Second, $\hat\sigma_0 = C + o_p(1)$ is easily established under $H_1$, where $C$ is a positive constant. Lastly, $\hat{J}_n = nh^{q_1/2} \hat{I}_n / \hat\sigma_0 = nh^{q_1/2}[I/C + o_p(1)]$. As $n \to \infty$, we see that the test statistic $\hat{J}_n$ diverges to $+\infty$ at the rate of $nh^{q_1/2}$. This proves the desired result. □
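For readers who want to see the double sum in the proof above as a computation, the sketch below evaluates a statistic of the same form as $I_{1n}$, with the errors replaced by residuals from the null model. This is an illustration under stated assumptions: a product Gaussian kernel over continuous smoothing variables only, a common scalar bandwidth, and hypothetical names throughout. The standardization by $\hat\sigma_0$ is omitted because its formula is not given in this appendix.

```python
import numpy as np

def i_n_statistic(X, U, Z, h):
    """(n^2 h^q)^{-1} * sum_i sum_{j != i} u_i' x_i x_j' u_j K_ij.

    X: (n, m, p) regressor matrices x_i (m equations, p coefficients),
    U: (n, m) residuals under the null, Z: (n, q) smoothing variables,
    h: scalar bandwidth shared by all q variables (an assumption).
    """
    n, q = Z.shape
    V = np.einsum("imp,im->ip", X, U)                # v_i = x_i' u_i, shape (n, p)
    diff = (Z[:, None, :] - Z[None, :, :]) / h
    K = np.prod(np.exp(-0.5 * diff**2) / np.sqrt(2.0 * np.pi), axis=2)
    np.fill_diagonal(K, 0.0)                         # exclude the j == i terms
    return float(np.sum((V @ V.T) * K) / (n**2 * h**q))

# Simulated inputs purely for illustration.
rng = np.random.default_rng(1)
n, m, p, q = 50, 2, 3, 1
X = rng.normal(size=(n, m, p))
U = rng.normal(size=(n, m))
Z = rng.uniform(size=(n, q))
print(i_n_statistic(X, U, Z, h=0.3))
```

The vectorized form relies on $\tilde{u}_i^T \tilde{x}_i \tilde{x}_j^T \tilde{u}_j = v_i^T v_j$ with $v_i = \tilde{x}_i^T \tilde{u}_i$, so the double sum reduces to an elementwise product of two $n \times n$ matrices.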
Having proven parts (a) and (b), this completes the proof of Theorem 3.1. □

Appendix B. Equation by equation SPSCM estimation

We provide a short sketch showing that, when the covariates are identical across the equations (with no restrictions), the estimator that accounts for the covariance structure of the residuals does not differ from the estimator that ignores this structure.

Recall that our matrix of covariates in this setting is $\mathbf{X} = I_m \otimes X$, $\Gamma = \Sigma \otimes I_n$ and $\mathbf{K}^{1/2} = I_m \otimes K^{1/2}$. Note that $(A \otimes B)(C \otimes D) = AC \otimes BD$ and $(A \otimes B)^{-1} = A^{-1} \otimes B^{-1}$. Our feasible SPSCM-SUR estimator is

\[
\begin{aligned}
\hat\beta(z) &= \left( \mathbf{X}^T \mathbf{K}^{1/2} \Gamma^{-1} \mathbf{K}^{1/2} \mathbf{X} \right)^{-1} \mathbf{X}^T \mathbf{K}^{1/2} \Gamma^{-1} \mathbf{K}^{1/2} \mathbf{Y} \\
&= \left[ \Sigma^{-1} \otimes (X^T K X) \right]^{-1} \left[ \Sigma^{-1} \otimes X^T K \right] \mathbf{Y} \\
&= \left[ I_m \otimes (X^T K X)^{-1} X^T K \right] \mathbf{Y} \\
&= \left( \mathbf{X}^T \mathbf{K} \mathbf{X} \right)^{-1} \mathbf{X}^T \mathbf{K} \mathbf{Y} = \tilde\beta(z).
\end{aligned}
\]

Appendix C. Properties of $G(z)$ and $M(z)^{-1}$

We show here that the properties necessary for Proposition 2 of Taylor (1976) hold. First, $G(z)$ is an asymmetric and idempotent matrix:

\[ G(z) = M(z)^{-1} R^T \left[ R M(z)^{-1} R^T \right]^{-1} R \neq R^T \left[ R M(z)^{-1} R^T \right]^{-1} R M(z)^{-1} = G(z)^T \]

and

\[
\begin{aligned}
G(z) G(z) &= M(z)^{-1} R^T \left[ R M(z)^{-1} R^T \right]^{-1} \times R M(z)^{-1} R^T \left[ R M(z)^{-1} R^T \right]^{-1} R \\
&= M(z)^{-1} R^T \left[ R M(z)^{-1} R^T \right]^{-1} R = G(z).
\end{aligned}
\]

Next, the key assumption in Taylor (1976) is that $G(z) M(z)^{-1} = M(z)^{-1} G(z)^T$. This identity holds by noting that

\[ G(z) M(z)^{-1} = M(z)^{-1} R^T \left[ R M(z)^{-1} R^T \right]^{-1} R M(z)^{-1}, \]

\[ M(z)^{-1} G(z)^T = M(z)^{-1} R^T \left[ R M(z)^{-1} R^T \right]^{-1} R M(z)^{-1}. \]

We see that $G(z) M(z)^{-1} = M(z)^{-1} G(z)^T$. Thus, the symmetry condition needed for Proposition 2 in Taylor (1976) holds.

References

Asaftei, G., Parmeter, C.F., 2010. Market power, EU integration and privatization: The case of Romania. J. Comp. Econ. 38 (3), 340–356.
Barten, A.P., 1969. Maximum likelihood estimation of a complete system of demand equations. Eur. Econ. Rev. 1, 7–73.
Berger, A.N., 2003. The economic effect of technological progress: Evidence from the banking industry. J. Money Credit Bank. 35, 141–176.
Berger, A.N., Mester, L.J., 2003. Explaining the dramatic changes in performance of US banks: technological change, deregulation, and dynamic changes in competition. J. Financ. Intermed. 12, 57–95.
Berger, A.N., Miller, H.M., Mitchell, A.P., Rajan, R.G., Stein, J.C., 2005. Does function follow organizational form? Evidence from the lending practices of large and small banks. J. Financ. Econ. 76, 237–269.
Bierens, H.J., Ploberger, W., 1997. Asymptotic theory of integrated conditional moment tests. Econometrica 65, 1129–1151.
Cai, Z., 2007. Trending time-varying coefficient time series models with serially correlated errors. J. Econometrics 136, 163–188.
Cai, Z., Das, M., Xiong, H., Wu, X., 2006. Functional coefficient instrumental variables models. J. Econometrics 133, 207–241.
Cai, Z., Fan, J., Li, R., 2000a. Efficient estimation and inferences for varying-coefficient models. J. Amer. Statist. Assoc. 95, 888–902.
Cai, Z., Fan, J., Yao, Q., 2000b. Functional-coefficient regression models for nonlinear time series. J. Amer. Statist. Assoc. 95, 941–956.
Cai, Z., Li, Q., 2008. Nonparametric estimation of varying coefficient dynamic panel data models. Econometric Theory 24, 1321–1342.
Cai, Z., Li, Q., Park, J.Y., 2009. Functional-coefficient models for nonstationary time series data. J. Econometrics 148, 101–113.
Christensen, L.R., Greene, W.H., 1976. Economies of scale in U.S. electric power generation. J. Polit. Econ. 84, 655–676.
Cleveland, W.S., Grosse, E., Shyu, W.M., 1991. Local regression models. In: Chambers, J.M., Hastie, T. (Eds.), Statistical Models in S. Wadsworth and Brooks/Cole, Pacific Grove, pp. 309–376.
Das, M., 2005. Instrumental variables estimators of nonparametric models with discrete endogenous regressors. J. Econometrics 124, 335–361.
Fan, J., Huang, T., 2005. Profile likelihood inference on semiparametric varying-coefficient partially linear models. Bernoulli 11, 1031–1057.
Fan, J., Zhang, W., 1999. Statistical estimation in varying-coefficient models. Ann. Statist. 27, 1491–1518.
Feng, G., Serletis, A., 2009. Efficiency and productivity of the US banking industry, 1998–2005: evidence from the Fourier cost function satisfying global regularity conditions. J. Appl. Econometrics 24, 105–138.
Feng, G., Serletis, A., 2010. Efficiency, technical change, and returns to scale in large US banks: Panel data evidence from an output distance function satisfying theoretical regularity. J. Bank. Finance 34 (1), 127–138.
Hall, P., 1984. Central limit theorem for integrated square error of multivariate nonparametric density estimators. J. Multivariate Anal. 14 (1), 1–16.
Hall, P., Li, Q., Racine, J.S., 2007. Nonparametric estimation of regression functions in the presence of irrelevant variables. Rev. Econ. Stat. 89, 784–789.
Hastie, T., Tibshirani, R., 1993. Varying-coefficient models. J. R. Stat. Soc. Ser. B Stat. Methodol. 55, 757–796.
Henderson, D.J., Carroll, R.J., Li, Q., 2008. Nonparametric estimation and testing of fixed effects panel data models. J. Econometrics 144, 257–275.
Henderson, D.J., Kumbhakar, S.C., Parmeter, C.F., 2012. A simple method to visualize results in nonlinear regression models. Econom. Lett. 117, 578–581.
Henderson, D.J., Maasoumi, E., 2013. Searching for rehabilitation in nonparametric regression models with exogenous treatment assignment. In: Ullah, A., Racine, J.S., Su, L. (Eds.), Handbook of Applied Nonparametric and Semiparametric Econometrics and Statistics. Oxford University Press, New York, pp. 501–520.
Henderson, D.J., Ullah, A., 2005. A nonparametric random effects estimator. Econom. Lett. 88, 403–407.
Huber, P., 1985. Projection Pursuit. Ann. Statist. 13, 435–475.
Hughes, J.P., Mester, L.J., 2013. Who said large banks don't experience scale economies? Evidence from a risk-return-driven cost function. J. Financ. Intermed. 22, 559–585.
Jun, S.J., 2009. Local structural quantile effects in a model with a nonseparable control variable. J. Econometrics 151, 82–97.
Jun, S.J., Pinkse, J., 2009. Efficient semiparametric seemingly unrelated quantile regression estimation. Econometric Theory 25, 1392–1414.
Lavergne, P., Vuong, Q., 2000. Nonparametric significance testing. Econometric Rev. 16, 576–601.
Lee, T.-H., Ullah, A., 2001. Nonparametric bootstrap tests for neglected nonlinearity in time series regression models. J. Nonparametr. Stat. 13, 425–451.
Li, Q., Huang, C.J., Li, D., Fu, T.-T., 2002. Semiparametric smooth coefficient models. J. Bus. Econom. Statist. 20, 412–422.
Li, Q., Racine, J.S., 2010. Smooth varying-coefficient estimation and inference for qualitative and quantitative data. Econometric Theory 26, 1607–1637.
Li, Q., Ouyang, D.S., Racine, J.S., 2013. Categorical semiparametric varying-coefficient models. J. Appl. Econometrics 28, 551–579.
Li, Q., Wang, S., 1998. A simple consistent bootstrap test for a parametric regression function. J. Econometrics 87, 145–165.
Lin, X., Carroll, R.J., 2000. Nonparametric function estimation for clustered data when the predictor is measured without/with error. J. Amer. Statist. Assoc. 95, 520–534.
Mamuneas, T.P., Savvides, A., Stengos, T., 2006. Economic development and the return to human capital: a smooth coefficient semiparametric approach. J. Appl. Econometrics 21, 111–132.
Matzkin, R.L., 2008. Identification in nonparametric simultaneous equations. Econometrica 76, 945–978.
Orbe, S., Ferreira, E., Rodriguez-Poo, J.M., 2003. An algorithm to estimate time varying parameters SURE models under different types of restriction. Comput. Statist. Data Anal. 42, 363–383.
Orbe, S., Ferreira, E., Rodriguez-Poo, J.M., 2005. Nonparametric estimation of time varying parameters under shape restrictions. J. Econometrics 126, 53–77.
Ouyang, D., Li, Q., Racine, J., 2009. Nonparametric estimation of regression functions with discrete regressor. Econometric Theory 25, 1–42.
Racine, J.S., Li, Q., 2004. Nonparametric estimation of regression functions with both categorical and continuous data. J. Econometrics 119, 99–130.
Restrepo-Tobón, D., Kumbhakar, S.C., 2012. Measuring Profit Efficiency without Estimating a Profit Function: The Case of U.S. Commercial Banks. Working paper, State University of New York at Binghamton.
Robinson, P., 1989. Nonparametric estimation of time-varying parameters. In: Hackl, P. (Ed.), Analysis and Forecasting of Economic Structural Change. North Holland, Amsterdam.
Taylor, W.E., 1976. Prior information on the coefficients when the disturbance covariance matrix is unknown. Econometrica 44, 725–739.
Welsh, A.H., Yee, T.W., 2006. Local regression for vector responses. J. Statist. Plann. Inference 136, 3007–3031.
Wheelock, D.C., Wilson, P.W., 2012. Do large banks have lower costs? New estimates of returns to scale for U.S. banks. J. Money Credit Bank. 44, 171–199.
Zellner, A., 1962. An efficient method for estimating seemingly unrelated regressions and tests for aggregation bias. J. Amer. Statist. Assoc. 57, 585–612.
