Semi-Parametric Estimation of A Generalized Threshold Regression Model Under Conditional Quantile Restriction

Econometrics Journal (2013), volume 16, pp. 250–277.
doi: 10.1111/ectj.12005
ZHENGYU ZHANG
School of Economics, Shanghai University of Finance and Economics, 777, Guoding Road, 200433, Shanghai, China.
Center for Econometric Study, Shanghai Academy of Social Sciences, China.

E-mail: zyzhang@sass.org.cn
First version received: August 2011; final version accepted: January 2013
Summary We consider semi-parametric estimation of a generalized threshold regression model with both the link function and the error term distribution left unspecified. We propose for the model a maximum integrated score estimator (MISE), which allows us to estimate the model under a weaker conditional quantile restriction. The MISE is shown to have a convergence rate of $n^{-1}$ for the threshold parameter and a regular $n^{-1/2}$ rate for the remaining parameters. Moreover, the estimates for the two parts are asymptotically independent in that their limiting distributions are the same as they would be if the other part were known. Monte Carlo results indicate that our estimator performs reasonably well in finite samples.
Keywords: Maximum score method, Quantile regression, Threshold regression model,
Transformation model.
1. INTRODUCTION
There is a growing econometric literature on threshold regression models, which are characterized by piecewise linear processes separated according to a threshold in a covariate. A major attraction of this model is that it treats the sample-splitting value (the threshold parameter) as unknown instead of using an ad hoc criterion to select subsamples. A baseline linear threshold regression model is given by
$$y^* = x'\beta_0 + z'\delta_0 I(q > \gamma_0) - u, \qquad (1.1)$$
where $y^*$ is the (latent) dependent variable; $(x, z)$ is a $(p_x + p_z)$-dimensional explanatory variable, and we assume that $z$ is a subset of $x$ ($p_z \le p_x$) for expositional simplicity; $\beta_0$ and $\delta_0$ are two conformable vectors of coefficients; $u$ is the error term; $q$ is an observed threshold variable, which can be an element of $x$; and $\gamma_0$ is the threshold parameter.$^1$ Estimation of model (1.1) by least-squares-based methods has been considered by, for example, Hansen (2000), Caner and Hansen (2004) and Seo and Linton (2007). Recently, this baseline model has been extended in diverse ways. It has been studied for time series autoregressive models (Chan, 1993, Cho and White, 2007), in a non-parametric context (Delgado and Hidalgo, 2000), for transformation models (Pons, 2003, Kosorok and Song, 2007) and for binary choice models (Lee and Seo, 2008).
$^1$ Throughout the paper, any parameter with subscript zero represents the true parameter that generates the data.
© 2013 The Authors. The Econometrics Journal © 2013 Royal Economic Society. Published by John Wiley & Sons Ltd, 9600 Garsington Road, Oxford OX4 2DQ, UK and 350 Main Street, Malden, MA, 02148, USA.
In this paper, we consider semi-parametric estimation of a generalized threshold regression model with an unknown transformation of the dependent variable under a conditional quantile restriction, that is,
$$H(y^*) = x'\beta_0 + z'\delta_0 I(q > \gamma_0) - u, \qquad (1.2)$$
where $H(\cdot)$ is a strictly increasing function and the $\tau$-th quantile of $u$ conditional on $(x, q)$ is zero.$^2$ We observe $(y, x, q)$, where $y$ is allowed to be subject to two-sided censoring, that is,
$$y = a_1 I(y^* < a_1) + y^* I(a_1 \le y^* \le a_2) + a_2 I(y^* > a_2), \qquad (1.3)$$
with $a_1 < a_2$.
Regression models in which the dependent variable is subject to an unknown monotonic transformation are also known as duration models, which are relevant in a variety of applications. This is because many time-to-event variables are of interest to researchers conducting empirical studies in economics, finance and management. These time-to-event variables can be the length of an unemployment spell or a strike in labour economics, the time between purchases of a durable good in marketing, the duration of business cycles or recessions in macroeconomics and the survival time in biostatistics. See Lancaster (1990) and Van den Berg (2001) for extensive surveys on these applications. In comparison with a linear transformation model with no threshold effect, (1.2) is useful in modelling some non-linear and asymmetric features in an economic relationship. For example, a vast literature invokes tax and regulatory barriers to micro-firm growth in developing countries (Collier and Gunning, 1999), suggesting the presence of a growth threshold effect for firms in these countries. Gauthier and Gersowitz (1997) find an inverted-U relationship between Cameroonian firms' tax burden and survival time, with very young and very large firms revealing low tax burdens. In such an application, the dependent variable is the duration of a firm's survival and the threshold variables include the variables measuring tax and regulatory burdens. Testing for the presence of, and estimating the extent of, the non-linear path of firms' growth with respect to tax and regulatory barriers then amounts to testing for and estimating the threshold effect in the model. The effect of financial constraints on firms' investment behaviour can also be modelled as a discontinuous threshold effect (Fazzari et al., 1988, Hansen, 1999). To test the effect of financial constraints, one may take the elapsed time until a major investment decision is made by a firm as the dependent variable and the leverage ratio as the threshold variable. In biostatistics, Cox regression models with a covariate threshold are used to study the risk factors for breast cancer with a threshold in the effect of estrogen receptors (Jespersen, 1986) and the effect of tumour thickness on survival with melanoma (Andersen et al., 1993). In addition, our generalized threshold model (1.2) can be viewed as a flexible and parsimonious strategy for adding a non-parametric component to the baseline parametric threshold model (1.1).
Estimation of models similar to ours has been considered in a number of studies. Among them, Liang et al. (1990) and Luo et al. (1997) consider a Cox model (when the error term follows the extreme value distribution) with time-dependent covariates and a change point at an unknown time. Pons (2003) studies the maximum partial likelihood estimate for the Cox model with a change point at an unknown threshold of a covariate. One paper that is closely related to ours is Kosorok and Song (2007), who consider for model (1.2) non-parametric maximum likelihood (NML) estimation under the assumption that the error term is independent of the regressors. As will become evident soon, our estimation method for model (1.2) is fundamentally different from Kosorok and Song's. Specifically, we propose for the model a maximum integrated score estimator (MISE), which has the following two major advantages in comparison with the existing estimators.
$^2$ Notice that the $x$ in model (1.2) should not contain a constant term. According to Horowitz (1996), no intercept is identified for models subject to unknown transformation.
First, to the best of our knowledge, no previous study has considered estimation of (1.2) under a conditional quantile restriction. Most existing methods assume that the error term is independent of the regressors, thus imposing a pure location shift structure on the effect of the regressors on the transformed dependent variable. In contrast, aside from its robustness properties relative to mean regression, quantile regression allows for a wide range of conditional heteroscedasticity of unknown form, thus permitting the researcher to learn a much richer structure of heterogeneous impacts of the regressors on the transformed dependent variable. Second, our MISE is able to accommodate both fixed and random censoring in the dependent variable. Notice that Kosorok and Song's (2007) NML estimator can deal with random censoring where the censoring variable is independent of $y^*$ conditional on $(x, q)$. However, one slight restriction for us is that we require the censoring point to be simultaneously independent of $y^*$ and $(x, q)$.

Our MISE is motivated by the insight that a transformation model is equivalent to a sequence of binary choice models under monotonicity of the transformation function. That is, through a discretization technique, (1.2) can be represented as a sequence of binary choice models, and for each individual binary choice model we use Horowitz's (1992) smoothed maximum score method to obtain a consistent estimate of the finite-dimensional parameters. The resulting MISE combines these individual estimators through an integration process to achieve higher efficiency with a faster convergence rate. This MISE has been applied by Chen (2010) to a generalized quantile regression model with no threshold effect. In general, a threshold regression model is distinguished from other models in that the change point is unknown to the researcher. Once the change point is known or can be consistently estimated, the resulting model is essentially the same as the linear transformation model studied by Horowitz (1996) and Ye and Duan (1997). Therefore, in applying MISE to the current setting, our main theoretical concern is the large sample properties of the estimate of the threshold parameter, that is, its convergence rate and its limiting distribution. We show in this paper that MISE has a convergence rate of $n^{-1}$ for the threshold parameter, while it has the usual $n^{-1/2}$ rate for the remaining parameters. Moreover, the estimators for the two parts are asymptotically independent, that is, their limiting distributions are the same as they would be if the other part were known.
On a related note, we build our estimator on Lee and Seo (2008), who propose a maximum score estimator for a threshold binary response model. Our work is related to theirs since MISE involves estimating a sequence of threshold binary choice models as an intermediate step, but the difference between the two models is clear: neither is a special case of the other. Moreover, we employ a kernel smoothing technique in the spirit of Horowitz (1992) while they do not.
The rest of the paper is structured as follows. The estimator is motivated and described in Section 2. The large sample properties of MISE are established in Section 3. Section 4 discusses MISE-based inference. Section 5 reports some Monte Carlo results and Section 6 concludes. All the technical details are collected in the Appendix.
2. THE ESTIMATOR
First consider the case with fixed censoring, in which the cutoff points $a_1$ and $a_2$ in (1.3) are fixed constants that are known to the researcher. Define $z_d(\gamma) = z I(q > \gamma)$, $z_d = z_d(\gamma_0)$, $X = (x', z_d')'$, $\theta = (\beta', \delta')'$ and $X(\gamma) = (x', z_d(\gamma)')'$.
The MISE is motivated by the following observation: under monotonicity of the transformation function, a transformation model is equivalent to a sequence of binary choice models. That is, given $a \in (a_1, a_2)$ and letting $d_i(a) = I(y_i > a)$, we have the following binary choice model for each $a \in (a_1, a_2)$ due to the monotonicity of $H$:
$$d_i(a) = I(y_i > a) = I(y_i^* > a) = I(X_i'\theta_0 - H(a) > u_i), \quad i = 1, \ldots, n. \qquad (2.1)$$
Under the assumption that the $\tau$-th quantile of $u$ conditional on $(x, q)$ is zero, we have
$$E(d(a) \mid x, q) - \tau = F_{u|xq}(X'\theta_0 - H(a) \mid x, q) - \tau > 0 \iff X'\theta_0 - H(a) > 0, \qquad (2.2)$$
where $F_{u|xq}$ is the distribution function of $u$ conditional on $(x, q)$. Based on the above observation, Manski's (1985) maximum score method applies and a consistent estimator for $(\theta_0, \gamma_0)$ can be obtained by maximizing
$$Q_n(a, \theta, \gamma) = \frac{1}{n} \sum_{i=1}^{n} (d_i(a) - \tau)\, I[X_i(\gamma)'\theta - H(a) > 0], \qquad (2.3)$$
with respect to $\theta$ and $\gamma$.
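The sample criterion (2.3) can be sketched in a few lines of code. The following Python sketch is illustrative only: the function names, the tuple layout of `data` and the argument `c` (a candidate value standing in for the unknown $H(a)$) are assumptions made here and are not part of the paper.

```python
def threshold_index(x, z, q, beta, delta, gamma):
    """Return x'beta + z'delta * I(q > gamma) for one observation."""
    index = sum(b * xi for b, xi in zip(beta, x))
    if q > gamma:
        index += sum(d * zi for d, zi in zip(delta, z))
    return index


def max_score_objective(data, a, beta, delta, gamma, c, tau):
    """Unsmoothed score at cutoff a, with c used in place of the unknown H(a).

    data is a list of (y, x, z, q) tuples (hypothetical layout); x and z are
    lists of floats, y and q are floats.
    """
    total = 0.0
    for y, x, z, q in data:
        d = 1.0 if y > a else 0.0                      # d_i(a) = I(y_i > a)
        if threshold_index(x, z, q, beta, delta, gamma) - c > 0:
            total += d - tau                           # (d_i(a) - tau) * indicator
    return total / len(data)
```

Each observation whose index exceeds `c` contributes $d_i(a) - \tau$, so the criterion rewards parameter values that place observations with $y_i > a$ on the positive side of the index.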
For the threshold binary choice model (2.1) generated with any given $a \in (a_1, a_2)$, we know from Lee and Seo (2008) that maximizing the criterion function (2.3) may already produce a consistent estimator for the parameters in the model. However, the key point here is that by fixing $a$, the objective function (2.3) makes use of only a small fraction of the information contained in the original dataset, resulting in a substantial efficiency loss. To remedy this drawback, one may integrate $Q_n(a, \theta, \gamma)$ over the interval $(a_1, a_2)$ and estimate $(\theta_0, \gamma_0)$ by maximizing the integrated criterion function
$$Q_n(\theta, \gamma) = \int Q_n(a, \theta, \gamma)\, \omega(a)\, da, \qquad (2.4)$$
with a non-negative weighting function $\omega(\cdot)$. For example, letting
$$\omega(a) = \frac{1}{a_2 - a_1} I(a_1 \le a \le a_2)$$
amounts to imposing a uniform weighting scheme on the integrated criterion function. However, two issues remain to be addressed before the MISE defined above can be put into practice.
First, it is well known that Manski's (1985) maximum score objective function is difficult to analyse since it involves a discontinuous step function and is therefore not amenable to analysis through the usual Taylor series approximation. Thus, instead of analysing (2.4), one may employ a smoothed maximum score function in the spirit of Horowitz (1992) and consider the following objective function:
$$\int \frac{1}{n} \sum_{i=1}^{n} (d_i(a) - \tau)\, K\!\left( \frac{X_i(\gamma)'\theta - H(a)}{h_n} \right) \omega(a)\, da,$$
where $K(v) = \int_{-\infty}^{v} k(s)\, ds$ is an integrated kernel function and $h_n$ is a sequence of smoothing parameters to be detailed later. Second, the MISE defined above is infeasible since it involves the unknown function $H$. To circumvent this difficulty, we can replace $H(a)$ with $H_n(a, \theta, \gamma) = \arg\max_c Q_n(a, \theta, \gamma, c)$, where
$$Q_n(a, \theta, \gamma, c) = \frac{1}{n} \sum_{i=1}^{n} (d_i(a) - \tau)\, K\!\left( \frac{X_i(\gamma)'\theta - c}{h_n} \right). \qquad (2.5)$$
The rationale behind this replacement is that $H(a)$ is implicitly determined by (2.1) for each $a$.
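Based on these discussions, our MISE is defined, by a slight abuse of notation, as a solution to the following maximization problem:
$$(\hat\theta_n, \hat\gamma_n) = \arg\max_{\theta \in \Theta,\ \gamma \in \Gamma} Q_n(\theta, \gamma), \qquad (2.6)$$
where $\Theta$ and $\Gamma$ are compact sets that contain the true parameters in their interiors, and
$$Q_n(\theta, \gamma) = \int \frac{1}{n} \sum_{i=1}^{n} (d_i(a) - \tau)\, K\!\left( \frac{X_i(\gamma)'\theta - H_n(a, \theta, \gamma)}{h_n} \right) \omega(a)\, da,$$
with $H_n(a, \theta, \gamma)$ defined in (2.5).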
Notice that the original criterion function (e.g. (2.3)) contains two sources of discontinuity: one comes from the score function and the other from the threshold indicator $I(q > \gamma)$. A subtle question about our estimation strategy is why we choose to smooth the criterion function only partially, that is, leaving the threshold indicator unsmoothed instead of smoothing both or leaving both unsmoothed. There are several reasons for this choice. First, like many other semi-parametric estimation procedures, profiling out the infinite-dimensional nuisance parameter $H$ in our model to recover the finite-dimensional parameters of central interest essentially involves non-parametric estimation based on some local smoothing technique, suggesting that smoothing is necessary for our method. Second, as will become evident later, maximizing a partially smoothed criterion function has no negative effect on the convergence rate or the limiting distribution of the resulting estimator. By contrast, double smoothing would force us to deal with two kernel functions and two sequences of smoothing parameters, which complicates matters considerably. Even worse, it seems much more difficult to derive the limiting distribution under double smoothing. Third, even with partial smoothing, our MISE is shown to have a convergence rate of $n^{-1}$, which is likely the fastest convergence rate it can achieve. On the other hand, smoothing does not come for free. The more smoothing is used, the more stringent the smoothness conditions we must impose on the data generating process, even though the additional smoothing cannot bring a faster convergence rate.
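In practice, MISE can be computed in two steps. In the first stage, for fixed $\gamma$, obtain $\hat\theta_n(\gamma) = \arg\max_{\theta \in \Theta} Q_n(\theta, \gamma)$ with some scale normalization restriction on the coefficients. In the second stage, let $Q_n(\gamma) = Q_n(\hat\theta_n(\gamma), \gamma)$ and search over a grid of $\gamma$ for the $\hat\gamma_n$ that maximizes $Q_n(\gamma)$. Then the MISE for $\theta$ can be defined as $\hat\theta_n = \hat\theta_n(\hat\gamma_n)$.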
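The two-step computation can be sketched as a nested search. This is a minimal sketch, assuming that the inner maximization is also done over a finite candidate set (any numerical optimizer could be substituted) and that ties in the outer step are resolved in favour of the smallest threshold value; `criterion`, `theta_grid` and `gamma_grid` are hypothetical names introduced here.

```python
def two_step_search(criterion, theta_grid, gamma_grid):
    """Profile the coefficients out for each threshold value, then search the thresholds.

    criterion  : callable (theta, gamma) -> float, e.g. an integrated smoothed score
    theta_grid : iterable of candidate coefficient values (already scale-normalized)
    gamma_grid : iterable of candidate threshold values
    """
    theta_grid = list(theta_grid)
    best = None
    for gamma in sorted(gamma_grid):
        # First stage: best coefficients for this fixed threshold value.
        theta_hat = max(theta_grid, key=lambda th: criterion(th, gamma))
        value = criterion(theta_hat, gamma)
        # Second stage: keep the threshold with the largest profiled value;
        # the strict inequality keeps the smallest maximizer when values tie.
        if best is None or value > best[0]:
            best = (value, theta_hat, gamma)
    return best[1], best[2]
```

A natural choice for `gamma_grid` is the set of observed values of the threshold variable, because the criterion is a step function of the threshold and only changes at those points.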
The MISE proposed above can be readily extended to the case of random censoring. Without loss of generality, let us assume that the observations consist of a random sample of $(y, t, x, q)$, where
$$y = \min(y^*, C), \quad t = I(y^* \le C), \qquad (2.7)$$
the latent dependent variable $y^*$ is generated by (1.2) and the random variable $C$ is assumed to be independent of $(y^*, x, q)$. Typically, we assume that $y^* \in [0, +\infty)$ and $C \in [0, +\infty)$. With random censoring, reasoning similar to (2.2) gives that for each $a \in (0, +\infty)$
$$E(d(a) \mid x, q) = E(I(y > a) \mid x, q) = E(I(\min(y^*, C) > a) \mid x, q) = G(a)\, E(I(y^* > a) \mid x, q), \qquad (2.8)$$
where $G(\cdot) = 1 - F_C(\cdot)$ and $F_C(\cdot)$ is the distribution function of $C$. The observation (2.8), together with the preceding procedure defined for the fixed censoring case, suggests that the MISE for $(\theta, \gamma)$ with random censoring be defined as
$$(\hat\theta_{rn}, \hat\gamma_{rn}) = \arg\max_{\theta \in \Theta,\ \gamma \in \Gamma} Q_{rn}(\theta, \gamma), \qquad (2.9)$$
where
$$Q_{rn}(\theta, \gamma) = \int \frac{1}{n} \sum_{i=1}^{n} \left( \frac{d_i(a)}{G_n(a)} - \tau \right) K\!\left( \frac{X_i(\gamma)'\theta - H_{rn}(a, \theta, \gamma)}{h_n} \right) \omega(a)\, da, \qquad (2.10)$$
$G_n(\cdot)$ is the Kaplan–Meier estimator for the survival function $G(\cdot)$,
$$H_{rn}(a, \theta, \gamma) = \arg\max_c Q_{rn}(a, \theta, \gamma, c),$$
with
$$Q_{rn}(a, \theta, \gamma, c) = \frac{1}{n} \sum_{i=1}^{n} \left( \frac{d_i(a)}{G_n(a)} - \tau \right) K\!\left( \frac{X_i(\gamma)'\theta - c}{h_n} \right). \qquad (2.11)$$
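The censoring adjustment replaces $d_i(a)$ by $d_i(a)/G_n(a)$, so the only new ingredient is a Kaplan–Meier estimate of the survival function of the censoring variable. The sketch below is a plain product-limit implementation under the assumption of no ties between censoring and failure times; observations with $t_i = 0$ are treated as the "events" because it is the distribution of $C$ that is being estimated. Function names are hypothetical.

```python
def km_censoring_survival(y, t):
    """Return a function a -> G_n(a), a Kaplan-Meier estimate of Pr(C > a).

    y : observed outcomes min(y*, C)
    t : indicators, 1 if the latent outcome is observed, 0 if censored
    """
    censor_times = sorted({yi for yi, ti in zip(y, t) if ti == 0})
    steps = []
    surv = 1.0
    for s in censor_times:
        at_risk = sum(1 for yi in y if yi >= s)
        events = sum(1 for yi, ti in zip(y, t) if yi == s and ti == 0)
        surv *= 1.0 - events / at_risk
        steps.append((s, surv))

    def G(a):
        value = 1.0
        for s, g in steps:
            if s <= a:
                value = g      # last step at or before a
            else:
                break
        return value

    return G


def censoring_adjusted_term(y_i, a, G, tau):
    """Return d_i(a) / G_n(a) - tau, the weight used in the censored criterion."""
    g = G(a)
    if g <= 0.0:
        raise ValueError("G_n(a) is zero; choose a smaller cutoff a")
    d = 1.0 if y_i > a else 0.0
    return d / g - tau
```

In practice the weighting function should put no mass on cutoffs where the estimated survival function is zero or close to zero, since the adjusted term is unstable there.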
3. LARGE SAMPLE PROPERTIES
This section is organized as follows. Section 3.1 introduces the regularity conditions needed for the asymptotic analysis of MISE with fixed censoring and Section 3.2 presents the large sample properties of this estimator. The extension to the case of random censoring is summarized in Section 3.3.
3.1. Assumptions
It is well known (e.g. Horowitz, 1996) that for models subject to unknown transformation, $\beta_0$ can be identified only up to scale and $x$ should contain at least one component whose probability distribution conditional on the remaining components is absolutely continuous with respect to Lebesgue measure. Thus we arrange the components of $x = (x_1, x_2')'$ so that its first component, $x_1$, satisfies this condition. For scale normalization, let $\beta_0 = (\beta_{10}, \beta_{20}')'$, with $|\beta_{10}| = 1$ and $\beta_{20}$ being a $(p_x - 1)$-dimensional coefficient vector for $x_2$. To analyse the MISE formally, let us introduce the following assumptions:
ASSUMPTION 3.1. (a) $\{y_i, x_i, q_i\}$, $i = 1, \ldots, n$, is a random sample of $(y, x, q)$ generated by (1.2) and (1.3); (b) $H(\cdot)$ is a strictly increasing function.
ASSUMPTION 3.2. (a) The support of $x$ is not contained in any proper linear subspace of $R^{p_x}$ under the distribution of $x$ conditional on $q$ for almost every $q$; (b) the distribution of $x_1$ conditional on $x_2$ and $q$ has a positive density almost everywhere with respect to Lebesgue measure; (c) $q$ is continuously distributed with support containing $\gamma_0$.
ASSUMPTION 3.3. (a) $|\beta_{10}| = 1$; (b) $\delta_0 \ne 0$; (c) $\Pr(z'\delta_0 \ne 0 \mid q = \gamma_0) > 0$; (d) $\Gamma$ is a compact set on the real line that contains $\gamma_0$ in its interior; $\Theta = \{-1, 1\} \times \Theta_2$ is a compact set of $R^{p_x + p_z}$ that contains $\theta_0 = (\beta_0', \delta_0')'$ in its interior. Moreover, $\mathcal{G}$ is a compact set on the real line with $H(y)$ as an interior point for any $y$ with $\omega(y) > 0$.
ASSUMPTION 3.4. The distribution of $u$ conditional on $(x, q)$ is absolutely continuous with respect to Lebesgue measure and the corresponding conditional density is continuous and positive almost everywhere. In addition, the $\tau$-th quantile of $u$ conditional on $(x, q)$ is zero.
Assumption 3.1 describes the data generation process. Assumptions 3.2(a) and (b) are standard for identification of $\beta_0$ in a transformation model (e.g. Horowitz, 1996), while Assumption 3.2(c) is made for identification of $\gamma_0$ in a threshold regression model. Assumption 3.3(a) is a normalization restriction. Assumptions 3.3(b) and (c) are often maintained for threshold models to exclude some irregular cases. If $\delta_0 = 0$, then $\gamma_0$ is not identified. Assumption 3.3(c) says that the regression is essentially discontinuous, which contrasts with Hansen's (2000) small threshold effect context. If the threshold effect vanishes asymptotically, then we would have a slower rate of convergence and a different limiting distribution for the estimate of the threshold parameter. For example, Hansen (2000) assumes that $\delta_0 = O(n^{-\alpha})$ with $\alpha > 0$, thus $\delta_0 \to 0$ as $n \to \infty$. Consequently, the estimator for $\gamma_0$ converges at a rate $n^{2\alpha - 1}$, slower than $n^{-1}$. Also see Seo and Linton (2007, section 5) for a related discussion. Assumption 3.3(d) is a regularity condition made for most non-linear models. Assumption 3.4 imposes the conditional quantile restriction on the error term. It allows for an arbitrary form of dependence between $u$ and $(x, q)$ as long as the conditional quantile independence is satisfied. In addition to Assumptions 3.1–3.4, the kernel smoothing technique involved in our estimator calls for additional assumptions regarding the choice of the kernel function and the bandwidth sequence, as well as stronger smoothness of the data generating process.
ASSUMPTION 3.5. $f_{u|xq}(u \mid x, q)$, the probability density function of $u$ conditional on $x$ and $q$, is $\lambda$ times continuously differentiable with respect to $u$ and $x_1$, and $f_{x_1|x_2 q}(x_1 \mid x_2, q)$, the probability density function of $x_1$ conditional on $(x_2, q)$, is $\lambda$ times continuously differentiable with respect to $x_1$, with all the derivatives uniformly bounded. $f_{u|xq}(0 \mid x, q)$ is uniformly bounded away from zero for any $(x, q)$. The link function $H(\cdot)$ is $\lambda$ times continuously differentiable. In addition, the weighting function $\omega(\cdot)$ is non-negative and uniformly bounded with a bounded support.
ASSUMPTION 3.6. (a) The kernel function $k(\cdot)$ is continuously differentiable, satisfying $\int k(v)\, dv = 1$ and $\int v^r k(v)\, dv = 0$ for $0 < r < \lambda$; (b) $h_n^{\lambda} = o_p(n^{-1/2})$ and $\eta_{1n}\eta_{2n} = o_p(n^{-1/2})$, where $\eta_{1n} = h_n^{\lambda} + (\ln n / (n h_n))^{1/2}$ and $\eta_{2n} = h_n^{\lambda} + (\ln n / (n h_n^3))^{1/2}$.
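Denote
$$Q(\theta, \gamma) = \int E[(I(y > a) - \tau)\, I(X(\gamma)'\theta - H(a, \theta, \gamma) > 0)]\, \omega(a)\, da, \qquad (3.1)$$
where $H(a, \theta, \gamma) = \arg\max_{c \in \mathcal{G}} Q(a, \theta, \gamma, c)$ with
$$Q(a, \theta, \gamma, c) = E[(I(y > a) - \tau)\, I(X(\gamma)'\theta - c > 0)]. \qquad (3.2)$$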
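More notation is introduced below. If $z$ contains $x_1$, let $z_2$ denote the remaining components of $z$ other than $x_1$. Then partition $\delta$ into $(\delta_1, \delta_2')'$ so that $\delta_1$ is the coefficient on $z_{d1}(\gamma) = x_1 I(q > \gamma)$ and $\delta_2$ is the remaining $(p_z - 1)$-dimensional vector of coefficients. Otherwise, $z_2 \equiv z$, $\delta_2 = \delta$ and $\delta_1 \equiv 0$. Let $\beta_{10}(q) = 1 + \delta_{10} I(q > \gamma_0)$, $\theta_2 = (\beta_2', \delta')'$, $\theta_{20} = (\beta_{20}', \delta_0')'$ and $X_2 = (x_2', z_{d2}')'$.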
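Define the Hessian-type matrix
$$\Sigma = \left. \frac{\partial^2 Q(\theta, \gamma)}{\partial\theta_2\, \partial\theta_2'} \right|_{\theta = \theta_0,\ \gamma = \gamma_0} = \int \left[ \Sigma_{xx}(a) - \Sigma_{oo}(a)\, \frac{\partial H(a, \theta_0, \gamma_0)}{\partial\theta_2}\, \frac{\partial H(a, \theta_0, \gamma_0)}{\partial\theta_2'} \right] \omega(a)\, da, \qquad (3.3)$$
where
$$\Sigma_{xx}(a) = E\left\{ \frac{1}{|\beta_{10}(q)|}\, f_{u x_1 | x_2 q}\!\left( 0, \frac{H(a) - X_2'\theta_{20}}{\beta_{10}(q)} \,\Big|\, x_2, q \right) X_2 X_2' \right\}, \qquad (3.4)$$
$$\Sigma_{oo}(a) = E\left\{ \frac{1}{|\beta_{10}(q)|}\, f_{u x_1 | x_2 q}\!\left( 0, \frac{H(a) - X_2'\theta_{20}}{\beta_{10}(q)} \,\Big|\, x_2, q \right) \right\}. \qquad (3.5)$$
ASSUMPTION 3.7. The matrix $\Sigma$ is positive definite.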
Assumption 3.5 contains a set of smoothness conditions parallel to Chen's (2010) Assumption 5. It imposes boundedness and smoothness on the probability densities of the two continuous random components, namely, $u$ and $x_1$ conditional on the other components. Assumption 3.6 places restrictions on the kernel function $K(\cdot)$ and the associated sequence of bandwidths, paralleling Chen's (2010) Assumptions 6 and 7. In particular, it requires $n h_n^4 / \ln^2 n \to \infty$ and $n h_n^{2\lambda} \to 0$, which further requires that $\lambda > 2$. Assumption 3.7 is analogous to the condition that the information matrix of a maximum likelihood estimator is negative definite.
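The bandwidth restriction can be made concrete with a polynomial bandwidth. The following worked derivation is a sketch under the assumption $h_n = c\, n^{-\kappa}$ with constants $c > 0$ and $\kappa > 0$ (both introduced here for illustration and not taken from the paper), writing $\lambda$ for the smoothness and kernel order appearing in the two rate conditions on $h_n$.

```latex
\begin{align*}
\frac{n h_n^{4}}{\ln^{2} n}
  &= \frac{c^{4}\, n^{1 - 4\kappa}}{\ln^{2} n} \to \infty
  &&\Longleftrightarrow\quad \kappa < \tfrac{1}{4}, \\
n h_n^{2\lambda}
  &= c^{2\lambda}\, n^{1 - 2\lambda\kappa} \to 0
  &&\Longleftrightarrow\quad \kappa > \tfrac{1}{2\lambda}, \\
\tfrac{1}{2\lambda} < \kappa < \tfrac{1}{4}
  &\ \text{is a non-empty range}
  &&\Longleftrightarrow\quad \lambda > 2 .
\end{align*}
```

For instance, with $\lambda = 4$ any exponent $\kappa$ strictly between $1/8$ and $1/4$ satisfies both rate conditions, whereas with $\lambda = 2$ no polynomial bandwidth does.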
3.2. Limiting distributions of MISE
We begin this part with the consistency result.
THEOREM 3.1. Under Assumptions 3.1–3.4, $(\hat\theta_n, \hat\gamma_n)$ as defined by (2.6) is consistent, with $\hat\beta_{1n} = \beta_{10}$ with probability approaching one as $n \to \infty$.
In parallel with the asymptotic results obtained in the recent literature on generalized threshold regression models (Pons, 2003, Kosorok and Song, 2007), the MISE for the threshold parameter is shown to have an irregular non-normal limiting distribution, while the estimator for the remaining parameters is asymptotically normal. To formally describe the asymptotic behaviour, let us define the following right-continuous jump processes. Let $f_q(\cdot)$ denote the probability density function of $q$. Let $\nu^+$ and $\nu^-$ denote two independent jump processes on $R$ such that $\nu^+(s)$ is a Poisson random variable with parameter $s f_q(\gamma_0)$ for $s > 0$ and $\nu^+(s) = 0$ for $s \le 0$; $\nu^-(s)$ is a Poisson random variable with parameter $-s f_q(\gamma_0)$ for $s < 0$ and $\nu^-(s) = 0$ for $s \ge 0$. In addition, let $\{\rho_l^+ : l = 0, 1, 2, \ldots\}$ and $\{\rho_l^- : l = 0, 1, 2, \ldots\}$ be two independent sequences of i.i.d. random variables with characteristic functions
$$\phi^+(t) = E\left\{ \exp\left( \sqrt{-1}\, t \int (d(a) - \tau) [I(x'\beta_0 > H(a)) - I(x'\beta_0 + z'\delta_0 > H(a))]\, \omega(a)\, da \right) \Big|\, q = \gamma_0^+ \right\}$$
and
$$\phi^-(t) = E\left\{ \exp\left( \sqrt{-1}\, t \int (d(a) - \tau) [I(x'\beta_0 + z'\delta_0 > H(a)) - I(x'\beta_0 > H(a))]\, \omega(a)\, da \right) \Big|\, q = \gamma_0^- \right\},$$
respectively, with $E(\cdot \mid q = \gamma_0^+) = \lim_{v \to 0^+} E(\cdot \mid q = \gamma_0 + v)$, $E(\cdot \mid q = \gamma_0^-) = \lim_{v \to 0^+} E(\cdot \mid q = \gamma_0 - v)$ and $\rho_0^+ = \rho_0^- = 0$. Denote $Q(s) = Q^+(s) I(s > 0) + Q^-(s) I(s < 0)$ with
$$Q^+(s) = \sum_{0 \le l \le \nu^+(s)} \rho_l^+ \quad \text{and} \quad Q^-(s) = \sum_{0 \le l \le \nu^-(s)} \rho_l^-.$$
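The following theorem establishes the limiting distribution of MISE with fixed censoring:
THEOREM 3.2. Under Assumptions 3.1–3.7, we have as $n \to \infty$,
$$n(\hat\gamma_n - \gamma_0) \xrightarrow{d} \inf\{ s : s \in \arg\max_{v} Q(v) \}$$
and
$$\sqrt{n}\,(\hat\theta_{2n} - \theta_{20}) \xrightarrow{d} N(0, \Sigma^{-1} V \Sigma^{-1}),$$
with $V = E\psi_i\psi_i'$,
$$\psi_i = \left[ X_{2i} - \frac{\Sigma_{xo}(H^{-1}(X_i'\theta_0))}{\Sigma_{oo}(H^{-1}(X_i'\theta_0))} \right] (I(u_i < 0) - \tau)\, \frac{\omega(H^{-1}(X_i'\theta_0))}{H'(H^{-1}(X_i'\theta_0))},$$
where $H'$ is the first derivative of $H$, $\Sigma_{oo}(a)$ is defined by (3.5) and
$$\Sigma_{xo}(a) = E\left\{ \frac{1}{|\beta_{10}(q)|}\, f_{u x_1 | x_2 q}\!\left( 0, \frac{H(a) - X_2'\theta_{20}}{\beta_{10}(q)} \,\Big|\, x_2, q \right) X_2 \right\}.$$
In addition, $\hat\gamma_n$ and $\hat\theta_{2n}$ are asymptotically independent.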
Statistical inference based on MISE will be discussed in Section 4. From Theorem 3.2, we know that the MISE $\hat\gamma_n$ converges faster than the estimator of the remaining parameters and has a non-standard limiting distribution that depends on nuisance parameters in a complex manner. Such asymptotic behaviour of $\hat\gamma_n$ is quite similar to the results obtained in the recent literature on generalized threshold models, for example, Pons (2003) and Kosorok and Song (2007). The implication of asymptotic independence between $\hat\gamma_n$ and $\hat\theta_{2n}$ is twofold. First, the limiting distribution of $\hat\theta_{2n}$ can be derived as if $\gamma_0$ were known, which has already been exploited in our proof. Second, asymptotic independence indicates that once a consistent estimator for $\gamma_0$ is available, inference based on $\hat\theta_{2n}$ is essentially identical to that for a transformation model with no threshold effect.
3.3. Random censoring case
Large sample properties of MISE under random censoring can be established similarly to those for the fixed censoring case. To present them formally, let us make an additional assumption.
ASSUMPTION 3.8. $(y_i, t_i, x_i, q_i)$, $i = 1, \ldots, n$, is a random sample of $(y, t, x, q)$ generated by (1.2) and (2.7). $H(\cdot)$ is a strictly increasing function. The censoring variable $C$ is independent of $(y^*, x, q)$ and continuously distributed with a positive density on an interval containing every $y$ with $\omega(y) > 0$.
Denote $Q_r(s) = Q_r^+(s) I(s > 0) + Q_r^-(s) I(s < 0)$, where
$$Q_r^+(s) = \sum_{0 \le l \le \nu^+(s)} \rho_{rl}^+ \quad \text{and} \quad Q_r^-(s) = \sum_{0 \le l \le \nu^-(s)} \rho_{rl}^-,$$
and $\{\rho_{rl}^+ : l = 0, 1, 2, \ldots\}$ and $\{\rho_{rl}^- : l = 0, 1, 2, \ldots\}$ are two independent sequences of i.i.d. random variables with characteristic functions
$$\phi_r^+(t) = E\left\{ \exp\left( \sqrt{-1}\, t \int \left( \frac{d(a)}{G(a)} - \tau \right) [I(x'\beta_0 > H(a)) - I(x'\beta_0 + z'\delta_0 > H(a))]\, \omega(a)\, da \right) \Big|\, q = \gamma_0^+ \right\}$$
and
$$\phi_r^-(t) = E\left\{ \exp\left( \sqrt{-1}\, t \int \left( \frac{d(a)}{G(a)} - \tau \right) [I(x'\beta_0 + z'\delta_0 > H(a)) - I(x'\beta_0 > H(a))]\, \omega(a)\, da \right) \Big|\, q = \gamma_0^- \right\},$$
respectively.
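THEOREM 3.3. Under Assumptions 3.2–3.8, $(\hat\theta_{rn}, \hat\gamma_{rn})$ as defined by (2.9) is consistent, with $\hat\beta_{1,rn} = \beta_{10}$ with probability approaching one as $n \to \infty$. As $n \to \infty$, we have
$$n(\hat\gamma_{rn} - \gamma_0) \xrightarrow{d} \inf\{ s : s \in \arg\max_{v} Q_r(v) \}$$
and
$$\sqrt{n}\,(\hat\theta_{2,rn} - \theta_{20}) \xrightarrow{d} N(0, \Sigma^{-1} V_r \Sigma^{-1}),$$
where $V_r = E\psi_{ri}\psi_{ri}'$, $\psi_{ri} = \psi_{1,ri} + \psi_{2,ri}$,
$$\psi_{1,ri} = \left[ X_{2i} - \frac{\Sigma_{xo}(H^{-1}(X_i'\theta_0))}{\Sigma_{oo}(H^{-1}(X_i'\theta_0))} \right] \left[ \frac{I(H(y_i) > X_i'\theta_0)}{G(H^{-1}(X_i'\theta_0))} - \tau \right] \frac{\omega(H^{-1}(X_i'\theta_0))}{H'(H^{-1}(X_i'\theta_0))},$$
and
$$\psi_{2,ri} = \int \frac{\Xi_{rx}(v)}{\pi(v)}\, dM_i(v),$$
with $\pi(v) = \Pr(y \ge v)$, $M_i(v) = I(y_i \le v, t_i = 0) - \int_0^v I(y_i \ge s)\, d\Lambda(s)$, $\Lambda(\cdot)$ the cumulative hazard function of $C$,
$$\Sigma_{rx}(a) = E\left\{ \frac{1}{|\beta_{10}|} \left[ X_2 - \frac{\Sigma_{xo}(a)}{\Sigma_{oo}(a)} \right] f_{x_1 | x_2 q}\!\left( \frac{H(a) - X_2'\theta_{20}}{\beta_{10}} \,\Big|\, x_2, q \right) \right\},$$
$$\Xi_{rx}(v) = \int \frac{\Sigma_{rx}(a)}{G(a)}\, I(a \ge v)\, \omega(a)\, da.$$
In addition, $\hat\gamma_{rn}$ and $\hat\theta_{2,rn}$ are asymptotically independent.
Statistical inference based on $(\hat\theta_{2,rn}, \hat\gamma_{rn})$ will be discussed in Section 4. In comparison with the fixed censoring case, there is an extra term $\psi_{2,ri}$ in the asymptotic linear representation of the estimator, due to the presence of the Kaplan–Meier estimator $G_n(\cdot)$ in the criterion function (2.10).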
4. INFERENCE
4.1. Inference on $\gamma$
According to Theorems 3.2 and 3.3, the asymptotic distribution of $\hat\gamma_n$ is highly non-standard and cannot be tabulated, as it depends on nuisance parameters in a complex manner. Similar issues arise in, for example, Lee and Seo (2008) and Kosorok and Song (2007). Even so, two methods can be used to carry out large sample inference. First, subsampling provides a consistent inferential method for the asymptotic distribution of $\hat\gamma_n$, as in Gonzalo and Wolf (2005). Confidence intervals can be constructed following the standard subsampling procedure; see, for example, Politis et al. (1999). The second inferential approach seems more practical but is somewhat informal. The basic motivation underlying this approach is that the irregular limiting distribution of $\hat\gamma_n$ can be largely attributed to the discontinuity of the threshold indicator $I(q > \gamma_0)$. To restore regularity, we can replace $I(q > \gamma)$ with $K_2((q - \gamma)/h_{2n})$, where $K_2$ is another integrated kernel function and $h_{2n}$ is another sequence of bandwidths. This idea is not new. It has been used by Seo and Linton (2007) to develop a smoothed LS estimator for model (1.1) within the mean regression framework. Although this method enables standard normal inference on $\hat\gamma_n$, it might cause the estimator to have a slower convergence rate.
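The subsampling route can be sketched generically. The code below is a textbook-style subsampling interval for a scalar estimator that converges at rate $n$, which is the rate established for the threshold estimate; it is not claimed to reproduce the specific algorithm of the cited references. The callable `estimator`, the subsample size `b`, the number of draws and the quantile rule are all assumptions made for illustration.

```python
import math
import random


def subsampling_interval(data, estimator, b, n_draws=500, level=0.95, seed=0):
    """Equal-tailed subsampling interval for an estimator converging at rate n.

    data      : list of observations
    estimator : callable mapping a list of observations to a scalar estimate
    b         : subsample size, assumed much smaller than len(data)
    """
    n = len(data)
    rng = random.Random(seed)
    full = estimator(data)
    roots = []
    for _ in range(n_draws):
        sub = rng.sample(data, b)                  # subsample without replacement
        roots.append(b * (estimator(sub) - full))  # rate-b analogue of n * (est - truth)
    roots.sort()
    alpha = 1.0 - level
    lo_q = roots[int(math.floor((alpha / 2.0) * (n_draws - 1)))]
    hi_q = roots[int(math.ceil((1.0 - alpha / 2.0) * (n_draws - 1)))]
    # Invert the root n * (estimate - truth) using the subsample quantiles.
    return full - hi_q / n, full - lo_q / n
```

The choice of `b` matters in practice: it must grow with the sample size but remain a vanishing fraction of it, and results are often reported for several values of `b` as a sensitivity check.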
4.2. Inference on regular parameters
In comparison with $\hat\gamma_n$, inference on $\hat\theta_n$ is fairly standard given its $\sqrt{n}$-consistency and asymptotic normality. Moreover, since $\hat\theta_n$ and $\hat\gamma_n$ are asymptotically independent, $\hat\theta_n$ has the same limiting distribution as it would have if $\gamma_0$ were known, suggesting that the inferential procedure for $\hat\theta_n$ is essentially identical to that for a transformation model with a change point known to the researcher.
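To be specific, first consider the case with fixed censoring. It follows from the proof of Theorem 3.2 that $\Sigma$ can be consistently estimated by
$$\hat\Sigma_n = \frac{\partial^2 Q_n(\hat\theta_n, \hat\gamma_n)}{\partial\theta_2\, \partial\theta_2'} = \frac{1}{n h_n^2} \int \sum_{i=1}^{n} (d_i(a) - \tau)\, k'\!\left( \frac{X_i(\hat\gamma_n)'\hat\theta_n - H_n(a, \hat\theta_n, \hat\gamma_n)}{h_n} \right) \left[ X_{2i}(\hat\gamma_n) X_{2i}(\hat\gamma_n)' - \frac{\partial H_n(a, \hat\theta_n, \hat\gamma_n)}{\partial\theta_2}\, \frac{\partial H_n(a, \hat\theta_n, \hat\gamma_n)}{\partial\theta_2'} \right] \omega(a)\, da,$$
where $X_2(\gamma) = (x_2', z_{d2}(\gamma)')'$, $k'$ denotes the first derivative of $k$, $H_n(a, \hat\theta_n, \hat\gamma_n)$ is determined by
$$\frac{1}{n h_n} \sum_{i=1}^{n} (d_i(a) - \tau)\, k\!\left( \frac{X_i(\hat\gamma_n)'\hat\theta_n - H_n(a, \hat\theta_n, \hat\gamma_n)}{h_n} \right) = 0,$$
and
$$\frac{\partial H_n(a, \theta, \gamma)}{\partial\theta_2} = \left[ \frac{1}{n h_n^2} \sum_{i=1}^{n} (d_i(a) - \tau)\, k'\!\left( \frac{X_i(\gamma)'\theta - H_n(a, \theta, \gamma)}{h_n} \right) \right]^{-1} \left[ \frac{1}{n h_n^2} \sum_{i=1}^{n} (d_i(a) - \tau)\, k'\!\left( \frac{X_i(\gamma)'\theta - H_n(a, \theta, \gamma)}{h_n} \right) X_{2i}(\gamma) \right].$$
For the estimation of $V$, define
$$\hat\psi_i = \int (d_i(a) - \tau)\, \frac{1}{h_n}\, k\!\left( \frac{X_i(\hat\gamma_n)'\hat\theta_n - H_n(a, \hat\theta_n, \hat\gamma_n)}{h_n} \right) \left[ X_{2i}(\hat\gamma_n) - \frac{\hat\Sigma_{xo}(a)}{\hat\Sigma_{oo}(a)} \right] \omega(a)\, da,$$
where
$$\hat\Sigma_{xo}(a) = \frac{1}{n h_n^2} \sum_{i=1}^{n} (d_i(a) - \tau)\, k'\!\left( \frac{X_i(\hat\gamma_n)'\hat\theta_n - H_n(a, \hat\theta_n, \hat\gamma_n)}{h_n} \right) X_{2i}(\hat\gamma_n),$$
$$\hat\Sigma_{oo}(a) = \frac{1}{n h_n^2} \sum_{i=1}^{n} (d_i(a) - \tau)\, k'\!\left( \frac{X_i(\hat\gamma_n)'\hat\theta_n - H_n(a, \hat\theta_n, \hat\gamma_n)}{h_n} \right).$$
Following the arguments in Powell et al. (1989), we can show that $n^{-1} \sum_{i=1}^{n} \|\hat\psi_i - \psi_i\|^2 = o_p(1)$ and consequently $\hat V_n = \frac{1}{n} \sum_{i=1}^{n} \hat\psi_i \hat\psi_i'$ will be a consistent estimator for $V$.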

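Next consider the case with random censoring. It follows from the proof of Theorem 3.3 that $\Sigma$ can be consistently estimated by
$$\hat\Sigma_{rn} = \frac{\partial^2 Q_{rn}(\hat\theta_{rn}, \hat\gamma_{rn})}{\partial\theta_2\, \partial\theta_2'} = \frac{1}{n h_n^2} \int \sum_{i=1}^{n} \left( \frac{d_i(a)}{G_n(a)} - \tau \right) k'\!\left( \frac{X_i(\hat\gamma_{rn})'\hat\theta_{rn} - H_{rn}(a, \hat\theta_{rn}, \hat\gamma_{rn})}{h_n} \right) \left[ X_{2i}(\hat\gamma_{rn}) X_{2i}(\hat\gamma_{rn})' - \frac{\partial H_{rn}(a, \hat\theta_{rn}, \hat\gamma_{rn})}{\partial\theta_2}\, \frac{\partial H_{rn}(a, \hat\theta_{rn}, \hat\gamma_{rn})}{\partial\theta_2'} \right] \omega(a)\, da,$$
where $k'$ denotes the first derivative of $k$, $H_{rn}(a, \hat\theta_{rn}, \hat\gamma_{rn})$ is determined by
$$\frac{1}{n h_n} \sum_{i=1}^{n} \left( \frac{d_i(a)}{G_n(a)} - \tau \right) k\!\left( \frac{X_i(\hat\gamma_{rn})'\hat\theta_{rn} - H_{rn}(a, \hat\theta_{rn}, \hat\gamma_{rn})}{h_n} \right) = 0,$$
and
$$\frac{\partial H_{rn}(a, \theta, \gamma)}{\partial\theta_2} = \left[ \frac{1}{n h_n^2} \sum_{i=1}^{n} \left( \frac{d_i(a)}{G_n(a)} - \tau \right) k'\!\left( \frac{X_i(\gamma)'\theta - H_{rn}(a, \theta, \gamma)}{h_n} \right) \right]^{-1} \left[ \frac{1}{n h_n^2} \sum_{i=1}^{n} \left( \frac{d_i(a)}{G_n(a)} - \tau \right) k'\!\left( \frac{X_i(\gamma)'\theta - H_{rn}(a, \theta, \gamma)}{h_n} \right) X_{2i}(\gamma) \right].$$
For the estimation of $V_r$, define $\hat\psi_{ri} = \hat\psi_{1,ri} + \hat\psi_{2,ri}$,
$$\hat\psi_{1,ri} = \int \left( \frac{d_i(a)}{G_n(a)} - \tau \right) \frac{1}{h_n}\, k\!\left( \frac{X_i(\hat\gamma_{rn})'\hat\theta_{rn} - H_{rn}(a, \hat\theta_{rn}, \hat\gamma_{rn})}{h_n} \right) \left[ X_{2i}(\hat\gamma_{rn}) - \frac{\hat\Sigma_{r,xo}(a)}{\hat\Sigma_{r,oo}(a)} \right] \omega(a)\, da,$$
$$\hat\psi_{2,ri} = \int \frac{\hat\Xi_{rx}(v)}{\hat\pi(v)}\, d\hat M_i(v),$$
where
$$\hat\Sigma_{r,xo}(a) = \frac{1}{n h_n^2} \sum_{i=1}^{n} \left( \frac{d_i(a)}{G_n(a)} - \tau \right) k'\!\left( \frac{X_i(\hat\gamma_{rn})'\hat\theta_{rn} - H_{rn}(a, \hat\theta_{rn}, \hat\gamma_{rn})}{h_n} \right) X_{2i}(\hat\gamma_{rn}),$$
$$\hat\Sigma_{r,oo}(a) = \frac{1}{n h_n^2} \sum_{i=1}^{n} \left( \frac{d_i(a)}{G_n(a)} - \tau \right) k'\!\left( \frac{X_i(\hat\gamma_{rn})'\hat\theta_{rn} - H_{rn}(a, \hat\theta_{rn}, \hat\gamma_{rn})}{h_n} \right),$$
$$\hat\Sigma_{rx}(a) = \frac{1}{n h_n} \sum_{i=1}^{n} d_i(a)\, k\!\left( \frac{X_i(\hat\gamma_{rn})'\hat\theta_{rn} - H_{rn}(a, \hat\theta_{rn}, \hat\gamma_{rn})}{h_n} \right) \left[ X_{2i}(\hat\gamma_{rn}) - \frac{\hat\Sigma_{r,xo}(a)}{\hat\Sigma_{r,oo}(a)} \right],$$
$$\hat\Xi_{rx}(v) = \int \frac{\hat\Sigma_{rx}(a)}{G_n(a)}\, I(a \ge v)\, \omega(a)\, da,$$
$\hat\pi(v) = n^{-1} \sum_{i=1}^{n} I(y_i \ge v)$, $\hat M_i(v) = I(y_i \le v, t_i = 0) - \int_0^v I(y_i \ge s)\, d\hat\Lambda_n(s)$, and $\hat\Lambda_n(\cdot)$ is the Nelson estimator for the cumulative hazard function of $C$. Following the arguments in Powell et al. (1989), we can show that $n^{-1} \sum_{i=1}^{n} \|\hat\psi_{ri} - \psi_{ri}\|^2 = o_p(1)$ and consequently $\hat V_{rn} = \frac{1}{n} \sum_{i=1}^{n} \hat\psi_{ri} \hat\psi_{ri}'$ will be a consistent estimator for $V_r$.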
4.3. Testing for threshold effect
To apply our method, it is important to determine whether the threshold effect is statistically
significant. We consider a test of no threshold effect against the presence of threshold effects.
The null and alternative hypotheses are
\[
H_0: \delta_0 = 0 \ \text{for any } \gamma_0 \quad \text{vs.} \quad H_1: \delta_0 \neq 0 \ \text{for some } \gamma_0 .
\]
Notice that all the unknown parameters in (1.2) are identifiable under the alternative
hypothesis, while the threshold parameter $\gamma_0$ is not identified under the null. This is the
so-called Davies problem (Davies, 1977). Recently, Lee et al. (2011) developed a general method
for testing threshold effects in various regression models, using sup-likelihood-ratio (sup-LR)
statistics. A key ingredient of their testing framework is that there exist an objective function
and a corresponding extremum estimator for the model with no threshold (under the null) and for
the model with a threshold effect (under the alternative). Their sup-LR statistic is then constructed
from the difference between the maximum values of the objective functions under the null
and the alternative hypotheses. Although the illustrative examples in Lee et al. (2011) do not include
our generalized threshold regression model, it is straightforward to see that their method applies
to our model since the MISE belongs to the class of extremum estimators, that is, it is obtained by
maximizing a criterion function.
Without loss of generality, we only detail the testing procedure for the fixed censoring case.
A test for the random censoring case can be constructed similarly. Based on the criterion function
$Q_n(\gamma, \theta)$ defined below (2.6), define
\[
Q_n(\gamma) = Q_n(\gamma, \hat\theta_n(\gamma)), \qquad \hat\theta_n(\gamma) = \arg\max_{\theta} Q_n(\gamma, \theta),
\]
and
\[
\tilde Q_n = \max_{\beta \in B,\ \delta = 0} Q_n(\gamma, \theta), \tag{4.1}
\]
where $B$ is a compact set containing $\beta_0$ in its interior. Notice that (4.1) is well defined since
$Q_n(\gamma, \theta)$ does not depend on $\gamma$ when $\delta = 0$. Then the sup-LR statistic is defined as
\[
QLR_n = \sup_{\gamma} \, n \bigl( Q_n(\gamma) - \tilde Q_n \bigr).
\]
The limiting distribution of $QLR_n$ under the null is highly non-standard and non-normal, and thus
cannot be tabulated. Instead, we can carry out the following steps to simulate the null distribution:
STEP 1. Generate i.i.d. random variables $v_{ij}$ for $i = 1, \ldots, n$ and $j = 1, \ldots, J$ from the
uniform distribution on [0, 1] for a sufficiently large $J$.
STEP 2. Simulate the following unrestricted and restricted score functions, respectively:
G
j
( ) =
1
nh
n
n
i=1
_
[I(X
i
( )
n
( ) H
n
(a, ,
n
( )) > v
ij
) ]
k
_
X
i
( )
n
( ) H
n
(a, ,
n
( ))
h
n
__
X
2i
( )

xo
(a, )

oo
(a, )
_
(a)da,
and
G
1,j
=
1
nh
n
n
i=1
_
[I(x
n
H
n
(a,
n
) > v
ij
) ]k
_
x
n
H
n
(a,
n
)
h
n
__
x
2i

xo
(a)

oo
(a)
_
(a)da,
where

n
( ) = arg max
n
(, ),

n
= arg max
=0,B
Q
n
(, ),
3
H
n
(a, ,
n
( )) is
determined by
1
nh
n
n
i=1
(d
i
(a) )k
_
X
i
( )
n
( ) H
n
(a, ,
n
( ))
h
n
_
= 0,
H
n
(a,
n
) is determined by
1
nh
n
n
i=1
(d
i
(a) )k
_
x
n
H
n
(a,
n
)
h
n
_
= 0,

xo
(a, ) =
1
nh
2
n
n
i=1
(d
i
(a) )k
_
X
i
( )
n
( ) H
n
(a, ,
n
( ))
h
n
_
X
2i
( ),

oo
(a, ) =
1
nh
2
n
n
i=1
(d
i
(a) )k
_
X
i
( )
n
( ) H
n
(a, ,
n
( ))
h
n
_
,
3 Notice that $Q_n(\gamma, \theta)$ depends only on $\beta$ when $\delta = 0$.

xo
(a) =
1
nh
2
n
n
i=1
(d
i
(a) )k
_
x
n
H
n
(a,
n
)
h
n
_
x
2i
and

oo
(a) =
1
nh
2
n
n
i=1
(d
i
(a) )k
_
x
n
H
n
(a,
n
)
h
n
_
.
STEP 3. Further define
( ) =
1
nh
2
n
_
n
i=1
(d
i
(a) )k
_
X
i
( )
n
( ) H
n
(a, ,
n
( ))
h
n
_
_
X
2i
( )X
2i
( )
H
n
(a, ,
n
( ))
2
H
n
(a, ,
n
( ))
2
_
(a)da,
and
1
=
1
nh
2
n
_
n
i=1
(d
i
(a) )k
_
x
n
H
n
(a,
n
)
h
n
__
x
2i
x
2i

H
n
(a,
n
)
2
H
n
(a,
n
)
2
_
(a)da,
where
H
n
(a, ,
n
( ))
2
=
_
1
nh
2
n
n
i=1
(d
i
(a) )k
_
X
i
( )
n
( ) H
n
(a, ,
n
( ))
h
n
_
_
1
_
1
nh
2
n
n
i=1
(d
i
(a) )k
_
X
i
( )
n
( ) H
n
(a, ,
n
( ))
h
n
_
X
2i
( )
_
and
H
n
(a,
n
)
2
=
_
1
nh
2
n
n
i=1
(d
i
(a) )k
_
x
n
H
n
(a,
n
)
h
n
_
_
1
_
1
nh
2
n
n
i=1
(d
i
(a) )k
_
x
n
H
n
(a,
n
)
h
n
_
x
2i
_
.
STEP 4. Finally, the simulated null distribution is readily derived from $\{D_j\}_{j=1,\ldots,J}$, with
D
j
= sup
1
2
[G
j
( )
1
( )G
j
( ) G
1,j
1
1
G
1,j
].
5. MONTE CARLO SIMULATION
In this section, we conduct a small-scale Monte Carlo experiment to evaluate the finite sample
performance of the suggested MISE for a generalized threshold regression model. A larger Monte
Carlo study relating to a wider set of experiments than those described below is left for future
research.

Table 1. Simulation results for fixed censoring.
                       λ = 0.6                       λ = 2
              τ = 0.2   τ = 0.4   τ = 0.6   τ = 0.2   τ = 0.4   τ = 0.6
n = 200
  γ   Mean    0.4782    0.4892    0.4825    0.4913    0.4838    0.4920
      SD      0.0943    0.0831    0.0857    0.0683    0.0653    0.0662
  β   Mean    1.6438    1.2017    0.7063    1.6853    1.2134    0.7182
      SD      0.1662    0.1482    0.1505    0.1453    0.1384    0.1425
  δ   Mean    1.1134    1.1483    1.1653    1.1337    1.1597    1.1511
      SD      0.2328    0.2136    0.2194    0.2014    0.1878    0.1907
n = 400
  γ   Mean    0.4827    0.4936    0.4878    0.4926    0.4885    0.4852
      SD      0.0496    0.0423    0.0448    0.0362    0.0349    0.0351
  β   Mean    1.7280    1.2236    0.7349    1.7032    1.2393    0.7324
      SD      0.1223    0.1197    0.1172    0.1134    0.1051    0.1114
  δ   Mean    1.1253    1.1062    1.1339    1.0982    1.1121    1.0942
      SD      0.1743    0.1610    0.1651    0.1524    0.1428    0.1448

The data generating process considered throughout the simulation is given by
\[
H(y^*, \lambda) = x_1 \beta_0 + x_2 + x_1 \delta_0 I(q > \gamma_0) - x_1 \varepsilon,
\]
where $H(y, \lambda) = (|y|^{\lambda}\,\mathrm{sgn}(y) - 1)/\lambda$ is a modified Box–Cox transformation; the $x_{1i}$'s are i.i.d. draws from the
chi-square distribution with one degree of freedom; $x_{2i} = (x_{1i} - 1 + \eta_i)/2$ with the $\eta_i$'s i.i.d. draws
from the standard normal distribution; the $q_i$'s are i.i.d. draws from the uniform distribution on [0, 1];
and the $\varepsilon_i$'s are i.i.d. draws from a standard normal distribution, independent of $(x_1, x_2, q)$. Set
$\gamma_0 = 0.5$ and $\beta_0 = \delta_0 = 1$. As the coefficient on $x_2$ is normalized to one, we only consider the
estimation of $\gamma_0$, $\beta_0$ and $\delta_0$. Since the model contains conditional heteroscedasticity associated
with $x_1$, only the coefficient on $x_1$ is expected to vary with different quantile indices.
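As a rough illustration of this design, the latent outcome can be simulated by drawing the covariates and the error, forming the index on the right-hand side, and inverting the modified Box–Cox transformation. The sketch below is only an assumption-laden reading of the design: the function names and the seed handling are hypothetical, and the minus sign on the heteroscedastic error term is inferred from the model's sign convention rather than confirmed by the text.

```python
import numpy as np

def box_cox_mod(y, lam):
    """Modified Box-Cox transform H(y, lam) = (|y|^lam * sgn(y) - 1) / lam."""
    y = np.asarray(y, dtype=float)
    return (np.abs(y) ** lam * np.sign(y) - 1.0) / lam

def box_cox_mod_inv(h, lam):
    """Inverse of box_cox_mod in y: solves |y|^lam * sgn(y) = lam * h + 1."""
    w = lam * np.asarray(h, dtype=float) + 1.0
    return np.sign(w) * np.abs(w) ** (1.0 / lam)

def simulate_latent(n, lam, beta0=1.0, delta0=1.0, gamma0=0.5, seed=0):
    """Draw (y_star, x1, x2, q) from the simulation design (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    x1 = rng.chisquare(df=1, size=n)                  # chi-square, 1 d.o.f.
    x2 = (x1 - 1.0 + rng.standard_normal(n)) / 2.0    # correlated with x1
    q = rng.uniform(0.0, 1.0, size=n)                 # threshold variable
    eps = rng.standard_normal(n)                      # error, independent of (x1, x2, q)
    # Index with heteroscedastic error x1 * eps (sign is an assumption).
    index = x1 * beta0 + x2 + x1 * delta0 * (q > gamma0) - x1 * eps
    y_star = box_cox_mod_inv(index, lam)              # latent dependent variable
    return y_star, x1, x2, q
```

For example, `simulate_latent(200, lam=0.6)` would produce one latent sample of the smaller size used in the experiment.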
First consider the case with fixed censoring. We let $y = \min(y^*, c)$, where $c$ is the 0.7-th
quantile of $y^*$, resulting in about 30% censoring. Consistent with Chen (2010), we find that the
MISE is insensitive to the choice of both the bandwidth $h$ and the weighting function. Thus only
the results with $h = 0.3$ for $n = 400$ and $h = 0.4$ for $n = 200$ are reported in the table. Other
choices of $h$ give similar estimation results. Throughout the simulation, we use the following
fourth-order integrated kernel function (Muller, 1984):
\[
K(v) = \left[ \frac{1}{2} + \frac{105}{64} \left( v - \frac{5}{3} v^3 + \frac{7}{5} v^5 - \frac{3}{7} v^7 \right) \right] I(|v| \le 1) + I(v > 1),
\]
and choose the weighting function to be $I(\underline{c} \le c \le \bar{c})$, where $\underline{c}$ and $\bar{c}$ are the 0.05-th and 0.65-th quantiles of $y$,
respectively. The MISE is computed by a two-step algorithm based on a grid search between
the 0.15th and 0.85th empirical quantiles of the $q_i$'s. We consider $\lambda \in \{0.6, 2\}$, $n \in \{200, 400\}$ and
$\tau \in \{0.2, 0.4, 0.6\}$ with 1000 replications for each case. We report the empirical mean and
empirical standard deviation for each estimator.

Table 2. Simulation results for random censoring.
                       λ = 0.6                       λ = 2
              τ = 0.2   τ = 0.4   τ = 0.6   τ = 0.2   τ = 0.4   τ = 0.6
n = 200
  γ   Mean    0.4743    0.4793    0.4838    0.4832    0.4892    0.4847
      SD      0.1146    0.1027    0.1041    0.0834    0.0880    0.0867
  β   Mean    1.6647    1.2163    0.7134    1.6935    1.2196    0.7252
      SD      0.1835    0.1636    0.1670    0.1604    0.1581    0.1576
  δ   Mean    1.1456    1.1642    1.1552    1.1481    1.1623    1.1684
      SD      0.2453    0.2274    0.2262    0.2185    0.2092    0.2075
n = 400
  γ   Mean    0.4774    0.4853    0.4879    0.4884    0.4902    0.4924
      SD      0.0553    0.0570    0.0569    0.0472    0.0451    0.0436
  β   Mean    1.7083    1.2242    0.7358    1.7165    1.2323    0.7357
      SD      0.1384    0.1352    0.1348    0.1250    0.1236    0.1228
  δ   Mean    1.0782    1.0848    1.0773    1.0647    1.0934    1.0925
      SD      0.1854    0.1796    0.1769    0.1687    0.1562    0.1532
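The integrated kernel and the outer grid search of the two-step algorithm can be sketched as below. This is a hedged illustration only: the derivative `k` is obtained by differentiating the stated polynomial, the number of grid points is not given in the text and `n_grid` is a hypothetical choice, and `inner_max` stands in for whatever inner maximization over the remaining parameters is used at each candidate threshold.

```python
import numpy as np

def K(v):
    """Fourth-order integrated kernel: 0 below -1, polynomial on [-1, 1], 1 above 1."""
    v = np.asarray(v, dtype=float)
    poly = 0.5 + (105.0 / 64.0) * (
        v - (5.0 / 3.0) * v**3 + (7.0 / 5.0) * v**5 - (3.0 / 7.0) * v**7
    )
    return np.where(v > 1.0, 1.0, np.where(v < -1.0, 0.0, poly))

def k(v):
    """Derivative of K: the underlying fourth-order kernel supported on [-1, 1]."""
    v = np.asarray(v, dtype=float)
    dens = (105.0 / 64.0) * (1.0 - 5.0 * v**2 + 7.0 * v**4 - 3.0 * v**6)
    return np.where(np.abs(v) <= 1.0, dens, 0.0)

def threshold_grid(q, n_grid=50, lo=0.15, hi=0.85):
    """Candidate thresholds between the lo-th and hi-th empirical quantiles of q."""
    return np.quantile(np.asarray(q, dtype=float), np.linspace(lo, hi, n_grid))

def profile_grid_search(grid, inner_max):
    """Outer step: for each candidate threshold, call inner_max(gamma), which is
    assumed to return (criterion value, maximizing parameters); keep the best."""
    best = None
    for gamma in grid:
        value, theta = inner_max(gamma)
        if best is None or value > best[0]:
            best = (value, gamma, theta)
    return best  # (criterion value, threshold estimate, remaining parameters)
```

The kernel integrates to one and has a vanishing second moment, which is what makes it fourth-order; both properties can be checked numerically on a fine grid.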
Table 1 summarizes the simulation results and there are several main findings. First, the
estimates of $\gamma$ are essentially unbiased. They give desirable precision and their empirical standard
deviations decline fast with the sample size. Moreover, the decline rate seems consistent with the
$n$-asymptotics. Second, the estimates of $\beta$ do change with the quantile indices. The estimates of
$\delta$ usually have a larger bias than those of $\beta$, but the bias seems to decline with the sample size. Also, their
empirical standard deviations decline with the sample size and the magnitude of such decline is
generally consistent with the $\sqrt{n}$-asymptotics.
We also consider the case with random censoring. We adopt the same design except that
the observations consist of $(y, t, x, q)$ with $y = \min(y^*, C)$, where $C$ is i.i.d. drawn from a uniform
distribution on $[\underline{c}, \bar{c}]$, and $\underline{c}$ and $\bar{c}$ are the median and the 0.98-th empirical quantile of $y^*$, respectively,
resulting in about 25% censoring on the dependent variable. Again, we report the empirical
mean and standard deviation for each case. The results are summarized in Table 2. Overall, the
estimators perform well and behave in a pattern similar to the fixed censoring case.
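The two censoring schemes used in the experiment can be sketched as follows, given a vector of latent outcomes. This is a minimal illustration under stated assumptions: the function names are hypothetical, the censoring indicator follows the convention $t = I(y^* \le C)$ from (2.7), and the realized censoring rate under random censoring depends on the distribution of the latent outcome.

```python
import numpy as np

def fixed_censoring(y_star, level=0.7):
    """Censor from above at the level-th empirical quantile of the latent outcome."""
    y_star = np.asarray(y_star, dtype=float)
    c = np.quantile(y_star, level)
    return np.minimum(y_star, c), c

def random_censoring(y_star, seed=0):
    """Censor at C ~ Uniform[median, 0.98 quantile] of the latent outcome.

    Returns the observed outcome y = min(y_star, C) and the indicator
    t = 1 if the observation is uncensored (y_star <= C), 0 otherwise.
    """
    y_star = np.asarray(y_star, dtype=float)
    rng = np.random.default_rng(seed)
    c_lo, c_hi = np.quantile(y_star, [0.5, 0.98])
    C = rng.uniform(c_lo, c_hi, size=y_star.shape[0])
    y = np.minimum(y_star, C)
    t = (y_star <= C).astype(int)
    return y, t
```

With the fixed scheme the share of censored observations is close to 30% by construction; with the random scheme it always lies between roughly 2% and 50%, since observations below the median are never censored and those above the 0.98 quantile always are.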
6. CONCLUSION
In this paper, we consider the maximum integrated score estimation of a generalized threshold
regression model with the dependent variable subject to both an unknown monotone transformation
and some type of censoring. A major advantage of our MISE is that it allows us to deal with
the model under a conditional quantile restriction, thus permitting certain robustness against
conditional heteroscedasticity of arbitrary form. Large sample properties of the proposed
estimator are formally established. The estimator for the threshold parameter is shown to
have a convergence rate of $n^{-1}$, to converge weakly to a non-normal distribution and to be asymptotically
independent of the estimators of the remaining parameters. All these findings are consistent with the theoretical
results obtained in the existing literature on threshold regression models. MISE-based inference is
discussed in detail. Simulation results indicate that our estimator performs well in finite samples.
ACKNOWLEDGMENTS
I am grateful to the co-editor Oliver Linton and three anonymous referees for their constructive
comments and suggestions. I am also grateful to Pingfang Zhu for many useful discussions. The
research is supported by the National Science Foundation (Grant No. 71271139).
REFERENCES
Andersen, P. K., O. Borgan, R. D. Gill and N. Keiding (1993). Statistical Models Based on Counting
Processes. New York, NY: Springer.
Asparouhova, E., R. Golanski, K. Kasprzyk, R. P. Sherman and T. Asparouhov (2002). Rank estimators for
a transformation model. Econometric Theory 18, 1099–120.
Caner, M. and B. E. Hansen (2004). Instrumental variable estimation of a threshold model. Econometric
Theory 20, 813–43.
Chan, K. S. (1993). Consistency and limiting distribution of the least squares estimator of a threshold
autoregressive model. Annals of Statistics 21, 520–33.
Chen, S. (2010). An integrated maximum score estimator for a generalized censored quantile regression
model. Journal of Econometrics 155, 90–8.
Cho, J. S. and H. White (2007). Testing for regime switching. Econometrica 75, 1671–720.
Collier, P. and J. W. Gunning (1999). Explaining African economic performance. Journal of Economic
Literature 37, 64–111.
Davies, R. B. (1977). Hypothesis testing when a nuisance parameter is present only under the alternative.
Biometrika 64, 247–54.
Delgado, M. A. and J. Hidalgo (2000). Nonparametric inference on structural breaks. Journal of
Econometrics 96, 113–44.
Fazzari, S. M., R. Glenn Hubbard and B. C. Petersen (1988). Financing constraints and corporate
investment. Brookings Papers on Economic Activity 19, 141–95.
Fleming, T. R. and D. P. Harrington (1991). Counting Processes and Survival Analysis. New York, NY:
John Wiley.
Gauthier, B. and M. Gersovitz (1997). Revenue erosion through exemption and evasion in Cameroon.
Journal of Public Economics 64, 407–24.
Gonzalo, J. and M. Wolf (2005). Subsampling inference in threshold autoregressive models. Journal of
Hansen, B. E. (1999). Threshold effects in non-dynamic panels: estimation, testing and inference. Journal
of Econometrics 93, 345–68.
Hansen, B. E. (2000). Sample splitting and threshold estimation. Econometrica 68, 575–603.
Horowitz, J. L. (1992). A smoothed maximum score estimator for the binary response model. Econometrica
60, 505–31.
Horowitz, J. L. (1996). Semiparametric estimation of a regression model with an unknown transformation
of the dependent variable. Econometrica 64, 103–37.
Jespersen, N. C. B. (1986). Dichotomizing a continuous covariate in the Cox model. Research Report 86/02,
Statistical Research Unit, University of Copenhagen.
Kosorok, M. R. and R. Song (2007). Inference under right censoring for transformation models with a
change-point based on a covariate threshold. Annals of Statistics 35, 957–89.
Lancaster, T. (1990). The Econometric Analysis of Transition Data. Cambridge: Cambridge University
Press.
Lee, S. and M. Seo (2008). Semi-parametric estimation of a binary response model with a change-point due
to a covariate threshold. Journal of Econometrics 144, 492–99.
Lee, S., M. Seo and Y. Shin (2011). Testing for threshold effects in regression models. Journal of the
American Statistical Association 106, 220–31.
Liang, K., S. Self and X. Liu (1990). The Cox proportional hazards model with change point: an
epidemiologic application. Biometrics 46, 783–93.
Luo, X., B. Turnbull and L. Clark (1997). Likelihood ratio tests for a change point with survival data.
Biometrika 84, 555–65.
Manski, C. F. (1985). Semiparametric analysis of discrete response: Asymptotic properties of the maximum
score estimator. Journal of Econometrics 27, 313–33.
Newey, W. K. and D. McFadden (1994). Large sample estimation and hypothesis testing. In R. F. Engle and
D. McFadden (Eds.), Handbook of Econometrics Volume IV, 2111–245. Amsterdam: North Holland.
Pakes, A. and D. Pollard (1989). Simulation and the asymptotics of optimization estimators. Econometrica
57, 1027–57.
Politis, D., J. Romano and M. Wolf (1999). Subsampling. New York, NY: Springer.
Pons, O. (2003). Estimation in a Cox regression model with a change-point according to a threshold in a
covariate. Annals of Statistics 31, 442–63.
Powell, J. L., J. H. Stock and T. M. Stoker (1989). Semiparametric estimation of weighted average
derivatives. Econometrica 57, 1403–30.
Seo, M. and O. Linton (2007). A smoothed least squares estimator for the threshold regression. Journal of
Van den Berg, G. J. (2001). Duration models: Specification, identification and multiple durations. In J. J.
Heckman and E. Leamer (Eds.), Handbook of Econometrics Volume 5, 3381–3460. Amsterdam: North-Holland.
Van der Vaart, A. and J. Wellner (1996). Weak Convergence and Empirical Processes. New York, NY:
Springer.
Ye, J. and N. Duan (1997). Nonparametric $n^{1/2}$-consistent estimation for the general transformation
models. Annals of Statistics 25, 2682–717.
APPENDIX A: LEMMAS
LEMMA A.1. Let H
n
(a, , ) and H (a, , ) be dened by (2.5) and (3.2), respectively, in the text. Under
Assumptions 3.13.6, we have
H
n
(a, , ) H(a, , ) = O
p
(
1n
)
and
H
n
(a, , )
H(a, , )
= O
p
(
2n
),
uniformly over a (a
1
, a
2
) and (, ) {1, 1}
2
. In addition,
H
n
(a, , ) H(a, , ) =
1
(a, , )
1
n
n
i=1
(d
i
(a) )k
_
X
i
( ) H(a, , )
h
n
_
+o
p
(n
1/2
)
uniformly over a (a
1
, a
2
) and (, ) {1, 1}
2
, where
(a, , ) = E
_
x
2
_
20

10
(q)
1
(q, ,
1
)
2
_
+z
2
_
I(q >
0
)
20

10
(q)
1
(q, ,
1
)
I(q > )
2
_
_
H(a)

10
(q)
1
(q, ,
1
)
H(a, , )
_
,
H(a, , ) X
2
( )
2
1
(q, ,
1
)
, x
2
, q
_
with
10
(q) = 1 +
10
I(q >
0
),
1
(q, ,
1
) = 1 +
1
I(q > ) and
(s
1
, s
2
, x
2
, q) =
1
1
(q, ,
1
)
d
dv
F
u|xq

10
(q)v
1
(q, ,
1
)
+s
1
1
(q, ,
1
)
+s
2
, x
2
, q
f
x
1
|x
2
q
1
(q, ,
1
)
+s
2
x
2
, q
v=0
.
Proof: The proof is essentially the same as that of Chen's (2010) Lemma A.1.
LEMMA A.2. For any random variables (v, q) satisfying E[v|q] = 0 and E[v
2
|q] < almost surely,
assume that (v
i
, q
i
), i = 1, . . . , n is a random sample of (v, q) and that q is continuously distributed and
has a bounded, continuous, positive density in a neighbourhood of r
0
R. Then for eachA > 0 and > 0,
there exists a positive constant B < such that for all 0 < < 1 and for all n > B/
Pr
_
sup
B/n<r<
1
n
n
i=1
I(r
0
< q
i
< r
0
+r)
Pr(r
0
< q
i
< r
0
+r)
1
> A
_
<
and
Pr
_
sup
B/n<r<
1
n
n
i=1
v
i

I(r
0
< q
i
< r
0
+r)
Pr(r
0
< q
i
< r
0
+r)
> A
_
< .
Proof: See Lee and Seo's (2008) Lemma A.1.
LEMMA A.3. Under Assumptions 3.13.6,
1
n
n
i=1
(d
i
(a) )K
_
X
i
( ) c
h
n
_
1
n
n
i=1
(d
i
(a) )I(X
i
( ) c > 0)
0
almost surely uniformly over a (a
1
, a
2
), (, ) {1, 1}
2
and c G.
Proof: Notice that $|d_i(a) - \tau|$ is uniformly bounded over $a \in (a_1, a_2)$; the result then essentially follows
from Horowitz's (1992) Lemma 4.
LEMMA A.4. For given > 0, dene
= {(, ) : |
0
| +
0
< }. Then there exists a
sufciently small > 0 and > 0 such that
sup
(,)
E{(d(a) )[I(x
H(a, , ) > 0) I(x
+z
H(a, , ) > 0)]|q} <

and
sup
(,)
E{(d(a) )[I(x
+z
H(a, , ) > 0) I(x
H(a, , ) > 0)]|q} <

uniformly over a (a
1
, a
2
).
Proof: The proof essentially replicates Lee and Seos (2008) Lemma A.2. However, it should be noted
that there is a slight distinction between their set-up and ours. In Lee and Seo (2008), the intercept
H(a, , ) is a free parameter while in our model, it is related to and for any given a. But it does
not cause any problem because once is chosen to be small enough such that
0
+|
0
| <
and |x
(
0
) +z
(
0
)| is sufciently small, |x
(
0
) +z
(
0
) +H(a, , ) H(a)| is also
sufciently small since H(a,
0
,
0
) = H(a) and H(a, , ) is continuous in and almost surely.
LEMMA A.5. Under Assumptions 3.13.6, n (
n
0
) = O
p
(1).
Proof: We prove this result by showing that there exists a n
1
-neighbourhood of
0
such that Q
n
(, )
Q
n
(
0
, ) < 0 with probability approaching one for all lying in this neighbourhood and for all lying in
a small neighbourhood of
0
. Consider
Q
n
(, ) Q
n
(
0
, )
=
_
1
n
n
i=1
(d
i
(a) )
_
K
_
X
i
( ) H
n
(a, , )
h
n
_
K
_
X
i
H
n
(a,
0
, )
h
n
__
(a)da
=
_
1
n
n
i=1
(d
i
(a) )
_
K
_
X
i
( ) H(a, , )
h
n
_
K
_
X
i
H(a,
0
, )
h
n
__
(a)da R
1n
+R
2n
=
_
1
n
n
i=1
(d
i
(a) )[I(X
i
( ) H(a, , ) > 0) I(X
i
H(a,
0
, ) > 0)](a)da
R
1n
+R
2n
+R
3n
+R
4n
,
where the second equality follows from Taylor expansion with
R
1n
= (H
n
(a, , ) H(a, , ))
_
1
nh
n
n
i=1
(d
i
(a) )k
_
X
i
( ) H
n
(a, , )
h
n
_
(a)da,
R
2n
= (H
n
(a,
0
, ) H(a,
0
, ))
_
1
nh
n
n
i=1
(d
i
(a) )k
_
X
i
H
n
(a,
0
, )
h
n
_
(a)da,
R
3n
=
_
1
n
n
i=1
(d
i
(a) )
_
K
_
X
i
( ) H(a, , )
h
n
_
I(X
i
( ) H(a, , )) > 0
_
(a)da
and
R
4n
=
_
1
n
n
i=1
(d
i
(a) )
_
K(
X
i
H(a,
0
, )
h
n
) I(X
i
H(a,
0
, )) > 0
_
(a)da,
H
n
(a, , ) lies between H
n
(a, , ) and H(a, , ), H
n
(a,
0
, ) lies between H
n
(a,
0
, ) and
H(a,
0
, ). It follows from Lemma A.1 that both R
1n
and R
2n
are O
p
(
1n
) uniformly over a (a
1
, a
2
) and
(, ) {1, 1}
2
. In addition, it follows from Lemma A.3, the uniform boundedness and non-
negativity of (a) that both R
3n
and R
4n
are o(1) almost surely uniformly over (, )
{1, 1} .
Suppose that >
0
. If q
i
> or q
i
<
0
, there always holds
I(X
i
( ) H(a, , ) > 0) I(X
i
H(a,
0
, ) > 0) = 0 (A.1)
because when q
i
> or q
i
<
0
, (a) X
i
( ) = X
i
and (b) H(a, , ) = H(a,
0
, ), where (b) follows from
the fact that H(a, , ) is totally determined by d(a) and X( ) for any given a by (3.2). Similarly, when
<
0
, (A.1) holds if q
i
< or q
i
>
0
.
Let
+
i
(, ) =
_
(d
i
(a) )[I(x
i
H(a, , ) > 0) I(x
i
+z
i
H(a,
0
, ) > 0)](a)da
and
i
(, ) =
_
(d
i
(a) )[I(x
i
+z
i
H(a, , ) > 0) I(x
i
H(a,
0
, ) > 0)](a)da.
Then based on the analysis above and (A.1), some manipulations give
Q
n
(, ) Q
n
(
0
, )
=
_
1
n
n
i=1
(d
i
(a) )[I(X
i
( ) H(a, , ) > 0) I(X
i
H(a,
0
, ) > 0)](a)da
R
1n
+R
2n
+R
3n
+R
4n
=
1
n
n
i=1
I(
0
< q
i
)
+
i
(, ) +
1
n
n
i=1
I( < q
i

0
)
i
(, ) +O
p
(
1n
) = A
n
+B
n
+O
p
(
1n
)
where
A
n
=
1
n
n
i=1
I(
0
< q
i
)E(
+
i
(, )|q
i
) +
1
n
n
i=1
I(
0
< q
i
)E(
i
(, )|q
i
)
and
B
n
=
1
n
n
i=1
I(
0
<q
i
)[
+
i
(, ) E(
+
i
(, )|q
i
)] +
1
n
n
i=1
I(
0
<q
i
)[
i
(, ) E(
i
(, )|q
i
)].
The it follows from the denition of
+
i
(, ),
i
(, ), the uniform boundedness of I() and Lemma A.4
that A
n

1
with probability approaching one for all |
0
| > B/n and for all (, )
for some
sufciently small ,
1
a small positive constant and
dened in Lemma A.4. Similarly, it follows from

Lemma A.2 that A
n

2
with probability approaching one for all |
0
| > B/n and (, )
with
2
is another small and positive constant. Now notice that both A
n
and B
n
are independent of
1n
and
1n
0. Thus it is straightforward to see that Q
n
(, ) Q
n
(
0
, ) < 0 with probability approaching one
for all |
0
| > B/n and for all (, )
, which completes the proof.

LEMMA A.6. [Q
n
(, ) Q
n
(
0
, )] [Q
n
(,
0
) Q
n
(
0
,
0
)] = o
p
(n
1
) uniformly over (, )
A
= {(, ) : n
1/3

0
A, n|
0
| A} for some A > 0.
Proof: According to Lemma A.5 and Lee and Seo (2008), $\hat\gamma_n$ is $n^{-1}$-consistent and $\hat\theta_n$ is at least $n^{-1/3}$-consistent.
It then suffices to consider an $n^{-1}$-neighbourhood of $\gamma_0$ and an $n^{-1/3}$-neighbourhood of $\theta_0$. Now
suppose that $\gamma > \gamma_0$. It follows from reasoning similar to (A.1) that when $q_i > \gamma$ or $q_i < \gamma_0$,
K
_
X
i
( ) H
n
(a, , )
h
n
_
K
_
X
i
H
n
(a,
0
, )
h
n
_
= 0. (A.2)
The key insight behind (A.2) is that when q
i
> or q
i
<
0
, H
n
(a, , ) = H
n
(a,
0
, ) because according
to (2.5), H
n
(a, , ) is totally determined by (d
i
(a), X
i
( ))s for i = 1, . . . , n and X
i
( ) = X
i
for each i.
Based on this observation, we have
[Q
n
(, ) Q
n
(
0
, )] [Q
n
(,
0
) Q
n
(
0
,
0
)]
=
1
n
n
i=1
I(
0
< q
i
)(
+
i
(, )

+
i
(,
0
)) +
1
n
n
i=1
I( < q
i

0
)(
i
(, )

i
(,
0
)),
where
+
i
(, ) =
_
(d
i
(a) )
_
K
_
x
i
H
n
(a, , )
h
n
_
K
_
x
i
+z
i
H
n
(a,
0
, )
h
n
__
(a)da
and
i
(, ) =
_
(d
i
(a) )
_
K
_
x
i
+z
i
H
n
(a, , )
h
n
_
K
_
x
i
H
n
(a,
0
, )
h
n
__
(a)da.
Then to prove Lemma A.6, it sufces to show that
C
n
=
1
n
n
i=1
I(
0
< q
i
)(
+
i
(, )

+
i
(,
0
)) = o
p
(n
1
) (A.3)
uniformly over n
1/3

0
A and n|
0
| A for some A > 0. We prove (A.3) by using the
arguments similar to that for Lee and Seos (2008) Lemma 6.1.
Dene
+
(a, , ) = K
_
x
H(a, , )
h
_
K
_
x
+z
H(a,
0
, )
h
_
.
Consider a class of functions indexed by (, ) for some positive constant A and a (a
1
, a
2
),
M
A
= {I(
0
< q )(
+
(a, , )
+
(a, ,
0
)) : n
1/3

0
A and n|
0
| A}.
We now show that M
A
is a Vapnik Chervonenkis (VC) class of functions. First, it is trivial to show that
M
A
1
= {I(
0
< q ) : n|
0
| A} is a VC class of functions. Then to show M
A
is a VC class, it
sufces to show, according to Pakes and Pollards (1989) Lemma 2.5 and K(s/h) I(s > 0) that
M
A
2
= {I(x
H(a, , ) > 0) : n
1/3

0
A and n|
0
| A}
is a VC class because similar arguments apply to other components of (
+
(a, , )
+
(a, ,
0
)).
Recalling the denition (3.2), H(a, , ) is monotone in x
and for any given x and a (a

1
, a
2
), then
x
H(a, , ) changes signs at only nite number of (, )s uniformly in x and a (a

1
, a
2
). According
to Asparouhova et al.s (2002) Lemma 1, M
A
2
is a VC class. Denote
A
= {(, ) : n
1/3

0

A, n|
0
| A}. If we can show
D
n
= E
_
sup
(,)
A
|C
n
EC
n
|
_
= o
p
(n
1
), (A.4)
then (A.3) follows since
E
sup
(,)
A
I(
0
< q )(
+
(, )

+
(,
0
))
sup
n|
0
|A
E(I(
0
< q )) sup
n
1/3
0
A
E(
+
(, )

+
(,
0
))
= O(n
1
)O(n
1/3
) = o(n
1
).
It follows from the uniform boundedness of d(a) and (a) that
D
n
c
1
E
_
sup
(,)
A
1
n
n
i=1
I(
0
< q
i
)
+
i
(a, , ) EI(
0
< q
i
)
+
i
(a, , )
_
c
1
c
2
n
1/2
J(1, M
A
)E
1/2
(M
A
)
2
= O(n
7/6
)
by Theorem 2.14.1 of Van der Vaart and Wellner (1996), where c
2
< is a universal constant, J(1, M
A
)
is the uniform entropy integral dened in Van der Varrt and Wellner (1996), which is bounded for a VC
class and M
A
is the envelope of M
A
.
APPENDIX B: PROOFS OF THEOREMS
Proof of Theorem 3.1: Recall Q
(, ) dened by (3.1) and Q(a, , , c) dened by (3.2), further dene

with a slight abuse of notation Q(a, , ) = Q(a, , , H(a, , )). We establish the consistency of (
n
,
n
)
by verifying the following conditions of Newey and McFadden (1994, Theorem 2.1) in sequence: (a)
Q
(, ) is uniquely maximized at (
0
,
0
); (b) Q
n
(, ) is maximized over a compact set that contains the
true parameters as the interior; (c) Q
(, ) is continuous; (d) Q
n
(, ) converges to Q
(, ) in probability
uniformly over (, ) .
For (a), rst it follows from Lee and Seo (2008, proof of Theorem 4.1) that Q(a,
0
,
0
, H(a))
Q(a, , , H(a, , )) > 0 for any (, ) with = {1, 1}
2
, (, ) = (
0
,
0
) and a
(a
1
, a
2
). Then it follows from the non-negativity of (a) that
Q
(
0
,
0
) Q
(, ) =
_
[Q(a,
0
,
0
) Q(a, , )](a)da
=
_
[Q(a,
0
,
0
, H(a)) Q(a, , , H(a, , ))](a)da > 0,
for any (, ) and (, ) = (
0
,
0
).
For (b), it sufces to denote = by Assumption 3.3. For (c), Q
(, ) is continuous
since I(x
+z
I(q > ) c > 0) is continuous at , and c G with probability one by

Assumption 3.2(b). For (d), rst note that Q
n
(a, , , c) Q(a, , , c) converges to zero in probability
uniformly over a (a
1
, a
2
), (, ) and c G by Lemma 2.4 of Newey and Mcfadden (1994) and
Lemma A.3. Then
Q
n
(a, , ) Q
(a, , ) = sup
cG
Q
n
(a, , , c) sup
cG
Q(a, , , c) = o
p
(1)
uniformly over a (a
1
, a
2
), (, ) , and moreover,
Q
n
(, ) Q
(, ) =
_
[Q
n
(a, , ) Q(a, , )](a)da = o
p
(1)
uniformly over (, ) . Finally,

1n
=
10
follows from the consistency result and Assumption
3.3(a).
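Proof of Theorem 3.2: The $n$-consistency of $\hat\gamma_n$ follows from Lemma A.5. Write
\[
\begin{aligned}
Q_n(\gamma, \theta) - Q_n(\gamma_0, \theta_0)
  ={}& \underbrace{Q_n(\gamma, \theta_0) - Q_n(\gamma_0, \theta_0)}_{Q_{1n}(\gamma)}
     + \underbrace{Q_n(\gamma_0, \theta) - Q_n(\gamma_0, \theta_0)}_{Q_{2n}(\theta)} \\
   & + \underbrace{[Q_n(\gamma, \theta) - Q_n(\gamma_0, \theta)] - [Q_n(\gamma, \theta_0) - Q_n(\gamma_0, \theta_0)]}_{Q_{3n}(\gamma, \theta)} .
\end{aligned}
\tag{B.1}
\]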
According to (B.1) and Lemma A.6, the
n
that maximizes Q
n
(, ) is equivalent to the one that maximizes
n (Q
1n
( ) +Q
3n
(, )) = n Q
1n
( ) +o
p
(1). To derive the limiting distribution of
n
, let us analyse the
limiting distribution of n Q
1n
( ). Since
n
is n-consistent, we only need to consider lying within a
n
1
-neighbourhood of
0
. Letting = n(
0
), the re-parameterization gives
n Q
1n
( ) = n(Q
n
(,
0
) Q
n
(
0
,
0
))
=
n
i=1
I(
0
< q
i
)
+
i
(,
0
) +
n
i=1
I( < q
i

0
)
i
(,
0
)
= I( > 0)
n
i=1
I
_
0
< q
i

0
+n
1
+
i
_
0
+n
1
,
0
_
+ I( < 0)
n
i=1
I
_
0
+n
1
< q
i

0
_

i
_
0
+n
1
,
0
_
= I( > 0)Q
+
n
() +I( < 0)Q
n
(),
where
Q
n
() =
n
i=1
I
_
0
+n
1
< q
i

0
_

i
_
0
+n
1
,
0
_
,
Q
+
n
() =
n
i=1
I
_
0
< q
i

0
+n
1
+
i
_
0
+n
1
,
0
_
,
+
i
(,
0
) =
_
(d
i
(a) )
_
K
_
x
0
H
n
(a, ,
0
)
h
n
_
K
_
x
0
+z
0
H
n
(a,
0
,
0
)
h
n
__
(a)da
and
i
(,
0
) =
_
(d
i
(a) )
_
K
_
x
0
+z
0
H
n
(a, ,
0
)
h
n
_
K
_
x
0
H
n
(a,
0
,
0
)
h
n
__
(a)da.
Without loss of generality, let us consider the weak convergence of the nite-dimensional distributions
of Q
+
n
(). To use the CramerWold device, let 0 <
1
< <
J
< for some positive integer J and
1
, . . . ,
J
be a sequence of constants. Consider the characteristic function of
S
n
=
J
j=1
j
(Q
+
n
(
j
) Q
+
n
(
j1
)),
which has the form
S
n
(t )
=
exp
1t
J
j=1
j
_
nj
(q)
+
_
0
+n
1
j
,
0
_
n,j1
(q)
+
_
0
+n
1
j1
,
0
__
n
,
with
nj
(q) = I(
0
< q
0
+n
1
j
),
+
(,
0
) =
_
(d(a) )
_
K
_
x
0
H
n
(a, ,
0
)
h
n
_
K
_
x
0
+z
0
H
n
(a,
0
,
0
)
h
n
__
(a)da.
Now letting
nj
(q) = I(
0
+n
1
j1
< q
0
+n
1
j
) and
+
=
_
(d(a) )
_
K
_
x
0
H(a)
h
n
_
K
_
x
0
+z
0
H(a)
h
n
__
(a)da,
we have
E
n
= lim
n
S
n
(t ) = lim
n
exp
1t
J
j=1
nj
(q)
n
since
0
+n
1
j

0
, H
n
(a, , ) converges to H(a, , ) uniformly over a (a
1
, a
2
), and
by Lemma A1, H(a,
0
,
0
) = H(a),

+
(,
0
) is uniformly bounded over and by the bounded
convergence theorem. Furthermore
E
n
= lim
n
1 +
J
j=1
[exp(
1t
j
nj
(q)
+
) 1]
n
= lim
n
1 +
J
j=1
E[exp(
1t
j
nj
(q)
+
) 1]
n
= lim
n
1 +
1
n
J
j=1
(
j

j1
)f
q
(
0
)E[exp(
1t
j

+
) 1|q =
+
0
] +o(1)
n
= exp
j=1
(
j

j1
)f
q
(
0
)E[exp(
1t
j
+
) 1|q =
+
0
]
,
where the rst equality follows from the identity exp (
j
a
j
) 1 =
j
(exp(a
j
) 1) when only one a
j
is
different from zero, the last equality follows from lim
n
(1 +a/n)
n
= exp(a), K(s/h
n
) I(s > 0) and
the dominated convergence theorem.
On the other hand, since the characteristic function of Q
+
() is
Q
+(t ) = exp [f
q
(
0
)(
+
(t ) 1)],
S =
J
j=1
j
(Q
+
(
j
) Q
+
(
j1
))
has the characteristic function
S
(t ) = exp [
J
j=1
(
j

j1
)f
q
(
0
)(
+
(
j
t ) 1)] = lim
n

S
n
(t ), thus
implying that Q
+
n
weakly converges to Q
+
. Moreover, following Lee and Seos (2008) Lemma 6.2, (B.8),
we can prove the tightness of Q
+
n
() by showing that
|Q
+
n
() Q
+
n
(
1
)||Q
+
n
(
2
) Q
+
n
()| C
q
(
2
1
)
2
with C
q
being the Lipschitz constant of F
q
. With these preliminary results, the limiting distribution of
n
follows from the argmax continuous mapping theorem.
To study the limiting distribution of
n
, we know from (B.1) and Lemma A.6 that the
n
that maximizes
Q
n
(, ) is equivalent to the one that maximizes
n(Q
2n
() +Q
3n
(, )) =
nQ
n
(
0
, ) +o
p
(1), which
does not depend on . In other words, the limiting distribution of
n
can be derived by viewing known
as
0
. If so, our model then can be regarded as a transformation model with no threshold effect. Notice that
given =
0
, the criterion function Q
n
(
0
, ) is smooth in , thus the limiting distribution of

n
can be
obtained through a standard Taylor expansion. To see this, Taylor expansion of the rst-order condition that
n
should satisfy, that is,
n
Q
n
(
0
,
n
)
2
= 0 gives
n(
2n
20
) =
_
2
Q
n
(
0
,
n
)
2
_
1
_
n
Q
n
(
0
,
0
)
2
_
, (B.2)
where
n
lies between
0
and
n
. Applying the same proofs in Chen (2010), we can show that
2
Q
n
(
0
,
n
)
2
p
and
n
Q
n
(
0
,
0
)
2
=
1
n
n
i=1
i
+o
p
(1).
Thus the

n-consistency and the desired limiting distribution of

2n
follows. Finally, the asymptotic
independence between

n
and
n
has been proved implicitly by Lemma A.6, (B.1) and the above
discussions.
Proof of Proposition 3.3: Similar to the proof of Theorem 3.1, the consistency result can be established
by checking a set of conditions of Newey and McFadden (1994, Theorem 2.1). In parallel with the proof
of Theorem 3.1, it sufces to show that Q
rn
(a, , , c) Q
r
(a, , , c) converges to zero in probability
uniformly over a (a
1
, a
2
), (, ) and c G, where
Q
r
(a, , , c) = E
__
I(y > a)
G(a)

_
I(X
( ) c > 0)
_
.
This can be shown by Lemma 2.4 of Newey and Mcfadden (1994), Lemma A.3 and the consistency of
KaplanMeier estimator (Fleming and Harrington, 1991).
The convergence rate and the limiting distribution of $\hat\gamma_{rn}$ can be established by applying reasoning
similar to that for the fixed censoring case. The only distinction between them is that with random censoring,
there is a Kaplan–Meier estimator $G_n(\cdot)$ in the criterion function (2.10), which should be taken into
consideration in the subsequent proof. For example, in proving Lemma A.5, the presence of the Kaplan–Meier
estimator leads to an extra remainder term $R_{r,5n}$ in addition to $R_{r,ln}$ for $l = 1, \ldots, 4$, that is,
Q
rn
(, ) Q
rn
(
0
, )
=
_
1
n
n
i=1
_
d
i
(a)
G(a)

_
[I(X
i
( ) H(a, , ) > 0) I(X
i
H(a,
0
, ) > 0)](a)da
R
r,1n
+R
r,2n
+R
r,3n
+R
r,4n
+R
r,5n
,
where
R
r,1n
= (H
n
(a, , ) H(a, , ))
_
1
n
n
i=1
_
d
i
(a)
G
n
(a)

_
h
1
n
k
_
X
i
( ) H
n
(a, , )
h
n
_
(a)da,
R
r,2n
= (H
n
(a,
0
, ) H(a,
0
, ))
_
1
n
n
i=1
_
d
i
(a)
G
n
(a)

_
h
1
n
k
_
X
i
H
n
(a,
0
, )
h
n
_
(a)da,
R
r,3n
=
_
1
n
n
i=1
_
d
i
(a)
G
n
(a)

__
K
_
X
i
( ) H(a, , )
h
n
_
I(X
i
( ) H(a, , )) > 0
_
(a)da,
R
r,4n
=
_
1
n
n
i=1
_
d
i
(a)
G
n
(a)

__
K
_
X
i
H(a,
0
, )
h
n
_
I(X
i
H(a,
0
, )) > 0
_
(a)da
and
R
r,5n
=
_
G(a) G
n
(a)
G(a)G
n
(a)
1
n
n
i=1
d
i
(a)[I(X
i
( ) H(a, , ) > 0)
I(X
i
H(a,
0
, ) > 0)](a)da.
Among the remainder terms $R_{r,ln}$, $l = 1, \ldots, 5$, the terms $R_{r,1n}, \ldots, R_{r,4n}$ can be shown to be $o_p(1)$ via
arguments similar to the proof of Lemma A.5, and the extra remainder term $R_{r,5n}$ is $o_p(1)$ via the consistency
of the Kaplan–Meier estimator $G_n(a)$ and the uniform boundedness of the other components appearing
in $R_{r,5n}$. Consequently, the $n$-consistency of $\hat\gamma_{rn}$ can be established.
As another distinction from the fixed censoring case, the presence of the Kaplan–Meier estimator leads to
an extra term
2,ri
in the asymptotic linear representation for $\hat\theta_{rn}$. But given the asymptotic independence
between $\hat\gamma_{rn}$ and $\hat\theta_{rn}$, deriving the limiting distribution of $\hat\theta_{rn}$ essentially follows the same arguments as in
Chen (2010), and thus is not a new result in this paper.