Semi-Parametric Estimation of A Generalized Threshold Regression Model Under Conditional Quantile Restriction
doi: 10.1111/ectj.12005
ZHENGYU ZHANG
School of Economics, Shanghai University of Finance and Economics, 777 Guoding Road, 200433, Shanghai, China.
$$\cdots = x'\beta_0 + z'\delta_0 I(q > \gamma_0) - u, \qquad (1.1)$$
where y
$$H(y^*) = x'\beta_0 + z'\delta_0 I(q > \gamma_0) - u, \qquad (1.2)$$
where $H(\cdot)$ is a strictly increasing function and the $\tau$-th quantile of $u$ conditional on $(x, q)$ is zero.²
We observe $(y, x, q)$, where $y$ is allowed to be subject to two-sided censoring, that is,
$$y = a_1 I(y^* < a_1) + y^* I(a_1 \le y^* \le a_2) + a_2 I(y^* > a_2), \qquad (1.3)$$
with $a_1 < a_2$.
Regression models with the dependent variable subject to an unknown monotonic transformation are also known as duration models. They are relevant in a variety of applications, because many time-to-event variables are of interest to researchers conducting empirical studies in economics, finance and management. These time-to-event variables include:
- the length of an unemployment spell or a strike in labour economics;
- the time between purchases of a durable good in marketing;
- the duration of business cycles or recessions in macroeconomics;
- the survival time in biostatistics.

See Lancaster (1990) and Van den Berg (2001) for extensive surveys of these applications.

Compared with a linear transformation model with no threshold effect, (1.2) is useful for modelling non-linear and asymmetric features of an economic relationship. For example, a vast literature invokes tax and regulatory barriers to micro-firm growth in developing countries (Collier and Gunning, 1999), suggesting a growth threshold effect for firms in these countries. Gauthier and Gersowitz (1997) find an inverted-U relationship between Cameroonian firms' tax burden and survival time, with very young and very large firms revealing low tax burdens. In such an application, the dependent variable is the duration of a firm's survival, and the threshold variables include the variables measuring tax and regulatory burdens. Testing for the non-linear path of firms' growth with respect to tax and regulatory barriers, and estimating its extent, then amounts to testing for and estimating the threshold effect in the model.

The effect of financial constraints on firms' investment behaviour can also be modelled as a discontinuous threshold effect (Fazzari et al., 1988, Hansen, 1999). To test this effect, one may take the time elapsed until a firm makes a major investment decision as the dependent variable and the leverage ratio as the threshold variable. In biostatistics, Cox regression models with a covariate threshold have been used to study two questions:
- the risk factors for breast cancer, with a threshold in the effect of estrogen receptors (Jespersen, 1986);
- the effect of tumour thickness on survival with melanoma (Andersen et al., 1993).

In addition, our generalized threshold model (1.2) can be viewed as a flexible and parsimonious way of adding a non-parametric component to the baseline parametric threshold model (1.1).
Estimation of a model similar to ours has been considered by a number of studies. Among them, Liang et al. (1990) and Luo et al. (1997) consider a Cox model (when the error term follows the extreme value distribution) with time-dependent covariates and a change point at an unknown time. Pons (2003) studies the maximum partial likelihood estimate for the Cox model with a change point at an unknown threshold of a covariate. One paper closely related to ours is Kosorok and Song (2007), who consider non-parametric maximum likelihood (NML) estimation of model (1.2) under the assumption that the error term is independent of the regressors. As will soon become evident, our estimation method for model (1.2) is fundamentally different from Kosorok and Song's. Specifically, we propose a maximum integrated score estimator (MISE) for the model, which has two major advantages over the existing estimators.

² Notice that the x in model (1.2) should not contain a constant term. According to Horowitz (1996), no intercept is identified for models subject to unknown transformation.

© 2013 The Authors. The Econometrics Journal © 2013 Royal Economic Society.
First, to the best of our knowledge, no existing study considers estimation of (1.2) under a conditional quantile restriction. Most existing methods assume that the error term is independent of the regressors, thus imposing a pure location-shift structure on the effect of the regressors on the transformed dependent variable. In contrast, quantile regression is more robust than mean regression, and it also allows for a wide range of conditional heteroscedasticity of unknown form. It therefore lets the researcher learn a much richer structure of heterogeneous effects of the regressors on the transformed dependent variable.
Second, our MISE can accommodate both fixed and random censoring of the dependent variable. Notice that Kosorok and Song's (2007) NML estimator can deal with random censoring when the censoring variable is independent of y
$\ldots, z_d')'$, $\theta = (\beta', \delta')'$ and $X(\gamma) = (x', z_d(\gamma)')'$.
The MISE is motivated by the following observation: under monotonicity of the transformation function, a transformation model is equivalent to a sequence of binary choice models. That is, given $a \in (a_1, a_2)$ and letting $d_i(a) = I(y_i > a)$, the monotonicity of $H$ gives the following binary choice model for each $a \in (a_1, a_2)$:
$$d_i(a) = I(y_i > a) = I(y_i^* > a) = I(X_i'\theta_0 - H(a) > u_i), \quad i = 1, \ldots, n. \qquad (2.1)$$
Under the assumption that the $\tau$-th quantile of $u$ conditional on $(x, q)$ is zero, we have
$$E(d(a) \mid x, q) = F_{u|xq}(X'\theta_0 - H(a) \mid x, q) > \tau \iff X'\theta_0 - H(a) > 0, \qquad (2.2)$$
where $F_{u|xq}$ is the distribution function of $u$ conditional on $(x, q)$. Based on this observation, Manski's (1985) maximum score method applies. A consistent estimator of $(\theta_0, \gamma_0)$ can be obtained by maximizing
$$Q_n(a, \theta, \gamma) = \frac{1}{n}\sum_{i=1}^{n}(d_i(a) - \tau)\, I[X_i(\gamma)'\theta - H(a) > 0] \qquad (2.3)$$
with respect to $\theta$ and $\gamma$.
Indeed, for the threshold binary choice model (2.1) generated with any given $a \in (a_1, a_2)$, we know from Lee and Seo (2008) that maximizing the criterion function (2.3) already yields a consistent estimator of the parameters in the model. However, by fixing $a$, the objective function (2.3) uses only a small fraction of the information contained in the original dataset, which results in a substantial efficiency loss. To remedy this drawback, one may integrate $Q_n(a, \theta, \gamma)$ over the interval $(a_1, a_2)$ and estimate $(\theta_0, \gamma_0)$ by maximizing the integrated criterion function
$$Q_n(\theta, \gamma) = \int Q_n(a, \theta, \gamma)\,\omega(a)\,da, \qquad (2.4)$$
where $\omega(\cdot)$ is a non-negative weighting function. For example, letting
$$\omega(a) = \frac{1}{a_2 - a_1}\, I(a_1 \le a \le a_2)$$
amounts to imposing a uniform weighting scheme on the integrated criterion function. However, two issues must be addressed before the MISE defined above can be put into practice.
First, it is well known that Manski's (1985) maximum score objective function is difficult to analyse. It involves a discontinuous step function and is therefore not amenable to analysis through the usual Taylor series approximation. Thus, instead of analysing (2.4), one may employ a smoothed maximum score function in the spirit of Horowitz (1992) and consider the following objective function:
$$\int \frac{1}{n}\sum_{i=1}^{n}(d_i(a) - \tau)\, K\!\left(\frac{X_i(\gamma)'\theta - H(a)}{h_n}\right)\omega(a)\,da,$$
where $K(v) = \int^{v} \cdots$
$$\cdots \sum_{i=1}^{n}(d_i(a) - \tau)\, K\!\left(\frac{X_i(\gamma)'\theta - c}{h_n}\right). \qquad (2.5)$$
The rationale behind this replacement is that H(a) is implicitly determined by (2.1) for each a.
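Based on these considerations, our MISE is defined, by a slight abuse of notation, as a solution to the following maximization problem:
$$(\hat\theta_n, \hat\gamma_n) = \arg\max_{\theta \in \Theta,\, \gamma \in \Gamma} Q_n(\theta, \gamma), \qquad (2.6)$$
where $\Theta$ and $\Gamma$ are compact sets that contain the true parameters in their interiors, and
$$Q_n(\theta, \gamma) = \int \frac{1}{n}\sum_{i=1}^{n}(d_i(a) - \tau)\, K\!\left(\frac{X_i(\gamma)'\theta - H_n(a, \theta, \gamma)}{h_n}\right)\omega(a)\,da,$$
with $H_n(a, \theta, \gamma)$ defined in (2.5).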
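To make the profiling step concrete, the following Python/NumPy sketch evaluates the smoothed integrated score $Q_n(\theta, \gamma)$ for given $(\theta, \gamma)$. It profiles out $H_n(a, \theta, \gamma)$ by a grid search over $c$ at each point of a grid of $a$ values. This is an illustration under stated assumptions, not the author's implementation:
- the logistic CDF stands in for the integrated kernel $K$;
- the integral over $a$ is replaced by an average over a user-chosen grid, which corresponds to a uniform $\omega$;
- the helper names (`smooth_K`, `profiled_H`, `design`, `Q_n`) are hypothetical.

```python
import numpy as np

def smooth_K(v):
    # Assumption: logistic CDF used as the integrated kernel K (written via tanh
    # to avoid overflow); Section 5 of the text uses a fourth-order polynomial kernel.
    return 0.5 * (1.0 + np.tanh(np.asarray(v, dtype=float) / 2.0))

def design(x, z, q, gamma):
    # X_i(gamma) = (x_i', z_i' I(q_i > gamma))'
    return np.column_stack([x, z * (q > gamma)[:, None]])

def profiled_H(index, d_a, tau, h, c_grid):
    # H_n(a, theta, gamma) = argmax_c (1/n) sum_i (d_i(a) - tau) K((index_i - c)/h), cf. (2.5),
    # with the argmax approximated on a finite grid of c values.
    scores = [np.mean((d_a - tau) * smooth_K((index - c) / h)) for c in c_grid]
    return c_grid[int(np.argmax(scores))]

def Q_n(theta, gamma, y, x, z, q, tau, h, a_grid, c_grid):
    # Smoothed integrated score; uniform weight omega over a_grid (average over the grid).
    index = design(x, z, q, gamma) @ theta
    vals = []
    for a in a_grid:
        d_a = (y > a).astype(float)                       # d_i(a) = I(y_i > a)
        H_a = profiled_H(index, d_a, tau, h, c_grid)
        vals.append(np.mean((d_a - tau) * smooth_K((index - H_a) / h)))
    return float(np.mean(vals))
```

With simulated data from (1.2), where $H$ is the identity and $u$ has median zero, the criterion evaluated at $\theta_0$ should exceed its value at $-\theta_0$ when $\tau = 0.5$.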
Notice that the original criterion function (e.g. (2.3)) contains two sources of discontinuity: one comes from the score function and the other from the threshold indicator $I(q > \gamma)$. A natural question is why we smooth the criterion function only partially, leaving the threshold indicator unsmoothed, instead of smoothing both or leaving both unsmoothed. There are several reasons for this choice.

First, as in many other semi-parametric estimation procedures, profiling out the infinite-dimensional nuisance parameter $H$ to recover the finite-dimensional parameters of central interest involves non-parametric estimation based on some local smoothing technique. Smoothing is therefore necessary for our method.

Second, as will become evident later, maximizing a partially smoothed criterion function has no adverse effect on either the convergence rate or the limiting distribution of the resulting estimator. By contrast, double smoothing forces us to deal with two kernel functions and two sequences of smoothing parameters, which complicates matters considerably. Worse, it appears much more difficult to derive the limiting distribution under double smoothing.

Third, even with partial smoothing, the MISE of $\gamma$ is shown to converge at rate $n^{-1}$, which is likely the fastest rate achievable.

On the other hand, smoothing does not come for free. The more smoothing is used, the more stringent the smoothness conditions we must impose on the data generating process, even though additional smoothing cannot deliver a faster convergence rate.
In practice, the MISE can be computed in two steps:
1. For fixed $\gamma$, obtain $\hat\theta_n(\gamma) = \arg\max_{\theta} Q_n(\theta, \gamma)$ subject to a scale normalization on the coefficients.
2. Let $\tilde Q_n(\gamma) = Q_n(\hat\theta_n(\gamma), \gamma)$ and search over a grid of $\gamma$ values for the $\hat\gamma_n$ that maximizes $\tilde Q_n(\gamma)$.

The MISE of $\theta$ is then defined as $\hat\theta_n = \hat\theta_n(\hat\gamma_n)$.
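The two-step procedure can be sketched in Python with NumPy as follows. This is an illustration, not the author's code:
- the criterion `Q` is any user-supplied function of $(\theta, \gamma)$, for instance a smoothed integrated score;
- the scale normalization restricts the first coefficient to $\{-1, 1\}$, in line with $|\beta_{10}| = 1$ in Assumption 3.3;
- the inner maximization over the remaining coefficients is a plain grid search, chosen only for transparency. In practice a numerical optimizer would replace it;
- the function name `two_step_mise` is hypothetical.

```python
import numpy as np

def two_step_mise(Q, gamma_grid, theta2_grid, theta1_values=(-1.0, 1.0)):
    """Two-step search: profile out theta for each gamma, then grid-search gamma.

    Q           : callable (theta, gamma) -> float
    gamma_grid  : 1-D array of candidate thresholds
    theta2_grid : (m, k) array; each row is a candidate for the non-normalized coefficients
    """
    best_val, best_theta, best_gamma = -np.inf, None, None
    for gamma in gamma_grid:
        # Stage 1: theta_hat(gamma) = argmax_theta Q(theta, gamma) over the candidate set,
        # with the first coefficient normalized to -1 or +1.
        cands = [np.concatenate(([t1], t2)) for t1 in theta1_values for t2 in theta2_grid]
        vals = [Q(th, gamma) for th in cands]
        j = int(np.argmax(vals))
        # Stage 2: keep the gamma that maximizes the profiled criterion Q(theta_hat(gamma), gamma).
        if vals[j] > best_val:
            best_val, best_theta, best_gamma = vals[j], cands[j], gamma
    return best_theta, best_gamma
```

Returning the pair $(\hat\theta_n(\hat\gamma_n), \hat\gamma_n)$ directly mirrors the definition $\hat\theta_n = \hat\theta_n(\hat\gamma_n)$ in the text.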
The MISE proposed above can be readily extended to the case of random censoring. Without loss of generality, assume that the observations consist of a random sample of $(y, t, x, q)$, where $y = \min(y^*, C)$ …
$$\cdots > a) \mid x, q) = G(a)\,E\big(I(y^* > a) \mid x, q\big), \qquad (2.8)$$
where $G(\cdot) = 1 - F_C(\cdot)$ and $F_C(\cdot)$ is the distribution function of $C$. Observation (2.8), together with the procedure defined above for the fixed censoring case, suggests defining the MISE of $(\theta, \gamma)$ under random censoring as
$$(\hat\theta_{rn}, \hat\gamma_{rn}) = \arg\max_{\theta \in \Theta,\, \gamma \in \Gamma} Q_{rn}(\theta, \gamma), \qquad (2.9)$$
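where
$$Q_{rn}(\theta, \gamma) = \int \frac{1}{n}\sum_{i=1}^{n}\left(\frac{d_i(a)}{G_n(a)} - \tau\right) K\!\left(\frac{X_i(\gamma)'\theta - H_{rn}(a, \theta, \gamma)}{h_n}\right)\omega(a)\,da, \qquad (2.10)$$
$G_n(\cdot)$ is the Kaplan–Meier estimator of the survival function $G(\cdot)$, and $H_{rn}(a, \theta, \gamma) = \arg\max_{c} Q_{rn}(a, \theta, \gamma, c)$, with
$$Q_{rn}(a, \theta, \gamma, c) = \frac{1}{n}\sum_{i=1}^{n}\left(\frac{d_i(a)}{G_n(a)} - \tau\right) K\!\left(\frac{X_i(\gamma)'\theta - c}{h_n}\right). \qquad (2.11)$$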
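The weighted indicators $d_i(a)/G_n(a)$ in (2.10) and (2.11) require the Kaplan–Meier estimator of the censoring survival function $G$. A minimal NumPy sketch follows, under these assumptions:
- $t_i = 0$ marks a censored observation, consistent with the counting process $M_i(v) = I(y_i \le v, t_i = 0)$ used in Theorem 3.3, so censoring times play the role of events when estimating $G$;
- ties are handled in the simplest way;
- the weights are defined only for $a$ with $G_n(a) > 0$.

The function names are hypothetical.

```python
import numpy as np

def km_censoring_survival(y, t):
    """Kaplan-Meier estimate of G(a) = Pr(C > a) from observed (y_i, t_i).

    Assumed convention: t_i = 0 means observation i is censored (y_i = C_i),
    t_i = 1 means y_i = y*_i was observed. Returns a step function a -> G_n(a).
    """
    y = np.asarray(y, dtype=float)
    t = np.asarray(t)
    times = np.unique(y[t == 0])                                   # distinct censoring times
    at_risk = np.array([np.sum(y >= s) for s in times], dtype=float)
    events = np.array([np.sum((y == s) & (t == 0)) for s in times], dtype=float)
    surv = np.cumprod(1.0 - events / at_risk)                      # G_n at and after each time

    def G_n(a):
        k = np.searchsorted(times, a, side="right")                # number of censoring times <= a
        return 1.0 if k == 0 else float(surv[k - 1])

    return G_n

def ipcw_indicator(y, t, a):
    """Weighted indicators d_i(a) / G_n(a) entering (2.10)-(2.11)."""
    G_a = km_censoring_survival(y, t)(a)
    if G_a <= 0.0:
        raise ValueError("G_n(a) = 0: weights are undefined at this a")
    return (np.asarray(y, dtype=float) > a) / G_a
```

For example, with $y = (1, 2, 3, 4)$ and $t = (1, 0, 1, 0)$, the estimate is $G_n(2.5) = 2/3$, so the weighted indicators at $a = 2.5$ are $(0, 0, 1.5, 1.5)$.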
3. LARGE SAMPLE PROPERTIES
This section is organized as follows. Section 3.1 introduces the regularity conditions needed for the asymptotic analysis of the MISE with fixed censoring, and Section 3.2 presents the large sample properties of this estimator. The extension to random censoring is summarized in Section 3.3.
3.1. Assumptions
It is well known (e.g. Horowitz, 1996) that for models subject to an unknown transformation, $\beta_0$ can be identified only up to scale. Moreover, $x$ should contain at least one component whose distribution conditional on the remaining components is absolutely continuous with respect to Lebesgue measure. Thus we arrange the components of $x = (x_1, x_2')'$ … $\beta_{20}')'$, with $|\beta_{10}| = 1$ and $\beta_{20}$ a $(p_x - 1)$-dimensional coefficient vector for $x_2$. To analyse the MISE formally, we introduce the following assumptions:
ASSUMPTION 3.1. (a) $\{(y_i, x_i, q_i),\ i = 1, \ldots, n\}$ is a random sample of $(y, x, q)$ generated by (1.2) and (1.3); (b) $H(\cdot)$ is a strictly increasing function.
ASSUMPTION 3.2. (a) For almost every $q$, the support of $x$ under the distribution of $x$ conditional on $q$ is not contained in any proper linear subspace of $\mathbb{R}^{p_x}$; (b) the distribution of $x_1$ conditional on $x_2$ and $q$ has a positive density almost everywhere with respect to Lebesgue measure; (c) $q$ is continuously distributed with support containing $\gamma_0$.
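ASSUMPTION 3.3. (a) $|\beta_{10}| = 1$; (b) $\delta_0 \ne 0$; (c) $\Pr(z'\delta_0 \ne 0 \mid q = \gamma_0) > 0$; (d) $\Gamma$ is a compact set on the real line that contains $\gamma_0$ in its interior; $\Theta = \{-1, 1\} \times \Theta_2$ is a compact set of $\mathbb{R}^{p_x + p_z}$ that contains $\theta_0 = (\beta_0', \delta_0')'$ … $\cdots_n = o_p(n^{-1/2})$, $\varepsilon_{1n}\varepsilon_{2n} = o_p(n^{-1/2})$, where $\varepsilon_{1n} = h_n^{\nu} + (\ln n/(nh_n))^{1/2}$ and $\varepsilon_{2n} = h_n^{\nu} + (\ln n/(nh_n^3))^{1/2}$.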
Denote
$$Q(\theta, \gamma) = \int E[(I(y > a) - \tau)\, I(X \cdots$$
… $\cdots_2')'$, so that $\delta_1$ is the coefficient on $z_{d1}(\gamma) = x_1 I(q > \gamma)$ and $\delta_2$ is the remaining $(p_z - 1)$-dimensional coefficient vector. Otherwise, $z_2 \equiv z$, $\delta_2 = \delta$ and $\delta_1 \equiv 0$. Let $\beta_{10}(q) = 1 + \delta_{10} I(q > \gamma_0)$, $\theta_2 = (\beta_2', \delta')'$, $\theta_{20} = (\beta_{20}', \delta_0')'$ and $X_2 = (x_2', z_{d2}')'$.
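Define the Hessian-type matrix
$$\Omega = \left.\frac{\partial^2 Q(\theta, \gamma)}{\partial\theta_2\,\partial\theta_2'}\right|_{\theta = \theta_0,\, \gamma = \gamma_0} = \int \left[\Omega_{xx}(a) - \Omega_{oo}(a)\,\frac{\partial H(a, \theta_0, \gamma_0)}{\partial\theta_2}\,\frac{\partial H(a, \theta_0, \gamma_0)}{\partial\theta_2'}\right]\omega(a)\,da, \qquad (3.3)$$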
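where
$$\Omega_{xx}(a) = E\left[\frac{1}{\beta_{10}(q)}\, f_{ux_1|x_2q}\!\left(0, \frac{H(a) - X_2'\theta_{20}}{\beta_{10}(q)} \,\Big|\, x_2, q\right) X_2X_2'\right], \qquad (3.4)$$
$$\Omega_{oo}(a) = E\left[\frac{1}{\beta_{10}(q)}\, f_{ux_1|x_2q}\!\left(0, \frac{H(a) - X_2'\theta_{20}}{\beta_{10}(q)} \,\Big|\, x_2, q\right)\right]. \qquad (3.5)$$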
ASSUMPTION 3.7. The matrix $\Omega$ is positive definite.
Assumption 3.5 contains a set of smoothness conditions that parallels Chen's (2010) Assumption 5. It imposes boundedness and smoothness on the probability density of two continuous random components, namely $u$ and $x_1$, conditional on the other components. Assumption 3.6 places restrictions on the kernel function $K(\cdot)$ and the related sequence of bandwidths, paralleling Chen's (2010) Assumptions 6 and 7. In particular, it requires $nh_n^4/\ln^2 n \to \infty$ and $nh_n^{2\nu} \to 0$, which together require $\nu > 2$. Assumption 3.7 is analogous to the standard positive definiteness condition on the information matrix of a maximum likelihood estimator.
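The bandwidth conditions can be checked for the polynomial rate $h_n = n^{-\alpha}$, an illustrative assumption not made in the text. The condition $nh_n^4/\ln^2 n \to \infty$ holds if and only if $\alpha < 1/4$, and $nh_n^{2\nu} \to 0$ holds if and only if $\alpha > 1/(2\nu)$. An admissible $\alpha$ therefore exists exactly when $1/(2\nu) < 1/4$, that is, when $\nu > 2$. The hypothetical helper below returns the admissible open interval of exponents for an integer $\nu$.

```python
from fractions import Fraction

def admissible_exponents(nu):
    """Open interval (lo, hi) of alpha such that h_n = n**(-alpha) satisfies
    n h_n^4 / ln^2 n -> infinity  (alpha < 1/4) and
    n h_n^(2 nu) -> 0             (alpha > 1/(2 nu)).
    Returns None when no such alpha exists (which happens iff nu <= 2)."""
    lo, hi = Fraction(1, 2 * nu), Fraction(1, 4)
    return (lo, hi) if lo < hi else None
```

For $\nu = 4$ this gives $\alpha \in (1/8, 1/4)$. For $\nu = 2$ the interval is empty, matching the requirement $\nu > 2$ stated above.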
3.2. Limiting distributions of MISE
We begin with the consistency result.
THEOREM 3.1. Under Assumptions 3.1–3.4, $(\hat\theta_n, \hat\gamma_n)$ as defined by (2.6) is consistent, with $\hat\theta_{1n} = \theta_{10}$ with probability approaching one as $n \to \infty$.
In line with recent results on generalized threshold regression models (Pons, 2003, Kosorok and Song, 2007), the MISE of the threshold parameter is shown to have an irregular, non-normal limiting distribution. The estimator of the remaining parameters, by contrast, is asymptotically normal. To describe the asymptotic behaviour formally, we define the following right-continuous jump process. Let $f_q(\cdot)$ denote the probability density function of $q$. Let $N^+$ and … $N^-(s)$ is a Poisson random variable with parameter $-sf_q(\gamma_0)$ for $s < 0$, and $N^-(s) = 0$ for $s \ge 0$. In addition, let $\{\zeta_l^+ : l = 0, 1, 2, \ldots\}$ and $\{\zeta_l^- : l = 0, 1, 2, \ldots\}$ be two independent sequences of i.i.d. random variables with characteristic functions
$$\phi^+(t) = E\left[\exp\left(\sqrt{-1}\,t\int (d(a) - \tau)\,[I(x'\beta_0 > H(a)) - I(x'\beta_0 + z'\delta_0 > H(a))]\,\omega(a)\,da\right) \Big|\, q = \gamma_0^+\right]$$
and
$$\phi^-(t) = E\left[\exp\left(\sqrt{-1}\,t\int (d(a) - \tau)\,[I(x'\beta_0 + z'\delta_0 > H(a)) - I(x'\beta_0 > H(a))]\,\omega(a)\,da\right) \Big|\, q = \gamma_0^-\right],$$
respectively. Here $E(\cdot \mid q = \gamma_0^+) = \lim_{v \to 0^+} E(\cdot \mid q = \gamma_0 + v)$, $E(\cdot \mid q = \gamma_0^-) = \lim_{v \to 0^+} E(\cdot \mid q = \gamma_0 - v)$ and $\zeta_0^+ = \zeta_0^- = 0$. Denote $Q(s) = Q^+(s)I(s > 0) + Q^-(s)I(s < 0)$, where $Q^+(s) = \sum_{0 \le l \le N^+(s)} \zeta_l^+$ and $Q^-(s) = \sum_{0 \le l \le N^-(s)} \zeta_l^-$.
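The following theorem establishes the limiting distribution of the MISE with fixed censoring.
THEOREM 3.2. Under Assumptions 3.1–3.7, we have, as $n \to \infty$,
$$n(\hat\gamma_n - \gamma_0) \xrightarrow{d} \inf\Big\{\arg\max_{s \in \mathbb{R}} Q(s)\Big\}$$
and
$$\sqrt{n}(\hat\theta_{2n} - \theta_{20}) \xrightarrow{d} N(0, \Omega^{-1}V\Omega^{-1}),$$
with $V = E\psi_i\psi_i'$ and
$$\psi_i = \left[X_{2i} - \frac{\Omega_{xo}(H^{-1}(X_i'\theta_0))}{\Omega_{oo}(H^{-1}(X_i'\theta_0))}\right](I(u_i < 0) - \tau)\,\frac{\omega(H^{-1}(X_i'\theta_0))}{H'(H^{-1}(X_i'\theta_0))},$$
where
$$\Omega_{xo}(a) = E\left[\frac{1}{\beta_{10}(q)}\, f_{ux_1|x_2q}\!\left(0, \frac{H(a) - X_2'\theta_{20}}{\beta_{10}(q)} \,\Big|\, x_2, q\right) X_2\right].$$
In addition, $\hat\gamma_n$ and $\hat\theta_{2n}$ are asymptotically independent.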
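The limit in Theorem 3.2 can be approximated by simulation once draws of the jumps $\zeta_l^{\pm}$ are available. The NumPy sketch below generates one path of the two-sided compound Poisson process $Q(s)$ and returns the leftmost maximizer $\inf\{\arg\max_s Q(s)\}$. The path is truncated to $[-S, S]$, a truncation that is not part of the text. Because $Q$ is piecewise constant and right-continuous, this infimum is the left endpoint of the first constant piece that attains the maximum.

The jump samplers `draw_plus` and `draw_minus` are hypothetical placeholders. In an application, the jump distribution (characterized in the text through $\phi^{\pm}$) would have to be estimated or simulated from the data, and the intensity $f_q(\gamma_0)$ would be replaced by an estimate.

```python
import numpy as np

def leftmost_argmax_cpp(rate, draw_plus, draw_minus, S, rng):
    """One draw of inf{argmax_s Q(s)} for Q(s) = Q+(s)I(s>0) + Q-(s)I(s<0) on [-S, S].

    rate       : intensity f_q(gamma_0) of both Poisson counting processes
    draw_plus  : callable (size, rng) -> i.i.d. jumps zeta^+ (hypothetical placeholder)
    draw_minus : callable (size, rng) -> i.i.d. jumps zeta^- (hypothetical placeholder)
    """
    # Jump locations on each side: homogeneous Poisson process on (0, S).
    t_pos = np.sort(rng.uniform(0.0, S, rng.poisson(rate * S)))
    t_neg = np.sort(rng.uniform(0.0, S, rng.poisson(rate * S)))  # |s| for jumps at s < 0
    v_pos = np.cumsum(draw_plus(t_pos.size, rng))    # value of Q+ on [t_k, t_{k+1})
    v_neg = np.cumsum(draw_minus(t_neg.size, rng))   # value of Q- after k jumps left of 0
    # Constant pieces from left to right, each indexed by its left endpoint:
    # [-S, -t_K), [-t_K, -t_{K-1}), ..., [-t_1, t_1) where Q = 0, then [t_k, t_{k+1}).
    left = np.concatenate(([-S], -t_neg[::-1], t_pos))
    vals = np.concatenate((v_neg[::-1], [0.0], v_pos))
    # np.argmax returns the first maximizer, i.e. the leftmost maximizing piece.
    return float(left[int(np.argmax(vals))])
```

Repeating the call many times yields draws $z$ of the limit of $n(\hat\gamma_n - \gamma_0)$. Their empirical quantiles would give approximate interval endpoints $\hat\gamma_n - z_{1-\alpha/2}/n$ and $\hat\gamma_n - z_{\alpha/2}/n$.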
Statistical inference based on the MISE is discussed in Section 4. From Theorem 3.2, the MISE $\hat\gamma_n$ converges faster than the estimators of the remaining parameters. Its limiting distribution is non-standard and depends on nuisance parameters in a complex way. This asymptotic behaviour of $\hat\gamma_n$ closely resembles results in the recent literature on generalized threshold models, for example Pons (2003) and Kosorok and Song (2007). The asymptotic independence between $\hat\gamma_n$ and $\hat\theta_{2n}$ has two implications. First, the limiting distribution of $\hat\theta_{2n}$ can be derived as if $\gamma_0$ were known, a fact already exploited in our proof. Second, once a consistent estimator of $\gamma_0$ is available, inference about $\theta_{20}$ is essentially identical to that for a transformation model with no threshold effect.
3.3. Random censoring case
Large sample properties of the MISE under random censoring can be established in the same way as for fixed censoring. To present them formally, we make an additional assumption.
ASSUMPTION 3.8. $\{(y_i, t_i, x_i, q_i),\ i = 1, \ldots, n\}$ is a random sample of $(y, t, x, q)$ generated by (1.2) and (2.7). $H(\cdot)$ is a strictly increasing function. The censoring variable $C$ is independent of $(y^*, \ldots$ … $Q_r^-(s)I(s < 0)$, where $Q_r^+(s) = \sum_{0 \le l \le N^+(s)} \zeta_{rl}^+$ and $Q_r^-(s) = \sum_{0 \le l \le N^-(s)} \zeta_{rl}^-$. Here $\{\zeta_{rl}^+ : l = 0, 1, 2, \ldots\}$ and $\{\zeta_{rl}^- : l = 0, 1, 2, \ldots\}$ are two independent sequences of i.i.d. random variables with characteristic functions
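$$\phi_r^+(t) = E\left[\exp\left(\sqrt{-1}\,t\int \left(\frac{d(a)}{G(a)} - \tau\right)[I(x'\beta_0 > H(a)) - I(x'\beta_0 + z'\delta_0 > H(a))]\,\omega(a)\,da\right)\Big|\, q = \gamma_0^+\right]$$
and
$$\phi_r^-(t) = E\left[\exp\left(\sqrt{-1}\,t\int \left(\frac{d(a)}{G(a)} - \tau\right)[I(x'\beta_0 + z'\delta_0 > H(a)) - I(x'\beta_0 > H(a))]\,\omega(a)\,da\right)\Big|\, q = \gamma_0^-\right],$$
respectively.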
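THEOREM 3.3. Under Assumptions 3.2–3.8, $(\hat\theta_{rn}, \hat\gamma_{rn})$ as defined by (2.9) is consistent, with $\hat\theta_{1,rn} = \theta_{10}$ with probability approaching one as $n \to \infty$. As $n \to \infty$, we have
$$n(\hat\gamma_{rn} - \gamma_0) \xrightarrow{d} \inf\Big\{\arg\max_{s \in \mathbb{R}} Q_r(s)\Big\}$$
and
$$\sqrt{n}(\hat\theta_{2,rn} - \theta_{20}) \xrightarrow{d} N(0, \Omega^{-1}V_r\Omega^{-1}),$$
where $V_r = E\psi_{ri}\psi_{ri}'$, $\psi_{ri} = \psi_{1,ri} + \psi_{2,ri}$,
$$\psi_{1,ri} = \left[X_{2i} - \frac{\Omega_{xo}(H^{-1}(X_i'\theta_0))}{\Omega_{oo}(H^{-1}(X_i'\theta_0))}\right]\left(\frac{I(H(y_i) > X_i'\theta_0)}{G(H^{-1}(X_i'\theta_0))} - \tau\right)\frac{\omega(H^{-1}(X_i'\theta_0))}{H'(H^{-1}(X_i'\theta_0))},$$
and $\psi_{2,ri} = \int \frac{\Gamma_{rx}(v)}{\pi(v)}\,dM_i(v)$. Here $\pi(v) = \Pr(y \ge v)$, $M_i(v) = I(y_i \le v, t_i = 0) - \int_0^v I(y_i \ge s)\,d\Lambda(s)$, $\Lambda(\cdot)$ is the cumulative hazard function of $C$,
$$\Phi_{rx}(a) = E\left[\frac{1}{\beta_{10}(q)}\left(X_2 - \frac{\Omega_{xo}(a)}{\Omega_{oo}(a)}\right) f_{x_1|x_2q}\!\left(\frac{H(a) - X_2'\theta_{20}}{\beta_{10}(q)} \,\Big|\, x_2, q\right)\right],$$
and
$$\Gamma_{rx}(v) = \int \frac{\Phi_{rx}(a)}{G(a)}\, I(a \ge v)\,\omega(a)\,da.$$
In addition, $\hat\gamma_{rn}$ and $\hat\theta_{2,rn}$ are asymptotically independent.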
Statistical inference based on $(\hat\theta_{2,rn}, \hat\gamma_{rn})$ is discussed in Section 4. Compared with the fixed censoring case, the asymptotic linear representation of the estimator contains an extra term $\psi_{2,ri}$. This term arises from the Kaplan–Meier estimator $G_n(\cdot)$ in the criterion function (2.10).
4. INFERENCE
4.1. Inference on γ
According to Theorems 3.2 and 3.3, the asymptotic distribution of $\hat\gamma_n$ is highly non-standard. It cannot be tabulated because it depends on nuisance parameters in a complex way. Similar issues arise in, for example, Lee and Seo (2008) and Kosorok and Song (2007). Even so, two methods can be used to carry out large sample inference.

First, subsampling provides a consistent inferential method for the asymptotic distribution of $\hat\gamma_n$, as in Gonzalo and Wolf (2005). Confidence intervals can be constructed following the standard subsampling procedure; see, for example, Politis et al. (1999).

The second approach seems more practical but is somewhat informal. Its motivation is that the irregular limiting distribution of $\hat\gamma_n$ can be largely attributed to the discontinuity of the threshold indicator $I(q > \gamma_0)$. To restore regularity, we can replace $I(q > \gamma)$ with $K_2((q - \gamma)/h_{2n})$, where $K_2$ is another integrated kernel function and $h_{2n}$ is another sequence of bandwidths. This idea is not new: Seo and Linton (2007) use it to develop a smoothed LS estimator for model (1.1) within the mean regression framework. Although this method enables standard normal inference on $\hat\gamma_n$, it may cause the estimator to converge more slowly.
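The subsampling approach can be sketched as follows for an estimator that converges at rate $n$, as $\hat\gamma_n$ does under Theorems 3.2 and 3.3. This is a generic NumPy illustration under stated assumptions, not the procedure of Gonzalo and Wolf (2005) verbatim:
- it draws `n_sub` random subsets of size $b$ without replacement instead of enumerating all of them;
- it uses the rate $b$ for the subsample roots and inverts the resulting quantiles.

The argument `estimator` stands for any routine returning $\hat\gamma$, for example the two-step MISE. The function name is hypothetical.

```python
import numpy as np

def subsampling_ci(data, estimator, b, n_sub, alpha, rng):
    """Equal-tailed subsampling interval for a parameter estimated at rate n.

    data      : array whose first axis indexes observations
    estimator : callable data -> scalar estimate (e.g. gamma_hat)
    b         : subsample size, with b -> infinity and b/n -> 0
    n_sub     : number of random subsamples (stochastic approximation)
    """
    n = data.shape[0]
    est = estimator(data)
    roots = np.empty(n_sub)
    for j in range(n_sub):
        idx = rng.choice(n, size=b, replace=False)      # subsample without replacement
        roots[j] = b * (estimator(data[idx]) - est)     # rate-b root b(est_b - est_n)
    q_lo, q_hi = np.quantile(roots, [alpha / 2.0, 1.0 - alpha / 2.0])
    # Approximate n(est - gamma_0) by the distribution of the roots and invert.
    return est - q_hi / n, est - q_lo / n
```

For a quick check, one can pass `np.max` applied to uniform data, since the sample maximum also converges at rate $n$.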
4.2. Inference on regular parameters
In comparison with $\hat\gamma_n$, inference on $\hat\theta_n$ is fairly standard given its … are asymptotically independent, $\hat\theta_n$ has the same limiting distribution as it would have if $\gamma_0$ were known. The inferential procedure for $\hat\theta_n$ is therefore essentially identical to that for a transformation model with a change point known to the researcher.
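The limit of $\hat\theta_{2n}$ is normal with sandwich covariance $\Omega^{-1}V\Omega^{-1}$. Standard errors therefore follow once estimates of $\Omega$ and of the influence terms $\psi_i$ are available, for instance from the plug-in formulas given next in the text. The NumPy sketch below assumes those estimates have already been computed and are passed in as arrays. The function name is hypothetical.

```python
import numpy as np

def sandwich_se(Omega_hat, psi_hat):
    """Standard errors for theta_2 from sqrt(n)(theta_hat - theta_0) -> N(0, Omega^{-1} V Omega^{-1}).

    Omega_hat : (k, k) array, estimate of Omega (assumed invertible)
    psi_hat   : (n, k) array whose i-th row is psi_hat_i
    """
    psi_hat = np.asarray(psi_hat, dtype=float)
    n = psi_hat.shape[0]
    V_hat = psi_hat.T @ psi_hat / n                        # V_hat = (1/n) sum_i psi_i psi_i'
    Omega_inv = np.linalg.inv(np.asarray(Omega_hat, dtype=float))
    avar = Omega_inv @ V_hat @ Omega_inv.T                 # covariance of the sqrt(n)-scaled limit
    return np.sqrt(np.diag(avar) / n)                      # standard errors of theta_hat itself
```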
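To be specific, first consider the case with fixed censoring. It follows from the proof of Theorem 3.2 that $\Omega$ can be consistently estimated by
$$\hat\Omega_n = \frac{\partial^2 Q_n(\hat\theta_n, \hat\gamma_n)}{\partial\theta_2\,\partial\theta_2'} = \frac{1}{nh_n^2}\int\sum_{i=1}^{n}(d_i(a) - \tau)\,k'\!\left(\frac{X_i(\hat\gamma_n)'\hat\theta_n - H_n(a, \hat\theta_n, \hat\gamma_n)}{h_n}\right)\left[X_{2i}(\hat\gamma_n)X_{2i}(\hat\gamma_n)' - \frac{\partial H_n(a, \hat\theta_n, \hat\gamma_n)}{\partial\theta_2}\frac{\partial H_n(a, \hat\theta_n, \hat\gamma_n)}{\partial\theta_2'}\right]\omega(a)\,da,$$
where $X_2(\gamma) = (x_2', z_{d2}(\gamma)')'$ and $H_n(a, \hat\theta_n, \hat\gamma_n)$ is determined by
$$\frac{1}{nh_n}\sum_{i=1}^{n}(d_i(a) - \tau)\,k\!\left(\frac{X_i(\hat\gamma_n)'\hat\theta_n - H_n(a, \hat\theta_n, \hat\gamma_n)}{h_n}\right) = 0,$$
and
$$\frac{\partial H_n(a, \theta, \gamma)}{\partial\theta_2} = \left[\frac{1}{nh_n^2}\sum_{i=1}^{n}(d_i(a) - \tau)\,k'\!\left(\frac{X_i(\gamma)'\theta - H_n(a, \theta, \gamma)}{h_n}\right)\right]^{-1}\left[\frac{1}{nh_n^2}\sum_{i=1}^{n}(d_i(a) - \tau)\,k'\!\left(\frac{X_i(\gamma)'\theta - H_n(a, \theta, \gamma)}{h_n}\right)X_{2i}(\gamma)\right].$$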
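For the estimation of $V$, define
$$\hat\psi_i = \int (d_i(a) - \tau)\,\frac{1}{h_n}\,k\!\left(\frac{X_i(\hat\gamma_n)'\hat\theta_n - H_n(a, \hat\theta_n, \hat\gamma_n)}{h_n}\right)\left[X_{2i}(\hat\gamma_n) - \frac{\hat\Omega_{xo}(a)}{\hat\Omega_{oo}(a)}\right]\omega(a)\,da,$$
where
$$\hat\Omega_{xo}(a) = \frac{1}{nh_n^2}\sum_{i=1}^{n}(d_i(a) - \tau)\,k'\!\left(\frac{X_i(\hat\gamma_n)'\hat\theta_n - H_n(a, \hat\theta_n, \hat\gamma_n)}{h_n}\right)X_{2i}(\hat\gamma_n),$$
$$\hat\Omega_{oo}(a) = \frac{1}{nh_n^2}\sum_{i=1}^{n}(d_i(a) - \tau)\,k'\!\left(\frac{X_i(\hat\gamma_n)'\hat\theta_n - H_n(a, \hat\theta_n, \hat\gamma_n)}{h_n}\right).$$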
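Following the arguments in Powell et al. (1989), we can show that $\sum_{i=1}^{n}\|\hat\psi_i - \psi_i\|^2 = o_p(1)$ and consequently $\hat V_n = \frac{1}{n}\sum_{i=1}^{n}$ …
$$\hat\Omega_{rn} = \frac{\partial^2 Q_{rn}(\hat\theta_{rn}, \hat\gamma_{rn})}{\partial\theta_2\,\partial\theta_2'} = \frac{1}{nh_n^2}\int\sum_{i=1}^{n}\left(\frac{d_i(a)}{G_n(a)} - \tau\right)k'\!\left(\frac{X_i(\hat\gamma_{rn})'\hat\theta_{rn} - H_{rn}(a, \hat\theta_{rn}, \hat\gamma_{rn})}{h_n}\right)\left[X_{2i}(\hat\gamma_{rn})X_{2i}(\hat\gamma_{rn})' - \frac{\partial H_{rn}(a, \hat\theta_{rn}, \hat\gamma_{rn})}{\partial\theta_2}\frac{\partial H_{rn}(a, \hat\theta_{rn}, \hat\gamma_{rn})}{\partial\theta_2'}\right]\omega(a)\,da,$$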
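where $H_{rn}(a, \hat\theta_{rn}, \hat\gamma_{rn})$ is determined by
$$\frac{1}{nh_n}\sum_{i=1}^{n}\left(\frac{d_i(a)}{G_n(a)} - \tau\right)k\!\left(\frac{X_i(\hat\gamma_{rn})'\hat\theta_{rn} - H_{rn}(a, \hat\theta_{rn}, \hat\gamma_{rn})}{h_n}\right) = 0,$$
and
$$\frac{\partial H_{rn}(a, \theta, \gamma)}{\partial\theta_2} = \left[\frac{1}{nh_n^2}\sum_{i=1}^{n}\left(\frac{d_i(a)}{G_n(a)} - \tau\right)k'\!\left(\frac{X_i(\gamma)'\theta - H_{rn}(a, \theta, \gamma)}{h_n}\right)\right]^{-1}\left[\frac{1}{nh_n^2}\sum_{i=1}^{n}\left(\frac{d_i(a)}{G_n(a)} - \tau\right)k'\!\left(\frac{X_i(\gamma)'\theta - H_{rn}(a, \theta, \gamma)}{h_n}\right)X_{2i}(\gamma)\right].$$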
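For the estimation of $V_r$, define $\hat\psi_{ri} = \hat\psi_{1,ri} + \hat\psi_{2,ri}$, with
$$\hat\psi_{1,ri} = \int\left(\frac{d_i(a)}{G_n(a)} - \tau\right)\frac{1}{h_n}\,k\!\left(\frac{X_i(\hat\gamma_{rn})'\hat\theta_{rn} - H_{rn}(a, \hat\theta_{rn}, \hat\gamma_{rn})}{h_n}\right)\left[X_{2i}(\hat\gamma_{rn}) - \frac{\hat\Omega_{r,xo}(a)}{\hat\Omega_{r,oo}(a)}\right]\omega(a)\,da,$$
$$\hat\psi_{2,ri} = \int\frac{\hat\Gamma_{rx}(v)}{\hat\pi(v)}\,d\hat M_i(v),$$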
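where
$$\hat\Omega_{r,xo}(a) = \frac{1}{nh_n^2}\sum_{i=1}^{n}\left(\frac{d_i(a)}{G_n(a)} - \tau\right)k'\!\left(\frac{X_i(\hat\gamma_{rn})'\hat\theta_{rn} - H_{rn}(a, \hat\theta_{rn}, \hat\gamma_{rn})}{h_n}\right)X_{2i}(\hat\gamma_{rn}),$$
$$\hat\Omega_{r,oo}(a) = \frac{1}{nh_n^2}\sum_{i=1}^{n}\left(\frac{d_i(a)}{G_n(a)} - \tau\right)k'\!\left(\frac{X_i(\hat\gamma_{rn})'\hat\theta_{rn} - H_{rn}(a, \hat\theta_{rn}, \hat\gamma_{rn})}{h_n}\right),$$
$$\hat\Phi_{rx}(a) = \frac{1}{nh_n}\sum_{i=1}^{n} d_i(a)\,k\!\left(\frac{X_i(\hat\gamma_{rn})'\hat\theta_{rn} - H_{rn}(a, \hat\theta_{rn}, \hat\gamma_{rn})}{h_n}\right)\left[X_{2i}(\hat\gamma_{rn}) - \frac{\hat\Omega_{r,xo}(a)}{\hat\Omega_{r,oo}(a)}\right],$$
$$\hat\Gamma_{rx}(v) = \int\frac{\hat\Phi_{rx}(a)}{G_n(a)}\,I(a \ge v)\,\omega(a)\,da, \qquad \hat\pi(v) = n^{-1}\sum_{i=1}^{n} I(y_i \ge v),$$
$$\hat M_i(v) = I(y_i \le v, t_i = 0) - \int_0^v I(y_i \ge s)\,d\hat\Lambda_n(s),$$
and $\hat\Lambda_n(\cdot)$ is the Nelson estimator of the cumulative hazard function of $C$. Following the arguments in Powell et al. (1989), we can show that $\sum_{i=1}^{n}\|\hat\psi_{ri} - \psi_{ri}\|^2 = o_p(1)$ and consequently $\hat V_{rn} = \frac{1}{n}\sum_{i=1}^{n}\hat\psi_{ri}\hat\psi_{ri}'$ …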
… $Q_n(\theta, \gamma)$ defined below (2.6), define
$$\tilde Q_n(\gamma) = Q_n(\hat\theta_n(\gamma), \gamma), \qquad \hat\theta_n(\gamma) = \arg\max_{\theta} Q_n(\theta, \gamma),$$
and
$$\bar Q_n = \max_{\beta \in B,\, \delta = 0} Q_n(\theta, \gamma), \qquad (4.1)$$
where $B$ is a compact set containing $\beta_0$ in its interior. Notice that (4.1) is well defined, since $Q_n(\theta, \gamma)$ does not depend on $\gamma$ when $\delta = 0$. The sup-LR statistic is then defined as
$$QLR_n = \sup_{\gamma \in \Gamma} n\big(\tilde Q_n(\gamma) - \bar Q_n\big).$$
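Given the profiled criterion $\tilde Q_n(\gamma)$ on a grid of $\gamma$ values and the restricted maximum from (4.1), the sup-LR statistic is a one-line computation. The hypothetical helper below also reports which grid point attains the supremum. Replacing the supremum over $\Gamma$ by a maximum over a finite grid is an approximation assumed here.

```python
import numpy as np

def sup_lr(n, Q_profile, Q_restricted):
    """QLR_n = max over a gamma grid of n * (Q_tilde_n(gamma) - Q_bar_n).

    Q_profile    : values of the profiled criterion Q_n(theta_hat_n(gamma), gamma) on the grid
    Q_restricted : restricted maximum of Q_n with delta = 0, cf. (4.1)
    Returns the statistic and the index of the maximizing grid point.
    """
    Q_profile = np.asarray(Q_profile, dtype=float)
    j = int(np.argmax(Q_profile))          # grid approximation to the sup over Gamma
    return n * (Q_profile[j] - Q_restricted), j
```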
The limiting distribution of $QLR_n$ under the null is highly non-standard and non-normal, so it cannot be tabulated. Instead, we can carry out the following steps to simulate the null distribution:
STEP 1. For a sufficiently large $J$, generate i.i.d. random variables $v_{ij}$, $i = 1, \ldots, n$ and $j = 1, \ldots, J$, from the uniform distribution on $[0, 1]$.
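STEP 2. Simulate the following unrestricted and restricted score functions, respectively:
$$G_j(\gamma) = \frac{1}{nh_n}\sum_{i=1}^{n}\int\Big[I\big(X_i(\gamma)'\hat\theta_n(\gamma) - H_n(a, \hat\theta_n(\gamma), \gamma) > v_{ij}\big) - \tau\Big]\,k\!\left(\frac{X_i(\gamma)'\hat\theta_n(\gamma) - H_n(a, \hat\theta_n(\gamma), \gamma)}{h_n}\right)\left[X_{2i}(\gamma) - \frac{\hat\Omega_{xo}(a, \gamma)}{\hat\Omega_{oo}(a, \gamma)}\right]\omega(a)\,da$$
and
$$G_{1,j} = \frac{1}{nh_n}\sum_{i=1}^{n}\int\Big[I\big(x_i'\tilde\beta_n - H_n(a, \tilde\beta_n) > v_{ij}\big) - \tau\Big]\,k\!\left(\frac{x_i'\tilde\beta_n - H_n(a, \tilde\beta_n)}{h_n}\right)\left[x_{2i} - \frac{\hat\Omega_{xo}(a)}{\hat\Omega_{oo}(a)}\right]\omega(a)\,da,$$
where $\hat\theta_n(\gamma) = \arg\max_{\theta} Q_n(\theta, \gamma)$ and $\tilde\beta_n = \arg\max_{\delta = 0,\, \beta \in B} Q_n(\theta, \gamma)$.³ The profiled transformation $H_n(a, \hat\theta_n(\gamma), \gamma)$ is determined by
$$\frac{1}{nh_n}\sum_{i=1}^{n}(d_i(a) - \tau)\,k\!\left(\frac{X_i(\gamma)'\hat\theta_n(\gamma) - H_n(a, \hat\theta_n(\gamma), \gamma)}{h_n}\right) = 0,$$
and $H_n(a, \tilde\beta_n)$ is determined by
$$\frac{1}{nh_n}\sum_{i=1}^{n}(d_i(a) - \tau)\,k\!\left(\frac{x_i'\tilde\beta_n - H_n(a, \tilde\beta_n)}{h_n}\right) = 0.$$
The remaining terms are
$$\hat\Omega_{xo}(a, \gamma) = \frac{1}{nh_n^2}\sum_{i=1}^{n}(d_i(a) - \tau)\,k'\!\left(\frac{X_i(\gamma)'\hat\theta_n(\gamma) - H_n(a, \hat\theta_n(\gamma), \gamma)}{h_n}\right)X_{2i}(\gamma),$$
$$\hat\Omega_{oo}(a, \gamma) = \frac{1}{nh_n^2}\sum_{i=1}^{n}(d_i(a) - \tau)\,k'\!\left(\frac{X_i(\gamma)'\hat\theta_n(\gamma) - H_n(a, \hat\theta_n(\gamma), \gamma)}{h_n}\right),$$
$$\hat\Omega_{xo}(a) = \frac{1}{nh_n^2}\sum_{i=1}^{n}(d_i(a) - \tau)\,k'\!\left(\frac{x_i'\tilde\beta_n - H_n(a, \tilde\beta_n)}{h_n}\right)x_{2i}$$
and
$$\hat\Omega_{oo}(a) = \frac{1}{nh_n^2}\sum_{i=1}^{n}(d_i(a) - \tau)\,k'\!\left(\frac{x_i'\tilde\beta_n - H_n(a, \tilde\beta_n)}{h_n}\right).$$

³ Notice that $Q_n(\theta, \gamma)$ depends only on $\beta$ when $\delta = 0$.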
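STEP 3. Further define
$$\tilde\Omega(\gamma) = \frac{1}{nh_n^2}\int\sum_{i=1}^{n}(d_i(a) - \tau)\,k'\!\left(\frac{X_i(\gamma)'\hat\theta_n(\gamma) - H_n(a, \hat\theta_n(\gamma), \gamma)}{h_n}\right)\left[X_{2i}(\gamma)X_{2i}(\gamma)' - \frac{\partial H_n(a, \hat\theta_n(\gamma), \gamma)}{\partial\theta_2}\frac{\partial H_n(a, \hat\theta_n(\gamma), \gamma)}{\partial\theta_2'}\right]\omega(a)\,da$$
and
$$\tilde\Omega_1 = \frac{1}{nh_n^2}\int\sum_{i=1}^{n}(d_i(a) - \tau)\,k'\!\left(\frac{x_i'\tilde\beta_n - H_n(a, \tilde\beta_n)}{h_n}\right)\left[x_{2i}x_{2i}' - \frac{\partial H_n(a, \tilde\beta_n)}{\partial\beta_2}\frac{\partial H_n(a, \tilde\beta_n)}{\partial\beta_2'}\right]\omega(a)\,da,$$
where
$$\frac{\partial H_n(a, \hat\theta_n(\gamma), \gamma)}{\partial\theta_2} = \left[\frac{1}{nh_n^2}\sum_{i=1}^{n}(d_i(a) - \tau)\,k'\!\left(\frac{X_i(\gamma)'\hat\theta_n(\gamma) - H_n(a, \hat\theta_n(\gamma), \gamma)}{h_n}\right)\right]^{-1}\left[\frac{1}{nh_n^2}\sum_{i=1}^{n}(d_i(a) - \tau)\,k'\!\left(\frac{X_i(\gamma)'\hat\theta_n(\gamma) - H_n(a, \hat\theta_n(\gamma), \gamma)}{h_n}\right)X_{2i}(\gamma)\right]$$
and
$$\frac{\partial H_n(a, \tilde\beta_n)}{\partial\beta_2} = \left[\frac{1}{nh_n^2}\sum_{i=1}^{n}(d_i(a) - \tau)\,k'\!\left(\frac{x_i'\tilde\beta_n - H_n(a, \tilde\beta_n)}{h_n}\right)\right]^{-1}\left[\frac{1}{nh_n^2}\sum_{i=1}^{n}(d_i(a) - \tau)\,k'\!\left(\frac{x_i'\tilde\beta_n - H_n(a, \tilde\beta_n)}{h_n}\right)x_{2i}\right].$$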
STEP 4. Finally, the simulated null distribution is readily derived from $\{D_j\}_{j=1,\ldots,J}$, with
$$D_j = \sup_{\gamma \in \Gamma}\frac{1}{2}\left[G_j(\gamma)'\tilde\Omega^{-1}(\gamma)G_j(\gamma) - G_{1,j}'\tilde\Omega_1^{-1}G_{1,j}\right].$$
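Steps 2 to 4 reduce to linear algebra once the simulated scores and the matrices $\tilde\Omega(\gamma)$ and $\tilde\Omega_1$ have been computed on a grid of $\gamma$ values. The NumPy sketch below assumes these are stored as arrays, with shapes documented in the code. It computes $D_1, \ldots, D_J$, taking the supremum over the grid. It then returns a simulated p-value for an observed $QLR_n$, defined as the fraction of draws at least as large. The function names and the p-value convention are assumptions for illustration.

```python
import numpy as np

def simulated_null_draws(G, Omega, G1, Omega1):
    """D_j = sup_gamma 0.5 * [G_j(g)' Omega(g)^{-1} G_j(g) - G1_j' Omega1^{-1} G1_j].

    G      : (J, m, k) simulated unrestricted scores on a grid of m gamma values
    Omega  : (m, k, k) matrices Omega_tilde(gamma) on the same grid
    G1     : (J, k1) simulated restricted scores
    Omega1 : (k1, k1) matrix Omega_tilde_1
    """
    G, Omega = np.asarray(G, dtype=float), np.asarray(Omega, dtype=float)
    G1, Omega1 = np.asarray(G1, dtype=float), np.asarray(Omega1, dtype=float)
    # Omega(g)^{-1} G_j(g) for every draw j and grid point g, shape (J, m, k).
    sol = np.linalg.solve(Omega[None, ...], G[..., None])[..., 0]
    quad = np.einsum("jgk,jgk->jg", G, sol)
    # Omega1^{-1} G_{1,j} for every draw, shape (J, k1).
    sol1 = np.linalg.solve(Omega1, G1.T).T
    quad1 = np.einsum("jk,jk->j", G1, sol1)
    # The restricted term does not depend on gamma, so the sup acts on the first term only.
    return 0.5 * (quad.max(axis=1) - quad1)

def simulated_p_value(stat, D):
    """Fraction of simulated draws D_j at least as large as the observed statistic."""
    return float(np.mean(np.asarray(D) >= stat))
```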
5. MONTE CARLO SIMULATION
In this section, we conduct a small-scale Monte Carlo experiment to evaluate the finite sample performance of the proposed MISE for a generalized threshold regression model. A larger Monte Carlo study covering a wider set of experiments is left for future research.

Table 1. Simulation results for fixed censoring.

n = 200                  λ = 0.6                          λ = 2
                τ = 0.2   τ = 0.4   τ = 0.6     τ = 0.2   τ = 0.4   τ = 0.6
γ    Mean       0.4782    0.4892    0.4825      0.4913    0.4838    0.4920
     SD         0.0943    0.0831    0.0857      0.0683    0.0653    0.0662
β    Mean       1.6438    1.2017    0.7063      1.6853    1.2134    0.7182
     SD         0.1662    0.1482    0.1505      0.1453    0.1384    0.1425
δ    Mean       1.1134    1.1483    1.1653      1.1337    1.1597    1.1511
     SD         0.2328    0.2136    0.2194      0.2014    0.1878    0.1907

n = 400                  λ = 0.6                          λ = 2
                τ = 0.2   τ = 0.4   τ = 0.6     τ = 0.2   τ = 0.4   τ = 0.6
γ    Mean       0.4827    0.4936    0.4878      0.4926    0.4885    0.4852
     SD         0.0496    0.0423    0.0448      0.0362    0.0349    0.0351
β    Mean       1.7280    1.2236    0.7349      1.7032    1.2393    0.7324
     SD         0.1223    0.1197    0.1172      0.1134    0.1051    0.1114
δ    Mean       1.1253    1.1062    1.1339      1.0982    1.1121    1.0942
     SD         0.1743    0.1610    0.1651      0.1524    0.1428    0.1448

The data generating process used throughout the simulation is given by
$$H(y^*, \lambda) = x_1\beta_0 + x_2 + x_1\delta_0 I(q > \gamma_0) - x_1 \cdots,$$
where $H(y, \lambda) = (|y|^{\lambda}\,\mathrm{sgn}(y) - 1)/\lambda$, $\gamma_0 = 0.5$ and $\beta_0 = \delta_0 = 1$. As the coefficient on $x_2$ is normalized to one, we consider only the estimation of $\gamma_0$, $\beta_0$ and $\delta_0$. Since the model contains conditional heteroscedasticity associated with $x_1$, only the coefficient on $x_1$ is expected to vary across quantile indices.
First consider the case with fixed censoring. We let $y = \min(y^*$ …, resulting in about 30% censoring. Consistent with Chen (2010), we find that the MISE is insensitive to the choice of both the bandwidth $h$ and the weighting function $\omega(\cdot)$. We therefore report only the results with $h = 0.3$ for $n = 400$ and $h = 0.4$ for $n = 200$; other choices of $h$ give similar results. Throughout the simulation, we use the following fourth-order integrated kernel function (Müller, 1984):
$$K(v) = \left[\frac{1}{2} + \frac{105}{64}\left(v - \frac{5}{3}v^3 + \frac{7}{5}v^5 - \frac{3}{7}v^7\right)\right]I(|v| \le 1) + I(v > 1)$$
and choose $\omega(c) = I(\underline{c} \le c \le \bar c)$, where $\underline{c}$ and $\bar c$ are the 0.05-th and 0.65-th quantiles of $y$, respectively. The MISE is computed by a two-step algorithm with a grid search between the 0.15-th and 0.85-th empirical quantiles of the $q_i$'s. We consider $\lambda \in \{0.6, 2\}$, $n \in \{200, 400\}$ and $\tau \in \{0.2, 0.4, 0.6\}$, with 1000 replications for each case. We report the empirical mean and empirical standard deviation of each estimator.

Table 2. Simulation results for random censoring.

n = 200                  λ = 0.6                          λ = 2
                τ = 0.2   τ = 0.4   τ = 0.6     τ = 0.2   τ = 0.4   τ = 0.6
γ    Mean       0.4743    0.4793    0.4838      0.4832    0.4892    0.4847
     SD         0.1146    0.1027    0.1041      0.0834    0.0880    0.0867
β    Mean       1.6647    1.2163    0.7134      1.6935    1.2196    0.7252
     SD         0.1835    0.1636    0.1670      0.1604    0.1581    0.1576
δ    Mean       1.1456    1.1642    1.1552      1.1481    1.1623    1.1684
     SD         0.2453    0.2274    0.2262      0.2185    0.2092    0.2075

n = 400                  λ = 0.6                          λ = 2
                τ = 0.2   τ = 0.4   τ = 0.6     τ = 0.2   τ = 0.4   τ = 0.6
γ    Mean       0.4774    0.4853    0.4879      0.4884    0.4902    0.4924
     SD         0.0553    0.0570    0.0569      0.0472    0.0451    0.0436
β    Mean       1.7083    1.2242    0.7358      1.7165    1.2323    0.7357
     SD         0.1384    0.1352    0.1348      0.1250    0.1236    0.1228
δ    Mean       1.0782    1.0848    1.0773      1.0647    1.0934    1.0925
     SD         0.1854    0.1796    0.1769      0.1687    0.1562    0.1532
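The fourth-order integrated kernel used in the simulations, and its derivative $k = K'$, can be coded directly; $k$ is the function that enters the first-order conditions and variance formulas of Section 4. The NumPy sketch below is an illustration, not the author's code; the names `K` and `k` simply match the notation. A quick numerical check confirms three properties expected of a fourth-order kernel: $k$ integrates to one, its second moment vanishes, and its fourth moment does not.

```python
import numpy as np

def K(v):
    """Fourth-order integrated kernel: polynomial on |v| <= 1, 0 for v < -1, 1 for v > 1."""
    v = np.asarray(v, dtype=float)
    poly = 0.5 + 105.0 / 64.0 * (v - 5.0 / 3.0 * v**3 + 7.0 / 5.0 * v**5 - 3.0 / 7.0 * v**7)
    return np.where(np.abs(v) <= 1.0, poly, (v > 1.0).astype(float))

def k(v):
    """Derivative k = K' = (105/64)(1 - 5v^2 + 7v^4 - 3v^6) on |v| <= 1, zero outside."""
    v = np.asarray(v, dtype=float)
    return np.where(np.abs(v) <= 1.0, 105.0 / 64.0 * (1.0 - 5.0 * v**2 + 7.0 * v**4 - 3.0 * v**6), 0.0)

# Riemann-sum check of the moments of k on [-1, 1].
grid = np.linspace(-1.0, 1.0, 200001)
dv = grid[1] - grid[0]
mass = np.sum(k(grid)) * dv                 # approximately 1
second = np.sum(grid**2 * k(grid)) * dv     # approximately 0
fourth = np.sum(grid**4 * k(grid)) * dv     # approximately -1/33, nonzero
```

Analytically, the fourth moment equals $-1/33$, so $k$ is exactly of order four.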
Table 1 summarizes the simulation results, and there are several main findings.
- The estimates of $\gamma$ are essentially unbiased and precise. Their empirical standard deviations decline quickly with the sample size, at a rate that appears consistent with the $n$-asymptotics.
- The estimates of $\beta$ change with the quantile index, as expected.
- The estimates of $\delta$ usually have a larger bias than those of $\beta$, but the bias appears to decline with the sample size. Their empirical standard deviations also decline with the sample size, by a magnitude generally consistent with the $\sqrt{n}$-asymptotics.
We also consider the case with random censoring. We adopt the same design, except that the observations consist of $(y, t, x, q)$ with $y = \min(y^*$ …, respectively, resulting in about 25% censoring of the dependent variable. Again, we report the empirical mean and standard deviation for each case, and the results are summarized in Table 2. Overall, the estimators perform well and follow a pattern similar to the fixed censoring case.
6. CONCLUSION
In this paper, we consider the maximum integrated score estimation of a generalized threshold regression model in which the dependent variable is subject to both an unknown monotone transformation and some type of censoring. A major advantage of our MISE is that it handles the model under a conditional quantile restriction. This provides some robustness against conditional heteroscedasticity of arbitrary form. We formally establish the large sample properties of the proposed estimator. The estimator of the threshold parameter is shown to converge at rate $n^{-1}$ and to converge weakly to a non-normal distribution. It is also asymptotically independent of the estimators of the remaining parameters. All these findings are consistent with existing theoretical results on threshold regression models. We discuss MISE-based inference in detail, and simulation results indicate that our estimator performs well in finite samples.
ACKNOWLEDGMENTS
I am grateful to the co-editor Oliver Linton and three anonymous referees for their constructive
comments and suggestions. I am also grateful to Pingfang Zhu for many useful discussions. The
research is supported by the National Science Foundation (Grant No. 71271139).
REFERENCES
Andersen, P. K., O. Borgan, R. D. Gill and N. Keiding (1993). Statistical Models Based on Counting
Processes. New York, NY: Springer.
Asparouhova, E., R. Golanski, K. Kasprzyk, R. P. Sherman and T. Asparouhov (2002). Rank estimators for a transformation model. Econometric Theory 18, 1099–120.
Caner, M. and B. E. Hansen (2004). Instrumental variable estimation of a threshold model. Econometric Theory 20, 813–43.
Chan, K. S. (1993). Consistency and limiting distribution of the least squares estimator of a threshold autoregressive model. Annals of Statistics 21, 520–33.
Chen, S. (2010). An integrated maximum score estimator for a generalized censored quantile regression model. Journal of Econometrics 155, 90–8.
Cho, J. S. and H. White (2007). Testing for regime switching. Econometrica 75, 1671–720.
Collier, P. and J. W. Gunning (1999). Explaining African economic performance. Journal of Economic Literature 37, 64–111.
Davies, R. B. (1977). Hypothesis testing when a nuisance parameter is present only under the alternative. Biometrika 64, 247–54.
Delgado, M. A. and J. Hidalgo (2000). Nonparametric inference on structural breaks. Journal of Econometrics 96, 113–44.
Fazzari, S. M., R. Glenn Hubbard and B. C. Petersen (1988). Financing constraints and corporate investment. Brookings Papers on Economic Activity 19, 141–95.
Fleming, T. R. and D. P. Harrington (1991). Counting Processes and Survival Analysis. New York, NY: John Wiley.
Gauthier, B. and M. Gersowitz (1997). Revenue erosion through exemption and evasion in Cameroon. Journal of Public Economics 64, 407–24.
Gonzalo, J. and M. Wolf (2005). Subsampling inference in threshold autoregressive models. Journal of Econometrics 127, 201–24.
Hansen, B. E. (1999). Threshold effects in non-dynamic panels: estimation, testing and inference. Journal of Econometrics 93, 345–68.
Hansen, B. E. (2000). Sample splitting and threshold estimation. Econometrica 68, 575–603.
Horowitz, J. L. (1992). A smoothed maximum score estimator for the binary response model. Econometrica 60, 505–31.
Horowitz, J. L. (1996). Semiparametric estimation of a regression model with an unknown transformation of the dependent variable. Econometrica 64, 103–37.
Jespersen, N. C. B. (1986). Dichotomizing a continuous covariate in the Cox model. Research Report 86/02, Statistical Research Unit, University of Copenhagen.
Kosorok, M. R. and R. Song (2007). Inference under right censoring for transformation models with a change-point based on a covariate threshold. Annals of Statistics 35, 957–89.
Lancaster, T. (1990). The Econometric Analysis of Transition Data. Cambridge: Cambridge University Press.
Lee, S. and M. Seo (2008). Semi-parametric estimation of a binary response model with a change-point due to a covariate threshold. Journal of Econometrics 144, 492–99.
Lee, S., M. Seo and Y. Shin (2011). Testing for threshold effects in regression models. Journal of the American Statistical Association 106, 220–31.
Liang, K., S. Self and X. Liu (1990). The Cox proportional hazards model with change point: an epidemiologic application. Biometrics 46, 783–93.
Luo, X., B. Turnbull and L. Clark (1997). Likelihood ratio tests for a change point with survival data. Biometrika 84, 555–65.
Manski, C. F. (1985). Semiparametric analysis of discrete response: Asymptotic properties of the maximum score estimator. Journal of Econometrics 27, 313–33.
Newey, W. K. and D. McFadden (1994). Large sample estimation and hypothesis testing. In R. F. Engle and D. McFadden (Eds.), Handbook of Econometrics Volume IV, 2111–245. Amsterdam: North-Holland.
Pakes, A. and D. Pollard (1989). Simulation and the asymptotics of optimization estimators. Econometrica 57, 1027–57.
Politis, D., J. Romano and M. Wolf (1999). Subsampling. New York, NY: Springer.
Pons, O. (2003). Estimation in a Cox regression model with a change-point according to a threshold in a covariate. Annals of Statistics 31, 442–63.
Powell, J. L., J. H. Stock and T. M. Stoker (1989). Semiparametric estimation of weighted average derivatives. Econometrica 57, 1403–30.
Seo, M. and O. Linton (2007). A smoothed least squares estimator for the threshold regression. Journal of Econometrics 141, 704–35.
Van den Berg, G. J. (2001). Duration models: Specification, identification and multiple durations. In J. J. Heckman and E. Leamer (Eds.), Handbook of Econometrics Volume 5, 3381–460. Amsterdam: North-Holland.
Van der Vaart, A. and J. Wellner (1996). Weak Convergence and Empirical Processes. New York, NY: Springer.
Ye, J. and N. Duan (1997). Nonparametric $n^{-1/2}$-consistent estimation for the general transformation models. Annals of Statistics 25, 2682–717.
APPENDIX A: LEMMAS
LEMMA A.1. Let $\hat H_n(a,\gamma,\theta)$ and $H(a,\gamma,\theta)$ be defined by (2.5) and (3.2), respectively, in the text. Under Assumptions 3.1–3.6, we have
$$\hat H_n(a,\gamma,\theta) - H(a,\gamma,\theta) = O_p(\varepsilon_{1n})$$
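and
$$\frac{\partial \hat H_n(a,\gamma,\theta)}{\partial\theta_2} - \frac{\partial H(a,\gamma,\theta)}{\partial\theta_2} = O_p(\varepsilon_{2n}),$$
uniformly over $a\in(a_1,a_2)$ and $(\gamma,\theta)\in\Theta$. In addition,
$$\hat H_n(a,\gamma,\theta) - H(a,\gamma,\theta) = \psi(a,\gamma,\theta)^{-1}\,\frac{1}{n}\sum_{i=1}^n (d_i(a)-\tau)\,k\Big(\frac{X_i(\gamma)'\theta - H(a,\gamma,\theta)}{h_n}\Big) + o_p(n^{-1/2})$$
uniformly over $a\in(a_1,a_2)$ and $(\gamma,\theta)\in\Theta$, where
$$\psi(a,\gamma,\theta) = E\Big[\Phi\Big(x_2'\Big(\frac{\beta_{20}}{\vartheta_{10}(q)}\vartheta_1(q,\gamma,\delta_1) - \beta_2\Big) + z_2'\Big(I(q>\gamma_0)\frac{\delta_{20}}{\vartheta_{10}(q)}\vartheta_1(q,\gamma,\delta_1) - I(q>\gamma)\delta_2\Big) - \Big(\frac{H(a)}{\vartheta_{10}(q)}\vartheta_1(q,\gamma,\delta_1) - H(a,\gamma,\theta)\Big),\ \frac{H(a,\gamma,\theta) - X_2(\gamma)'\theta_2}{\vartheta_1(q,\gamma,\delta_1)},\ x_2,\ q\Big)\Big]$$
with $\vartheta_{10}(q) = 1 + \delta_{10}I(q>\gamma_0)$, $\vartheta_1(q,\gamma,\delta_1) = 1 + \delta_1 I(q>\gamma)$ and
$$\Phi(s_1,s_2,x_2,q) = \frac{1}{\vartheta_1(q,\gamma,\delta_1)}\,\frac{d}{dv}F_{u|xq}\Big(\frac{\vartheta_{10}(q)v}{\vartheta_1(q,\gamma,\delta_1)} + s_1\,\Big|\,\frac{v}{\vartheta_1(q,\gamma,\delta_1)} + s_2, x_2, q\Big)\,f_{x_1|x_2q}\Big(\frac{v}{\vartheta_1(q,\gamma,\delta_1)} + s_2\,\Big|\,x_2, q\Big)\Big|_{v=0}.$$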
Proof: The proof is essentially the same as that of Chen's (2010) Lemma A.1.
LEMMA A.2. For any random variables $(v,q)$ satisfying $E[v|q]=0$ and $E[v^2|q]<\infty$ almost surely, assume that $(v_i,q_i)$, $i=1,\ldots,n$, is a random sample of $(v,q)$ and that $q$ is continuously distributed and has a bounded, continuous, positive density in a neighbourhood of $r_0\in\mathbb{R}$. Then for each $A>0$ and $\varepsilon>0$, there exists a positive constant $B<\infty$ such that for all $0<\eta<1$ and for all $n>B/\eta$,
$$\Pr\Big(\sup_{B/n<r<\eta}\Big|\frac{\frac{1}{n}\sum_{i=1}^n I(r_0<q_i<r_0+r)}{\Pr(r_0<q_i<r_0+r)} - 1\Big| > A\Big) < \varepsilon$$
and
$$\Pr\Big(\sup_{B/n<r<\eta}\Big|\frac{\frac{1}{n}\sum_{i=1}^n v_i I(r_0<q_i<r_0+r)}{\Pr(r_0<q_i<r_0+r)}\Big| > A\Big) < \varepsilon.$$
Proof: See Lee and Seo's (2008) Lemma A.1.
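As a purely numerical illustration of the first inequality in Lemma A.2 (not part of the proof), the following Python sketch draws $q_i$ from a Uniform(0, 1) distribution. This is an assumption made only so that $\Pr(r_0<q<r_0+r)=r$ is known exactly. The sketch reports the largest deviation of the normalized count from one over a grid of $r$ values in $(B/n,\eta]$. The function name and the grid are illustrative choices.

```python
import random

def max_ratio_deviation(n, r0, B, eta, num_r=50, seed=0):
    """Largest |(1/n) sum_i I(r0 < q_i < r0 + r) / Pr(r0 < q < r0 + r) - 1|
    over a grid of r in (B/n, eta].

    Assumes q ~ Uniform(0, 1) and r0 + eta <= 1, so that Pr(r0 < q < r0 + r) = r.
    """
    rng = random.Random(seed)
    qs = [rng.random() for _ in range(n)]
    lo, hi = B / n, eta
    worst = 0.0
    for k in range(1, num_r + 1):
        r = lo + (hi - lo) * k / num_r  # grid points strictly above B/n
        count = sum(1 for q in qs if r0 < q < r0 + r)
        worst = max(worst, abs(count / (n * r) - 1.0))
    return worst

# Windows that contain many observations (large B) keep the ratio close to one.
print(max_ratio_deviation(n=20000, r0=0.3, B=2000, eta=0.2, seed=1))
```

With a small $B$, the supremum also covers windows that hold only a handful of observations, where the ratio fluctuates much more. This is why the lemma only controls $r>B/n$.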
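LEMMA A.3. Under Assumptions 3.1–3.6,
$$\frac{1}{n}\sum_{i=1}^n (d_i(a)-\tau)K\Big(\frac{X_i(\gamma)'\theta - c}{h_n}\Big) - \frac{1}{n}\sum_{i=1}^n (d_i(a)-\tau)I(X_i(\gamma)'\theta - c > 0) \to 0$$
almost surely, uniformly over $a\in(a_1,a_2)$, $(\gamma,\theta)\in\Theta$ and $c\in\mathcal{G}$.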
Proof: Since $|d_i(a)-\tau|$ is uniformly bounded over $a\in(a_1,a_2)$, the result essentially follows from Horowitz's (1992) Lemma 4.
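Lemma A.3 says that replacing the indicator by the smooth function $K(\cdot/h_n)$ becomes harmless as $h_n\to 0$. Below is a minimal Python sketch of the two sample scores. It assumes a standard normal CDF for $K$ and a hypothetical deterministic design in which the index values $X_i(\gamma)'\theta$ and the outcomes $d_i(a)$ are fixed in advance.

```python
import math

def normal_cdf(s):
    """Standard normal CDF, used here as the smooth function K (an assumption)."""
    return 0.5 * (1.0 + math.erf(s / math.sqrt(2.0)))

def score_gap(index, d, tau, c, h):
    """(1/n) sum (d_i - tau) K((index_i - c)/h)  minus  (1/n) sum (d_i - tau) I(index_i - c > 0)."""
    n = len(index)
    smooth = sum((di - tau) * normal_cdf((xi - c) / h) for xi, di in zip(index, d)) / n
    sharp = sum((di - tau) * (1.0 if xi - c > 0 else 0.0) for xi, di in zip(index, d)) / n
    return smooth - sharp

# Hypothetical design: index values on a grid that avoids c = 0, outcomes in a fixed pattern.
index = [-0.995 + 0.01 * k for k in range(200)]
d = [1.0 if k % 3 else 0.0 for k in range(200)]
for h in (0.5, 0.05, 1e-4):
    print(h, score_gap(index, d, tau=0.5, c=0.0, h=h))
```

Only observations whose index lies within a few bandwidths of $c$ contribute to the gap, so the gap vanishes once $h$ is small relative to the distance between $c$ and the nearest index value.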
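LEMMA A.4. For given $\varepsilon>0$, define $\Theta_\varepsilon=\{(\gamma,\theta):|\gamma-\gamma_0|+\|\theta-\theta_0\|<\varepsilon\}$. Then there exist a sufficiently small $\varepsilon>0$ and $\eta>0$ such that
$\sup_{(\gamma,\theta)\in\Theta_\varepsilon}E\{(d(a)-\tau)[I(x'\beta+z'$ …
$E\{(d(a)-\tau)[I(x'\beta+z'$ …
… $(\beta-\beta_0)+z'(\delta-\delta_0)|$ is sufficiently small, $|x'(\beta-\beta_0)+z'(\delta-\delta_0)+H(a,\gamma,\theta)-H(a)|$ is also sufficiently small since $H(a,\gamma_0,\theta_0)=H(a)$ and $H(a,\gamma,\theta)$ is continuous in $\gamma$ and $\theta$ almost surely.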
LEMMA A.5. Under Assumptions 3.1–3.6, $n(\hat\gamma_n-\gamma_0)=O_p(1)$.
Proof: We prove this result by showing that there exists an $n^{-1}$-neighbourhood of $\gamma_0$ such that $Q_n(\gamma,\theta)-Q_n(\gamma_0,\theta)<0$ with probability approaching one for all $\gamma$ lying outside this neighbourhood and for all $\theta$ lying in a small neighbourhood of $\theta_0$. Consider
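$$\begin{aligned}
Q_n(\gamma,\theta)-Q_n(\gamma_0,\theta)
&= \int\frac{1}{n}\sum_{i=1}^n(d_i(a)-\tau)\Big[K\Big(\frac{X_i(\gamma)'\theta-\hat H_n(a,\gamma,\theta)}{h_n}\Big)-K\Big(\frac{X_i'\theta-\hat H_n(a,\gamma_0,\theta)}{h_n}\Big)\Big]\omega(a)\,da\\
&= \int\frac{1}{n}\sum_{i=1}^n(d_i(a)-\tau)\Big[K\Big(\frac{X_i(\gamma)'\theta-H(a,\gamma,\theta)}{h_n}\Big)-K\Big(\frac{X_i'\theta-H(a,\gamma_0,\theta)}{h_n}\Big)\Big]\omega(a)\,da-R_{1n}+R_{2n}\\
&= \int\frac{1}{n}\sum_{i=1}^n(d_i(a)-\tau)\big[I(X_i(\gamma)'\theta-H(a,\gamma,\theta)>0)-I(X_i'\theta-H(a,\gamma_0,\theta)>0)\big]\omega(a)\,da-R_{1n}+R_{2n}+R_{3n}-R_{4n},
\end{aligned}$$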
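where the second equality follows from a Taylor expansion, with
$$R_{1n}=\int(\hat H_n(a,\gamma,\theta)-H(a,\gamma,\theta))\frac{1}{nh_n}\sum_{i=1}^n(d_i(a)-\tau)\,k\Big(\frac{X_i(\gamma)'\theta-\bar H_n(a,\gamma,\theta)}{h_n}\Big)\omega(a)\,da,$$
$$R_{2n}=\int(\hat H_n(a,\gamma_0,\theta)-H(a,\gamma_0,\theta))\frac{1}{nh_n}\sum_{i=1}^n(d_i(a)-\tau)\,k\Big(\frac{X_i'\theta-\bar H_n(a,\gamma_0,\theta)}{h_n}\Big)\omega(a)\,da,$$
$$R_{3n}=\int\frac{1}{n}\sum_{i=1}^n(d_i(a)-\tau)\Big[K\Big(\frac{X_i(\gamma)'\theta-H(a,\gamma,\theta)}{h_n}\Big)-I(X_i(\gamma)'\theta-H(a,\gamma,\theta)>0)\Big]\omega(a)\,da$$
and
$$R_{4n}=\int\frac{1}{n}\sum_{i=1}^n(d_i(a)-\tau)\Big[K\Big(\frac{X_i'\theta-H(a,\gamma_0,\theta)}{h_n}\Big)-I(X_i'\theta-H(a,\gamma_0,\theta)>0)\Big]\omega(a)\,da,$$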
where $\bar H_n(a,\gamma,\theta)$ lies between $\hat H_n(a,\gamma,\theta)$ and $H(a,\gamma,\theta)$, and $\bar H_n(a,\gamma_0,\theta)$ lies between $\hat H_n(a,\gamma_0,\theta)$ and $H(a,\gamma_0,\theta)$. It follows from Lemma A.1 that both $R_{1n}$ and $R_{2n}$ are $O_p(\varepsilon_{1n})$ uniformly over $a\in(a_1,a_2)$ and $(\gamma,\theta)\in\Theta$. In addition, it follows from Lemma A.3 and the uniform boundedness and non-negativity of $\omega(a)$ that both $R_{3n}$ and $R_{4n}$ are $o(1)$ almost surely, uniformly over $(\gamma,\theta)\in\Theta$.
Suppose that $\gamma>\gamma_0$. If $q_i>\gamma$ or $q_i<\gamma_0$, there always holds
$$I(X_i(\gamma)'\theta-H(a,\gamma,\theta)>0)-I(X_i'\theta-H(a,\gamma_0,\theta)>0)=0 \qquad (A.1)$$
because when $q_i>\gamma$ or $q_i<\gamma_0$, (a) $X_i(\gamma)=X_i$ and (b) $H(a,\gamma,\theta)=H(a,\gamma_0,\theta)$, where (b) follows from the fact that, by (3.2), $H(a,\gamma,\theta)$ is totally determined by $d(a)$ and $X(\gamma)$ for any given $a$. Similarly, when $\gamma<\gamma_0$, (A.1) holds if $q_i<\gamma$ or $q_i>\gamma_0$.
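Part (a) of the argument for (A.1) is purely mechanical and can be checked directly. The Python sketch below builds $X_i(\gamma)=(x_i',z_i'I(q_i>\gamma))'$. For $\gamma>\gamma_0$, it confirms that this vector coincides with $X_i(\gamma_0)$ unless $\gamma_0<q_i\le\gamma$. The helper names and the numbers are illustrative.

```python
def design(x, z, q, gamma):
    """X(gamma) = (x', z' * 1{q > gamma})' as a tuple."""
    ind = 1.0 if q > gamma else 0.0
    return tuple(x) + tuple(zj * ind for zj in z)

def same_design(x, z, q, gamma, gamma0):
    """True when X(gamma) == X(gamma0), i.e. moving the threshold does not affect this observation."""
    return design(x, z, q, gamma) == design(x, z, q, gamma0)

x, z = (1.0, -0.3), (2.0,)
gamma0, gamma = 0.5, 0.7
for q in (0.2, 0.5, 0.6, 0.7, 0.9):
    print(q, same_design(x, z, q, gamma, gamma0))
```

Only observations with $q_i$ in $(\gamma_0,\gamma]$ change, which is why the sums below run over that window alone.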
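Let
$$\xi_i^+(\gamma,\theta)=\int(d_i(a)-\tau)\big[I(x_i'\beta-H(a,\gamma,\theta)>0)-I(x_i'\beta+z_i'\delta-H(a,\gamma_0,\theta)>0)\big]\omega(a)\,da$$
and
$$\xi_i^-(\gamma,\theta)=\int(d_i(a)-\tau)\big[I(x_i'\beta+z_i'\delta-H(a,\gamma,\theta)>0)-I(x_i'\beta-H(a,\gamma_0,\theta)>0)\big]\omega(a)\,da.$$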
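Then, based on the analysis above and (A.1), some manipulations give
$$\begin{aligned}
Q_n(\gamma,\theta)-Q_n(\gamma_0,\theta)&=\int\frac{1}{n}\sum_{i=1}^n(d_i(a)-\tau)\big[I(X_i(\gamma)'\theta-H(a,\gamma,\theta)>0)-I(X_i'\theta-H(a,\gamma_0,\theta)>0)\big]\omega(a)\,da-R_{1n}+R_{2n}+R_{3n}-R_{4n}\\
&=\frac{1}{n}\sum_{i=1}^n I(\gamma_0<q_i\le\gamma)\,\xi_i^+(\gamma,\theta)+\frac{1}{n}\sum_{i=1}^n I(\gamma<q_i\le\gamma_0)\,\xi_i^-(\gamma,\theta)+O_p(\varepsilon_{1n})=A_n+B_n+O_p(\varepsilon_{1n}),
\end{aligned}$$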
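where
$$A_n=\frac{1}{n}\sum_{i=1}^n I(\gamma_0<q_i\le\gamma)E(\xi_i^+(\gamma,\theta)|q_i)+\frac{1}{n}\sum_{i=1}^n I(\gamma<q_i\le\gamma_0)E(\xi_i^-(\gamma,\theta)|q_i)$$
and
$$B_n=\frac{1}{n}\sum_{i=1}^n I(\gamma_0<q_i\le\gamma)[\xi_i^+(\gamma,\theta)-E(\xi_i^+(\gamma,\theta)|q_i)]+\frac{1}{n}\sum_{i=1}^n I(\gamma<q_i\le\gamma_0)[\xi_i^-(\gamma,\theta)-E(\xi_i^-(\gamma,\theta)|q_i)].$$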
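Then it follows from the definitions of $\xi_i^+(\gamma,\theta)$ and $\xi_i^-(\gamma,\theta)$, the uniform boundedness of $I(\cdot)$ and Lemma A.4 that $A_n\le-\eta_1$ … with probability approaching one for all $|\gamma-\gamma_0|>B/n$ and for all $(\gamma,\theta)\in\Theta_\varepsilon$, for some sufficiently small $\varepsilon$, with $\eta_1$ a small positive constant, and … with $\eta_2$ another small positive constant. Now notice that both $A_n$ and $B_n$ are independent of $\varepsilon_{1n}$, and $\varepsilon_{1n}\to 0$. Thus it is straightforward to see that $Q_n(\gamma,\theta)-Q_n(\gamma_0,\theta)<0$ with probability approaching one for all $|\gamma-\gamma_0|>B/n$ and for all $(\gamma,\theta)\in\Theta_\varepsilon$ …

LEMMA A.6. … $[Q_n(\gamma,\theta)-Q_n(\gamma_0,\theta)]-[Q_n(\gamma,\theta_0)-Q_n(\gamma_0,\theta_0)]=o_p(n^{-1})$ uniformly over $(\gamma,\theta)\in\Theta_A=\{(\gamma,\theta):n^{1/3}\|\theta-\theta_0\|\le A,\ n|\gamma-\gamma_0|\le A\}$ for some $A>0$.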
Proof: By Lemma A.5 and Lee and Seo (2008), $\hat\gamma_n$ is $n$-consistent and $\hat\theta_n$ is at least $n^{1/3}$-consistent. It therefore suffices to consider an $n^{-1}$-neighbourhood of $\gamma_0$ and an $n^{-1/3}$-neighbourhood of $\theta_0$. Now suppose that $\gamma>\gamma_0$. Reasoning similar to that for (A.1) shows that when $q_i>\gamma$ or $q_i<\gamma_0$,
$$K\Big(\frac{X_i(\gamma)'\theta-\hat H_n(a,\gamma,\theta)}{h_n}\Big)-K\Big(\frac{X_i'\theta-\hat H_n(a,\gamma_0,\theta)}{h_n}\Big)=0. \qquad (A.2)$$
The key insight behind (A.2) is that when $q_i>\gamma$ or $q_i<\gamma_0$, $\hat H_n(a,\gamma,\theta)=\hat H_n(a,\gamma_0,\theta)$ because, according to (2.5), $\hat H_n(a,\gamma,\theta)$ is totally determined by $(d_i(a),X_i(\gamma))$ for $i=1,\ldots,n$, and $X_i(\gamma)=X_i$ for each $i$.
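Based on this observation, we have
$$[Q_n(\gamma,\theta)-Q_n(\gamma_0,\theta)]-[Q_n(\gamma,\theta_0)-Q_n(\gamma_0,\theta_0)]=\frac{1}{n}\sum_{i=1}^n I(\gamma_0<q_i\le\gamma)(\zeta_i^+(\gamma,\theta)-\zeta_i^+(\gamma,\theta_0))+\frac{1}{n}\sum_{i=1}^n I(\gamma<q_i\le\gamma_0)(\zeta_i^-(\gamma,\theta)-\zeta_i^-(\gamma,\theta_0)),$$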
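where
$$\zeta_i^+(\gamma,\theta)=\int(d_i(a)-\tau)\Big[K\Big(\frac{x_i'\beta-\hat H_n(a,\gamma,\theta)}{h_n}\Big)-K\Big(\frac{x_i'\beta+z_i'\delta-\hat H_n(a,\gamma_0,\theta)}{h_n}\Big)\Big]\omega(a)\,da$$
and
$$\zeta_i^-(\gamma,\theta)=\int(d_i(a)-\tau)\Big[K\Big(\frac{x_i'\beta+z_i'\delta-\hat H_n(a,\gamma,\theta)}{h_n}\Big)-K\Big(\frac{x_i'\beta-\hat H_n(a,\gamma_0,\theta)}{h_n}\Big)\Big]\omega(a)\,da.$$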
Then, to prove Lemma A.6, it suffices to show that
$$C_n=\frac{1}{n}\sum_{i=1}^n I(\gamma_0<q_i\le\gamma)(\zeta_i^+(\gamma,\theta)-\zeta_i^+(\gamma,\theta_0))=o_p(n^{-1}) \qquad (A.3)$$
uniformly over $n^{1/3}\|\theta-\theta_0\|\le A$ and $n|\gamma-\gamma_0|\le A$ for some $A>0$. We prove (A.3) using arguments similar to those for Lee and Seo's (2008) Lemma 6.1.
Define
$$\zeta^+(a,\gamma,\theta)=K\Big(\frac{x'\beta-H(a,\gamma,\theta)}{h_n}\Big)-K\Big(\frac{x'\beta+z'\delta-H(a,\gamma_0,\theta)}{h_n}\Big).$$
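Consider a class of functions indexed by $(\gamma,\theta)$ for some positive constant $A$ and $a\in(a_1,a_2)$,
$$\mathcal{M}_A=\{I(\gamma_0<q\le\gamma)(\zeta^+(a,\gamma,\theta)-\zeta^+(a,\gamma,\theta_0)):n^{1/3}\|\theta-\theta_0\|\le A\ \text{and}\ n|\gamma-\gamma_0|\le A\}.$$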
We now show that $\mathcal{M}_A$ is a Vapnik–Chervonenkis (VC) class of functions. First, it is trivial to show that $\mathcal{M}_{A1}=\{I(\gamma_0<q\le\gamma):n|\gamma-\gamma_0|\le A\}$ is a VC class of functions. Then, by Pakes and Pollard's (1989) Lemma 2.5 and $K(s/h)\to I(s>0)$, to show that $\mathcal{M}_A$ is a VC class it suffices to show that
$$\mathcal{M}_{A2}=\{I(x'\beta-H(a,\gamma,\theta)>0):n^{1/3}\|\theta-\theta_0\|\le A\ \text{and}\ n|\gamma-\gamma_0|\le A\}$$
is a VC class, because similar arguments apply to the other components of $(\zeta^+(a,\gamma,\theta)-\zeta^+(a,\gamma,\theta_0))$.
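Recalling the definition (3.2), $H(a,\gamma,\theta)$ is monotone in $x'$ …
$$\sup_{(\gamma,\theta)\in\Theta_A}\big|E\,I(\gamma_0<q\le\gamma)(\zeta^+(\gamma,\theta)-\zeta^+(\gamma,\theta_0))\big|\le\sup_{n|\gamma-\gamma_0|\le A}E(I(\gamma_0<q\le\gamma))\sup_{n^{1/3}\|\theta-\theta_0\|\le A}E\big|\zeta^+(\gamma,\theta)-\zeta^+(\gamma,\theta_0)\big|=O(n^{-1})O(n^{-1/3})=o(n^{-1}).$$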
It follows from the uniform boundedness of $d(a)$ and $\omega(a)$ that
$$D_n\le c_1E\Big[\sup_{(\gamma,\theta)\in\Theta_A}\Big|\frac{1}{n}\sum_{i=1}^n I(\gamma_0<q_i\le\gamma)\zeta_i^+(a,\gamma,\theta)-E\,I(\gamma_0<q_i\le\gamma)\zeta_i^+(a,\gamma,\theta)\Big|\Big]\le c_1c_2\,n^{-1/2}J(1,\mathcal{M}_A)\,E^{1/2}(\bar M_A)^2=O(n^{-7/6})$$
by Theorem 2.14.1 of Van der Vaart and Wellner (1996). Here $c_2<\infty$ is a universal constant, $J(1,\mathcal{M}_A)$ is the uniform entropy integral defined in Van der Vaart and Wellner (1996), which is bounded for a VC class, and $\bar M_A$ is the envelope of $\mathcal{M}_A$.
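The rate in the last display can be traced as follows. This is a sketch that assumes, as the bound implicitly does, that the second moment of the envelope is controlled by the probability of the shrinking window times the $L_1$ size of the bounded difference $\zeta^+(\gamma,\theta)-\zeta^+(\gamma,\theta_0)$:
$$\begin{aligned}
E(\bar M_A)^2 &\lesssim \Pr(\gamma_0<q\le\gamma_0+A/n)\cdot\sup_{n^{1/3}\|\theta-\theta_0\|\le A}E\big|\zeta^+(\gamma,\theta)-\zeta^+(\gamma,\theta_0)\big| = O(n^{-1})\,O(n^{-1/3}) = O(n^{-4/3}),\\
n^{-1/2}E^{1/2}(\bar M_A)^2 &= n^{-1/2}\,O(n^{-2/3}) = O(n^{-7/6}) = o(n^{-1}).
\end{aligned}$$
Under this assumption, the centred term is of smaller order than the $o_p(n^{-1})$ required in (A.3).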
APPENDIX B: PROOFS OF THEOREMS
Proof of Theorem 3.1: Recall $Q_n$ … by verifying the following conditions of Newey and McFadden (1994, Theorem 2.1) in sequence: (a) $\bar Q(\gamma,\theta)$ is uniquely maximized at $(\gamma_0,\theta_0)$; (b) $Q_n(\gamma,\theta)$ is maximized over a compact set that contains the true parameters in its interior; (c) $\bar Q(\gamma,\theta)$ is continuous; (d) $Q_n(\gamma,\theta)$ converges to $\bar Q(\gamma,\theta)$ in probability uniformly over $(\gamma,\theta)\in\Theta$.
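For (a), it first follows from Lee and Seo (2008, proof of Theorem 4.1) that $Q(a,\gamma_0,\theta_0,H(a))-Q(a,\gamma,\theta,H(a,\gamma,\theta))>0$ for any $(\gamma,\theta)\in\Theta$ with $\Theta=\{-1,1\}^2\times\ldots$, $(\gamma,\theta)\ne(\gamma_0,\theta_0)$ and $a\in(a_1,a_2)$. Then it follows from the non-negativity of $\omega(a)$ that
$$\bar Q(\gamma_0,\theta_0)-\bar Q(\gamma,\theta)=\int[Q(a,\gamma_0,\theta_0)-Q(a,\gamma,\theta)]\omega(a)\,da=\int[Q(a,\gamma_0,\theta_0,H(a))-Q(a,\gamma,\theta,H(a,\gamma,\theta))]\omega(a)\,da>0$$
for any $(\gamma,\theta)\in\Theta$ with $(\gamma,\theta)\ne(\gamma_0,\theta_0)$.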
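For (b), it suffices to denote $\Theta=\ldots$ by Assumption 3.3. For (c), $\bar Q(\gamma,\theta)$ is continuous since $I(x'\beta+z'$ … $Q_n(a,\gamma,\theta)-Q(a,\gamma,\theta)=\sup_{c\in\mathcal{G}}Q_n(a,\gamma,\theta,c)-\sup_{c\in\mathcal{G}}Q(a,\gamma,\theta,c)=o_p(1)$
uniformly over $a\in(a_1,a_2)$ and $(\gamma,\theta)\in\Theta$, and moreover,
$$Q_n(\gamma,\theta)-\bar Q(\gamma,\theta)=\int[Q_n(a,\gamma,\theta)-Q(a,\gamma,\theta)]\omega(a)\,da=o_p(1)$$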
uniformly over $(\gamma,\theta)\in\Theta$. Finally, $\hat\theta_{1n}=\theta_{10}$ with probability approaching one follows from the consistency result and Assumption 3.3(a).
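To make the objects in this proof concrete, here is a minimal Python sketch of the sample criterion $Q_n(\gamma,\theta)=\int\sup_c Q_n(a,\gamma,\theta,c)\,\omega(a)\,da$. Definitions (2.5) and (2.10) are not reproduced in this appendix, so the sketch rests on several assumptions:
- $d_i(a)=I(y_i>a)$;
- $K$ is the standard normal CDF;
- $\omega\equiv 1$;
- a finite grid of $c$ values replaces $\mathcal{G}$;
- the integral over $a$ is computed by the trapezoidal rule;
- the data-generating process in `simulate` is hypothetical (identity transformation, no censoring, median-zero errors with $\tau=0.5$).

All function names are illustrative.

```python
import math
import random

def normal_cdf(s):
    """Standard normal CDF, used as the smooth function K (an assumption)."""
    return 0.5 * (1.0 + math.erf(s / math.sqrt(2.0)))

def index_values(theta, gamma, data):
    """X_i(gamma)'theta with X_i(gamma) = (x_i', z_i' 1{q_i > gamma})'."""
    out = []
    for _, x, z, q in data:
        ind = 1.0 if q > gamma else 0.0
        row = list(x) + [zj * ind for zj in z]
        out.append(sum(t * v for t, v in zip(theta, row)))
    return out

def profiled_score(a, idx, ys, tau, h, c_grid):
    """max over c in c_grid of (1/n) sum_i (d_i(a) - tau) K((idx_i - c)/h), with d_i(a) = 1{y_i > a}."""
    w = [(1.0 if y > a else 0.0) - tau for y in ys]
    n = len(ys)
    return max(sum(wi * normal_cdf((xi - c) / h) for wi, xi in zip(w, idx)) / n
               for c in c_grid)

def integrated_criterion(theta, gamma, data, tau, h, a_grid, c_grid, weight=lambda a: 1.0):
    """Trapezoidal approximation of the integral over a of the profiled score times weight(a)."""
    idx = index_values(theta, gamma, data)
    ys = [row[0] for row in data]
    vals = [profiled_score(a, idx, ys, tau, h, c_grid) * weight(a) for a in a_grid]
    return sum(0.5 * (vals[k] + vals[k - 1]) * (a_grid[k] - a_grid[k - 1])
               for k in range(1, len(a_grid)))

def simulate(n, gamma0=0.5, theta0=(1.0, 0.5, 2.0), seed=0):
    """Hypothetical design: identity transformation, no censoring, N(0, 1) errors."""
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        x = (rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))
        z = (rng.gauss(0.0, 1.0),)
        q = rng.random()
        index = theta0[0] * x[0] + theta0[1] * x[1] + theta0[2] * z[0] * (q > gamma0)
        data.append((index + rng.gauss(0.0, 1.0), x, z, q))
    return data

data = simulate(1000, seed=7)
a_grid = [-0.5, -0.25, 0.0, 0.25, 0.5]
c_grid = [-3.0 + 0.1 * k for k in range(61)]
crit = {g: integrated_criterion((1.0, 0.5, 2.0), g, data, tau=0.5, h=0.1,
                                a_grid=a_grid, c_grid=c_grid)
        for g in (0.1, 0.5, 0.9)}
print(crit)
```

In this design the criterion evaluated at the true threshold should exceed its value at misplaced thresholds, in line with condition (a). The sup over $c$ is taken inside the integral, mirroring the definition $Q_n(a,\gamma,\theta)=\sup_{c}Q_n(a,\gamma,\theta,c)$.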
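Proof of Theorem 3.2: The $n$-consistency of $\hat\gamma_n$ follows from Lemma A.5. Write
$$Q_n(\gamma,\theta)-Q_n(\gamma_0,\theta_0)=\underbrace{Q_n(\gamma,\theta_0)-Q_n(\gamma_0,\theta_0)}_{Q_{1n}(\gamma)}+\underbrace{Q_n(\gamma_0,\theta)-Q_n(\gamma_0,\theta_0)}_{Q_{2n}(\theta)}+\underbrace{[Q_n(\gamma,\theta)-Q_n(\gamma_0,\theta)]-[Q_n(\gamma,\theta_0)-Q_n(\gamma_0,\theta_0)]}_{Q_{3n}(\gamma,\theta)}. \qquad (B.1)$$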
According to (B.1) and Lemma A.6, the $\hat\gamma_n$ that maximizes $Q_n(\gamma,\theta)$ is equivalent to the one that maximizes $n(Q_{1n}(\gamma)+Q_{3n}(\gamma,\theta))=nQ_{1n}(\gamma)+o_p(1)$. To derive the limiting distribution of $\hat\gamma_n$, let us analyse the limiting distribution of $nQ_{1n}(\gamma)$. Since $\hat\gamma_n$ is $n$-consistent, we only need to consider $\gamma$ lying within an $n^{-1}$-neighbourhood of $\gamma_0$. Letting $\nu=n(\gamma-\gamma_0)$, the re-parameterization gives
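$$\begin{aligned}
nQ_{1n}(\gamma)&=n(Q_n(\gamma,\theta_0)-Q_n(\gamma_0,\theta_0))\\
&=\sum_{i=1}^n I(\gamma_0<q_i\le\gamma)\,\zeta_i^+(\gamma,\theta_0)+\sum_{i=1}^n I(\gamma<q_i\le\gamma_0)\,\zeta_i^-(\gamma,\theta_0)\\
&=I(\nu>0)\sum_{i=1}^n I(\gamma_0<q_i\le\gamma_0+n^{-1}\nu)\,\zeta_i^+(\gamma_0+n^{-1}\nu,\theta_0)+I(\nu<0)\sum_{i=1}^n I(\gamma_0+n^{-1}\nu<q_i\le\gamma_0)\,\zeta_i^-(\gamma_0+n^{-1}\nu,\theta_0)\\
&=I(\nu>0)Q_n^+(\nu)+I(\nu<0)Q_n^-(\nu),
\end{aligned}$$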
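where
$$Q_n^-(\nu)=\sum_{i=1}^n I(\gamma_0+n^{-1}\nu<q_i\le\gamma_0)\,\zeta_i^-(\gamma_0+n^{-1}\nu,\theta_0),\qquad Q_n^+(\nu)=\sum_{i=1}^n I(\gamma_0<q_i\le\gamma_0+n^{-1}\nu)\,\zeta_i^+(\gamma_0+n^{-1}\nu,\theta_0),$$
$$\zeta_i^+(\gamma,\theta_0)=\int(d_i(a)-\tau)\Big[K\Big(\frac{x_i'\beta_0-\hat H_n(a,\gamma,\theta_0)}{h_n}\Big)-K\Big(\frac{x_i'\beta_0+z_i'\delta_0-\hat H_n(a,\gamma_0,\theta_0)}{h_n}\Big)\Big]\omega(a)\,da$$
and
$$\zeta_i^-(\gamma,\theta_0)=\int(d_i(a)-\tau)\Big[K\Big(\frac{x_i'\beta_0+z_i'\delta_0-\hat H_n(a,\gamma,\theta_0)}{h_n}\Big)-K\Big(\frac{x_i'\beta_0-\hat H_n(a,\gamma_0,\theta_0)}{h_n}\Big)\Big]\omega(a)\,da.$$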
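The compound Poisson limit derived below rests on a counting fact. The number of terms in $Q_n^+(\nu)$, that is, the number of $q_i$ in $(\gamma_0,\gamma_0+n^{-1}\nu]$, is Binomial$(n,p_n)$ with $p_n\approx f_q(\gamma_0)\nu/n$, and is therefore approximately Poisson with mean $f_q(\gamma_0)\nu$. A short Python check of that approximation follows. For simplicity it assumes $p_n=f_q(\gamma_0)\nu/n$ exactly, as when $q$ is uniform near $\gamma_0$.

```python
import math

def binom_pmf(k, n, p):
    return math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)

def poisson_pmf(k, lam):
    return math.exp(-lam) * lam ** k / math.factorial(k)

def max_pmf_gap(n, nu, f_q=1.0, kmax=20):
    """Largest |Binomial(n, f_q*nu/n) - Poisson(f_q*nu)| pmf difference over k = 0, ..., kmax."""
    p = f_q * nu / n
    return max(abs(binom_pmf(k, n, p) - poisson_pmf(k, f_q * nu)) for k in range(kmax + 1))

for n in (100, 1000, 10000):
    print(n, max_pmf_gap(n, nu=3.0))
```

The gap shrinks roughly like $1/n$, which is what allows the $n$-th power of the single-observation characteristic function to converge below.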
Without loss of generality, let us consider the weak convergence of the finite-dimensional distributions of $Q_n^+(\nu)$. To use the Cramér–Wold device, let $0<\nu_1<\cdots<\nu_J<\infty$ for some positive integer $J$ (with $\nu_0=0$), and
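$\lambda_1,\ldots,\lambda_J$ be a sequence of constants. Consider the characteristic function of
$$S_n=\sum_{j=1}^J\lambda_j\big(Q_n^+(\nu_j)-Q_n^+(\nu_{j-1})\big),$$
which has the form
$$\phi_{S_n}(t)=\Big\{E\exp\Big[\sqrt{-1}\,t\sum_{j=1}^J\lambda_j\Big(\chi_{nj}(q)\,\zeta^+\big(\gamma_0+n^{-1}\nu_j,\theta_0\big)-\chi_{n,j-1}(q)\,\zeta^+\big(\gamma_0+n^{-1}\nu_{j-1},\theta_0\big)\Big)\Big]\Big\}^n,$$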
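with
$$\chi_{nj}(q)=I(\gamma_0<q\le\gamma_0+n^{-1}\nu_j),\qquad \zeta^+(\gamma,\theta_0)=\int(d(a)-\tau)\Big[K\Big(\frac{x'\beta_0-\hat H_n(a,\gamma,\theta_0)}{h_n}\Big)-K\Big(\frac{x'\beta_0+z'\delta_0-\hat H_n(a,\gamma_0,\theta_0)}{h_n}\Big)\Big]\omega(a)\,da.$$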
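Now letting $\Delta\chi_{nj}(q)=I(\gamma_0+n^{-1}\nu_{j-1}<q\le\gamma_0+n^{-1}\nu_j)$ and
$$\zeta^+=\int(d(a)-\tau)\Big[K\Big(\frac{x'\beta_0-H(a)}{h_n}\Big)-K\Big(\frac{x'\beta_0+z'\delta_0-H(a)}{h_n}\Big)\Big]\omega(a)\,da,$$
we have
$$\lim_{n\to\infty}\phi_{S_n}(t)=\lim_{n\to\infty}\Big\{E\exp\Big[\sqrt{-1}\,t\sum_{j=1}^J\lambda_j\,\Delta\chi_{nj}(q)\,\zeta^+\Big]\Big\}^n.$$
This holds for four reasons:
- $\gamma_0+n^{-1}\nu_j\to\gamma_0$;
- $\hat H_n(a,\gamma,\theta)$ converges to $H(a,\gamma,\theta)$ uniformly over $a\in(a_1,a_2)$ by Lemma A.1;
- $H(a,\gamma_0,\theta_0)=H(a)$;
- $\zeta^+(\gamma,\theta_0)$ is uniformly bounded over $\gamma$, so that the bounded convergence theorem applies.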
E
n
= lim
n
1 +
J
j=1
[exp(
1t
j
nj
(q)
+
) 1]
n
= lim
n
1 +
J
j=1
E[exp(
1t
j
nj
(q)
+
) 1]
n
= lim
n
1 +
1
n
J
j=1
(
j
j1
)f
q
(
0
)E[exp(
1t
j
+
) 1|q =
+
0
] +o(1)
n
= exp
j=1
(
j
j1
)f
q
(
0
)E[exp(
1t
j
+
) 1|q =
+
0
]
,
where the rst equality follows from the identity exp (
j
a
j
) 1 =
j
(exp(a
j
) 1) when only one a
j
is
different from zero, the last equality follows from lim
n
(1 +a/n)
n
= exp(a), K(s/h
n
) I(s > 0) and
the dominated convergence theorem.
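Both analytic facts used in the last display are easy to confirm numerically. The Python sketch below (illustrative function names) does three things:
- it checks the identity $\exp(\sum_j a_j)-1=\sum_j(\exp(a_j)-1)$ for purely imaginary arguments of the kind appearing in the characteristic function;
- it shows that the identity fails once two $a_j$ are non-zero;
- it checks $(1+a/n)^n\to\exp(a)$ for a complex $a$.

```python
import cmath

def lhs(a):
    """exp(sum_j a_j) - 1."""
    return cmath.exp(sum(a)) - 1

def rhs(a):
    """sum_j (exp(a_j) - 1)."""
    return sum(cmath.exp(aj) - 1 for aj in a)

def power_limit_gap(a, n):
    """|(1 + a/n)^n - exp(a)|, which should vanish as n grows."""
    return abs((1 + a / n) ** n - cmath.exp(a))

one_nonzero = [0j, 0j, 1.3j, 0j]   # disjoint windows: only one Delta chi_nj(q) can equal 1
two_nonzero = [0.4j, 0.7j]
print(abs(lhs(one_nonzero) - rhs(one_nonzero)))
print(abs(lhs(two_nonzero) - rhs(two_nonzero)))
print(power_limit_gap(0.3 + 0.5j, 10 ** 6))
```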
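On the other hand, the characteristic function of $Q^+(\nu)$ is
$$\phi_{Q^+(\nu)}(t)=\exp\big[\nu f_q(\gamma_0)(\phi^+(t)-1)\big],$$
where $\phi^+(t)=E[\exp(\sqrt{-1}\,t\zeta^+)\,|\,q=\gamma_0^+]$. Hence
$$S=\sum_{j=1}^J\lambda_j\big(Q^+(\nu_j)-Q^+(\nu_{j-1})\big)$$
has the characteristic function
$$\phi_S(t)=\exp\Big[\sum_{j=1}^J(\nu_j-\nu_{j-1})f_q(\gamma_0)\big(\phi^+(\lambda_jt)-1\big)\Big]=\lim_{n\to\infty}\phi_{S_n}(t),$$
which implies that the finite-dimensional distributions of $Q_n^+$ converge weakly to those of $Q^+$. Moreover, following Lee and Seo's (2008) Lemma 6.2, (B.8), we can prove the tightness of $Q_n^+(\nu)$ by showing that, for $\nu_1\le\nu\le\nu_2$,
$$E\big|Q_n^+(\nu)-Q_n^+(\nu_1)\big|\,\big|Q_n^+(\nu_2)-Q_n^+(\nu)\big|\le C_q(\nu_2-\nu_1)^2,$$
with $C_q$ being the Lipschitz constant of $F_q$. With these preliminary results, the limiting distribution of $n(\hat\gamma_n-\gamma_0)$ follows from the argmax continuous mapping theorem.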
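To see what the limit looks like, the following Python sketch simulates a two-sided compound Poisson process of the kind obtained above. Jump times arrive at rate $f_q(\gamma_0)$ on each side of zero, and the sketch returns the maximal value together with the interval of maximizers on $[-M,M]$. The jump laws are supplied by the user, because the conditional distributions of $\zeta^{\pm}$ given $q=\gamma_0^{\pm}$ depend on the unknown model. The truncation at $M$, the tie-breaking rule (first maximizing interval found), the example jump laws and all names are illustrative assumptions. Under those assumptions, repeating the simulation approximates the law of the argmax, and hence of the limit of $n(\hat\gamma_n-\gamma_0)$.

```python
import random

def one_side(intensity, jump, horizon, rng):
    """Jump times of a rate-`intensity` Poisson process on (0, horizon] and running sums of i.i.d. jumps."""
    times, sums = [], []
    t, s = 0.0, 0.0
    while True:
        t += rng.expovariate(intensity)
        if t > horizon:
            return times, sums
        s += jump(rng)
        times.append(t)
        sums.append(s)

def argmax_two_sided(intensity, jump_plus, jump_minus, horizon, seed=0):
    """Maximum of the step process on [-horizon, horizon] and a half-open interval [lo, hi) attaining it.

    The process is 0 on [-tm[0], tp[0]); it equals sp[k] on [tp[k], tp[k+1]) to the right
    and sm[k] on [-tm[k+1], -tm[k]) to the left.
    """
    rng = random.Random(seed)
    tp, sp = one_side(intensity, jump_plus, horizon, rng)
    tm, sm = one_side(intensity, jump_minus, horizon, rng)
    best_val = 0.0
    best = (-tm[0] if tm else -horizon, tp[0] if tp else horizon)
    for k, v in enumerate(sp):
        hi = tp[k + 1] if k + 1 < len(tp) else horizon
        if v > best_val:
            best_val, best = v, (tp[k], hi)
    for k, v in enumerate(sm):
        lo = -tm[k + 1] if k + 1 < len(tm) else -horizon
        if v > best_val:
            best_val, best = v, (lo, -tm[k])
    return best_val, best

# Hypothetical jump laws with negative mean on both sides; 200 replications of the argmax interval.
draws = [argmax_two_sided(1.0, lambda r: r.gauss(-0.5, 1.0), lambda r: r.gauss(-0.5, 1.0),
                          50.0, seed=s)[1]
         for s in range(200)]
print(draws[:5])
```

Because the limiting process is a step function, its maximizer is an interval rather than a point. Any fixed selection rule, such as the smallest maximizer or the midpoint, can be applied to the returned interval.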
To study the limiting distribution of $\hat\theta_n$, we know from (B.1) and Lemma A.6 that the $\hat\theta_n$ that maximizes $Q_n(\gamma,\theta)$ is equivalent to the one that maximizes $n(Q_{2n}(\theta)+Q_{3n}(\gamma,\theta))=nQ_n(\gamma_0,\theta)-nQ_n(\gamma_0,\theta_0)+o_p(1)$. This expression does not depend on $\gamma$, and $nQ_n(\gamma_0,\theta_0)$ does not depend on $\theta$. In other words, the limiting distribution of $\hat\theta_n$ can be derived by treating $\gamma$ as known and equal to $\gamma_0$, in which case our model can be regarded as a transformation model with no threshold effect. Given $\gamma=\gamma_0$, the criterion function $Q_n(\gamma_0,\theta)$ is smooth in $\theta$, so the limiting distribution of $\hat\theta_n$ can be obtained through a standard Taylor expansion. Specifically, a Taylor expansion of the first-order condition that $\hat\theta_n$ should satisfy, that is, $\sqrt{n}\,\partial Q_n(\gamma_0,\hat\theta_n)/\partial\theta_2=0$, gives
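$$\sqrt{n}(\hat\theta_{2n}-\theta_{20})=-\Big[\frac{\partial^2Q_n(\gamma_0,\bar\theta_n)}{\partial\theta_2\partial\theta_2'}\Big]^{-1}\Big[\sqrt{n}\,\frac{\partial Q_n(\gamma_0,\theta_0)}{\partial\theta_2}\Big], \qquad (B.2)$$
where $\bar\theta_n$ lies between $\theta_0$ and $\hat\theta_n$. Applying the same proofs as in Chen (2010), we can show that
$$\frac{\partial^2Q_n(\gamma_0,\bar\theta_n)}{\partial\theta_2\partial\theta_2'}\xrightarrow{p}\Lambda\quad\text{and}\quad\sqrt{n}\,\frac{\partial Q_n(\gamma_0,\theta_0)}{\partial\theta_2}=\frac{1}{\sqrt{n}}\sum_{i=1}^n\kappa_i+o_p(1).$$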
Thus the $\sqrt{n}$-consistency and the desired limiting distribution of $\hat\theta_{2n}$ follow. Finally, the asymptotic independence between $\hat\gamma_n$ and $\hat\theta_n$ has been proved implicitly by Lemma A.6, (B.1) and the above discussion.
Proof of Proposition 3.3: As in the proof of Theorem 3.1, the consistency result can be established by checking the conditions of Newey and McFadden (1994, Theorem 2.1). In parallel with the proof of Theorem 3.1, it suffices to show that $Q_{rn}(a,\gamma,\theta,c)-Q_r(a,\gamma,\theta,c)$ converges to zero in probability uniformly over $a\in(a_1,a_2)$, $(\gamma,\theta)\in\Theta$ and $c\in\mathcal{G}$, where
$$Q_r(a,\gamma,\theta,c)=E\Big[\Big(\frac{I(y>a)}{G(a)}-\tau\Big)I(X(\gamma)'\theta-c>0)\Big].$$
This can be shown by Lemma 2.4 of Newey and McFadden (1994), Lemma A.3 and the consistency of the Kaplan–Meier estimator (Fleming and Harrington, 1991).
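For readers implementing the random-censoring version, below is a minimal Python sketch of two ingredients of $Q_{rn}$: the Kaplan–Meier estimator $G_n(a)$ of the censoring survival function $G(a)=\Pr(C>a)$, and the inverse-probability-weighted indicator $I(y_i>a)/G_n(a)$. The sketch assumes the usual random right-censoring layout, with observed $y_i$ and a flag marking uncensored outcomes. This coding and the assumption of no tied observation times are ours and are not taken from the text. When estimating $G$, censored observations play the role of events.

```python
def km_censoring_survival(y, uncensored):
    """Kaplan-Meier estimate of G(a) = Pr(C > a) from right-censored data.

    y: observed times; uncensored[i] is True when y[i] is an uncensored outcome.
    Censored observations are the 'events' for G. Assumes no tied times.
    Returns a step function a -> G_n(a).
    """
    n = len(y)
    order = sorted(range(n), key=lambda i: y[i])
    jump_times, values = [], []
    surv = 1.0
    for rank, i in enumerate(order):
        at_risk = n - rank
        if not uncensored[i]:
            surv *= 1.0 - 1.0 / at_risk
        jump_times.append(y[i])
        values.append(surv)

    def G_n(a):
        value = 1.0
        for t, s in zip(jump_times, values):
            if t > a:
                break
            value = s
        return value

    return G_n

def ipw_indicator(y_i, a, G_n):
    """I(y_i > a) / G_n(a); requires G_n(a) > 0."""
    return (1.0 if y_i > a else 0.0) / G_n(a)

y = [1.0, 2.0, 3.0, 4.0]
uncensored = [True, False, True, False]
G_n = km_censoring_survival(y, uncensored)
print([G_n(a) for a in (0.5, 2.0, 3.5)])
print(ipw_indicator(3.0, 2.5, G_n))
```

The weight $1/G_n(a)$ is only well defined where $G_n(a)>0$. In practice this is one reason to keep $a$ away from the upper end of the observed times.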
The convergence rate and the limiting distribution of $\hat\gamma_{rn}$ can be established by reasoning similar to that for the fixed censoring case. The only distinction between the two cases is that, with random censoring, the criterion function (2.10) contains a Kaplan–Meier estimator $G_n(\cdot)$, which must be taken into account in the subsequent proof. For example, in proving Lemma A.5, the presence of the Kaplan–Meier
estimator leads to an extra remainder term $R_{r,5n}$ in addition to $R_{r,ln}$ for $l=1,\ldots,4$, that is,
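$$Q_{rn}(\gamma,\theta)-Q_{rn}(\gamma_0,\theta)=\int\frac{1}{n}\sum_{i=1}^n\Big(\frac{d_i(a)}{G(a)}-\tau\Big)\big[I(X_i(\gamma)'\theta-H(a,\gamma,\theta)>0)-I(X_i'\theta-H(a,\gamma_0,\theta)>0)\big]\omega(a)\,da-R_{r,1n}+R_{r,2n}+R_{r,3n}-R_{r,4n}+R_{r,5n},$$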
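where
$$R_{r,1n}=\int(\hat H_n(a,\gamma,\theta)-H(a,\gamma,\theta))\frac{1}{n}\sum_{i=1}^n\Big(\frac{d_i(a)}{G_n(a)}-\tau\Big)h_n^{-1}k\Big(\frac{X_i(\gamma)'\theta-\bar H_n(a,\gamma,\theta)}{h_n}\Big)\omega(a)\,da,$$
$$R_{r,2n}=\int(\hat H_n(a,\gamma_0,\theta)-H(a,\gamma_0,\theta))\frac{1}{n}\sum_{i=1}^n\Big(\frac{d_i(a)}{G_n(a)}-\tau\Big)h_n^{-1}k\Big(\frac{X_i'\theta-\bar H_n(a,\gamma_0,\theta)}{h_n}\Big)\omega(a)\,da,$$
$$R_{r,3n}=\int\frac{1}{n}\sum_{i=1}^n\Big(\frac{d_i(a)}{G_n(a)}-\tau\Big)\Big[K\Big(\frac{X_i(\gamma)'\theta-H(a,\gamma,\theta)}{h_n}\Big)-I(X_i(\gamma)'\theta-H(a,\gamma,\theta)>0)\Big]\omega(a)\,da,$$
$$R_{r,4n}=\int\frac{1}{n}\sum_{i=1}^n\Big(\frac{d_i(a)}{G_n(a)}-\tau\Big)\Big[K\Big(\frac{X_i'\theta-H(a,\gamma_0,\theta)}{h_n}\Big)-I(X_i'\theta-H(a,\gamma_0,\theta)>0)\Big]\omega(a)\,da$$
and
$$R_{r,5n}=\int\frac{G(a)-G_n(a)}{G(a)G_n(a)}\,\frac{1}{n}\sum_{i=1}^n d_i(a)\big[I(X_i(\gamma)'\theta-H(a,\gamma,\theta)>0)-I(X_i'\theta-H(a,\gamma_0,\theta)>0)\big]\omega(a)\,da.$$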
Among the remainder terms $R_{r,ln}$, $l=1,\ldots,5$, the terms $R_{r,1n},\ldots,R_{r,4n}$ can be shown to be $o_p(1)$ via arguments similar to those in the proof of Lemma A.5. The extra remainder term $R_{r,5n}$ is $o_p(1)$ by the consistency of the Kaplan–Meier estimator $G_n(a)$ and the uniform boundedness of the other components appearing in $R_{r,5n}$. Consequently, the $n$-consistency of $\hat\gamma_{rn}$ can be established.
As another distinction from the fixed censoring case, the presence of the Kaplan–Meier estimator leads to an extra term $\kappa_{2,ri}$ in the asymptotic linear representation for $\hat\theta_{rn}$. But given the asymptotic independence between $\hat\gamma_{rn}$ and $\hat\theta_{rn}$, deriving the limiting distribution of $\hat\theta_{rn}$ essentially follows the same arguments as in Chen (2010), and is thus not a new result of this paper.