Integrated Conditional Moment Testing of Quantile Regression Models
Herman J. Bierens1, Department of Economics, Pennsylvania State University, 608 Kern Graduate Building, University Park, PA 16802, and Tilburg University, The Netherlands
Donna K. Ginther Research Department, Federal Reserve Bank of Atlanta, 104 Marietta Street, NW, Atlanta, GA 30303
Abstract: In this paper we propose a consistent test of the linearity of quantile regression models, similar to the Integrated Conditional Moment (ICM) test of Bierens (1982) and Bierens and Ploberger (1997). This test requires re-estimation of the quantile regression model by minimizing the ICM test statistic with respect to the parameters. We apply this ICM test to examine the correctness of the functional form of three median regression wage equations.
Key words: Quantile regression; Test for linearity; Integrated conditional moment test; Wage equations

1 Correspondence address: Department of Economics, Pennsylvania State University, 608 Kern Graduate Building, University Park, PA 16802, USA. E-mail: hbierens@psu.edu. URL: http://econ.la.psu.edu/~hbierens/. Previous versions of this paper have been presented by the first author at the University of Pennsylvania, the Econometric Society European Meeting 1997, Toulouse, Johns Hopkins University, and the conference on Economic Applications of Quantile Regression in Konstanz, Germany. The constructive comments of the co-editor, Bernd Fitzenberger, and a referee are gratefully acknowledged.
1. Introduction
Median and quantile estimation methods have recently been applied to economic models because these methods impose fewer restrictions on the data than widely used mean regressions. The linear median regression model assumes that the conditional median of the dependent variable y is a linear function of the vector x of independent variables. The median regression model is particularly suitable if the conditional distribution of the y variable is fat-tailed, or if the lowest and/or highest values of y are truncated or misreported. The latter may occur if y is a measure of income, because respondents in the highest income group will often be reluctant or unwilling to reveal their true income. Also, the median regression model may serve as an alternative to a Tobit model if the conditional distribution of the error of the underlying latent model is symmetric but non-normal.

To the best of our knowledge, the only paper in the econometrics literature that addresses the problem of consistent testing of the functional form of quantile regression models is Zheng (1998). Zheng's approach is based on weighted kernel regression estimation. In this paper we propose an alternative consistent test of the linearity of the median regression model, similar to the Integrated Conditional Moment (ICM) test of Bierens (1982) and Bierens and Ploberger (1997). This test can easily be extended to more general quantile regression models. The test requires re-estimation of the median regression model by minimizing the ICM test statistic with respect to the parameters.

Although median and quantile regression has not been used as extensively as OLS in the empirical literature, recent papers have used this method to estimate wage equations and the conditional wage distribution (see, for example, Chamberlain 1994, Buchinsky 1994, 1995, 1997, and Poterba and Rueben 1994). However, the applicability of quantile regression is not limited to labor economics. For example, Chernozhukov and Umantsev (2001) estimate and analyze the conditional market risk of an oil producer's stock price as a function of key economic variables, using quantile regression, and discuss specification tests as well.

In order to show that the ICM test is able to detect misspecification of quantile regression models not only in theory but also in practice, we apply the ICM test to examine the functional form of three wage equations that have been estimated previously by quantile regression methods, using a sample of 28,155 male workers taken from the March 1988 Current Population Survey (CPS). All computations have been done using the econometrics software package EasyReg, written by the first author. (EasyReg is an interactive Windows 95/98/NT freeware program, which is downloadable from the web page http://econ.la.psu.edu/~hbierens/EASYREG.HTM. The data are downloadable from the web page http://econ.la.psu.edu/~hbierens/MEDIAN.HTM.) In the discussion of the ICM test for quantile regression models we will focus on the median regression case. In section 2 of the appendix we show that only a minor change in the ICM testing protocol is required to cover more general quantile regression models as well.
2. The median regression model

Suppose that the dependent variable y_j is related to the vector x_j of explanatory variables, possibly including a constant 1, by the median regression model

  y_j = θ₀ᵀx_j + ε_j,  where P[ε_j > 0 | x_j] = P[ε_j < 0 | x_j].    (1)
As is well known, under some regularity conditions, in particular the conditions that the error term ε_j is continuously distributed with zero median and is independent of x_j, and that E(x_j x_jᵀ) < ∞, the parameter vector θ₀ can be estimated consistently by the Least Absolute Deviation (LAD) estimator θ̂_LAD = argmin_θ Σ_{j=1}^n |y_j − θᵀx_j|, and

  √n (θ̂_LAD − θ₀) → N_k(0, (1/[2f(0)]²) Q⁻¹)  in distribution,

where f is the density of ε_j and Q = plim_{n→∞} (1/n) Σ_{j=1}^n x_j x_jᵀ. See Koenker and Bassett (1978) and the references therein. The median regression model is equivalent to

  u_j(θ) = I(y_j − θᵀx_j > 0) − I(y_j − θᵀx_j < 0),  E[u_j(θ₀) | x_j] = 0 a.s.,    (2)

where θ₀ ∈ Θ ⊂ R^k with Θ the parameter space, and I(·) is the indicator function.
Definition 1: x̃_j is an m-vector of components of x_j such that the σ-algebra generated by x̃_j is equal to the σ-algebra generated by x_j. For example, if x_j = (1, z_j, z_j²)ᵀ with z_j a scalar random variable, then x̃_j = z_j.

The median regression model is correctly specified if

  H₀: ∃θ₀ ∈ Θ: P(E[u_j(θ₀) | x̃_j] = 0) = 1.    (3)

We will test the null hypothesis (3) against the general alternative that (3) is not correct, i.e.,

  H₁: ∀θ ∈ Θ: P(E[u_j(θ) | x̃_j] = 0) < 1.    (4)
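To make the objects in (1)-(2) concrete, the following Python sketch (our own illustration; the computations in this paper were done in EasyReg) simulates a linear median regression, computes the LAD estimator by direct minimization of the sum of absolute deviations, and evaluates u_j(θ) as the sign of the residuals. All variable names and tuning choices are ours.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n, k = 500, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])  # regressors, including a constant
theta0 = np.array([1.0, 0.5, -0.25])
eps = rng.standard_t(df=3, size=n)        # fat-tailed error with zero median
y = X @ theta0 + eps                      # model (1)

# LAD estimator: argmin_theta sum_j |y_j - theta'x_j|
theta_lad = minimize(lambda th: np.sum(np.abs(y - X @ th)),
                     x0=np.zeros(k), method="Nelder-Mead").x

def u(theta):
    """u_j(theta) = I(y_j - theta'x_j > 0) - I(y_j - theta'x_j < 0), cf. (2)."""
    r = y - X @ theta
    return (r > 0).astype(float) - (r < 0).astype(float)

# Under correct specification E[u_j(theta_0)|x_j] = 0, so the sample mean of
# u_j evaluated at (an estimate of) theta_0 should be close to zero:
print(theta_lad, u(theta_lad).mean())
```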
3. The ICM test

These two hypotheses can be distinguished by using an infinite set of moment conditions of the type E[u_j(θ)w_j(ξ)] = 0, where

  w_j(ξ) = w(ξᵀΦ(x̃_j)),  ξ ∈ Ξ,    (5)

with Φ a bounded one-to-one mapping, w(·) an analytic function with all but a finite number of derivatives at zero unequal to zero, and Ξ a subset of a Euclidean space with positive Lebesgue measure. Bierens (1990) has shown for the case w(·) = exp(·) that, with S(θ) = {ξ ∈ Ξ: E[u_j(θ)w_j(ξ)] = 0}, the following holds.

Theorem 1: Under H₀, S(θ₀) = Ξ, and for all θ ∈ Θ\{θ₀}, S(θ) has Lebesgue measure zero and is nowhere dense, provided that the parameter θ₀ in (3) is unique. Under H₁, S(θ) has Lebesgue measure zero and is nowhere dense for all θ ∈ Θ.
Bierens and Ploberger (1997) and Stinchcombe and White (1998) have shown that the same result holds for a much wider class of weight functions w_j(·). In particular, Theorem 1 also holds if we choose

  w_j(ξ) = cos(ξᵀΦ(x̃_j)) + sin(ξᵀΦ(x̃_j)),  ξ ∈ R^m,    (6)

where Φ is the same as in (5). The advantage of this weight function is that it is uniformly bounded: |w_j(ξ)| ≤ √2. In this paper we will use the weight function (6), for reasons to be explained later. Let

  Q(θ) = ∫ |E[u_1(θ)w_1(ξ)]|² dμ(ξ),    (7)

where

Assumption 1: w_j(ξ) is defined by (5), μ is a probability measure on Ξ that is absolutely continuous with respect to Lebesgue measure, and Ξ is compact with positive Lebesgue measure.

Then it follows from Theorem 1 that under H₀, Q(θ₀) = 0, whereas under H₁, inf_{θ∈Θ} Q(θ) > 0. This result suggests using the Integrated Conditional Moment (ICM) test based on inf_{θ∈Θ} Q̂(θ), where

  Q̂(θ) = ∫ |ẑ(θ,ξ)|² dμ(ξ)    (8)

and

  ẑ(θ,ξ) = (1/n) Σ_{j=1}^n u_j(θ)w_j(ξ).    (9)

It follows from the results in Bierens (1990) and Bierens and Ploberger (1997) that under H₀ and Assumption 1,

  √n ẑ(θ₀,ξ) ⇒ z(ξ)    (10)

on Ξ, where "⇒" means "converges weakly to" [cf. Billingsley (1968)], and z(·) is a zero-mean Gaussian process on Ξ with covariance function

  Γ(ξ₁,ξ₂) = E[z(ξ₁)z(ξ₂)] = E[u_1(θ₀)² w_1(ξ₁)w_1(ξ₂)].    (11)

Note that if

Assumption 2: the conditional distribution of y_j given x_j is continuous,

then P(y_j − θ₀ᵀx_j = 0) = 0, hence u_j(θ₀)² = 1 a.s., and consequently

  Γ(ξ₁,ξ₂) = E[w_1(ξ₁)w_1(ξ₂)].    (12)
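As an illustration of how Q̂(θ) in (8) can be computed in practice, the following sketch approximates the integral by averaging |ẑ(θ,ξ_s)|² over random draws ξ_s from μ, here taken uniform on [−c,c]^m with Φ the componentwise arctan (the choices used later in section 6.3). The data, the vector of u_j(θ) values, and all constants are artificial; this is our own illustration, not the EasyReg implementation.

```python
import numpy as np

rng = np.random.default_rng(1)
n, m, c, N = 1000, 2, 5.0, 1000
x_tilde = rng.normal(size=(n, m))             # instruments, cf. Definition 1
u = np.sign(rng.standard_t(df=3, size=n))     # stand-in for u_j(theta), cf. (2)

phi = np.arctan(x_tilde)                      # a bounded one-to-one mapping Phi
xi = rng.uniform(-c, c, size=(N, m))          # draws from mu = uniform on [-c, c]^m
W = np.cos(phi @ xi.T) + np.sin(phi @ xi.T)   # (n, N) array of w_j(xi_s), cf. (6)

z_hat = u @ W / n                             # z_hat(theta, xi_s) for each draw, cf. (9)
Q_hat = np.mean(z_hat ** 2)                   # Monte Carlo approximation of (8)
print(n * Q_hat)
```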
It follows now from Bierens and Ploberger (1997) that under H₀,

  F̂_n(θ₀) = nQ̂(θ₀) / [(1/n) Σ_{j=1}^n ∫ w_j(ξ)² dμ(ξ)] → Σ_{i=1}^∞ λ_i e_i² / Σ_{i=1}^∞ λ_i ≤ sup_{m≥1} (1/m) Σ_{i=1}^m e_i² = F̄    (13)

in distribution, where the e_i's are independent standard normally distributed random variables, and the λ_i's are the solutions of the eigenvalue problem ∫ Γ(ξ₁,ξ₂)ϕ_i(ξ₂) dμ(ξ₂) = λ_i ϕ_i(ξ₁), with the ϕ_i(ξ) the corresponding eigenfunctions. The eigenvalues λ_i are non-negative and real valued, and the eigenfunctions ϕ_i(ξ) are real valued and orthogonal with respect to μ.

The statistic F̂_n(θ₀) cannot be used as a test statistic, because θ₀ is unknown. If we were to plug in a √n-consistent estimator θ̂ for θ₀, for example the LAD estimator, we would need to take the asymptotic distribution of n(Q̂(θ̂) − Q̂(θ₀)) into account. Under similar conditions as in Powell (1984) and Fitzenberger (1997) it is possible to derive the exact asymptotic null distribution of nQ̂(θ̂), but this asymptotic distribution involves the unknown conditional density at zero of the error term ε_j in model (1). In principle we could estimate this density by a kernel density estimator, but the problem is that the height of the kernel density estimator is very sensitive to the choice of the window width, which renders this approach unreliable. For example, the t-values of the LAD model in Table 3.A below were computed using a kernel density estimator with standard normal kernel and window width h = s·n^{−0.2}, where s is the mean of the absolute values of the LAD residuals in deviation from their sample mean. Its value at zero was 0.8853, but when the window width was multiplied by a factor 10 this value dropped to 0.5688! A possible alternative solution is to bootstrap F̂_n(θ̂_LAD). However, for large data sets this may take too much computing time. In particular, the LAD estimation of the model in Table 3.A below, with 18 parameters and 28,155 observations, took about 15 minutes on a Pentium II PC, using EasyReg. Thus for this case 1000 bootstraps of F̂_n(θ̂_LAD) would take about 250 hours nonstop! Therefore, we propose the following more practical approach. Choose
  F̂ = inf_{θ∈Θ} F̂_n(θ) = nQ̂(θ̂) / [(1/n) Σ_{j=1}^n ∫ w_j(ξ)² dμ(ξ)],    (14)

where

  θ̂ = argmin_{θ∈Θ} Q̂(θ)    (15)

is the ICM estimator. Then F̂ = F̂_n(θ̂) ≤ F̂_n(θ₀), hence the asymptotic inequality in (13) will be preserved:

Theorem 2: Under Assumptions 1-2 and H₀, limsup_{n→∞} P(F̂ > F) ≤ P(F̄ > F) for all nonrandom F > 0.
Bierens and Ploberger (1997) have shown that

  P(F̄ > 3.23) = 0.10,  P(F̄ > 4.26) = 0.05.    (16)

Thus we reject the median regression model at the 10% significance level if F̂ > 3.23 and at the 5% level if F̂ > 4.26. Admittedly, due to the inequality in Theorem 2 the actual size of the test will be smaller than the theoretical size (16), but this is the price we have to pay for feasibility. The actual size will be smaller than the unknown actual size of the test based on F̂_n(θ̂_LAD), which in its turn is smaller than (16). There is no general answer to the question how much smaller the actual size is: the size distortion involved depends on the actual value of the conditional error density at zero, and on the distribution of the regressors (the latter via the eigenvalues of the covariance function), and therefore varies from case to case. Note that if we choose the weight function w_j(ξ) as in (6) and the probability measure μ symmetric, then ∫ w_j(ξ)² dμ(ξ) = 1, so that then F̂ = nQ̂(θ̂). It is for that reason, and the fact that then w_j(ξ) is uniformly bounded, that we favor the weight function (6). The boundedness of the weight function is important for our applications, because due to the large sample size the integral in (8) has to be computed by Monte Carlo simulation. See section 2 of the appendix.
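The critical values in (16) are taken from Bierens and Ploberger (1997). As a check, the distribution of the bounding random variable F̄ = sup_{m≥1} (1/m) Σ_{i=1}^m e_i² in (13) can be approximated by simulation, truncating the supremum at a large M; the truncation level and number of replications below are our own choices.

```python
import numpy as np

rng = np.random.default_rng(42)
M = 2000            # truncation of the supremum over m (our choice)
R = 50_000          # Monte Carlo replications (our choice)
chunk = 5_000
m_grid = np.arange(1, M + 1)

draws = []
for _ in range(R // chunk):
    e2 = rng.standard_normal((chunk, M)) ** 2        # e_i^2, i = 1,...,M
    # running means (1/m) * sum_{i<=m} e_i^2, then the sup over m
    draws.append((np.cumsum(e2, axis=1) / m_grid).max(axis=1))
F_bar = np.concatenate(draws)

print(np.quantile(F_bar, [0.90, 0.95]))   # should be roughly 3.23 and 4.26
```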
4. Consistency of the ICM test

The ICM test is consistent if F̂ → ∞ in probability under H₁. A sufficient condition for this is that

  Q̂(θ) → Q(θ) a.s.    (17)

uniformly in θ ∈ Θ. Therefore, we will now set forth conditions for uniform convergence of Q̂(θ). Under Assumption 2 it follows from the uniform strong law of large numbers of Jennrich (1969) [cf. Bierens (1994, Sec. 2.7) for details] that

  sup_{ξ∈Ξ} |ẑ(θ,ξ) − E[u_1(θ)w_1(ξ)]| → 0 a.s.    (18)

pointwise in θ ∈ Θ, hence Q̂(θ) → Q(θ) a.s., pointwise in θ ∈ Θ. If the function u_j(θ) were continuous in θ and the parameter space Θ compact, then by Jennrich's (1969) uniform strong law of large numbers this result would hold uniformly on Θ as well. However, for the median regression model under review the function u_j(θ) is discontinuous in the parameters, so that the standard uniform convergence proof [see for example Bierens (1994, Sec. 4.2)] no longer applies. Nevertheless, it can be shown (see section 1 of the appendix) that (18) also holds uniformly on Θ, provided that:

Assumption 3: The parameter space Θ is compact.
Assumption 4: The conditional density f(y|x) of y_j given x_j = x satisfies E[sup_y f(y|x_1)] < ∞.
Then:

Theorem 3: Under Assumptions 1-4, lim_{n→∞} sup_{θ∈Θ} |Q̂(θ) − Q(θ)| = 0 a.s.

Thus, under the alternative hypothesis (4) we have liminf_{n→∞} Q̂(θ̂) ≥ inf_{θ∈Θ} Q(θ) > 0 a.s., which implies that F̂ → ∞ a.s. Hence the ICM test is consistent.
5. Local power of the ICM test for the case that θ₀ is known
In Bierens and Ploberger (1997) it has been shown that the ICM test of the functional form of conditional expectation models has non-trivial √n local power, which is superior to that of alternative consistent model misspecification tests based on comparison of the parametric functional form involved with a nonparametric kernel regression model. See the references in Zheng (1998). In this section we will therefore derive the local power properties of the ICM test for median regression models, but only for the special case that θ₀ is known. For the general case (14) we would need to derive the limiting distribution and rate of convergence of the ICM estimator (15) under the local alternative, which can be done similarly to Bierens and Ploberger (1997), using the conditions in Powell (1984) and Fitzenberger (1997). However, since this ICM estimator is of no particular interest by itself (it only serves an auxiliary role in the ICM test), deriving this limiting distribution is beyond the scope of this paper. Our purpose is to show that for this special case the ICM test has better local power than Zheng's (1998) test, as a motivation for the use of the ICM test (14). The local alternative involved is similar to the local alternative considered by Zheng (1998):

  H₁^L: y_j = θ₀ᵀx_j + g(x_j)/√n + ε_j,    (19)

where g is a uniformly bounded nonlinear function of x_j such that P[g(x_j) = 0] < 1, and ε_j is independent of x_j, with continuous distribution function F(·) satisfying F(0) = 0.5. Moreover, let f(·) be the density of ε_j, and assume that f(0) > 0 and that f(·) is differentiable with uniformly bounded derivative f′(·). Furthermore, we may without loss of generality assume that θ₀ = 0.
Under the local alternative (19) with θ₀ = 0 and weight function (6), it follows similarly to Bierens and Ploberger (1997) that

  (1/√n) Σ_{j=1}^n (u_{n,j}w_j(ξ) − E[u_{n,j}w_j(ξ)]) ⇒ z*(ξ)    (20)

on Ξ, where u_{n,j} = I[ε_j > −g(x_j)/√n] − I[ε_j < −g(x_j)/√n], and z*(ξ) is a zero-mean Gaussian process with covariance function

  Γ*(ξ₁,ξ₂) = lim_{n→∞} ( E[u_{n,j}² w_j(ξ₁)w_j(ξ₂)] − E[u_{n,j}w_j(ξ₁)] E[u_{n,j}w_j(ξ₂)] )    (21)
            = E[w_j(ξ₁)w_j(ξ₂)] = Γ(ξ₁,ξ₂).

Compare (12). Thus, z*(ξ) has the same distribution as the Gaussian process z(ξ) in (10). The second equality in (21) follows from the fact that E[u_{n,j}² | x_j] = 1 a.s., and that by a Taylor expansion and the uniform boundedness of g(·), f′(·), and w_j(ξ),

  E[u_{n,j}w_j(ξ)] = E[(1 − 2F(−g(x_1)/√n)) w_j(ξ)] = (f(0)/√n) E[g(x_1)w_1(ξ)] + O(1/n)

uniformly in ξ. It is now not hard to verify that under the local alternative,

  nQ̂(0) → ∫ |z*(ξ) + f(0)E[g(x_1)w_1(ξ)]|² dμ(ξ)

in distribution. Finally, similarly to Bierens and Ploberger (1997) it can be shown that

  ∫ |z*(ξ) + f(0)E[g(x_1)w_1(ξ)]|² dμ(ξ) > ∫ |z*(ξ)|² dμ(ξ).

Therefore, in the case that θ₀ is known the ICM test has non-trivial √n local power, whereas in the same case the test by Zheng (1998) has non-trivial local power only at a slower rate. For the general case we need to show that (1/√n) Σ_{j=1}^n (û_{n,j} − u_{n,j}) w_j(ξ) ⇒ z**(ξ) on Ξ for some Gaussian process z**(ξ), where û_{n,j} = I[ε_j > (θ̂ − θ₀)ᵀx_j − g(x_j)/√n] − I[ε_j < (θ̂ − θ₀)ᵀx_j − g(x_j)/√n], because then it is guaranteed that under the local alternative, plim_{n→∞} nQ̂(θ̂) > 0. Again, this can be done under similar conditions as in Powell (1984) and Fitzenberger (1997), by mimicking the approach in Bierens and Ploberger (1997).

This comparison of the local power of the ICM test and Zheng's test is only a theoretical comparison, and one may wonder how much the finite sample powers differ. A Monte Carlo study could answer this question, but that is beyond the scope of this paper. Some preliminary unpublished Monte Carlo results by Bernd Fitzenberger suggest that the asymptotic advantage of the ICM test might require very large samples to be present in finite samples.
6. Empirical illustrations

6.1 The data

Linear quantile regression methods have been applied to estimating wage equations and characterizing the conditional distribution of the log of wages (Chamberlain 1994, Buchinsky 1994, 1995, 1997, and Poterba and Rueben 1994). This application of the ICM test uses data from the same source as the above studies, the Current Population Survey (CPS). This study uses data on males from the 1988 March CPS and uses criteria similar to Buchinsky (1994) to sample the data. The March CPS contains information on the previous year's wages, schooling, industry, and occupation. We select a sample of men aged 18 to 70 with positive annual income greater than $50 in 1992 dollars, who are neither self-employed nor working without pay. The wage data are deflated by the deflator of Personal Consumption Expenditure for 1992. Our data set contains 28,155 observations and has variables for age, years of schooling, years of potential work experience, industry, occupation, and dummy variables for race, region of residence, living in an SMSA, and working part time.
6.2 Wage specifications

As an illustration of the performance of the ICM test we examine three wage equation specifications. The first specification is the traditional Mincer-type model, where the log of weekly wages is regressed on a constant, years of schooling, years of potential work experience and its square, and a race dummy. This model is used in most labor papers, including Buchinsky (1994). The second specification adds a cubic and a quartic in potential work experience. Murphy and Welch (1990) indicate that a quartic in potential work experience fits the data better than the standard quadratic in experience used in the Mincer model. Finally, we examine a specification similar to that employed by Buchinsky (1997). The log of real weekly wages is regressed on years of schooling and its square, years of potential experience and its square, schooling interacted with experience, and dummy variables for region, SMSA, region interacted with SMSA, part time work, race, and race interacted with
schooling, experience, and part time employment. The variable names and their definitions are:
rwage: Real weekly wage. Calculated by taking total earnings from wages and salaries last year (1987) divided by weeks worked last year. The weekly wage is deflated by the deflator for Personal Consumption Expenditures, where 1992 is the base year.
ed: Years of schooling. Years of schooling can take on values from zero to 18.
exper: Years of experience: age − years of schooling − 6.
regne: Dummy variable = 1 if lives in the North East.
regmw: Dummy variable = 1 if lives in the Midwest.
regw: Dummy variable = 1 if lives in the West.
smsa: Dummy variable = 1 if lives in a Standard Metropolitan Statistical Area (SMSA).
race: Dummy variable = 1 if black. Only blacks and whites are included in the sample; the other categories are omitted from the sample.
parttime: Dummy variable = 1 if worked less than 35 hours a week at job last year.
The other variables in the model are powers or products of these variables.
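For concreteness, the following sketch builds the regressor matrices of the three specifications; the DataFrame is a synthetic stand-in (the actual CPS extract is described in section 6.1 and not reproduced here), and the column names follow the variable list above.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 1000
d = pd.DataFrame({                      # synthetic stand-in for the CPS extract
    "rwage": np.exp(rng.normal(6, 0.6, n)),
    "ed": rng.integers(0, 19, n), "exper": rng.integers(0, 50, n),
    "regne": rng.integers(0, 2, n), "regmw": rng.integers(0, 2, n),
    "regw": rng.integers(0, 2, n), "smsa": rng.integers(0, 2, n),
    "race": rng.integers(0, 2, n), "parttime": rng.integers(0, 2, n),
})
lnw = np.log(d["rwage"])                # dependent variable: log real weekly wage

# Specification 1: Mincer-type model -- constant, ed, exper, exper^2, race
X1 = np.column_stack([np.ones(n), d.ed, d.exper, d.exper**2, d.race])

# Specification 2: adds a cubic and a quartic in potential experience
X2 = np.column_stack([X1, d.exper**3, d.exper**4])

# Specification 3: Buchinsky-type model with squares, interactions, and region/SMSA dummies
X3 = np.column_stack([
    np.ones(n), d.ed, d.exper, d.regne, d.regmw, d.regw, d.smsa, d.race, d.parttime,
    d.exper**2, d.ed**2, d.ed*d.exper,
    d.regne*d.smsa, d.regmw*d.smsa, d.regw*d.smsa,
    d.race*d.ed, d.race*d.exper, d.race*d.parttime,
])
```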
6.3 Practical implementation of the ICM test

The ICM tests have been conducted using the weight function

  w_{n,j}(ξ) = cos(ξᵀΦ(x̃_{n,j})) + sin(ξᵀΦ(x̃_{n,j}))    (22)
(cf. (6)), where Φ(x) is a vector-valued function with components arctan(x_i), and x̃_{n,j} is the vector of instruments x̃_j, in deviation from their sample means and standardized by their sample standard errors. See Bierens (1982, 1990) for the reason for the latter. We recall that only the untransformed regressors need to be used as instruments. Thus, products and powers of explanatory variables are excluded from the list of instruments, because the σ-algebra generated by these instruments is the same as the σ-algebra generated by all regressors. The instrumental variables involved in Tables 1.B-3.B below are underlined. Moreover, the chosen probability measure μ in the ICM test is the uniform probability measure on the hypercube Ξ = [−c,c]×···×[−c,c], where c is either 1, 5 or 10. The reason for choosing different values of c is the following. With weight function (22) and the uniform probability measure μ involved, the test statistic of the ICM test takes the form

  F̂(c) = (2c)^{−m} min_{θ∈Θ} ∫_{−c}^{c} ··· ∫_{−c}^{c} |(1/√n) Σ_{j=1}^n u_j(θ)w_{n,j}(ξ)|² dξ
        = min_{θ∈Θ} (1/n) Σ_{i=1}^n Σ_{j=1}^n u_i(θ)u_j(θ) Π_{k=1}^m [ sin(c[arctan(x̃_{n,i}(k)) − arctan(x̃_{n,j}(k))]) / (c[arctan(x̃_{n,i}(k)) − arctan(x̃_{n,j}(k))]) ],    (23)

where m is the number of instruments, and u_j(θ) is defined by (2). See (37) in the appendix for the second equality. It follows now from (23) that

  F̂(0) = min_{θ∈Θ} [(1/√n) Σ_{j=1}^n u_j(θ)]² → 0    (24)

in distribution, and F̂(∞) = 0. Thus, if we choose c too small or too large, the power of the test will be negatively affected. Actually, the choice of c is crucial for the finite sample power of the test. One might think of choosing c so as to maximize the test statistic (23). However, it is not clear whether the convergence in distribution results under the null hypothesis hold uniformly in c. More research is needed to investigate this issue, which is beyond the scope of this paper.

The LAD estimators were used as starting values for computing the ICM test statistics. Since the signs of the large median residuals will likely not change, the minimization of the ICM objective function has been conducted by adjusting the 10% smallest (in absolute value) residuals only, using the downhill simplex method of Nelder and Mead (1965) (see also Press et al. 1989, pp. 289-293), which does not require the use of derivatives; after each iteration round we have checked whether the signs of the other 90% of the residuals have changed. Only one iteration round sufficed. The integral in the ICM statistic has been computed by Monte Carlo simulation, using 1000 random drawings from the uniform distribution on Ξ, because the sample size is too large to compute this integral numerically by (23).
A more detailed description of this testing protocol will be given in section 2 of the appendix.
6.4 Median regression estimation and test results

The t-values of the LAD estimators are based on the assumption that the error ε_j is independent of the explanatory variables², as in Koenker and Bassett (1978). The error density f(·) has been estimated by a kernel density estimator with standard normal kernel and window width h = s·n^{−0.2}, where s is the mean of the absolute values of the LAD residuals in deviation from their sample mean. This scaling factor s makes the kernel density estimator location and scale invariant.

<Insert Tables 1.A,B-3.A,B about here>

The LAD estimation results are presented in Tables 1.A-3.A, and the ICM estimation and test results are presented in Tables 1.B-3.B. First, observe that the ICM estimators are virtually the same as the LAD estimators. We will explain this in section 3 of the appendix. Next, observe that the ICM tests for c = 1 do not reject the three models at the 5% significance level, whereas the ICM tests for c = 5 and c = 10 reject the median regressions in Tables 1.A-2.A at the 5% significance level. The differences between the ICM test statistics for c = 1, c = 5 and c = 10 in Tables 1.B-2.B indicate that the null hypothesis is not true, despite the test results for c = 1. Apparently, the value c = 1 is too close to zero [cf. (24)]. Only the Buchinsky-type median regression in Table 3.A is accepted by the ICM test, at the 10% significance level for c = 1 and c = 10, and at the 5% significance level for c = 5. These empirical illustrations clearly demonstrate the practical applicability and usefulness of the ICM test for quantile regression models.
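For completeness, the kernel density estimate of the error density at zero that enters the LAD t-values (standard normal kernel, window width h = s·n^{−0.2}) can be coded as follows; this is a sketch with synthetic residuals, not EasyReg's implementation.

```python
import numpy as np

def density_at_zero(residuals):
    """Gaussian-kernel estimate of the error density at zero, with window width
    h = s * n**(-0.2), where s is the mean absolute deviation of the residuals
    from their mean (the scaling described in section 6.4)."""
    r = np.asarray(residuals, dtype=float)
    n = r.size
    s = np.mean(np.abs(r - r.mean()))
    h = s * n ** (-0.2)
    return np.mean(np.exp(-0.5 * (r / h) ** 2)) / (h * np.sqrt(2 * np.pi))

# Example with synthetic LAD residuals:
rng = np.random.default_rng(3)
print(density_at_zero(rng.standard_t(df=3, size=5000)))
```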
2 This excludes heteroskedasticity of the errors. Admittedly, this assumption may not be realistic for wage equations. If so, the t-values reported in Tables 1.A-3.A are biased. However, asymptotically the ICM test is not affected by heteroskedasticity. See Assumptions 2 and 4.
Appendix
1. Proof of Theorem 3

We can approximate the median regression model by a continuously differentiable model, in various ways. For example, let for arbitrarily small ε > 0

  u_j^{(ε)}(θ) = ∫_{−∞}^{(y_j − θᵀx_j)/ε} |v| exp(−v²/2) dv − 1 = {1 − exp[−0.5(y_j − θᵀx_j)²/ε²]} u_j(θ),    (25)

and ẑ^{(ε)}(θ,ξ) = (1/n) Σ_{j=1}^n u_j^{(ε)}(θ)w_j(ξ). Then it is easy to verify from (2), (9) and (25) that

  |ẑ(θ,ξ) − ẑ^{(ε)}(θ,ξ)| ≤ (√2/n) Σ_{j=1}^n exp(−0.5(y_j − θᵀx_j)²/ε²).    (26)
It follows from Assumption 3 and Jennrich's (1969) uniform law of large numbers that for any fixed c > 0,

  sup_{θ∈Θ, ε≥c} |(1/n) Σ_{j=1}^n exp(−0.5(y_j − θᵀx_j)²/ε²) − E[exp(−0.5(y_1 − θᵀx_1)²/ε²)]| → 0 a.s.    (27)

Moreover,

  limsup_{n→∞} sup_{θ∈Θ, 0≤ε≤c} (1/n) Σ_{j=1}^n exp(−0.5(y_j − θᵀx_j)²/ε²)
    ≤ limsup_{n→∞} sup_{θ∈Θ} (1/n) Σ_{j=1}^n exp(−0.5(y_j − θᵀx_j)²/c²)
    ≤ limsup_{n→∞} sup_{θ∈Θ} |(1/n) Σ_{j=1}^n exp(−0.5(y_j − θᵀx_j)²/c²) − E[exp(−0.5(y_1 − θᵀx_1)²/c²)]|
        + sup_{θ∈Θ} E[exp(−0.5(y_1 − θᵀx_1)²/c²)]
    = sup_{θ∈Θ} E[exp(−0.5(y_1 − θᵀx_1)²/c²)] a.s.    (28)

Furthermore, it follows from Assumption 4 that

  0 ≤ E(exp[−0.5(y_1 − θᵀx_1)²/ε²]) ≤ ε√(2π) E[sup_y f(y|x_1)],    (29)

hence

  lim_{ε↓0} sup_{θ∈Θ} E(exp[−0.5(y_1 − θᵀx_1)²/ε²]) = 0.    (30)
Combining (27) through (30), it follows that for arbitrary c > 0 there exists a number a > 0 such that

  limsup_{n→∞} sup_{θ∈Θ, 0≤ε≤a} |Q̂(θ) − Q̂^{(ε)}(θ)| ≤ c a.s.,    (31)

and similarly

  sup_{θ∈Θ, 0≤ε≤a} |Q(θ) − Q^{(ε)}(θ)| ≤ c,    (32)

where

  Q̂^{(ε)}(θ) = ∫ |ẑ^{(ε)}(θ,ξ)|² dμ(ξ),   Q^{(ε)}(θ) = ∫ |E[u_1^{(ε)}(θ)w_1(ξ)]|² dμ(ξ).    (33)

Moreover, it follows from Jennrich's (1969) uniform law of large numbers that for fixed ε > 0,

  Q̂^{(ε)}(θ) → Q^{(ε)}(θ) a.s., uniformly on Θ.    (34)

Combining (31), (32) and (34), and letting c ↓ 0, Theorem 3 follows. Q.E.D.
2. The ICM test algorithm

Given model (1), the first thing we have to do is to select the variables to be included in the vector x̃_j = (x̃_j(1), ..., x̃_j(m))ᵀ of instruments. For example, in the case of the Buchinsky-type model in Table 3.B, x̃_j consists of the (m = 8) underlined variables, i.e., the smallest set of variables such that the conditional distribution of y_j given x̃_j is the same as the conditional distribution of y_j given x_j. Next, we have to standardize x̃_j, as follows (see Bierens 1982, 1990 for the reason):

  x̃_{n,j} = (x̃_{n,j}(1), ..., x̃_{n,j}(m))ᵀ,  where  x̃_{n,j}(i) = [x̃_j(i) − (1/n) Σ_{j=1}^n x̃_j(i)] / √[(1/n) Σ_{j=1}^n x̃_j(i)² − ((1/n) Σ_{j=1}^n x̃_j(i))²]

for i = 1, ..., m. Moreover, we have to transform x̃_{n,j} by a bounded one-to-one mapping Φ: R^m → R^m. We have chosen Φ((x_1, ..., x_m)ᵀ) = (arctan(x_1), ..., arctan(x_m))ᵀ.
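A minimal sketch of this preparation step, assuming the instruments are collected in an (n × m) array (our own code, not EasyReg's):

```python
import numpy as np

def standardized_arctan(x_tilde):
    """x_tilde: (n, m) array of instruments. Centers each column at its sample mean,
    scales by its sample standard deviation, and applies the componentwise arctan
    mapping Phi described above."""
    z = (x_tilde - x_tilde.mean(axis=0)) / x_tilde.std(axis=0)
    return np.arctan(z)

# The resulting weight w_{n,j}(xi) = cos(xi'Phi) + sin(xi'Phi) is bounded by sqrt(2):
rng = np.random.default_rng(4)
a = standardized_arctan(rng.normal(size=(100, 3)))
xi = rng.uniform(-5, 5, size=3)
w = np.cos(a @ xi) + np.sin(a @ xi)
print(np.abs(w).max() <= np.sqrt(2) + 1e-12)
```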
Denote, for ξ ∈ Ξ = ×_{i=1}^m [−c,c] with c = 1, 5 or 10,

  u_{p,n,j}(θ₁, θ) = d_{p,j}(θ₁)u_j(θ) + (1 − d_{p,j}(θ₁))u_j(θ₁)    (35)

and ẑ_{p,n}(θ₁, θ, ξ) = (1/n) Σ_{j=1}^n u_{p,n,j}(θ₁, θ)w_{n,j}(ξ), where w_{n,j}(ξ) is defined by (22), u_j(θ) is defined by (2), and d_{p,j}(θ₁) is a dummy variable which takes the value 1 if j belongs to the set of the p% of the observations with the smallest value of |y_j − θ₁ᵀx_j|, and zero if not. In the empirical applications we have chosen p = 10. Let μ be the uniform probability measure on Ξ. Then the integral Q̂_p(θ₁,θ) = ∫ |ẑ_{p,n}(θ₁,θ,ξ)|² dμ(ξ) can be approximated by

  Q̃_{N,p}(θ₁, θ) = (1/N) Σ_{s=1}^N |ẑ_{p,n}(θ₁, θ, ξ_s)|²,    (36)

where ξ_1, ..., ξ_N are random drawings from the uniform distribution on Ξ. In the empirical application we have chosen N = 1000. Note that, due to the boundedness of w_{n,j}(ξ), (36) is a mean of bounded random variables |ẑ_{p,n}(θ₁,θ,ξ_s)|², conditional on the data and given the values of θ₁ and θ. This will boost the performance of the law of large numbers on which this approximation relies. Moreover, it is not hard to verify that for the weight function (22) and the uniform probability measure μ on Ξ = ×_{i=1}^m [−c,c],

  ∫ w_{n,i}(ξ)w_{n,j}(ξ) dμ(ξ) = Π_{k=1}^m [ sin(c[arctan(x̃_{n,i}(k)) − arctan(x̃_{n,j}(k))]) / (c[arctan(x̃_{n,i}(k)) − arctan(x̃_{n,j}(k))]) ],    (37)

hence ∫ w_{n,j}(ξ)² dμ(ξ) = 1.

Now given the initial value θ₁ = θ̂_LAD, minimize Q̃_{N,p}(θ₁, θ) with respect to θ, using the simplex method of Nelder and Mead (1965), which yields θ₂ = argmin_θ Q̃_{N,p}(θ₁, θ). Since the objective function is piecewise constant, the vertices of the start simplex should be chosen sufficiently far away from the starting point, say by line search in the principal directions up to the point where the objective function becomes unequal to its value in θ₁. If u_j(θ₁) = u_j(θ₂) for all j with d_{p,j}(θ₁) = 0, then we are done: θ̂ = θ₂, and the simulated ICM test statistic becomes F̂ ≈ nQ̃_{N,p}(θ̂, θ̂); else we repeat the minimization procedure with θ₁ replaced by θ₂. It would be better to conduct this algorithm on the basis of the exact integral Q̂_p(θ₁,θ), but this would involve m(n² − n)/2 multiplications of different pairs of (35) and the m factors of the product in (37), plus n squares of (35). Since our sample size is n = 28,155, the computation of the integral Q̂_p(θ₁,θ) would therefore take too long on a regular PC. (The computation of each of the three ICM test statistics for the model in Table 3.B took about 8 hours on a Pentium II PC, using EasyReg.)

It is easy to verify that in order to extend the ICM test to more general quantile regressions, with P(y_j − θ₀ᵀx_j ≤ 0 | x_j) = q, we only need to redefine u_j(θ) in (2) as

  u_j(θ) = [I(y_j − θᵀx_j > 0) − I(y_j − θᵀx_j < 0) + 2q − 1] / (2√(q(1−q))),    (38)

where the scaling involved guarantees that under the null hypothesis E[u_j(θ₀)²] = 1, and start the minimization of (36) from the corresponding quantile estimator.
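The following sketch mimics the algorithm just described in Python rather than EasyReg: the signs of the 90% largest (in absolute value) residuals are frozen at their θ₁ values through the dummy d_{p,j}(θ₁) in (35), the simulated objective (36) is minimized by Nelder-Mead, and u_j(θ) is the generalized version (38), so quantiles other than the median can be handled by changing q. All data, names, and tuning constants (n, N, c, p, q) are our own choices, and the start-simplex refinement described above is not implemented.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(5)
n, k, m, c, N, p, q = 500, 3, 2, 5.0, 200, 0.10, 0.5

X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])   # regressors (with constant)
y = X @ np.array([1.0, 0.5, -0.25]) + rng.standard_t(df=3, size=n)
x_tilde = X[:, 1:]                                               # instruments: untransformed regressors

a = np.arctan((x_tilde - x_tilde.mean(0)) / x_tilde.std(0))      # standardized, arctan-mapped
xi = rng.uniform(-c, c, size=(N, m))                             # draws from mu, cf. (36)
W = np.cos(a @ xi.T) + np.sin(a @ xi.T)                          # (n, N) weights w_{n,j}(xi_s)

def u(theta):
    """Generalized u_j(theta) of (38); for q = 0.5 this reduces to (2)."""
    r = y - X @ theta
    return ((r > 0).astype(float) - (r < 0).astype(float) + 2*q - 1) / (2*np.sqrt(q*(1 - q)))

theta1 = minimize(lambda th: np.sum(np.abs(y - X @ th)),          # LAD (q = 0.5) starting value
                  np.zeros(k), method="Nelder-Mead").x
abs_res = np.abs(y - X @ theta1)
d = abs_res <= np.quantile(abs_res, p)                            # d_{p,j}(theta1), cf. (35)
u1 = u(theta1)

def Q_tilde(theta):
    """Simulated objective (36): only the signs of the p% smallest residuals may change."""
    up = np.where(d, u(theta), u1)
    z = up @ W / n
    return np.mean(z ** 2)

# Note: the objective is piecewise constant, so the default start simplex may be too
# small to move away from theta1; the line-search refinement above addresses this.
theta2 = minimize(Q_tilde, theta1, method="Nelder-Mead").x
print(n * Q_tilde(theta2))   # simulated ICM statistic, to be compared with 3.23 / 4.26
```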
3. Why are the LAD and ICM estimators close?

In order to explain why the LAD and ICM estimators in Tables 1-3 are so close, suppose that the correct median regression model is y_j = g(x_j) + ε_j, and that ε_j is independent of x_j, with distribution function F(·) satisfying F(0) = 0.5. Moreover, let H(·) be the distribution function of x_j. Since the LAD estimator is actually a method of moments estimator, namely the solution of the moment conditions (1/n) Σ_{j=1}^n u_j(θ)x_j = 0, it converges a.s. to

  θ* = argmin_θ ‖E[u_j(θ)x_j]‖² = argmin_θ ‖E[(1 − 2F(g(x_j) − θᵀx_j))x_j]‖²
     = argmin_θ ∫∫ (1 − 2F(g(x_1) − θᵀx_1))(1 − 2F(g(x_2) − θᵀx_2)) W_1(x_1,x_2) dH(x_1) dH(x_2),    (39)

where W_1(x_1,x_2) = x_1ᵀx_2, whereas the ICM estimator with weight function w(ξᵀΦ(x_j)) converges to

  θ** = argmin_θ ∫ |E[u_j(θ)w(ξᵀΦ(x_j))]|² dμ(ξ)
      = argmin_θ ∫∫ (1 − 2F(g(x_1) − θᵀx_1))(1 − 2F(g(x_2) − θᵀx_2)) W_2(x_1,x_2) dH(x_1) dH(x_2),    (40)

where W_2(x_1,x_2) = ∫ w(ξᵀΦ(x_1))w(ξᵀΦ(x_2)) dμ(ξ). Therefore, asymptotically the difference between the LAD estimator and the ICM estimator will not be substantial if the misspecification is rather modest, because the objective functions (39) and (40) only differ with respect to the weight functions W_1 and W_2.
References
Bierens HJ (1982) Consistent Model Specification Tests. Journal of Econometrics 20: 105-134.
Bierens HJ (1990) A Consistent Conditional Moment Test of Functional Form. Econometrica 58: 1443-1458.
Bierens HJ (1994) Topics in Advanced Econometrics: Estimation, Testing, and Specification of Cross-Section and Time Series Models. Cambridge University Press, Cambridge.
Bierens HJ, Ploberger W (1997) Asymptotic Theory of Integrated Conditional Moment Tests. Econometrica 65: 1129-1151.
Billingsley P (1968) Convergence of Probability Measures. John Wiley, New York.
Buchinsky M (1994) Changes in the U.S. Wage Structure 1963-1987: Application of Quantile Regression. Econometrica 62: 405-458.
Buchinsky M (1995) Quantile Regression, Box-Cox Transformation Model, and the U.S. Wage Structure, 1963-1987. Journal of Econometrics 65: 109-154.
Buchinsky M (1997) Recent Advances in Quantile Regression Models: A Practical Guide for Empirical Research. Journal of Human Resources (forthcoming).
Chamberlain G (1994) Quantile Regression, Censoring, and the Structure of Wages. In: Sims C (ed.) Proceedings of the Sixth World Congress of the Econometric Society. Cambridge University Press, New York.
Chernozhukov V, Umantsev L (2001) Conditional Value-at-Risk: Aspects of Modeling and Estimation. Empirical Economics (this issue).
Fitzenberger B (1997) A Guide to Censored Quantile Regression. In: Maddala GS, Rao CR (eds.) Handbook of Statistics, Vol. 15. Elsevier, Amsterdam.
Hansen LP (1982) Large Sample Properties of Generalized Method of Moments Estimators. Econometrica 50: 1029-1054.
Jennrich RI (1969) Asymptotic Properties of Nonlinear Least Squares Estimators. Annals of Mathematical Statistics 40: 633-643.
Koenker R, Bassett G (1978) Regression Quantiles. Econometrica 46: 33-50.
Murphy KM, Welch F (1990) Empirical Age-Earnings Profiles. Journal of Labor Economics 8: 202-229.
Nelder JA, Mead R (1965) A Simplex Method for Function Minimization. The Computer Journal 7: 308-313.
Poterba JM, Rueben KS (1994) The Distribution of Public Sector Wage Premiums: New Evidence Using Quantile Regression Methods. NBER Working Paper.
Powell JL (1984) Least Absolute Deviations Estimation of the Censored Regression Model. Journal of Econometrics 25: 303-325.
Press WH, Flannery BP, Teukolsky SA, Vetterling WT (1989) Numerical Recipes (Fortran Version). Cambridge University Press, Cambridge.
Stinchcombe MB, White H (1998) Consistent Specification Testing with Nuisance Parameters Present Only Under the Alternative. Econometric Theory 14: 295-325.
Zheng JX (1998) A Consistent Nonparametric Test of Parametric Regression Models under Conditional Quantile Restrictions. Econometric Theory 14: 123-138.
TABLES
Table 1.A: LAD estimation results for the standard Mincer type model
X           estimates     t-values
race        -0.251165     -18.132
ed           0.093462      68.514
exper        0.076289      80.844
exper2      -0.001274     -62.568
1            4.279231     208.134
Table 1.B: ICM estimation and test results for the standard Mincer type model
X           c = 1         c = 5         c = 10
race        -0.251183     -0.251832     -0.251853
ed           0.093469      0.093711      0.093718
exper        0.076294      0.076492      0.076498
exper2      -0.001274     -0.001277     -0.001277
1            4.279546      4.290609      4.290952
ICM test     4.224736     27.54178      18.57165
Critical values: 10%: 3.23; 5%: 4.26
Table 2.A: LAD estimation results for the quartic model
X           estimates     t-values
race        -0.245609     -18.003
ed           0.095481      70.784
exper        0.166344      49.568
exper2      -0.008562     -30.016
exper3       0.000201      23.086
exper4      -0.000002     -20.445
1            4.005403     181.053
Table 2.B: ICM estimation and test results for the quartic model
X           c = 1         c = 5         c = 10
race        -0.245609     -0.246033     -0.246020
ed           0.095481      0.095646      0.095641
exper        0.166344      0.166631      0.166622
exper2      -0.008562     -0.008577     -0.008577
exper3       0.000201      0.000201      0.000201
exper4      -0.000002     -0.000002     -0.000002
1            4.005398      4.012319      0.401210
ICM test     2.425816     11.26330       7.969391
Critical values: 10%: 3.23; 5%: 4.26
Table 3.A: LAD estimation results for the Buchinsky type model
X               estimates     t-values
ed               0.137425      17.115
exper            0.086972      43.398
regne            0.046971       2.173
regmw           -0.013488      -0.786
regw             0.042601       2.337
smsa             0.134946       9.895
race            -0.267068      -3.566
parttime        -0.934852     -69.697
exper2          -0.001063     -48.122
ed2             -0.000397      -1.516
ed*exper        -0.001785     -16.719
regne*smsa       0.055559       2.314
regmw*smsa       0.097281       4.795
regw*smsa        0.004530       0.212
race*ed          0.000753       0.149
race*exp         0.000678       0.648
race*parttime    0.142179       3.473
1                3.788416      57.192
Table 3.B: ICM estimation and test results for the Buchinsky type model
X               c = 1         c = 5         c = 10
ed               0.137446      0.137841      0.138012
exper            0.086984      0.087234      0.087342
regne            0.046978      0.047114      0.047171
regmw           -0.013490     -0.013529     -0.013545
regw             0.042607      0.042730      0.042782
smsa             0.134966      0.135356      0.135521
race            -0.267107     -0.267876     -0.268206
parttime        -0.934991     -0.937682     -0.938834
exper2          -0.001063     -0.001066     -0.001067
ed2             -0.000397     -0.000398     -0.000399
ed*exper        -0.001785     -0.001791     -0.001793
regne*smsa       0.055567      0.055727      0.055796
regmw*smsa       0.097295      0.097575      0.097695
regw*smsa        0.004530      0.004543      0.004549
race*ed          0.000753      0.000756      0.000756
race*exp         0.000678      0.000680      0.000681
race*parttime    0.142110      0.142608      0.142784
1                3.788976      3.799885      3.804553
ICM test         1.981098      3.333470      2.292801
Critical values: 10%: 3.23; 5%: 4.26