
3 Multiple Regression Model. Inference

3.1 Sampling Distribution of OLS Estimator

In this section, we will learn how to use the OLS estimates to test various hypotheses about the coefficients. This is called inference. Inference is crucial for economic analysis as it allows us to test economic theories and evaluate the impact of policies.
Throughout this chapter, we will maintain the classical linear regression assumptions 6-10. In addition, we make the following assumption:

Assumption 11 (Normality): The disturbances $\{u_i\}$ are i.i.d. $u_i \sim N(0, \sigma^2)$, and independent of the explanatory variables $x$.

This assumption is needed to derive the sampling distribution of the OLS estimator, which in turn is required for testing hypotheses. This assumption implies that the dependent variable is distributed normally conditional on $X$:

$$y \,|\, X \sim N(X\beta, \sigma^2 I_n)$$

or

$$y_i \,|\, X \sim N(\mu_i, \sigma^2), \qquad \mu_i = E(y_i \,|\, X) = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \ldots + \beta_k x_{ik}$$

Why do we assume a normal distribution? Normality is justified by the central limit theorem (CLT), which says that the normalized sum of i.i.d. variables with finite variances can be approximated by a normal distribution. This means that even if, in reality, the $u_i$ are not normal, for large $n$ their distribution can be approximated by a normal distribution. Moreover, normal distributions have a convenient property: any linear combination of normal variables is again normal. Recall that a standard normal distribution has a bell shape and is symmetric about 0. See Figure B.7.
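For intuition, the CLT effect is easy to see in a minimal Python simulation (a sketch with illustrative choices: uniform disturbances and $n = 500$): standardized sums of i.i.d. uniform draws, each far from normal, are approximately $N(0,1)$.

```python
import numpy as np

rng = np.random.default_rng(0)

# 10,000 standardized sums of n i.i.d. Uniform(0,1) disturbances
# (mean 1/2, variance 1/12) -- each disturbance is far from normal.
n = 500
u = rng.uniform(0.0, 1.0, size=(10_000, n))
z = (u.sum(axis=1) - n * 0.5) / np.sqrt(n / 12.0)

# By the CLT, z is approximately N(0, 1): mean ~ 0, sd ~ 1,
# and about 95% of the draws lie within +/- 1.96.
print(z.mean(), z.std(), np.mean(np.abs(z) <= 1.96))
```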
Assumption 11 implies that the sampling distribution of the OLS estimator is normal:

Theorem 14 Under Assumptions 6-11, the conditional distribution of the OLS estimator is normal with mean $E(\hat{\beta} \,|\, X) = \beta$ and variance $Var(\hat{\beta} \,|\, X) = \sigma^2 (X'X)^{-1}$:

$$\hat{\beta} \,|\, X \sim N(\beta, \sigma^2 (X'X)^{-1})$$

and

$$\hat{\beta}_j \,|\, X \sim N\!\left(\beta_j, \frac{\sigma^2}{SST_j (1 - R_j^2)}\right).$$

We can standardize the distribution of $\hat{\beta}_j$:

$$\frac{\hat{\beta}_j - \beta_j}{sd(\hat{\beta}_j)} \sim N(0, 1) \qquad (50)$$

where $sd(\hat{\beta}_j) = \sqrt{Var(\hat{\beta}_j)}$ is the standard deviation of $\hat{\beta}_j$. However, in practice $sd(\hat{\beta}_j)$ is not known, and therefore we need to replace it with its estimate:

$$se(\hat{\beta}_j) = \sqrt{\widehat{Var}(\hat{\beta}_j)}. \qquad (51)$$

Then, it turns out that the distribution of

$$\frac{\hat{\beta}_j - \beta_j}{se(\hat{\beta}_j)} \qquad (52)$$

is no longer normal, but is a t-distribution. More specifically, we have the following theorem:

Theorem 15 Under Assumptions 6-11,

$$\frac{\hat{\beta}_j - \beta_j}{se(\hat{\beta}_j)} \sim t_{n-k-1} \qquad (53)$$

where $n - k - 1$ are the degrees of freedom.

In general, the t-distribution (or Student's distribution) has a similar shape to the standard normal distribution, symmetric about 0, but has heavier tails; see Figure B.10. As the degrees of freedom increase, the t-distribution approaches a normal distribution. As a rule of thumb, for df > 100 a t-distribution can be approximated reasonably well by a normal distribution. A $t_p$ distribution can be viewed as the ratio of two random variables: the numerator, which has the standard normal distribution $N(0,1)$, and the denominator, which is a random variable that is independent of the numerator and is equal to the square root of the sum of squares of $p$ independent $N(0,1)$ variables, divided by $p$. That is, loosely speaking,

$$t_p = \frac{Y}{\sqrt{(X_1^2 + \ldots + X_p^2)/p}}$$

where $Y \sim N(0,1)$, $X_i \sim N(0,1)$, $i = 1, \ldots, p$ – all independent of each other. As you know, the distribution of the sum of squares of $p$ independent $N(0,1)$ variables is called the $\chi_p^2$ distribution with $p$ degrees of freedom. Hence, the $t_p$ distribution is the ratio of a standard normal $N(0,1)$ variable and the square root of a $\chi_p^2$ variable divided by its degrees of freedom, which is independent of the numerator:

$$t_p \sim \frac{N(0,1)}{\sqrt{\chi_p^2 / p}}$$
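This construction can be checked numerically; the following Python sketch (sample sizes are illustrative) builds $t_p$ draws from the definition and compares their quantiles with the tabulated t-distribution:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
p = 10

# Build t_p draws from the definition: N(0,1) over sqrt(chi2_p / p).
Y = rng.standard_normal(100_000)
chi2 = rng.standard_normal((100_000, p)) ** 2
T = Y / np.sqrt(chi2.sum(axis=1) / p)

# Simulated quantiles should match scipy's t distribution with p df.
for q in (0.90, 0.95, 0.99):
    print(q, np.quantile(T, q), stats.t.ppf(q, df=p))
```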

Thus, the intuition for the last theorem is as follows: the random variable in (52) can be re-written as

$$T = \frac{\hat{\beta}_j - \beta_j}{se(\hat{\beta}_j)} = \frac{(\hat{\beta}_j - \beta_j)/sd(\hat{\beta}_j)}{se(\hat{\beta}_j)/sd(\hat{\beta}_j)}.$$

The numerator has the standard normal distribution, while the denominator is the square root of a sum of independent squared $N(0,1)$ variables divided by the degrees of freedom. The degrees of freedom are not $n$ but $n - k - 1$, to account for the fact that we have used up $k + 1$ degrees of freedom (or data points) to estimate the $k + 1$ coefficients.
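The theorem itself can also be verified by a short Monte Carlo sketch (the design matrix and true coefficients below are arbitrary illustrative choices): the quantiles of the studentized ratio match $t_{n-k-1}$, not $N(0,1)$.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, k = 30, 2                        # small n, so t and normal differ visibly
X = np.column_stack([np.ones(n), rng.standard_normal((n, k))])
beta = np.array([1.0, 0.5, -0.3])   # arbitrary "true" coefficients
XtX_inv = np.linalg.inv(X.T @ X)

draws = []
for _ in range(20_000):
    y = X @ beta + rng.standard_normal(n)        # normal errors, sigma = 1
    b = XtX_inv @ X.T @ y
    resid = y - X @ b
    sigma2_hat = resid @ resid / (n - k - 1)
    se1 = np.sqrt(sigma2_hat * XtX_inv[1, 1])
    draws.append((b[1] - beta[1]) / se1)         # the ratio in (53), j = 1

print(np.quantile(draws, 0.975))                 # close to the t cutoff below
print(stats.t.ppf(0.975, df=n - k - 1))          # ~2.05 for 27 df, not 1.96
```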

3.2 Testing Hypotheses about a Single Parameter

3.2.1 Two-Sided Test

With the sampling distribution of $\hat{\beta}_j$ at hand, we can now test hypotheses. The simplest hypothesis is the so-called statistical significance test:

$$H_0 : \beta_j = 0 \qquad (54)$$
$$H_1 : \beta_j \neq 0$$

If the hypothesis is valid, then $x_j$ has no effect (no significance) on the expected value of $y$ when all other factors are accounted for. $H_0$ is called the null hypothesis or the null. Don't write $H_0 : \hat{\beta}_j = 0$: it is a mistake, since we are making inferences about the true parameter, not its random estimate. In addition, we need to formulate an alternative hypothesis $H_1$ – the conclusion that we would make if $H_0$ is not true. The alternative in (54) is called a two-sided alternative. Though the true parameter $\beta_j$ is not known, we use its estimate $\hat{\beta}_j$ to check if the hypothesis is true.

Example 1. Consider the wage equation:

$$\log wage = \beta_0 + \beta_1 educ + \beta_2 exper + u$$

Suppose we want to test the hypothesis that experience has no effect on wage once education is accounted for:

$$H_0 : \beta_2 = 0$$
$$H_1 : \beta_2 \neq 0$$

Suppose for a moment that the true value of $\beta_2 = 0$. Then, $\hat{\beta}_2$ will be close to 0 with a high probability, and the statistic

$$T = \frac{\hat{\beta}_2 - \beta_2}{se(\hat{\beta}_2)} \overset{H_0}{=} \frac{\hat{\beta}_2}{se(\hat{\beta}_2)} \qquad (55)$$

will also be close to 0. Note also that if $\hat{\beta}_2 > 0$, then $T > 0$, and if $\hat{\beta}_2 < 0$, then $T < 0$. In other words, $T$ mimics the behavior of $\hat{\beta}_2$. In contrast to $\hat{\beta}_2$, $T$ has the advantage that we know its distribution. By Theorem 15, under the null $H_0$, $T$ has a t-distribution with $n - 3$ degrees of freedom. See Figure C.4.
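In practice, the statistic (55) is read straight off regression output. A sketch using statsmodels, assuming a hypothetical data set with columns wage, educ and exper (the file name is an assumption for the example):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data set: any sample with wage, educ, exper columns will do.
df = pd.read_csv("wages.csv")             # file name is an assumption

fit = smf.ols("np.log(wage) ~ educ + exper", data=df).fit()

# T in (55): estimate over standard error; matches fit.tvalues["exper"].
T = fit.params["exper"] / fit.bse["exper"]
print(T, fit.pvalues["exper"])            # p-value of the two-sided test
```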
If $H_0$ is true, the value of the statistic $T$ will fall into the unshaded segment in the neighborhood of zero, see Figure C.6. Thus, small (in absolute value) values of $T$ will provide evidence in support of $H_0$. On the other hand, if $H_0$ is not true, i.e., $\beta_2 \neq 0$, then $|\hat{\beta}_2|$ will be large with a high probability and fall into the region on the x-axis corresponding to the shaded area, see the graph. In this case, we would reject $H_0$. But how large should $|\hat{\beta}_2|$ be to reject the null hypothesis? In other words, we need to specify the rejection region for the hypothesis. In our example, this amounts to choosing the cut-off or critical value of the t-distribution – $c$ – such that if $|T| > c$ we reject $H_0$ in favor of $H_1$, and if $|T| \leq c$, we don't reject $H_0$. [Note on terminology: it is better to say do not reject instead of accept $H_0$.] The probability that $T$ falls in the rejection region,

$$\Pr(|T| > c \,|\, H_0) = \alpha, \qquad (56)$$

is called the significance level of the test. Note that the conditioning on $H_0$ is added to stress that the distribution of $T$ is derived under the assumption that $H_0$ is valid. Recall from the statistics course that $\alpha$ is also called the Type I error: the probability of rejecting $H_0$ when it is in fact true. Note that (56) is equivalent to

$$\Pr(T > c) = \frac{\alpha}{2} \ \text{ and } \ \Pr(T < -c) = \frac{\alpha}{2}, \quad \text{or} \qquad (57)$$
$$\Pr(|T| \leq c) = 1 - \alpha.$$

From (56), it is clear that choosing $c$ is equivalent to choosing $\alpha$. The choice of $\alpha$ is at the discretion of the researcher. The typical choices are: $\alpha = 0.1$ (or 10%), $\alpha = 0.05$ (or 5%), $\alpha = 0.01$ (or 1%). For instance, if $\alpha = 0.05$ (or 5%) under the two-sided alternative, the probability of each of the right-tail and left-tail rejection regions for the t-distribution (see Figure C.4) is 0.025. In total, the probability of both regions is 0.05.
After selecting $\alpha$, one looks up the table for a t-distribution and finds the critical value $c$ that leaves $\alpha/2$ probability in the right tail, then computes $T$ and checks the rejection rule: $|T| > c$. If $\alpha = 0.05$ and $H_0$ is rejected, then $\beta_2$ is said to be statistically significant at the 5% significance level. Note that if $\hat{\beta}_2$ is statistically significant at $\alpha = 5\%$, it will also be significant at any $\alpha > 5\%$. This is because the rejection region for $\alpha = 5\%$ is a subset of the rejection region for any $\alpha > 5\%$, and hence $T$ falls automatically into the rejection region for $\alpha > 5\%$. However, if $\hat{\beta}_2$ is statistically significant at $\alpha = 5\%$, it may fail to be significant at $\alpha < 5\%$; this needs to be checked separately.
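The critical value $c$ can be taken from statistical software instead of a printed table. A small Python sketch (using df = 137, the degrees of freedom of the next example) also makes the nesting of rejection regions concrete:

```python
from scipy import stats

df = 137                                  # degrees of freedom, n - k - 1
for alpha in (0.10, 0.05, 0.01):
    c = stats.t.ppf(1 - alpha / 2, df)    # leaves alpha/2 in the right tail
    print(f"alpha = {alpha:.2f}: c = {c:.3f}")
# Larger alpha -> smaller cutoff, so the 5% rejection region is a subset
# of the 10% one, which is why 5%-significance implies 10%-significance.
```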

To illustrate the main steps in hypothesis testing, consider the following example.

Example 2. Determinants of College GPA. Consider the following estimated regression for college GPA:

$$\widehat{colGPA} = \underset{(0.33)}{1.39} + \underset{(0.094)}{0.412}\, hsGPA + \underset{(0.011)}{0.015}\, ACT - \underset{(0.026)}{0.083}\, skipped$$

$$n = 141, \quad R^2 = 0.234$$

where $hsGPA$ is the high school GPA, $ACT$ is the score on the ACT exam, and $skipped$ is the number of skipped classes. The standard errors of the coefficient estimates are reported in parentheses. We want to test if high school GPA has any effect on college GPA.
1. The first step is to formulate the hypothesis in mathematical terms:

$$H_0 : \beta_1 = 0$$
$$H_1 : \beta_1 \neq 0$$

So, the null $H_0$ is that $hsGPA$ has no effect on $colGPA$. The alternative is two-sided, i.e., we allow both positive and negative correlation between $hsGPA$ and $colGPA$.
2. The second step is to construct a test statistic and establish its distribution. In general, a statistic is a function of the sample (data) that does not involve any unknown parameters. A test statistic should meet two criteria: (i) it should allow us to check the validity of $H_0$, and (ii) it should have a known (tabulated) distribution. In this example, a convenient test statistic is the t-statistic:

$$T = \frac{\hat{\beta}_1 - \beta_1}{se(\hat{\beta}_1)} \overset{H_0}{=} \frac{\hat{\beta}_1}{se(\hat{\beta}_1)} \sim t_{137}$$

because, first, it mimics $H_0$ and allows us to check its validity, and second, it has a known t-distribution for which tables exist. In our example, the degrees of freedom of the $T$ statistic is $n - k - 1 = 141 - 4 = 137$.
3. The third step is to choose a rejection region or rejection rule. For example, let $\alpha = 0.05$ (or 5%), so that the rejection rule under the two-sided alternative is: reject $H_0$ if $|T| > c$. The 5% critical value of the t-distribution with 137 degrees of freedom for the two-sided test is, by the normal approximation, 1.96: $c = 1.96$.
4. The final step is to compute the value of the test statistic and check the rejection rule:

$$|T| = \frac{|\hat{\beta}_1|}{se(\hat{\beta}_1)} = \frac{0.412}{0.094} = 4.38 > c = 1.96.$$

Thus, we reject $H_0$ and conclude that $\hat{\beta}_1$ is statistically significant at the 5% (and hence also at the 10%) significance level, and that high school GPA has a positive, statistically significant effect on college GPA. Note that the two-sided test of statistical significance is included in all regression packages.
Let's now test the significance of the coefficient on the ACT score at the 5% significance level. The $T$ test is now:

$$T = \frac{0.015}{0.011} = 1.36 < 1.96$$

Hence, we cannot reject the null, and the coefficient on ACT, $\hat{\beta}_2$, is not significant at the 5% level. This means that even though the estimate is $\hat{\beta}_2 = 0.015$, the true $\beta_2 = 0$ with a high probability. We obtained the positive estimate $\hat{\beta}_2 = 0.015$ only by chance (because of the randomness of the sample).
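Both tests in this example can be reproduced from the reported coefficients and standard errors alone; a short sketch (p-values computed from $t_{137}$):

```python
from scipy import stats

df = 137  # n - k - 1 = 141 - 4

# Point estimates and standard errors as reported in the fitted equation.
tests = {"hsGPA": (0.412, 0.094), "ACT": (0.015, 0.011)}

for name, (b, se) in tests.items():
    t = b / se
    p = 2 * stats.t.sf(abs(t), df)        # two-sided p-value
    print(f"{name}: |T| = {t:.2f}, p = {p:.3f}")
# hsGPA: |T| = 4.38, p = 0.000  -> reject H0 at the 5% level
# ACT:   |T| = 1.36, p ~ 0.175  -> cannot reject H0 at the 5% level
```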
