CHAPTER THREE

Multiple Linear Regression Analysis
We Focus on Model with Two Explanatory Variables
 In simple regression we study the relationship between a
dependent variable and a single explanatory (independent)
variable; we assume that the dependent variable is influenced by
only one explanatory variable.
 However, many economic variables are influenced by several
factors or variables.
 For instance, in investment studies we examine the relationship
between the quantity invested (or the decision whether or not to
invest) and the interest rate, share prices, the exchange rate, etc.
 Multiple regression analysis is an extension of simple
regression analysis to cover cases in which the dependent
variable is hypothesized to depend on more than one
explanatory variable.
 Much of the analysis will be a straightforward extension of the
simple regression model, but we will encounter two new
problems.
 First, when evaluating the influence of a given explanatory
variable on the dependent variable, we now have to face the
problem of discriminating between its effects and the effects
of the other explanatory variables.
 Second, we shall have to tackle the problem of model
specification.
 Adding more variables to the simple linear regression model
leads us to the discussion of multiple regression models i.e.
models in which the dependent variable (or regressand)
depends on two or more explanatory variables, or regressors.
 The multiple linear regression (population regression function),
in which we have one dependent variable Y and k explanatory
variables, is given by
• Definition of the multiple linear regression model

$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + u_i$

"Explains variable $Y_i$ in terms of variables $X_{1i}, \ldots, X_{ki}$"

$\beta_0$: intercept; $\beta_1, \ldots, \beta_k$: slope parameters
Y: dependent variable, explained variable, response variable, regressand, ...
$X_1, \ldots, X_k$: independent variables, explanatory variables, regressors, ...
u: error term, disturbance, unobservables, ...
• Motivation for multiple regression
– Incorporate more explanatory factors into the model
– Explicitly hold fixed other factors that would otherwise end up in the error term
– Allow for more flexible functional forms
• Example: Wage equation

$wage = \beta_0 + \beta_1\, educ + \beta_2\, exper + u$

Hourly wage: wage; years of education: educ; labor market experience: exper;
all other factors: u

$\beta_1$ now measures the effect of education explicitly holding experience fixed.
• Example: Average test scores and per student spending

$avgscore = \beta_0 + \beta_1\, expend + \beta_2\, avginc + u$

Average standardized test score of school: avgscore; per student spending at
this school: expend; average family income of students at this school: avginc;
other factors: u

– Per student spending is likely to be correlated with average family income at
a given high school because of school financing
– Omitting average family income in regression would lead to biased estimate
of the effect of spending on average test scores
– In a simple regression model, effect of per student spending would partly
include the effect of family income on test scores
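A minimal simulation sketch (not part of the original notes) of the omitted-variable bias just described: the variable names (expend, avginc, avgscore) and all numbers are illustrative assumptions, and the coefficient on spending is biased when family income is left out.

```python
# Hedged sketch: omitted-variable bias when a correlated regressor is dropped.
import numpy as np

rng = np.random.default_rng(0)
n = 5_000

avginc = rng.normal(50, 10, n)                    # average family income
expend = 2 + 0.1 * avginc + rng.normal(0, 1, n)   # spending correlated with income
u = rng.normal(0, 5, n)
avgscore = 10 + 3 * expend + 0.5 * avginc + u     # "true" model

# Simple regression of avgscore on expend only (omits avginc)
X_simple = np.column_stack([np.ones(n), expend])
b_simple, *_ = np.linalg.lstsq(X_simple, avgscore, rcond=None)

# Multiple regression including avginc
X_multi = np.column_stack([np.ones(n), expend, avginc])
b_multi, *_ = np.linalg.lstsq(X_multi, avgscore, rcond=None)

print("spending coefficient, avginc omitted :", round(b_simple[1], 3))  # biased upward
print("spending coefficient, avginc included:", round(b_multi[1], 3))   # close to 3
```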
• Interpretation of the multiple regression model

$\Delta y = \beta_j \Delta x_j$, holding all other factors fixed

By how much does the dependent variable change if the j-th
independent variable is increased by one unit, holding all
other independent variables and the error term constant?

– The multiple linear regression model manages to hold the values
of other explanatory variables fixed even if, in reality, they are
correlated with the explanatory variable under consideration
– "Ceteris paribus" interpretation
– It still has to be assumed that unobserved factors do not change if the
explanatory variables are changed
Example: Determinants of college GPA

$\widehat{colGPA} = 1.29 + 0.453\, hsGPA + 0.0094\, ACT$

Grade point average at college: colGPA; high school grade point average:
hsGPA; achievement test score: ACT

Interpretation:
– Holding ACT fixed, another point on high school grade point average
is associated with another .453 points of college grade point average
– Or: If we compare two students with the same ACT, but the hsGPA
of student A is one point higher, we predict student A to have a
colGPA that is .453 higher than that of student B
– Holding high school grade point average fixed, another 10 points on
ACT are associated with less than one point on college GPA
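A small numerical check of this ceteris-paribus interpretation, using the fitted coefficients from the example above; the two students compared are made up for illustration.

```python
# Hedged sketch: ceteris-paribus comparisons with the fitted GPA equation.
def predicted_colGPA(hsGPA: float, ACT: float) -> float:
    return 1.29 + 0.453 * hsGPA + 0.0094 * ACT

# Two students with the same ACT but hsGPA differing by one point
student_A = predicted_colGPA(hsGPA=3.5, ACT=25)
student_B = predicted_colGPA(hsGPA=2.5, ACT=25)
print(round(student_A - student_B, 4))   # 0.453: the hsGPA coefficient

# Ten extra ACT points, holding hsGPA fixed
print(round(10 * 0.0094, 4))             # 0.094: less than one tenth of a GPA point
```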
 Assumptions of the Multiple Linear Regression
 In order to specify our multiple linear regression model
and proceed with our analysis of this model, some
assumptions are required.
 These assumptions are the same as in the single
explanatory variable model developed earlier, except for the
assumption of no perfect multicollinearity.
 These assumptions are:
Assumptions
1. The linearity assumption
2. The Xi's and the disturbances are uncorrelated:
   $E(\varepsilon_i \mid X_2, X_3, \ldots, X_k) = 0 \quad \forall i,\; i = 1, 2, \ldots, n$, so that $E(Y) = X\beta$
3. The assumption of homoscedasticity:
   $Var(\varepsilon_i \mid X_2, X_3, \ldots, X_k) = \sigma^2 \quad \forall i,\; i = 1, 2, \ldots, n$
4. The assumption of no serial correlation:
   $Cov(\varepsilon_i, \varepsilon_j) = 0 \quad \forall i, j,\; i \neq j$
• Assumptions (3) and (4) together imply that the variance-covariance
  matrix of the disturbances is $\sigma^2 I$.
5. The Xj's are linearly independent.
6. No perfect collinearity.
7. $\varepsilon \sim N(0, \sigma^2 I)$
8. Correct specification of the model:
 the model has no specification error in that all the important explanatory variables
appear explicitly in the function and the mathematical form is correctly defined
(linear or non-linear form and the number of equations in the model).
 Estimation of Partial Regression Coefficients

The model:  $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + U_i$  .......... (1)

 Now it is time to state how (1) is estimated. Given sample
observations on $Y, X_1$ and $X_2$, we estimate (1) using the method of
ordinary least squares (OLS):

$Y_i = \hat\beta_0 + \hat\beta_1 X_{1i} + \hat\beta_2 X_{2i} + e_i$  .......... (2)

 The predicted value of Y is

$\hat Y = \hat\beta_0 + \hat\beta_1 X_1 + \hat\beta_2 X_2$  .......... (3)

 The residual (e) is  $e = Y - \hat Y$  .......... (4)

 To obtain expressions for the least squares estimators, we partially
differentiate $\sum e_i^2$ with respect to $\hat\beta_0, \hat\beta_1$ and $\hat\beta_2$ and set the partial
derivatives equal to zero:

$\dfrac{\partial \sum e_i^2}{\partial \hat\beta_0} = -2\sum\left(Y_i - \hat\beta_0 - \hat\beta_1 X_{1i} - \hat\beta_2 X_{2i}\right) = 0$  .......... (5)

$\dfrac{\partial \sum e_i^2}{\partial \hat\beta_1} = -2\sum X_{1i}\left(Y_i - \hat\beta_0 - \hat\beta_1 X_{1i} - \hat\beta_2 X_{2i}\right) = 0$  .......... (6)

$\dfrac{\partial \sum e_i^2}{\partial \hat\beta_2} = -2\sum X_{2i}\left(Y_i - \hat\beta_0 - \hat\beta_1 X_{1i} - \hat\beta_2 X_{2i}\right) = 0$

 Summing from 1 to n, the multiple regression equation produces three normal
equations:

$\sum Y = n\hat\beta_0 + \hat\beta_1 \sum X_{1i} + \hat\beta_2 \sum X_{2i}$  .......... (7)

$\sum X_{1i} Y_i = \hat\beta_0 \sum X_{1i} + \hat\beta_1 \sum X_{1i}^2 + \hat\beta_2 \sum X_{1i} X_{2i}$  .......... (8)

$\sum X_{2i} Y_i = \hat\beta_0 \sum X_{2i} + \hat\beta_1 \sum X_{1i} X_{2i} + \hat\beta_2 \sum X_{2i}^2$  .......... (9)

 From (7) we obtain $\hat\beta_0$:

$\hat\beta_0 = \bar Y - \hat\beta_1 \bar X_1 - \hat\beta_2 \bar X_2$  .......... (10)

 Substituting (10) in (8), we get:

$\sum X_{1i} Y_i = (\bar Y - \hat\beta_1 \bar X_1 - \hat\beta_2 \bar X_2)\sum X_{1i} + \hat\beta_1 \sum X_{1i}^2 + \hat\beta_2 \sum X_{1i} X_{2i}$

$\sum X_{1i} Y_i - \bar Y \sum X_{1i} = \hat\beta_1\left(\sum X_{1i}^2 - \bar X_1 \sum X_{1i}\right) + \hat\beta_2\left(\sum X_{1i} X_{2i} - \bar X_2 \sum X_{1i}\right)$

$\sum X_{1i} Y_i - n\bar Y \bar X_1 = \hat\beta_1\left(\sum X_{1i}^2 - n\bar X_1^2\right) + \hat\beta_2\left(\sum X_{1i} X_{2i} - n\bar X_1 \bar X_2\right)$  .......... (11)

 We know that

$\sum (X_i - \bar X)(Y_i - \bar Y) = \sum X_i Y_i - n\bar X \bar Y = \sum x_i y_i$

$\sum (X_i - \bar X)^2 = \sum X_i^2 - n\bar X^2 = \sum x_i^2$

 Substituting these results in equation (11), the normal equation (8) can be
written in deviation form as follows:

$\sum x_1 y = \hat\beta_1 \sum x_1^2 + \hat\beta_2 \sum x_1 x_2$  .......... (12)

 Using the same procedure, if we substitute (10) in (9), we get

$\sum x_2 y = \hat\beta_1 \sum x_1 x_2 + \hat\beta_2 \sum x_2^2$  .......... (13)

 Let's bring (12) and (13) together:

$\sum x_1 y = \hat\beta_1 \sum x_1^2 + \hat\beta_2 \sum x_1 x_2$  .......... (14)

$\sum x_2 y = \hat\beta_1 \sum x_1 x_2 + \hat\beta_2 \sum x_2^2$  .......... (15)

 $\hat\beta_1$ and $\hat\beta_2$ can easily be solved using matrices.

 We can rewrite the above two equations in matrix form as follows:

$\begin{pmatrix} \sum x_1^2 & \sum x_1 x_2 \\ \sum x_1 x_2 & \sum x_2^2 \end{pmatrix} \begin{pmatrix} \hat\beta_1 \\ \hat\beta_2 \end{pmatrix} = \begin{pmatrix} \sum x_1 y \\ \sum x_2 y \end{pmatrix}$  .......... (16)

$\hat\beta_1 = \dfrac{\sum x_1 y \cdot \sum x_2^2 - \sum x_1 x_2 \cdot \sum x_2 y}{\sum x_1^2 \cdot \sum x_2^2 - \left(\sum x_1 x_2\right)^2}$  .......... (17)

$\hat\beta_2 = \dfrac{\sum x_2 y \cdot \sum x_1^2 - \sum x_1 x_2 \cdot \sum x_1 y}{\sum x_1^2 \cdot \sum x_2^2 - \left(\sum x_1 x_2\right)^2}$  .......... (18)

 We can also express $\hat\beta_1$ and $\hat\beta_2$ in terms of the covariances and variances of $Y, X_1$ and $X_2$:

$\hat\beta_1 = \dfrac{Cov(X_1, Y)\cdot Var(X_2) - Cov(X_1, X_2)\cdot Cov(X_2, Y)}{Var(X_1)\cdot Var(X_2) - \left[Cov(X_1, X_2)\right]^2}$  .......... (19)

$\hat\beta_2 = \dfrac{Cov(X_2, Y)\cdot Var(X_1) - Cov(X_1, X_2)\cdot Cov(X_1, Y)}{Var(X_1)\cdot Var(X_2) - \left[Cov(X_1, X_2)\right]^2}$  .......... (20)
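A minimal sketch (not from the notes) that computes the partial regression coefficients with the deviation-form formulas (17) and (18) and the intercept from (10), then checks them against a direct least-squares solve; the data are simulated purely for illustration.

```python
# Hedged sketch: partial regression coefficients from the deviation-form formulas.
import numpy as np

rng = np.random.default_rng(1)
n = 200
X1 = rng.normal(10, 2, n)
X2 = 0.5 * X1 + rng.normal(0, 1, n)            # X1 and X2 are correlated
Y = 4 + 1.5 * X1 - 2.0 * X2 + rng.normal(0, 1, n)

# Deviations from the means
x1, x2, y = X1 - X1.mean(), X2 - X2.mean(), Y - Y.mean()

den = (x1 @ x1) * (x2 @ x2) - (x1 @ x2) ** 2
b1 = ((x1 @ y) * (x2 @ x2) - (x1 @ x2) * (x2 @ y)) / den   # equation (17)
b2 = ((x2 @ y) * (x1 @ x1) - (x1 @ x2) * (x1 @ y)) / den   # equation (18)
b0 = Y.mean() - b1 * X1.mean() - b2 * X2.mean()            # equation (10)

# Cross-check with a direct least-squares solve
X = np.column_stack([np.ones(n), X1, X2])
print(np.round([b0, b1, b2], 4))
print(np.round(np.linalg.lstsq(X, Y, rcond=None)[0], 4))   # should match
```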
Statistical Properties of the Parameters (Matrix) Approach

 We have seen, in simple linear regression, that the OLS
estimators $\hat\beta_i$ satisfy the small-sample properties of an
estimator, i.e. the BLUE property.
 In multiple regression, the OLS estimators also satisfy the
BLUE property.
 Now we proceed to examine the desired properties of the
estimators in matrix notation:
 Linearity
 We know that:  $\hat\beta = (X'X)^{-1} X'Y$

 Let $C = (X'X)^{-1} X'$, so that $\hat\beta = CY$.

 Since C is a matrix of fixed values, the above equation indicates that $\hat\beta$ is linear in Y.
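A minimal sketch of the matrix form of the estimator, writing out C = (X'X)⁻¹X' and β̂ = CY directly; the simulated data and dimensions are assumptions for illustration only.

```python
# Hedged sketch: beta_hat = (X'X)^(-1) X'Y written out in matrix form.
import numpy as np

rng = np.random.default_rng(2)
n, k = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])  # intercept + 2 regressors
beta_true = np.array([1.0, 0.5, -0.8])
Y = X @ beta_true + rng.normal(scale=0.5, size=n)

C = np.linalg.inv(X.T @ X) @ X.T   # the fixed matrix C = (X'X)^(-1) X'
beta_hat = C @ Y                   # beta_hat = C Y, i.e. linear in Y
print(np.round(beta_hat, 3))
```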


 Efficiency
 An efficient estimator is characterized by a small variance or mean
squared error, indicating that there is a small deviation between the
estimated value and the "true" value.
 When defined asymptotically, an estimator is fully efficient if
its variance achieves the Rao-Cramér lower bound.
 Since in many cases the lower bound in the Rao-Cramér inequality
cannot be attained, an efficient estimator in statistics is frequently
chosen as the one with minimal variance in the class of all unbiased
estimators of the parameter.
 Thus optimality in practice is defined using the variance or mean
squared error (MSE, thus minimum MSE estimator).
Goodness of Fit Statistics

• We would like some measure of how well our regression model actually fits
the data.
• We have goodness of fit statistics to test this: i.e. how well the sample
regression function (SRF) fits the data.
• The most common goodness of fit statistic is known as R2. One way to define
R2 is to say that it is the square of the correlation coefficient between y and $\hat y$.
• For another explanation, recall that what we are interested in doing is
explaining the variability of y about its mean value $\bar y$, i.e. the total sum of
squares, TSS:

$TSS = \sum_t \left(y_t - \bar y\right)^2$

• We can split the TSS into two parts: the part which we have explained (known
as the explained sum of squares, ESS) and the part which we did not explain
using the model (the RSS).
Defining R2

• That is, TSS = ESS + RSS:

$\sum_t \left(y_t - \bar y\right)^2 = \sum_t \left(\hat y_t - \bar y\right)^2 + \sum_t \hat u_t^2$

• Our goodness of fit statistic is

$R^2 = \frac{ESS}{TSS}$

• But since TSS = ESS + RSS, we can also write

$R^2 = \frac{ESS}{TSS} = \frac{TSS - RSS}{TSS} = 1 - \frac{RSS}{TSS}$

• R2 must always lie between zero and one. To understand this, consider two
extremes:
RSS = TSS i.e. ESS = 0 so R2 = ESS/TSS = 0
ESS = TSS i.e. RSS = 0 so R2 = ESS/TSS = 1
The Limit Cases: R2 = 0 and R2 = 1

[Figure: scatter plots of yt against xt illustrating the two limit cases, R2 = 0 and R2 = 1]
Problems with R2 as a Goodness of Fit Measure

• There are a number of them:

1. R2 is defined in terms of variation about the mean of y, so that if a model
is reparameterised (rearranged) and the dependent variable changes, R2
will change.

2. R2 never falls if more regressors are added to the regression, e.g.
consider:
Regression 1: yt = β1 + β2x2t + β3x3t + ut
Regression 2: yt = β1 + β2x2t + β3x3t + β4x4t + ut
R2 will always be at least as high for regression 2 relative to regression 1.

3. R2 quite often takes on values of 0.9 or higher for time series
regressions.
 R2 is the ratio of the explained variation to the total
variation.
 Mathematically:

$R^2 = \frac{\sum \hat y_i^2}{\sum y_i^2} = \frac{\sum (\hat Y_i - \bar Y)^2}{\sum (Y_i - \bar Y)^2} = 1 - \frac{\sum \hat u_i^2}{\sum y_i^2}$

$R^2 = \frac{\hat\beta_1 \sum x_{1i} y_i + \hat\beta_2 \sum x_{2i} y_i}{\sum y_i^2}$

 The value of R2 lies between 0 and 1. The higher R2, the greater the
percentage of the variation of Y explained by the regression plane, that
is, the better the 'goodness of fit' of the regression plane to the sample
observations. The closer R2 is to zero, the worse the fit.
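A minimal sketch (with simulated, illustrative data) computing R² both as ESS/TSS and as 1 − RSS/TSS for a fitted two-regressor model, confirming the two expressions agree.

```python
# Hedged sketch: R^2 from the sums of squares of a fitted regression.
import numpy as np

rng = np.random.default_rng(3)
n = 150
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
Y = X @ np.array([2.0, 1.0, -0.5]) + rng.normal(scale=2.0, size=n)

beta_hat = np.linalg.lstsq(X, Y, rcond=None)[0]
Y_hat = X @ beta_hat
resid = Y - Y_hat

TSS = np.sum((Y - Y.mean()) ** 2)
ESS = np.sum((Y_hat - Y.mean()) ** 2)
RSS = np.sum(resid ** 2)

print(round(ESS / TSS, 4))        # R^2 as explained over total variation
print(round(1 - RSS / TSS, 4))    # the same number via 1 - RSS/TSS
```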
Example: Determinants of College GPA

$\widehat{colGPA} = 1.29 + 0.453\, hsGPA + 0.0094\, ACT$

n = 141, R2 = 0.176.

 This means that hsGPA and ACT together explain about 17.6% of the
variation in college GPA for this sample of students.
 This may not seem like a high percentage, but we must remember that there
are many other factors—including family background, personality, quality of
high school education, affinity for college—that contribute to a student‘s
college performance.
 If hsGPA and ACT explained almost all of the variation in colGPA, then
performance in college would be preordained by high school performance!
 An important fact about 𝑅2 is that it never decreases, and it
usually increases, when another independent variable is
added to a regression and the same set of observations is
used for both regressions.
 An important caveat to the previous assertion about R-
squared is that it assumes we do not have missing data on the
explanatory variables.
 If two regressions use different sets of observations, then, in
general, we cannot tell how the R-squareds will compare,
even if one regression uses a subset of regressors.
Adjusted R2

• In order to get around these problems, a modification is often made
which takes into account the loss of degrees of freedom associated
with adding extra variables. This is known as adjusted R2, or $\bar R^2$:

$\bar R^2 = 1 - (1 - R^2)\,\frac{n - 1}{n - k} = 1 - \frac{\sum \hat u_i^2 / (n - k)}{\sum y_i^2 / (n - 1)}$

• So if we add an extra regressor, k increases and unless R2 increases by
a more than offsetting amount, $\bar R^2$ will actually fall.

• There are still problems with the criterion:
1. A "soft" rule
2. No distribution for $\bar R^2$ or R2
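A minimal sketch of the adjusted R² formula from this slide; the second call uses made-up numbers to show that adding a regressor with only a tiny R² gain lowers the adjusted value.

```python
# Hedged sketch: adjusted R^2 from R^2, sample size n and number of parameters k.
def adjusted_r2(r2: float, n: int, k: int) -> float:
    """R_bar^2 = 1 - (1 - R^2) * (n - 1) / (n - k)."""
    return 1 - (1 - r2) * (n - 1) / (n - k)

print(round(adjusted_r2(0.176, n=141, k=3), 4))   # college GPA example above
print(round(adjusted_r2(0.178, n=141, k=4), 4))   # one more regressor, tiny R^2 gain: lower value
```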
Statistical inference in the regression model
Hypothesis tests about population parameters
Construction of confidence intervals
Sampling distributions of the OLS estimators
The OLS estimators are random variables
We already know their expected values and their variances
However, for hypothesis tests we need to know their
distribution
In order to derive their distribution we need additional
assumptions
Assumption about distribution of errors: normal distribution
Multiple Regression Analysis: Inference

• Assumption MLR.6 (Normality of error terms)

$u \sim \text{Normal}(0, \sigma^2)$, independently of the explanatory variables $x_1, \ldots, x_k$

 It is assumed that the unobserved factors are normally distributed around
the population regression function.
 The form and the variance of the distribution do not depend on any of the
explanatory variables.
 It follows that y is normally distributed conditional on the explanatory
variables: $y \mid x \sim \text{Normal}(\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k,\ \sigma^2)$
• Discussion of the normality assumption (cont.)
– Examples where normality cannot hold:
• Wages (nonnegative; also: minimum wage)
• Number of arrests (takes on a small number of integer values)
• Unemployment (indicator variable, takes on only 1 or 0)
– In some cases, normality can be achieved through transformations of
the dependent variable (e.g. use log(wage) instead of wage)
– Under normality, OLS is the best (even nonlinear) unbiased estimator
– Important: For the purposes of statistical inference, the assumption of
normality can be replaced by a large sample size
• Testing hypotheses about a single population parameter
• Theorem 3.1 (t-distribution for standardized estimators)

Under assumptions MLR.1 – MLR.6:

$\frac{\hat\beta_j - \beta_j}{se(\hat\beta_j)} \sim t_{n-k-1}$

If the standardization is done using the estimated
standard deviation (= standard error), the normal
distribution is replaced by a t-distribution.

Note: The t-distribution is close to the standard normal distribution if n-k-1 is large.

• Null hypothesis (for more general hypotheses, see below)

$H_0: \beta_j = 0$

The population parameter is equal to zero, i.e. after
controlling for the other independent variables, there is no
effect of xj on y.
• t-statistic (or t-ratio)

$t_{\hat\beta_j} = \frac{\hat\beta_j}{se(\hat\beta_j)}$

The t-statistic will be used to test the above null hypothesis. The
farther the estimated coefficient is away from zero, the less
likely it is that the null hypothesis holds true. But what does
"far" away from zero mean?

This depends on the variability of the estimated coefficient, i.e. its
standard deviation. The t-statistic measures how many estimated
standard deviations the estimated coefficient is away from zero.

• Distribution of the t-statistic if the null hypothesis is true

• Goal: Define a rejection rule so that, if H0 is true, it is rejected only with a small
probability (= significance level, e.g. 5%)
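A minimal sketch (simulated data, illustrative numbers) of how the standard errors, t-statistics and a one-sided rejection decision can be computed by hand; σ̂² = RSS/(n − k − 1) and Var(β̂) = σ̂²(X'X)⁻¹ are the standard OLS formulas.

```python
# Hedged sketch: coefficient standard errors and t-statistics computed directly.
import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
n = 60
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
y = X @ np.array([1.0, 0.4, 0.0]) + rng.normal(size=n)   # last slope truly zero

beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ beta_hat
df = n - X.shape[1]                                # n - k - 1 degrees of freedom
sigma2_hat = resid @ resid / df
se = np.sqrt(sigma2_hat * np.diag(np.linalg.inv(X.T @ X)))

t_stats = beta_hat / se                            # test H0: beta_j = 0
crit = stats.t.ppf(0.95, df)                       # 5% one-sided critical value
print(np.round(t_stats, 3), "critical value:", round(crit, 3))
```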
• Testing against one-sided alternatives (greater than zero)

Test $H_0: \beta_j = 0$ against $H_1: \beta_j > 0$.

Reject the null hypothesis in favour of the
alternative hypothesis if the estimated coefficient
is "too large" (i.e. larger than a critical value).

Construct the critical value so that, if the
null hypothesis is true, it is rejected in,
for example, 5% of the cases.

In the given example, this is the point of the t-
distribution with 28 degrees of freedom that is
exceeded in 5% of the cases.

⇒ Reject if the t-statistic is greater than 1.701
• Example: Wage equation
– Test whether, after controlling for education and tenure, higher work
experience leads to higher hourly wages

Test $H_0: \beta_{exper} = 0$ against $H_1: \beta_{exper} > 0$.

One would either expect a positive effect of experience on hourly wage or no effect at all.
• Testing against one-sided alternatives (less than zero)

Test $H_0: \beta_j = 0$ against $H_1: \beta_j < 0$.

Reject the null hypothesis in favour of the alternative
hypothesis if the estimated coefficient
is "too small" (i.e. smaller than a critical value).

Construct the critical value so that, if the
null hypothesis is true, it is rejected in,
for example, 5% of the cases.

In the given example, this is the point of the t-
distribution with 18 degrees of freedom such that
5% of the cases lie below that point.

⇒ Reject if the t-statistic is less than -1.734
• Example: Student performance and school size
– Test whether smaller school size leads to better student performance

Dependent variable: percentage of students passing the maths test.
Explanatory variables: average annual teacher compensation, staff per one
thousand students, and school enrollment (= school size).

Test $H_0: \beta_{enroll} = 0$ against $H_1: \beta_{enroll} < 0$.

Do larger schools hamper student performance or is there no such effect?
• Example: Student performance and school size (cont.)

The t-statistic on enrollment is compared with the critical values for the 5%
and the 15% significance levels; with these degrees of freedom the standard
normal approximation applies.

The null hypothesis is not rejected because the t-statistic is not
smaller than the critical value.

One cannot reject the hypothesis that there is no effect of school size on
student performance (not even at a lax significance level of 15%).
• Example: Student performance and school size (cont.)
– Alternative specification of functional form (e.g. using the logarithm of enrollment):

R-squared slightly higher

Test $H_0: \beta_{\log(enroll)} = 0$ against $H_1: \beta_{\log(enroll)} < 0$.
• Example: Student performance and school size (cont.)

The t-statistic is now smaller than the critical value for the 5% significance
level ⇒ reject the null hypothesis.

The hypothesis that there is no effect of school size on student performance
can be rejected in favor of the hypothesis that the effect is negative.

How large is the effect? A 10% increase in enrollment is associated with
about 0.129 percentage points fewer students passing the test
(a small effect).
• Testing against two-sided alternatives

Test $H_0: \beta_j = 0$ against $H_1: \beta_j \neq 0$.

Reject the null hypothesis in favour of the alternative
hypothesis if the absolute value
of the estimated coefficient is too large.

Construct the critical value so that, if the
null hypothesis is true, it is rejected in,
for example, 5% of the cases.

In the given example, these are the points
of the t-distribution such that 5% of the cases
lie in the two tails.

⇒ Reject if the t-statistic is less than -2.06 or greater than 2.06,
i.e. if the absolute value of the t-statistic exceeds 2.06
• Example: Determinants of college GPA (additional regressor: lectures missed per week, "skipped")

For critical values, use the standard normal distribution

The effects of hsGPA and skipped are
significantly different from zero at the 1%
significance level. The effect of ACT is not
significantly different from zero, not even at
the 10% significance level.
• "Statistically significant" variables in a regression
– If a regression coefficient is different from zero in a two-sided test, the
corresponding variable is said to be "statistically significant"
– If the number of degrees of freedom is large enough so that the normal
approximation applies, the following rules of thumb apply:

|t| > 1.645: "statistically significant at 10% level"

|t| > 1.96: "statistically significant at 5% level"

|t| > 2.576: "statistically significant at 1% level"
• Guidelines for discussing economic and statistical significance

– If a variable is statistically significant, discuss the magnitude of the coefficient to get an
idea of its economic or practical importance
– The fact that a coefficient is statistically significant does not necessarily mean it is
economically or practically significant!
– If a variable is statistically and economically important but has the "wrong" sign, the
regression model might be misspecified
– If a variable is statistically insignificant at the usual levels (10%, 5%, 1%), one may
think of dropping it from the regression
– If the sample size is small, effects might be imprecisely estimated so that the case for
dropping insignificant variables is less strong
• Computing p-values for t-tests
– If the significance level is made smaller and smaller, there will be a point
where the null hypothesis cannot be rejected anymore
– The reason is that, by lowering the significance level, one wants to avoid more
and more to make the error of rejecting a correct H0
– The smallest significance level at which the null hypothesis is still rejected, is
called the p-value of the hypothesis test
– A small p-value is evidence against the null hypothesis because one would
reject the null hypothesis even at small significance levels
– A large p-value is evidence in favor of the null hypothesis
– P-values are more informative than tests at fixed significance levels
• How the p-value is computed (here: two-sided test)

The p-value is the significance level at which one
is indifferent between rejecting and not rejecting
the null hypothesis.

In the two-sided case, the p-value is thus the
probability that the t-distributed variable takes
on a larger absolute value than the realized value
of the test statistic.

From this, it is clear that a null hypothesis is
rejected if and only if the corresponding p-value is
smaller than the significance level.

For example, for a significance level of 5%, a t-statistic whose
absolute value falls short of the 5% critical values would not lie
in the rejection region.
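A minimal sketch of the two-sided p-value calculation; the t-statistic of 1.85 and the 40 degrees of freedom are made-up numbers for illustration.

```python
# Hedged sketch: two-sided p-value for a t-statistic.
from scipy import stats

t_stat = 1.85
df = 40

p_value = 2 * stats.t.sf(abs(t_stat), df)   # P(|T| > |t|) for a two-sided test
print(round(p_value, 4))

# Reject H0 at a given significance level iff the p-value is smaller
for alpha in (0.10, 0.05, 0.01):
    print(alpha, "reject H0" if p_value < alpha else "do not reject H0")
```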
• Confidence intervals
• Simple manipulation of the result in Theorem 3.2 implies that the
confidence interval for a population parameter is

$\hat\beta_j \pm c \cdot se(\hat\beta_j)$

where c is the critical value of the two-sided test at the chosen confidence
level; the two endpoints are the lower and upper bounds of the confidence interval.

• Interpretation of the confidence interval
– The bounds of the interval are random
– In repeated samples, the interval that is constructed in the above way will
cover the population regression coefficient in 95% of the cases
• Confidence intervals for typical confidence levels (rules of thumb, using the
standard normal approximation):

$\hat\beta_j \pm 1.645 \cdot se(\hat\beta_j)$ (90%),  $\hat\beta_j \pm 1.96 \cdot se(\hat\beta_j)$ (95%),  $\hat\beta_j \pm 2.576 \cdot se(\hat\beta_j)$ (99%)

• Relationship between confidence intervals and hypothesis tests:
reject $H_0: \beta_j = a_j$ in favor of $H_1: \beta_j \neq a_j$ if $a_j$ lies outside the
confidence interval
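A minimal sketch of a 95% confidence interval built from an estimate and its standard error; the estimate, standard error and degrees of freedom are made-up numbers.

```python
# Hedged sketch: confidence interval beta_hat +/- c * se and the implied test decision.
from scipy import stats

beta_hat, se, df = 0.453, 0.096, 137        # illustrative estimate, s.e., degrees of freedom
c = stats.t.ppf(0.975, df)                  # two-sided 5% critical value

lower, upper = beta_hat - c * se, beta_hat + c * se
print(round(lower, 3), round(upper, 3))

# H0: beta_j = 0 is rejected at the 5% level iff 0 lies outside the interval
print("reject H0: beta_j = 0" if not (lower <= 0 <= upper) else "do not reject")
```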
• Example: Model of firms' R&D expenditures

Dependent variable: spending on R&D. Explanatory variables: annual sales
and profits as a percentage of sales.

The effect of sales on R&D is relatively precisely estimated, as its confidence
interval is narrow; moreover, the effect is significantly different from zero
because zero is outside the interval. The effect of the profit margin is
imprecisely estimated, as its interval is very wide; it is not even statistically
significant, because zero lies in the interval.
Global hypothesis test (F and R2)
• We used the t-test to test single hypotheses, i.e. hypotheses
involving only one coefficient. But what if we want to test
more than one coefficient simultaneously? We do this using the F-test.
 Now, we want to test whether a group of variables has no effect on the
dependent variable.
 More precisely, the null hypothesis is that a set of variables has no
effect on y, once another set of variables has been controlled for.
• The F-test is used to test the overall significance of a model:

$F = \frac{MSS\ \text{of}\ ESS}{MSS\ \text{of}\ RSS} = \frac{ESS/(k-1)}{RSS/(n-k)}$

Or, using R-squared,

$F = \frac{R^2/(k-1)}{(1-R^2)/(n-k)}$

 Decision: If F > Fα(k−1, n−k), reject H0; otherwise you may
accept H0 (i.e. reject when F-calculated > F-tabulated),
 where Fα(k−1, n−k) is the critical F value at the α
level of significance with (k − 1) numerator df and (n − k)
denominator df.
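A minimal sketch of the overall-significance F-statistic computed from R², following the formula above; the R², n and k plugged in come from the college GPA example and are used purely for illustration.

```python
# Hedged sketch: F statistic for overall significance from R^2.
from scipy import stats

R2, n, k = 0.176, 141, 3          # e.g. the college GPA regression above

F = (R2 / (k - 1)) / ((1 - R2) / (n - k))
F_crit = stats.f.ppf(0.95, dfn=k - 1, dfd=n - k)   # 5% critical value

print(round(F, 2), round(F_crit, 2))
print("reject H0: all slopes are zero" if F > F_crit else "do not reject H0")
```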
• We used the t-test to test single hypotheses, i.e. hypotheses involving only
one coefficient. But what if we want to test more than one coefficient
simultaneously?

• We do this using the F-test. The F-test involves estimating 2 regressions.

• The unrestricted regression is the one in which the coefficients are freely
determined by the data, as we have done before.

• The restricted regression is the one in which the coefficients are restricted,
i.e. the restrictions are imposed on some βs.
The F-test:
Restricted and Unrestricted Regressions

• Example
The general regression is
yt = β1 + β2x2t + β3x3t + β4x4t + ut   (1)

• We want to test the restriction that β3 + β4 = 1 (we have some hypothesis
from theory which suggests that this would be an interesting hypothesis to
study). The unrestricted regression is (1) above, but what is the restricted
regression?
yt = β1 + β2x2t + β3x3t + β4x4t + ut   s.t. β3 + β4 = 1

• We substitute the restriction (β3 + β4 = 1) into the regression so that it is
automatically imposed on the data.
β3 + β4 = 1  ⟹  β4 = 1 - β3
The F-test: Forming the Restricted Regression

yt = β1 + β2x2t + β3x3t + (1 - β3)x4t + ut

yt = β1 + β2x2t + β3x3t + x4t - β3x4t + ut

• Gather terms in the β's together and rearrange:

(yt - x4t) = β1 + β2x2t + β3(x3t - x4t) + ut

• This is the restricted regression. We actually estimate it by creating two new
variables, call them, say, Pt and Qt:
Pt = yt - x4t
Qt = x3t - x4t, so
Pt = β1 + β2x2t + β3Qt + ut is the restricted regression we actually estimate.
• Consider the hypothesis
$H_0: \beta_2 = \beta_3 = \cdots = \beta_k = 0$  against  H1: not H0.
• We observe that under H0 the (restricted) model boils
down to
$Y_i = \beta_1 + \varepsilon_i$  with  $\hat\beta_1 = \bar Y$.
Therefore the restricted residuals are $e_{0i} = Y_i - \bar Y$.
• Thus, the restricted residual sum of squares (RRSS) is given
as

$RRSS = \sum e_{0i}^2 = \sum \left(Y_i - \bar Y\right)^2 = \sum y_i^2$
• On the other hand, the unrestricted residual sum of squares
(URSS) corresponds to the model

$Y_i = \hat\beta_1 + \hat\beta_2 X_{2i} + \hat\beta_3 X_{3i} + \cdots + \hat\beta_k X_{ki} + e_i = \hat Y_i + e_i$

• From this model we get

$URSS = \sum e_i^2 = \sum \left(Y_i - \hat Y_i\right)^2$
$\qquad = \sum \left(Y_i - \bar Y\right)^2 - \sum \left(\hat Y_i - \bar Y\right)^2$  where $y_i = Y_i - \bar Y$ and $\hat y_i = \hat Y_i - \bar Y$
$\qquad = \sum y_i^2 - \sum \hat y_i^2$
$\qquad = \sum y_i^2 - \sum \hat y_i\left(\hat\beta_2 x_{2i} + \hat\beta_3 x_{3i} + \cdots + \hat\beta_k x_{ki}\right)$  as $\hat y_i = \hat\beta_2 x_{2i} + \hat\beta_3 x_{3i} + \cdots + \hat\beta_k x_{ki}$
$\qquad = \sum y_i^2 - \sum \left(y_i - e_i\right)\left(\hat\beta_2 x_{2i} + \hat\beta_3 x_{3i} + \cdots + \hat\beta_k x_{ki}\right)$  as $\hat y_i = y_i - e_i$
$\qquad = \sum y_i^2 - \hat\beta_2 \sum x_{2i} y_i - \hat\beta_3 \sum x_{3i} y_i - \cdots - \hat\beta_k \sum x_{ki} y_i$  as $\sum x_{ji} e_i = 0$ for all $j = 1, 2, \ldots, k$


• Given these definitions of RRSS and URSS, it follows that

$RRSS - URSS = \hat\beta_2 \sum x_{2i} y_i + \hat\beta_3 \sum x_{3i} y_i + \cdots + \hat\beta_k \sum x_{ki} y_i = ESS$

• We know that

$\frac{URSS}{\sigma^2} = \frac{\sum e_i^2}{\sigma^2} \sim \chi^2_{\,n-k}$

• Now, we propose that under H0

$\frac{RRSS - URSS}{\sigma^2} = \frac{ESS}{\sigma^2} \sim \chi^2_{\,k-1}$

• This follows from the fact that

$\underbrace{\sum_{i=1}^{n} y_i^2}_{TSS} = \underbrace{\sum_{i=1}^{n} e_i^2}_{RSS} + \underbrace{\sum_{j=2}^{k} \hat\beta_j \sum_i x_{ji} y_i}_{ESS}$


We have already seen that

$\frac{TSS}{\sigma^2} \sim \chi^2_{\,n-1}, \qquad \frac{RSS}{\sigma^2} \sim \chi^2_{\,n-k}, \qquad \frac{ESS}{\sigma^2} \sim \chi^2_{\,k-1}$

Using the results obtained in the above equations we get

$\frac{(RRSS - URSS)/(k-1)}{URSS/(n-k)} \sim F_{(k-1,\ n-k)}$

Upon simplification, we get:

$\frac{(RRSS - URSS)/(k-1)}{URSS/(n-k)} = \frac{ESS/(k-1)}{RSS/(n-k)} \sim F_{(k-1,\ n-k)}$  as $URSS = RSS$ and $RRSS = TSS$

But

$\frac{ESS}{\sum y_i^2} = \frac{ESS}{TSS} = R^2$  and  $\frac{RSS}{\sum y_i^2} = 1 - R^2$


• Which implies that

$F = \frac{ESS/(k-1)}{RSS/(n-k)} = \frac{R^2/(k-1)}{(1-R^2)/(n-k)} \sim F_{(k-1,\ n-k)}$

 Decision Rule: Reject the null hypothesis if the computed F-value is
greater than the critical value from an F-table at the α level of significance.

Example: Suppose the demand for textiles was estimated using the
following model

$Y = \beta_1 + \beta_2 X_2 + \beta_3 X_3 + \beta_4 X_4 + \varepsilon$

Where: Y = ln of consumption
X2 = ln of income of consumers
X3 = ln of prices
X4 = time
• The number of years over which these data were obtained was 17, thus
n = 17.


• The test statistic follows the F-distribution, which has 2 d.f.
parameters.

• The value of the degrees of freedom parameters are m and (T-k)


respectively (the order of the d.f. parameters is important).

• The appropriate critical value will be in column m, row (T-k).

• The F-distribution has only positive values and is not symmetrical. We


therefore only reject the null if the test statistic > critical F-value.
Determining the Number of Restrictions in an F-test

• Examples:
H0: hypothesis                              No. of restrictions, m
β1 + β2 = 2                                 1
β2 = 1 and β3 = -1                          2
β2 = 0, β3 = 0 and β4 = 0                   3
• If the model is yt = β1 + β2x2t + β3x3t + β4x4t + ut,
then the null hypothesis
H0: β2 = 0, and β3 = 0 and β4 = 0 is tested by the regression F-statistic. It
tests the null hypothesis that all of the coefficients except the intercept
coefficient are zero.
• Note the form of the alternative hypothesis for all tests when more than one
restriction is involved: H1: β2 ≠ 0, or β3 ≠ 0 or β4 ≠ 0
The Relationship between the t and the F-Distributions

• Any hypothesis which could be tested with a t-test could have been
tested using an F-test, but not the other way around.

For example, consider the hypothesis
H0: β2 = 0.5
H1: β2 ≠ 0.5
We could have tested this using the usual t-test:

$\text{test statistic} = \frac{\hat\beta_2 - 0.5}{SE(\hat\beta_2)}$

or it could be tested in the framework above for the F-test.
• Note that the two tests always give the same result since the t-
distribution is just a special case of the F-distribution.
• For example, if we have some random variable Z, and Z ~ t(T-k), then
also Z2 ~ F(1, T-k).
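A quick numerical check of this relationship: the square of a two-sided t critical value equals the corresponding F(1, T−k) critical value. The degrees of freedom used are illustrative.

```python
# Hedged sketch: t^2 = F(1, T-k) at the corresponding critical values.
from scipy import stats

df = 140                                   # T - k, illustrative
t_crit = stats.t.ppf(0.975, df)            # two-sided 5% t critical value
f_crit = stats.f.ppf(0.95, dfn=1, dfd=df)  # 5% F(1, T-k) critical value

print(round(t_crit ** 2, 4), round(f_crit, 4))   # the two numbers coincide
```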
F-test Example

• Question: Suppose a researcher wants to test whether the returns on a company
stock (y) show unit sensitivity to two factors (factor x2 and factor x3) among three
considered. The regression is carried out on 144 monthly observations. The
regression is yt = β1 + β2x2t + β3x3t + β4x4t + ut
- What are the restricted and unrestricted regressions?
- If the two RSS are 436.1 and 397.2 respectively, perform the test.
• Solution:
Unit sensitivity implies H0: β2 = 1 and β3 = 1. The unrestricted regression is the one
in the question. The restricted regression is (yt - x2t - x3t) = β1 + β4x4t + ut or, letting
zt = yt - x2t - x3t, the restricted regression is zt = β1 + β4x4t + ut.
In the F-test formula, T = 144, k = 4, m = 2, RRSS = 436.1, URSS = 397.2.
F-test statistic = [(436.1 - 397.2)/2] / [397.2/140] ≈ 6.86. The critical values are
F(2,140) = 3.07 (5%) and 4.79 (1%).
Conclusion: Reject H0.
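A minimal sketch that reproduces the F statistic for this example from the two residual sums of squares and looks up the critical values.

```python
# Hedged sketch: F statistic for the restriction test above.
from scipy import stats

RRSS, URSS = 436.1, 397.2
T, k, m = 144, 4, 2                       # observations, parameters, restrictions

F = ((RRSS - URSS) / m) / (URSS / (T - k))
print(round(F, 2))                                       # about 6.86
print(round(stats.f.ppf(0.95, dfn=m, dfd=T - k), 2))     # 5% critical value
print(round(stats.f.ppf(0.99, dfn=m, dfd=T - k), 2))     # 1% critical value
```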
• Testing multiple linear restrictions: The F-test
• Testing exclusion restrictions

Example: salary of a major league baseball player, explained by years in
the league, average number of games per year, batting average, home runs
per year, and runs batted in per year.

H0: the coefficients on batting average, home runs per year and runs batted
in per year are all zero, against H1: H0 is not true.

Test whether the performance measures have no effect / can be excluded from the regression.
• Unrestricted regression

• Restricted regression: here the restricted model is actually a
regression of [y - x1] on a constant

• Test statistic

H0 cannot be rejected.
• Regression output for the unrestricted regression

When tested individually, there is also no evidence against the rationality of
house price assessments.

• The F-test works for general multiple linear hypotheses.

• For all tests and confidence intervals, validity of assumptions MLR.1 – MLR.6 has
been assumed. Tests may be invalid otherwise.
Selection of models

• One of the assumptions of the classical linear regression model
(CLRM) is that the regression model used in the analysis is
"correctly" specified: if the model is not "correctly" specified,
we encounter the problem of model specification error or
model specification bias.
Basic questions related to model selection
 what are the criteria in choosing a model for empirical
analysis?
 What types of model specification errors is one likely to
encounter in practice?
 What are the consequences of specification errors?
Cont..
 How does one detect specification errors? In other words, what
are some of the diagnostic tools that one can use?
 Having detected specification errors, what remedies can one
adopt?
Model Selection Criteria
Model chosen for empirical analysis should satisfy the following
criteria
• Be data admissible; that is, predictions made from the model
must be logically possible.
• Be consistent with theory; that is, it must make good
economic sense.
Cont…
• Exhibit parameter constancy; that is, the values of the
parameters should be stable. Otherwise, forecasting will
be difficult.
• Exhibit data coherency; that is, the residuals estimated
from the model must be purely random (technically, white
noise).
• Be encompassing; that is, the model should encompass
or include all the rival models in the sense that it is
capable of explaining their results.
• In short, other models cannot be an improvement over the
chosen model.
Types of Specification Errors
• In developing an empirical model, one is likely to commit one
or more of the following specification errors:
i. Omission of a relevant variable(s)
ii. Inclusion of an unnecessary variable(s)
iii. Adopting the wrong functional form
iv. Errors of measurement
Consequences of Model Specification Errors
Omitting a Relevant Variable
• If the left-out, or omitted, variable is correlated with the included
variable (the correlation coefficient between the two variables
is nonzero), the estimators are biased as well as inconsistent.
• Even if the two variables are not correlated, the intercept
parameter is biased, although the slope parameter is now
unbiased.
• The disturbance variance is incorrectly estimated.
• In consequence, the usual confidence interval and hypothesis-
testing procedures are likely to give misleading conclusions
about the statistical significance of the estimated parameters.
Cont…
• There is asymmetry in the two types of specification biases.
• If we include an irrelevant variable in the model, the model still
gives us unbiased and consistent estimates of the coefficients
in the true model, the error variance is correctly estimated, and
the conventional hypothesis-testing methods are still valid.
• The only penalty we pay for the inclusion of the superfluous
variable is that the estimated variances of the coefficients are
larger, and as a result our probability inferences about the
parameters are less precise.
Functional Forms of Regression Models
• Commonly used regression models may be nonlinear in
the variables but linear in the parameters, or can be
made so by suitable transformations of the variables:
1. Linear model: Y = β1 + β2X
2. Log (double-log) model: lnY = β1 + β2 ln X
3. Semi-log model (lin-log or log-lin): Y = β1 + β2 ln X and lnY = β1 + β2 X
4. Reciprocal model: Y = β1 + β2(1/X)
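A minimal sketch (simulated data, not from the text's examples) of how a double-log model is estimated in practice: take logs of both variables and apply OLS, so the slope is an elasticity.

```python
# Hedged sketch: estimating a double-log model; the slope is the elasticity.
import numpy as np

rng = np.random.default_rng(6)
n = 500
X = rng.uniform(1, 100, n)
Y = 2.0 * X ** 1.6 * np.exp(rng.normal(scale=0.1, size=n))   # constant elasticity of 1.6

Z = np.column_stack([np.ones(n), np.log(X)])
b, *_ = np.linalg.lstsq(Z, np.log(Y), rcond=None)
print(round(b[1], 3))   # estimated elasticity, close to 1.6
```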
Example
Double-log model
$\ln EXDUR_t = -7.5417 + 1.6266 \ln PCEX_t$
se = (0.716) (0.080)
t = (−10.53) (20.3)    r2 = 0.9695
• Interpretation: if total personal expenditure goes up by 1 percent, on
average, the expenditure on durable goods goes up by about 1.63
percent.
Lin-log model
$FoodExp_i = -1283.912 + 257.2700 \ln TotalExp_i$
se = (?? ) ( ?? )
t = (−4.3848) (5.6625)    r2 = 0.3769
• Interpretation: an increase in total expenditure of 1 percent,
on average, leads to about a 2.57 birr increase in the expenditure on
food.
The End of Chapter Three
Thank you for your attention!
