Chapter 2: Multiple Linear Regression Model
Consider the model relating a study variable $y$ to $k$ explanatory variables $X_1, X_2, \ldots, X_k$:
$$y = \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k + \varepsilon.$$
This is called the multiple linear regression model. The parameters $\beta_1, \beta_2, \ldots, \beta_k$ are the regression coefficients associated with $X_1, X_2, \ldots, X_k$ respectively, and $\varepsilon$ is the random error component reflecting the difference between the observed and fitted linear relationship. There can be various reasons for such a difference, e.g., the joint effect of variables not included in the model, random factors which cannot be accounted for in the model, etc.
Note that the $j$th regression coefficient $\beta_j$ represents the expected change in $y$ per unit change in the $j$th explanatory variable $X_j$, with the other explanatory variables held fixed. Assuming $E(\varepsilon) = 0$,
$$\beta_j = \frac{\partial E(y)}{\partial X_j}.$$
Linear model:
A model is said to be linear when it is linear in the parameters. In such a case $\dfrac{\partial y}{\partial \beta_j}$ (or equivalently $\dfrac{\partial E(y)}{\partial \beta_j}$) should not depend on any $\beta$'s. For example,
i) $y = \beta_0 + \beta_1 X$ is a linear model as it is linear in the parameters.
ii) $y = \beta_0 X^{\beta_1}$ can be written as $\log y = \log \beta_0 + \beta_1 \log X$, i.e., $y^* = \beta_0^* + \beta_1 x^*$, which is linear in the parameters $\beta_0^*$ and $\beta_1$ but nonlinear in the variables $y^* = \log y$, $x^* = \log X$. So it is a linear model.
iii) $y = \beta_0 + \beta_1 X + \beta_2 X^2$ is linear in the parameters $\beta_0$, $\beta_1$ and $\beta_2$ but nonlinear in the variable $X$. So it is a linear model.
iv) $y = \dfrac{\beta_0}{X - \beta_2}$ is nonlinear in the parameters and variables both. So it is a nonlinear model.
v) $y = \beta_0 + \beta_1 X^{\beta_2}$ is nonlinear in the parameters and variables both. So it is a nonlinear model.
vi) $y = \beta_0 + \beta_1 X + \beta_2 X^2 + \beta_3 X^3$ is a cubic polynomial model which can be written as
$$y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_3,$$
which is linear in the parameters $\beta_0, \beta_1, \beta_2, \beta_3$ and linear in the variables $X_1 = X$, $X_2 = X^2$, $X_3 = X^3$. So it is a linear model.
Example:
The income and education of a person are related. It is expected that, on average, a higher level of education
provides higher income. So a simple linear regression model can be expressed as
$$\text{income} = \beta_0 + \beta_1\,\text{education} + \varepsilon.$$
Note that $\beta_1$ reflects the change in income with respect to a per unit change in education, and $\beta_0$ reflects the income when education is zero; it is expected that even an illiterate person can have some income. Further, this model neglects that most people have higher income when they are older than when they are young, regardless of education. So $\beta_1$ will overstate the marginal impact of education. If age and education are positively correlated, then the regression model will associate all the observed increase in income with an increase in education. So a better model is
$$\text{income} = \beta_0 + \beta_1\,\text{education} + \beta_2\,\text{age} + \varepsilon.$$
Often it is observed that income tends to rise less rapidly in the later earning years than in the early years. To accommodate such a possibility, we might extend the model to
$$\text{income} = \beta_0 + \beta_1\,\text{education} + \beta_2\,\text{age} + \beta_3\,\text{age}^2 + \varepsilon.$$
This is how we proceed in regression modelling in real-life situations. One needs to consider the experimental conditions and the phenomenon before deciding on how many, which, and how to choose the dependent and independent variables.
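As a rough illustration of fitting such a model, the sketch below simulates hypothetical income data and estimates the quadratic-in-age specification by least squares. All variable names, coefficient values and sample sizes here are assumptions made only for this example, not values from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Hypothetical data: education (years) and age (years)
education = rng.uniform(8, 20, n)
age = rng.uniform(20, 65, n)

# Assumed "true" coefficients, chosen only for illustration
income = 5 + 2.0 * education + 1.5 * age - 0.012 * age**2 + rng.normal(0, 3, n)

# Design matrix with intercept, education, age and age^2
X = np.column_stack([np.ones(n), education, age, age**2])

# OLS estimate b = (X'X)^{-1} X'y, computed via least squares for stability
b, *_ = np.linalg.lstsq(X, income, rcond=None)
print(b)  # estimates of beta_0, beta_1, beta_2, beta_3
```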
Model set up:
Let an experiment be conducted $n$ times, and the data be obtained as follows:

Observation number | Response $y$ | Explanatory variables $X_1, X_2, \ldots, X_k$
1 | $y_1$ | $x_{11}, x_{12}, \ldots, x_{1k}$
2 | $y_2$ | $x_{21}, x_{22}, \ldots, x_{2k}$
... | ... | ...
$n$ | $y_n$ | $x_{n1}, x_{n2}, \ldots, x_{nk}$

Assuming that the model is
$$y = \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_k X_k + \varepsilon,$$
the $n$ observations satisfy
$$y_i = \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} + \varepsilon_i, \quad i = 1, 2, \ldots, n,$$
or $y = X\beta + \varepsilon$.
In general, the model with $k$ explanatory variables can be expressed as
$$y = X\beta + \varepsilon,$$
where $y = (y_1, y_2, \ldots, y_n)'$ is an $n \times 1$ vector of $n$ observations on the study variable, $X$ is an $n \times k$ matrix of $n$ observations on each of the $k$ explanatory variables, $\beta = (\beta_1, \beta_2, \ldots, \beta_k)'$ is a $k \times 1$ vector of regression coefficients, and $\varepsilon = (\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n)'$ is an $n \times 1$ vector of random error components. The following assumptions are made:
(i) $E(\varepsilon) = 0$
(ii) $E(\varepsilon\varepsilon') = \sigma^2 I_n$
(iii) $\mathrm{Rank}(X) = k$
(iv) $X$ is a non-stochastic matrix
(v) $\varepsilon \sim N(0, \sigma^2 I_n)$.
These assumptions are used to study the statistical properties of the estimators of the regression coefficients. The following assumption is required to study, in particular, the large-sample properties of the estimators:
(vi) $\lim_{n \to \infty} \dfrac{X'X}{n} = \Delta$ exists and is a non-stochastic and nonsingular matrix (with finite elements).
The explanatory variables can also be stochastic in some cases. We assume that X is non-stochastic unless
stated separately.
We consider the problems of estimation and testing of hypotheses on the regression coefficient vector under the stated assumptions.
Estimation of parameters:
A general procedure for the estimation of the regression coefficient vector is to minimize
$$\sum_{i=1}^{n} M(\varepsilon_i) = \sum_{i=1}^{n} M\left(y_i - x_{i1}\beta_1 - x_{i2}\beta_2 - \cdots - x_{ik}\beta_k\right)$$
for a suitably chosen function $M$, e.g., $M(x) = |x|$, $M(x) = x^2$ or $M(x) = |x|^p$, in general.
We consider the principle of least squares, which is related to $M(x) = x^2$, and the method of maximum likelihood estimation for the estimation of the parameters.
Principle of ordinary least squares (OLS)
Let $B$ be the set of all possible vectors $\beta$. If there is no further information, $B$ is the $k$-dimensional real Euclidean space. The objective is to find a vector $b' = (b_1, b_2, \ldots, b_k)$ from $B$ that minimizes the sum of squared deviations of the $\varepsilon_i$'s, i.e.,
$$S(\beta) = \sum_{i=1}^{n}\varepsilon_i^2 = \varepsilon'\varepsilon = (y - X\beta)'(y - X\beta)$$
for given $y$ and $X$. A minimum will always exist, as $S(\beta)$ is a real-valued, convex and differentiable function. Write
$$S(\beta) = y'y + \beta'X'X\beta - 2\beta'X'y.$$
Differentiating $S(\beta)$ with respect to $\beta$,
$$\frac{\partial S(\beta)}{\partial \beta} = 2X'X\beta - 2X'y,$$
$$\frac{\partial^2 S(\beta)}{\partial \beta^2} = 2X'X \quad \text{(at least non-negative definite)}.$$
The normal equation is
$$\frac{\partial S(\beta)}{\partial \beta} = 0 \quad \Rightarrow \quad X'Xb = X'y,$$
where the following result is used:
Result: If $f(z) = Z'AZ$ is a quadratic form, $Z$ is an $m \times 1$ vector and $A$ is any $m \times m$ symmetric matrix, then $\dfrac{\partial f(z)}{\partial z} = 2Az$.
Since it is assumed that $\mathrm{rank}(X) = k$ (full rank), $X'X$ is positive definite and the unique solution of the normal equation is
$$b = (X'X)^{-1}X'y,$$
which is termed the ordinary least squares estimator (OLSE) of $\beta$. Since $\dfrac{\partial^2 S(\beta)}{\partial \beta^2}$ is at least non-negative definite, $b$ minimizes $S(\beta)$.
(i) In case $X'X$ is not of full rank, a solution of the normal equation is
$$b = (X'X)^{-}X'y + \left[I - (X'X)^{-}X'X\right]\omega,$$
where $(X'X)^{-}$ is the generalized inverse of $X'X$ and $\omega$ is an arbitrary vector. The generalized inverse satisfies $X(X'X)^{-}X'X = X$, so the fitted values are
$$\hat{y} = Xb = X(X'X)^{-}X'y + X\left[I - (X'X)^{-}X'X\right]\omega = X(X'X)^{-}X'y,$$
which is independent of $\omega$. This implies that $\hat{y}$ has the same value for all solutions $b$ of $X'Xb = X'y$.
(ii) Note that for any $\beta$,
$$S(\beta) = \left[y - Xb + X(b - \beta)\right]'\left[y - Xb + X(b - \beta)\right]$$
$$= (y - Xb)'(y - Xb) + (b - \beta)'X'X(b - \beta) + 2(b - \beta)'X'(y - Xb)$$
$$= (y - Xb)'(y - Xb) + (b - \beta)'X'X(b - \beta) \quad (\text{using } X'Xb = X'y)$$
$$\geq (y - Xb)'(y - Xb) = S(b),$$
where
$$S(b) = y'y - 2y'Xb + b'X'Xb = y'y - b'X'Xb = y'y - \hat{y}'\hat{y}.$$
In the case of $\hat{\beta} = b$,
$$\hat{y} = Xb = X(X'X)^{-1}X'y = Hy,$$
where $H = X(X'X)^{-1}X'$ is termed the hat matrix.
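To make the matrix formulas concrete, here is a minimal numerical sketch (with simulated data and assumed dimensions) that computes $b$, the hat matrix $H$ and the fitted values, and checks the properties used above.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 50, 3

# Simulated design matrix; the column of ones plays the role of the intercept column X_1
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
beta = np.array([1.0, 2.0, -0.5])          # assumed true coefficients
y = X @ beta + rng.normal(scale=0.7, size=n)

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y                      # OLSE  b = (X'X)^{-1} X'y
H = X @ XtX_inv @ X.T                      # hat matrix  H = X (X'X)^{-1} X'
y_hat = H @ y                              # fitted values  y_hat = H y
e = y - y_hat                              # residuals  e = (I - H) y

# H is symmetric and idempotent, and X'e = 0 (the normal equations)
assert np.allclose(H, H.T) and np.allclose(H @ H, H)
assert np.allclose(X.T @ e, 0)
print(b)
```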
Residuals
The difference between the observed and fitted values of the study variable is called a residual. It is denoted as
$$e = y - \hat{y} = y - Xb = y - Hy = (I - H)y = \bar{H}y,$$
where $\bar{H} = I - H$.
Note that
(i) $\bar{H}$ is a symmetric matrix,
(ii) $\bar{H}$ is an idempotent matrix, i.e., $\bar{H}\bar{H} = (I - H)(I - H) = (I - H) = \bar{H}$, and
(iii) $\mathrm{tr}\,\bar{H} = \mathrm{tr}\,I_n - \mathrm{tr}\,H = n - k$.
Properties of the OLSE:
(i) Estimation error
The estimation error of $b$ is
$$b - \beta = (X'X)^{-1}X'y - \beta = (X'X)^{-1}X'(X\beta + \varepsilon) - \beta = (X'X)^{-1}X'\varepsilon.$$
(ii) Bias
Since $X$ is assumed to be non-stochastic and $E(\varepsilon) = 0$,
$$E(b - \beta) = (X'X)^{-1}X'E(\varepsilon) = 0.$$
Thus OLSE is an unbiased estimator of $\beta$.
(iii) Covariance matrix
The covariance matrix of $b$ is
$$V(b) = E\left[(b - \beta)(b - \beta)'\right] = (X'X)^{-1}X'E(\varepsilon\varepsilon')X(X'X)^{-1} = \sigma^2(X'X)^{-1}.$$
(iv) Variance
The variance of $b$ can be obtained as the sum of the variances of $b_1, b_2, \ldots, b_k$, which is the trace of the covariance matrix of $b$. Thus
$$\mathrm{Var}(b) = \mathrm{tr}\,V(b) = \sum_{i=1}^{k} E(b_i - \beta_i)^2 = \sum_{i=1}^{k}\mathrm{Var}(b_i).$$
Estimation of $\sigma^2$:
The residual sum of squares is
$$SS_{res} = e'e = (y - Xb)'(y - Xb) = y'(I - H)(I - H)y = y'(I - H)y = y'\bar{H}y.$$
Also
$$SS_{res} = (y - Xb)'(y - Xb) = y'y - 2b'X'y + b'X'Xb = y'y - b'X'y \quad (\text{using } X'Xb = X'y).$$
Since $y = X\beta + \varepsilon$,
$$SS_{res} = y'\bar{H}y = (X\beta + \varepsilon)'\bar{H}(X\beta + \varepsilon) = \varepsilon'\bar{H}\varepsilon \quad (\text{using } \bar{H}X = 0).$$
Thus
$$E\left[y'\bar{H}y\right] = E\left[\varepsilon'\bar{H}\varepsilon\right] = \sigma^2\,\mathrm{tr}\,\bar{H} = (n-k)\sigma^2,$$
or
$$E\left[\frac{y'\bar{H}y}{n-k}\right] = \sigma^2, \qquad \text{i.e.,} \qquad E\left[MS_{res}\right] = \sigma^2,$$
where $MS_{res} = \dfrac{SS_{res}}{n-k}$ is the mean sum of squares due to residuals.
Thus an unbiased estimator of $\sigma^2$ is
$$\hat{\sigma}^2 = MS_{res} = s^2 \text{ (say)},$$
which is a model-dependent estimator.
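Continuing in the same spirit, a small sketch with assumed parameter values shows how $s^2 = SS_{res}/(n-k)$ is computed from the residuals and compared with the true $\sigma^2$ of the simulation.

```python
import numpy as np

rng = np.random.default_rng(2)
n, k, sigma = 200, 4, 1.5
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
beta = np.array([1.0, -2.0, 0.5, 3.0])
y = X @ beta + rng.normal(scale=sigma, size=n)

b, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ b
s2 = e @ e / (n - k)     # unbiased estimator of sigma^2: SS_res / (n - k)
print(s2, sigma**2)      # s2 should be close to sigma^2 = 2.25
```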
Gauss-Markov Theorem:
The ordinary least squares estimator (OLSE) is the best linear unbiased estimator (BLUE) of $\beta$.
Proof: The OLSE of $\beta$ is
$$b = (X'X)^{-1}X'y,$$
which is a linear function of $y$. Consider an arbitrary linear estimator
$$b^* = a'y$$
of the linear parametric function $\ell'\beta$, where the elements of $a$ are arbitrary constants. For $b^*$ to be unbiased, we need
$$E(b^*) = E(a'y) = a'X\beta = \ell'\beta \text{ for all } \beta, \quad \text{i.e., } a'X = \ell'.$$
Further,
$$\mathrm{Var}(a'y) = a'\,\mathrm{Var}(y)\,a = \sigma^2 a'a$$
and
$$\mathrm{Var}(\ell'b) = \ell'\,\mathrm{Var}(b)\,\ell = \sigma^2\ell'(X'X)^{-1}\ell = \sigma^2 a'X(X'X)^{-1}X'a \quad (\text{using } \ell' = a'X).$$
Consider
$$\mathrm{Var}(a'y) - \mathrm{Var}(\ell'b) = \sigma^2\left[a'a - a'X(X'X)^{-1}X'a\right] = \sigma^2 a'\left[I - X(X'X)^{-1}X'\right]a = \sigma^2 a'(I - H)a \geq 0,$$
since $I - H$ is symmetric and idempotent and hence positive semi-definite.
This reveals that if b* is any linear unbiased estimator then its variance must be no smaller than that of b .
Consequently b is the best linear unbiased estimator, where ‘best’ refers to the fact that b is efficient within
the class of linear and unbiased estimators.
Maximum likelihood estimation:
In the model $y = X\beta + \varepsilon$, it is assumed that $\varepsilon \sim N(0, \sigma^2 I_n)$. The likelihood function of $\beta$ and $\sigma^2$ is
$$L(\beta, \sigma^2) = \frac{1}{(2\pi\sigma^2)^{n/2}}\exp\left(-\frac{1}{2\sigma^2}\sum_{i=1}^{n}\varepsilon_i^2\right) = \frac{1}{(2\pi\sigma^2)^{n/2}}\exp\left(-\frac{1}{2\sigma^2}\varepsilon'\varepsilon\right) = \frac{1}{(2\pi\sigma^2)^{n/2}}\exp\left(-\frac{1}{2\sigma^2}(y - X\beta)'(y - X\beta)\right).$$
Since the log transformation is monotonic, we maximize $\ln L(\beta, \sigma^2)$ instead of $L(\beta, \sigma^2)$:
$$\ln L(\beta, \sigma^2) = -\frac{n}{2}\ln(2\pi\sigma^2) - \frac{1}{2\sigma^2}(y - X\beta)'(y - X\beta).$$
The maximum likelihood estimators (MLEs) of $\beta$ and $\sigma^2$ are obtained by equating the first-order partial derivatives to zero:
$$\frac{\partial \ln L(\beta, \sigma^2)}{\partial \beta} = \frac{1}{2\sigma^2}\,2X'(y - X\beta) = 0,$$
$$\frac{\partial \ln L(\beta, \sigma^2)}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2(\sigma^2)^2}(y - X\beta)'(y - X\beta) = 0.$$
The likelihood equations are solved by
$$\tilde{\beta} = (X'X)^{-1}X'y, \qquad \tilde{\sigma}^2 = \frac{1}{n}(y - X\tilde{\beta})'(y - X\tilde{\beta}).$$
Further, to verify that these values maximize the likelihood function, we find the second-order partial derivatives
$$\frac{\partial^2 \ln L(\beta, \sigma^2)}{\partial \beta^2} = -\frac{1}{\sigma^2}X'X,$$
$$\frac{\partial^2 \ln L(\beta, \sigma^2)}{\partial (\sigma^2)^2} = \frac{n}{2\sigma^4} - \frac{1}{\sigma^6}(y - X\beta)'(y - X\beta),$$
$$\frac{\partial^2 \ln L(\beta, \sigma^2)}{\partial \beta\,\partial \sigma^2} = -\frac{1}{\sigma^4}X'(y - X\beta).$$
Thus the Hessian matrix of second-order partial derivatives of $\ln L(\beta, \sigma^2)$ with respect to $\beta$ and $\sigma^2$ is
$$\begin{pmatrix} \dfrac{\partial^2 \ln L(\beta, \sigma^2)}{\partial \beta^2} & \dfrac{\partial^2 \ln L(\beta, \sigma^2)}{\partial \beta\,\partial \sigma^2} \\ \dfrac{\partial^2 \ln L(\beta, \sigma^2)}{\partial \sigma^2\,\partial \beta} & \dfrac{\partial^2 \ln L(\beta, \sigma^2)}{\partial (\sigma^2)^2} \end{pmatrix},$$
which is negative definite at $\beta = \tilde{\beta}$ and $\sigma^2 = \tilde{\sigma}^2$. This ensures that the likelihood function is maximized at these values.
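A brief sketch, again on simulated data with assumed parameters, contrasts the maximum likelihood estimator $\tilde{\sigma}^2$ (divisor $n$) with the unbiased estimator $s^2$ (divisor $n-k$); under normal errors the MLE of $\beta$ coincides with the OLSE.

```python
import numpy as np

rng = np.random.default_rng(3)
n, k, sigma = 100, 3, 2.0
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ np.array([0.5, 1.0, -1.5]) + rng.normal(scale=sigma, size=n)

# Under normal errors the MLE of beta coincides with the OLSE
beta_tilde, *_ = np.linalg.lstsq(X, y, rcond=None)
e = y - X @ beta_tilde

sigma2_mle = e @ e / n          # MLE: divides by n (biased in finite samples)
s2 = e @ e / (n - k)            # unbiased estimator: divides by n - k
print(sigma2_mle, s2)
```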
Consistency of estimators
(i) Consistency of $b$
Under assumption (vi),
$$V(b) = \sigma^2(X'X)^{-1} = \frac{\sigma^2}{n}\left(\frac{X'X}{n}\right)^{-1} \to 0 \text{ as } n \to \infty,$$
since $\left(\frac{X'X}{n}\right)^{-1} \to \Delta^{-1}$ is finite. Since $b$ is unbiased and its variance converges to zero, OLSE converges to $\beta$ in quadratic mean. Thus OLSE is a consistent estimator of $\beta$. This holds true for the maximum likelihood estimator also.
The same conclusion can also be proved using the concept of convergence in probability.
An estimator $\hat{\theta}_n$ converges to $\theta$ in probability if
$$\lim_{n \to \infty} P\left[\,|\hat{\theta}_n - \theta| \geq \delta\,\right] = 0 \text{ for any } \delta > 0.$$
The consistency of OLSE can be obtained under the weaker assumption that
$$\mathrm{plim}\left(\frac{X'X}{n}\right) = \Delta_*$$
exists and is a nonsingular and non-stochastic matrix, and that
$$\mathrm{plim}\left(\frac{X'\varepsilon}{n}\right) = 0.$$
Since
$$b - \beta = (X'X)^{-1}X'\varepsilon = \left(\frac{X'X}{n}\right)^{-1}\frac{X'\varepsilon}{n},$$
we have
$$\mathrm{plim}(b - \beta) = \mathrm{plim}\left(\frac{X'X}{n}\right)^{-1}\mathrm{plim}\left(\frac{X'\varepsilon}{n}\right) = \Delta_*^{-1}\cdot 0 = 0.$$
Thus $b$ is a consistent estimator of $\beta$. The same is true for the MLE also.
(ii) Consistency of $s^2$
Now we look at the consistency of $s^2$ as an estimator of $\sigma^2$. We have
$$s^2 = \frac{1}{n-k}e'e = \frac{1}{n-k}\varepsilon'\bar{H}\varepsilon = \frac{1}{n-k}\left[\varepsilon'\varepsilon - \varepsilon'X(X'X)^{-1}X'\varepsilon\right] = \left(1 - \frac{k}{n}\right)^{-1}\left[\frac{\varepsilon'\varepsilon}{n} - \frac{\varepsilon'X}{n}\left(\frac{X'X}{n}\right)^{-1}\frac{X'\varepsilon}{n}\right].$$
Note that $\dfrac{\varepsilon'\varepsilon}{n} = \dfrac{1}{n}\sum_{i=1}^{n}\varepsilon_i^2$ and $\{\varepsilon_i^2,\; i = 1, 2, \ldots, n\}$ is a sequence of independently and identically distributed random variables with mean $\sigma^2$. Using the law of large numbers,
$$\mathrm{plim}\left(\frac{\varepsilon'\varepsilon}{n}\right) = \sigma^2.$$
Also,
$$\mathrm{plim}\left[\frac{\varepsilon'X}{n}\left(\frac{X'X}{n}\right)^{-1}\frac{X'\varepsilon}{n}\right] = \mathrm{plim}\left(\frac{\varepsilon'X}{n}\right)\left[\mathrm{plim}\left(\frac{X'X}{n}\right)\right]^{-1}\mathrm{plim}\left(\frac{X'\varepsilon}{n}\right) = 0\cdot\Delta_*^{-1}\cdot 0 = 0.$$
Hence
$$\mathrm{plim}(s^2) = (1 - 0)^{-1}(\sigma^2 - 0) = \sigma^2.$$
Thus $s^2$ is a consistent estimator of $\sigma^2$. The same holds true for the MLE also.
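The following simulation sketch (parameter values are assumptions chosen for illustration) shows $s^2$ concentrating around $\sigma^2$ as the sample size grows.

```python
import numpy as np

rng = np.random.default_rng(4)
sigma2 = 4.0

def s2_for(n, k=3):
    # simulate one data set of size n and return s^2 = e'e / (n - k)
    X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
    y = X @ np.ones(k) + rng.normal(scale=np.sqrt(sigma2), size=n)
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b
    return e @ e / (n - k)

for n in (20, 200, 2000, 20000):
    print(n, s2_for(n))   # values approach sigma2 = 4.0 as n increases
```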
Standardized regression coefficients:
Usually, it is difficult to compare the regression coefficients because the magnitude of $\hat{\beta}_j$ reflects the units of measurement of the $j$th explanatory variable $X_j$. For example, in the following fitted regression model
$$\hat{y} = 5X_1 + 1000X_2,$$
$y$ is measured in litres, $X_1$ in litres and $X_2$ in millilitres. Although $\hat{\beta}_2 > \hat{\beta}_1$, the effect of both explanatory variables on $\hat{y}$ is identical: a one-litre change in either $X_1$ or $X_2$, when the other variable is held fixed, produces the same change in $\hat{y}$.
Sometimes it is helpful to work with scaled explanatory variables and a scaled study variable that produce dimensionless regression coefficients. These dimensionless regression coefficients are called standardized regression coefficients.
There are two popular approaches for scaling which give standardized regression coefficients. We discuss them as follows:
1. Unit normal scaling:
Each variable is centered and divided by its sample standard deviation:
$$z_{ij} = \frac{x_{ij} - \bar{x}_j}{s_j}, \qquad y_i^* = \frac{y_i - \bar{y}}{s_y}, \qquad i = 1, 2, \ldots, n,\; j = 1, 2, \ldots, k,$$
where $s_j^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_{ij} - \bar{x}_j)^2$ and $s_y^2 = \frac{1}{n-1}\sum_{i=1}^{n}(y_i - \bar{y})^2$ are the sample variances of the $j$th explanatory variable and of the study variable, respectively. All scaled explanatory variables and the scaled study variable have mean zero and sample variance unity. In terms of these new variables, the regression model becomes
$$y_i^* = \gamma_1 z_{i1} + \gamma_2 z_{i2} + \cdots + \gamma_k z_{ik} + \varepsilon_i, \quad i = 1, 2, \ldots, n.$$
Such centering removes the intercept term from the model. The least-squares estimate of $\gamma = (\gamma_1, \gamma_2, \ldots, \gamma_k)'$ is
$$\hat{\gamma} = (Z'Z)^{-1}Z'y^*.$$
This scaling has a similarity to standardizing a normal random variable, i.e., subtracting its mean and dividing by its standard deviation. So it is called unit normal scaling.
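A minimal sketch of unit normal scaling on simulated data follows; the means, standard deviations and coefficient values used here are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(5)
n, k = 100, 2
X = rng.normal(loc=[10.0, 50.0], scale=[2.0, 15.0], size=(n, k))   # raw regressors
y = 3.0 + 1.2 * X[:, 0] - 0.1 * X[:, 1] + rng.normal(size=n)

# Unit normal scaling: centre and divide by the sample standard deviation (ddof=1)
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
y_star = (y - y.mean()) / y.std(ddof=1)

# Standardized regression coefficients (no intercept after centring)
gamma_hat = np.linalg.solve(Z.T @ Z, Z.T @ y_star)
print(gamma_hat)
```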
Test of hypothesis for $H_0: R\beta = r$
We consider a general linear hypothesis that the parameters in $\beta$ are contained in a subspace of the parameter space for which $R\beta = r$, where $R$ is a $J \times k$ matrix of known elements and $r$ is a $J \times 1$ vector of known elements.
We assume that $\mathrm{rank}(R) = J$, i.e., full rank, so that there is no linear dependence in the hypothesis.
Some special cases and interesting examples of $H_0: R\beta = r$ are as follows:
(i) $H_0: \beta_i = 0$
Choose $J = 1$, $r = 0$, $R = [0, 0, \ldots, 0, 1, 0, \ldots, 0]$ where 1 occurs at the $i$th position in $R$.
This particular hypothesis explains whether $X_i$ has any effect on the linear model or not.
(ii) $H_0: \beta_3 = \beta_4$ or $H_0: \beta_3 - \beta_4 = 0$
Choose $J = 1$, $r = 0$, $R = [0, 0, 1, -1, 0, \ldots, 0]$.
(iii) $H_0: \beta_3 = \beta_4 = \beta_5$ or $H_0: \beta_3 - \beta_4 = 0,\; \beta_3 - \beta_5 = 0$
Choose $J = 2$, $r = (0, 0)'$, $R = \begin{pmatrix} 0 & 0 & 1 & -1 & 0 & 0 & \cdots & 0 \\ 0 & 0 & 1 & 0 & -1 & 0 & \cdots & 0 \end{pmatrix}$.
(iv) $H_0: \beta_3 + 5\beta_4 = 2$
Choose $J = 1$, $r = 2$, $R = [0, 0, 1, 5, 0, \ldots, 0]$.
(v) $H_0: \beta_2 = \beta_3 = \cdots = \beta_k = 0$
Choose $J = k - 1$, $r = (0, 0, \ldots, 0)'$,
$$R = \begin{pmatrix} 0 & 1 & 0 & \cdots & 0 \\ 0 & 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \vdots & & \vdots \\ 0 & 0 & 0 & \cdots & 1 \end{pmatrix}_{(k-1)\times k} = \begin{bmatrix} 0 & I_{k-1} \end{bmatrix}.$$
This particular hypothesis explains the goodness of fit. It tells whether the $\beta_i$'s have a linear effect or not, and whether they are of any importance. It also tests that $X_2, X_3, \ldots, X_k$ have no influence in the determination of $y$. Here $\beta_1 = 0$ is excluded because this would involve the additional implication that the mean level of $y$ is zero. Our main concern is to know whether the explanatory variables help to explain the variation in $y$ around its mean value or not.
We develop the likelihood ratio test for H 0 : R r.
The likelihood ratio is
$$\lambda = \frac{\max L(\beta, \sigma^2 \mid y, X)}{\max L(\beta, \sigma^2 \mid y, X, R\beta = r)} = \frac{\hat{L}(\hat{\Omega})}{\hat{L}(\hat{\omega})},$$
where $\Omega$ is the whole parametric space and $\omega$ is the subspace restricted by $H_0$.
If both the likelihoods are maximized, one constrained and the other unconstrained, then the value of the unconstrained maximum will not be smaller than the value of the constrained maximum. Hence $\lambda \geq 1$.
First, we discuss the likelihood ratio test for the simpler case when $R = I_k$ and $r = \beta_0$, i.e., $H_0: \beta = \beta_0$. This will give a clearer and more detailed understanding of the steps involved. Here $\beta_0$ is specified by the investigator, and its elements can take any value, including zero. The corresponding alternative hypothesis is
$$H_1: \beta \neq \beta_0.$$
The whole parametric space and the restricted space are
$$\Omega = \left\{(\beta, \sigma^2): -\infty < \beta_i < \infty,\; \sigma^2 > 0,\; i = 1, 2, \ldots, k\right\},$$
$$\omega = \left\{(\beta, \sigma^2): \beta = \beta_0,\; \sigma^2 > 0\right\}.$$
The unconstrained likelihood under $\Omega$ is
$$L(\beta, \sigma^2 \mid y, X) = \frac{1}{(2\pi\sigma^2)^{n/2}}\exp\left(-\frac{1}{2\sigma^2}(y - X\beta)'(y - X\beta)\right),$$
which is maximized at $\hat{\beta} = (X'X)^{-1}X'y$ with the corresponding variance estimate $\frac{1}{n}(y - X\hat{\beta})'(y - X\hat{\beta})$, so that
$$\hat{L}(\hat{\Omega}) = \frac{n^{n/2}\exp\left(-\frac{n}{2}\right)}{(2\pi)^{n/2}\left[(y - X\hat{\beta})'(y - X\hat{\beta})\right]^{n/2}}.$$
Since $\beta_0$ is known, the constrained likelihood function has the optimum variance estimator $\frac{1}{n}(y - X\beta_0)'(y - X\beta_0)$, so that
$$\hat{L}(\hat{\omega}) = \frac{n^{n/2}\exp\left(-\frac{n}{2}\right)}{(2\pi)^{n/2}\left[(y - X\beta_0)'(y - X\beta_0)\right]^{n/2}}$$
and
$$\lambda = \frac{\hat{L}(\hat{\Omega})}{\hat{L}(\hat{\omega})} = \left[\frac{(y - X\beta_0)'(y - X\beta_0)}{(y - X\hat{\beta})'(y - X\hat{\beta})}\right]^{n/2} \geq 1.$$
We reject $H_0$ when $\lambda$ is large, or equivalently when
$$\lambda_0 = \lambda^{2/n} - 1 = \frac{(y - X\beta_0)'(y - X\beta_0) - (y - X\hat{\beta})'(y - X\hat{\beta})}{(y - X\hat{\beta})'(y - X\hat{\beta})}$$
is large.
The denominator of $\lambda_0$ is
$$(y - X\hat{\beta})'(y - X\hat{\beta}) = e'e = y'\left[I - X(X'X)^{-1}X'\right]y = y'\bar{H}y = (X\beta + \varepsilon)'\bar{H}(X\beta + \varepsilon) = \varepsilon'\bar{H}\varepsilon \quad (\text{using } \bar{H}X = 0),$$
so that $(y - X\hat{\beta})'(y - X\hat{\beta}) = (n-k)\hat{\sigma}^2$, where $\hat{\sigma}^2 = e'e/(n-k)$.
Result: If $Z$ is an $n \times 1$ random vector with $Z \sim N(0, \sigma^2 I_n)$ and $A$ is a symmetric idempotent $n \times n$ matrix of rank $p$, then $\dfrac{Z'AZ}{\sigma^2} \sim \chi^2(p)$. If $B$ is another $n \times n$ symmetric idempotent matrix of rank $q$, then $\dfrac{Z'BZ}{\sigma^2} \sim \chi^2(q)$. If $AB = 0$, then $Z'AZ$ is distributed independently of $Z'BZ$.
Further, if $H_0$ is true, then $\beta = \beta_0$, and the numerator in $\lambda_0$ is $(\hat{\beta} - \beta_0)'X'X(\hat{\beta} - \beta_0)$. Rewriting the numerator in $\lambda_0$, in general, we have
$$(\hat{\beta} - \beta)'X'X(\hat{\beta} - \beta) = \varepsilon'X(X'X)^{-1}X'X(X'X)^{-1}X'\varepsilon = \varepsilon'X(X'X)^{-1}X'\varepsilon = \varepsilon'H\varepsilon,$$
where $H = X(X'X)^{-1}X'$ is an idempotent matrix with rank $k$. Thus, using the above result,
$$\frac{\varepsilon'H\varepsilon}{\sigma^2} = \frac{\varepsilon'X(X'X)^{-1}X'\varepsilon}{\sigma^2} \sim \chi^2(k).$$
Furthermore, the product of the quadratic form matrices in the numerator ($\varepsilon'H\varepsilon$) and denominator ($\varepsilon'\bar{H}\varepsilon$) of $\lambda_0$ is
$$\left[I - X(X'X)^{-1}X'\right]\left[X(X'X)^{-1}X'\right] = X(X'X)^{-1}X' - X(X'X)^{-1}X'X(X'X)^{-1}X' = 0,$$
and hence the $\chi^2$ random variables in the numerator and denominator of $\lambda_0$, namely $\varepsilon'H\varepsilon/\sigma^2 \sim \chi^2(k)$ and $\varepsilon'\bar{H}\varepsilon/\sigma^2 \sim \chi^2(n-k)$, are independent. Dividing each by its degrees of freedom,
$$\lambda_1 = \frac{\left[\dfrac{(\hat{\beta} - \beta_0)'X'X(\hat{\beta} - \beta_0)}{\sigma^2}\right]\Big/k}{\left[\dfrac{(n-k)\hat{\sigma}^2}{\sigma^2}\right]\Big/(n-k)} = \frac{(\hat{\beta} - \beta_0)'X'X(\hat{\beta} - \beta_0)}{k\hat{\sigma}^2} = \frac{(y - X\beta_0)'(y - X\beta_0) - (y - X\hat{\beta})'(y - X\hat{\beta})}{k\hat{\sigma}^2} \sim F(k, n-k) \text{ under } H_0.$$
The numerator in $\lambda_1$ is the difference between the restricted and unrestricted error sums of squares, and the denominator is the unrestricted mean square error. The decision rule is to reject $H_0$ whenever
$$\lambda_1 \geq F_{\alpha}(k, n-k),$$
where $F_{\alpha}(k, n-k)$ is the upper $\alpha$ critical point of the central $F$-distribution with $k$ and $n-k$ degrees of freedom.
Now consider the general hypothesis $H_0: R\beta = r$. The whole parametric space and the restricted space are
$$\Omega = \left\{(\beta, \sigma^2): -\infty < \beta_i < \infty,\; \sigma^2 > 0,\; i = 1, 2, \ldots, k\right\},$$
$$\omega = \left\{(\beta, \sigma^2): -\infty < \beta_i < \infty,\; R\beta = r,\; \sigma^2 > 0\right\}.$$
Since $\hat{\beta} \sim N\left(\beta, \sigma^2(X'X)^{-1}\right)$,
$$R\hat{\beta} \sim N\left(R\beta, \sigma^2 R(X'X)^{-1}R'\right),$$
and so, under $H_0: R\beta = r$,
$$R\hat{\beta} - r = R\hat{\beta} - R\beta = R(\hat{\beta} - \beta) \sim N\left(0, \sigma^2 R(X'X)^{-1}R'\right).$$
There exists a matrix $Q$ such that $R(X'X)^{-1}R' = QQ'$, and then
$$\frac{(R\hat{\beta} - r)'\left[R(X'X)^{-1}R'\right]^{-1}(R\hat{\beta} - r)}{\sigma^2} \sim \chi^2(J),$$
independently of $\dfrac{(n-k)\hat{\sigma}^2}{\sigma^2} \sim \chi^2(n-k)$. Dividing each $\chi^2$ variable by its degrees of freedom gives
$$\lambda_1 = \frac{(R\hat{\beta} - r)'\left[R(X'X)^{-1}R'\right]^{-1}(R\hat{\beta} - r)}{J\hat{\sigma}^2} \sim F(J, n-k) \text{ under } H_0.$$
So the decision rule is to reject $H_0$ whenever
$$\lambda_1 \geq F_{\alpha}(J, n-k),$$
where $F_{\alpha}(J, n-k)$ is the upper $\alpha$ critical point of the central $F$-distribution with $J$ and $(n-k)$ degrees of freedom.
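As an illustration, the sketch below computes the $F$ statistic $\lambda_1$ for a hypothesis of the form $R\beta = r$ (here $H_0: \beta_2 = \beta_3$, a choice made only for this example) on simulated data, using scipy only to obtain the $F$ tail probability.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
n, k = 120, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
beta_true = np.array([1.0, 0.8, 0.8, -0.3])      # beta_2 = beta_3, so H0 below is true
y = X @ beta_true + rng.normal(size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)
sigma2_hat = (y - X @ b) @ (y - X @ b) / (n - k)

# H0: beta_2 - beta_3 = 0, written as R beta = r with J = 1
R = np.array([[0.0, 1.0, -1.0, 0.0]])
r = np.array([0.0])
J = R.shape[0]

diff = R @ b - r
middle = np.linalg.inv(R @ np.linalg.inv(X.T @ X) @ R.T)
F_stat = (diff @ middle @ diff) / (J * sigma2_hat)
p_value = stats.f.sf(F_stat, J, n - k)
print(F_stat, p_value)
```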
Test of significance of regression (analysis of variance)
We now test
$$H_0: \beta_2 = \beta_3 = \cdots = \beta_k = 0$$
against the alternative hypothesis
$$H_1: \beta_j \neq 0 \text{ for at least one } j = 2, 3, \ldots, k.$$
This hypothesis determines if there is a linear relationship between $y$ and any of the explanatory variables $X_2, X_3, \ldots, X_k$. Notice that $X_1$ corresponds to the intercept term in the model, and hence $x_{i1} = 1$ for all $i$.
This is an overall or global test of model adequacy. Rejection of the null hypothesis indicates that at least one of the explanatory variables among $X_2, X_3, \ldots, X_k$ contributes significantly to the model. This is called analysis of variance.
Since $\varepsilon \sim N(0, \sigma^2 I)$,
$$y \sim N(X\beta, \sigma^2 I), \qquad b = (X'X)^{-1}X'y \sim N\left(\beta, \sigma^2(X'X)^{-1}\right).$$
Also
$$\hat{\sigma}^2 = \frac{SS_{res}}{n-k} = \frac{(y - \hat{y})'(y - \hat{y})}{n-k} = \frac{y'\left[I - X(X'X)^{-1}X'\right]y}{n-k} = \frac{y'\bar{H}y}{n-k} = \frac{y'y - b'X'y}{n-k}.$$
Since $(X'X)^{-1}X'\bar{H} = 0$, $b$ and $\hat{\sigma}^2$ are independently distributed.
Further,
$$\frac{SS_{res}}{\sigma^2} \sim \chi^2(n-k).$$
Partition $X = [X_1, X_2^*]$ and $\beta = [\beta_1, \beta_2^{*'}]'$, where the subvector $\beta_2^*$ contains the regression coefficients $\beta_2, \beta_3, \ldots, \beta_k$. The total sum of squares about the mean can then be decomposed as $SS_T = SS_{reg} + SS_{res}$, where $SS_{reg} = b_2^{*'}X_2^{*'}AX_2^*b_2^*$ (with $A = I - \frac{1}{n}ll'$) is the sum of squares due to regression, and the sum of squares due to residuals is
$$SS_{res} = (y - Xb)'(y - Xb) = y'\bar{H}y = SS_T - SS_{reg}.$$
Further,
$$\frac{SS_{reg}}{\sigma^2} \sim \chi^2\!\left(k-1,\; \frac{\beta_2^{*'}X_2^{*'}AX_2^*\beta_2^*}{2\sigma^2}\right),$$
i.e., a non-central $\chi^2$ distribution with non-centrality parameter $\dfrac{\beta_2^{*'}X_2^{*'}AX_2^*\beta_2^*}{2\sigma^2}$.
Since $\bar{H}X_2^* = 0$, $SS_{reg}$ and $SS_{res}$ are independently distributed. The mean square due to regression is
$$MS_{reg} = \frac{SS_{reg}}{k-1},$$
and the mean square due to error is
$$MS_{res} = \frac{SS_{res}}{n-k}.$$
Then
$$\frac{MS_{reg}}{MS_{res}} \sim F\!\left(k-1, n-k;\; \frac{\beta_2^{*'}X_2^{*'}AX_2^*\beta_2^*}{2\sigma^2}\right),$$
which is a non-central $F$-distribution with $(k-1, n-k)$ degrees of freedom and non-centrality parameter $\dfrac{\beta_2^{*'}X_2^{*'}AX_2^*\beta_2^*}{2\sigma^2}$.
Under $H_0: \beta_2 = \beta_3 = \cdots = \beta_k = 0$,
$$F = \frac{MS_{reg}}{MS_{res}} \sim F(k-1, n-k).$$
The decision rule is to reject $H_0$ at level $\alpha$ if $F > F_{\alpha}(k-1, n-k)$. The calculations are summarized in the analysis of variance table:

Source of variation | Sum of squares | Degrees of freedom | Mean square | F
Regression | $SS_{reg}$ | $k-1$ | $MS_{reg}$ | $MS_{reg}/MS_{res}$
Error | $SS_{res}$ | $n-k$ | $MS_{res}$ |
Total | $SS_T$ | $n-1$ | |

Note that $SS_{reg}$ (and hence $R^2$) never decreases when an explanatory variable is added to the model. Adding such explanatory variables also increases the variance of the fitted values $\hat{y}$, so one needs to be careful that only those regressors are added that are of real value in explaining the response. Adding unimportant explanatory variables may increase the residual mean square, which may decrease the usefulness of the model.
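A minimal sketch of the analysis-of-variance $F$ test on simulated data follows; the coefficients and sample size are assumptions for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n, k = 80, 4                       # k includes the intercept column
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ np.array([2.0, 0.0, 1.5, 0.0]) + rng.normal(size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)
y_hat = X @ b

SS_T = np.sum((y - y.mean()) ** 2)           # total sum of squares about the mean
SS_res = np.sum((y - y_hat) ** 2)            # residual sum of squares
SS_reg = SS_T - SS_res                       # sum of squares due to regression

MS_reg = SS_reg / (k - 1)
MS_res = SS_res / (n - k)
F = MS_reg / MS_res
p_value = stats.f.sf(F, k - 1, n - k)
print(F, p_value)                            # overall significance of the regression
```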
Test of hypothesis on individual regression coefficients
The test of $H_0: \beta_j = 0$ against $H_1: \beta_j \neq 0$ has already been discussed in the case of the simple linear regression model. In the present case, if $H_0$ is accepted, it implies that the explanatory variable $X_j$ can be deleted from the model. The corresponding test statistic is
$$t = \frac{b_j}{se(b_j)} \sim t(n-k) \text{ under } H_0,$$
where the standard error of the OLSE $b_j$ of $\beta_j$ is
$$se(b_j) = \sqrt{\hat{\sigma}^2 C_{jj}},$$
with $C_{jj}$ denoting the $j$th diagonal element of $(X'X)^{-1}$ corresponding to $b_j$. The decision rule is to reject $H_0$ at level $\alpha$ if
$$|t| > t_{\frac{\alpha}{2},\, n-k}.$$
Note that this is only a partial or marginal test because $\hat{\beta}_j$ depends on all the other explanatory variables $X_i$ ($i \neq j$) that are in the model. This is a test of the contribution of $X_j$ given the other explanatory variables in the model.
Confidence interval estimation
The confidence intervals in the multiple regression model can be constructed as follows. Since $\varepsilon \sim N(0, \sigma^2 I)$,
$$y \sim N(X\beta, \sigma^2 I), \qquad b \sim N\left(\beta, \sigma^2(X'X)^{-1}\right).$$
Thus the marginal distribution of any regression coefficient estimate is
$$b_j \sim N(\beta_j, \sigma^2 C_{jj}),$$
where $C_{jj}$ is the $j$th diagonal element of $(X'X)^{-1}$.
Thus
$$t_j = \frac{b_j - \beta_j}{\sqrt{\hat{\sigma}^2 C_{jj}}} \sim t(n-k), \quad j = 1, 2, \ldots, k,$$
where $\hat{\sigma}^2 = \dfrac{SS_{res}}{n-k} = \dfrac{y'y - b'X'y}{n-k}$.
So the 100$(1-\alpha)$% confidence interval for $\beta_j$ ($j = 1, 2, \ldots, k$) is obtained as follows:
$$P\left[-t_{\frac{\alpha}{2}, n-k} \leq \frac{b_j - \beta_j}{\sqrt{\hat{\sigma}^2 C_{jj}}} \leq t_{\frac{\alpha}{2}, n-k}\right] = 1 - \alpha,$$
$$P\left[b_j - t_{\frac{\alpha}{2}, n-k}\sqrt{\hat{\sigma}^2 C_{jj}} \leq \beta_j \leq b_j + t_{\frac{\alpha}{2}, n-k}\sqrt{\hat{\sigma}^2 C_{jj}}\right] = 1 - \alpha.$$
Thus the confidence interval is
$$\left(b_j - t_{\frac{\alpha}{2}, n-k}\sqrt{\hat{\sigma}^2 C_{jj}},\;\; b_j + t_{\frac{\alpha}{2}, n-k}\sqrt{\hat{\sigma}^2 C_{jj}}\right).$$
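The sketch below computes the $t$ statistics and the 100$(1-\alpha)$% confidence intervals for the individual coefficients on simulated data; all numerical settings are assumptions for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(8)
n, k, alpha = 60, 3, 0.05
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ np.array([1.0, 0.0, 2.0]) + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
e = y - X @ b
sigma2_hat = e @ e / (n - k)

se = np.sqrt(sigma2_hat * np.diag(XtX_inv))   # se(b_j) = sqrt(sigma2_hat * C_jj)
t_stats = b / se                              # t statistics for H0: beta_j = 0
t_crit = stats.t.ppf(1 - alpha / 2, n - k)    # t_{alpha/2, n-k}

for j in range(k):
    print(f"b_{j+1} = {b[j]: .3f}, t = {t_stats[j]: .2f}, "
          f"CI = ({b[j] - t_crit * se[j]: .3f}, {b[j] + t_crit * se[j]: .3f})")
```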
Simultaneous confidence intervals on regression coefficients:
A set of confidence intervals that are true simultaneously with probability $(1-\alpha)$ are called simultaneous or joint confidence intervals.
It is relatively easy to define a joint confidence region for $\beta$ in the multiple regression model.
Since
$$\frac{(b - \beta)'X'X(b - \beta)}{k\,MS_{res}} \sim F(k, n-k),$$
$$P\left[\frac{(b - \beta)'X'X(b - \beta)}{k\,MS_{res}} \leq F_{\alpha}(k, n-k)\right] = 1 - \alpha.$$
So a 100$(1-\alpha)$% joint confidence region for all of the parameters in $\beta$ is
$$\frac{(b - \beta)'X'X(b - \beta)}{k\,MS_{res}} \leq F_{\alpha}(k, n-k),$$
which describes an elliptically shaped region.
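As an illustration of the elliptical joint confidence region, the following sketch (with assumed simulated data) checks whether a given vector $\beta_0$ satisfies the defining inequality.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(9)
n, k, alpha = 50, 2, 0.05
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 2.0]) + rng.normal(size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)
MS_res = np.sum((y - X @ b) ** 2) / (n - k)
F_crit = stats.f.ppf(1 - alpha, k, n - k)

def in_joint_region(beta0):
    # beta0 lies in the 100(1-alpha)% joint confidence ellipse if
    # (b - beta0)' X'X (b - beta0) / (k * MS_res) <= F_crit
    d = b - beta0
    return (d @ (X.T @ X) @ d) / (k * MS_res) <= F_crit

print(in_joint_region(np.array([1.0, 2.0])))   # true parameter, usually inside
print(in_joint_region(np.array([5.0, -3.0])))  # far away, outside
```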
Coefficient of determination ($R^2$):
The goodness of fit of the model is commonly judged by
$$R^2 = \frac{SS_{reg}}{SS_T} = 1 - \frac{SS_{res}}{SS_T},$$
where
$SS_{res}$: sum of squares due to residuals,
$SS_{reg}$: sum of squares due to regression,
$SS_T = \sum_{i=1}^{n}(y_i - \bar{y})^2$: total sum of squares about the mean.
Since
$$SS_{res} = e'e = y'\left[I - X(X'X)^{-1}X'\right]y = y'\bar{H}y,$$
and
$$SS_T = \sum_{i=1}^{n}(y_i - \bar{y})^2 = \sum_{i=1}^{n}y_i^2 - n\bar{y}^2,$$
where $\bar{y} = \dfrac{1}{n}\sum_{i=1}^{n}y_i = \dfrac{1}{n}l'y$ with $l = (1, 1, \ldots, 1)'$ and $y = (y_1, y_2, \ldots, y_n)'$.
Thus
$$\sum_{i=1}^{n}(y_i - \bar{y})^2 = y'y - \frac{1}{n}y'll'y = y'y - y'l(l'l)^{-1}l'y = y'\left[I - l(l'l)^{-1}l'\right]y = y'Ay,$$
where $A = I - l(l'l)^{-1}l' = I - \frac{1}{n}ll'$. So
$$R^2 = 1 - \frac{y'\bar{H}y}{y'Ay}.$$
Clearly $0 \leq R^2 \leq 1$: $R^2 = 1$ indicates a perfect fit, and $R^2 = 0$ indicates that the model does not explain any of the variation in $y$ around its mean. Similarly, any other value of $R^2$ between 0 and 1 indicates the degree of adequacy of the fitted model.
With a purpose of correction in the overly optimistic picture, adjusted $R^2$, denoted as $\bar{R}^2$ or adj $R^2$, is used, which is defined as
$$\bar{R}^2 = 1 - \frac{SS_{res}/(n-k)}{SS_T/(n-1)} = 1 - \frac{n-1}{n-k}(1 - R^2).$$
We will see later that $(n-k)$ and $(n-1)$ are the degrees of freedom associated with the distributions of $SS_{res}$ and $SS_T$. Moreover, the quantities $\dfrac{SS_{res}}{n-k}$ and $\dfrac{SS_T}{n-1}$ are based on the unbiased estimators of the respective variances of $e$ and $y$ in the context of analysis of variance.
The adjusted $R^2$ will decline if the addition of an extra variable produces too small a reduction in $(1 - R^2)$ to compensate for the increase in $\dfrac{n-1}{n-k}$.
Another limitation of adjusted $R^2$ is that it can be negative. For example, if $k = 3$, $n = 10$, $R^2 = 0.16$, then
$$\bar{R}^2 = 1 - \frac{9}{7}(1 - 0.16) = 1 - \frac{9}{7}(0.84) = -0.08 < 0,$$
which has no interpretation.
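A short sketch computing $R^2$ and adjusted $R^2$ on simulated data with assumed coefficients; adjusted $R^2$ is never larger than $R^2$ and may be negative.

```python
import numpy as np

rng = np.random.default_rng(10)
n, k = 30, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ np.array([1.0, 0.5, 0.0, 0.0]) + rng.normal(size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)
SS_res = np.sum((y - X @ b) ** 2)
SS_T = np.sum((y - y.mean()) ** 2)

R2 = 1 - SS_res / SS_T
R2_adj = 1 - (n - 1) / (n - k) * (1 - R2)
print(R2, R2_adj)
```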
Reason why $R^2$ is valid only in linear models with an intercept term:
In the model $y = X\beta + \varepsilon$, the ordinary least squares estimator of $\beta$ is $b = (X'X)^{-1}X'y$. Consider the fitted model as
$$y = Xb + (y - Xb) = Xb + e,$$
where $e$ is the residual. Note that
$$y - \bar{y}l = Xb + e - \bar{y}l = \hat{y} - \bar{y}l + e,$$
where $\hat{y} = Xb$ is the fitted value and $l = (1, 1, \ldots, 1)'$ is an $n \times 1$ vector of elements unity.
The total sum of squares $TSS = \sum_{i=1}^{n}(y_i - \bar{y})^2$ is then obtained as
$$TSS = (y - \bar{y}l)'(y - \bar{y}l) = (\hat{y} - \bar{y}l + e)'(\hat{y} - \bar{y}l + e) = (\hat{y} - \bar{y}l)'(\hat{y} - \bar{y}l) + e'e + 2(\hat{y} - \bar{y}l)'e = SS_{reg} + SS_{res} + 2(\hat{y} - \bar{y}l)'e.$$
The Fisher Cochran theorem requires $TSS = SS_{reg} + SS_{res}$ to hold true in the context of analysis of variance and, further, to define $R^2$. In order that $TSS = SS_{reg} + SS_{res}$ holds true, the cross-product term must vanish; since $\hat{y}'e = b'X'e = 0$ by the normal equations, this requires that $l'e$ should be zero, i.e., $l'e = l'(y - \hat{y}) = 0$, which is possible only when there is an intercept term in the model. We show this claim as follows:
First, we consider a no-intercept simple linear regression model $y_i = \beta_1 x_i + \varepsilon_i$ $(i = 1, 2, \ldots, n)$, where the parameter $\beta_1$ is estimated as
$$b_1^* = \frac{\sum_{i=1}^{n}x_i y_i}{\sum_{i=1}^{n}x_i^2}.$$
Then
$$l'e = \sum_{i=1}^{n}e_i = \sum_{i=1}^{n}(y_i - \hat{y}_i) = \sum_{i=1}^{n}(y_i - b_1^* x_i) \neq 0, \text{ in general}.$$
In a multiple linear regression model with an intercept term, $y = \beta_0 l + X\beta + \varepsilon$, the parameters $\beta_0$ and $\beta$ are estimated as $\hat{\beta}_0 = \bar{y} - b'\bar{x}$ and $b = (X'X)^{-1}X'y$ respectively, where $\bar{x}$ is the vector of sample means of the explanatory variables. We find that
$$l'e = l'(y - \hat{y}) = l'(y - \hat{\beta}_0 l - Xb) = l'\left[(y - \bar{y}l) - (X - l\bar{x}')b\right] = l'(y - \bar{y}l) - l'(X - l\bar{x}')b = 0,$$
since $l'(y - \bar{y}l) = 0$ and $l'(X - l\bar{x}') = 0$.
Thus we conclude that for the Fisher Cochran theorem to hold true, in the sense that the total sum of squares can be divided into two orthogonal components, viz., the sum of squares due to regression and the sum of squares due to errors, it is necessary that $l'e = l'(y - \hat{y}) = 0$ holds, which is possible only when the intercept term is present in the model.
3. R 2 always increases with an increase in the number of explanatory variables in the model. The main
drawback of this property is that even when the irrelevant explanatory variables are added in the
model, R 2 still increases. This indicates that the model is getting better, which is not really correct.
Another limitation arises when models with different forms of the study variable are compared. Suppose one model is fitted with $y$ and another with $\log y$ as the study variable. The corresponding coefficients of determination are
$$R_1^2 = 1 - \frac{\sum_{i=1}^{n}(y_i - \hat{y}_i)^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2}$$
and
$$R_2^2 = 1 - \frac{\sum_{i=1}^{n}\left(\log y_i - \widehat{\log y}_i\right)^2}{\sum_{i=1}^{n}\left(\log y_i - \overline{\log y}\right)^2}.$$
As such, $R_1^2$ and $R_2^2$ are not comparable. If still the two models are needed to be compared, a better proposition is to define
$$R_3^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \operatorname{antilog}\,\hat{y}_i^*\right)^2}{\sum_{i=1}^{n}(y_i - \bar{y})^2},$$
where $y^* = \log y$. Now $R_1^2$ and $R_3^2$ on comparison may give an idea about the adequacy of the two models.
Relationship between $R^2$ and $F$:
Since $R^2 = SS_{reg}/SS_T$ and $SS_{res} = SS_T(1 - R^2)$, the overall $F$ statistic can be written as
$$F = \frac{MS_{reg}}{MS_{res}} = \frac{SS_{reg}/(k-1)}{SS_{res}/(n-k)} = \frac{n-k}{k-1}\cdot\frac{R^2}{1 - R^2}.$$
In the limit, when $R^2 \to 1$, $F \to \infty$. So both $F$ and $R^2$ vary directly. A larger $R^2$ implies a greater $F$ value. That is why the $F$ test under the analysis of variance is termed as the measure of the overall significance of the estimated regression. It is also a test of significance of $R^2$. If $F$ is highly significant, it implies that we can reject $H_0$, i.e., $y$ is linearly related to the $X$'s.
Prediction of values of the study variable
Let $x_0 = (x_{01}, x_{02}, \ldots, x_{0k})'$ be a given point. The predictor of the average value of $y$ at $x_0$ is $p = x_0'b = \hat{y}_0$. Its variance is
$$\mathrm{Var}(p) = E\left[\left(p - E(y \mid x_0)\right)'\left(p - E(y \mid x_0)\right)\right] = \sigma^2 x_0'(X'X)^{-1}x_0.$$
Then
$$E(\hat{y}_0) = x_0'\beta = E(y \mid x_0), \qquad \mathrm{Var}(\hat{y}_0) = \sigma^2 x_0'(X'X)^{-1}x_0.$$
The confidence interval on the mean response at a particular point, such as $x_{01}, x_{02}, \ldots, x_{0k}$, can be found as follows. The 100$(1-\alpha)$% confidence interval on the mean response at the point $x_{01}, x_{02}, \ldots, x_{0k}$, i.e., on $E(y \mid x_0)$, is
$$\left(\hat{y}_0 - t_{\frac{\alpha}{2}, n-k}\sqrt{\hat{\sigma}^2 x_0'(X'X)^{-1}x_0},\;\; \hat{y}_0 + t_{\frac{\alpha}{2}, n-k}\sqrt{\hat{\sigma}^2 x_0'(X'X)^{-1}x_0}\right).$$
Similarly, the 100$(1-\alpha)$% prediction interval for the actual (future) value of the study variable at the point $x_0$, with point predictor $p_f = x_0'b$, is
$$\left(p_f - t_{\frac{\alpha}{2}, n-k}\sqrt{\hat{\sigma}^2\left[1 + x_0'(X'X)^{-1}x_0\right]},\;\; p_f + t_{\frac{\alpha}{2}, n-k}\sqrt{\hat{\sigma}^2\left[1 + x_0'(X'X)^{-1}x_0\right]}\right).$$
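Finally, a sketch computing both the confidence interval on the mean response and the prediction interval for a future observation at an assumed point $x_0$, applying the formulas above to simulated data.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(11)
n, k, alpha = 40, 3, 0.05
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
y = X @ np.array([2.0, 1.0, -1.0]) + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
sigma2_hat = np.sum((y - X @ b) ** 2) / (n - k)
t_crit = stats.t.ppf(1 - alpha / 2, n - k)

x0 = np.array([1.0, 0.5, -0.2])               # point at which to predict (first entry = intercept)
y0_hat = x0 @ b

half_mean = t_crit * np.sqrt(sigma2_hat * x0 @ XtX_inv @ x0)          # mean response
half_pred = t_crit * np.sqrt(sigma2_hat * (1 + x0 @ XtX_inv @ x0))    # future observation

print("CI for E(y|x0):", (y0_hat - half_mean, y0_hat + half_mean))
print("Prediction interval:", (y0_hat - half_pred, y0_hat + half_pred))
```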