MLRM
Supravat Bagli
K-Variable Linear Regression Model

The multivariate relation gives rise to a richer array of inference questions and reduces the chance of omitted-variable bias in estimation compared with the two-variable equation. The specification of the k-variable model is:

$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + \dots + \beta_k X_{ki} + U_i$   [1]   (population model / true model)

This equation identifies k−1 explanatory variables (regressors), namely $X_2, X_3, \dots, X_k$, that are thought to influence the dependent variable (regressand) Y. The subscript i indicates the ith population member (observation).

U is the stochastic disturbance; it captures the randomness of the relationship between the regressand and the regressors and contains the unobserved factors affecting Y.

The matrix form of model [1] for n observations is

$Y = X\beta + U$   [1a]

where

$X = \begin{bmatrix} 1 & X_{21} & \cdots & X_{k1} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & X_{2n} & \cdots & X_{kn} \end{bmatrix}$, $\quad Y = \begin{bmatrix} Y_1 \\ \vdots \\ Y_n \end{bmatrix}$, $\quad \beta = \begin{bmatrix} \beta_1 \\ \vdots \\ \beta_k \end{bmatrix}$ and $U = \begin{bmatrix} U_1 \\ \vdots \\ U_n \end{bmatrix}$

Assumptions

A1 Linear in the Coefficient Parameters

The relationship in [1] is linear in the coefficient parameters ($\beta_j$); however, Y and the Xs may be various transformations of the underlying variables of interest. This assumption simply defines the multiple linear regression model.

A2 No Perfect Collinearity
There is no exact linear relationship among the independent variables; that is, there is no exact multicollinearity problem. Technically, this assumption says that the X matrix has full column rank, i.e. $\rho(X) = k$. This assumption is known as the identification condition. The linear independence of the columns of X is required for unique determination of the estimates of the $\beta_j$.


A3 Zero Conditional Mean of the Disturbance

This assumption states that no observations on X convey information about the expected value of the disturbance, which we write as

$E[U \mid X] = 0$, i.e. $E\!\left[\begin{matrix} U_1 \\ \vdots \\ U_n \end{matrix}\;\middle|\;X\right] = \begin{bmatrix} 0 \\ \vdots \\ 0 \end{bmatrix}$

The zero conditional mean implies that the unconditional mean is also zero, since

$E[U_i] = E_X\big[E[U_i \mid X]\big] = E_X[0] = 0$ for all i.

Moreover, since $E[U_i \mid X] = 0$ for each $U_i$, $\operatorname{cov}\big(E[U_i \mid X], X\big) = \operatorname{cov}(U_i, X) = 0$ for all i.

The implication of this assumption is that

$E[Y \mid X] = X\beta$   [1b]

Assumptions A1 and A3 together comprise the linear regression model. The regression of Y on X is the conditional mean of Y given X, so assumption A3 makes $X\beta$ the conditional mean function.

Secondly, assumption A3 indicates that the unobserved factors in U and the explanatory variables are uncorrelated; that is, the explanatory variables are exogenous. Violation of this assumption creates the problem of misspecification of the model and the problem of endogeneity.

A4 Spherical Disturbances
This assumption states that the disturbances are homoscedastic and non-autocorrelated. It can be written as

$E[UU' \mid X] = \sigma^2 I$

This assumption has two parts: (a) the variance of $U_i$ is constant for all i, and (b) the covariance of $U_i$ and $U_j$ is zero for all $i \neq j$. Violation of part (a) creates the problem of heteroscedasticity, and violation of part (b) creates the problem of autocorrelation. When the disturbance term satisfies assumption A4, it is called a spherical disturbance.


A5 Data-Generating Process for the Regressors Unrelated to the Disturbances

It is common to assume that the explanatory variables are non-stochastic, as happens in an experimental setting. With this simplification, A3 and A4 would be stated unconditionally. In reality, however, X can be a mixture of stochastic and non-stochastic variables, so we need an assumption that the ultimate source of the data in X is statistically and economically unrelated to the source of U. In other words, $(Y_i, X_{2i}, X_{3i}, \dots, X_{ki})$, $i = 1, 2, \dots, n$, are independently and identically distributed. This assumption holds automatically if the data are collected by simple random sampling.

Assumptions A1 through A5 are collectively known as the Gauss-Markov assumptions for cross-section analysis.

Ordinary Least Squares (OLS) Estimates of the MLRM

The primary objective of estimating an MLRM is to estimate the conditional mean of Y given X, with some confidence interval. For this purpose we need to estimate the k coefficient parameters $\beta_j$ and the conditional variance of U, i.e. $\sigma^2$. So we estimate k+1 parameters in model [1a].

If the unknown vector $\beta$ in [1a] is replaced by some estimate $\hat\beta$, we can write a residual vector

$e = Y - X\hat\beta$,

or $Y = X\hat\beta + e$   [2]   (sample model)

The OLS principle is to choose $\hat\beta$ to minimise the residual sum of squares, i.e. to minimise

$e'e = (Y - X\hat\beta)'(Y - X\hat\beta)$   [3]

We know that the sum of squared deviations is smallest when taken about the mean; thus, by minimising the sum of the squares of the residuals, we actually find the conditional mean of Y.

$RSS = e'e = (Y - X\hat\beta)'(Y - X\hat\beta)$

$= Y'Y - Y'X\hat\beta - \hat\beta'X'Y + \hat\beta'X'X\hat\beta$

$= Y'Y - 2\hat\beta'X'Y + \hat\beta'X'X\hat\beta$

[Since $Y'X\hat\beta$ and $\hat\beta'X'Y$ are scalars and one is the transpose of the other, they are equal.]


The FOCs for minimisation are

$\frac{\partial RSS}{\partial \hat\beta} = -2X'Y + 2X'X\hat\beta = 0$   [4]   (the k normal equations)

or $\hat\beta = (X'X)^{-1}X'Y$   [5]

This is the OLS estimate of the $\beta$ vector; the expression shows how the OLS estimate of $\beta$ is related to the data.

The SOC requires $\frac{\partial^2 RSS}{\partial \hat\beta\,\partial \hat\beta'} = 2X'X$ to be positive definite, which is satisfied because $X'X$ is a positive definite matrix by assumption A2.
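As a quick illustration of [5], here is a minimal NumPy sketch; the data, seed and variable names are hypothetical, not from the text. Later sketches in these notes reuse X, Y, n, k and beta_hat from this block.

```python
import numpy as np

def ols_beta(X, Y):
    """OLS estimate from [5]: beta_hat = (X'X)^{-1} X'Y.
    X is n x k with a leading column of ones for the intercept."""
    # solve(X'X, X'Y) avoids forming an explicit inverse; A2 guarantees invertibility
    return np.linalg.solve(X.T @ X, X.T @ Y)

# illustrative (hypothetical) data
rng = np.random.default_rng(0)
n, k = 100, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
beta_true = np.array([1.0, 2.0, -0.5])
Y = X @ beta_true + rng.normal(size=n)
beta_hat = ols_beta(X, Y)
print(beta_hat)
```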

If we substitute [2] into [5] we get

$\hat\beta = (X'X)^{-1}X'X\hat\beta + (X'X)^{-1}X'e$

$\Rightarrow 0 = (X'X)^{-1}X'e$,

or $X'e = 0$   [6]   since $(X'X)^{-1} \neq 0$

This is a fundamental result of OLS. The first element in [6] gives

$\sum_i e_i = 0$, i.e. $\bar e = \bar Y - \hat\beta_1 - \hat\beta_2\bar X_2 - \dots - \hat\beta_k\bar X_k = 0$   [6a]

Thus the residuals have zero mean, and the regression plane passes through the point of means in k-dimensional space. The remaining elements of Eq. [6] are of the form

$\sum_{i=1}^{n} X_{ji}e_i = 0$   [6b]   j = 2, …, k (n is the sample size)

This condition means that each regressor has zero sample correlation with the residuals.

[Sample covariance of $X_j$ and e: $\operatorname{cov}(X_j, e) = \frac{1}{n}\sum_{i=1}^{n}(X_{ji} - \bar X_j)e_i = \frac{1}{n}\Big[\sum_{i=1}^{n}X_{ji}e_i - \bar X_j\sum_{i=1}^{n}e_i\Big] = 0$, since $\sum_i X_{ji}e_i = 0$ and $\sum_i e_i = 0$.]

This in turn implies that $\hat Y\ (= X\hat\beta)$, the vector of regression values for Y, is uncorrelated with e, for

$\hat Y'e = (X\hat\beta)'e = \hat\beta'X'e = 0$   [7]


Further, we have $Y = \hat Y + e$, which implies $\bar Y = \bar{\hat Y} + \bar e$, or

$\bar Y = \bar{\hat Y}$   [8]   since $\bar e = 0$
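Continuing the NumPy sketch above, results [6], [7] and [8] can be checked numerically on the hypothetical data (a sanity check, not a proof):

```python
e = Y - X @ beta_hat
Y_hat = X @ beta_hat
print(np.allclose(X.T @ e, 0.0))            # X'e = 0                        -- [6]
print(np.allclose(Y_hat @ e, 0.0))          # Y_hat'e = 0                    -- [7]
print(np.allclose(Y.mean(), Y_hat.mean()))  # mean of Y equals mean of Y_hat -- [8]
```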

Decomposition of the Sum of Squares

The zero covariance between the regressors and the residuals underlies the decomposition of the sum of squares. Decomposing the Y vector into the part explained by the regression and the unexplained part,

$Y = \hat Y + e = X\hat\beta + e$,

it follows that

$Y'Y = (X\hat\beta + e)'(X\hat\beta + e) = \hat\beta'X'X\hat\beta + e'e$

However, $Y'Y$ is the sum of squares of the actual Y values. We are actually interested in analysing the variation in Y measured by the sum of squared deviations from the sample mean, i.e.

$\sum_{i=1}^{n}(Y_i - \bar Y)^2 = Y'Y - n\bar Y^2$

Thus, subtracting $n\bar Y^2$ from each side of the previous decomposition gives a revised decomposition,

$Y'Y - n\bar Y^2 = \hat\beta'X'X\hat\beta - n\bar Y^2 + e'e$,

or $Y'Y - n\bar Y^2 = \hat Y'\hat Y - n\bar{\hat Y}^2 + e'e$   since $\bar Y = \bar{\hat Y}$

Therefore, TSS = ESS + RSS   [9]

where TSS indicates the total sum of squares in Y, and ESS and RSS the explained and residual (unexplained) sums of squares.
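A one-line check of [9] on the running hypothetical example:

```python
TSS = np.sum((Y - Y.mean())**2)
ESS = np.sum((Y_hat - Y_hat.mean())**2)
RSS = e @ e
print(np.allclose(TSS, ESS + RSS))   # TSS = ESS + RSS  -- [9]
```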

Equation in Deviation Form

An alternative approach is to begin by expressing all the data in the form of deviations from the sample means. We have the sample regression function for the ith observation,

$Y_i = \hat\beta_1 + \hat\beta_2X_{2i} + \dots + \hat\beta_kX_{ki} + e_i$   [from equation 2]


and from [6a],

$\bar Y = \hat\beta_1 + \hat\beta_2\bar X_2 + \dots + \hat\beta_k\bar X_k$

Subtracting the second equation from the first gives

$y_i = \hat\beta_2x_{2i} + \dots + \hat\beta_kx_{ki} + e_i$   [10]

This is called the deviation form of the k-variable LRM, where lowercase letters denote deviations from sample means. The intercept $\hat\beta_1$ disappears from the deviation form of the equation, but it may be recovered from [6a]:

$\hat\beta_1 = \bar Y - \hat\beta_2\bar X_2 - \dots - \hat\beta_k\bar X_k$   [10a]

The least-squares slope coefficients are identical in both forms of the regression equation, [2] and [10], as are the residuals.

Collecting all n observations, the deviation form of the equation may be written compactly using a transformation matrix,

$A = I_n - \frac{1}{n}ii'$   [11]

where i is a column vector of n ones. A is a symmetric, idempotent matrix.

If we premultiply the matrix form of [2] by A, we get the deviation form in matrix notation as follows:

$AY = AX\hat\beta + Ae$,

or $AY = A\,[\,i \;\; X_2\,]\begin{bmatrix}\hat\beta_1 \\ \hat\alpha\end{bmatrix} + Ae$,

or $y = x\hat\alpha + e$   [12]

Here $X_2$ is the $n \times (k-1)$ matrix of regressors and $\hat\alpha$ is the column vector of slope estimates, of order $(k-1)\times 1$. Premultiplying a vector of n observations by A transforms that vector into deviation form ($y = AY$, $x = AX_2$); it also follows that $Ae = e$ and $Ai = 0$.

Premultiplying [12] by $x'$ we get

$x'y = x'x\hat\alpha + x'e$,

which is the set of (k−1) normal equations,

or $\hat\alpha = (x'x)^{-1}x'y$   [12a]

From [12] we can write $y'y = \hat\alpha'x'x\hat\alpha + e'e$,


or TSS = ESS + RSS.
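A minimal sketch of [11], [12] and [12a] on the running example, checking that the deviation-form slopes match the full regression:

```python
A = np.eye(n) - np.ones((n, n)) / n   # A = I - (1/n) ii'
x = A @ X[:, 1:]                      # regressors in deviation form (intercept column dropped)
y = A @ Y                             # Y in deviation form
alpha_hat = np.linalg.solve(x.T @ x, x.T @ y)   # [12a]
print(np.allclose(alpha_hat, beta_hat[1:]))     # slopes identical to the full regression
```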

Estimation of the Variance of the Disturbance $\sigma^2$

In addition to the coefficient parameters, we have to estimate the unknown variance of U. It is reasonable to base an estimate on the RSS from the fitted regression. We have
$e = Y - X\hat\beta$,

which implies $e = Y - X(X'X)^{-1}X'Y = [I - X(X'X)^{-1}X']Y = MY$, where M is a symmetric, idempotent matrix.

Therefore $e = M(X\beta + U) = MX\beta + MU = MU$, since $MX = 0$.

Now $e'e = (MU)'(MU) = U'M'MU = U'MU$.

Thus $E(e'e) = E(U'MU)$. Since the trace of a scalar is the scalar itself,

$E(U'MU) = E[\operatorname{tr}(U'MU)] = E[\operatorname{tr}(UU'M)] = \sigma^2\operatorname{tr}(M)$

$= \sigma^2\operatorname{tr}[I_n - X(X'X)^{-1}X'] = \sigma^2\{\operatorname{tr}(I_n) - \operatorname{tr}[(X'X)^{-1}X'X]\} = \sigma^2[n - \operatorname{tr}(I_k)] = \sigma^2(n-k)$

[The trace is the sum of the diagonal elements of a square matrix; $\operatorname{tr}(AB) = \operatorname{tr}(BA)$ and $\operatorname{tr}(ABC) = \operatorname{tr}(BCA)$, where A, B, C need not be square.]

Or, $E\!\left(\dfrac{e'e}{n-k}\right) = \sigma^2$.

Therefore, $s^2 = \dfrac{e'e}{n-k}$   [13]

is the unbiased estimator of $\sigma^2$. The square root of $s^2$ is the standard deviation of the Y values about the regression plane; it is often referred to as the standard error of the regression (SER). The SER is used as a measure of the fit of the regression. The divisor n−k (rather than n) adjusts for the downward bias introduced by estimating k−1 slope coefficients and one


intercept parameter. When n is large, the effect of the degrees of freedom adjustment is
negligible.
SER is an absolute measure of the goodness of fit and depends on the unit of Y. It measures the spread of the observations around the regression line, so the higher the SER, the lower the goodness of fit, and vice versa. In other words, a large spread means that predictions of Y made using the selected X variables will often be wrong by a large amount.
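A short sketch of [13] on the running example (continuing the blocks above):

```python
s2 = (e @ e) / (n - k)     # s^2 = e'e / (n - k), unbiased for sigma^2  -- [13]
SER = np.sqrt(s2)          # standard error of the regression
print(s2, SER)
```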
Other Measures of the Goodness of Fit
The goodness of fit of a linear regression model measures how well the estimated model fits a given set of data, or how well it can explain the population. It is, however, difficult to come up with a perfect measure of goodness of fit for an econometric model. A regression model fits well if the dependent variable is explained more by the regressors than by the residual. The coefficient of determination R², defined as the square of the multiple correlation coefficient, is a common measure of the goodness of fit of a regression model.

$R^2 = \dfrac{ESS}{TSS} = 1 - \dfrac{RSS}{TSS}$   [14]

Thus R² measures the proportion of the total variation in Y explained by the linear combination of the regressors. It is the square of the simple correlation coefficient between Y and $\hat Y$. A high value of R² indicates that we can predict individual outcomes of Y with considerable accuracy on the basis of the estimated model.
TSS = RSS when the best-fitting regression has no regressor, only an intercept. If we add regressors to the model, RSS falls, i.e. TSS ≥ RSS.
At one extreme, R² = 0: the regression plane is horizontal, implying no change in Y with a change in X; in other words, the Xs have no explanatory power.
At the other extreme, R² = 1, indicating that all the data points lie on the fitted hyperplane and RSS = 0. Thus, in general, 0 ≤ R² ≤ 1.

R² is used as a measure of the goodness of fit, but it is difficult to say how large R² needs to be for the fit to be considered good. The value of R² never decreases with the addition of explanatory variables: if an added explanatory variable is totally irrelevant, ESS simply remains unchanged. This is the basic limitation of using R² as an indicator of goodness of fit.
Secondly, R² is sensitive to extreme values, so R² is not robust.


Thirdly, R² may be negative or greater than one if the intercept term is not included, so in that case it may not be a good measure of the fit of the regression.
If the intercept term is not included, then

$Y = X\hat\beta + e$   (2)

where $X = \begin{bmatrix} X_{21} & \cdots & X_{k1} \\ \vdots & \ddots & \vdots \\ X_{2n} & \cdots & X_{kn} \end{bmatrix}$, $\quad Y = \begin{bmatrix} Y_1 \\ \vdots \\ Y_n \end{bmatrix}$, $\quad \hat\beta$ is the corresponding coefficient vector, and $e = \begin{bmatrix} e_1 \\ \vdots \\ e_n \end{bmatrix}$; the X matrix no longer contains the column of ones.

From condition [4] we get $X'X\hat\beta = X'Y$, and putting [2] in [4] gives $X'e = 0$ (a (k−1)-element column vector).
But since the X matrix does not include the column of ones, we cannot conclude that the sum of the $e_i$ is zero. Therefore, if we do not include the intercept, we cannot say that the sample covariance of the X variables and the residuals is zero.
We have $Y = \hat Y + e$,

or $(Y - l\bar Y)'(Y - l\bar Y) = (\hat Y - l\bar Y)'(\hat Y - l\bar Y) + e'e + 2(\hat Y - l\bar Y)'e$, where l is a column vector of n ones,

or $TSS = ESS + RSS + 2(X\hat\beta - l\bar Y)'e$,

or $TSS - RSS = ESS - 2\bar Y l'e$,

or $1 - RSS/TSS = \dfrac{ESS - 2\bar Y l'e}{TSS}$,

or $R^2 = \dfrac{ESS - 2\bar Y l'e}{TSS}$,

which may be negative if $ESS < 2\bar Y l'e$. Thus, without a very strong theoretical foundation, we do not formulate a linear regression model without an intercept term.

Adjusted R² as a Measure of the Goodness of Fit

R² may be a better indicator of the goodness of fit of the model after adjusting for the degrees of freedom used in estimating the parameters. The value of R² adjusted for degrees of freedom is known as the adjusted R², denoted $\bar R^2$:

$\bar R^2 = 1 - \dfrac{RSS/(n-k)}{TSS/(n-1)}$   [15]

where RSS/(n−k) is the unbiased estimator of the variance of U (the conditional variance of Y) and TSS/(n−1) is an unbiased estimator of the unconditional variance of Y.


It is useful for comparing the fit of specifications that differ in the addition or deletion of explanatory variables. While the unadjusted R² never decreases with the addition of any variable to the set of regressors, the adjusted R² may decrease with the addition of variables of low explanatory power.
The relation between the adjusted and unadjusted R² is

$\bar R^2 = 1 - \dfrac{(n-1)}{(n-k)}\dfrac{RSS}{TSS} = 1 - \dfrac{(n-1)}{(n-k)}(1 - R^2)$   [16]

$\bar R^2 = R^2$ when k = 1, i.e. when the regression is formulated with only an intercept and no explanatory variable, or with one explanatory variable and no intercept (which is very rare). In an MLRM, when k increases, (n−1)/(n−k) increases and (1−R²) falls. The ratio (n−1)/(n−k) is the penalty for using more regressors in a model, and R² is the benefit of adding regressors. Whether the addition of regressors improves the explanatory power of the model depends on the trade-off between R² and the penalty (n−1)/(n−k). Therefore, the adjusted R² need not increase with the number of explanatory variables. If the contribution of the additional regressor to the estimated model exceeds the loss of degrees of freedom, $\bar R^2$ will rise with the number of regressors; otherwise it will decline, as when the additional explanatory variable has no explanatory power.
So, clearly, $\bar R^2 \le R^2$.
Note that $\bar R^2$ may be negative when $R^2 < \dfrac{k-1}{n-1}$.
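A sketch of [14]-[16] on the running example, reusing TSS and RSS computed earlier:

```python
R2 = 1.0 - RSS / TSS                                  # [14]
R2_adj = 1.0 - (RSS / (n - k)) / (TSS / (n - 1))      # [15]
print(R2, R2_adj)
print(np.isclose(R2_adj, 1.0 - (n - 1) / (n - k) * (1.0 - R2)))   # relation [16]
```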

Properties of OLS Estimators

Testing Linear Hypotheses about β

We have estimated the regression coefficients β and examined the properties of the OLS estimators. Let us now see how to use these estimators to test various hypotheses about β. Consider the following examples of typical hypotheses about β.


i] $H_0: \beta_j = 0$. This hypothesis says that regressor $X_j$ has no effect on Y. It is a very common test, often referred to as a significance test.
ii] $H_0: \beta_j = \beta_{j0}$. Here $\beta_{j0}$ is some specific value. If, for example, $\beta_j$ denotes an income elasticity, one might wish to test $\beta_j = 1$.
iii] $H_0: \beta_2 + \beta_3 = 1$. If $\beta_2$ and $\beta_3$ indicate labour and capital elasticities in a production function, this hypothesis examines the presence of constant returns to scale (CRS).
iv] $H_0: \beta_2 = \beta_3$, or $\beta_2 - \beta_3 = 0$. This examines whether $X_2$ and $X_3$ have the same coefficient.
v] $H_0: \beta_2 = \beta_3 = \beta_4 = \dots = \beta_k = 0$,
or $\begin{bmatrix}\beta_2 \\ \vdots \\ \beta_k\end{bmatrix} = \begin{bmatrix}0 \\ \vdots \\ 0\end{bmatrix}$, or $H_0: \alpha = 0$, where $\alpha$ denotes the column vector of slope parameters, of order (k−1). This sets up the hypothesis that the complete set of regressors has no effect on Y; it tests the significance of the overall relation. The intercept term does not enter into this hypothesis: interest centres on the variation of Y around its mean, and the level of the series usually has no specific relevance.
vi] $H_0: \beta_{(2)} = 0$. Here the β vector is partitioned into two subvectors, $\beta_{(1)}$ and $\beta_{(2)}$, containing respectively $k_1$ and $k_2\ (= k - k_1)$ elements. This sets up the hypothesis that a specified subset of the regressors plays no role in the determination of Y.
vii] $H_0: \beta_2 + \beta_3 = 1,\ \beta_4 + \beta_6 = 0,\ \beta_5 + \beta_6 = 0$. We may test several linear restrictions jointly.

All the examples fit into the general linear framework

$H_0: R\beta = r$, or $R\beta - r = 0$   [1]

where R is a q × k matrix of known constants with q < k, and r is a q-vector of known constants. Each null hypothesis determines the relevant elements of R and r.
For the foregoing examples we have
i] R = [0 0 … 0 1 0 … 0] with 1 in the jth position, r = 0 and q = 1
ii] R = [0 0 … 0 1 0 … 0] with 1 in the jth position, r = $\beta_{j0}$ and q = 1
iii] R = [0 1 1 0 … 0], r = 1 and q = 1
iv] R = [0 1 −1 0 … 0], r = 0 and q = 1


v] R = [0  $I_{k-1}$], where 0 is a column vector of k−1 zeros, $r = \begin{bmatrix}0 \\ \vdots \\ 0\end{bmatrix}_{(k-1)\times 1}$ and q = k−1

vi] R = [$0_{k_2\times k_1}$  $I_{k_2}$], $r = \begin{bmatrix}0 \\ \vdots \\ 0\end{bmatrix}_{k_2\times 1}$ and q = $k_2$

vii] $R = \begin{bmatrix} 0&1&1&0&0&0&\cdots&0 \\ 0&0&0&1&0&1&\cdots&0 \\ 0&0&0&0&1&1&\cdots&0 \end{bmatrix}$, $r = \begin{bmatrix}1\\0\\0\end{bmatrix}_{3\times 1}$ and q = 3
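As an illustration, the R and r of example vii can be written down directly in NumPy (k = 8 here is purely hypothetical; only the first six columns are constrained):

```python
k7 = 8                         # hypothetical number of coefficients, including the intercept
R7 = np.zeros((3, k7))
R7[0, [1, 2]] = 1              # beta_2 + beta_3 = 1
R7[1, [3, 5]] = 1              # beta_4 + beta_6 = 0
R7[2, [4, 5]] = 1              # beta_5 + beta_6 = 0
r7 = np.array([1.0, 0.0, 0.0])
```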

The general test may then be specialized to deal with any specific application. Given the OLS estimator $\hat\beta = (X'X)^{-1}X'Y = \beta + (X'X)^{-1}X'U$, an obvious step is to compute the vector $(R\hat\beta - r)$. This vector measures the discrepancy between expectation and observation. If this vector is, in some sense, "large," it casts doubt on the null hypothesis; conversely, if it is "small," it tends not to contradict the null. As in all conventional testing procedures, the distinction between large and small is determined from the relevant sampling distribution under the null, in this case the distribution of $R\hat\beta$ under the null hypothesis $R\beta = r$.

$E(R\hat\beta) = R\,E(\hat\beta) = R\beta$   [2]

$\operatorname{Var}(R\hat\beta) = E\big[(R\hat\beta - R\beta)(R\hat\beta - R\beta)'\big] = R\,E\big[(\hat\beta - \beta)(\hat\beta - \beta)'\big]R' = \sigma^2R(X'X)^{-1}R'$   [3]

We have assumed that U is a spherical disturbance. Now we further assume that the $U_i$ are normally distributed, i.e. U ~ N(0, $\sigma^2I$). Since $\hat\beta$ is a linear function of the U vector, $\hat\beta$ follows a normal distribution. Further, $R\hat\beta$ is a linear function of $\hat\beta$, so $R\hat\beta \sim N\big(R\beta,\ \sigma^2R(X'X)^{-1}R'\big)$, which implies $R\hat\beta - R\beta \sim N\big(0,\ \sigma^2R(X'X)^{-1}R'\big)$.
Under the null hypothesis, $R\beta = r$, so under the null $R\hat\beta - r \sim N\big(0,\ \sigma^2R(X'X)^{-1}R'\big)$.
With this formulation we can say

$(R\hat\beta - r)'\big[\sigma^2R(X'X)^{-1}R'\big]^{-1}(R\hat\beta - r) \sim \chi^2_q$   [4]

[$\chi^2_q$ is the sum of squares of q standard normal variates.]
The distribution in [4] is derived from the sampling distribution of $\hat\beta$. The only problem hindering practical application of Eq. [4] is the presence of the unknown $\sigma^2$. However,


$\dfrac{e'e}{\sigma^2} \sim \chi^2_{n-k}$   [5]   which is independent of $\hat\beta$.

Thus the ratio of [4] to [5] gives a suitable statistic in which $\sigma^2$ cancels out. Dividing the numerator and denominator by their respective degrees of freedom, we get

$F = \dfrac{(R\hat\beta - r)'\big[R(X'X)^{-1}R'\big]^{-1}(R\hat\beta - r)/q}{e'e/(n-k)} \sim F(q,\ n-k)$   [6]

or $F = \dfrac{(R\hat\beta - r)'\big[R\,s^2(X'X)^{-1}R'\big]^{-1}(R\hat\beta - r)}{q} \sim F(q,\ n-k)$   [6.1]

where $s^2 = \dfrac{e'e}{n-k}$ and $\widehat{\operatorname{var}\text{-}\operatorname{cov}}(\hat\beta) = s^2(X'X)^{-1}$.

Suppose $c_{jt}$ denotes the (j, t)th element of $(X'X)^{-1}$; then $s^2c_{jj} = \widehat{\operatorname{Var}}(\hat\beta_j)$ and $s^2c_{jt} = \widehat{\operatorname{cov}}(\hat\beta_j, \hat\beta_t)$, j, t = 1, 2, …, k.
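Here is a minimal sketch of the F statistic in [6.1], applied to example v (all slopes zero) on the running hypothetical data; SciPy is used only for the F distribution's cdf:

```python
from scipy import stats

def wald_F(X, Y, R, r):
    """F statistic of [6.1] for H0: R beta = r (a sketch, not a library routine)."""
    n, k = X.shape
    q = R.shape[0]
    b = np.linalg.solve(X.T @ X, X.T @ Y)
    resid = Y - X @ b
    s2 = resid @ resid / (n - k)
    V = s2 * np.linalg.inv(X.T @ X)              # s^2 (X'X)^{-1}
    d = R @ b - r
    F = d @ np.linalg.solve(R @ V @ R.T, d) / q
    p_value = 1.0 - stats.f.cdf(F, q, n - k)
    return F, p_value

R_v = np.hstack([np.zeros((k - 1, 1)), np.eye(k - 1)])   # example v: R = [0  I_{k-1}]
r_v = np.zeros(k - 1)
print(wald_F(X, Y, R_v, r_v))
```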
Let us now consider the hypotheses one by one.
i] $H_0: \beta_j = 0$. Under this null hypothesis, $(R\hat\beta - r)$ picks out $\hat\beta_j$ and $R(X'X)^{-1}R'$ picks out the jth diagonal element $c_{jj}$ of $(X'X)^{-1}$, so [6] becomes

$\dfrac{\hat\beta_j^2}{s^2c_{jj}} \sim F(1,\ n-k)$

Taking the square root of the F(1, n−k) statistic, we get

$\dfrac{\hat\beta_j}{s\sqrt{c_{jj}}} = \dfrac{\hat\beta_j}{\operatorname{s.e.}(\hat\beta_j)} \sim t_{n-k}$   [s1]

Thus the null hypothesis that $X_j$ has no influence on Y is tested by dividing the estimated value of the coefficient by its standard error, which follows a t distribution with n−k degrees of freedom. If the calculated value is greater than the tabulated value at a specific level of significance, we reject the null hypothesis.
Similarly, we can test
ii] $H_0: \beta_j = \beta_{j0}$ by the statistic $\dfrac{\hat\beta_j - \beta_{j0}}{\operatorname{s.e.}(\hat\beta_j)} \sim t_{n-k}$   [s2]

Confidence Interval for $\beta_j$
Instead of testing a specific hypothesis about $\beta_j$, we may compute, say, a 95% confidence interval for $\beta_j$. Because of random sampling error, it is impossible to learn the exact value of the true coefficient parameter $\beta_j$ using only the information in a sample. However, it is possible to use the data from a random sample to construct a range that contains the true

population parameter $\beta_j$ with a certain pre-specified probability (say 95%). This range is called a confidence interval and the specified probability is known as the confidence level.
To construct a confidence interval we would, in principle, need to test every possible value of $\beta_j$ as a null hypothesis, which is impractical. Fortunately, there is a much easier approach. In terms of the t statistic for the hypothesis $H_0: \beta_j = \beta_{j0}$, the trial value $\beta_{j0}$ of $\beta_j$ is rejected at the 5% level of significance if $|t_{n-k}| > 1.96$ (for n−k > 120). Otherwise, we cannot reject the null at the 5% level of significance. The null would not be rejected if

$-1.96 \le t_{n-k} \le 1.96$,

$-1.96 \le \dfrac{\hat\beta_j - \beta_{j0}}{\operatorname{s.e.}(\hat\beta_j)} \le 1.96$,

or $\hat\beta_j - 1.96\operatorname{s.e.}(\hat\beta_j) \le \beta_{j0} \le \hat\beta_j + 1.96\operatorname{s.e.}(\hat\beta_j)$

Thus the set of values of $\beta_j$ that are not rejected at the 5% level of significance consists of the values within $\hat\beta_j \pm 1.96\operatorname{s.e.}(\hat\beta_j)$; in general, the 95% confidence interval for $\beta_j$ is $\hat\beta_j \pm t_{.025}\operatorname{s.e.}(\hat\beta_j)$. Similarly, the 99% confidence interval for $\beta_j$ is $\hat\beta_j \pm 2.58\operatorname{s.e.}(\hat\beta_j)$ for n−k > 120.

The discussion so far has focused on two-sided confidence intervals. We could instead construct a one-sided confidence interval as the set of values of $\beta_j$ that cannot be rejected by a one-sided hypothesis test.
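A sketch of the interval $\hat\beta_j \pm t_{.025}\operatorname{s.e.}(\hat\beta_j)$ for every coefficient in the running example, using the exact t critical value rather than 1.96:

```python
V = s2 * np.linalg.inv(X.T @ X)      # estimated var-cov matrix of beta_hat
se = np.sqrt(np.diag(V))             # standard errors of the coefficients
t_crit = stats.t.ppf(0.975, n - k)   # t_{.025} critical value
print(np.column_stack([beta_hat - t_crit * se, beta_hat + t_crit * se]))
```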

iii] $H_0: \beta_2 + \beta_3 = 1$ is tested by the statistic $\dfrac{\hat\beta_2 + \hat\beta_3 - 1}{\sqrt{\operatorname{var}(\hat\beta_2) + \operatorname{var}(\hat\beta_3) + 2\operatorname{cov}(\hat\beta_2, \hat\beta_3)}} \sim t_{n-k}$   [s3]
Alternatively, the 95% confidence interval for $\beta_2 + \beta_3$ is $(\hat\beta_2 + \hat\beta_3) \pm 1.96\sqrt{\operatorname{var}(\hat\beta_2) + \operatorname{var}(\hat\beta_3) + 2\operatorname{cov}(\hat\beta_2, \hat\beta_3)}$.
iv] $H_0: \beta_2 = \beta_3$, or $\beta_2 - \beta_3 = 0$, is tested by the statistic $\dfrac{\hat\beta_2 - \hat\beta_3}{\sqrt{\operatorname{var}(\hat\beta_2) + \operatorname{var}(\hat\beta_3) - 2\operatorname{cov}(\hat\beta_2, \hat\beta_3)}} \sim t_{n-k}$   [s4]
The 95% confidence interval for $\beta_2 - \beta_3$ is $(\hat\beta_2 - \hat\beta_3) \pm 1.96\sqrt{\operatorname{var}(\hat\beta_2) + \operatorname{var}(\hat\beta_3) - 2\operatorname{cov}(\hat\beta_2, \hat\beta_3)}$.
(Here the variances and covariances are the estimated ones, i.e. $s^2c_{jj}$ and $s^2c_{jt}$.)

Let us consider case (v): $H_0: \beta_2 = \beta_3 = \beta_4 = \dots = \beta_k = 0$, or $H_0: \alpha = 0$. Here we have q = k−1 hypotheses and R = [0  $I_{k-1}$], where 0 is a column vector of k−1 zeros. Now $R(X'X)^{-1}R'$ picks out the square submatrix of order (k−1) in the bottom right-hand corner of $(X'X)^{-1}$.
To evaluate this submatrix, partition the matrix as X = [i  $X_2$], so that $X' = \begin{bmatrix} i' \\ X_2' \end{bmatrix}$, where the order of $X_2$ is n × (k−1), and

$X'X = \begin{bmatrix} i'i & i'X_2 \\ X_2'i & X_2'X_2 \end{bmatrix}$

Therefore $(X'X)^{-1} = \begin{bmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{bmatrix}$ (say), and $R(X'X)^{-1}R' = B_{22}$.

The properties of the inverse of a partitioned matrix tell us that

$B_{22} = (X_2'AX_2)^{-1}$

where $A = I_n - \frac{1}{n}ii'$ is the symmetric, idempotent matrix of [11], and $AX_2$ gives the deviation form of the explanatory variables in our k-variable model.

Now, under the null, $R\hat\beta - r = \hat\alpha$ and q = k−1. It follows that

$\dfrac{(R\hat\beta - r)'\big[R(X'X)^{-1}R'\big]^{-1}(R\hat\beta - r)}{s^2\,q} = \dfrac{\hat\alpha'(x'x)\hat\alpha/(k-1)}{e'e/(n-k)} = \dfrac{ESS/(k-1)}{RSS/(n-k)} \sim F(k-1,\ n-k)$

or, dividing the numerator and denominator by TSS,

$\dfrac{R^2/(k-1)}{(1-R^2)/(n-k)} \sim F(k-1,\ n-k)$   [s5]

This test essentially asks whether the mean square due to the regression is significantly larger than the residual mean square.
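On the running example, [s5] can be computed directly from R²; it agrees with the wald_F sketch above applied to example v:

```python
F_overall = (R2 / (k - 1)) / ((1 - R2) / (n - k))   # [s5]
print(F_overall)                                    # compare with wald_F(X, Y, R_v, r_v)
```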

Next we consider the hypothesis in [vi], $H_0: \beta_{(2)} = 0$. This hypothesis postulates that a subset of the regressor coefficients is a zero vector, in contrast with the previous example, where all the regressor coefficients were hypothesized to be zero. Partition the regression equation as follows:


$Y = (X_1\ \ X_2)\begin{pmatrix}\hat\beta_{(1)} \\ \hat\beta_{(2)}\end{pmatrix} + e = X_1\hat\beta_{(1)} + X_2\hat\beta_{(2)} + e$

where $X_1$ has $k_1$ columns, including the intercept column, $X_2$ has $k_2$ columns, and $\hat\beta_{(1)}$ and $\hat\beta_{(2)}$ are the corresponding subvectors of the OLS coefficient estimator.

The partitioning of the X matrix gives

$(X'X)^{-1} = \begin{bmatrix} B_{11} & B_{12} \\ B_{21} & B_{22} \end{bmatrix}$ (say)

Now $R(X'X)^{-1}R'$ picks out the square submatrix of order $k_2$ in the bottom right-hand corner of $(X'X)^{-1}$. Here R = [$0_{k_2\times k_1}$  $I_{k_2}$], r = 0 and q = $k_2$.

Therefore, as in case [v], we find $R(X'X)^{-1}R' = B_{22} = (X_2'M_2X_2)^{-1}$,

where $M_2 = I - X_1(X_1'X_1)^{-1}X_1'$ is a symmetric, idempotent matrix with $M_2X_1 = 0$ and $M_2e = e$.
Further, $M_2Y$ gives the vector of residuals when Y is regressed on $X_1$ alone. The numerator in [6] is then

$\dfrac{\hat\beta_{(2)}'(X_2'M_2X_2)\hat\beta_{(2)}}{k_2}$
To understand the meaning of the numerator, consider the partitioned regression

$Y = X_1\hat\beta_{(1)} + X_2\hat\beta_{(2)} + e$

Premultiplying by $M_2$ we get

$M_2Y = M_2X_1\hat\beta_{(1)} + M_2X_2\hat\beta_{(2)} + M_2e$,

or $M_2Y = M_2X_2\hat\beta_{(2)} + e$,

or $\big[I - X_1(X_1'X_1)^{-1}X_1'\big]Y = M_2X_2\hat\beta_{(2)} + e$,

or $Y - X_1(X_1'X_1)^{-1}X_1'Y = M_2X_2\hat\beta_{(2)} + e$,

or $e_r = M_2X_2\hat\beta_{(2)} + e$,

where $e_r = M_2Y$ is the residual vector from the regression of Y on $X_1$ alone.

Squaring both sides (taking the inner product of each side with itself) gives

$e_r'e_r = \hat\beta_{(2)}'(X_2'M_2X_2)\hat\beta_{(2)} + e'e$, i.e. $e_r'e_r - e'e = \hat\beta_{(2)}'(X_2'M_2X_2)\hat\beta_{(2)}$

(the cross term vanishes because $M_2e = e$ and $X_2'e = 0$).

The term on the left of this equation is the RSS when Y is regressed just on $X_1$. The last term, e'e, is the RSS when Y is regressed on [$X_1$ $X_2$]. Thus the middle term measures the increment in ESS (or, equivalently, the reduction in RSS) when $X_2$ is added to the set of regressors. In other words, $\hat\beta_{(2)}'(X_2'M_2X_2)\hat\beta_{(2)}$ is the difference between the restricted RSS and the unrestricted RSS. The hypothesis may thus be tested by running two separate regressions. First regress Y on $X_1$ (a submatrix of X) and denote the RSS by $RSS_r$; then run the regression on all the Xs, obtaining the RSS, denoted as usual by $RSS_u$. From Eq. [6] the test statistic is

$\dfrac{(RSS_r - RSS_u)/k_2}{RSS_u/(n-k)} \sim F(k_2,\ n-k)$

Dividing the numerator and the denominator by TSS, we get

$\dfrac{(R_u^2 - R_r^2)/k_2}{(1-R_u^2)/(n-k)} \sim F(k_2,\ n-k)$   [s6]

where $R_u^2$ and $R_r^2$ indicate the coefficients of determination for the unrestricted and restricted regressions respectively. Finally, we compare the calculated and tabulated values of F, and if the calculated value is greater than the tabulated value we reject the null hypothesis.
Note that the hypothesis test in [v] is a special case of [vi], with $X_1$ = the intercept column and $X_2$ containing all the explanatory variables. If we regress Y on the intercept term alone, TSS = RSS, i.e. $RSS_r$ = TSS and $R_r^2 = 0$. Putting $R_r^2 = 0$ in [s6] we get [s5]. Thus [s6] is a general statistic which can be used for testing all the hypotheses stated above and other linear restrictions.
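A sketch of [s6] on the running example, testing H₀: β₃ = 0 by running the restricted and unrestricted regressions ($X_1$ is the intercept plus the first regressor, and the "$X_2$" block is just the last regressor):

```python
def r2(Xmat, Y):
    b = np.linalg.solve(Xmat.T @ Xmat, Xmat.T @ Y)
    resid = Y - Xmat @ b
    return 1.0 - (resid @ resid) / np.sum((Y - Y.mean())**2)

R2_u = r2(X, Y)            # unrestricted: all regressors
R2_r = r2(X[:, :2], Y)     # restricted: last regressor dropped
k2 = 1
F_subset = ((R2_u - R2_r) / k2) / ((1 - R2_u) / (n - k))   # [s6]
print(F_subset)            # with one restriction this equals the square of the t statistic on beta_3
```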

Finally, we see how to use [s6] for testing the hypotheses in [vii].
We have the unrestricted regression model

$Y_i = \beta_1 + \beta_2X_{2i} + \beta_3X_{3i} + \beta_4X_{4i} + \beta_5X_{5i} + \beta_6X_{6i} + \dots + \beta_kX_{ki} + U_i$


Putting the restrictions $\beta_2 = 1 - \beta_3$, $\beta_4 = -\beta_6$ and $\beta_5 = -\beta_6$ into the unrestricted model, we get

$Y_i = \beta_1 + (1-\beta_3)X_{2i} + \beta_3X_{3i} - \beta_6X_{4i} - \beta_6X_{5i} + \beta_6X_{6i} + \dots + \beta_kX_{ki} + U_i$,

or $Y_i - X_{2i} = \beta_1 + \beta_3(X_{3i} - X_{2i}) + \beta_6(X_{6i} - X_{4i} - X_{5i}) + \dots + \beta_kX_{ki} + U_i$   (restricted model)

Now, estimating the unrestricted and restricted models, compute $R_u^2$ and $R_r^2$ and form

$\dfrac{(R_u^2 - R_r^2)/3}{(1-R_u^2)/(n-k)} \sim F(3,\ n-k)$   [s7]

Note that q may be calculated in several equivalent ways: (a) the number of rows of the R matrix; (b) the number of elements of the r vector; (c) the difference between the number of slope coefficients in the unrestricted and restricted models; (d) the difference between the degrees of freedom attaching to the RSS in the restricted and unrestricted models.

Simple Correlation, Multiple Correlation and Partial Correlation

The simple correlation between two variables $X_1$ and $X_2$ measures the degree of linear association between the variables. The simple correlation coefficient is

$r_{12} = \dfrac{\operatorname{cov}(X_1, X_2)}{\sqrt{\operatorname{var}(X_1)\operatorname{var}(X_2)}}$

It can be computed without making any reference to a structure of causal dependence, i.e. to a regression specification.

The multiple correlation coefficient measures the relation between an explained variable Y and the explanatory variables ($X_2, \dots, X_k$). The square of the multiple correlation coefficient is denoted by

$R^2_{Y.23\ldots k} = \dfrac{ESS}{TSS}$;

it is interpreted as the proportion of the sample variation in Y that is explained by the OLS regression, and it is known as the coefficient of determination for the regression. It is equal to the squared correlation coefficient between the actual and fitted values of Y, i.e. $R^2_{Y.23\ldots k} = \dfrac{ESS}{TSS} = r^2_{Y\hat Y}$.

To illustrate, by definition

$r^2_{Y\hat Y} = \dfrac{[\operatorname{cov}(Y, \hat Y)]^2}{\operatorname{var}(Y)\operatorname{var}(\hat Y)}$

Now $\operatorname{cov}(Y, \hat Y) = \operatorname{cov}(\hat Y + e,\ \hat Y) = \operatorname{var}(\hat Y)$, so

$r^2_{Y\hat Y} = \dfrac{\operatorname{var}(\hat Y)}{\operatorname{var}(Y)} = ESS/TSS$


The partial correlation coefficient between the explained variable Y and an explanatory variable, say $X_j$, measures their association holding all the other Xs unchanged. The square of the partial correlation coefficient is denoted by $r^2_{Yj.23\ldots(j-1)(j+1)\ldots k}$, or simply by $r_j^2$. It is actually $r^2_{e_1e_2}$, where $e_1$ denotes the residual when Y is regressed on all the Xs except $X_j$, and $e_2$ denotes the residual when $X_j$ is regressed on all the other Xs.

To illustrate the relation between the partial and simple correlation coefficients, consider

$Y = \beta_1 + \beta_2X_2 + \beta_3X_3 + U$

To find the partial correlation between Y and $X_2$ we need the simple correlation between $e_1$ and $e_2$, where $e_1$ is the residual when we regress Y on $X_3$ and $e_2$ is the residual when we regress $X_2$ on $X_3$; in deviation form, $y = \hat a x_3 + e_1$ and $x_2 = \hat b x_3 + e_2$.
Applying OLS to these deviation-form equations we get

$\hat a = r_{y3}\dfrac{s_y}{s_3}$ and $\hat b = r_{23}\dfrac{s_2}{s_3}$,

and $\operatorname{var}(y) = \operatorname{var}(\hat a x_3) + \operatorname{var}(e_1)$, $\operatorname{var}(x_2) = \operatorname{var}(\hat b x_3) + \operatorname{var}(e_2)$.

$\Rightarrow \operatorname{var}(e_1) = s_y^2 - \left(r_{y3}\dfrac{s_y}{s_3}\right)^{2}s_3^2 = s_y^2[1 - r_{y3}^2]$

Similarly, $\operatorname{var}(e_2) = s_2^2[1 - r_{23}^2]$.

Now

$\operatorname{cov}(e_1, e_2) = \operatorname{cov}\big((y - \hat a x_3),\ (x_2 - \hat b x_3)\big) = \dfrac{1}{n}\sum_{i=1}^{n}\big(yx_2 - \hat a x_3x_2 - \hat b yx_3 + \hat a\hat b x_3^2\big)$

$= \operatorname{cov}(y, x_2) - r_{y3}\dfrac{s_y}{s_3}\operatorname{cov}(x_3, x_2) - r_{23}\dfrac{s_2}{s_3}\operatorname{cov}(y, x_3) + r_{y3}\dfrac{s_y}{s_3}r_{23}\dfrac{s_2}{s_3}\operatorname{var}(x_3)$

$= s_ys_2(r_{y2} - r_{y3}r_{23} - r_{y3}r_{23} + r_{y3}r_{23}) = s_ys_2(r_{y2} - r_{y3}r_{23})$

Therefore,

$r^2_{y2.3} = r_2^2 = r^2_{e_1e_2} = \dfrac{[s_ys_2(r_{y2} - r_{y3}r_{23})]^2}{s_2^2[1-r_{23}^2]\, s_y^2[1-r_{y3}^2]} = \dfrac{(r_{y2} - r_{y3}r_{23})^2}{[1-r_{23}^2][1-r_{y3}^2]}$,

i.e. $r_{y2.3} = \dfrac{r_{y2} - r_{y3}r_{23}}{\sqrt{[1-r_{23}^2][1-r_{y3}^2]}}$
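A numerical sketch of this result on the running example (with the two hypothetical regressors playing the roles of $X_2$ and $X_3$): the correlation of the two residual series equals the formula in simple correlations.

```python
x2_var, x3_var = X[:, 1], X[:, 2]
e1 = Y - np.polyval(np.polyfit(x3_var, Y, 1), x3_var)            # residual of Y on X_3
e2 = x2_var - np.polyval(np.polyfit(x3_var, x2_var, 1), x3_var)  # residual of X_2 on X_3
r_partial = np.corrcoef(e1, e2)[0, 1]

r_y2 = np.corrcoef(Y, x2_var)[0, 1]
r_y3 = np.corrcoef(Y, x3_var)[0, 1]
r_23 = np.corrcoef(x2_var, x3_var)[0, 1]
print(np.isclose(r_partial, (r_y2 - r_y3 * r_23) / np.sqrt((1 - r_23**2) * (1 - r_y3**2))))
```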

Relationship among $r^2_{YX_j}$, $R^2_{Y.23\ldots k}$ and $r_j^2$:
To illustrate the relation, consider k = 3 and j = 3, so that the MLRM becomes $Y = \beta_1 + \beta_2X_2 + \beta_3X_3 + U$.

The main objective of single-equation analysis is to explain the variation in Y. The multiple regression of Y on $X_2$ and $X_3$ gives the total ESS, which can be written as $R^2_{Y.23}TSS$. Now, if we first run the regression of Y on $X_2$ alone, the ESS can be written as $R^2_{Y.2}TSS$, or $r^2_{Y2}TSS$, and $RSS_r = (1 - r^2_{Y2})TSS$.

If we then add $X_3$ and want to know its additional contribution in explaining the remaining variation $RSS_r$, we regress $e_1$ on $e_3$, where $e_1$ is the residual when we regress Y on $X_2$ and $e_3$ is the residual when we regress $X_3$ on $X_2$. The ESS at this stage can be written as $r^2_{e_1e_3}RSS_r = r^2_{Y3.2}RSS_r = r^2_{Y3.2}(1 - r^2_{Y2})TSS$. Thus the total ESS after the inclusion of $X_2$ and $X_3$ can be written as $r^2_{Y2}TSS + r^2_{Y3.2}(1 - r^2_{Y2})TSS$, which again can be written as $R^2_{Y.23}TSS$.

Thus $r^2_{Y2}TSS + r^2_{Y3.2}(1 - r^2_{Y2})TSS = R^2_{Y.23}TSS$,

or $R^2_{Y.23} = r^2_{Y2} + r^2_{Y3.2}(1 - r^2_{Y2})$.

Similarly, if we start with the regression of Y on $X_3$, we get

$R^2_{Y.23} = r^2_{Y3} + r^2_{Y2.3}(1 - r^2_{Y3})$

In general, for the inclusion of $X_j$ we can write

$R^2_{Y.23\ldots k} = R^2_{Y.2\ldots(j-1)(j+1)\ldots k} + r_j^2\big(1 - R^2_{Y.2\ldots(j-1)(j+1)\ldots k}\big)$

Corollary: if $r_{23} = 0$, i.e. $X_2$ and $X_3$ are uncorrelated, the variables are said to be orthogonal. In this case

$r^2_{Y2.3} = \dfrac{r^2_{Y2}}{1 - r^2_{Y3}}$,

and therefore $R^2_{Y.23} = r^2_{Y3} + r^2_{Y2}$.

We can show that

$r_j^2 = \dfrac{F}{F + df} = \dfrac{t^2}{t^2 + df}$,

where F and t are the values of the test statistics for testing $H_0: \beta_j = 0$, i.e. for testing the partial influence of $X_j$ on Y, and df denotes the degrees of freedom of the regression.
Consider $H_0: \beta_j = 0$; under $H_0$ we can derive $RSS_r$. If we now add $X_j$ to the regression, it will explain an $r_j^2$ portion of the previously unexplained variation in Y, so the unexplained part is reduced by $r_j^2RSS_r$:

$RSS_u = RSS_r - r_j^2RSS_r = (1 - r_j^2)RSS_r$

We know that for $H_0: \beta_j = 0$,

$F(1,\ n-k) = t_j^2 = \dfrac{(RSS_r - RSS_u)/1}{RSS_u/(n-k)} = \dfrac{\big[RSS_r - (1-r_j^2)RSS_r\big]/1}{(1-r_j^2)RSS_r/(n-k)} = \dfrac{r_j^2(n-k)}{1-r_j^2}$,

or $r_j^2 = \dfrac{t_j^2}{t_j^2 + (n-k)} = \dfrac{t_j^2}{t_j^2 + df}$
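A quick check of $r_j^2 = t_j^2/\big(t_j^2 + (n-k)\big)$ for j = 2 in the running example, reusing se from the confidence-interval sketch and r_partial from the partial-correlation sketch:

```python
t_2 = beta_hat[1] / se[1]                    # t statistic of beta_2 in the full regression
print(np.isclose(r_partial**2, t_2**2 / (t_2**2 + (n - k))))
```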

Degrees of Freedom and $\bar R^2$

$RSS_r > RSS_u$, and two different estimators of $\sigma_U^2$ can be derived from these two models, $s_r^2$ and $s_u^2$. If q is the number of restrictions, then $RSS_r = (n-k+q)s_r^2$ and $RSS_u = (n-k)s_u^2$, and

$F(q,\ n-k) = \dfrac{(RSS_r - RSS_u)/q}{RSS_u/(n-k)} = \dfrac{\big[(n-k+q)s_r^2 - (n-k)s_u^2\big]/q}{s_u^2}$

With a little manipulation we get

$\dfrac{s_r^2}{s_u^2} = \dfrac{c + F}{1 + c}$, where $c = \dfrac{n-k}{q}$

We know that $\bar R^2 = 1 - \dfrac{(n-1)s_U^2}{\sum y_i^2}$, so $\bar R^2$ and $s_U^2$ are inversely related.

If $H_0: \beta_j = 0$, then $F = t_j^2$, so $\dfrac{s_r^2}{s_u^2} = \dfrac{c + t_j^2}{1 + c}$.

We know that $RSS_r \ge RSS_u$.


It implies that $\dfrac{RSS_r}{n-k+1}$ may be greater than or less than $\dfrac{RSS_u}{n-k}$, i.e. $s_r^2 \gtrless s_u^2$.

If $s_r^2 \le s_u^2$, then

$\dfrac{c + t_j^2}{1 + c} \le 1$, i.e. $|t_j| \le 1$.

Again, $s_r^2 \le s_u^2$ implies $\dfrac{s_r^2}{TSS/(n-1)} \le \dfrac{s_u^2}{TSS/(n-1)}$,

so $1 - \dfrac{s_r^2}{TSS/(n-1)} \ge 1 - \dfrac{s_u^2}{TSS/(n-1)}$,

and therefore $\bar R_r^2 \ge \bar R_u^2$.

So, if for any regressor $|t_j| < 1$ and that regressor $X_j$ is dropped from the regression, $\bar R^2$ will go up and the (adjusted) goodness of fit will improve.

If for any regression the F statistic is significant but most of the $t_j$'s are insignificant, suspect the presence of multicollinearity.
