MLRM
The multivariate relation gives rise to a richer array of inference questions and reduces the
chance of omitted variable bias in estimation compared with the two-variable equation. The
specification of the k-variable model is

Yi = β1 + β2 X2i + β3 X3i + … + βk Xki + Ui,  i = 1, 2, …, n --[1]

This equation identifies k − 1 explanatory variables (regressors), namely X2, X3, …, Xk, that
are thought to influence the dependent variable (regressand) Y. The subscript i indicates the ith
population member (observation).
In matrix notation the model is

Y = Xβ + U --[1a]

where

X = [ 1 X21 ⋯ Xk1 ; ⋮ ⋮ ⋱ ⋮ ; 1 X2n ⋯ Xkn ] (n×k),  Y = (Y1, …, Yn)′,  β = (β1, …, βk)′  and  U = (U1, …, Un)′.
Assumptions
A1 Linearity
The relationship in [1] is linear in the coefficient parameters (βj); however, Y and the Xs may be
various transformations of the underlying variables of interest. This assumption simply
defines the multiple linear regression model.
A2 No Perfect Collinearity
There is no exact linear relationship among the independent variables; that is, there is no
exact multicollinearity problem. Technically this assumption says that the X matrix has full
column rank, i.e. ρ(X) = k. This assumption is known as the identification condition. The linear
independence of the columns of X is required for the unique determination of the estimates of the βj.
A3 Zero Conditional Mean of the Disturbances
The data on X provide no information about the expected value of the disturbance, which we
write as

E(U|X) = 0,  i.e.  E(Ui|X) = 0 for i = 1, 2, …, n.

The zero conditional mean implies that the unconditional mean is also zero, since

E(Ui) = EX[E(Ui|X)] = EX[0] = 0 for all i.

It follows that

E(Y|X) = Xβ --[1b]
Assumptions A1 and A3 comprise the linear regression model. The regression of Y on X is the
conditional mean of Y given X, so assumption A3 makes Xβ the conditional mean function.
Secondly, assumption A3 indicates that the unobserved factors in U and the explanatory variables
are uncorrelated; it means that the explanatory variables are exogenous. Violation of this
assumption creates the problem of misspecification of the model and the problem of
endogeneity.
A4 Spherical Disturbances
This assumption states that the disturbances are homoscedastic and non-autocorrelated. It can be
written as

E(UU′|X) = σ²In

This assumption has two parts: (a) the variance of Ui is constant for all i, and (b) the covariance
of Ui and Uj is zero for all i ≠ j. Violation of part (a) creates the problem of
heteroscedasticity and violation of part (b) creates the problem of autocorrelation. When the
disturbance term satisfies assumption A4, it is called a spherical disturbance.
The primary objective of estimating an MLRM is to estimate the conditional mean of Y
given X, together with some confidence interval. For this purpose we need to estimate the k
coefficient parameters (the βs) and the conditional variance of U, i.e. σ². So we estimate k + 1
parameters in model [1a].
If the unknown vector β in [1a] is replaced by some estimate β̂, we can write a residual vector

e = Y − Xβ̂

The OLS principle is to choose β̂ to minimise the residual sum of squares, i.e. minimise

e′e = (Y − Xβ̂)′(Y − Xβ̂) --[3]

We know that the sum of squared deviations of a variable is smallest when taken about its mean;
thus, by minimising the sum of the squares of the residuals we are in effect estimating the
conditional mean of Y.

RSS = e′e = (Y − Xβ̂)′(Y − Xβ̂)
= Y′Y − Y′Xβ̂ − β̂′X′Y + β̂′X′Xβ̂
= Y′Y − 2β̂′X′Y + β̂′X′Xβ̂

[Since Y′Xβ̂ and β̂′X′Y are scalars and one is the transpose of the other, they are equal.]
The first-order condition is

∂RSS/∂β̂ = −2X′Y + 2X′Xβ̂ = 0 --[4]

These are the k normal equations, X′Xβ̂ = X′Y.

Or, β̂ = (X′X)⁻¹X′Y --[5]

This is the OLS estimate of the β vector; the expression shows how the OLS estimate of β is
related to the data.

The second-order condition requires that ∂²RSS/∂β̂∂β̂′ = 2X′X be positive definite, which is
satisfied since 2X′X is a positive definite matrix by assumption A2.
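As a numerical illustration of [5], the following sketch solves the k normal equations on simulated data. The use of numpy, the simulated data and the variable names are assumptions of the illustration, not part of these notes.

    import numpy as np

    rng = np.random.default_rng(0)
    n, k = 100, 4                                    # n observations, k coefficients (including the intercept)
    X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])   # full column rank, as in A2
    beta_true = np.array([1.0, 0.5, -2.0, 0.3])
    Y = X @ beta_true + rng.normal(size=n)           # Y = X beta + U

    # Solve the k normal equations X'X beta_hat = X'Y, i.e. equations [4]-[5]
    beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
    e = Y - X @ beta_hat                             # residual vector
    print(beta_hat)
    print(np.allclose(X.T @ e, 0.0))                 # the normal equations imply X'e = 0

Solving the normal equations with np.linalg.solve avoids forming (X′X)⁻¹ explicitly, which is the numerically preferable route.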
Substituting Y = Xβ̂ + e into [5] gives (X′X)⁻¹X′e = 0, i.e.

X′e = 0 --[6]

The first element of [6] (corresponding to the column of ones in X) gives i′e = Σ ei = 0, i.e.
Ȳ = β̂1 + β̂2X̄2 + … + β̂kX̄k --[6a]. Thus the residuals have zero mean, and the regression plane passes
through the point of means in k-dimensional space. The remaining elements in Eq. (6) are of the form

Σi Xji ei = 0,  j = 2, …, k.

This condition means that each regressor has zero sample correlation with the residuals.

[Sample covariance of Xj and e: cov(Xj, e) = (1/n) Σi (Xji − X̄j)ei = (1/n) Σi Xji ei − (X̄j/n) Σi ei = 0,
since Σi Xji ei = 0 and Σi ei = 0.]

This in turn implies that Ŷ (= Xβ̂), the vector of regression values for Y, is uncorrelated with
e, for

Ŷ′e = (Xβ̂)′e = β̂′X′e = 0 --[7]
This implies Y = Ŷ + e.
The zero covariance between the regressors and the residuals underlies the decomposition of
the sum of squares. Decomposing the Y vector into the part explained by the regression and
the unexplained part,

Y = Ŷ + e = Xβ̂ + e,

it follows that

Y′Y = (Xβ̂ + e)′(Xβ̂ + e) = β̂′X′Xβ̂ + e′e

since the cross-product terms vanish by [6].
However, Y′Y is the sum of squares of the actual Y values. We are actually interested in
analysing the variation in Y measured by the sum of squared deviations from the sample mean, i.e.

Σi (Yi − Ȳ)² = Y′Y − nȲ².

Thus, subtracting nȲ² from each side of the previous decomposition gives a revised
decomposition,

Y′Y − nȲ² = (β̂′X′Xβ̂ − nȲ²) + e′e

or Y′Y − nȲ² = (Ŷ′Ŷ − nȲ²) + e′e, since the mean of Ŷ equals Ȳ. That is,

TSS = ESS + RSS

where TSS indicates the total sum of squares in Y, and ESS and RSS the explained and
residual (unexplained) sums of squares.
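As a quick numerical check of this decomposition on simulated data (numpy and all names are illustrative assumptions):

    import numpy as np

    rng = np.random.default_rng(1)
    n = 60
    X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
    Y = X @ np.array([2.0, 1.0, -0.5]) + rng.normal(size=n)

    beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
    Y_hat = X @ beta_hat
    e = Y - Y_hat

    TSS = np.sum((Y - Y.mean()) ** 2)                # Y'Y - n*Ybar^2
    ESS = np.sum((Y_hat - Y.mean()) ** 2)            # mean of Y_hat equals Ybar when an intercept is included
    RSS = e @ e
    print(np.isclose(TSS, ESS + RSS))                # TSS = ESS + RSS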
An alternative approach is to begin by expressing all the data in the form of deviations from
the sample means. We have the sample regression function for the ith observation,

Yi = β̂1 + β̂2 X2i + … + β̂k Xki + ei [from equation 2]
Subtracting the sample-mean relation [6a] from this equation gives

yi = β̂2 x2i + … + β̂k xki + ei --[10]

This is called the deviation form of the k-variable LRM, where lowercase letters denote
deviations from sample means. The intercept β̂1 disappears from the deviation form of the
equation, but it may be recovered from [6a]:

β̂1 = Ȳ − β̂2 X̄2 − … − β̂k X̄k --[10a]

The least-squares slope coefficients are identical in both forms of the regression equation, [2]
and [10], as are the residuals.
Collecting all n observations, the deviation form of the equation may be written compactly
using a transformation matrix,

A = In − (1/n) i i′ --[11]

where i is the n×1 column of ones. If we premultiply the matrix form of [2] by A we get the
deviation form in matrix notation as follows:

AY = AXβ̂ + Ae

or AY = A[i  X2][β̂1; α̂] + Ae

or y = xα̂ + e --[12]

Here X2 is a matrix of order n×(k−1), α̂ is the column vector of the slope estimates of order
(k−1)×1, y = AY and x = AX2. Premultiplication of a vector of n observations by A transforms
that vector into deviation form; it follows that Ae = e and Ai = 0, which is how the intercept
column and the residual term simplify above.
Now premultiplying [12] by x′ we get

x′y = x′xα̂ + x′e = x′xα̂  (since x′e = X2′Ae = X2′e = 0).

This is the set of (k−1) normal equations in deviation form.

Or, α̂ = (x′x)⁻¹x′y --[12a]

From [12] we can write y′y = α̂′x′xα̂ + e′e
or, TSS = ESS + RSS.
To estimate σ², note that the residual vector can be written as e = MY = MU, where
M = In − X(X′X)⁻¹X′ is symmetric and idempotent, so that e′e = U′MU. Hence

E(U′MU) = E(tr U′MU) = E(tr UU′M) = σ² tr(M)
= σ² tr[In − X(X′X)⁻¹X′] = σ²[n − tr((X′X)⁻¹X′X)] = σ²[n − k]

[The trace is the sum of the diagonal elements of a square matrix; tr(AB) = tr(BA) and
tr(ABC) = tr(BCA), where A, B and C need not be square.]

Or, E[e′e/(n − k)] = σ².

Therefore, s² = e′e/(n − k) --[13]
is the unbiased estimator of σ². The square root of s² is the standard deviation of the Y values
about the regression plane. It is often referred to as the standard error of the regression (SER).
The SER is used as a measure of the fit of the regression. The divisor n − k (rather than n)
adjusts for the downward bias introduced by estimating k − 1 slope coefficients and one
intercept parameter. When n is large, the effect of the degrees-of-freedom adjustment is
negligible.
SER is an absolute measure of the goodness of fit and depends on the unit of Y. SER measures
the spread of the observations around the regression line, so the higher the value of SER, the
lower the goodness of fit, and vice versa. In other words, a large spread means that
predictions of Y made using the selected X variables will often be wrong by a large amount.
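A sketch of computing s² from [13] and the SER on simulated data (numpy and the names are illustrative assumptions):

    import numpy as np

    rng = np.random.default_rng(3)
    n, k = 80, 3
    X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
    Y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(scale=2.0, size=n)

    beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
    e = Y - X @ beta_hat

    s2 = (e @ e) / (n - k)          # unbiased estimator of sigma^2, equation [13]
    SER = np.sqrt(s2)               # standard error of the regression
    print(s2, SER)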
Other measures of the Goodness of Fit
The goodness of fit of a linear regression model measures how well the estimated model fits a
given set of data, or how well it can explain the population. It is, however, difficult to come up
with a perfect measure of the goodness of fit for an econometric model. A regression model fits
well if the dependent variable is explained more by the regressors than by the residual. The
coefficient of determination R², defined as the square of the multiple correlation coefficient, is a
common measure of the goodness of fit of a regression model:

R² = ESS/TSS = 1 − RSS/TSS --[14]

Thus R² measures the proportion of the total variation in Y explained by the linear
combination of the regressors. It is the square of the simple correlation coefficient between Y and Ŷ.
A high value of R² indicates that we can predict individual outcomes on Y with much
accuracy on the basis of the estimated model.
TSS = RSS when the best-fitting regression has no regressor, only an intercept. If we add regressors
to the model, RSS falls, i.e. TSS ≥ RSS.
At one extreme, R² = 0: the regression line is horizontal, implying no change in Y with the
change in X; in other words, X has no explanatory power.
At the other extreme, R² = 1, indicating that all the data points lie on the fitted hyperplane
and RSS = 0. Thus, in general, 0 ≤ R² ≤ 1.
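Computing R² as in [14], and checking that it equals the squared correlation between Y and Ŷ, on simulated data (a sketch; numpy and the names are assumptions of the illustration):

    import numpy as np

    rng = np.random.default_rng(4)
    n = 100
    X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
    Y = X @ np.array([0.5, 1.5, -0.8]) + rng.normal(size=n)

    beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
    Y_hat = X @ beta_hat
    e = Y - Y_hat

    TSS = np.sum((Y - Y.mean()) ** 2)
    R2 = 1 - (e @ e) / TSS                                   # equation [14]
    print(R2)
    print(np.isclose(R2, np.corrcoef(Y, Y_hat)[0, 1] ** 2))  # R^2 equals the squared correlation of Y and Y_hat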
R² is used as a measure of the goodness of fit, but it is difficult to say how large R² needs
to be for the fit to be considered good. The value of R² never decreases with the addition of
explanatory variables: if an added explanatory variable is totally irrelevant, the ESS simply
remains unchanged. This is the basic limitation of the use of R² as an indicator of the goodness
of fit.
Secondly, R² is sensitive to extreme values, so R² is not robust.
Thirdly, R² may be negative or greater than one if the intercept term is not included, so it may
not be a good measure of the fit of the regression.
If the intercept term is not included, the sample regression is still

Y = Xβ̂ + e (2)

but now X = [ X21 ⋯ Xk1 ; ⋮ ⋱ ⋮ ; X2n ⋯ Xkn ] contains no column of ones, with
Y = (Y1, …, Yn)′, β̂ = (β̂1, …, β̂k)′ and e = (e1, …, en)′. In this case Σ ei need not be zero, so the
decomposition TSS = ESS + RSS no longer holds.
The adjusted R², written R̄², is useful for comparing the fit of specifications that differ in the
addition or deletion of explanatory variables. While the unadjusted R² will never decrease with the
addition of any variable to the set of regressors, the adjusted R² may decrease with the addition of
variables of low explanatory power.
The relation between the adjusted and unadjusted R² is

R̄² = 1 − (1 − R²)(n − 1)/(n − k)

R̄² = R² when k = 1, that is, when the regression is formulated only with an intercept and
no explanatory variable, or with one explanatory variable and no intercept (which is very rare). In the
MLRM, when k increases, (n − 1)/(n − k) increases and (1 − R²) falls. The ratio (n − 1)/(n − k) is called
the penalty of using more regressors in a model, and R² is the benefit of the addition of
regressors. Whether the addition of regressors improves the explanatory power of the model
depends on the trade-off between R² and the penalty (n − 1)/(n − k). Therefore, the adjusted R² may
not increase with the number of explanatory variables. If the contribution of an additional
regressor to the estimated model is more than the loss of degrees of freedom, R̄² will rise
with the number of regressors; otherwise it will decline, as when the additional explanatory
variable has no explanatory power.
So, clearly R̄² ≤ R².
It may be noted that R̄² may be negative, namely when R² < (k − 1)/(n − 1).
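The relation R̄² = 1 − (1 − R²)(n − 1)/(n − k), and the fact that R̄² can fall when a regressor with no explanatory power is added, can be illustrated as follows (a sketch on simulated data; the helper function and the names are my own):

    import numpy as np

    def r2_and_adjusted(Y, X):
        """Return (R2, adjusted R2) from an OLS fit of Y on X; X includes the intercept column."""
        n, k = X.shape
        e = Y - X @ np.linalg.solve(X.T @ X, X.T @ Y)
        R2 = 1 - (e @ e) / np.sum((Y - Y.mean()) ** 2)
        return R2, 1 - (1 - R2) * (n - 1) / (n - k)

    rng = np.random.default_rng(5)
    n = 60
    X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
    Y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)

    print(r2_and_adjusted(Y, X))                              # original model
    X_plus = np.column_stack([X, rng.normal(size=n)])         # add an irrelevant regressor
    print(r2_and_adjusted(Y, X_plus))                         # R2 never falls; adjusted R2 may fall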
We have estimated the regression coefficients β and examined the properties of the OLS
estimators. Let us now see how to use these estimates to test various hypotheses about β.
Consider the following examples of typical hypotheses about β.
i] H0: βj = 0. This hypothesis says that the regressor Xj has no effect on Y. It is a very common
test, often referred to as a significance test.
ii] H0: βj = βj0. Here βj0 is some specific value. If, for example, βj denotes an income elasticity,
one might wish to test βj = 1.
iii] H0: β2 + β3 = 1. If β2 and β3 indicate labour and capital elasticities in a production function,
the hypothesis examines the presence of constant returns to scale (CRS).
iv] H0: β2 = β3, or β2 − β3 = 0. It examines whether X2 and X3 have the same coefficient.
v] H0: β2 = β3 = β4 = … = βk = 0,
or (β2, …, βk)′ = (0, …, 0)′, or H0: α = 0, where α denotes the column vector of slope parameters of
order (k − 1). This sets up the hypothesis that the complete set of regressors has no effect on Y; it
tests the significance of the overall relation. The intercept term does not enter into this
hypothesis, since interest centres on the variation of Y around its mean and the level of the series
is usually of no specific relevance.
vi] H0: β2 = 0, where β2 now denotes a subvector: the β vector is partitioned into two subvectors
β1 and β2 containing respectively k1 and k2 (= k − k1) elements. This sets up the hypothesis that a
specified subset of the regressors plays no role in the determination of Y.
vii] H0: β2 + β3 = 1, β4 + β6 = 0, β5 + β6 = 0. We may test several linear restrictions jointly.
All such hypotheses can be written in the general form H0: Rβ = r, where R is a q×k matrix of
known constants, r is a known q×1 vector and q is the number of restrictions. In particular, for the
last three hypotheses above:

v] R = [0  Ik−1], where 0 is a column of k − 1 zeros, r = 0(k−1)×1 and q = k − 1;

vi] R = [0k2×k1  Ik2], r = 0k2×1 and q = k2;

vii] R = [ 0 1 1 0 0 0 … 0 ; 0 0 0 1 0 1 … 0 ; 0 0 0 0 1 1 … 0 ], r = (1, 0, 0)′ and q = 3.
The general test may then be specialized to deal with any specific application. Given the OLS
estimator β̂ = (X′X)⁻¹X′Y = β + (X′X)⁻¹X′U, an obvious step is to compute the vector (Rβ̂ − r).
This vector measures the discrepancy between expectation and observation. If this vector
is, in some sense, “large,” it casts doubt on the null hypothesis; conversely, if it is “small,”
it tends not to contradict the null. As in all conventional testing procedures, the distinction
between large and small is determined from the relevant sampling distribution under the null,
in this case the distribution of Rβ̂ when Rβ = r.

E(Rβ̂) = R E(β̂) = Rβ --[2]

We have assumed that U is a spherical disturbance. Now we assume in addition that the Ui are
normally distributed, i.e. U ~ N(0, σ²I). Since β̂ is a linear function of the U vector, β̂
follows a normal distribution, β̂ ~ N(β, σ²(X′X)⁻¹). Further, Rβ̂ is a linear function of β̂, so
Rβ̂ ~ N(Rβ, σ²R(X′X)⁻¹R′), which implies Rβ̂ − Rβ ~ N(0, σ²R(X′X)⁻¹R′).
Under the null hypothesis Rβ = r, so under the null Rβ̂ − r ~ N(0, σ²R(X′X)⁻¹R′).
With this formulation we can say

(Rβ̂ − r)′[σ²R(X′X)⁻¹R′]⁻¹(Rβ̂ − r) ~ χ²(q) --[4]

[χ²(q) is the sum of squares of q standard normal variates.]
The distribution in [4] is derived from the sampling distribution of β̂. The only problem
hindering practical application of Eq. (4) is the presence of the unknown σ². However,
e′e/σ² ~ χ²(n − k) --[5]

which is independent of β̂. Thus the ratio of [4] to [5] gives a suitable statistic from which σ² is
absent, i.e.

(Rβ̂ − r)′[R(X′X)⁻¹R′]⁻¹(Rβ̂ − r) / e′e.

Finally, if we divide the numerator and the denominator by their respective degrees of freedom
(q and n − k) we get

[(Rβ̂ − r)′[R(X′X)⁻¹R′]⁻¹(Rβ̂ − r)/q] / [e′e/(n − k)] ~ F(q, n − k) --[6]

or, (Rβ̂ − r)′[s²R(X′X)⁻¹R′]⁻¹(Rβ̂ − r)/q ~ F(q, n − k) --[6.1]

where s² = e′e/(n − k) and var-cov(β̂) = s²(X′X)⁻¹.
Suppose Cij denotes the (i, j)th element of (X′X)⁻¹; then s²Cjj = Var(β̂j) and s²Cjt = cov(β̂j, β̂t),
j, t = 1, 2, …, k.
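The quantities s²(X′X)⁻¹, the standard errors s√Cjj and the resulting t ratios can be computed directly once the model is fitted; the sketch below uses simulated data, numpy for the algebra and scipy only for the tabulated critical value (all names are illustrative assumptions):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(6)
    n, k = 100, 3
    X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
    Y = X @ np.array([1.0, 0.0, 2.0]) + rng.normal(size=n)   # the second slope is zero in the simulated model

    C = np.linalg.inv(X.T @ X)                               # the matrix with elements C_ij
    beta_hat = C @ X.T @ Y
    e = Y - X @ beta_hat
    s2 = (e @ e) / (n - k)

    varcov = s2 * C                                          # s^2 (X'X)^{-1}
    se = np.sqrt(np.diag(varcov))                            # s * sqrt(C_jj)
    t_stats = beta_hat / se                                  # statistic [s1] for H0: beta_j = 0
    print(t_stats)
    print(stats.t.ppf(0.975, df=n - k))                      # two-sided 5% critical value t_{.025, n-k}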
Let us now consider the hypotheses one by one.
i] H0: βj = 0. Under this null hypothesis, (Rβ̂ − r) picks out β̂j and R(X′X)⁻¹R′ picks out the jth
diagonal element Cjj of (X′X)⁻¹, so equation [6] becomes

β̂j² / (s²Cjj) ~ F(1, n − k).

Taking the square root of the F(1, n − k) statistic we get

β̂j / (s√Cjj) = β̂j / se(β̂j) ~ t(n − k) --[s1]

Thus the null hypothesis that Xj has no influence on Y is tested by dividing the estimated
value of the coefficient by its standard error, which follows a t distribution with n − k degrees of
freedom. If the calculated value is greater than the tabulated value at a specific level of
significance, we reject the null hypothesis.
Similarly, we can test
ii] H0: βj = βj0 by (β̂j − βj0)/se(β̂j) ~ t(n − k) --[s2]
Confidence interval of βj
Instead of testing specific hypotheses about βj we may compute, say, a 95% confidence
interval for βj. Because of random sampling error, it is impossible to learn the exact value of
the true coefficient parameter βj using only the information in a sample. However, it is
possible to use data from a random sample to construct a range that contains the true
population parameter βj with a certain pre-specified probability (say 95%). This range is
called a confidence interval and the specified probability is known as the confidence level.
For constructing a confidence interval we would need to test all possible values of βj as null
hypotheses, which is almost impractical. Fortunately there is a much easier approach. In
terms of the t statistic for the hypothesis H0: βj = βj0, the trial value βj0 of βj is rejected at the
5% level of significance if |t| > 1.96 for n − k > 120, where t = (β̂j − βj0)/se(β̂j). Otherwise,
we cannot reject the null at the 5% level of significance. The null would not be rejected if

−1.96 ≤ t ≤ 1.96

−1.96 ≤ (β̂j − βj0)/se(β̂j) ≤ 1.96

or, β̂j − 1.96 se(β̂j) ≤ βj0 ≤ β̂j + 1.96 se(β̂j).

Thus the set of values of βj that are not rejected at the 5% level of significance consists of the
values within β̂j ± 1.96 se(β̂j). In general, the 95% confidence interval for βj is β̂j ± t.025 se(β̂j).
Similarly, the 99% confidence interval for βj is β̂j ± 2.58 se(β̂j) for n − k > 120.
The discussion so far has focused on two-sided confidence intervals. We could instead
construct a one-sided confidence interval as the set of values of βj that cannot be rejected by a
one-sided hypothesis test.
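A sketch of constructing the 95% interval β̂j ± t.025 se(β̂j) on simulated data (numpy for the algebra, scipy for the t quantile; the names are assumptions of the illustration):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(7)
    n, k = 120, 3
    X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
    Y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(size=n)

    C = np.linalg.inv(X.T @ X)
    beta_hat = C @ X.T @ Y
    e = Y - X @ beta_hat
    se = np.sqrt((e @ e) / (n - k) * np.diag(C))

    t_crit = stats.t.ppf(0.975, df=n - k)                 # close to 1.96 when n - k is large
    lower, upper = beta_hat - t_crit * se, beta_hat + t_crit * se
    print(np.column_stack([lower, upper]))                # 95% confidence interval for each beta_j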
iii] H0: β2 + β3 = 1 is tested by the statistic

(β̂2 + β̂3 − 1) / √var(β̂2 + β̂3) ~ t(n − k) --[s3]

where var(β̂2 + β̂3) = var(β̂2) + var(β̂3) + 2 cov(β̂2, β̂3) is estimated from s²(X′X)⁻¹. Alternatively,
the 95% confidence interval for β2 + β3 is (β̂2 + β̂3) ± t.025 se(β̂2 + β̂3).

iv] H0: β2 = β3, or β2 − β3 = 0, is tested by the statistic

(β̂2 − β̂3) / √var(β̂2 − β̂3) ~ t(n − k) --[s4]

where var(β̂2 − β̂3) = var(β̂2) + var(β̂3) − 2 cov(β̂2, β̂3). The 95% confidence interval for β2 − β3 is
(β̂2 − β̂3) ± t.025 se(β̂2 − β̂3).
Let us consider case (v), H0: β2 = β3 = β4 = … = βk = 0, or H0: α = 0. Here we have q = k − 1
restrictions and R = [0  Ik−1], where 0 is a column of k − 1 zeros. Now R(X′X)⁻¹R′ picks out the
square submatrix of order (k − 1) in the bottom right-hand corner of (X′X)⁻¹.
To evaluate this submatrix, partition the matrix as X = [i  X2], so that X′ = [i′; X2′], where the
order of X2 is n×(k − 1).
Then

X′X = [ i′i  i′X2 ; X2′i  X2′X2 ] = [ n  i′X2 ; X2′i  X2′X2 ]

and, by the formula for the inverse of a partitioned matrix, the bottom right-hand block of (X′X)⁻¹
is (X2′AX2)⁻¹, so that

R(X′X)⁻¹R′ = (X2′AX2)⁻¹ = (x′x)⁻¹

where A is the symmetric and idempotent matrix defined in [11] and AX2 = x gives the deviation
form of the explanatory variables in our k-variable model. With Rβ̂ − r = α̂, the statistic [6.1]
becomes

F = α̂′(x′x)α̂ / [(k − 1)s²] = [ESS/(k − 1)] / [e′e/(n − k)] = [ESS/(k − 1)] / [RSS/(n − k)] ~ F(k − 1, n − k)

or, dividing the numerator and the denominator by TSS,

F = [ESS/TSS/(k − 1)] / [RSS/TSS/(n − k)] = [R²/(k − 1)] / [(1 − R²)/(n − k)] ~ F(k − 1, n − k) --[s5]
This test essentially asks whether the mean square due to the regression is significantly larger
than the residual mean square.
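Since [s5] needs only R², k and n, it is simple to compute once the model is fitted; a sketch on simulated data (numpy assumed, scipy used for the p-value):

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(8)
    n, k = 90, 4
    X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
    Y = X @ np.array([1.0, 0.4, -0.6, 0.2]) + rng.normal(size=n)

    beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
    e = Y - X @ beta_hat
    R2 = 1 - (e @ e) / np.sum((Y - Y.mean()) ** 2)

    F = (R2 / (k - 1)) / ((1 - R2) / (n - k))             # statistic [s5]
    p_value = stats.f.sf(F, k - 1, n - k)                 # H0: all slope coefficients are zero
    print(F, p_value)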
Next we consider the hypothesis under [vi], H0: β2 = 0, where β2 is a subvector of coefficients.
This hypothesis postulates that a subset of the regressor coefficients is a zero vector, in contrast
with the previous example, where all regressor coefficients were hypothesized to be zero. Partition
the regression equation as follows:
Y = (X1  X2)(β̂1; β̂2) + e = X1β̂1 + X2β̂2 + e

where X1 has k1 columns, including the intercept column, X2 has k2 columns, and β̂1 and β̂2 are
the corresponding subvectors of the OLS coefficient estimator.
Now R(X′X)⁻¹R′ picks out the square submatrix of order k2 in the bottom right-hand corner of
(X′X)⁻¹, which by the partitioned-inverse formula is (X2′M2X2)⁻¹, with M2 = I − X1(X1′X1)⁻¹X1′.
Here R = [0k2×k1  Ik2], r = 0 and q = k2, so the numerator of the statistic in [6] is
β̂2′(X2′M2X2)β̂2 / k2.
To understand the meaning of this numerator, consider the partitioned regression

Y = X1β̂1 + X2β̂2 + e.

Premultiplying by M2 we get

M2Y = M2X1β̂1 + M2X2β̂2 + M2e.

Since M2X1 = 0 and M2e = e, this gives

M2Y = M2X2β̂2 + e

or, [I − X1(X1′X1)⁻¹X1′]Y = M2X2β̂2 + e

or, Y − X1(X1′X1)⁻¹X1′Y = M2X2β̂2 + e.

The left-hand side is the residual vector from regressing Y just on X1, say er, so

er = M2X2β̂2 + e.

Squaring both sides, and noting that the cross-product term vanishes because X2′M2e = X2′e = 0, gives

er′er = β̂2′(X2′M2X2)β̂2 + e′e.

The term on the left of this equation is the RSS when Y is regressed just on X1. The last term,
e′e, is the RSS when Y is regressed on [X1 X2]. Thus the middle term measures the increment
in ESS (or, equivalently, the reduction in RSS) when X2 is added to the set of regressors. In
other words, β̂2′(X2′M2X2)β̂2 is the difference between the restricted RSS and the
unrestricted RSS. The hypothesis may thus be tested by running two separate regressions.
First regress Y on X1 (a submatrix of X) and denote the RSS by RSSr; then run the regression on
all the Xs, obtaining the RSS, denoted as usual by RSSu. From Eq. (6) the test statistic is

[(RSSr − RSSu)/k2] / [RSSu/(n − k)] ~ F(k2, n − k).

Dividing the numerator and the denominator by TSS we get

[(R²u − R²r)/k2] / [(1 − R²u)/(n − k)] ~ F(k2, n − k) --[s6]

where R²u and R²r indicate the coefficients of determination for the unrestricted and the restricted
models. Finally, we compare the calculated and the tabulated values of F, and if the calculated value
is greater than the tabulated value we reject the null hypothesis.
Note that the hypothesis test in [v] is a special case of [vi] with X1 = the intercept column and
X2 including all the explanatory variables. If we regress Y on the intercept term alone, TSS = RSS,
i.e. RSSr = TSS and R²r = 0. Putting R²r = 0 in [s6] we get [s5]. Thus the [s6] statistic is a general
statistic which can be used for testing all the hypotheses stated above and all other linear
restrictions.
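A sketch of [s6] implemented by actually running the restricted and the unrestricted regressions on simulated data (the rss helper, the numpy and scipy usage and all names are assumptions of the illustration):

    import numpy as np
    from scipy import stats

    def rss(Y, X):
        """Residual sum of squares from OLS of Y on X."""
        e = Y - X @ np.linalg.solve(X.T @ X, X.T @ Y)
        return e @ e

    rng = np.random.default_rng(9)
    n = 100
    X1 = np.column_stack([np.ones(n), rng.normal(size=n)])        # retained regressors (with intercept), k1 = 2
    X2 = rng.normal(size=(n, 2))                                  # regressors under test, k2 = 2
    X = np.column_stack([X1, X2])
    k, k2 = X.shape[1], X2.shape[1]
    Y = X @ np.array([1.0, 0.5, 0.0, 0.0]) + rng.normal(size=n)   # H0: the coefficients on X2 are zero holds here

    RSS_r, RSS_u = rss(Y, X1), rss(Y, X)
    F = ((RSS_r - RSS_u) / k2) / (RSS_u / (n - k))                # numerically identical to the R^2 form in [s6]
    print(F, stats.f.sf(F, k2, n - k))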
For the restrictions in [vii], putting β2 = 1 − β3, β4 = −β6 and β5 = −β6 into the unrestricted model
we get

Yi = β1 + (1 − β3)X2i + β3X3i − β6X4i − β6X5i + β6X6i + … + βkXki + Ui

or, Yi − X2i = β1 + β3(X3i − X2i) + β6(X6i − X4i − X5i) + … + βkXki + Ui  (restricted model)

Now, estimating the unrestricted and the restricted models, compute R²u and R²r and form

[(R²u − R²r)/3] / [(1 − R²u)/(n − k)] ~ F(3, n − k) --[s7]
Note that q may be calculated in several equivalent ways: (a) the number of rows in the R matrix;
(b) the number of elements in the r vector; (c) the difference between the number of slope
coefficients in the unrestricted and the restricted model; (d) the difference between the degrees of
freedom attaching to the RSS in the restricted and the unrestricted model.
Simple correlation between two variables X1 and X2 is the degree of linear association
between the variables. The simple correlation coefficient is

r12 = cov(X1, X2) / √[var(X1) var(X2)].

It can be derived without making any reference to the structure of causal dependence, i.e. to a
regression specification.
The multiple correlation coefficient is the relation between an explained variable Y and the
explanatory variables (X2, …, Xk). The square of this multiple correlation coefficient is denoted by

R²Y.23…k = ESS/TSS;

it is interpreted as the proportion of the sample variation in Y that is explained by the OLS
regression line, and it is known as the coefficient of determination for the regression. It is equal
to the squared correlation coefficient between the actual and fitted values of Y, i.e.

R²Y.23…k = ESS/TSS = r²YŶ = [cov(Y, Ŷ)]² / [var(Y) var(Ŷ)].
The partial correlation coefficient measures the association between the explained variable Y and
an explanatory variable, say Xj, keeping all the other X's unchanged. The square of the partial
correlation coefficient is denoted by r²Yj.23…j−1,j+1…k, or simply by r²j. It is in fact r²e1e2, where
e1 denotes the residual when Y is regressed on all the Xs except Xj, and e2 denotes the residual
when Xj is regressed on all the other Xs.
To illustrate the relation between the partial and the simple correlation coefficients, consider

Y = β1 + β2X2 + β3X3 + U.

In order to find the partial correlation between Y and X2 we need to calculate the simple
correlation between e1 and e2, where e1 is the residual when we regress Y on X3 and e2 is the
residual when we regress X2 on X3; in deviation form, y = â x3 + e1 and x2 = b̂ x3 + e2.
Applying OLS to these deviation-form equations we get

â = ry3 (sy/s3)  and  b̂ = r23 (s2/s3)

and Var(y) = Var(â x3) + var(e1) and Var(x2) = Var(b̂ x3) + var(e2). Hence

var(e1) = sy² − [ry3 (sy/s3)]² s3² = sy²[1 − ry3²]

and similarly var(e2) = s2²[1 − r23²].

Now

cov(e1, e2) = cov[(y − â x3), (x2 − b̂ x3)] = (1/n) Σi (y x2 − â x3 x2 − b̂ y x3 + â b̂ x3²)

= cov(y, x2) − ry3 (sy/s3) cov(x3, x2) − r23 (s2/s3) cov(y, x3) + ry3 r23 (sy s2/s3²) var(x3)

= sy s2 (ry2 − ry3 r23 − ry3 r23 + ry3 r23) = sy s2 (ry2 − ry3 r23).

Therefore

ry2.3² = r2² = r²e1e2 = [cov(e1, e2)]² / [var(e1) var(e2)]
= [sy s2 (ry2 − ry3 r23)]² / [sy²(1 − ry3²) s2²(1 − r23²)]
= (ry2 − ry3 r23)² / [(1 − ry3²)(1 − r23²)].
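The equivalence of the two routes to the partial correlation, via the residuals e1, e2 and via the simple correlations, can be verified numerically; a sketch on simulated data (numpy assumed; the helper and variable names are my own):

    import numpy as np

    rng = np.random.default_rng(10)
    n = 200
    X3 = rng.normal(size=n)
    X2 = 0.6 * X3 + rng.normal(size=n)
    Y = 1.0 + 0.5 * X2 - 0.8 * X3 + rng.normal(size=n)

    def resid(v, w):
        """Residual from regressing v on w and a constant."""
        W = np.column_stack([np.ones(len(v)), w])
        return v - W @ np.linalg.solve(W.T @ W, W.T @ v)

    e1 = resid(Y, X3)                      # Y purged of X3
    e2 = resid(X2, X3)                     # X2 purged of X3
    r_partial = np.corrcoef(e1, e2)[0, 1]

    r_y2 = np.corrcoef(Y, X2)[0, 1]
    r_y3 = np.corrcoef(Y, X3)[0, 1]
    r_23 = np.corrcoef(X2, X3)[0, 1]
    r_formula = (r_y2 - r_y3 * r_23) / np.sqrt((1 - r_y3**2) * (1 - r_23**2))
    print(np.isclose(r_partial, r_formula))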
Relationship among r²YXj, R²Y.23…k and r²j:
In order to illustrate the relation, consider k = 3 and j = 3.
Thus the MLRM becomes Y = β1 + β2X2 + β3X3 + U.
The main objective of the single-equation analysis is to explain the variation in Y. The multiple
regression of Y on X2 and X3 gives the total ESS, which can be written as R²Y.23·TSS. Now, if we
first run the regression of Y on X2 alone, the ESS can be written as R²Y.2·TSS or r²Y2·TSS, and
RSSr = (1 − r²Y2)TSS.
If we then add X3 and want to know the additional contribution of X3 in explaining the
remaining RSS, i.e. RSSr, we regress e1 on e3, where e1 is the residual when we regress Y on
X2 and e3 is the residual when we regress X3 on X2. The ESS at this stage can be written as

r²e1e3·RSSr = r²Y3.2·RSSr = r²Y3.2(1 − r²Y2)TSS.

Thus the total ESS after the inclusion of X2 and X3 can be written as

r²Y2·TSS + r²Y3.2(1 − r²Y2)TSS,

which again can be written as R²Y.23·TSS. Hence R²Y.23 = r²Y2 + r²Y3.2(1 − r²Y2).
Corollary: if r23 = 0, i.e. X2 and X3 are not correlated, the variables are said to be
orthogonal. In this case

r²Y2.3 = r²Y2 / [1 − r²Y3]

Therefore, R²Y.23 = r²Y3 + r²Y2.
We can show that

r²j = F/(F + df) = t²/(t² + df)

where F and t are the values of the test statistics for testing H0: βj = 0, i.e. for testing the partial
influence of Xj on Y, and df denotes the degrees of freedom in the regression.
Consider H0: βj = 0; under H0 we can derive RSSr. Now, if we add Xj to the regression, it will
explain an r²j portion of the previously unexplained variation in Y, so the unexplained part falls
by r²j·RSSr:

RSSu = RSSr − r²j·RSSr = (1 − r²j)RSSr.

Hence

F(1, n − k) = t²j = [(RSSr − RSSu)/1] / [RSSu/(n − k)]
= [RSSr − (1 − r²j)RSSr] / [(1 − r²j)RSSr/(n − k)] = r²j(n − k)/(1 − r²j)

or, r²j = t²j/(t²j + (n − k)) = t²j/(t²j + df).
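A numerical check of r²j = t²j/(t²j + (n − k)): compute the t ratio of one regressor in the full regression and, separately, its squared partial correlation from the two residual series (a sketch on simulated data; numpy and the names are my own assumptions):

    import numpy as np

    rng = np.random.default_rng(11)
    n = 150
    X2, X3 = rng.normal(size=n), rng.normal(size=n)
    Y = 1.0 + 0.5 * X2 + 0.3 * X3 + rng.normal(size=n)
    X = np.column_stack([np.ones(n), X2, X3])
    k = X.shape[1]

    # t ratio of X3 in the full regression
    C = np.linalg.inv(X.T @ X)
    beta_hat = C @ X.T @ Y
    e = Y - X @ beta_hat
    s2 = (e @ e) / (n - k)
    t_j = beta_hat[2] / np.sqrt(s2 * C[2, 2])

    # squared partial correlation of Y and X3, given X2
    def resid(v, w):
        W = np.column_stack([np.ones(len(v)), w])
        return v - W @ np.linalg.solve(W.T @ W, W.T @ v)

    r2_j = np.corrcoef(resid(Y, X2), resid(X3, X2))[0, 1] ** 2
    print(np.isclose(r2_j, t_j**2 / (t_j**2 + (n - k))))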
Degrees of Freedom and R̄²
For a test of q restrictions,

F(q, n − k) = [(RSSr − RSSu)/q] / [RSSu/(n − k)] = [(n − k + q)s²r − (n − k)s²u] / (q s²u)

since RSSr = (n − k + q)s²r (the restricted model has n − k + q degrees of freedom) and
RSSu = (n − k)s²u.
We know that R̄² = 1 − (n − 1)s²u / Σ y²i, so R̄² and s²u are inversely related.
If H0: βj = 0 (a single restriction, q = 1), then F = t²j and

s²r/s²u = (c + t²j)/(1 + c),  where c = n − k.
It implies that RSSr/(n − k + 1) may be greater or smaller than RSSu/(n − k), i.e. s²r ⋛ s²u. In
particular, if s²r ≤ s²u then

(c + t²j)/(1 + c) ≤ 1,  i.e.  |tj| ≤ 1,

and conversely s²r ≥ s²u when |tj| ≥ 1. Again, s²r ≤ s²u implies

s²r/[TSS/(n − 1)] ≤ s²u/[TSS/(n − 1)]

1 − s²r/[TSS/(n − 1)] ≥ 1 − s²u/[TSS/(n − 1)]

and therefore R̄²r ≥ R̄²u.
So, if |tj| < 1 for some regressor Xj, dropping that regressor from the regression lowers s² and
raises the adjusted R̄². If, for any regression, F is significant but most of the tj's are
insignificant, we suspect the presence of multicollinearity.
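A sketch illustrating the rule above on simulated data: when a regressor's |t| is below 1, dropping it lowers s² and raises R̄², and the opposite happens when |t| exceeds 1 (the helper function, numpy and all names are my own assumptions):

    import numpy as np

    def fit_stats(Y, X):
        """Return (adjusted R2, s2, t ratios) from OLS of Y on X; X includes the intercept column."""
        n, k = X.shape
        C = np.linalg.inv(X.T @ X)
        beta_hat = C @ X.T @ Y
        e = Y - X @ beta_hat
        s2 = (e @ e) / (n - k)
        R2 = 1 - (e @ e) / np.sum((Y - Y.mean()) ** 2)
        return 1 - (1 - R2) * (n - 1) / (n - k), s2, beta_hat / np.sqrt(s2 * np.diag(C))

    rng = np.random.default_rng(12)
    n = 80
    X1 = np.column_stack([np.ones(n), rng.normal(size=n)])     # intercept plus one relevant regressor
    Xj = rng.normal(size=n)                                    # a candidate regressor with true coefficient 0
    Y = X1 @ np.array([1.0, 1.5]) + rng.normal(size=n)

    adj_u, s2_u, t_u = fit_stats(Y, np.column_stack([X1, Xj])) # unrestricted: includes the candidate regressor
    adj_r, s2_r, _ = fit_stats(Y, X1)                          # restricted: candidate regressor dropped
    print(t_u[-1])                                             # its t ratio
    print((adj_r >= adj_u) == (abs(t_u[-1]) <= 1))             # dropping raises adjusted R2 exactly when |t| <= 1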