Multiple Linear Regression Model (MLRM)
The multivariate relation gives rise to a richer array of inference questions and reduces the chance of omitted-variable bias in estimation relative to the two-variable equation. The specification of the k-variable model is
Yi = β1 + β2X2i + β3X3i + ⋯ + βkXki + Ui,  i = 1, 2, …, n --[1]
This equation identifies k − 1 explanatory variables (regressors), namely X2, X3, …, Xk, that are thought to influence the dependent variable (regressand) Y. The subscript i indicates the ith population member (observation). U is the stochastic disturbance; it captures the randomness of the relationship between the regressand and the regressors and contains the unobserved factors affecting Y.
In matrix notation,
Y = Xβ + U --(1a)
where X is the n×k matrix whose ith row is (1, X2i, X3i, …, Xki), Y = (Y1, …, Yn)′ is the n×1 vector of observations on the regressand, β = (β1, …, βk)′ is the k×1 coefficient vector and U = (U1, …, Un)′ is the n×1 disturbance vector.
Assumptions
A1 Linearity
The relationship in [1] is linear in the coefficient parameters (βj); however, Y and the Xs may be various transformations of the underlying variables of interest. This assumption simply defines the multiple linear regression model.
A2 No Perfect Collinearity
There is no exact linear relationship among the independent variables; that is, there is no exact multicollinearity problem. Technically, this assumption says that the X matrix has full column rank, i.e. ρ(X) = k, and it is known as the identification condition. Linear independence of the columns of X is required for unique determination of the estimates of the βj.
A3 Zero Conditional Mean of the Disturbances
This assumption states that no observations on X convey information about the expected value of the disturbance, which we write as
E[U|X] = 0, i.e. E[Ui|X] = 0 for every i = 1, 2, …, n.
The zero conditional mean implies that the unconditional mean is also zero, since
E[Ui] = EX[E[Ui|X]] = EX[0] = 0 for all i.
Moreover, cov(Ui, X) = cov(E[Ui|X], X) = cov(0, X) = 0 for all i. Taking conditional expectations of [1a] and using A3 therefore gives
E[Y|X] = Xβ --[1b]
Assumptions A1 and A3 comprise the linear regression model. The regression of Y on X is the conditional mean of Y given X, so assumption A3 makes Xβ the conditional mean function. Secondly, assumption A3 indicates that the unobserved factors in U and the explanatory variables are uncorrelated; that is, the explanatory variables are exogenous. Violation of this assumption creates the problems of model misspecification and endogeneity.
A4 Spherical Disturbances
This assumption states that the disturbances are homoscedastic and non-autocorrelated. It can be written as
E[UU′|X] = σ²In
This assumption has two parts: (a) the variance of Ui is constant for all i, and (b) the covariance of Ui and Uj is zero for all i ≠ j. Violation of part (a) creates the problem of heteroscedasticity and violation of part (b) creates the problem of autocorrelation. When the disturbance term satisfies assumption A4, it is called a spherical disturbance.
A5 Data Generation
This assumption states that the ultimate source of the data in X is statistically and economically unrelated to the source of U. In other words, (Yi, X2i, X3i, …, Xki), i = 1, 2, …, n are independently and identically distributed. This assumption holds automatically if the data are collected by simple random sampling.
The primary objective in estimating an MLRM is to estimate the conditional mean of Y given X, together with a confidence interval. For this purpose we need to estimate the k coefficient parameters β and the conditional variance of U, i.e. σ². So we estimate k + 1 parameters in model [1a].
If the unknown vector β in [1a] is replaced by some estimate β̂, we can write the residual vector
e = Y − Xβ̂
or
Y = Xβ̂ + e --[2] (sample model)
The OLS principle is to choose β̂ to minimise the residual sum of squares, i.e. minimise
RSS = e′e = (Y − Xβ̂)′(Y − Xβ̂) --[3]
Since the sum of squared deviations of a variable is smallest when taken about its mean, minimising the sum of squared residuals is how we recover the conditional mean of Y. Expanding [3],
RSS = Y′Y − Y′Xβ̂ − β̂′X′Y + β̂′X′Xβ̂
= Y′Y − 2β̂′X′Y + β̂′X′Xβ̂
[Since Y′Xβ̂ and β̂′X′Y are scalars and one is the transpose of the other, they are equal.]
The first-order condition is
∂RSS/∂β̂ = −2X′Y + 2X′Xβ̂ = 0 --[4] (the k normal equations)
or β̂ = (X′X)⁻¹X′Y --[5]
This is the OLS estimate of the β vector; the expression shows how the OLS estimate of β is related to the data.
The second-order condition requires ∂²RSS/∂β̂∂β̂′ = 2X′X to be positive definite, which is satisfied because X has full column rank by assumption A2.
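A minimal NumPy sketch of equation [5] on simulated data (the variable names and the simulated design below are illustrative assumptions, not part of the text):

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 100, 3                      # n observations, k coefficients (incl. intercept)

# Simulated data: X has a column of ones plus k - 1 regressors
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
beta_true = np.array([1.0, 2.0, -0.5])
Y = X @ beta_true + rng.normal(size=n)          # U ~ N(0, 1)

# OLS estimate from the normal equations [4]-[5]: solve (X'X) b = X'Y
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
print(beta_hat)
```

Solving the normal equations with np.linalg.solve (or np.linalg.lstsq) is numerically preferable to forming (X′X)⁻¹ explicitly, though the algebra is the same.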
Premultiplying the sample model [2] by (X′X)⁻¹X′ and using [5] gives
β̂ = (X′X)⁻¹X′Xβ̂ + (X′X)⁻¹X′e
or 0 = (X′X)⁻¹X′e
or X′e = 0 --[6] since (X′X)⁻¹ ≠ 0
The first element of Eq. (6) is i′e = Σ ei = 0 --[6a], because the first column of X is a column of ones. Thus the residuals have zero mean, and the regression plane passes through the point of means in k-dimensional space. The remaining elements of Eq. (6) are of the form
Σi Xji ei = 0 --[6b], j = 2, 3, …, k
This condition means that each regressor has zero sample covariance (and hence zero sample correlation) with the residuals.
[Sample covariance of Xj and e: cov(Xj, e) = (1/n)Σi(Xji − X̄j)ei = (1/n)Σi Xji ei − X̄j(1/n)Σi ei = 0, using [6a] and [6b].]
This in turn implies that Ŷ (= Xβ̂), the vector of fitted values of Y, is uncorrelated with e, for
Ŷ′e = (Xβ̂)′e = β̂′X′e = 0 --[7]
Further, Y = Ŷ + e implies Ȳ = Ŷ̄ + ē = Ŷ̄, since ē = 0; the fitted values have the same mean as Y.
The zero covariance between the regressors and the residuals underlies the decomposition of the sum of squares. Decomposing the Y vector into the part explained by the regression and the unexplained part,
𝑌 = 𝑌̂ + 𝑒 = 𝑋𝛽̂ + 𝑒
it follows that
𝑌 / 𝑌 = (𝑋𝛽̂ + 𝑒)/ (𝑋𝛽̂ + 𝑒) = 𝛽̂ / 𝑋 / 𝑋𝛽̂ + 𝑒 / 𝑒
However, Y′Y is the sum of squares of the actual Y values. We are actually interested in analysing the variation in Y, measured by the sum of squared deviations from the sample mean, i.e.
Σi (Yi − Ȳ)² = Y′Y − nȲ²
Thus, subtracting nȲ² from each side of the previous decomposition gives the revised decomposition
Y′Y − nȲ² = β̂′X′Xβ̂ − nȲ² + e′e
or Y′Y − nȲ² = Ŷ′Ŷ − nŶ̄² + e′e, since Ȳ = Ŷ̄
i.e. TSS = ESS + RSS
where TSS indicates the total sum of squares in Y, and ESS and RSS the explained and residual (unexplained) sums of squares.
An alternative approach is to begin by expressing all the data in the form of deviations from the sample means. We have the sample regression function for the ith observation
Yi = β̂1 + β̂2X2i + ⋯ + β̂kXki + ei [from equation 2]
and, averaging over the sample and using [6a],
Ȳ = β̂1 + β̂2X̄2 + ⋯ + β̂kX̄k
Subtracting the second equation from the first gives
yi = β̂2x2i + ⋯ + β̂kxki + ei --[10]
This is called the deviation form of the k-variable LRM, where lowercase letters denote deviations from sample means. The intercept β̂1 disappears from the deviation form of the equation, but it may be recovered from [6a]:
β̂1 = Ȳ − β̂2X̄2 − ⋯ − β̂kX̄k --[10a]
The least-squares slope coefficients are identical in both forms of the regression equation, [2] and [10], as are the residuals.
Collecting all n observations, the deviation form of the equation may be written compactly using a transformation matrix
A = In − (1/n) ii′ --[11]
where i is the n×1 column of ones.
Premultiplying the matrix form of [2] by A gives the deviation form in matrix notation:
AY = AXβ̂ + Ae
or AY = A[i X2][β̂1 ; α̂] + Ae
Here X2 is the n×(k−1) matrix of observations on the explanatory variables and α̂ is the (k−1)×1 column vector of slope estimates. Premultiplication by A transforms a vector of n observations into deviations from its mean, so Ae = e (the residuals already have zero mean) and Ai = 0. Therefore
y = xα̂ + e --[12]
where y = AY and x = AX2.
Premultiplying [12] by x′ gives
x′y = x′xα̂ + x′e
Since x′e = X2′A′e = X2′e = 0 by [6b], this is the set of (k−1) normal equations, so
α̂ = (x′x)⁻¹x′y --[12a]
From [12] we can also write y′y = α̂′x′xα̂ + e′e, or TSS = ESS + RSS.
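The deviation-form algebra in [11]–[12a] can be checked numerically. The following sketch (made-up data; all names are illustrative) verifies that the deviation-form slopes coincide with the full OLS slopes and that TSS = ESS + RSS:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 60, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
Y = X @ np.array([4.0, 1.5, -2.0]) + rng.normal(size=n)

A = np.eye(n) - np.ones((n, n)) / n            # A = I - (1/n) ii'   [11]
y = A @ Y                                       # deviations of Y
x = A @ X[:, 1:]                                # deviations of X2, ..., Xk

alpha_hat = np.linalg.solve(x.T @ x, x.T @ y)   # slope estimates   [12a]
beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)    # full OLS          [5]
print(np.allclose(alpha_hat, beta_hat[1:]))     # slopes coincide -> True

# TSS = ESS + RSS in deviation form
e = y - x @ alpha_hat
print(y @ y, alpha_hat @ x.T @ x @ alpha_hat + e @ e)
```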
We now turn to the estimation of σ². Substituting β̂ = (X′X)⁻¹X′Y into the residual vector implies
e = Y − X(X′X)⁻¹X′Y = [I − X(X′X)⁻¹X′]Y = MY
where M = I − X(X′X)⁻¹X′ is a symmetric and idempotent matrix. Since MX = 0, we also have e = M(Xβ + U) = MU, so e′e = U′MU and
E(e′e|X) = E(U′MU|X) = tr(M·E[UU′|X]) = σ² tr(M)
= σ² tr[In − X(X′X)⁻¹X′] = σ²[n − tr(X(X′X)⁻¹X′)] = σ²[n − tr((X′X)⁻¹X′X)]
= σ²[n − tr(Ik)] = σ²(n − k)
[The trace of a square matrix is the sum of its diagonal elements; tr(AB) = tr(BA) and tr(ABC) = tr(BCA) whenever the products are defined, even if A, B, C are not themselves square.]
Therefore E[e′e/(n − k)] = σ², so
s² = e′e/(n − k) --[13]
is an unbiased estimator of σ². The square root of s² measures the standard deviation of the Y values about the regression plane and is often referred to as the standard error of the regression (SER). The SER is used as a measure of the fit of the regression. The divisor n − k (rather than n) corrects for the downward bias introduced by estimating k − 1 slope coefficients and one intercept parameter. When n is large, the effect of the degrees-of-freedom adjustment is negligible.
The SER is an absolute measure of goodness of fit and depends on the unit of measurement of Y. It measures the spread of the observations around the regression line, so the higher the SER, the lower the goodness of fit, and vice versa. In other words, a large spread means that predictions of Y made using the selected X variables will often be wrong by a large amount.
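A small simulation (illustrative only, with an assumed true σ² = 4) showing the unbiasedness of s² in [13]: averaging s² over many artificial samples reproduces the true σ².

```python
import numpy as np

rng = np.random.default_rng(2)
n, k, sigma2 = 50, 3, 4.0
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
beta = np.array([1.0, 0.5, -1.0])

s2_draws = []
for _ in range(5000):
    Y = X @ beta + rng.normal(scale=np.sqrt(sigma2), size=n)
    e = Y - X @ np.linalg.solve(X.T @ X, X.T @ Y)
    s2_draws.append(e @ e / (n - k))            # s^2 = e'e/(n - k)   [13]

print(np.mean(s2_draws))                        # close to sigma2 = 4.0 (unbiasedness)
print(np.sqrt(np.mean(s2_draws)))               # roughly the true sigma = 2 (the SER scale)
```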
Other measures of the Goodness of Fit
The goodness of fit of a linear regression model measures how well the estimated model fits a given set of data, or how well it can explain the population. It is, however, difficult to come up with a perfect measure of goodness of fit for an econometric model. A regression model fits well if the dependent variable is explained more by the regressors than by the residual. The coefficient of determination R², defined as the square of the multiple correlation coefficient, is a common measure of the goodness of fit of a regression model:
R² = ESS/TSS = 1 − RSS/TSS --[14]
Thus R² measures the proportion of the total variation in Y explained by the linear combination of the regressors. It is the square of the simple correlation coefficient between Y and Ŷ. A high value of R² indicates that we can predict individual outcomes of Y with considerable accuracy on the basis of the estimated model.
TSS = RSS when the best-fitted regression has no regressor, only an intercept. If we add regressors to the model, RSS falls, i.e. TSS ≥ RSS. At one extreme, R² = 0: the regression plane is horizontal, implying no change in Y with a change in X; in other words, the Xs have no explanatory power. At the other extreme, R² = 1 indicates that all the data points lie on the fitted hyperplane and RSS = 0. Thus, in general, 0 ≤ R² ≤ 1.
R² is widely used as a measure of goodness of fit, but it is difficult to say how large R² needs to be for a model to be considered good. The value of R² never decreases with the addition of explanatory variables: if an added explanatory variable is totally irrelevant, the ESS simply remains unchanged. This is the basic limitation of using R² as an indicator of goodness of fit. Secondly, R² is sensitive to extreme values, so it is not robust. Thirdly, R² may be negative or greater than one if the intercept term is not included, in which case it is not a good measure of the fit of the regression.
If the intercept is dropped from the sample model Y = Xβ̂ + e (2), the residuals no longer sum to zero and the decomposition TSS = ESS + RSS breaks down, which is why R² can fall outside the unit interval; for this reason, and on theoretical grounds, we do not formulate a linear regression model without an intercept term.
Adjusted R²
To account for the degrees of freedom used up when regressors are added, the adjusted coefficient of determination is defined as
R̄² = 1 − [RSS/(n − k)] / [TSS/(n − 1)] = 1 − (n − 1)RSS/[(n − k)TSS] = 1 − [(n − 1)/(n − k)](1 − R²) --[16]
R̄² = R² when k = 1, that is, when the regression contains only an intercept and no explanatory variable, or one explanatory variable and no intercept (which is very rare). In the MLRM, as k increases, (n − 1)/(n − k) increases while (1 − R²) falls. The ratio (n − 1)/(n − k) is the penalty for using more regressors in a model, and the rise in R² is the benefit of adding a regressor. Whether adding regressors improves the explanatory power of the model depends on the trade-off between R² and the penalty (n − 1)/(n − k). Therefore, adjusted R² need not increase with the number of explanatory variables: if the contribution of an additional regressor to the estimated model exceeds the loss of degrees of freedom, R̄² rises; otherwise it declines, as when the additional explanatory variable has no explanatory power.
So, clearly, R̄² ≤ R². It is noted that R̄² may be negative when R² < (k − 1)/(n − 1).
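The following sketch (simulated data; the helper name r2_adj_r2 is hypothetical) illustrates [14] and [16]: adding an irrelevant regressor cannot lower R² but can lower R̄².

```python
import numpy as np

def r2_adj_r2(X, Y):
    """Return (R2, adjusted R2) for an OLS fit where X includes an intercept."""
    n, k = X.shape
    e = Y - X @ np.linalg.solve(X.T @ X, X.T @ Y)
    tss = np.sum((Y - Y.mean()) ** 2)
    r2 = 1 - (e @ e) / tss
    r2_bar = 1 - (n - 1) / (n - k) * (1 - r2)        # equation [16]
    return r2, r2_bar

rng = np.random.default_rng(3)
n = 40
x2 = rng.normal(size=n)
Y = 2 + 1.5 * x2 + rng.normal(size=n)
junk = rng.normal(size=n)                             # irrelevant regressor

X_small = np.column_stack([np.ones(n), x2])
X_big = np.column_stack([np.ones(n), x2, junk])

print(r2_adj_r2(X_small, Y))   # (R2, adj R2) for the relevant model
print(r2_adj_r2(X_big, Y))     # R2 never falls; adjusted R2 typically falls here
```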
We have estimated the regression coefficients β and examined the properties of the OLS estimators. Let us now see how to use these estimators to test various hypotheses about β. Consider the following examples of typical hypotheses about β.
i] H0: βj = 0. This hypothesis states that the regressor Xj has no effect on Y. It is a very common test, often referred to as a significance test.
ii] H0: βj = βj0, where βj0 is some specific value. If, for example, βj denotes an income elasticity, one might wish to test βj = 1.
iii] H0: β2 + β3 = 1, a linear restriction involving two coefficients.
iv] H0: β2 = β3, i.e. β2 − β3 = 0: two regressors have the same coefficient.
v] H0: β2 = β3 = ⋯ = βk = 0: none of the regressors has any influence on Y (the test of the overall significance of the regression).
vi] H0: β2 = 0, where β is partitioned into two subvectors β1 and β2, containing respectively k1 and k2 (= k − k1) elements. This sets up the hypothesis that a specified subset of the regressors plays no role in the determination of Y.
vii] H0: β2 + β3 = 1, β4 + β6 = 0, β5 + β6 = 0: we may test several linear restrictions jointly.
All these hypotheses are special cases of the general linear hypothesis
H0: Rβ = r or Rβ − r = 0 --[1]
where R is a q×k matrix of known constants (q < k) and r is a q×1 vector of known constants; each row of R, together with the corresponding element of r, imposes one linear restriction on β.
The general test may then be specialized to deal with any specific application. Given the OLS estimator β̂ = (X′X)⁻¹X′Y = β + (X′X)⁻¹X′U, an obvious step is to compute the vector (Rβ̂ − r). This vector measures the discrepancy between expectation and observation. If this vector is, in some sense, "large," it casts doubt on the null hypothesis; conversely, if it is "small," it tends not to contradict the null. As in all conventional testing procedures, the distinction between large and small is determined from the relevant sampling distribution under the null, in this case the distribution of Rβ̂ under the null hypothesis Rβ = r.
Var(Rβ̂) = E[(Rβ̂ − Rβ)(Rβ̂ − Rβ)′] = R·E[(β̂ − β)(β̂ − β)′]·R′ = σ²R(X′X)⁻¹R′ --[3]
We have assumed that U is a spherical disturbance. Now we further assume that the Ui are normally distributed, i.e. U ~ N(0, σ²I). Since β̂ is a linear function of the U vector, β̂ follows a normal distribution; and since Rβ̂ is a linear function of β̂,
Rβ̂ ~ N(Rβ, σ²R(X′X)⁻¹R′), which implies Rβ̂ − Rβ ~ N(0, σ²R(X′X)⁻¹R′)
Under the null hypothesis Rβ = r, so under the null
Rβ̂ − r ~ N(0, σ²R(X′X)⁻¹R′)
With this formulation we can say
(Rβ̂ − r)′(σ²R(X′X)⁻¹R′)⁻¹(Rβ̂ − r) ~ χ²(q) --[4]
[χ²(q) is the sum of squares of q independent standard normal variates.]
The distribution in [4] is derived from the sampling distribution of β̂. The only problem hindering practical application of Eq. (4) is the presence of the unknown σ². However,
e′e/σ² ~ χ²(n − k) --[5]
which is independent of β̂.
Thus the ratio of [4] to [5] gives a suitable statistic in which the unknown σ² cancels:
[(Rβ̂ − r)′(σ²R(X′X)⁻¹R′)⁻¹(Rβ̂ − r)] / [e′e/σ²]
Finally, dividing the numerator and denominator by their respective degrees of freedom, we get
[(Rβ̂ − r)′(R(X′X)⁻¹R′)⁻¹(Rβ̂ − r)/q] / [e′e/(n − k)] ~ F(q, n − k) --[6]
Equivalently, the denominator of [6] is s², where s² = e′e/(n − k) and var-cov(β̂) = s²(X′X)⁻¹. Suppose Cij denotes the (i, j)th element of (X′X)⁻¹; then s²Cjj = Var(β̂j) and s²Cjt = cov(β̂j, β̂t), j, t = 1, 2, …, k.
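A numerical sketch of the general statistic [6] for a joint hypothesis Rβ = r (the data and the particular restrictions chosen below are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(4)
n, k = 80, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
beta = np.array([1.0, 0.6, 0.4, 0.0])
Y = X @ beta + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ Y
e = Y - X @ beta_hat
s2 = e @ e / (n - k)                                    # [13]

# H0: beta2 + beta3 = 1 and beta4 = 0  ->  R beta = r with q = 2 restrictions
R = np.array([[0.0, 1.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 1.0]])
r = np.array([1.0, 0.0])
q = R.shape[0]

d = R @ beta_hat - r                                    # discrepancy vector
F = d @ np.linalg.solve(R @ XtX_inv @ R.T, d) / q / s2  # equation [6]
print(F)   # compare with the F(q, n - k) critical value
```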
Let us now consider the hypotheses one by one
i] H0: βj = 0. Under this null hypothesis, (Rβ̂ − r) picks out β̂j and R(X′X)⁻¹R′ picks out Cjj, the jth diagonal element of (X′X)⁻¹. Equation [6] then becomes
β̂j² / (s²Cjj) ~ F(1, n − k)
Taking the square root of the F(1, n − k) statistic, we get
β̂j/(s√Cjj) = β̂j/se(β̂j) ~ t(n − k) --[s1]
Thus the null hypothesis that Xj has no influence on Y is tested by dividing the estimated coefficient by its standard error, which gives a statistic that follows the t distribution with n − k degrees of freedom. If the calculated value exceeds (in absolute terms) the tabulated value at a specified level of significance, we reject the null hypothesis.
Similarly, we can test
ii] H0: βj = βj0 by (β̂j − βj0)/se(β̂j) ~ t(n − k) --[s2]
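A short sketch of [s1] and [s2] on simulated data (the variable names and trial value βj0 = 1 are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)
n, k = 120, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
Y = X @ np.array([0.5, 1.0, 0.0]) + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ Y
e = Y - X @ beta_hat
s2 = e @ e / (n - k)
se = np.sqrt(s2 * np.diag(XtX_inv))      # se(beta_j) = s * sqrt(C_jj)

t_zero = beta_hat / se                   # [s1]: H0: beta_j = 0, one t per coefficient
print(t_zero)

beta_j0 = 1.0                            # [s2]: H0: beta_2 = 1 (index 1 is the first slope)
t_specific = (beta_hat[1] - beta_j0) / se[1]
print(t_specific)
```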
Confidence interval of 𝜷𝒋
Instead of testing a specific hypothesis about βj, we may compute a 95% confidence interval for βj. Because of random sampling error, it is impossible to learn the exact value of the true coefficient parameter βj using only the information in a sample. However, it is possible to use data from a random sample to construct a range that contains the true population parameter βj with a certain pre-specified probability (say 95%). This range is called a confidence interval and the specified probability is known as the confidence level.
Constructing a confidence interval by testing every possible value of βj as a null hypothesis would be impractical; fortunately there is a much easier approach. Using the t statistic for the hypothesis H0: βj = βj0, the trial value βj0 of βj is rejected at the 5% level of significance if |t| > 1.96 (for n − k > 120); otherwise we cannot reject the null at the 5% level. The null would not be rejected if
−1.96 ≤ t ≤ 1.96
i.e. −1.96 ≤ (β̂j − βj0)/se(β̂j) ≤ 1.96
or β̂j − 1.96 se(β̂j) ≤ βj0 ≤ β̂j + 1.96 se(β̂j)
Thus the set of values of βj0 that are not rejected at the 5% level of significance consists of the values within β̂j ± 1.96 se(β̂j); the 95% confidence interval for βj is therefore β̂j ± t.025 se(β̂j). Similarly, the 99% confidence interval for βj is β̂j ± 2.58 se(β̂j) for n − k > 120. The discussion so far has focused on two-sided confidence intervals; we could instead construct a one-sided confidence interval as the set of values of βj that cannot be rejected by a one-sided hypothesis test.
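A sketch of the 95% interval β̂j ± 1.96 se(β̂j) for a sample with n − k > 120 (simulated data, illustrative names):

```python
import numpy as np

rng = np.random.default_rng(6)
n, k = 200, 3                            # n - k > 120, so 1.96 is an adequate critical value
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
Y = X @ np.array([2.0, -1.0, 0.7]) + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ Y
e = Y - X @ beta_hat
se = np.sqrt((e @ e / (n - k)) * np.diag(XtX_inv))

lower = beta_hat - 1.96 * se             # 95% CI: beta_hat_j +/- 1.96 se(beta_hat_j)
upper = beta_hat + 1.96 * se
for j in range(k):
    print(f"beta_{j + 1}: [{lower[j]:.3f}, {upper[j]:.3f}]")
```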
iii] H0: β2 + β3 = 1 is tested by
(β̂2 + β̂3 − 1)/√var(β̂2 + β̂3) ~ t(n − k) --[s3]
where var(β̂2 + β̂3) = var(β̂2) + var(β̂3) + 2cov(β̂2, β̂3) = s²(C22 + C33 + 2C23). Alternatively, the 95% confidence interval for β2 + β3 is (β̂2 + β̂3) ± t.025 √var(β̂2 + β̂3).
iv] H0: β2 = β3, or β2 − β3 = 0, is tested by
(β̂2 − β̂3)/√var(β̂2 − β̂3) ~ t(n − k) --[s4]
where var(β̂2 − β̂3) = var(β̂2) + var(β̂3) − 2cov(β̂2, β̂3). The 95% confidence interval for β2 − β3 is (β̂2 − β̂3) ± t.025 √var(β̂2 − β̂3).
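A sketch of [s3] and [s4], computing var(β̂2 ± β̂3) from the elements of s²(X′X)⁻¹ (simulated data in which β2 + β3 = 1 holds by construction; names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(7)
n, k = 90, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
Y = X @ np.array([1.0, 0.6, 0.4]) + rng.normal(size=n)    # beta2 + beta3 = 1 is true

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ Y
e = Y - X @ beta_hat
V = (e @ e / (n - k)) * XtX_inv                 # varcov(beta_hat) = s^2 (X'X)^(-1)

# indices 1 and 2 correspond to beta2 and beta3
var_sum = V[1, 1] + V[2, 2] + 2 * V[1, 2]       # var(b2 + b3)
t_s3 = (beta_hat[1] + beta_hat[2] - 1) / np.sqrt(var_sum)    # [s3]

var_diff = V[1, 1] + V[2, 2] - 2 * V[1, 2]      # var(b2 - b3)
t_s4 = (beta_hat[1] - beta_hat[2]) / np.sqrt(var_diff)       # [s4]
print(t_s3, t_s4)
```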
v] H0: β2 = β3 = ⋯ = βk = 0, i.e. none of the explanatory variables has any influence on Y (the test of the overall significance of the regression). Here R = [0 I(k−1)], r = 0 and q = k − 1. Partition X = [i X2], where i is the column of ones and X2 is the n×(k−1) matrix of explanatory variables. Then X′X is a partitioned matrix with blocks i′i = n, i′X2, X2′i and X2′X2; write its conformably partitioned inverse as
(X′X)⁻¹ = [B11 B12; B21 B22] (let)
Therefore R(X′X)⁻¹R′ = [0 I(k−1)] (X′X)⁻¹ [0 I(k−1)]′ = B22, the bottom right-hand block of (X′X)⁻¹.
By the partitioned-inverse formula, B22 = (X2′AX2)⁻¹ = (x′x)⁻¹, where A is the symmetric and idempotent matrix defined in [11] and AX2 = x gives the deviation form of the explanatory variables in our k-variable model. Under this null, (Rβ̂ − r) = α̂, the vector of slope estimates, so the statistic [6] becomes
(Rβ̂ − r)′(s²R(X′X)⁻¹R′)⁻¹(Rβ̂ − r)/q = [α̂′(x′x)α̂/(k − 1)] / [e′e/(n − k)] = [ESS/(k − 1)] / [RSS/(n − k)] ~ F(k − 1, n − k)
Or, dividing ESS and RSS by TSS,
[(ESS/TSS)/(k − 1)] / [(RSS/TSS)/(n − k)] ~ F(k − 1, n − k), i.e. [R²/(k − 1)] / [(1 − R²)/(n − k)] ~ F(k − 1, n − k) --[s5]
This test essentially asks whether the mean square due to regression is significantly larger than the residual mean square.
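A sketch of the overall significance test [s5] computed directly from R² (simulated data; names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(8)
n, k = 100, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
Y = X @ np.array([1.0, 0.8, -0.3, 0.5]) + rng.normal(size=n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ Y)
e = Y - X @ beta_hat
tss = np.sum((Y - Y.mean()) ** 2)
r2 = 1 - (e @ e) / tss

# [s5]: test H0: beta2 = ... = beta_k = 0
F = (r2 / (k - 1)) / ((1 - r2) / (n - k))
print(F)    # compare with the F(k - 1, n - k) critical value
```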
vi] H0: β2 = 0. This hypothesis states that a specified subset of the regressor coefficients is a zero vector, in contrast with the previous example, where all regressor coefficients were hypothesized to be zero. Partition the regression equation as follows:
Y = [X1 X2][β̂1 ; β̂2] + e = X1β̂1 + X2β̂2 + e
where X1 has k1 columns, including the intercept column, X2 has k2 columns, and β̂1 and β̂2 are the corresponding subvectors of β̂.
Partition (X′X)⁻¹ conformably as
(X′X)⁻¹ = [X1′X1 X1′X2; X2′X1 X2′X2]⁻¹ = [B11 B12; B21 B22] (let)
Now R(X′X)⁻¹R′ picks out the square submatrix of order k2 in the bottom right-hand corner of (X′X)⁻¹: here R = [0(k2×k1) I(k2)], r = 0 and q = k2.
Therefore, as in case [v], we find R(X′X)⁻¹R′ = B22 = (X2′M2X2)⁻¹
where M2 = I − X1(X1′X1)⁻¹X1′ is a symmetric and idempotent matrix with M2X1 = 0 and M2e = e.
Further, M2Y gives the vector of residuals when Y is regressed on X1 alone. The numerator in [6] is then
β̂2′(X2′M2X2)β̂2 / k2
To understand the meaning of this numerator, consider the partitioned regression
Y = X1β̂1 + X2β̂2 + e
Premultiplying by M2 gives
M2Y = M2X1β̂1 + M2X2β̂2 + M2e
or M2Y = M2X2β̂2 + e
Premultiplying each side by its transpose, and using X2′M2e = X2′e = 0, gives
Y′M2Y = β̂2′(X2′M2X2)β̂2 + e′e
The term on the left of this equation, Y′M2Y, is the RSS when Y is regressed just on X1. The last term, e′e, is the RSS when Y is regressed on [X1 X2]. Thus the middle term measures the increment in ESS (or, equivalently, the reduction in RSS) when X2 is added to the set of regressors. In other words, β̂2′(X2′M2X2)β̂2 is the difference between the restricted RSS and the unrestricted RSS.
The hypothesis may thus be tested by running two separate regressions. First regress Y on X1 (a submatrix of X) and denote the RSS by RSSr; then run the regression on all the Xs, obtaining the RSS denoted, as usual, by RSSu. From Eq. (6) the test statistic is
[(RSSr − RSSu)/k2] / [RSSu/(n − k)] ~ F(k2, n − k), or equivalently [(Ru² − Rr²)/k2] / [(1 − Ru²)/(n − k)] ~ F(k2, n − k) --[s6]
where Ru² and Rr² indicate the coefficient of determination for the unrestricted regression and for the restricted model, respectively. Finally, we compare the calculated and tabulated values of F; if the calculated value is greater than the tabulated value, we reject the null hypothesis.
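A sketch of the subset test [s6] via the restricted and unrestricted regressions (simulated data in which the null is true by construction; the helper name rss is hypothetical):

```python
import numpy as np

def rss(X, Y):
    """Residual sum of squares from an OLS fit of Y on X."""
    e = Y - X @ np.linalg.solve(X.T @ X, X.T @ Y)
    return e @ e

rng = np.random.default_rng(9)
n = 120
X1 = np.column_stack([np.ones(n), rng.normal(size=n)])    # retained regressors (k1 = 2)
X2 = rng.normal(size=(n, 2))                              # regressors under test (k2 = 2)
Y = X1 @ np.array([1.0, 0.5]) + rng.normal(size=n)        # H0: coefficients on X2 are zero is true

X_full = np.column_stack([X1, X2])
k, k2 = X_full.shape[1], X2.shape[1]

rss_r = rss(X1, Y)           # restricted: Y on X1 only
rss_u = rss(X_full, Y)       # unrestricted: Y on [X1 X2]
F = ((rss_r - rss_u) / k2) / (rss_u / (n - k))            # [s6]
print(F)    # compare with the F(k2, n - k) critical value
```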
Note that the hypothesis test in [v] is a special case of [vi] in which X1 contains only the intercept (the column of ones) and X2 includes all the explanatory variables. If we regress Y on the intercept term alone, TSS = RSS, i.e. RSSr = TSS and Rr² = 0; putting Rr² = 0 and k2 = k − 1 in [s6] we get [s5]. Thus [s6] is a general statistic which can be used for testing all the hypotheses stated above and all other linear restrictions.
Note that q may be calculated in several equivalent ways: (a) the number of rows in the R matrix; (b) the number of elements in the r vector; (c) the difference between the number of slope coefficients in the unrestricted and the restricted models; (d) the difference between the degrees of freedom attaching to the RSS in the restricted and unrestricted models.
Simple, Multiple and Partial Correlation
The simple correlation between two variables X1 and X2 measures the degree of linear association between them:
r12 = cov(X1, X2)/√[var(X1)·var(X2)]
It can be computed without making any reference to a structure of causal dependence, i.e. to a regression specification.
The multiple correlation coefficient is the correlation between the explained variable Y and the set of explanatory variables (X2, …, Xk). Its square is denoted by R²Y.23…k = ESS/TSS and is interpreted as the proportion of the sample variation in Y that is explained by the OLS regression; it is known as the coefficient of determination for the regression. It is equal to the squared correlation coefficient between the actual and fitted values of Y, i.e.
R²Y.23…k = ESS/TSS = r²YŶ
To illustrate, by definition r²YŶ = [cov(Y, Ŷ)]²/[var(Y)·var(Ŷ)]. Now cov(Y, Ŷ) = cov(Ŷ + e, Ŷ) = var(Ŷ), since cov(Ŷ, e) = 0 by [7]. Therefore
r²YŶ = var(Ŷ)/var(Y) = ESS/TSS
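A quick numerical check (simulated data, illustrative names) that R² equals the squared correlation between Y and Ŷ:

```python
import numpy as np

rng = np.random.default_rng(10)
n = 70
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
Y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)

Y_hat = X @ np.linalg.solve(X.T @ X, X.T @ Y)
e = Y - Y_hat

r2 = 1 - (e @ e) / np.sum((Y - Y.mean()) ** 2)      # 1 - RSS/TSS
corr_sq = np.corrcoef(Y, Y_hat)[0, 1] ** 2          # squared corr(Y, Y_hat)
print(np.isclose(r2, corr_sq))                       # True
```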
The partial correlation coefficient between the explained variable Y and one explanatory variable, say Xj, measures their association holding all the other Xs constant. The square of the partial correlation coefficient is denoted by r²Yj.23…j−1,j+1…k, or simply by rj². It is in fact r²e1e2, where e1 denotes the residual when Y is regressed on all the Xs except Xj, and e2 denotes the residual when Xj is regressed on all the other Xs.
To illustrate the relation between partial and simple correlation coefficients, consider
Y = β1 + β2X2 + β3X3 + U
To find the partial correlation between Y and X2 we need the simple correlation between e1 and e2, where e1 is the residual when we regress Y on X3 and e2 is the residual when we regress X2 on X3; in deviation form,
y = âx3 + e1 and x2 = b̂x3 + e2
Applying OLS to these deviation-form equations gives
â = ry3·(sy/s3) and b̂ = r23·(s2/s3)
where sy, s2 and s3 denote the sample standard deviations of y, x2 and x3. Also Var(y) = Var(âx3) + var(e1) and Var(x2) = Var(b̂x3) + var(e2), so
var(e1) = sy² − [ry3·(sy/s3)]²·s3² = sy²[1 − r²y3]
and similarly var(e2) = s2²[1 − r²23].
𝑠𝑦 𝑠 𝑠𝑦 𝑠
=𝑐𝑜𝑣(𝑦𝑥2 ) − 𝑟𝑦3 𝑠 𝑐𝑜𝑣(𝑥3 𝑥2 ) − 𝑟23 𝑠2 𝑐𝑜𝑣(𝑦𝑥3 ) + 𝑟𝑦3 𝑠 𝑟23 𝑠2 𝑣𝑎𝑟(𝑥3 )
3 3 3 3
𝑠𝑦 𝑠2 (𝑟𝑦2 − 𝑟𝑦3 𝑟23 − 𝑟𝑦3 𝑟23 + 𝑟𝑦3 𝑟23 )= 𝑠𝑦 𝑠2 (𝑟𝑦2 − 𝑟𝑦3 𝑟23 )
𝒔𝒚 𝒔𝟐 (𝒓𝒚𝟐 −𝒓𝒚𝟑 𝒓𝟐𝟑 ) (𝒓𝒚𝟐 −𝒓𝒚𝟑 𝒓𝟐𝟑 )
Therefore, 𝒓𝟐𝒚𝟐.𝟑 = 𝒓𝟐𝟐 = 𝒓𝟐𝒆𝟏 𝒆𝟐 = =
√(𝒔𝟐𝟐 [𝟏−𝒓𝟐𝟐𝟑 ])(𝒔𝟐𝒚 [𝟏−𝒓𝟐𝒚𝟑 ]) √[𝟏−𝒓𝟐𝟐𝟑 ][𝟏−𝒓𝟐𝒚𝟑 ]
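A sketch verifying, on simulated data, that the residual-based and formula-based partial correlations coincide (the helper name residuals is hypothetical):

```python
import numpy as np

def residuals(v, X):
    """Residuals from an OLS fit of v on X (X includes an intercept column)."""
    return v - X @ np.linalg.solve(X.T @ X, X.T @ v)

rng = np.random.default_rng(11)
n = 200
x3 = rng.normal(size=n)
x2 = 0.6 * x3 + rng.normal(size=n)
Y = 1 + 0.5 * x2 + 0.8 * x3 + rng.normal(size=n)
ones = np.ones(n)

# Route 1: correlation between the two residual series
e1 = residuals(Y, np.column_stack([ones, x3]))     # Y on X3
e2 = residuals(x2, np.column_stack([ones, x3]))    # X2 on X3
r_partial_1 = np.corrcoef(e1, e2)[0, 1]

# Route 2: (r_y2 - r_y3 r_23) / sqrt((1 - r_y3^2)(1 - r_23^2))
r_y2 = np.corrcoef(Y, x2)[0, 1]
r_y3 = np.corrcoef(Y, x3)[0, 1]
r_23 = np.corrcoef(x2, x3)[0, 1]
r_partial_2 = (r_y2 - r_y3 * r_23) / np.sqrt((1 - r_y3**2) * (1 - r_23**2))

print(np.isclose(r_partial_1, r_partial_2))        # True
```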
Relationship among r²YXj, R²Y.23…k and rj²:
For the three-variable model the multiple and partial correlations are linked by 1 − R²Y.23 = (1 − r²y3)(1 − r²y2.3).
Corollary: if r23 = 0, i.e. X2 and X3 are uncorrelated, the regressors are said to be orthogonal. In this case
r²Y2.3 = r²y2/[1 − r²y3]
Therefore,
R²Y.23 = r²Y3 + r²Y2
We can show that rj² = F/(F + df) = t²/(t² + df), where F and t are the values of the test statistics for H0: βj = 0, i.e. for testing the partial influence of Xj on Y, and df = n − k denotes the degrees of freedom of the regression.
Consider H0: βj = 0. Under H0, i.e. with Xj excluded, we can derive RSSr. Adding Xj to the regression reduces the residual sum of squares to RSSu, and the squared partial correlation of Y and Xj is the proportional reduction in RSS:
rj² = (RSSr − RSSu)/RSSr
We know that for H0: βj = 0, q = 1 and the F statistic equals tj², i.e. (RSSr − RSSu)/[RSSu/(n − k)] = tj². Dividing the numerator and denominator of rj² by RSSu/(n − k),
rj² = tj²/[tj² + (n − k)] = tj²/(tj² + df)
Moreover, RSSr > RSSu, and two different estimates of σU² can be derived from these two models: sr² = RSSr/(n − k + 1) from the restricted model (the regression without Xj) and su² = RSSu/(n − k) from the unrestricted model.
We know that R̄² = 1 − sU²/[Σyi²/(n − 1)] = 1 − (n − 1)sU²/Σyi², so R̄² and sU² are inversely related: the model with the smaller estimate of σU² has the larger adjusted R².
If H0: βj = 0, then F = tj² and RSSr − RSSu = tj²·su², so, writing c = n − k,
sr²/su² = (c + tj²)/(1 + c)
Hence sr² ≤ su² if and only if (c + tj²)/(1 + c) ≤ 1, i.e. |tj| ≤ 1.
Again, sr² ≤ su² implies sr²/[TSS/(n − 1)] ≤ su²/[TSS/(n − 1)], i.e.
1 − sr²/[TSS/(n − 1)] ≥ 1 − su²/[TSS/(n − 1)]
that is, the adjusted R² of the restricted model is at least as large as that of the unrestricted model. Thus dropping a regressor whose |t| ratio is less than one raises the adjusted R², while dropping a regressor with |t| greater than one lowers it.
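A sketch (simulated data; helper names ols and adj_r2 are hypothetical) checking both results of this section: rj² = tj²/(tj² + df), and the link between |t| and the change in adjusted R² when a regressor is dropped.

```python
import numpy as np

def ols(X, v):
    """Return (coefficients, residuals) from OLS of v on X."""
    b = np.linalg.solve(X.T @ X, X.T @ v)
    return b, v - X @ b

rng = np.random.default_rng(12)
n = 60
x2, x3 = rng.normal(size=n), rng.normal(size=n)
Y = 1 + 0.8 * x2 + 0.1 * x3 + rng.normal(size=n)
ones = np.ones(n)

X_u = np.column_stack([ones, x2, x3])              # unrestricted model
X_r = np.column_stack([ones, x2])                  # restricted model (drop x3)
k = X_u.shape[1]

b_u, e_u = ols(X_u, Y)
_, e_r = ols(X_r, Y)
s2_u = e_u @ e_u / (n - k)
t3 = b_u[2] / np.sqrt(s2_u * np.linalg.inv(X_u.T @ X_u)[2, 2])   # t for H0: beta3 = 0

# Squared partial correlation of Y and x3: via residual regressions and via t^2/(t^2 + df)
r3 = np.corrcoef(e_r, ols(X_r, x3)[1])[0, 1]
print(r3**2, t3**2 / (t3**2 + (n - k)))            # the two numbers agree

def adj_r2(e, n_obs, n_par):
    return 1 - (e @ e / (n_obs - n_par)) / (np.sum((Y - Y.mean())**2) / (n_obs - 1))

# When |t3| < 1 the restricted model has the larger adjusted R2, and conversely
print(abs(t3), adj_r2(e_u, n, k), adj_r2(e_r, n, k - 1))
```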
If, for a regression, the overall F statistic is significant but most of the individual tj statistics are insignificant, suspect the presence of multicollinearity.