Chapter 2
Simple Linear Regression Analysis
where $y$ is termed as the dependent or study variable and $X$ is termed as the independent or explanatory variable. The terms $\beta_0$ and $\beta_1$ are the parameters of the model. The parameter $\beta_0$ is termed as the intercept term and the parameter $\beta_1$ is termed as the slope parameter. These parameters are usually called regression coefficients. The unobservable error component $\varepsilon$ accounts for the failure of the data to lie on the straight line and represents the difference between the true and observed realizations of $y$. There can be several reasons for such a difference, e.g., the effect of all deleted variables in the model, variables may be qualitative, inherent randomness in the observations, etc. We assume that $\varepsilon$ is observed as an independent and identically distributed random variable with mean zero and constant variance $\sigma^2$. Later, we will additionally assume that $\varepsilon$ is normally distributed.
$$\text{Var}(y \mid x) = \sigma^2.$$
When the values of $\beta_0$, $\beta_1$ and $\sigma^2$ are known, the model is completely described. The parameters $\beta_0$, $\beta_1$ and $\sigma^2$ are generally unknown in practice and $\varepsilon$ is unobserved. The determination of the statistical model $y = \beta_0 + \beta_1 X + \varepsilon$ depends on the determination (i.e., estimation) of $\beta_0$, $\beta_1$ and $\sigma^2$. In order to know the values of these parameters, $n$ pairs of observations $(x_i, y_i)\ (i = 1, \ldots, n)$ on $(X, y)$ are observed/collected and are used to determine these unknown parameters.
Various methods of estimation can be used to determine the estimates of the parameters. Among them, the
methods of least squares and maximum likelihood are the popular methods of estimation.
The principle of least squares estimates the parameters $\beta_0$ and $\beta_1$ by minimizing the sum of squares of the differences between the observations and the line in the scatter diagram. Such an idea can be viewed from different perspectives. When the vertical differences between the observations and the line in the scatter diagram are considered and their sum of squares is minimized to obtain the estimates of $\beta_0$ and $\beta_1$, the method is known as direct regression.
[Figure: Direct regression — vertical deviations of the observations $(x_i, y_i)$ from the line $Y = \beta_0 + \beta_1 X$, with corresponding points $(X_i, Y_i)$ on the line.]
Alternatively, the sum of squares of the differences between the observations and the line in the horizontal direction in the scatter diagram can be minimized to obtain the estimates of $\beta_0$ and $\beta_1$. This is known as reverse (or inverse) regression.

[Figure: Reverse regression — horizontal deviations of the observations $(x_i, y_i)$ from the line $Y = \beta_0 + \beta_1 X$, with corresponding points $(X_i, Y_i)$ on the line.]
Instead of horizontal or vertical errors, if the sum of squares of the perpendicular distances between the observations and the line in the scatter diagram is minimized to obtain the estimates of $\beta_0$ and $\beta_1$, the method is known as major axis regression (or orthogonal regression).

[Figure: Major axis regression method — perpendicular distances of the observations $(x_i, y_i)$ from the line $Y = \beta_0 + \beta_1 X$.]
Instead of minimizing the distance, the area can also be minimized. The reduced major axis regression
method minimizes the sum of the areas of rectangles defined between the observed data points and the
nearest point on the line in the scatter diagram to obtain the estimates of regression coefficients. This is
shown in the following figure:
[Figure: Reduced major axis regression — rectangles formed between the observations $(x_i, y_i)$ and the corresponding points $(X_i, Y_i)$ on the line $Y = \beta_0 + \beta_1 X$.]
The method of least absolute deviation regression considers the sum of the absolute deviations of the observations from the line in the vertical direction in the scatter diagram, as in the case of direct regression, to obtain the estimates of $\beta_0$ and $\beta_1$.

No assumption is required about the form of the probability distribution of $\varepsilon_i$ in deriving the least squares estimates. For the purpose of deriving statistical inferences only, we assume that the $\varepsilon_i$'s are random variables with $E(\varepsilon_i) = 0$, $\text{Var}(\varepsilon_i) = \sigma^2$ and $\text{Cov}(\varepsilon_i, \varepsilon_j) = 0$ for all $i \neq j$ $(i, j = 1, 2, \ldots, n)$. This assumption is needed to find the mean, variance and other properties of the least squares estimates. The assumption that the $\varepsilon_i$'s are normally distributed is utilized while constructing the tests of hypotheses and confidence intervals for the parameters.
Based on these approaches, different estimates of $\beta_0$ and $\beta_1$ are obtained which have different statistical properties. Among them, the direct regression approach is the most popular. Generally, the direct regression estimates are referred to as the least squares estimates or ordinary least squares estimates.
The direct regression approach minimizes the sum of squares
$$S(\beta_0, \beta_1) = \sum_{i=1}^{n} \varepsilon_i^2 = \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2$$
with respect to $\beta_0$ and $\beta_1$. The partial derivatives are
$$\frac{\partial S(\beta_0, \beta_1)}{\partial \beta_0} = -2 \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i),$$
$$\frac{\partial S(\beta_0, \beta_1)}{\partial \beta_1} = -2 \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i) x_i.$$
The normal equations are obtained by setting
$$\frac{\partial S(\beta_0, \beta_1)}{\partial \beta_0} = 0, \qquad \frac{\partial S(\beta_0, \beta_1)}{\partial \beta_1} = 0.$$
The solutions of these two equations are called the direct regression estimators, or usually the ordinary least squares (OLS) estimators, of $\beta_0$ and $\beta_1$:
$$b_0 = \bar{y} - b_1 \bar{x}, \qquad b_1 = \frac{s_{xy}}{s_{xx}},$$
where
$$s_{xy} = \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y}), \quad s_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2, \quad \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \quad \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i.$$
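As an illustration (not part of the original notes), a minimal Python sketch of these closed-form OLS estimators, assuming the data are available as NumPy arrays x and y:

```python
import numpy as np

def ols_simple(x, y):
    """Direct (ordinary) least squares estimates for y = b0 + b1*x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    xbar, ybar = x.mean(), y.mean()
    sxx = np.sum((x - xbar) ** 2)          # s_xx
    sxy = np.sum((x - xbar) * (y - ybar))  # s_xy
    b1 = sxy / sxx                         # slope estimate
    b0 = ybar - b1 * xbar                  # intercept estimate
    return b0, b1

# Example usage with made-up data:
# b0, b1 = ols_simple([1, 2, 3, 4], [2.1, 3.9, 6.2, 8.1])
```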
Further, we have
$$\frac{\partial^2 S(\beta_0, \beta_1)}{\partial \beta_0^2} = 2 \sum_{i=1}^{n} 1 = 2n,$$
$$\frac{\partial^2 S(\beta_0, \beta_1)}{\partial \beta_1^2} = 2 \sum_{i=1}^{n} x_i^2,$$
$$\frac{\partial^2 S(\beta_0, \beta_1)}{\partial \beta_0 \partial \beta_1} = 2 \sum_{i=1}^{n} x_i = 2 n \bar{x}.$$
The Hessian matrix, which is the matrix of second order partial derivatives, is in this case given as
$$H^* = \begin{pmatrix} \dfrac{\partial^2 S(\beta_0,\beta_1)}{\partial \beta_0^2} & \dfrac{\partial^2 S(\beta_0,\beta_1)}{\partial \beta_0 \partial \beta_1} \\ \dfrac{\partial^2 S(\beta_0,\beta_1)}{\partial \beta_0 \partial \beta_1} & \dfrac{\partial^2 S(\beta_0,\beta_1)}{\partial \beta_1^2} \end{pmatrix} = 2 \begin{pmatrix} n & n\bar{x} \\ n\bar{x} & \sum_{i=1}^{n} x_i^2 \end{pmatrix} = 2 \begin{pmatrix} \ell' \\ x' \end{pmatrix} (\ell \;\; x),$$
where $\ell = (1, 1, \ldots, 1)'$ is an $n$-vector of elements unity and $x = (x_1, \ldots, x_n)'$ is an $n$-vector of observations on $X$.
The matrix $H^*$ is positive definite if its determinant and the element in the first row and column of $H^*$ are positive. The determinant of $H^*$ is given by
$$|H^*| = 4\left( n \sum_{i=1}^{n} x_i^2 - n^2 \bar{x}^2 \right) = 4 n \sum_{i=1}^{n} (x_i - \bar{x})^2 \geq 0.$$
The case when $\sum_{i=1}^{n}(x_i - \bar{x})^2 = 0$ is not interesting because then all the observations are identical, i.e., $x_i = c$ (some constant). In such a case there is no relationship between $x$ and $y$ in the context of regression analysis. Since $\sum_{i=1}^{n}(x_i - \bar{x})^2 > 0$, therefore $|H^*| > 0$. So $H^*$ is positive definite for any $(\beta_0, \beta_1)$; therefore $S(\beta_0, \beta_1)$ attains its minimum at $(b_0, b_1)$.
The difference between the observed value $y_i$ and the fitted (or predicted) value $\hat{y}_i$ is called a residual, $e_i = y_i - \hat{y}_i$.
Unbiased property:
Note that $b_1 = \dfrac{s_{xy}}{s_{xx}}$ and $b_0 = \bar{y} - b_1 \bar{x}$ are linear combinations of $y_i$ $(i = 1, \ldots, n)$.
Therefore
$$b_1 = \sum_{i=1}^{n} k_i y_i$$
where $k_i = (x_i - \bar{x})/s_{xx}$. Note that $\sum_{i=1}^{n} k_i = 0$ and $\sum_{i=1}^{n} k_i x_i = 1$, so
$$E(b_1) = \sum_{i=1}^{n} k_i E(y_i) = \sum_{i=1}^{n} k_i (\beta_0 + \beta_1 x_i) = \beta_1.$$
Thus $b_1$ is an unbiased estimator of $\beta_1$. Similarly, using $E(\bar{y}) = \beta_0 + \beta_1 \bar{x}$ and $E(b_1) = \beta_1$,
$$E(b_0) = E(\bar{y} - b_1 \bar{x}) = \beta_0 + \beta_1 \bar{x} - \beta_1 \bar{x} = \beta_0,$$
so $b_0$ is an unbiased estimator of $\beta_0$.
Variances:
Using the assumption that the $y_i$'s are independently distributed, the variance of $b_1$ is
$$\text{Var}(b_1) = \sum_{i=1}^{n} k_i^2 \text{Var}(y_i) + \sum_{i \neq j} k_i k_j \text{Cov}(y_i, y_j) = \sigma^2 \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{s_{xx}^2} \qquad (\text{Cov}(y_i, y_j) = 0 \text{ as } y_1, \ldots, y_n \text{ are independent})$$
$$= \frac{\sigma^2 s_{xx}}{s_{xx}^2} = \frac{\sigma^2}{s_{xx}}.$$
The variance of $b_0$ is
$$\text{Var}(b_0) = \sigma^2 \left( \frac{1}{n} + \frac{\bar{x}^2}{s_{xx}} \right).$$
Covariance:
The covariance between $b_0$ and $b_1$ is
$$\text{Cov}(b_0, b_1) = -\frac{\bar{x}}{s_{xx}} \sigma^2.$$
It can further be shown that the ordinary least squares estimators $b_0$ and $b_1$ possess the minimum variance in the class of linear and unbiased estimators, so they are termed the Best Linear Unbiased Estimators (BLUE). This property is known as the Gauss-Markov theorem, which is discussed later for the multiple linear regression model.
The residual sum of squares is
$$SS_{res} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = \sum_{i=1}^{n} \left[ (y_i - \bar{y}) - b_1 (x_i - \bar{x}) \right]^2$$
$$= \sum_{i=1}^{n} (y_i - \bar{y})^2 + b_1^2 \sum_{i=1}^{n} (x_i - \bar{x})^2 - 2 b_1 \sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$$
$$= s_{yy} + b_1^2 s_{xx} - 2 b_1 s_{xy} = s_{yy} - b_1^2 s_{xx} \qquad (\text{using } s_{xy} = b_1 s_{xx})$$
$$= s_{yy} - \frac{s_{xy}^2}{s_{xx}} = s_{yy} - b_1 s_{xy},$$
where $s_{yy} = \sum_{i=1}^{n} (y_i - \bar{y})^2$ and $\bar{y} = \dfrac{1}{n} \sum_{i=1}^{n} y_i$.
Estimation of $\sigma^2$:
The estimator of $\sigma^2$ is obtained from the residual sum of squares as follows. Assuming that $y_i$ is normally distributed, it follows that
$$\frac{SS_{res}}{\sigma^2} \sim \chi^2(n-2).$$
Thus, using the result about the expectation of a chi-square random variable, we have
$$E(SS_{res}) = (n-2)\sigma^2.$$
Hence an unbiased estimator of $\sigma^2$ is
$$s^2 = \frac{SS_{res}}{n-2},$$
and the estimated variances of $b_0$ and $b_1$ are
$$\widehat{\text{Var}}(b_0) = s^2 \left( \frac{1}{n} + \frac{\bar{x}^2}{s_{xx}} \right) \quad \text{and} \quad \widehat{\text{Var}}(b_1) = \frac{s^2}{s_{xx}}.$$
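A small Python continuation (illustrative only, building on the hypothetical ols_simple sketch above) that computes the residual sum of squares, the unbiased estimate $s^2$ of $\sigma^2$, and the resulting standard errors of $b_0$ and $b_1$:

```python
import numpy as np

def ols_inference(x, y):
    """Residuals, s^2 = SSres/(n-2), and standard errors of b0, b1."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    xbar, ybar = x.mean(), y.mean()
    sxx = np.sum((x - xbar) ** 2)
    sxy = np.sum((x - xbar) * (y - ybar))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    resid = y - (b0 + b1 * x)          # e_i = y_i - yhat_i
    ss_res = np.sum(resid ** 2)        # SS_res
    s2 = ss_res / (n - 2)              # unbiased estimator of sigma^2
    se_b1 = np.sqrt(s2 / sxx)
    se_b0 = np.sqrt(s2 * (1.0 / n + xbar ** 2 / sxx))
    return b0, b1, s2, se_b0, se_b1
```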
It is observed that $\sum_{i=1}^{n} (y_i - \hat{y}_i) = 0$, so $\sum_{i=1}^{n} e_i = 0$. In the light of this property, $e_i$ can be regarded as an estimate of the unknown $\varepsilon_i$ $(i = 1, \ldots, n)$. This helps in verifying the different model assumptions on the basis of the observed residuals. Further,
(i) $\sum_{i=1}^{n} x_i e_i = 0$,
(ii) $\sum_{i=1}^{n} \hat{y}_i e_i = 0$,
(iii) $\sum_{i=1}^{n} y_i = \sum_{i=1}^{n} \hat{y}_i$, and
(iv) the fitted line always passes through the point $(\bar{x}, \bar{y})$.
Centered Model:
Sometimes it is useful to measure the independent variable around its mean. In such a case, the model $y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$ has a centered version as follows:
$$y_i = \beta_0 + \beta_1 (x_i - \bar{x}) + \beta_1 \bar{x} + \varepsilon_i \quad (i = 1, 2, \ldots, n)$$
$$= \beta_0^* + \beta_1 (x_i - \bar{x}) + \varepsilon_i,$$
where $\beta_0^* = \beta_0 + \beta_1 \bar{x}$. The sum of squares to be minimized is now
$$S(\beta_0^*, \beta_1) = \sum_{i=1}^{n} \varepsilon_i^2 = \sum_{i=1}^{n} \left[ y_i - \beta_0^* - \beta_1 (x_i - \bar{x}) \right]^2.$$
Now solving
$$\frac{\partial S(\beta_0^*, \beta_1)}{\partial \beta_0^*} = 0, \qquad \frac{\partial S(\beta_0^*, \beta_1)}{\partial \beta_1} = 0,$$
we get the direct regression least squares estimates of $\beta_0^*$ and $\beta_1$ as
$$b_0^* = \bar{y} \quad \text{and} \quad b_1 = \frac{s_{xy}}{s_{xx}},$$
respectively.
Thus the form of the estimate of the slope parameter $\beta_1$ remains the same in the usual and centered models, whereas the form of the estimate of the intercept term changes between the two models.
Further, the Hessian matrix of the second order partial derivatives of $S(\beta_0^*, \beta_1)$ with respect to $\beta_0^*$ and $\beta_1$ is positive definite at $\beta_0^* = b_0^*$ and $\beta_1 = b_1$, which ensures that $S(\beta_0^*, \beta_1)$ is minimized at $\beta_0^* = b_0^*$ and $\beta_1 = b_1$.
Under the assumption that $E(\varepsilon_i) = 0$, $\text{Var}(\varepsilon_i) = \sigma^2$ and $\text{Cov}(\varepsilon_i, \varepsilon_j) = 0$ for all $i \neq j = 1, 2, \ldots, n$, it follows that
$$E(b_0^*) = \beta_0^*, \quad E(b_1) = \beta_1, \quad \text{Var}(b_0^*) = \frac{\sigma^2}{n}, \quad \text{Var}(b_1) = \frac{\sigma^2}{s_{xx}}.$$
In the centered form, the fitted model is
$$\hat{y} = \bar{y} + b_1 (x - \bar{x}).$$
No intercept term model:
Sometimes it is known from the subject matter that the regression line should pass through the origin, i.e., the model contains no intercept term and is of the form $y_i = \beta_1 x_i + \varepsilon_i$. For example, in analyzing the relationship between the velocity $(y)$ of a car and its acceleration $(X)$, the velocity is zero when the acceleration is zero.
Using the data $(x_i, y_i)$, $i = 1, 2, \ldots, n$, the direct regression least squares estimate of $\beta_1$ is obtained by minimizing
$$S(\beta_1) = \sum_{i=1}^{n} \varepsilon_i^2 = \sum_{i=1}^{n} (y_i - \beta_1 x_i)^2,$$
and solving
$$\frac{\partial S(\beta_1)}{\partial \beta_1} = 0$$
gives the estimator of $\beta_1$ as
$$b_1^* = \frac{\sum_{i=1}^{n} y_i x_i}{\sum_{i=1}^{n} x_i^2}.$$
The second order partial derivative of $S(\beta_1)$ with respect to $\beta_1$ at $\beta_1 = b_1^*$ is positive, which ensures that $b_1^*$ minimizes $S(\beta_1)$.
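A brief illustrative sketch (not from the notes) of the through-origin estimator, assuming NumPy arrays x and y:

```python
import numpy as np

def ols_through_origin(x, y):
    """Least squares slope for the no-intercept model y = b1*x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    b1_star = np.sum(y * x) / np.sum(x ** 2)   # b1* = sum(y_i x_i) / sum(x_i^2)
    return b1_star
```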
Using the assumption that $E(\varepsilon_i) = 0$, $\text{Var}(\varepsilon_i) = \sigma^2$ and $\text{Cov}(\varepsilon_i, \varepsilon_j) = 0$ for all $i \neq j = 1, 2, \ldots, n$, the properties of $b_1^*$ can be derived as follows:
$$E(b_1^*) = \frac{\sum_{i=1}^{n} x_i E(y_i)}{\sum_{i=1}^{n} x_i^2} = \frac{\beta_1 \sum_{i=1}^{n} x_i^2}{\sum_{i=1}^{n} x_i^2} = \beta_1,$$
so $b_1^*$ is an unbiased estimator of $\beta_1$. Its variance is
$$\text{Var}(b_1^*) = \frac{\sum_{i=1}^{n} x_i^2 \text{Var}(y_i)}{\left( \sum_{i=1}^{n} x_i^2 \right)^2} = \frac{\sigma^2 \sum_{i=1}^{n} x_i^2}{\left( \sum_{i=1}^{n} x_i^2 \right)^2} = \frac{\sigma^2}{\sum_{i=1}^{n} x_i^2}.$$
In this model, the estimator of $\sigma^2$ based on the residual sum of squares is
$$\hat{\sigma}^2 = \frac{\sum_{i=1}^{n} y_i^2 - b_1^* \sum_{i=1}^{n} y_i x_i}{n-1}.$$
Maximum likelihood estimation:
Assume that the errors $\varepsilon_i$ $(i = 1, 2, \ldots, n)$ are independent and follow the normal distribution $N(0, \sigma^2)$. Now we use the method of maximum likelihood to estimate the parameters of the linear regression model
$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i \quad (i = 1, 2, \ldots, n).$$
Under this assumption, the observations $y_i$ $(i = 1, 2, \ldots, n)$ are independently distributed as $N(\beta_0 + \beta_1 x_i, \sigma^2)$ for all $i = 1, 2, \ldots, n$.
The likelihood function of the given observations $(x_i, y_i)$ and unknown parameters $\beta_0$, $\beta_1$ and $\sigma^2$ is
$$L(x_i, y_i; \beta_0, \beta_1, \sigma^2) = \prod_{i=1}^{n} \left( \frac{1}{2\pi\sigma^2} \right)^{1/2} \exp\left[ -\frac{1}{2\sigma^2} (y_i - \beta_0 - \beta_1 x_i)^2 \right].$$
The maximum likelihood estimates of $\beta_0$, $\beta_1$ and $\sigma^2$ can be obtained by maximizing $L(x_i, y_i; \beta_0, \beta_1, \sigma^2)$ or, equivalently, $\ln L(x_i, y_i; \beta_0, \beta_1, \sigma^2)$, where
$$\ln L(x_i, y_i; \beta_0, \beta_1, \sigma^2) = -\frac{n}{2} \ln (2\pi) - \frac{n}{2} \ln \sigma^2 - \frac{1}{2\sigma^2} \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2.$$
The normal equations are obtained by partially differentiating the log-likelihood with respect to $\beta_0$, $\beta_1$ and $\sigma^2$ and equating the derivatives to zero as follows:
$$\frac{\partial \ln L(x_i, y_i; \beta_0, \beta_1, \sigma^2)}{\partial \beta_0} = \frac{1}{\sigma^2} \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i) = 0,$$
$$\frac{\partial \ln L(x_i, y_i; \beta_0, \beta_1, \sigma^2)}{\partial \beta_1} = \frac{1}{\sigma^2} \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i) x_i = 0$$
and
$$\frac{\partial \ln L(x_i, y_i; \beta_0, \beta_1, \sigma^2)}{\partial \sigma^2} = -\frac{n}{2\sigma^2} + \frac{1}{2\sigma^4} \sum_{i=1}^{n} (y_i - \beta_0 - \beta_1 x_i)^2 = 0.$$
The solution of these normal equations gives the maximum likelihood estimates of $\beta_0$, $\beta_1$ and $\sigma^2$ as
$$\tilde{b}_0 = \bar{y} - \tilde{b}_1 \bar{x},$$
$$\tilde{b}_1 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{s_{xy}}{s_{xx}}$$
and
$$\tilde{s}^2 = \frac{1}{n} \sum_{i=1}^{n} (y_i - \tilde{b}_0 - \tilde{b}_1 x_i)^2,$$
respectively.
It can be verified that the Hessian matrix of second order partial derivatives of $\ln L$ with respect to $\beta_0$, $\beta_1$ and $\sigma^2$ is negative definite at $\beta_0 = \tilde{b}_0$, $\beta_1 = \tilde{b}_1$ and $\sigma^2 = \tilde{s}^2$, which ensures that the likelihood function is maximized at these values.
Note that the least squares and maximum likelihood estimates of $\beta_0$ and $\beta_1$ are identical. The least squares and maximum likelihood estimates of $\sigma^2$ are different. In fact, the least squares estimate of $\sigma^2$ is
$$s^2 = \frac{1}{n-2} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2,$$
so that it is related to the maximum likelihood estimate as
$$\tilde{s}^2 = \frac{n-2}{n} s^2.$$
Thus $\tilde{b}_0$ and $\tilde{b}_1$ are unbiased estimators of $\beta_0$ and $\beta_1$, whereas $\tilde{s}^2$ is a biased estimate of $\sigma^2$, but it is asymptotically unbiased. The variances of $\tilde{b}_0$ and $\tilde{b}_1$ are the same as those of $b_0$ and $b_1$ respectively, but $\text{Var}(\tilde{s}^2) \neq \text{Var}(s^2)$.
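As a quick illustrative check (not part of the notes), the following Python snippet contrasts the two variance estimates from a set of fitted residuals; it simply encodes $s^2 = SS_{res}/(n-2)$ and $\tilde{s}^2 = SS_{res}/n$ and their relation $\tilde{s}^2 = \frac{n-2}{n} s^2$:

```python
import numpy as np

def variance_estimates(residuals):
    """Least squares (unbiased) and maximum likelihood estimates of sigma^2."""
    e = np.asarray(residuals, dtype=float)
    n = len(e)
    ss_res = np.sum(e ** 2)
    s2_ls = ss_res / (n - 2)       # least squares / unbiased estimate
    s2_ml = ss_res / n             # maximum likelihood estimate (biased)
    # relation: s2_ml == ((n - 2) / n) * s2_ls
    return s2_ls, s2_ml
```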
First we develop a test for the null hypothesis related to the slope parameter,
$$H_0: \beta_1 = \beta_{10}.$$
Assuming $\sigma^2$ to be known, we use the results that $E(b_1) = \beta_1$, $\text{Var}(b_1) = \dfrac{\sigma^2}{s_{xx}}$ and that $b_1$ is a linear combination of normally distributed random variables, so
$$b_1 \sim N\left( \beta_1, \frac{\sigma^2}{s_{xx}} \right).$$
Thus the test statistic under $H_0$ is
$$Z_1 = \frac{b_1 - \beta_{10}}{\sqrt{\sigma^2 / s_{xx}}} \sim N(0, 1).$$
Reject $H_0$ if $|Z_1| > z_{\alpha/2}$. Similarly, the decision rule for a one-sided alternative hypothesis can also be framed.
The 100(1 − α)% confidence interval for $\beta_1$ can be obtained using the $Z_1$ statistic as follows:
$$P\left[ -z_{\alpha/2} \leq Z_1 \leq z_{\alpha/2} \right] = 1 - \alpha$$
$$P\left[ -z_{\alpha/2} \leq \frac{b_1 - \beta_1}{\sqrt{\sigma^2 / s_{xx}}} \leq z_{\alpha/2} \right] = 1 - \alpha$$
$$P\left[ b_1 - z_{\alpha/2} \sqrt{\frac{\sigma^2}{s_{xx}}} \leq \beta_1 \leq b_1 + z_{\alpha/2} \sqrt{\frac{\sigma^2}{s_{xx}}} \right] = 1 - \alpha.$$
So the 100(1 − α)% confidence interval for $\beta_1$ is
$$\left[ b_1 - z_{\alpha/2} \sqrt{\frac{\sigma^2}{s_{xx}}},\; b_1 + z_{\alpha/2} \sqrt{\frac{\sigma^2}{s_{xx}}} \right].$$
When $\sigma^2$ is unknown, it is replaced by its estimate $\hat{\sigma}^2 = SS_{res}/(n-2)$ and the test statistic becomes
$$t_0 = \frac{b_1 - \beta_{10}}{\sqrt{\hat{\sigma}^2 / s_{xx}}},$$
which follows a $t$-distribution with $(n-2)$ degrees of freedom when $H_0$ is true. The decision rule is to reject $H_0$ if $|t_0| > t_{n-2, \alpha/2}$, where $t_{n-2, \alpha/2}$ is the $\alpha/2$ percentage point of the $t$-distribution with $(n-2)$ degrees of freedom. Similarly, the decision rule for a one-sided alternative hypothesis can also be framed.
The 100(1 − α)% confidence interval for $\beta_1$ can be obtained using the $t_0$ statistic as follows.
Consider
$$P\left[ -t_{\alpha/2} \leq t_0 \leq t_{\alpha/2} \right] = 1 - \alpha$$
$$P\left[ -t_{\alpha/2} \leq \frac{b_1 - \beta_1}{\sqrt{\hat{\sigma}^2 / s_{xx}}} \leq t_{\alpha/2} \right] = 1 - \alpha$$
$$P\left[ b_1 - t_{\alpha/2} \sqrt{\frac{\hat{\sigma}^2}{s_{xx}}} \leq \beta_1 \leq b_1 + t_{\alpha/2} \sqrt{\frac{\hat{\sigma}^2}{s_{xx}}} \right] = 1 - \alpha.$$
So the 100(1 − α)% confidence interval for $\beta_1$ is
$$\left[ b_1 - t_{n-2, \alpha/2} \sqrt{\frac{SS_{res}}{(n-2) s_{xx}}},\; b_1 + t_{n-2, \alpha/2} \sqrt{\frac{SS_{res}}{(n-2) s_{xx}}} \right].$$
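A hypothetical Python helper (not from the notes) that computes this t-based confidence interval for $\beta_1$, relying on scipy.stats.t for the percentage point:

```python
import numpy as np
from scipy import stats

def ci_slope(x, y, alpha=0.05):
    """100(1 - alpha)% t-based confidence interval for the slope beta_1."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    xbar, ybar = x.mean(), y.mean()
    sxx = np.sum((x - xbar) ** 2)
    b1 = np.sum((x - xbar) * (y - ybar)) / sxx
    b0 = ybar - b1 * xbar
    ss_res = np.sum((y - b0 - b1 * x) ** 2)
    se_b1 = np.sqrt(ss_res / ((n - 2) * sxx))
    tcrit = stats.t.ppf(1 - alpha / 2, df=n - 2)
    return b1 - tcrit * se_b1, b1 + tcrit * se_b1
```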
Now consider the test of $H_0: \beta_0 = \beta_{00}$ when $\sigma^2$ is known. Using the results that $E(b_0) = \beta_0$, $\text{Var}(b_0) = \sigma^2 \left( \dfrac{1}{n} + \dfrac{\bar{x}^2}{s_{xx}} \right)$ and that $b_0$ is a linear combination of normally distributed random variables, the statistic
$$Z_0 = \frac{b_0 - \beta_{00}}{\sqrt{\sigma^2 \left( \dfrac{1}{n} + \dfrac{\bar{x}^2}{s_{xx}} \right)}}$$
follows the $N(0, 1)$ distribution when $H_0$ is true. Reject $H_0$ if $|Z_0| > z_{\alpha/2}$, where $z_{\alpha/2}$ is the $\alpha/2$ percentage point of the standard normal distribution. Similarly, the decision rule for a one-sided alternative hypothesis can also be framed.
The 100(1 − α)% confidence interval for $\beta_0$ when $\sigma^2$ is known can be derived using the $Z_0$ statistic as follows:
$$P\left[ -z_{\alpha/2} \leq Z_0 \leq z_{\alpha/2} \right] = 1 - \alpha$$
$$P\left[ -z_{\alpha/2} \leq \frac{b_0 - \beta_0}{\sqrt{\sigma^2 \left( \dfrac{1}{n} + \dfrac{\bar{x}^2}{s_{xx}} \right)}} \leq z_{\alpha/2} \right] = 1 - \alpha$$
$$P\left[ b_0 - z_{\alpha/2} \sqrt{\sigma^2 \left( \frac{1}{n} + \frac{\bar{x}^2}{s_{xx}} \right)} \leq \beta_0 \leq b_0 + z_{\alpha/2} \sqrt{\sigma^2 \left( \frac{1}{n} + \frac{\bar{x}^2}{s_{xx}} \right)} \right] = 1 - \alpha.$$
So the 100(1 − α)% confidence interval for $\beta_0$ is
$$\left[ b_0 - z_{\alpha/2} \sqrt{\sigma^2 \left( \frac{1}{n} + \frac{\bar{x}^2}{s_{xx}} \right)},\; b_0 + z_{\alpha/2} \sqrt{\sigma^2 \left( \frac{1}{n} + \frac{\bar{x}^2}{s_{xx}} \right)} \right].$$
When $\sigma^2$ is unknown, the test statistic becomes
$$t_0 = \frac{b_0 - \beta_{00}}{\sqrt{\dfrac{SS_{res}}{n-2} \left( \dfrac{1}{n} + \dfrac{\bar{x}^2}{s_{xx}} \right)}},$$
which follows a $t$-distribution with $(n-2)$ degrees of freedom when $H_0$ is true. Reject $H_0$ if $|t_0| > t_{n-2, \alpha/2}$, where $t_{n-2, \alpha/2}$ is the $\alpha/2$ percentage point of the $t$-distribution with $(n-2)$ degrees of freedom. Similarly, the decision rule for a one-sided alternative hypothesis can also be framed. The 100(1 − α)% confidence interval for $\beta_0$ in this case is obtained as follows.
Consider
$$P\left[ -t_{n-2, \alpha/2} \leq t_0 \leq t_{n-2, \alpha/2} \right] = 1 - \alpha$$
$$P\left[ -t_{n-2, \alpha/2} \leq \frac{b_0 - \beta_0}{\sqrt{\dfrac{SS_{res}}{n-2} \left( \dfrac{1}{n} + \dfrac{\bar{x}^2}{s_{xx}} \right)}} \leq t_{n-2, \alpha/2} \right] = 1 - \alpha$$
$$P\left[ b_0 - t_{n-2, \alpha/2} \sqrt{\frac{SS_{res}}{n-2} \left( \frac{1}{n} + \frac{\bar{x}^2}{s_{xx}} \right)} \leq \beta_0 \leq b_0 + t_{n-2, \alpha/2} \sqrt{\frac{SS_{res}}{n-2} \left( \frac{1}{n} + \frac{\bar{x}^2}{s_{xx}} \right)} \right] = 1 - \alpha.$$
So the 100(1 − α)% confidence interval for $\beta_0$ is
$$\left[ b_0 - t_{n-2, \alpha/2} \sqrt{\frac{SS_{res}}{n-2} \left( \frac{1}{n} + \frac{\bar{x}^2}{s_{xx}} \right)},\; b_0 + t_{n-2, \alpha/2} \sqrt{\frac{SS_{res}}{n-2} \left( \frac{1}{n} + \frac{\bar{x}^2}{s_{xx}} \right)} \right].$$
A confidence interval for $\sigma^2$ can also be derived using the fact that $SS_{res}/\sigma^2 \sim \chi^2_{n-2}$. Thus
$$P\left[ \chi^2_{n-2, \alpha/2} \leq \frac{SS_{res}}{\sigma^2} \leq \chi^2_{n-2, 1-\alpha/2} \right] = 1 - \alpha$$
$$P\left[ \frac{SS_{res}}{\chi^2_{n-2, 1-\alpha/2}} \leq \sigma^2 \leq \frac{SS_{res}}{\chi^2_{n-2, \alpha/2}} \right] = 1 - \alpha.$$
The corresponding 100(1 − α)% confidence interval for $\sigma^2$ is
$$\left[ \frac{SS_{res}}{\chi^2_{n-2, 1-\alpha/2}},\; \frac{SS_{res}}{\chi^2_{n-2, \alpha/2}} \right].$$
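A short illustrative Python function (assumed names, not from the notes) for this chi-square based interval, using scipy.stats.chi2.ppf:

```python
import numpy as np
from scipy import stats

def ci_sigma2(residuals, alpha=0.05):
    """100(1 - alpha)% confidence interval for sigma^2 from the residuals."""
    e = np.asarray(residuals, dtype=float)
    n = len(e)
    ss_res = np.sum(e ** 2)
    lower = ss_res / stats.chi2.ppf(1 - alpha / 2, df=n - 2)
    upper = ss_res / stats.chi2.ppf(alpha / 2, df=n - 2)
    return lower, upper
```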
Joint confidence region for $\beta_0$ and $\beta_1$:
A joint confidence region for $\beta_0$ and $\beta_1$ can also be constructed so that we have 100(1 − α)% confidence that both the estimates of $\beta_0$ and $\beta_1$ are simultaneously correct. Consider the centered version of the linear regression model,
$$y_i = \beta_0^* + \beta_1 (x_i - \bar{x}) + \varepsilon_i,$$
for which the ordinary least squares estimators of $\beta_0^*$ and $\beta_1$ are
$$b_0^* = \bar{y} \quad \text{and} \quad b_1 = \frac{s_{xy}}{s_{xx}},$$
respectively. The quantities
$$\frac{n (b_0^* - \beta_0^*)^2}{\sigma^2} \quad \text{and} \quad \frac{s_{xx} (b_1 - \beta_1)^2}{\sigma^2}$$
each follow a $\chi^2_1$ distribution and are also independently distributed because $b_0^*$ and $b_1$ are independently distributed. Consequently, the sum of these two,
$$\frac{n (b_0^* - \beta_0^*)^2}{\sigma^2} + \frac{s_{xx} (b_1 - \beta_1)^2}{\sigma^2} \sim \chi^2_2.$$
Since
$$\frac{SS_{res}}{\sigma^2} \sim \chi^2_{n-2}$$
and $SS_{res}$ is distributed independently of $b_0^*$ and $b_1$, the ratio
$$\frac{n-2}{2} \cdot \frac{Q_f}{SS_{res}} \sim F_{2, n-2},$$
where
$$Q_f = n (b_0 - \beta_0)^2 + 2 \sum_{i=1}^{n} x_i (b_0 - \beta_0)(b_1 - \beta_1) + \sum_{i=1}^{n} x_i^2 (b_1 - \beta_1)^2.$$
Since
$$P\left[ \frac{n-2}{2} \cdot \frac{Q_f}{SS_{res}} \leq F_{2, n-2; 1-\alpha} \right] = 1 - \alpha$$
holds true for all values of $\beta_0$ and $\beta_1$, the 100(1 − α)% confidence region for $\beta_0$ and $\beta_1$ is
$$\frac{n-2}{2} \cdot \frac{Q_f}{SS_{res}} \leq F_{2, n-2; 1-\alpha}.$$
This confidence region is an ellipse which gives the 100(1 − α)% probability that $\beta_0$ and $\beta_1$ are contained within it.
Analysis of variance:
The technique of analysis of variance is usually used for testing hypotheses related to the equality of more than one parameter, such as population means or slope parameters. It is more meaningful in the case of the multiple regression model, where there is more than one slope parameter. The technique is discussed and illustrated here to convey the related basic concepts and fundamentals, which will be used in developing the analysis of variance in the next module on the multiple linear regression model, where there is more than one explanatory variable.
A test statistic for testing $H_0: \beta_1 = 0$ can also be formulated using the analysis of variance technique as follows.
Further consider
$$\sum_{i=1}^{n} (y_i - \bar{y})(\hat{y}_i - \bar{y}) = \sum_{i=1}^{n} (y_i - \bar{y}) b_1 (x_i - \bar{x}) = b_1^2 \sum_{i=1}^{n} (x_i - \bar{x})^2 = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2.$$
Thus we have the decomposition
$$\sum_{i=1}^{n} (y_i - \bar{y})^2 = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2 + \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2.$$
The term $\sum_{i=1}^{n} (y_i - \bar{y})^2$ is called the sum of squares about the mean, or the corrected sum of squares of $y$ (i.e., $SS_{corrected}$).
The term $\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$ describes the deviation of the observations from the predicted values, viz., the residual sum of squares
$$SS_{res} = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2,$$
whereas the term $\sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2$ describes the proportion of variability explained by the regression,
$$SS_{reg} = \sum_{i=1}^{n} (\hat{y}_i - \bar{y})^2.$$
If all observations $y_i$ are located on a straight line, then $\sum_{i=1}^{n} (y_i - \hat{y}_i)^2 = 0$ and thus $SS_{corrected} = SS_{reg}$.
Note that $SS_{reg}$ is completely determined by $b_1$ and so has only one degree of freedom. The total sum of squares $s_{yy} = \sum_{i=1}^{n} (y_i - \bar{y})^2$ has $(n-1)$ degrees of freedom due to the constraint $\sum_{i=1}^{n} (y_i - \bar{y}) = 0$, and $SS_{res}$ has $(n-2)$ degrees of freedom.
All sums of squares are mutually independent and distributed as $\chi^2_{df}$ with $df$ degrees of freedom if the errors are normally distributed. The mean square due to regression is $MS_{reg} = SS_{reg}/1$ and the mean squared error is $MSE = SS_{res}/(n-2)$. The test statistic is
$$F_0 = \frac{MS_{reg}}{MSE}.$$
If $H_0: \beta_1 = 0$ is true, then $MS_{reg}$ and $MSE$ are independently distributed and thus
$$F_0 \sim F_{1, n-2}.$$
The decision rule is to reject $H_0$ if
$$F_0 > F_{1, n-2; 1-\alpha}$$
at the $\alpha$ level of significance. The test procedure can be described in the following analysis of variance table:

Source of variation     Sum of squares     Degrees of freedom     Mean square
Regression              $SS_{reg}$         $1$                    $MS_{reg}$
Residual                $SS_{res}$         $n-2$                  $MSE$
Total                   $s_{yy}$           $n-1$

with $F_0 = MS_{reg}/MSE$.
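A hypothetical Python sketch (names assumed, not from the notes) that assembles this ANOVA decomposition and the F statistic for $H_0: \beta_1 = 0$, with the p-value taken from scipy.stats.f:

```python
import numpy as np
from scipy import stats

def anova_simple_regression(x, y):
    """ANOVA F test of H0: beta_1 = 0 in simple linear regression."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    xbar, ybar = x.mean(), y.mean()
    sxx = np.sum((x - xbar) ** 2)
    b1 = np.sum((x - xbar) * (y - ybar)) / sxx
    b0 = ybar - b1 * xbar
    yhat = b0 + b1 * x
    ss_reg = np.sum((yhat - ybar) ** 2)      # SS_reg, 1 df
    ss_res = np.sum((y - yhat) ** 2)         # SS_res, n-2 df
    ms_reg = ss_reg / 1.0
    mse = ss_res / (n - 2)
    f0 = ms_reg / mse
    p_value = stats.f.sf(f0, 1, n - 2)       # P(F_{1, n-2} > F0)
    return f0, p_value
```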
Moreover, we have
$$b_1 = \frac{s_{xy}}{s_{xx}} = r_{xy} \sqrt{\frac{s_{yy}}{s_{xx}}}$$
and
$$SS_{reg} = s_{yy} - SS_{res} = \frac{s_{xy}^2}{s_{xx}} = b_1^2 s_{xx} = b_1 s_{xy}.$$
Goodness of fit:
Since the residuals measure the departure of the observations from the fitted line, a measure of the quality of the fitted model can be based on $SS_{res}$. When an intercept term is present in the model, a measure of goodness of fit of the model is given by
$$R^2 = 1 - \frac{SS_{res}}{s_{yy}} = \frac{SS_{reg}}{s_{yy}}.$$
This is known as the coefficient of determination. This measure is based on the idea that the total variation in the $y$'s, stated by $s_{yy}$, splits into a part explained by $SS_{reg}$ and an unexplained part contained in $SS_{res}$. The ratio $SS_{reg}/s_{yy}$ describes the proportion of variability that is explained by the regression in relation to the total variability of $y$. The ratio $SS_{res}/s_{yy}$ describes the proportion of variability that is not covered by the regression.
It can also be seen that $R^2 = r_{xy}^2$, where $r_{xy}$ is the simple correlation coefficient between $x$ and $y$. Clearly $0 \leq R^2 \leq 1$, so a value of $R^2$ closer to one indicates a better fit and a value of $R^2$ closer to zero indicates a poor fit.
Prediction of the average value of $y$:
Suppose we want to predict the value of $E(y)$ for a given value of $x = x_0$. Then the predictor is given by
$$\hat{E}(y \mid x_0) = \hat{\mu}_{y|x_0} = b_0 + b_1 x_0.$$
Predictive bias:
The prediction error is given by
$$\hat{\mu}_{y|x_0} - E(y \mid x_0) = b_0 + b_1 x_0 - E(\beta_0 + \beta_1 x_0) = b_0 + b_1 x_0 - (\beta_0 + \beta_1 x_0) = (b_0 - \beta_0) + (b_1 - \beta_1) x_0.$$
Then
$$E\left[ \hat{\mu}_{y|x_0} - E(y \mid x_0) \right] = E(b_0 - \beta_0) + E(b_1 - \beta_1) x_0 = 0 + 0 = 0.$$
Thus the predictor $\hat{\mu}_{y|x_0}$ is an unbiased predictor of $E(y \mid x_0)$.
Predictive variance:
The predictive variance of $\hat{\mu}_{y|x_0}$ is
$$PV(\hat{\mu}_{y|x_0}) = \text{Var}(b_0 + b_1 x_0) = \text{Var}\left[ \bar{y} + b_1 (x_0 - \bar{x}) \right]$$
$$= \text{Var}(\bar{y}) + (x_0 - \bar{x})^2 \text{Var}(b_1) + 2 (x_0 - \bar{x}) \text{Cov}(\bar{y}, b_1)$$
$$= \frac{\sigma^2}{n} + \frac{\sigma^2 (x_0 - \bar{x})^2}{s_{xx}} + 0$$
$$= \sigma^2 \left[ \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{s_{xx}} \right].$$
The estimate of the predictive variance is obtained by replacing $\sigma^2$ by $\hat{\sigma}^2 = MSE$:
$$\widehat{PV}(\hat{\mu}_{y|x_0}) = \hat{\sigma}^2 \left[ \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{s_{xx}} \right] = MSE \left[ \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{s_{xx}} \right].$$
The predictor $\hat{\mu}_{y|x_0}$ is a linear combination of normally distributed random variables, so it is also normally distributed as
$$\hat{\mu}_{y|x_0} \sim N\left( \beta_0 + \beta_1 x_0,\; PV(\hat{\mu}_{y|x_0}) \right).$$
So if $\sigma^2$ is known, then the distribution of
$$\frac{\hat{\mu}_{y|x_0} - E(y \mid x_0)}{\sqrt{PV(\hat{\mu}_{y|x_0})}}$$
is $N(0, 1)$. The 100(1 − α)% confidence interval for $E(y \mid x_0)$ is then obtained from
$$P\left[ -z_{\alpha/2} \leq \frac{\hat{\mu}_{y|x_0} - E(y \mid x_0)}{\sqrt{PV(\hat{\mu}_{y|x_0})}} \leq z_{\alpha/2} \right] = 1 - \alpha$$
as
$$\left[ \hat{\mu}_{y|x_0} - z_{\alpha/2} \sqrt{\sigma^2 \left( \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{s_{xx}} \right)},\; \hat{\mu}_{y|x_0} + z_{\alpha/2} \sqrt{\sigma^2 \left( \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{s_{xx}} \right)} \right].$$
When $\sigma^2$ is unknown, it is replaced by $\hat{\sigma}^2 = MSE$, and in this case the sampling distribution of
$$\frac{\hat{\mu}_{y|x_0} - E(y \mid x_0)}{\sqrt{MSE \left( \dfrac{1}{n} + \dfrac{(x_0 - \bar{x})^2}{s_{xx}} \right)}}$$
is the $t$-distribution with $(n-2)$ degrees of freedom, and the corresponding 100(1 − α)% confidence interval for $E(y \mid x_0)$ is obtained by replacing $z_{\alpha/2}$ with $t_{n-2, \alpha/2}$ and $\sigma^2$ with $MSE$ in the interval above.
Note that the width of the confidence interval for $E(y \mid x_0)$ is a function of $x_0$. The interval width is minimum for $x_0 = \bar{x}$ and widens as $|x_0 - \bar{x}|$ increases. This is expected, because the best estimates of $y$ are made at $x$-values lying near the center of the data, and the precision of estimation deteriorates as we move towards the boundary of the $x$-space.
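A hedged Python sketch (assumed helper, not from the notes) of the t-based interval for the mean response $E(y \mid x_0)$:

```python
import numpy as np
from scipy import stats

def ci_mean_response(x, y, x0, alpha=0.05):
    """100(1 - alpha)% confidence interval for E(y | x0)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    xbar, ybar = x.mean(), y.mean()
    sxx = np.sum((x - xbar) ** 2)
    b1 = np.sum((x - xbar) * (y - ybar)) / sxx
    b0 = ybar - b1 * xbar
    mse = np.sum((y - b0 - b1 * x) ** 2) / (n - 2)
    mu_hat = b0 + b1 * x0
    half_width = stats.t.ppf(1 - alpha / 2, n - 2) * np.sqrt(
        mse * (1.0 / n + (x0 - xbar) ** 2 / sxx))
    return mu_hat - half_width, mu_hat + half_width
```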
Prediction of an actual value of $y$:
When the objective is to predict an actual value of the study variable at $x = x_0$ (rather than its mean), the predictor is
$$\hat{y}_0 = b_0 + b_1 x_0.$$
The true value of $y$ in the prediction period is given by $y_0 = \beta_0 + \beta_1 x_0 + \varepsilon_0$, where $\varepsilon_0$ indicates the value that would be drawn from the distribution of the random error in the prediction period. Note that the form of the predictor is the same as that of the average value predictor, but its prediction error and other properties are different. This is the dual nature of the predictor.
Predictive bias:
The prediction error of $\hat{y}_0$ is given by
$$\hat{y}_0 - y_0 = b_0 + b_1 x_0 - (\beta_0 + \beta_1 x_0 + \varepsilon_0) = (b_0 - \beta_0) + (b_1 - \beta_1) x_0 - \varepsilon_0.$$
Thus, we find that
$$E(\hat{y}_0 - y_0) = E(b_0 - \beta_0) + E(b_1 - \beta_1) x_0 - E(\varepsilon_0) = 0 + 0 - 0 = 0,$$
so $\hat{y}_0$ is an unbiased predictor of $y_0$.
Predictive variance:
Because the future observation $y_0$ is independent of $\hat{y}_0$, the predictive variance of $\hat{y}_0$ is
$$PV(\hat{y}_0) = E(\hat{y}_0 - y_0)^2 = E\left[ (b_0 - \beta_0) + (x_0 - \bar{x})(b_1 - \beta_1) + \bar{x}(b_1 - \beta_1) - \varepsilon_0 \right]^2$$
$$= \text{Var}(b_0) + (x_0 - \bar{x})^2 \text{Var}(b_1) + \bar{x}^2 \text{Var}(b_1) + \text{Var}(\varepsilon_0) + 2 (x_0 - \bar{x}) \text{Cov}(b_0, b_1) + 2 \bar{x} \text{Cov}(b_0, b_1) + 2 (x_0 - \bar{x}) \bar{x} \text{Var}(b_1)$$
[the rest of the terms are 0, assuming the independence of $\varepsilon_0$ with $\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n$]
$$= \text{Var}(b_0) + \left[ (x_0 - \bar{x})^2 + \bar{x}^2 + 2 (x_0 - \bar{x}) \bar{x} \right] \text{Var}(b_1) + \text{Var}(\varepsilon_0) + 2 \left[ (x_0 - \bar{x}) + \bar{x} \right] \text{Cov}(b_0, b_1)$$
$$= \text{Var}(b_0) + x_0^2 \text{Var}(b_1) + \text{Var}(\varepsilon_0) + 2 x_0 \text{Cov}(b_0, b_1)$$
$$= \sigma^2 \left( \frac{1}{n} + \frac{\bar{x}^2}{s_{xx}} \right) + \frac{\sigma^2 x_0^2}{s_{xx}} + \sigma^2 - \frac{2 x_0 \bar{x} \sigma^2}{s_{xx}}$$
$$= \sigma^2 \left[ 1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{s_{xx}} \right].$$
Prediction interval:
If $\sigma^2$ is known, then the distribution of
$$\frac{\hat{y}_0 - y_0}{\sqrt{PV(\hat{y}_0)}}$$
is $N(0, 1)$. Therefore
$$P\left[ -z_{\alpha/2} \leq \frac{\hat{y}_0 - y_0}{\sqrt{PV(\hat{y}_0)}} \leq z_{\alpha/2} \right] = 1 - \alpha,$$
which gives the prediction interval for $y_0$ as
$$\left[ \hat{y}_0 - z_{\alpha/2} \sqrt{\sigma^2 \left( 1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{s_{xx}} \right)},\; \hat{y}_0 + z_{\alpha/2} \sqrt{\sigma^2 \left( 1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{s_{xx}} \right)} \right].$$
When $\sigma^2$ is unknown, it is replaced by $\hat{\sigma}^2 = MSE$, and then
$$\frac{\hat{y}_0 - y_0}{\sqrt{\widehat{PV}(\hat{y}_0)}}$$
follows a $t$-distribution with $(n-2)$ degrees of freedom. The 100(1 − α)% prediction interval for $y_0$ in this case is obtained from
$$P\left[ -t_{\alpha/2, n-2} \leq \frac{\hat{y}_0 - y_0}{\sqrt{\widehat{PV}(\hat{y}_0)}} \leq t_{\alpha/2, n-2} \right] = 1 - \alpha,$$
which gives the prediction interval
$$\left[ \hat{y}_0 - t_{\alpha/2, n-2} \sqrt{MSE \left( 1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{s_{xx}} \right)},\; \hat{y}_0 + t_{\alpha/2, n-2} \sqrt{MSE \left( 1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{s_{xx}} \right)} \right].$$
The prediction interval for $y_0$ is wider than the confidence interval for $E(y \mid x_0)$ because it depends on both the error from the fitted model as well as the error associated with the future observation.
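And an analogous illustrative sketch (assumed helper, not from the notes) for the prediction interval of a future observation $y_0$ at $x_0$; the only change from the mean-response interval is the extra "1 +" inside the variance term:

```python
import numpy as np
from scipy import stats

def prediction_interval(x, y, x0, alpha=0.05):
    """100(1 - alpha)% prediction interval for a new observation y0 at x0."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(x)
    xbar, ybar = x.mean(), y.mean()
    sxx = np.sum((x - xbar) ** 2)
    b1 = np.sum((x - xbar) * (y - ybar)) / sxx
    b0 = ybar - b1 * xbar
    mse = np.sum((y - b0 - b1 * x) ** 2) / (n - 2)
    y0_hat = b0 + b1 * x0
    half_width = stats.t.ppf(1 - alpha / 2, n - 2) * np.sqrt(
        mse * (1.0 + 1.0 / n + (x0 - xbar) ** 2 / sxx))
    return y0_hat - half_width, y0_hat + half_width
```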
Reverse regression method:
[Figure: Reverse regression — horizontal deviations of the observations $(x_i, y_i)$ from the line $Y = \beta_0 + \beta_1 X$.]
The reverse regression has been advocated in the analysis of gender (or race) discrimination in salaries. For example, if $y$ denotes salary and $x$ denotes qualifications, and we are interested in determining whether there is gender discrimination in salaries, we can ask:
"Whether men and women with the same qualifications (value of $x$) are getting the same salaries (value of $y$)?" This question is answered by the direct regression.
In reverse regression, the model is expressed as
$$x_i = \beta_0^* + \beta_1^* y_i + \varepsilon_i \quad (i = 1, 2, \ldots, n),$$
where the $\varepsilon_i$'s are the associated random error components and satisfy the assumptions as in the case of the usual simple linear regression model. The reverse regression estimates $\hat{\beta}_{0R}$ of $\beta_0^*$ and $\hat{\beta}_{1R}$ of $\beta_1^*$ for this model are obtained by interchanging the $x$ and $y$ in the direct regression estimators of $\beta_0$ and $\beta_1$. The estimates are obtained as
$$\hat{\beta}_{0R} = \bar{x} - \hat{\beta}_{1R} \bar{y}$$
and
$$\hat{\beta}_{1R} = \frac{s_{xy}}{s_{yy}}$$
for $\beta_0^*$ and $\beta_1^*$ respectively. The residual sum of squares in this case is
$$SS_{res}^* = s_{xx} - \frac{s_{xy}^2}{s_{yy}}.$$
Note that
$$\hat{\beta}_{1R} \, b_1 = \frac{s_{xy}^2}{s_{xx} s_{yy}} = r_{xy}^2,$$
where $b_1$ is the direct regression estimator of the slope parameter and $r_{xy}$ is the correlation coefficient between $x$ and $y$. Hence if $r_{xy}^2$ is close to one, the two regression lines will be close to each other.
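An illustrative Python sketch (not from the notes) of the reverse regression fit obtained by interchanging x and y, together with the product relation between the two slope estimates:

```python
import numpy as np

def reverse_regression(x, y):
    """Reverse regression: fit x = b0R + b1R * y by least squares."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    sxy = np.sum((x - x.mean()) * (y - y.mean()))
    syy = np.sum((y - y.mean()) ** 2)
    b1R = sxy / syy                     # slope of x on y
    b0R = x.mean() - b1R * y.mean()
    return b0R, b1R

# The product of the direct slope (s_xy/s_xx) and b1R equals r_xy**2.
```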
Orthogonal regression (major axis regression) method:
[Figure: Major axis regression — perpendicular distances of the observations $(x_i, y_i)$ from the line $Y = \beta_0 + \beta_1 X$, with corresponding points $(X_i, Y_i)$ on the line.]
If we assume that the regression line to be fitted is $Y_i = \beta_0 + \beta_1 X_i$, then it is expected that all the observations $(x_i, y_i)$, $i = 1, 2, \ldots, n$, lie on this line. But these points deviate from the line, and in such a case the squared perpendicular distance of the observed data point $(x_i, y_i)$ $(i = 1, 2, \ldots, n)$ from the line is given by
$$d_i^2 = (X_i - x_i)^2 + (Y_i - y_i)^2,$$
where $(X_i, Y_i)$ denotes the $i$th pair of observations without any error which lie on the line.
The objective is to minimize the sum of squared perpendicular distances $\sum_{i=1}^{n} d_i^2$ to obtain the estimates of $\beta_0$ and $\beta_1$. The observations $(x_i, y_i)$ $(i = 1, 2, \ldots, n)$ are expected to lie on the line
$$Y_i = \beta_0 + \beta_1 X_i,$$
so let
$$E_i = Y_i - \beta_0 - \beta_1 X_i = 0.$$
The regression coefficients are obtained by minimizing $\sum_{i=1}^{n} d_i^2$ under the constraints $E_i$'s using the Lagrangian multiplier method. The Lagrangian function is
$$L_0 = \sum_{i=1}^{n} d_i^2 - 2 \sum_{i=1}^{n} \lambda_i E_i,$$
where $\lambda_1, \ldots, \lambda_n$ are the Lagrangian multipliers. The set of equations is obtained by setting
$$\frac{\partial L_0}{\partial X_i} = 0, \quad \frac{\partial L_0}{\partial Y_i} = 0, \quad \frac{\partial L_0}{\partial \beta_0} = 0 \quad \text{and} \quad \frac{\partial L_0}{\partial \beta_1} = 0 \quad (i = 1, 2, \ldots, n).$$
Thus we find
$$\frac{\partial L_0}{\partial X_i} = (X_i - x_i) + \lambda_i \beta_1 = 0,$$
$$\frac{\partial L_0}{\partial Y_i} = (Y_i - y_i) - \lambda_i = 0,$$
$$\frac{\partial L_0}{\partial \beta_0} = \sum_{i=1}^{n} \lambda_i = 0,$$
$$\frac{\partial L_0}{\partial \beta_1} = \sum_{i=1}^{n} \lambda_i X_i = 0.$$
Since
$$X_i = x_i - \lambda_i \beta_1, \qquad Y_i = y_i + \lambda_i,$$
substituting these in $E_i = 0$ gives
$$E_i = (y_i + \lambda_i) - \beta_0 - \beta_1 (x_i - \lambda_i \beta_1) = 0$$
$$\Rightarrow \lambda_i = \frac{\beta_0 + \beta_1 x_i - y_i}{1 + \beta_1^2}.$$
Also, using this $\lambda_i$ in the equation $\sum_{i=1}^{n} \lambda_i = 0$, we get
$$\sum_{i=1}^{n} \frac{\beta_0 + \beta_1 x_i - y_i}{1 + \beta_1^2} = 0,$$
and using $(X_i - x_i) + \lambda_i \beta_1 = 0$ and $\sum_{i=1}^{n} \lambda_i X_i = 0$, we get
$$\sum_{i=1}^{n} \lambda_i (x_i - \lambda_i \beta_1) = 0,$$
i.e.,
$$\sum_{i=1}^{n} \left[ \frac{(\beta_0 + \beta_1 x_i - y_i) x_i}{1 + \beta_1^2} - \frac{\beta_1 (\beta_0 + \beta_1 x_i - y_i)^2}{(1 + \beta_1^2)^2} \right] = 0. \qquad (1)$$
Solving the first of these equations, $\sum_{i=1}^{n} (\beta_0 + \beta_1 x_i - y_i)/(1 + \beta_1^2) = 0$, gives the orthogonal regression estimate of $\beta_0$ as
$$\hat{\beta}_{0OR} = \bar{y} - \hat{\beta}_{1OR} \bar{x},$$
where $\hat{\beta}_{1OR}$ is the orthogonal regression estimate of $\beta_1$.
Now, substituting $\hat{\beta}_{0OR} = \bar{y} - \hat{\beta}_{1OR} \bar{x}$ in equation (1) and simplifying, we get
$$(1 + \beta_1^2) \sum_{i=1}^{n} \left[ y_i - \bar{y} - \beta_1 (x_i - \bar{x}) \right] x_i + \beta_1 \sum_{i=1}^{n} \left[ (y_i - \bar{y}) - \beta_1 (x_i - \bar{x}) \right]^2 = 0$$
or
$$(1 + \beta_1^2) \sum_{i=1}^{n} (u_i + \bar{x})(v_i - \beta_1 u_i) + \beta_1 \sum_{i=1}^{n} (v_i - \beta_1 u_i)^2 = 0,$$
where $u_i = x_i - \bar{x}$ and $v_i = y_i - \bar{y}$.
Since $\sum_{i=1}^{n} u_i = \sum_{i=1}^{n} v_i = 0$, this reduces to
$$\beta_1^2 \sum_{i=1}^{n} u_i v_i + \beta_1 \sum_{i=1}^{n} (u_i^2 - v_i^2) - \sum_{i=1}^{n} u_i v_i = 0$$
or
$$\beta_1^2 s_{xy} + \beta_1 (s_{xx} - s_{yy}) - s_{xy} = 0.$$
Solving this quadratic equation gives
$$\hat{\beta}_{1OR} = \frac{(s_{yy} - s_{xx}) \pm \sqrt{(s_{yy} - s_{xx})^2 + 4 s_{xy}^2}}{2 s_{xy}}.$$
Notice that this gives two solutions for $\hat{\beta}_{1OR}$. We choose the solution which minimizes $\sum_{i=1}^{n} d_i^2$. The other solution maximizes $\sum_{i=1}^{n} d_i^2$ and is in the direction perpendicular to the optimal solution. The optimal solution is the one having the same sign as $s_{xy}$ (the root with the positive square root), where $\text{sign}(s_{xy})$ denotes the sign of $s_{xy}$, which can be positive or negative:
$$\text{sign}(s_{xy}) = \begin{cases} 1 & \text{if } s_{xy} > 0 \\ -1 & \text{if } s_{xy} < 0. \end{cases}$$
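For illustration (not part of the notes), a small Python function that solves this quadratic and picks the root with the same sign as s_xy, which is the one minimizing the perpendicular sum of squares:

```python
import numpy as np

def orthogonal_regression_slope(x, y):
    """Major axis (orthogonal) regression slope from the quadratic
    b^2*sxy + b*(sxx - syy) - sxy = 0, choosing the minimizing root.
    Assumes sxy != 0."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    u, v = x - x.mean(), y - y.mean()
    sxx, syy, sxy = np.sum(u * u), np.sum(v * v), np.sum(u * v)
    # The '+' root always has the same sign as sxy and minimizes sum(d_i^2).
    b1_or = ((syy - sxx) + np.sqrt((syy - sxx) ** 2 + 4.0 * sxy ** 2)) / (2.0 * sxy)
    b0_or = y.mean() - b1_or * x.mean()
    return b0_or, b1_or
```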
Reduced major axis regression method:
[Figure: Reduced major axis regression — rectangles formed between the observations $(x_i, y_i)$ and the corresponding points $(X_i, Y_i)$ on the line $Y = \beta_0 + \beta_1 X$.]
Suppose the regression line is $Y_i = \beta_0 + \beta_1 X_i$, on which all the observed points are expected to lie. Suppose the points $(x_i, y_i)$, $i = 1, 2, \ldots, n$, are observed and lie away from the line. The area of the rectangle extended between the $i$th observed data point and the line is
$$A_i = (X_i - x_i)(Y_i - y_i) \quad (i = 1, 2, \ldots, n),$$
where $(X_i, Y_i)$ denotes the $i$th pair of observations without any error which lie on the line. The total area extended by the $n$ data points is
$$\sum_{i=1}^{n} A_i = \sum_{i=1}^{n} (X_i - x_i)(Y_i - y_i).$$
All observed data points $(x_i, y_i)$ $(i = 1, 2, \ldots, n)$ are expected to lie on the line
$$Y_i = \beta_0 + \beta_1 X_i$$
and let
$$E_i^* = Y_i - \beta_0 - \beta_1 X_i = 0.$$
So now the objective is to minimize the sum of areas under the constraints $E_i^*$ to obtain the reduced major axis estimates of the regression coefficients. Using the Lagrangian multiplier method, the Lagrangian function is
$$L_R = \sum_{i=1}^{n} A_i - \sum_{i=1}^{n} \lambda_i E_i^* = \sum_{i=1}^{n} (X_i - x_i)(Y_i - y_i) - \sum_{i=1}^{n} \lambda_i E_i^*,$$
where $\lambda_1, \ldots, \lambda_n$ are the Lagrangian multipliers. The set of equations is obtained by setting
$$\frac{\partial L_R}{\partial X_i} = 0, \quad \frac{\partial L_R}{\partial Y_i} = 0, \quad \frac{\partial L_R}{\partial \beta_0} = 0, \quad \frac{\partial L_R}{\partial \beta_1} = 0 \quad (i = 1, 2, \ldots, n).$$
Thus
$$\frac{\partial L_R}{\partial X_i} = (Y_i - y_i) + \lambda_i \beta_1 = 0,$$
$$\frac{\partial L_R}{\partial Y_i} = (X_i - x_i) - \lambda_i = 0,$$
$$\frac{\partial L_R}{\partial \beta_0} = \sum_{i=1}^{n} \lambda_i = 0,$$
$$\frac{\partial L_R}{\partial \beta_1} = \sum_{i=1}^{n} \lambda_i X_i = 0.$$
Now, using $X_i = x_i + \lambda_i$ and $Y_i = y_i - \beta_1 \lambda_i$ in $E_i^* = 0$, we have
$$\beta_0 + \beta_1 X_i = y_i - \beta_1 \lambda_i$$
$$\beta_0 + \beta_1 (x_i + \lambda_i) = y_i - \beta_1 \lambda_i$$
$$\Rightarrow \lambda_i = \frac{y_i - \beta_0 - \beta_1 x_i}{2 \beta_1}.$$
Substituting this $\lambda_i$ in $\sum_{i=1}^{n} \lambda_i = 0$, the reduced major axis regression estimate of $\beta_0$ is obtained as
$$\hat{\beta}_{0RM} = \bar{y} - \hat{\beta}_{1RM} \bar{x},$$
where $\hat{\beta}_{1RM}$ is the reduced major axis regression estimate of $\beta_1$. Using $X_i = x_i + \lambda_i$, $\lambda_i$ and $\hat{\beta}_{0RM}$ in $\sum_{i=1}^{n} \lambda_i X_i = 0$, we get
$$\sum_{i=1}^{n} \frac{y_i - \bar{y} + \beta_1 \bar{x} - \beta_1 x_i}{2 \beta_1} \left( x_i + \frac{y_i - \bar{y} + \beta_1 \bar{x} - \beta_1 x_i}{2 \beta_1} \right) = 0.$$
Let $u_i = x_i - \bar{x}$ and $v_i = y_i - \bar{y}$; then this equation can be re-expressed as
$$\sum_{i=1}^{n} (v_i - \beta_1 u_i)(v_i - \beta_1 u_i + 2 \beta_1 x_i) = 0.$$
Using $\sum_{i=1}^{n} u_i = \sum_{i=1}^{n} v_i = 0$, we get
$$\sum_{i=1}^{n} v_i^2 - \beta_1^2 \sum_{i=1}^{n} u_i^2 = 0.$$
Solving this equation, the reduced major axis regression estimate of $\beta_1$ is obtained as
$$\hat{\beta}_{1RM} = \text{sign}(s_{xy}) \sqrt{\frac{s_{yy}}{s_{xx}}},$$
where
$$\text{sign}(s_{xy}) = \begin{cases} 1 & \text{if } s_{xy} > 0 \\ -1 & \text{if } s_{xy} < 0, \end{cases}$$
i.e., we choose the estimate which has the same sign as $s_{xy}$.
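A compact illustrative Python version (not from the notes) of the reduced major axis estimates:

```python
import numpy as np

def reduced_major_axis(x, y):
    """Reduced major axis estimates: b1 = sign(sxy)*sqrt(syy/sxx), b0 = ybar - b1*xbar."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    u, v = x - x.mean(), y - y.mean()
    sxx, syy, sxy = np.sum(u * u), np.sum(v * v), np.sum(u * v)
    b1_rm = np.sign(sxy) * np.sqrt(syy / sxx)
    b0_rm = y.mean() - b1_rm * x.mean()
    return b0_rm, b1_rm
```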
Least absolute deviation regression method:
In the method of least squares, the estimates of the parameters $\beta_0$ and $\beta_1$ in the model $y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$ $(i = 1, 2, \ldots, n)$ are chosen such that the sum of squared deviations $\sum_{i=1}^{n} \varepsilon_i^2$ is minimum. In the method of least absolute deviation (LAD) regression, the parameters $\beta_0$ and $\beta_1$ are estimated such that the sum of absolute deviations $\sum_{i=1}^{n} |\varepsilon_i|$ is minimum. It minimizes the sum of the absolute vertical errors between the observations and the line, as in the case of direct regression.
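As a rough illustration (not from the notes), the LAD criterion can be minimized numerically, for example with scipy.optimize.minimize; the OLS fit is used only as a starting value:

```python
import numpy as np
from scipy.optimize import minimize

def lad_regression(x, y):
    """Least absolute deviation fit: minimize sum |y_i - b0 - b1*x_i|."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)

    def sad(params):
        b0, b1 = params
        return np.sum(np.abs(y - b0 - b1 * x))

    # Start from the ordinary least squares solution.
    sxx = np.sum((x - x.mean()) ** 2)
    b1_ols = np.sum((x - x.mean()) * (y - y.mean())) / sxx
    b0_ols = y.mean() - b1_ols * x.mean()
    res = minimize(sad, x0=[b0_ols, b1_ols], method="Nelder-Mead")
    return res.x  # array [b0_lad, b1_lad]
```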