
LINEAR REGRESSION ANALYSIS

MODULE – II
Lecture - 2

Simple Linear Regression Analysis
Dr. Shalabh
Department of Mathematics and Statistics
Indian Institute of Technology Kanpur

The simple linear regression model

We consider modelling the relationship between a dependent variable and one independent variable. When there is only one independent variable in the linear regression model, the model is generally termed the simple linear regression model. When there is more than one independent variable in the model, the linear model is termed the multiple linear regression model.

Consider a simple linear regression model

y = β0 + β1 X + ε

where y is termed the dependent or study variable and X is termed the independent or explanatory variable.
The terms β0 and β1 are the parameters of the model. The parameter β0 is termed the intercept term and the parameter β1 is termed the slope parameter. These parameters are usually called regression coefficients. The unobservable error component ε accounts for the failure of the data to lie on a straight line and represents the difference between the true and observed realizations of y. It is termed the disturbance or error term. There can be several reasons for such a difference, e.g., the effect of all the variables omitted from the model, variables that may be qualitative, inherent randomness in the observations, etc. We assume that the errors ε are independent and identically distributed random variables with mean zero and constant variance σ². Later, we will additionally assume that ε is normally distributed.
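As a quick illustration of this data-generating process, the following is a minimal sketch in Python; the parameter values β0 = 2, β1 = 0.5, σ = 1 and the sample size are arbitrary choices for illustration, not values taken from the lecture.

```python
import numpy as np

# Illustrative (arbitrary) parameter values for the model y = beta0 + beta1 * X + eps
beta0, beta1, sigma = 2.0, 0.5, 1.0
n = 50

rng = np.random.default_rng(0)
x = np.linspace(0, 10, n)                # X treated as fixed (non-stochastic)
eps = rng.normal(0.0, sigma, size=n)     # i.i.d. errors with mean 0 and variance sigma^2
y = beta0 + beta1 * x + eps              # observed responses

# Under the model, E(y) = beta0 + beta1 * x and Var(y) = sigma^2
print(y[:5])
```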

The independent variable is viewed as controlled by the experimenter, so it is considered non-stochastic, whereas y is viewed as a random variable with

E(y) = β0 + β1 X

and

Var(y) = σ².

Sometimes X can also be a random variable. In such a case, instead of the simple mean and simple variance of y, we consider the conditional mean of y given X = x,

E(y | x) = β0 + β1 x,

and the conditional variance of y given X = x,

Var(y | x) = σ².

When the values of β0, β1 and σ² are known, the model is completely described.

The parameters β0, β1 and σ² are generally unknown, and ε is unobserved. The determination of the statistical model y = β0 + β1 X + ε depends on the determination (i.e., estimation) of β0, β1 and σ².

In order to know the values of the parameters, n pairs of observations (xi, yi), i = 1, ..., n, on (X, y) are observed/collected and are used to determine these unknown parameters.

Various methods of estimation can be used to determine the estimates of the parameters. Among them, the least squares and maximum likelihood principles are the most popular methods of estimation.

Least squares estimation


Suppose a sample of n pairs of observations (xi, yi), i = 1, 2, ..., n, is available. These observations are assumed to satisfy the simple linear regression model, and so we can write

yi = β0 + β1 xi + εi,   i = 1, 2, ..., n.

The method of least squares estimates the parameters β0 and β1 by minimizing the sum of squared differences between the observations and the line in the scatter diagram. This idea can be viewed from different perspectives. When the vertical differences between the observations and the line in the scatter diagram are considered and their sum of squares is minimized to obtain the estimates of β0 and β1, the method is known as direct regression.

Alternatively, the sum of squared differences between the observations and the line in the horizontal direction in the scatter diagram can be minimized to obtain the estimates of β0 and β1. This is known as the reverse (or inverse) regression method.

Instead of horizontal or vertical errors, if the sum of squared perpendicular distances between the observations and the line in the scatter diagram is minimized to obtain the estimates of β0 and β1, the method is known as the orthogonal regression or major axis regression method.

Instead of minimizing a distance, an area can also be minimized. The reduced major axis regression method minimizes the sum of the areas of the rectangles defined between the observed data points and the nearest points on the line in the scatter diagram to obtain the estimates of the regression coefficients.

The method of least absolute deviation regression considers the sum of the absolute deviations of the observations from the line in the vertical direction in the scatter diagram, as in the case of direct regression, to obtain the estimates of β0 and β1.
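To make the distinction between these criteria concrete, the sketch below contrasts a few of them numerically on simulated data. It assumes the standard closed forms, which are not derived in this lecture: the direct-regression slope sxy/sxx, the reverse-regression slope syy/sxy, and the reduced major axis slope sign(sxy)·sqrt(syy/sxx); the data themselves are purely illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 10, 100)
y = 2.0 + 0.5 * x + rng.normal(0.0, 1.0, size=x.size)   # illustrative data

sxx = np.sum((x - x.mean()) ** 2)
syy = np.sum((y - y.mean()) ** 2)
sxy = np.sum((x - x.mean()) * (y - y.mean()))

b1_direct  = sxy / sxx                          # minimizes vertical squared deviations
b1_reverse = syy / sxy                          # minimizes horizontal squared deviations
b1_rma     = np.sign(sxy) * np.sqrt(syy / sxx)  # reduced major axis (area-based) slope

print(b1_direct, b1_reverse, b1_rma)
```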

No assumption about the form of the probability distribution of the εi is required in deriving the least squares estimates. For the purpose of deriving statistical inferences only, we assume that the εi are random variables with

E(εi) = 0,   Var(εi) = σ²   and   Cov(εi, εj) = 0 for all i ≠ j (i, j = 1, 2, ..., n).

This assumption is needed to find the mean, variance and other properties of the least squares estimates. The assumption that the εi are normally distributed is utilized while constructing the tests of hypotheses and confidence intervals for the parameters.

Based on these approaches, different estimates of β0 and β1 are obtained, and they have different statistical properties. Among them, the direct regression approach is the most popular. Generally, the direct regression estimates are referred to as the least squares estimates or ordinary least squares estimates.
Direct regression method

This method is also known as ordinary least squares estimation. Assume that a set of n paired observations (xi, yi), i = 1, 2, ..., n, is available and that these observations satisfy the linear regression model y = β0 + β1 X + ε. We can then write the model for each observation as yi = β0 + β1 xi + εi, i = 1, 2, ..., n.

The direct regression approach minimizes the sum of squares due to errors,

S(β0, β1) = Σ_{i=1}^{n} εi² = Σ_{i=1}^{n} (yi − β0 − β1 xi)²,

with respect to β0 and β1.
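As a minimal sketch, the criterion S(β0, β1) can be written directly as a Python function (reusing the illustrative simulated data from above); this is only meant to make the objective explicit, not to suggest that it is minimized numerically in practice.

```python
import numpy as np

def sse(beta0: float, beta1: float, x: np.ndarray, y: np.ndarray) -> float:
    """Sum of squared vertical deviations S(beta0, beta1) = sum_i (y_i - beta0 - beta1 * x_i)^2."""
    resid = y - (beta0 + beta1 * x)
    return float(np.sum(resid ** 2))

# e.g. sse(2.0, 0.5, x, y) evaluates the criterion at a candidate (beta0, beta1)
```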


The partial derivative of S(β0, β1) with respect to β0 is

∂S(β0, β1)/∂β0 = −2 Σ_{i=1}^{n} (yi − β0 − β1 xi)

and the partial derivative of S(β0, β1) with respect to β1 is

∂S(β0, β1)/∂β1 = −2 Σ_{i=1}^{n} (yi − β0 − β1 xi) xi.

The solutions for β0 and β1 are obtained by setting

∂S(β0, β1)/∂β0 = 0,
∂S(β0, β1)/∂β1 = 0.

The solutions of these two equations are called the direct regression estimators, or usually the ordinary least squares (OLS) estimators, of β0 and β1.

This gives the ordinary least squares estimates b0 of β0 and b1 of β1 as

b0 = ȳ − b1 x̄,
b1 = sxy / sxx,

where

sxy = Σ_{i=1}^{n} (xi − x̄)(yi − ȳ),
sxx = Σ_{i=1}^{n} (xi − x̄)²,
x̄ = (1/n) Σ_{i=1}^{n} xi,
ȳ = (1/n) Σ_{i=1}^{n} yi.
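A minimal sketch of these closed-form estimates in Python is given below, reusing the illustrative simulated data from earlier; the comparison with numpy.polyfit is only a sanity check and is not part of the lecture.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 + 0.5 * x + rng.normal(0.0, 1.0, size=x.size)   # illustrative data

xbar, ybar = x.mean(), y.mean()
sxy = np.sum((x - xbar) * (y - ybar))
sxx = np.sum((x - xbar) ** 2)

b1 = sxy / sxx          # slope estimate
b0 = ybar - b1 * xbar   # intercept estimate

# Sanity check against numpy's degree-1 least squares polynomial fit
b1_np, b0_np = np.polyfit(x, y, 1)
assert np.allclose([b0, b1], [b0_np, b1_np])
print(b0, b1)
```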

Further, we have

∂²S(β0, β1)/∂β0² = −2 Σ_{i=1}^{n} (−1) = 2n,

∂²S(β0, β1)/∂β1² = 2 Σ_{i=1}^{n} xi²,

∂²S(β0, β1)/∂β0∂β1 = 2 Σ_{i=1}^{n} xi = 2n x̄.
The Hessian matrix, which is the matrix of second-order partial derivatives, is in this case given by

H* = [ ∂²S(β0, β1)/∂β0²      ∂²S(β0, β1)/∂β0∂β1 ]
     [ ∂²S(β0, β1)/∂β0∂β1    ∂²S(β0, β1)/∂β1²   ]

   = 2 [ n      n x̄             ]
       [ n x̄   Σ_{i=1}^{n} xi²  ]

   = 2 (ℓ, x)′ (ℓ, x)

where ℓ = (1, 1, ..., 1)′ is an n-vector with all elements equal to unity and x = (x1, ..., xn)′ is the n-vector of observations on X. The matrix H* is positive definite if its determinant and the element in the first row and first column of H* are positive.

The determinant of H* is given by

|H*| = 4 [ n Σ_{i=1}^{n} xi² − n² x̄² ]
     = 4n Σ_{i=1}^{n} (xi − x̄)²
     ≥ 0.

The case Σ_{i=1}^{n} (xi − x̄)² = 0 is not interesting, because then all the observations are identical, i.e., xi = c (some constant); in such a case there is no relationship between x and y in the context of regression analysis. Since Σ_{i=1}^{n} (xi − x̄)² > 0, we have |H*| > 0, and the first diagonal element of H* is 2n > 0, so H* is positive definite for any (β0, β1); therefore S(β0, β1) has a global minimum at (b0, b1).
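As a quick numeric check (a sketch using the same illustrative x values as before), one can build H* and confirm that both leading principal minors are positive:

```python
import numpy as np

x = np.linspace(0, 10, 50)          # illustrative non-constant x values
n = x.size

# Hessian of S(beta0, beta1); it does not depend on (beta0, beta1)
H = 2 * np.array([[n,            n * x.mean()],
                  [n * x.mean(), np.sum(x ** 2)]])

# Positive definiteness via leading principal minors (Sylvester's criterion)
print(H[0, 0] > 0, np.linalg.det(H) > 0)

# The determinant equals 4 * n * sum((x - x.mean())**2)
print(np.isclose(np.linalg.det(H), 4 * n * np.sum((x - x.mean()) ** 2)))
```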

The fitted line, or the fitted linear regression model, is

ŷ = b0 + b1 x

and the predicted values are

ŷi = b0 + b1 xi,   i = 1, 2, ..., n.

The difference between the observed value yi and the fitted (or predicted) value ŷi is called a residual. The ith residual is

ei = yi − ŷi = yi − (b0 + b1 xi),   i = 1, 2, ..., n.
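Continuing the earlier sketch, the fitted values and residuals can be computed directly; the check that the residuals sum to (numerically) zero is a well-known property of OLS with an intercept, included here only as an illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 50)
y = 2.0 + 0.5 * x + rng.normal(0.0, 1.0, size=x.size)   # illustrative data

b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()

y_hat = b0 + b1 * x        # fitted (predicted) values
e = y - y_hat              # residuals

# With an intercept in the model, the OLS residuals sum to zero (up to rounding)
print(np.isclose(e.sum(), 0.0))
```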
