MODULE – II
Lecture - 2
We consider modeling the relationship between the dependent variable and one independent variable. When there is only one independent variable in the linear regression model, the model is generally termed the simple linear regression model. When there is more than one independent variable in the model, the linear model is termed the multiple linear regression model. The simple linear regression model is
y = β0 + β1X + ε
where y is termed the dependent or study variable and X is termed the independent or explanatory variable.
The terms β0 and β1 are the parameters of the model. The parameter β0 is termed the intercept term and the parameter β1 is termed the slope parameter. These parameters are usually called regression coefficients. The unobservable error component ε accounts for the failure of the data to lie on the straight line and represents the difference between the true and observed realizations of y. It is termed the disturbance or error term. There can be several reasons for such a difference, e.g., the effect of all omitted variables in the model, variables that may be qualitative, inherent randomness in the observations, etc. We assume that the errors ε are independent and identically distributed random variables with mean zero and constant variance σ². Later, we will additionally assume that ε is normally distributed.
The independent variable is viewed as controlled by the experimenter, so it is considered non-stochastic, whereas y is viewed as a random variable with
E(y) = β0 + β1X
and
Var(y) = σ².
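As an illustration of these two moments, the following minimal sketch, with assumed values β0 = 2, β1 = 0.5 and σ = 1 (not taken from the lecture), simulates many realizations of y at a fixed x and compares the sample mean and variance with β0 + β1x and σ²:

```python
import numpy as np

# Illustrative parameter values, not taken from the lecture.
beta0, beta1, sigma = 2.0, 0.5, 1.0
x = 3.0                                      # x is treated as fixed (non-stochastic)

rng = np.random.default_rng(0)
eps = rng.normal(0.0, sigma, size=100_000)   # errors with mean 0 and variance sigma^2
y = beta0 + beta1 * x + eps                  # the simple linear regression model

print(np.mean(y), beta0 + beta1 * x)   # sample mean of y is close to E(y) = beta0 + beta1*x
print(np.var(y), sigma**2)             # sample variance of y is close to Var(y) = sigma^2
```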
Sometimes X can also be a random variable. In such a case, instead of the simple mean and simple variance of y, we consider the conditional mean of y given X = x as
E(y | x) = β0 + β1x
and the conditional variance of y given X = x as
Var(y | x) = σ².
When the values of β0, β1 and σ² are known, the model is completely described.
The parameters β0, β1 and σ² are generally unknown and ε is unobserved. The determination of the statistical model y = β0 + β1X + ε depends on the determination (i.e., estimation) of β0, β1 and σ².
In order to know the values of the parameters, n pairs of observations (xi, yi), i = 1, ..., n, on (X, y) are observed/collected and are used to determine these unknown parameters.
Various methods of estimation can be used to determine the estimates of the parameters. Among them, the least squares and maximum likelihood principles are the most popular methods of estimation.
The method of least squares estimates the parameters β0 and β1 by minimizing the sum of squares of the differences between the observations and the line in the scatter diagram. This idea can be viewed from different perspectives. When the vertical differences between the observations and the line in the scatter diagram are considered, and their sum of squares is minimized to obtain the estimates of β0 and β1, the method is known as direct regression.
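To make the idea of minimizing vertical deviations concrete, the sketch below (using made-up data and two hypothetical candidate lines) evaluates the sum of squared vertical differences between the observations and a line b0 + b1x; direct regression chooses the (b0, b1) that makes this criterion smallest:

```python
import numpy as np

# Made-up data purely for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])

def vertical_ss(b0, b1, x, y):
    """Sum of squared vertical deviations of the points (x, y) from the line b0 + b1*x."""
    return np.sum((y - (b0 + b1 * x)) ** 2)

# Direct regression chooses (b0, b1) minimizing this criterion;
# here we only compare two hypothetical candidate lines.
print(vertical_ss(1.0, 1.0, x, y))
print(vertical_ss(0.9, 1.0, x, y))
```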
The method of least absolute deviation regression considers the sum of the absolute deviations of the observations from the line in the vertical direction in the scatter diagram, as in the case of direct regression, to obtain the estimates of β0 and β1.
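For comparison, the least absolute deviation criterion generally has no closed-form minimizer; a minimal sketch (using scipy's general-purpose Nelder-Mead minimizer on the same made-up data, an implementation choice rather than the lecture's method) is:

```python
import numpy as np
from scipy.optimize import minimize

# Made-up data purely for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])

def sum_abs_dev(params, x, y):
    """Sum of absolute vertical deviations of the points (x, y) from the line b0 + b1*x."""
    b0, b1 = params
    return np.sum(np.abs(y - (b0 + b1 * x)))

# Minimize numerically; Nelder-Mead handles the non-smooth absolute-value objective.
res = minimize(sum_abs_dev, x0=[0.0, 1.0], args=(x, y), method="Nelder-Mead")
b0_lad, b1_lad = res.x
print(b0_lad, b1_lad)
```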
No assumption is required about the form of the probability distribution of εi in deriving the least squares estimates. For the purpose of deriving statistical inferences only, we assume that the εi's are random variables with
E(εi) = 0, Var(εi) = σ² and Cov(εi, εj) = 0 for all i ≠ j (i, j = 1, 2, ..., n).
This assumption is needed to find the mean, variance and other properties of the least squares estimates. The
assumption that εi's are normally distributed is utilized while constructing the tests of hypotheses and confidence intervals of the parameters.
Based on these approaches, different estimates of β0 and β1 are obtained, and they have different statistical properties. Among them, the direct regression approach is the most popular. Generally, the direct regression estimates are referred to as the least squares estimates or ordinary least squares estimates.
Direct regression method
This method is also known as ordinary least squares estimation. Assume that a set of n paired observations (xi, yi), i = 1, 2, ..., n, is available which satisfies the linear regression model y = β0 + β1X + ε. So we can write the model for each observation as yi = β0 + β1xi + εi, (i = 1, 2, ..., n).
The direct regression approach minimizes the sum of squares due to errors given by
$$S(\beta_0, \beta_1) = \sum_{i=1}^{n} \varepsilon_i^2 = \sum_{i=1}^{n} \left( y_i - \beta_0 - \beta_1 x_i \right)^2 .$$
Differentiating S(β0, β1) partially with respect to β0 and β1 and equating the derivatives to zero gives two normal equations.
The solutions of these two equations are called the direct regression estimators or, more usually, the ordinary least squares (OLS) estimators of β0 and β1. Denote these estimators by b0 and b1.
In what follows, we use the notation
$$s_{xx} = \sum_{i=1}^{n} (x_i - \bar{x})^2, \qquad \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i, \qquad \bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i .$$
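The closed-form expressions b1 = sxy/sxx and b0 = ȳ − b1x̄ (the standard solution of the two normal equations; sxy denotes Σ(xi − x̄)(yi − ȳ), a quantity not defined above) can be computed directly, as in this minimal sketch with made-up data:

```python
import numpy as np

# Made-up data purely for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])

xbar, ybar = np.mean(x), np.mean(y)
sxx = np.sum((x - xbar) ** 2)           # s_xx as defined above
sxy = np.sum((x - xbar) * (y - ybar))   # s_xy, the corresponding cross-product sum

b1 = sxy / sxx          # OLS estimate of the slope beta1
b0 = ybar - b1 * xbar   # OLS estimate of the intercept beta0
print(b0, b1)
```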
Further, we have
$$\frac{\partial^2 S(\beta_0, \beta_1)}{\partial \beta_0^2} = -2\sum_{i=1}^{n} (-1) = 2n,$$
$$\frac{\partial^2 S(\beta_0, \beta_1)}{\partial \beta_1^2} = 2\sum_{i=1}^{n} x_i^2,$$
$$\frac{\partial^2 S(\beta_0, \beta_1)}{\partial \beta_0 \, \partial \beta_1} = 2\sum_{i=1}^{n} x_i = 2n\bar{x}.$$
The Hessian matrix, which is the matrix of second-order partial derivatives, is in this case given as
$$H^{*} =
\begin{bmatrix}
\dfrac{\partial^2 S(\beta_0, \beta_1)}{\partial \beta_0^2} & \dfrac{\partial^2 S(\beta_0, \beta_1)}{\partial \beta_0 \, \partial \beta_1} \\[2.2ex]
\dfrac{\partial^2 S(\beta_0, \beta_1)}{\partial \beta_0 \, \partial \beta_1} & \dfrac{\partial^2 S(\beta_0, \beta_1)}{\partial \beta_1^2}
\end{bmatrix}
= 2
\begin{bmatrix}
n & n\bar{x} \\
n\bar{x} & \sum_{i=1}^{n} x_i^2
\end{bmatrix}
= 2
\begin{bmatrix}
\ell' \\
x'
\end{bmatrix}
(\ell, x)$$
where ℓ = (1, 1, ..., 1)' is an n-vector with all elements unity and x = (x1, ..., xn)' is an n-vector of observations on X. The matrix H* is positive definite if its determinant and the element in the first row and first column of H* are positive.
The determinant of H* is given by
$$\det(H^{*}) = 4\left[\, n \sum_{i=1}^{n} x_i^2 - n^2 \bar{x}^2 \,\right] = 4n \sum_{i=1}^{n} (x_i - \bar{x})^2 \ge 0.$$
n
The case when ∑ (x − x )
i =1
i
2
=
0 is not interesting because then all the observations are identical, i.e. xi =c (some constant).
n
In such a case there is no relationship between x and y in the context of regression analysis. Since ∑ (x − x )
i =1
i
2
> 0,
therefore H * > 0. So H is positive definite for any ( β 0 , β1 ); therefore S ( β 0 , β1 ) has a global minimum at (b0 , b1 ).
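As a numerical sanity check (not part of the lecture), the following sketch forms H* = 2(ℓ, x)'(ℓ, x) for some made-up x values and confirms that its first diagonal element and its determinant are positive when the xi are not all equal:

```python
import numpy as np

# Made-up observations on X, purely for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
n = x.size

Z = np.column_stack([np.ones(n), x])   # columns are the unit vector l and x
H = 2 * Z.T @ Z                        # Hessian H* = 2 * (l, x)'(l, x)

print(H[0, 0])            # equals 2n, which is positive
print(np.linalg.det(H))   # equals 4n * sum((x_i - xbar)^2), positive unless all x_i are equal
```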
The difference between the observed value yi and the fitted value ŷi is called a residual. The i-th residual is defined as
ei = yi ~ ŷi, (i = 1, 2, ..., n).
We consider it as
ei = yi − ŷi = yi − (b0 + b1xi).
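A minimal sketch (reusing the made-up data and the standard OLS formulas from the earlier sketches) that computes the fitted values and residuals; the printed sum illustrates the well-known property that OLS residuals sum to zero:

```python
import numpy as np

# Made-up data purely for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 2.9, 4.2, 4.8, 6.1])

# Standard OLS estimates (same formulas as in the earlier sketch).
xbar, ybar = np.mean(x), np.mean(y)
b1 = np.sum((x - xbar) * (y - ybar)) / np.sum((x - xbar) ** 2)
b0 = ybar - b1 * xbar

y_hat = b0 + b1 * x      # fitted values
e = y - y_hat            # residuals e_i = y_i - (b0 + b1 * x_i)

print(e)
print(np.sum(e))         # essentially zero: OLS residuals sum to zero (up to rounding)
```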