Linear Regression Model
Sasadhar Bera, IIM Ranchi
What is Regression Analysis?
In most studies, the true relationship between the output and input variables is unknown. In such situations, we build an empirical model as an approximating function. This empirical model helps us understand, interpret, and implement the input-output relationship. Regression analysis is highly useful for developing such empirical models from historical records, or from data arising from natural phenomena or experiments.
y = β0 + β1x1 + β2x2 + . . . + βkxk + ϵ
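As a minimal illustration of fitting such an empirical model, the sketch below (an assumed example, not part of the original material) estimates the coefficients by ordinary least squares using NumPy on made-up data with two inputs.

```python
# Minimal sketch (assumed example): fit an empirical linear model
# y = b0 + b1*x1 + b2*x2 to made-up data by least squares.
import numpy as np

rng = np.random.default_rng(0)
n = 50
x1 = rng.uniform(0, 10, n)
x2 = rng.uniform(0, 10, n)
y = 40 + 0.8 * x1 + 0.3 * x2 + rng.normal(0, 1.0, n)  # hypothetical data

X = np.column_stack([np.ones(n), x1, x2])              # design matrix with intercept
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)       # estimates of [b0, b1, b2]
print(beta_hat)
```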
Linear Vs. Non-linear Regression Model (Contd.)
Linear Regression Model-1: y = 40.44 + 0.775x1 + 0.325x2
The parameters of the above regression model appear in linear form, and the model describes a plane over the two-dimensional (x1, x2) space.
Linear Vs. Non-linear Regression Model (Contd.)
Model-2: y = 69.1 + 1.63x1 + 0.96x1² + 1.08x2 + 1.22x2² + 0.225x1x2
The above regression model is a linear model because the parameters (the β values) appear in linear form. It is a second-order response model. Note that the power of x does not determine whether or not a model is linear.
[Figure: response surface generated by the above regression equation]
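Because the model is linear in its parameters, it can still be fitted by ordinary least squares once the squared and interaction columns are built explicitly. The sketch below is an assumed example on simulated data.

```python
# Sketch: a second-order response model is linear in its parameters, so it
# can be fitted by ordinary least squares after constructing the x1^2, x2^2
# and x1*x2 columns. Data are simulated here for illustration only.
import numpy as np

rng = np.random.default_rng(1)
x1 = rng.uniform(-1, 1, 30)
x2 = rng.uniform(-1, 1, 30)
y = (69.1 + 1.63*x1 + 0.96*x1**2 + 1.08*x2 + 1.22*x2**2
     + 0.225*x1*x2 + rng.normal(0, 0.1, 30))

# Design matrix: [1, x1, x1^2, x2, x2^2, x1*x2]
X = np.column_stack([np.ones_like(x1), x1, x1**2, x2, x2**2, x1*x2])
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)   # should roughly recover the coefficients used above
```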
Linear Vs. Non-linear Regression Model (Contd.)
Any regression model that is non-linear in its parameters is called a non-linear regression model.
Model-3: y = exp(β1x) / (β2 + β3x)
Model-4: y = (1 / (β1 − β2)) (e^(β1x1) − e^(β2x2))
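Since the parameters enter non-linearly, models of this kind cannot be fitted by ordinary least squares directly; an iterative solver is required. The sketch below is an assumed example that fits a Model-3-type curve with SciPy's curve_fit.

```python
# Sketch: fitting a non-linear regression model of the Model-3 form
# y = exp(b1*x) / (b2 + b3*x) with an iterative least-squares solver.
import numpy as np
from scipy.optimize import curve_fit

def model3(x, b1, b2, b3):
    return np.exp(b1 * x) / (b2 + b3 * x)

rng = np.random.default_rng(2)
x = np.linspace(0.1, 5, 40)
y = model3(x, 0.5, 1.0, 0.8) + rng.normal(0, 0.02, x.size)  # hypothetical data

params, cov = curve_fit(model3, x, y, p0=[0.4, 1.0, 1.0])   # starting values required
print(params)
```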
Linear Vs. Non-linear Regression Model (Contd.)
[Figure: scatter plot of the non-linear regression model y = exp(x) / (0.01 + x)]
Simple Linear Regression
Suppose that we have collected n observations to fit a regression model with one predictor variable. The equation that describes the simple linear regression model is given below:
yi = β0 + β1xi + ϵi
where i = 1, 2, . . ., n; n = total number of observations
yi is the ith observation of the response
xi is the ith value of the predictor variable x
Parameter Estimates: Simple Linear Regression
The Ordinary Least Squares (OLS) approach to estimating the parameters of the model yi = β0 + β1xi + ϵi is outlined below.
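The closed-form OLS estimates, which minimize the sum of squared errors Σ(yi − β0 − β1xi)², are the standard results (stated here in the usual notation, with Sxx and Sxy as the corrected sums of squares and cross-products):

```latex
\hat{\beta}_1 = \frac{S_{xy}}{S_{xx}}
             = \frac{\sum_{i=1}^{n}(x_i-\bar{x})(y_i-\bar{y})}
                    {\sum_{i=1}^{n}(x_i-\bar{x})^{2}},
\qquad
\hat{\beta}_0 = \bar{y} - \hat{\beta}_1\,\bar{x}
```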
Estimated Regression Line
The fitted or estimated regression line: ŷ = β̂0 + β̂1x
Predicted value or fit for the ith observation (xi): ŷi = β̂0 + β̂1xi
Error in the fit, called the residual: ei = yi − ŷi
Standard error of estimate: S = √MSE = √( Σi=1..n ei² / (n − 2) )
Test of Significance of Regression
Hypotheses related to the significance of regression:
H0: β1 = 0, i.e., there is no linear relationship between x and y
H1: β1 ≠ 0
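The corresponding test statistic is the usual t-test on the estimated slope (a standard result, stated here for completeness):

```latex
t_0 = \frac{\hat{\beta}_1}{\mathrm{se}(\hat{\beta}_1)}
    = \frac{\hat{\beta}_1}{\sqrt{MSE/S_{xx}}},
\qquad \text{reject } H_0 \text{ if } |t_0| > t_{\alpha/2,\,n-2}
```

Here Sxx = Σ(xi − x̄)² and MSE is the residual mean square defined in the ANOVA table that follows.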
Test of Significance of Regression (Contd.)
ANOVA table
Source of Variation | DF | SS | MS | FCal
Regression | 1 | SSR | SSR/1 = MSR | MSR/MSE
Residual error | n − 2 | SSE | SSE/(n − 2) = MSE |
Total | n − 1 | TSS | |
SSR = Σi=1..n (ŷi − ȳ)²
SSE = Σi=1..n (yi − ŷi)²
TSS = Σi=1..n (yi − ȳ)²
β̂1 follows a normal distribution with mean β1 and variance σ²/Sxx, where σ² is the error variance.
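Equivalently, the ANOVA quantities above give the overall F-test for significance of regression (a standard result, stated here for completeness):

```latex
F_0 = \frac{MSR}{MSE} = \frac{SSR/1}{SSE/(n-2)} \;\sim\; F_{1,\,n-2}\ \text{under } H_0,
\qquad \text{reject } H_0 \text{ if } F_0 > F_{\alpha,\,1,\,n-2}
```

For simple linear regression this F-test is equivalent to the t-test on β1, since F0 = t0².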
Coefficient of Determination
Coefficient of determination: R² = SSR / TSS
SSR: sum of squares due to regression
TSS: total sum of squares
An Illustrative Example
The weight and systolic blood pressure of 26 randomly selected
males in the age group 25 to 30 are shown in the following
table. Assume that weight and blood pressure are jointly
normally distributed.
Subject Weight BP Subject Weight BP
1 165 130 14 172 153
2 167 133 15 159 128
3 180 150 16 168 132
4 155 128 17 174 149
5 212 151 18 183 158
6 175 146 19 215 150
7 190 150 20 195 163
8 210 140 21 180 156
9 200 148 22 143 124
10 149 125 23 240 170
11 158 133 24 235 165
12 169 135 25 192 160
13 170 150 26 187 159
An Illustrative Example (Contd.)
i. Construct a scatter diagram of the data
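One possible way to draw this scatter diagram (an assumed sketch using matplotlib, with the data typed in from the table above):

```python
# Sketch: scatter diagram of systolic BP against weight for the 26 subjects.
import matplotlib.pyplot as plt

weight = [165, 167, 180, 155, 212, 175, 190, 210, 200, 149, 158, 169, 170,
          172, 159, 168, 174, 183, 215, 195, 180, 143, 240, 235, 192, 187]
bp     = [130, 133, 150, 128, 151, 146, 150, 140, 148, 125, 133, 135, 150,
          153, 128, 132, 149, 158, 150, 163, 156, 124, 170, 165, 160, 159]

plt.scatter(weight, bp)
plt.xlabel("Weight")
plt.ylabel("Systolic blood pressure")
plt.title("BP vs. weight for 26 males aged 25-30")
plt.show()
```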
An Illustrative Example (Contd.)
Analysis of Variance
Source DF SS MS F P
Regression 1 2693.6 2693.6 35.74 0.000
Residual Error 24 1808.6 75.4
Total 25 4502.2
An Illustrative Example (Contd.)
Predictor Coef SE Coef T P
Constant 69.10 12.91 5.35 0.000
Weight 0.41942 0.07015 5.98 0.000
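This output can be reproduced approximately with statsmodels; the sketch below is an assumed example using the weight and BP data from the table.

```python
# Sketch: reproducing the regression of BP on Weight with statsmodels.
import statsmodels.api as sm

weight = [165, 167, 180, 155, 212, 175, 190, 210, 200, 149, 158, 169, 170,
          172, 159, 168, 174, 183, 215, 195, 180, 143, 240, 235, 192, 187]
bp     = [130, 133, 150, 128, 151, 146, 150, 140, 148, 125, 133, 135, 150,
          153, 128, 132, 149, 158, 150, 163, 156, 124, 170, 165, 160, 159]

X = sm.add_constant(weight)        # adds the intercept column
fit = sm.OLS(bp, X).fit()
print(fit.params)                  # expect roughly [69.10, 0.4194]
print(fit.summary())               # full table incl. SE, t and p values
```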
Multiple Linear Regression Model
Multiple linear regression involves one dependent variable and more than one independent variable. The equation that describes the multiple linear regression model is given below:
y = β0 + β1 x1 + β2 x2 + . . . + βk xk + ϵ
y is the dependent variable and x1, x2, . . ., xk are the independent variables. These independent variables are used to predict the dependent variable.
β0, β1, β2, . . ., βk are the (k+1) unknown regression coefficients. These regression coefficients are estimated from observed sample data.
The term ϵ (pronounced epsilon) is the random error.
Matrix Notation: Multiple Linear Regression
Suppose that n observations are collected on the response variable (y) and that k independent variables are present in the regression model. In matrix notation the model is:
yn×1 = Xn×(k+1) β(k+1) ×1 + ϵn×1
n = total number of observations, k = total number of independent variables, and β is the vector of model parameters.
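The least-squares estimator in this matrix notation is the standard normal-equations result (stated here for reference):

```latex
\hat{\boldsymbol{\beta}} = (\mathbf{X}^{T}\mathbf{X})^{-1}\mathbf{X}^{T}\mathbf{y},
\qquad
\hat{\mathbf{y}} = \mathbf{X}\hat{\boldsymbol{\beta}}
```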
Estimated Residual and Standard Error
For the ith observation (Xi), the predicted value or fit: ŷi = Xi β̂
Error in the fit, called the residual: ei = yi − ŷi
Mean Square Error: MSE = Σi=1..n ei² / (n − k − 1)
where n is the total number of observations and k is the number of regressors.
Standard error (SE) of estimate = √MSE
Testing Significance of Regression Model
The test for significance of regression is a test to
determine if there is a linear relationship between the
response variable and regressor variables.
H0 : β1 = β2 = . . . = βk = 0
H1 : At least one βj is not zero
TSS = SSR + SSE
SSR = Σi=1..n (ŷi − ȳ)² = β̂ᵀXᵀy − (Σi=1..n yi)² / n
SSE = Σi=1..n (yi − ŷi)² = yᵀy − β̂ᵀXᵀy
TSS = Σi=1..n (yi − ȳ)²
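As an assumed numerical sketch (not part of the original material), these matrix formulas can be evaluated directly in NumPy on made-up data:

```python
# Sketch: computing SSR, SSE, TSS and the overall F statistic from the
# matrix formulas above, on simulated data with k = 2 regressors.
import numpy as np

rng = np.random.default_rng(3)
n, k = 40, 2
Z = rng.uniform(0, 10, (n, k))                  # hypothetical regressors
X = np.column_stack([np.ones(n), Z])            # design matrix with intercept
y = X @ np.array([5.0, 1.2, -0.7]) + rng.normal(0, 1.0, n)

beta_hat = np.linalg.solve(X.T @ X, X.T @ y)    # (X'X)^-1 X'y
ybar = y.mean()

TSS = y @ y - n * ybar**2
SSR = beta_hat @ X.T @ y - n * ybar**2
SSE = y @ y - beta_hat @ X.T @ y

F0 = (SSR / k) / (SSE / (n - k - 1))            # compare with F(alpha; k, n-k-1)
print(TSS, SSR + SSE, F0)                       # TSS should equal SSR + SSE
```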
Significance Test of Individual Parameter