Day 6 Session 1 MLR
Predictive Modeling
Predictive modeling is the process of building a statistical model that best predicts an outcome or the probability of an outcome.
Predictive models are developed from historical data or from data collected specifically for the purpose.
Predictive Modeling: General Approach
Stages shown in the slide diagram:
Set Business Goal
Understand the Data
Apply Business Expertise
Develop the Statistical Model
Validate the Model
Make Decisions
Multiple Linear Regression
Multiple linear regression is an extension of simple linear regression. It is used when we
want to predict the value of a variable based on the values of two or more other
variables.
The variable we want to predict is called the dependent variable (or sometimes the response variable).
The variables used to predict the value of the dependent variable are called independent variables (or sometimes predictor, explanatory or regressor variables).
Statistical Model in Multiple Linear Regression
$$Y = b_0 + b_1 x_1 + b_2 x_2 + \dots + b_p x_p + e$$
Where,
Y : Dependent Variable
x1, x2, …, xp : Independent Variables
b0, b1, …, bp : Parameters of Model
e : Random Error Component
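As a rough illustration of this model, the R sketch below simulates data from a two-predictor version and checks that lm() recovers the parameters approximately. The variable names (x1, x2, y, sim) and all numeric values are assumptions made up for the illustration; none of them come from the case study.

```r
# Illustrative only: simulate Y = b0 + b1*x1 + b2*x2 + e with assumed values
set.seed(1)                        # make the simulation reproducible
n  <- 200                          # hypothetical sample size
x1 <- runif(n, 0, 10)              # first independent variable
x2 <- runif(n, 0, 10)              # second independent variable
e  <- rnorm(n, mean = 0, sd = 1)   # random error component
y  <- 2 + 0.5 * x1 - 1.5 * x2 + e  # assumed b0 = 2, b1 = 0.5, b2 = -1.5

sim <- data.frame(y, x1, x2)
fit <- lm(y ~ x1 + x2, data = sim)
est <- coef(fit)                   # least squares estimates of b0, b1, b2
print(round(est, 2))
```

The estimates should land close to the assumed values, with the gap reflecting the random error component.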
Case Study
Predicting Job Performance Index
Dependent variable: Job Performance Index (jpi)
Independent variables: Aptitude (aptitude), Test of Language (tol), Technical (technical), General Information (general)
Get Started With Scatter Plot Matrix
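# Import the data; file.choose() opens a dialog to pick the CSV file
per_index <- read.csv(file.choose(), header = TRUE)
# Scatter plot matrix of jpi and the four test scores
pairs(~ jpi + aptitude + tol + technical + general, data = per_index, col = "blue")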
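Performance Index: Mathematical Model
Sample Size: 33
Model:
$$\text{jpi} = b_0 + b_1(\text{aptitude}) + b_2(\text{tol}) + b_3(\text{technical}) + b_4(\text{general}) + e$$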
Least Square Estimates of Parameters
Coefficients
Intercept   -54.2822
aptitude      0.3236
tol           0.0334
technical     1.0955
general       0.5368
Model Equation
jpi = -54.2822 + 0.3236(aptitude) + 0.0334(tol) + 1.0955(technical) + 0.5368(general)
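To show how the model equation is used, here is a small R sketch that plugs one set of test scores into the estimated equation. The coefficients are the ones listed above; the scores (50, 50, 50, 40) are hypothetical values chosen only for the arithmetic, not observations from the data set.

```r
# Estimated coefficients copied from the slide
b <- c(intercept = -54.2822, aptitude = 0.3236, tol = 0.0334,
       technical = 1.0955, general = 0.5368)

# Hypothetical test scores for one employee (assumed, not from the data)
scores <- c(aptitude = 50, tol = 50, technical = 50, general = 40)

# Predicted jpi = intercept + sum of (coefficient * score)
jpi_hat <- unname(b["intercept"] + sum(b[names(scores)] * scores))
print(jpi_hat)   # 39.8148
```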
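Global Testing: Testing whether at least one variable has significant impact
The aim of global testing is to test the null hypothesis that all the slope parameters of the model (b1, …, bp) are simultaneously equal to zero.
Hypotheses:
$$H_0: b_1 = b_2 = \dots = b_p = 0 \quad \text{v/s} \quad H_1: \text{at least one } b_i \neq 0$$
In other words, H0 states that none of the independent variables has a significant effect on Y, and H1 states that at least one of them does.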
Global Testing: Partitioning Total Variation
Total Variation = Explained Variation + Unexplained Variation
$$\sum_{i=1}^{n} (Y_i - \bar{Y})^2 = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y})^2 + \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2$$
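The partition of total variation can be checked numerically. The R sketch below is a minimal example on simulated data (the variables x1, x2, y and their generating values are assumptions for illustration, not the case-study data); it computes the three sums of squares from a fitted model and confirms that the explained and unexplained parts add up to the total.

```r
# Illustrative check on simulated data (not the case-study data)
set.seed(2)
n  <- 50
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 2 * x1 - x2 + rnorm(n)

fit   <- lm(y ~ x1 + x2)
y_hat <- fitted(fit)                 # fitted values

sst <- sum((y - mean(y))^2)          # total variation
ssr <- sum((y_hat - mean(y))^2)      # explained variation
sse <- sum((y - y_hat)^2)            # unexplained variation

print(c(total = sst, explained = ssr, unexplained = sse))
print(all.equal(sst, ssr + sse))     # the partition holds up to rounding
```

The identity relies on the model containing an intercept, which lm() includes by default.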
Global Testing: ANOVA and Decision Criterion
Source              df     Sum of Squares   Mean Square   F         P value
Model (Explained)   p=4    2510.007         627.5017      49.8129   <0.0001
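The F statistic in the Model row can be reproduced approximately from figures reported in this session. The sketch below assumes the residual mean square equals the square of the residual standard error (3.549 on 28 degrees of freedom) shown in the summary() output; because that value is rounded, the result matches the reported F only to about two decimal places.

```r
# Figures taken from the ANOVA row and the summary() output in this session
n <- 33                      # sample size
p <- 4                       # number of independent variables
ss_model <- 2510.007         # explained sum of squares
ms_model <- ss_model / p     # mean square for the model
ms_resid <- 3.549^2          # residual mean square (rounded input)

f_stat  <- ms_model / ms_resid
p_value <- pf(f_stat, df1 = p, df2 = n - p - 1, lower.tail = FALSE)

print(round(f_stat, 2))
print(p_value)               # far below 0.05, so H0 is rejected
```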
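Individual Testing
Hypotheses:
$$H_0: b_i = 0 \quad \text{v/s} \quad H_1: b_i \neq 0; \quad i = 1, 2, \dots, p$$
Test Statistic:
$$t = \frac{\hat{b}_i}{SE(\hat{b}_i)}$$
Under H0 this statistic follows a t distribution with n - p - 1 degrees of freedom.
Reject the null hypothesis if P value < 0.05 and conclude that the variable has a significant effect on Y.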
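As a worked example of the individual test, the R sketch below recomputes the t value and two-sided P value for aptitude from the estimate and standard error printed in the summary() output of this session (0.32356 and 0.06778, with 28 residual degrees of freedom). Because the inputs are rounded, the results agree with the printed output only approximately.

```r
# Estimate and standard error for aptitude, copied from summary(jpimodel)
est <- 0.32356
se  <- 0.06778
df  <- 28                    # n - p - 1 = 33 - 4 - 1

t_stat  <- est / se          # test statistic
p_value <- 2 * pt(abs(t_stat), df = df, lower.tail = FALSE)  # two-sided

print(round(t_stat, 3))
print(p_value)
```

Since the P value is well below 0.05, the null hypothesis for aptitude is rejected.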
Individual Testing: Case Study
P values for aptitude, technical and general are less than 0.05.
P value for test of language (tol) is more than 0.05. Therefore, tol is the only variable that is not statistically significant.
Interpretation of Partial Regression Coefficients
We can infer that for a one-unit increase in aptitude test score, the job performance index is expected to increase by 0.3236 units, holding the other three test scores constant.
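The interpretation can be verified with simple arithmetic. In the sketch below, predict_jpi() is a hypothetical helper written for this illustration, and the test scores are assumed values; two profiles that differ only by one point in aptitude give predictions that differ by exactly the aptitude coefficient.

```r
# Estimated coefficients copied from the slides
b <- c(intercept = -54.2822, aptitude = 0.3236, tol = 0.0334,
       technical = 1.0955, general = 0.5368)

# Hypothetical helper: evaluates the fitted model equation
predict_jpi <- function(aptitude, tol, technical, general) {
  unname(b["intercept"] + b["aptitude"] * aptitude + b["tol"] * tol +
           b["technical"] * technical + b["general"] * general)
}

base     <- predict_jpi(50, 50, 50, 40)  # assumed scores
plus_one <- predict_jpi(51, 50, 50, 40)  # aptitude raised by one unit
diff_jpi <- plus_one - base

print(diff_jpi)   # equals the aptitude coefficient, 0.3236
```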
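Measure of Goodness of Fit: R Squared
$$R^2 = \frac{\text{Explained Variation}}{\text{Total Variation}} = 1 - \frac{\sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2}{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}$$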
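R squared is the share of total variation explained by the model. The sketch below recomputes it, together with adjusted R squared, from figures reported in this session: the explained sum of squares (2510.007) and the residual standard error (3.549 on 28 degrees of freedom). Treating the unexplained sum of squares as 3.549^2 * 28 is an approximation because the standard error is rounded.

```r
# Figures taken from the ANOVA row and the summary() output in this session
n <- 33
p <- 4
ss_explained   <- 2510.007
ss_unexplained <- 3.549^2 * (n - p - 1)   # approximate, from rounded input

r2     <- ss_explained / (ss_explained + ss_unexplained)
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)

print(round(c(r_squared = r2, adjusted = adj_r2), 4))
```

Both values agree with the 0.8768 and 0.8592 printed by summary() to about three decimal places.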
Multiple Linear Regression in R
# Import the data and fit the model with lm()
per_index <- read.csv(file.choose(), header = TRUE)
jpimodel <- lm(jpi ~ aptitude + tol + technical + general, data = per_index)
# Typing the model object name displays the default output
jpimodel
Call:
lm(formula = jpi ~ aptitude + tol + technical + general, data = per_index)
Coefficients:
(Intercept)     aptitude          tol    technical      general
  -54.28225      0.32356      0.03337      1.09547      0.53683
Multiple Linear Regression in R (continued)
summary(jpimodel)
Call:
lm(formula = jpi ~ aptitude + tol + technical + general, data = per_index)
Residuals:
    Min      1Q  Median      3Q     Max
-7.2891 -2.7692  0.4562  2.8508  5.6068
Multiple Linear Regression in R (continued)
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept) -54.28225    7.39453  -7.341 5.41e-08 ***
aptitude      0.32356    0.06778   4.774 5.15e-05 ***
tol           0.03337    0.07124   0.468   0.6431
technical     1.09547    0.18138   6.039 1.65e-06 ***
general       0.53683    0.15840   3.389   0.0021 **
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 3.549 on 28 degrees of freedom
Multiple R-squared: 0.8768, Adjusted R-squared: 0.8592
F-statistic: 49.81 on 4 and 28 DF, p-value: 2.467e-12
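The numbers printed by summary() can also be pulled out programmatically. Since the case-study file is not bundled here, the sketch below uses R's built-in mtcars data set purely as a stand-in (the model mpg ~ wt + hp is an assumption chosen for illustration); the same accessors should apply to jpimodel.

```r
# Stand-in model on a built-in data set (not the case-study data)
fit <- lm(mpg ~ wt + hp, data = mtcars)
s   <- summary(fit)

coef_table <- coef(s)         # Estimate, Std. Error, t value, Pr(>|t|)
r2         <- s$r.squared     # Multiple R-squared
adj_r2     <- s$adj.r.squared # Adjusted R-squared
f          <- s$fstatistic    # named vector: value, numdf, dendf

# P value of the global F test, computed from the stored statistic
f_p <- unname(pf(f["value"], f["numdf"], f["dendf"], lower.tail = FALSE))

print(coef_table)
print(c(r_squared = r2, adjusted = adj_r2, f_p_value = f_p))
```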
Summary of Findings
About 88% of the variation in job performance index is explained by the model (R-squared = 0.8768), and the remaining 12% is unexplained variation.
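Regression Model in Matrix Form
$$Y = Xb + e$$
Here Y is the n × 1 vector of responses, X is the n × (p+1) matrix of independent variables with a leading column of ones, b = (b0, b1, …, bp)' is the vector of parameters and e is the n × 1 vector of random errors.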
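Least Square Estimator and its Variance
With b the vector of model parameters and e the vector of errors:
$$e = Y - Xb$$
$$Z = e'e = \sum e_i^2 = (Y - Xb)'(Y - Xb)$$
$$\frac{\partial Z}{\partial b} = 0 \;\Rightarrow\; \hat{b} = (X'X)^{-1} X'Y$$
$$V(\hat{b}) = V\left((X'X)^{-1} X'Y\right)$$
$$V(\hat{b}) = (X'X)^{-1} X' \, V(Y) \, X (X'X)^{-1}$$
Since V(Y) = σ²I, this simplifies to
$$V(\hat{b}) = \sigma^2 (X'X)^{-1}$$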
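The matrix formulas can be tried directly in R. The sketch below is a minimal example on simulated data (all variable names and generating values are assumptions for illustration); it computes the least squares estimator and its estimated variance from X and Y, replacing the unknown error variance with the usual residual-based estimate, and compares both with what lm() reports.

```r
# Illustrative check on simulated data (not the case-study data)
set.seed(3)
n  <- 40
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 2 * x1 - x2 + rnorm(n)

X       <- cbind(1, x1, x2)            # design matrix with intercept column
XtX_inv <- solve(t(X) %*% X)           # (X'X)^-1
b_hat   <- XtX_inv %*% t(X) %*% y      # least squares estimator

res        <- y - X %*% b_hat                 # residuals
sigma2_hat <- sum(res^2) / (n - ncol(X))      # estimate of error variance
v_b_hat    <- sigma2_hat * XtX_inv            # estimated V(b_hat)

fit <- lm(y ~ x1 + x2)
print(all.equal(as.numeric(b_hat), unname(coef(fit))))
print(all.equal(unname(v_b_hat), unname(vcov(fit))))
```

Explicitly inverting X'X is fine for a small teaching example; lm() itself uses a QR decomposition, which is numerically more stable.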
THANK YOU!!