Lecture 12 Regression
Regression
⚫ Regression
⚫ The best-fitting straight line for a scatterplot of two variables
⚫ Purpose
⚫ Prediction
⚫ Ex) X predicts Y
Price & Demand
X & Y in Regression
⚫ X
⚫ Horizontal axis of the scatterplot
⚫ Independent variable
⚫ Predictor variable
⚫ Mostly continuous variable
⚫ Y
⚫ Vertical axis of the scatterplot
⚫ Dependent variable
⚫ Outcome variable
⚫ Continuous variable
Regression
⚫ Simple Regression
⚫ Single dependent variable
⚫ Single independent variable
⚫ Multiple Regression
⚫ Single dependent variable
⚫ Multiple independent variables
The Linear Model
Eq. 1: Yi = b0 + b1Xi + εi
⚫ The fundamental idea is that an outcome for an entity
can be predicted from a model and some error
associated with that prediction.
⚫ (Yi): the outcome variable
⚫ (Xi): the predictor variable
⚫ (b1): a parameter (sometimes written β1) associated with the predictor variable that quantifies the relationship it has with the outcome variable
⚫ (b0): a parameter (the constant, or intercept) that tells us the value of the outcome when the predictor is zero
⚫ (εi): the error associated with the prediction for entity i
Linear model
⚫ b1
⚫ Parameter for the predictor
⚫ Gradient (slope) of the line
⚫ Direction/strength of the relationship (effect)
⚫ b0
⚫ The value of the outcome when the predictor(s) = 0 (the intercept)
Linear Models: Straight Line
⚫ Any straight line can be defined by two things:
⚫ (1) Slope: the slope (or gradient) of the line (usually denoted by b1); and
⚫ (2) Intercept: the point at which the line crosses the vertical axis (usually denoted by b0).
⚫ The gradient (b1) tells us what the model looks like (its shape) and the intercept (b0) tells us where the model is (its location in geometric space).
Straight Lines
⚫ Eq. 4: Yi = b0 + b1Xi + εi, where Yi is the outcome variable and εi is the error.
Fitting a line to the data
⚫ Simplest Model: the mean
⚫ Without other data, the best guess of the outcome (Y) is
always the mean
SPSS finds the values of the parameters that produce the least amount of squared error (the method of least squares)
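The idea above can be sketched in a few lines of Python using only the standard library. The data are invented purely for illustration; the least-squares formulas (b1 = Sxy / Sxx, b0 = Ȳ − b1·X̄) are the standard ones, not taken from the slides.

```python
from statistics import mean

# Hypothetical data (not from the lecture): predictor X, outcome Y
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]

x_bar, y_bar = mean(x), mean(y)

# Least-squares estimates: b1 = Sxy / Sxx, b0 = y_bar - b1 * x_bar
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
b1 = sxy / sxx           # slope: direction/strength of the relationship
b0 = y_bar - b1 * x_bar  # intercept: predicted Y when X = 0

# With no predictor at all, the best single guess for any Y is its mean
print(b1, b0, y_bar)
```

This is the same least-squares criterion SPSS applies when it fits the model.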
Total Sum of Squares, SST
⚫ SST = Σ(Yi − Ȳ)²
⚫ Total variability (variability between the scores and the mean)
Residual Sum of Squares, SSR
⚫ SSR = Σ(Yi − Ŷi)²
⚫ Residual/error variability (variability between the regression model and the actual data)
Model Sum of Squares, SSM
⚫ SSM = Σ(Ŷi − Ȳ)²
⚫ Model variability (difference in variability between the model and the mean)
Testing the Fit of the Model
⚫ We need to see whether the model is a reasonable ‘fit’ of the
actual data.
⚫ SST
⚫ Total variability (variability between scores and the mean)
⚫ SSR
⚫ Residual/Error variability (variability between the regression
model and the actual data)
⚫ SSM
⚫ Model variability (difference in variability between the
model and the mean)
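The three sums of squares can be computed directly from their definitions. This is a minimal sketch with invented data (standard library only); for a least-squares linear model the pieces satisfy SST = SSM + SSR.

```python
from statistics import mean

# Hypothetical data (not from the lecture)
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]
x_bar, y_bar = mean(x), mean(y)

# Least-squares slope and intercept
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]  # model predictions

sst = sum((yi - y_bar) ** 2 for yi in y)               # scores vs the mean
ssr = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))  # scores vs the model
ssm = sum((yh - y_bar) ** 2 for yh in y_hat)           # model vs the mean

# For least-squares linear models, SST = SSM + SSR
print(sst, ssm, ssr)
```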
Testing the Model: ANOVA
⚫ If the model results in better prediction than using the mean, then SSM should be greater than SSR
⚫ Mean Squares
⚫ Because sums of squares are total values, we use mean squares (each sum of squares divided by its degrees of freedom) instead
⚫ F = MSM / MSR
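Continuing the same invented data, the F-ratio can be sketched as the model mean square over the residual mean square. The degrees of freedom (k for the model, n − k − 1 for the residuals) are the standard ones for a regression with k predictors; the data are made up.

```python
from statistics import mean

# Hypothetical data (not from the lecture); k = 1 predictor
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]
n, k = len(y), 1

x_bar, y_bar = mean(x), mean(y)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

ssm = sum((yh - y_bar) ** 2 for yh in y_hat)
ssr = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))

msm = ssm / k            # model mean square: df = k
msr = ssr / (n - k - 1)  # residual mean square: df = n - k - 1
f_ratio = msm / msr      # large F: the model beats the mean-only model
print(round(f_ratio, 2))
```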
Testing the Model: R2
⚫ R2
⚫ The proportion of variance accounted for by the regression model (R2 = SSM / SST)
⚫ The square of the Pearson correlation coefficient between observed and predicted scores
⚫ Adjusted R2
⚫ An estimate of R2 in the population (it adjusts for shrinkage)
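Both quantities follow directly from the sums of squares. A sketch with the same invented data; the adjusted R² formula shown (Wherry's 1 − (1 − R²)(n − 1)/(n − k − 1)) is a common textbook choice and an assumption here, since the slides do not give one.

```python
from statistics import mean

# Hypothetical data (not from the lecture); k = 1 predictor
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 8.1, 9.8]
n, k = len(y), 1

x_bar, y_bar = mean(x), mean(y)
b1 = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sum(
    (xi - x_bar) ** 2 for xi in x
)
b0 = y_bar - b1 * x_bar
y_hat = [b0 + b1 * xi for xi in x]

sst = sum((yi - y_bar) ** 2 for yi in y)
ssm = sum((yh - y_bar) ** 2 for yh in y_hat)

r2 = ssm / sst  # proportion of variance accounted for by the model
# Adjusted R2 (Wherry's formula): shrinks R2 toward its population value
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
print(round(r2, 4), round(adj_r2, 4))
```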
Summary
⚫ We can fit linear models predicting an outcome
from one or more predictors
⚫ Parameter estimates (b)
⚫ Tell us about the shape of the model
⚫ Tell us about size and direction of relationship between
predictor and outcome
⚫ Can be tested for significance
⚫ CI tells us about population value
⚫ Use bootstrapping if assumptions are in doubt
⚫ Model Fit
⚫ ANOVA
⚫ R2
Running the Analysis
⚫ R2: how much of the variability in the outcome is accounted for by the predictors. Here, the predictor accounts for 33.5% of the variability in the outcome (.335 × 100).
⚫ Adjusted R2: gives us some idea of how our model generalizes and, ideally, is very close to our value for R2.
ANOVA
⚫ F-ratio measures how well the model predicts the outcome
(MSM) compared to error in the model (MSR)
⚫ Tells us if using our model is significantly better than using
the mean alone
Both models significantly improved our ability to predict the outcome variable
compared to not fitting the model (using the mean model)
⚫ Assess the contribution of each predictor using t-tests
⚫ Advertising budget: t(196) = 12.26, p < .001
⚫ Did the other predictors contribute significantly to the model?
⚫ No. of radio plays: t(196) = 12.12, p < .001
⚫ Attractiveness of band: t(196) = 4.55, p < .001
⚫ Yes: all the predictors contributed significantly (all p < .001)
Sample SPSS Output
⚫ This information can be used to create a
prediction (regression) equation for predicting
work performance of future applicants from
supervisor ratings
Y’ = – 1.156 + 0.033 X
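The prediction equation above can be wrapped as a small function. The function name and the sample rating of 100 are illustrative assumptions; only the coefficients come from the slide.

```python
# Prediction equation from the slide: Y' = -1.156 + 0.033 * X
def predict_performance(supervisor_rating):
    """Predicted work performance from a supervisor rating (hypothetical name)."""
    return -1.156 + 0.033 * supervisor_rating

# e.g. an applicant with a (hypothetical) rating of 100
print(round(predict_performance(100), 3))
```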
• Work Simulation Job Performance may also be
predicted from Arm Strength
• Here is the SPSS output:
⚫ This information can be used to create a prediction (regression) equation for predicting work performance of future applicants from arm strength
Y’ = – 4.095 + 0.055 X
⚫ We now have two regression equations for
predicting Work Simulation Job Performance
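The two equations can be compared side by side. The coefficients are from the slides; the function names and the sample inputs (a rating of 95, a strength score of 120) are hypothetical.

```python
# Two prediction equations for Work Simulation Job Performance
def predict_from_rating(supervisor_rating):
    """From supervisor ratings (slide): Y' = -1.156 + 0.033 * X."""
    return -1.156 + 0.033 * supervisor_rating

def predict_from_strength(arm_strength):
    """From arm strength (slide): Y' = -4.095 + 0.055 * X."""
    return -4.095 + 0.055 * arm_strength

# Hypothetical applicant scores, one per predictor
print(round(predict_from_rating(95), 3))
print(round(predict_from_strength(120), 3))
```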
• Standard error of prediction using Supervisor Ratings:
Example