Lecture 12 Regression

The document provides an overview of regression analysis, detailing its purpose in predicting outcomes based on independent and dependent variables. It explains simple and multiple regression, the linear model, and how to assess model fit using ANOVA and R-squared values. Additionally, it discusses the importance of meeting assumptions for reliable results and how to interpret regression coefficients in practical applications.

Uploaded by

Fatima Batool

Regression Analysis

Regression

⚫ This statistic provides a measure of the strength of the association between two variables in terms of the percentage of variance explained.
Regression

⚫ Regression
⚫ Best fitting straight line
for a scatterplot between
two variables

⚫ Purpose
⚫ Prediction
⚫ Ex) X predicts Y

Price & Demand

X & Y in Regression
⚫ X
⚫ Horizontal axis of the scatterplot
⚫ Independent variable
⚫ Predictor variable
⚫ Mostly continuous variable

⚫ Y
⚫ Vertical axis of the scatterplot
⚫ Dependent variable
⚫ Outcome variable
⚫ Continuous variable

Regression

⚫ Simple Regression
⚫ Single dependent variable
⚫ Single independent variable

⚫ Multiple Regression
⚫ Single dependent variable
⚫ Multiple independent variables

The Linear Model

Eq. 1: Yi = b0 + b1Xi + εi
⚫ The fundamental idea is that an outcome for an entity
can be predicted from a model and some error
associated with that prediction.
⚫ (Yi): outcome variable
⚫ (Xi): predictor variable
⚫ (b1): (beta) a parameter associated with the predictor variable that quantifies the relationship it has with the outcome variable
⚫ (b0): a parameter that tells us the value of the outcome when the predictor is zero (the constant)
⚫ (εi): the error associated with predicting the ith outcome
Linear model

⚫ b1
⚫ Parameter for the predictor
⚫ Gradient (slope) of the line
⚫ Direction/strength of the relationship (effect)

⚫ b0
⚫ The value of the outcome when predictor(s) = 0 (intercept)
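The roles of b0 and b1 can be sketched in a few lines. This is a minimal illustration of Eq. 1, using the b values from the lecture's advertising example (b0 = 50, b1 = 100):

```python
# Minimal sketch of the linear model: predicted outcome = b0 + b1 * x.
# The values b0 = 50 and b1 = 100 are from the lecture's advertising example.
def predict(x, b0, b1):
    """Predicted outcome for a predictor value x."""
    return b0 + b1 * x

# Spending £5 on advertising:
print(predict(5, b0=50, b1=100))  # 550
```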
Linear Models: Straight Line
⚫ Any straight line can be defined by two things:
⚫ (1) Slope: the slope (or gradient) of the line (usually denoted by b1); and
⚫ (2) Intercept: the point at which the line crosses the vertical axis of the graph (known as the intercept of the line, b0).

⚫ These parameters b1 and b0 are known as the regression coefficients.
Regression co-efficients
⚫ Slope (or gradient): b1: the shape of the line
(slope)
⚫ Intercept: b0 : where the line crosses the vertical
(y) axis
Same b0, Different b1

The gradient (b1) tells us what the model looks like (its shape) and the intercept (b0) tells us where the model is (its location in geometric space).
Straight Lines

(Figure: the regression equation annotated term by term: the outcome variable; the intercept b0, the point where the line crosses the y-axis; the slope b1, giving the direction/strength of the relationship; the ith participant's score on the predictor variable; and the error term.)
Example – Album Sales

⚫ Predict the number of albums you would sell from how much you spend on advertising
Example – album sales

⚫ If we spend nothing on advertising, 50 albums are sold (b0 = 50)

⚫ What if you spend £5 on advertising (with b1 = 100)?
⚫ Sales = 50 + 100 × 5 = 550 albums
⚫ This value of 550 album sales is known as a predicted value.
The linear model with several predictors

Eq. 4: Yi = b0 + b1X1i + b2X2i + … + bnXni + εi
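Fitting several predictors at once can be sketched with numpy's least-squares routine (SPSS does the equivalent estimation internally). The data and true parameters (5, 2, 0.5) below are made up for illustration, not from the lecture's data set:

```python
# Sketch: least-squares fit of a model with two predictors on toy data.
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.uniform(0, 10, 200)
x2 = rng.uniform(0, 10, 200)
y = 5 + 2 * x1 + 0.5 * x2 + rng.normal(0, 0.1, 200)  # outcome with small error

X = np.column_stack([np.ones_like(x1), x1, x2])      # design matrix: b0, b1, b2
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(b, 1))  # estimates close to the true [5.0, 2.0, 0.5]
```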
Fitting a line to the data
⚫ Simplest Model: the mean
⚫ Without other data, the best guess of the outcome (Y) is
always the mean

⚫ Ordinary Least Squares (OLS) regression:
⚫ Fits a line of best fit to the data
⚫ Estimates the constant (b0) and parameters of each predictor (b for each X)

⚫ SPSS finds the values of the parameters that have the least amount of error
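For a single predictor, the least-squares estimates have a closed form: the slope is the covariance of X and Y divided by the variance of X, and the line passes through the point of means. A sketch on toy data (not the album-sales file):

```python
# Sketch of ordinary least squares for simple regression.
def ols(xs, ys):
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    # Slope: covariance of x and y divided by variance of x
    b1 = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
          / sum((x - mx) ** 2 for x in xs))
    b0 = my - b1 * mx  # intercept: the line passes through (mean x, mean y)
    return b0, b1

xs = [1, 2, 3, 4, 5]
ys = [52, 55, 59, 60, 64]   # roughly linear toy outcome
b0, b1 = ols(xs, ys)
print(round(b0, 2), round(b1, 2))  # 49.3 2.9
```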
Total Sum of Squares, SST

⚫ SST
⚫ Total variability (variability between scores and the mean)
Residual Sum of Squares, SSR

⚫ SSR
⚫ Residual/Error variability (variability between the regression model and the actual data)

Model Sum of Squares, SSM

⚫ SSM
⚫ Model variability (difference in variability between the model and the mean)
Testing the Fit of the Model
⚫ We need to see whether the model is a reasonable ‘fit’ of the
actual data.
⚫ SST
⚫ Total variability (variability between scores and the mean)
⚫ SSR
⚫ Residual/Error variability (variability between the regression
model and the actual data)
⚫ SSM
⚫ Model variability (difference in variability between the
model and the mean)
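The three sums of squares can be computed directly. This sketch uses toy data with its exact OLS fit (b0 = 49.3, b1 = 2.9); for an exact least-squares fit, SST = SSM + SSR:

```python
# Sketch of SST, SSM and SSR on toy data.
xs = [1, 2, 3, 4, 5]
ys = [52, 55, 59, 60, 64]
b0, b1 = 49.3, 2.9                                  # OLS fit for this toy data
mean_y = sum(ys) / len(ys)
preds = [b0 + b1 * x for x in xs]

sst = sum((y - mean_y) ** 2 for y in ys)            # total variability
ssr = sum((y, p)[0] - (y, p)[1] for y, p in [(0, 0)]) if False else \
      sum((y - p) ** 2 for y, p in zip(ys, preds))  # residual variability
ssm = sum((p - mean_y) ** 2 for p in preds)         # model variability
print(round(sst, 2), round(ssm, 2), round(ssr, 2))  # 86.0 84.1 1.9
```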
Testing the Model: ANOVA
Testing the Model: ANOVA
⚫ If the model results in better prediction than using
the mean, then SSM should be greater than SSR
⚫ Mean Squared Error
⚫ Sums of Squares are total values, we use Mean
Squared Error instead.
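Dividing each sum of squares by its degrees of freedom gives the mean squares, and their ratio is the F-statistic. The SSM and SSR values below are illustrative toy numbers (n = 5 cases, k = 1 predictor), not from the lecture's data:

```python
# Sketch of the F-ratio from mean squares.
ssm, ssr, n, k = 84.1, 1.9, 5, 1
msm = ssm / k              # model mean square, df = k
msr = ssr / (n - k - 1)    # residual mean square, df = n - k - 1
f = msm / msr
print(round(f, 2))  # 132.79
```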
Testing the Model: R2

⚫ R2
⚫ The proportion of variance accounted for by the
regression model.
⚫ The Pearson Correlation Coefficient between
observed and predicted scores squared
⚫ Adjusted R2
⚫ An estimate of R2 in the population (shrinkage)
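R2 is the model sum of squares as a proportion of the total, and the standard adjusted-R2 formula applies a shrinkage penalty for the number of predictors. The SST and SSM values are illustrative toy numbers:

```python
# Sketch of R-squared and adjusted R-squared.
sst, ssm, n, k = 86.0, 84.1, 5, 1
r2 = ssm / sst                                   # proportion of variance explained
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)    # shrinkage for k predictors
print(round(r2, 3), round(adj_r2, 3))  # 0.978 0.971
```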
Summary
⚫ We can fit linear models predicting an outcome
from one or more predictors
⚫ Parameter estimates (b)
⚫ Tell us about the shape of the model
⚫ Tell us about size and direction of relationship between
predictor and outcome
⚫ Can significance test
⚫ CI tells us about population value
⚫ Use bootstrapping if assumptions are in doubt
⚫ Model Fit
⚫ ANOVA
⚫ R2
Running the Analysis

⚫ FILE: Album_sales.sav (from StudyDirect)
⚫ What are our IV and DV?
⚫ How many participants/data points
are there?
⚫ What kind of variables do we
have? (nominal, interval
or scale)
⚫ Does the scatterplot (on p7)
show a positive or negative
relationship between
the two variables?
Running the Analysis
⚫ Analyse → Regression → Linear…
⚫ Predictor (IV) goes in “Independent(s)”
⚫ Outcome (DV) goes in “Dependent”
Running the Analysis
⚫ Click on “Bootstrap…”
⚫ Re-runs the analysis on 1,000 resamples (with replacement) of your data
⚫ Check “Perform
Bootstrapping…”
and choose BCa
⚫ “Continue” then “OK”
to run
Interpretation: Simple Regression
Navigating the output
⚫ Model Summary: how useful is our model?
⚫ ANOVA: is our model better than the mean?
⚫ Coefficients: What are the numbers?
⚫ Bootstrap for coefficients
Model summary
⚫ First, is this model better than using the mean?
⚫ For simple regression, R = correlation coefficient
⚫ Compare errors (differences between predicted and
observed values) for both the mean model and the
regression model
⚫ amount of variance explained by the model vs
the mean (R2)
⚫ Expressed as a percentage
⚫ R values range from –1 to 1, so this is a large positive correlation
⚫ R2: how much of the variability in the outcome is accounted for by the predictors. Here, the predictor accounts for 33.5% of the outcome (.335 × 100)
⚫ Adjusted R2 gives us some idea of how our model generalizes and is ideally very close to our value for R2
ANOVA
⚫ F-ratio measures how well the model predicts the outcome
(MSM) compared to error in the model (MSR)
⚫ Tells us if using our model is significantly better than using
the mean alone

F(1, 198) = 99.59, p < .001


Coefficients
⚫ Assess individual predictors using t-tests
⚫ H0: our value of b1 is zero
⚫ Therefore the t-test should be significant if the predictor is related to the outcome
⚫ If b1 = 0, the outcome was unchanged by that predictor
variable
⚫ Examines if our value of b is big compared to the error
b0: Intercept; b1: Slope

⚫ Is budget a significant predictor?

⚫ T-test: Are our variables significant predictors of our outcome?


⚫ In this case, the t-test tells us the same thing as the ANOVA
⚫ Because only one predictor

⚫ We can also use this table to form our equation


⚫ Intercept (b0): if no money is spent on advertising how many albums will
be sold? (units are in 1,000s)
⚫ 134,140 albums sold when advertising is 0 (134.14 × 1,000)
⚫ Coefficient (b1): if we increase our predictor by 1 unit (£1000), how many
more albums will we sell?
⚫ 96 additional albums sold for each £1,000 of advertising budget spent (0.096 × 1,000 = 96)
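The coefficients table above gives the fitted equation Sales = 134.14 + 0.096 × Advertising, with both variables in units of 1,000. As a function:

```python
# The fitted simple-regression equation from the coefficients table.
# Both album sales and advertising budget are in units of 1,000.
def predicted_sales(advert_thousands):
    return 134.14 + 0.096 * advert_thousands

print(round(predicted_sales(0) * 1000))  # 134140 albums at zero spend
print(round(predicted_sales(1) * 1000))  # 96 more albums per extra £1,000
```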
Bias
⚫ We need to meet four assumptions:
⚫ Linearity: the relationship to model is actually linear
⚫ Additivity: the outcome can be predicted by adding together all
predictors
⚫ Normality: residuals to be normally distributed for optimal b
estimates, normal sampling distribution for accurate CI and
statistical tests
⚫ Homoscedasticity: residuals have a similar variance at every level of the predictor(s)

⚫ If these assumptions are met, we can trust our estimates of b and their associated confidence intervals and significance tests
⚫ If not, then we can bootstrap to compute robust parameters and
confidence intervals instead
⚫ The bootstrap CI: the population value for b is likely to fall between .08 and
.11
⚫ Boundaries do not include zero, so there is a genuine positive relationship between advertising budget and album sales
⚫ If it contained 0, the true value might be 0 [i.e. no effect] or a negative number
[the opposite of our sample]
⚫ The p value associated with the confidence interval is also highly significant
(p=.001)
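The idea behind the bootstrap CI can be sketched with a simple percentile bootstrap of the slope. Note that SPSS's BCa intervals add bias and acceleration corrections; this simpler version just resamples cases with replacement and takes percentiles. The data here are made up for illustration:

```python
# Minimal percentile-bootstrap sketch for the slope b1 on toy data.
import random

def slope(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))

xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [52, 55, 59, 60, 64, 66, 71, 73]
rng = random.Random(0)
boots = []
while len(boots) < 1000:
    idx = [rng.randrange(len(xs)) for _ in xs]   # resample cases with replacement
    bx = [xs[i] for i in idx]
    if len(set(bx)) > 1:                         # slope needs variation in x
        boots.append(slope(bx, [ys[i] for i in idx]))
boots.sort()
lo, hi = boots[25], boots[975]                   # 95% percentile interval
print(round(lo, 2), round(hi, 2))
```

If the interval (lo, hi) does not contain zero, we conclude, as in the slides, that the population slope is genuinely positive.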
Using the Model

⚫ If a company wanted to spend £100,000 on advertising, how many albums would we predict they would sell?
⚫ Hint: units are in 1,000s!
⚫ Sales = 134.14 + 0.096 (100)
⚫ Sales = 143.74 (in 1,000s)
⚫ Make a prediction: approximately 143,740 albums would be sold if the company spent £100,000 on advertising
Album Sales: More Predictors
⚫ Advertising only accounted for 33.5% of variance
in albums sales, leaving 66.5% variance
unaccounted for
⚫ AlbumSales.sav includes 2 additional predictors:
⚫ Amount of airplay the band receives on the radio
⚫ The attractiveness ratings of the band
⚫ Add these to the model to see if the model
improves
Album Sales: More Predictors
⚫ Analyse → Regression → Linear…
⚫ Add a second block
for new predictors
Interpretation: Multiple Regression
F(1, 198) = 99.59, p < .001 F(3, 196) = 129.50, p < .001

Both models significantly improved our ability to predict the outcome variable
compared to not fitting the model (using the mean model)
⚫ Assess the contribution of each predictor using t-tests
⚫ Advertising budget: t(196)= 12.26, p<.001
⚫ Did the other predictors contribute significantly to the model?
⚫ No. of radio plays: t(196) = 12.12, p < .001
⚫ Attractiveness of band: t(196)= 4.55, p<.001

⚫ Remember: significance tests are only reliable if we have met our assumptions!
⚫ Advertising budget: (b1= 0.09)
⚫ As advertising budget increases by 1 unit (£1000), album sales increase by
0.09 units
⚫ Airplay: (b2= 3.37)
⚫ As number of plays on radio 1 per week increases by 1 unit (1 play), album
sales increase by 3.37 units
⚫ Attractiveness: (b3= 11.09)
⚫ As attractiveness rating of band increases by 1 unit album sales increase by
11.09 units
⚫ If assumptions are not met use bootstrap CIs
⚫ Advertising: (b=0.09) [0.07, 0.10], p=.001
⚫ Number of radio plays (b=3.37) [2.80, 3.99], p=.001
⚫ Attractiveness of band (b=11.09) [6.25, 15.10], p=.001
⚫ Bootstrap CIs do not cross zero
⚫ Can conclude confidently that bs are positive (do contribute)
Regression Analysis in SPSS

⚫ Example 1: X = Supervisor Ratings, Y = Job Performance
⚫ Example 2: X = Arm Strength, Y = Job Performance
Sample SPSS Output

⚫ Here is the SPSS output for regressing Work Simulation Job Performance (Dependent Variable) against Supervisor Ratings (Independent Variable)
⚫ This information can be used to create a
prediction (regression) equation for predicting
work performance of future applicants from
supervisor ratings

Y’ = – 1.156 + 0.033 X

• Work Simulation Job Performance may also be
predicted from Arm Strength
• Here is the SPSS output:

⚫ This information can be used to create a prediction (regression) equation for predicting work performance of future applicants from arm strength

Y’ = – 4.095 + 0.055 X
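The two prediction equations above can be evaluated for a hypothetical applicant (the input scores of 80 and 150 below are made up for illustration):

```python
# Evaluating the two prediction equations from the SPSS examples.
def perf_from_ratings(x):
    return -1.156 + 0.033 * x   # Y' = -1.156 + 0.033 * Supervisor Rating

def perf_from_strength(x):
    return -4.095 + 0.055 * x   # Y' = -4.095 + 0.055 * Arm Strength

print(round(perf_from_ratings(80), 3))    # 1.484
print(round(perf_from_strength(150), 3))  # 4.155
```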

⚫ We now have two regression equations for
predicting Work Simulation Job Performance

⚫ Which is the better equation for accurate prediction?

• Standard error of prediction using Supervisor Ratings:

• Standard error of prediction using Arm Strength:

• Which is the better equation?

Example
