Lecture 12 Regression

The document provides an overview of regression analysis, detailing its purpose in predicting outcomes based on independent and dependent variables. It explains simple and multiple regression, the linear model, and how to assess model fit using ANOVA and R-squared values. Additionally, it discusses the importance of meeting assumptions for reliable results and how to interpret regression coefficients in practical applications.

Uploaded by

Fatima Batool

Regression Analysis

Regression

⚫ This statistic provides a measure of the strength of the association between two variables in terms of the percentage of variance explained.
Regression

⚫ Regression
⚫ Best fitting straight line
for a scatterplot between
two variables

⚫ Purpose
⚫ Prediction
⚫ Ex) X predicts Y

Price & Demand

X & Y in Regression
⚫ X
⚫ Horizontal axis of the scatterplot
⚫ Independent variable
⚫ Predictor variable
⚫ Mostly continuous variable

⚫ Y
⚫ Vertical axis of the scatterplot
⚫ Dependent variable
⚫ Outcome variable
⚫ Continuous variable

Regression

⚫ Simple Regression
⚫ Single dependent variable
⚫ Single independent variable

⚫ Multiple Regression
⚫ Single dependent variable
⚫ Multiple independent variables

The Linear Model

Eq. 1: Yi = b0 + b1Xi + εi
⚫ The fundamental idea is that an outcome for an entity
can be predicted from a model and some error
associated with that prediction.
⚫ (Yi): outcome variable
⚫ (Xi): predictor variable
⚫ (b1): (beta) a parameter associated with the predictor variable that quantifies the relationship it has with the outcome variable
⚫ (b0): a parameter that tells us the value of the outcome when the predictor is zero (the constant)
⚫ (εi): the error associated with predicting the ith outcome
Linear model

⚫ b1
⚫ Parameter for the predictor
⚫ Gradient (slope) of the line
⚫ Direction/strength of the relationship (effect)

⚫ b0
⚫ The value of the outcome when predictor(s) = 0 (intercept)
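The roles of b0 and b1 can be sketched in a few lines. This is a minimal illustration of Eq. 1, using the b values from the lecture's advertising example (b0 = 50, b1 = 100):

```python
# Minimal sketch of the linear model: predicted outcome = b0 + b1 * x.
# The values b0 = 50 and b1 = 100 are from the lecture's advertising example.
def predict(x, b0, b1):
    """Predicted outcome for a predictor value x."""
    return b0 + b1 * x

# Spending £5 on advertising:
print(predict(5, b0=50, b1=100))  # 550
```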
Linear Models: Straight Line
⚫ Any straight line can be defined by two things:
⚫ (1) Slope: the slope (or gradient) of the line (usually denoted by b1); and
⚫ (2) Intercept: the point at which the line crosses the vertical axis of the graph (known as the intercept of the line, b0).

⚫ These parameters b1 and b0 are known as the regression coefficients.
Regression co-efficients
⚫ Slope (or gradient): b1: the shape of the line
(slope)
⚫ Intercept: b0 : where the line crosses the vertical
(y) axis
Same b0, Different b1

The gradient (b1) tells us what the model looks like (its shape) and the intercept (b0) tells us where the model is (its location in geometric space).
Straight Lines

(Figure: the regression equation annotated term by term: the outcome variable; the intercept b0, the point where the line crosses the y-axis; the slope b1, giving the direction/strength of the relationship; the ith participant's score on the predictor variable; and the error term.)
Example – Album Sales

⚫ Predict the number of albums you would sell from how much you spend on advertising
Example – album sales

⚫ If we spend nothing on advertising, 50 albums are sold (b0 = 50)

⚫ What if you spend £5 on advertising (with b1 = 100)?
⚫ Sales = 50 + 100 × 5 = 550 albums
⚫ This value of 550 album sales is known as a predicted value.
The linear model with several predictors

Eq. 4: Yi = b0 + b1X1i + b2X2i + … + bnXni + εi
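Fitting several predictors at once can be sketched with numpy's least-squares routine (SPSS does the equivalent estimation internally). The data and true parameters (5, 2, 0.5) below are made up for illustration, not from the lecture's data set:

```python
# Sketch: least-squares fit of a model with two predictors on toy data.
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.uniform(0, 10, 200)
x2 = rng.uniform(0, 10, 200)
y = 5 + 2 * x1 + 0.5 * x2 + rng.normal(0, 0.1, 200)  # outcome with small error

X = np.column_stack([np.ones_like(x1), x1, x2])      # design matrix: b0, b1, b2
b, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.round(b, 1))  # estimates close to the true [5.0, 2.0, 0.5]
```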
Fitting a line to the data
⚫ Simplest Model: the mean
⚫ Without other data, the best guess of the outcome (Y) is
always the mean

⚫ Ordinary Least Squares (OLS) regression:
⚫ Fits a line of best fit to the data
⚫ Estimates the constant (b0) and parameters of each predictor (b for each X)

⚫ SPSS finds the values of the parameters that have the least amount of error
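For a single predictor, the least-squares estimates have a closed form: the slope is the covariance of X and Y divided by the variance of X, and the line passes through the point of means. A sketch on toy data (not the album-sales file):

```python
# Sketch of ordinary least squares for simple regression.
def ols(xs, ys):
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    # Slope: covariance of x and y divided by variance of x
    b1 = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
          / sum((x - mx) ** 2 for x in xs))
    b0 = my - b1 * mx  # intercept: the line passes through (mean x, mean y)
    return b0, b1

xs = [1, 2, 3, 4, 5]
ys = [52, 55, 59, 60, 64]   # roughly linear toy outcome
b0, b1 = ols(xs, ys)
print(round(b0, 2), round(b1, 2))  # 49.3 2.9
```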
Total Sum of Squares, SST

⚫ SST
⚫ Total variability (variability between scores and the mean)
Residual Sum of Squares, SSR

⚫ SSR
⚫ Residual/Error variability (variability between the regression model and the actual data)

Model Sum of Squares, SSM

⚫ SSM
⚫ Model variability (difference in variability between the model and the mean)
Testing the Fit of the Model
⚫ We need to see whether the model is a reasonable ‘fit’ of the
actual data.
⚫ SST
⚫ Total variability (variability between scores and the mean)
⚫ SSR
⚫ Residual/Error variability (variability between the regression
model and the actual data)
⚫ SSM
⚫ Model variability (difference in variability between the
model and the mean)
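The three sums of squares can be computed directly. This sketch uses toy data with its exact OLS fit (b0 = 49.3, b1 = 2.9); for an exact least-squares fit, SST = SSM + SSR:

```python
# Sketch of SST, SSM and SSR on toy data.
xs = [1, 2, 3, 4, 5]
ys = [52, 55, 59, 60, 64]
b0, b1 = 49.3, 2.9                                  # OLS fit for this toy data
mean_y = sum(ys) / len(ys)
preds = [b0 + b1 * x for x in xs]

sst = sum((y - mean_y) ** 2 for y in ys)            # total variability
ssr = sum((y, p)[0] - (y, p)[1] for y, p in [(0, 0)]) if False else \
      sum((y - p) ** 2 for y, p in zip(ys, preds))  # residual variability
ssm = sum((p - mean_y) ** 2 for p in preds)         # model variability
print(round(sst, 2), round(ssm, 2), round(ssr, 2))  # 86.0 84.1 1.9
```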
Testing the Model: ANOVA
Testing the Model: ANOVA
⚫ If the model results in better prediction than using
the mean, then SSM should be greater than SSR
⚫ Mean Squared Error
⚫ Sums of Squares are total values, we use Mean
Squared Error instead.
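Dividing each sum of squares by its degrees of freedom gives the mean squares, and their ratio is the F-statistic. The SSM and SSR values below are illustrative toy numbers (n = 5 cases, k = 1 predictor), not from the lecture's data:

```python
# Sketch of the F-ratio from mean squares.
ssm, ssr, n, k = 84.1, 1.9, 5, 1
msm = ssm / k              # model mean square, df = k
msr = ssr / (n - k - 1)    # residual mean square, df = n - k - 1
f = msm / msr
print(round(f, 2))  # 132.79
```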
Testing the Model: R2

⚫ R2
⚫ The proportion of variance accounted for by the
regression model.
⚫ The Pearson Correlation Coefficient between
observed and predicted scores squared
⚫ Adjusted R2
⚫ An estimate of R2 in the population (shrinkage)
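R2 is the model sum of squares as a proportion of the total, and the standard adjusted-R2 formula applies a shrinkage penalty for the number of predictors. The SST and SSM values are illustrative toy numbers:

```python
# Sketch of R-squared and adjusted R-squared.
sst, ssm, n, k = 86.0, 84.1, 5, 1
r2 = ssm / sst                                   # proportion of variance explained
adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)    # shrinkage for k predictors
print(round(r2, 3), round(adj_r2, 3))  # 0.978 0.971
```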
Summary
⚫ We can fit linear models predicting an outcome
from one or more predictors
⚫ Parameter estimates (b)
⚫ Tell us about the shape of the model
⚫ Tell us about size and direction of relationship between
predictor and outcome
⚫ Can significance test
⚫ CI tells us about population value
⚫ Use bootstrapping if assumptions are in doubt
⚫ Model Fit
⚫ ANOVA
⚫ R2
Running the Analysis

⚫ FILE: Album_sales.sav (from StudyDirect)
⚫ What are our IV and DV?
⚫ How many participants/data points
are there?
⚫ What kind of variables do we
have? (nominal, interval
or scale)
⚫ Does the scatterplot (on p7)
show a positive or negative
relationship between
the two variables?
Running the Analysis
⚫ Analyse → Regression → Linear…
⚫ Predictor (IV) goes in “Independent(s)”
⚫ Outcome (DV) goes in “Dependent”
Running the Analysis
⚫ Click on “Bootstrap…”
⚫ Re-runs the analysis on 1,000 resamples (with replacement) of your data
⚫ Check “Perform
Bootstrapping…”
and choose BCa
⚫ “Continue” then “OK”
to run
Interpretation: Simple Regression
Navigating the output
⚫ Model Summary: how useful is our model?
⚫ ANOVA: is our model better than the mean?
⚫ Coefficients: What are the numbers?
⚫ Bootstrap for coefficients
Model summary
⚫ First, is this model better than using the mean?
⚫ For simple regression, R = correlation coefficient
⚫ Compare errors (differences between predicted and
observed values) for both the mean model and the
regression model
⚫ amount of variance explained by the model vs
the mean (R2)
⚫ Expressed as a percentage
⚫ R values range from –1 to 1, so this is a large positive correlation
⚫ R2: how much of the variability in the outcome is accounted for by the predictors. Here, the predictor accounts for 33.5% of the outcome (.335 × 100)
⚫ Adjusted R2 gives us some idea of how our model generalizes and is ideally very close to our value for R2
ANOVA
⚫ F-ratio measures how well the model predicts the outcome
(MSM) compared to error in the model (MSR)
⚫ Tells us if using our model is significantly better than using
the mean alone

F(1, 198) = 99.59, p < .001


Coefficients
⚫ Assess individual predictors using t-tests
⚫ H0: our value of b1 is zero
⚫ Therefore the t-test should be significant if the predictor is related to the outcome
⚫ If b1 = 0, the outcome was unchanged by that predictor
variable
⚫ Examines if our value of b is big compared to the error
b0: Intercept; b1: Slope

⚫ Is budget a significant predictor?

⚫ T-test: Are our variables significant predictors of our outcome?


⚫ In this case, the t-test tells us the same thing as the ANOVA
⚫ Because only one predictor

⚫ We can also use this table to form our equation


⚫ Intercept (b0): if no money is spent on advertising how many albums will
be sold? (units are in 1,000s)
⚫ 134,140 albums sold when advertising is 0 (134.14 × 1,000)
⚫ Coefficient (b1): if we increase our predictor by 1 unit (£1000), how many
more albums will we sell?
⚫ 96 additional albums sold for each £1,000 of advertising budget spent (0.096 × 1,000 = 96)
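The coefficients table above gives the fitted equation Sales = 134.14 + 0.096 × Advertising, with both variables in units of 1,000. As a function:

```python
# The fitted simple-regression equation from the coefficients table.
# Both album sales and advertising budget are in units of 1,000.
def predicted_sales(advert_thousands):
    return 134.14 + 0.096 * advert_thousands

print(round(predicted_sales(0) * 1000))  # 134140 albums at zero spend
print(round(predicted_sales(1) * 1000))  # 96 more albums per extra £1,000
```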
Bias
⚫ We need to meet four assumptions:
⚫ Linearity: the relationship to model is actually linear
⚫ Additivity: the outcome can be predicted by adding together all
predictors
⚫ Normality: residuals to be normally distributed for optimal b
estimates, normal sampling distribution for accurate CI and
statistical tests
⚫ Homoscedasticity: residuals have a similar variance at every level of the predictor(s)

⚫ If these assumptions are met, we can trust our estimates of b and their associated confidence intervals and significance tests
⚫ If not, then we can bootstrap to compute robust parameters and
confidence intervals instead
⚫ The bootstrap CI: the population value for b is likely to fall between .08 and
.11
⚫ Boundaries do not include zero, so there is a genuine positive relationship between advertising budget and album sales
⚫ If it contained 0, the true value might be 0 [i.e. no effect] or a negative number
[the opposite of our sample]
⚫ The p value associated with the confidence interval is also highly significant
(p=.001)
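The idea behind the bootstrap CI can be sketched with a simple percentile bootstrap of the slope. Note that SPSS's BCa intervals add bias and acceleration corrections; this simpler version just resamples cases with replacement and takes percentiles. The data here are made up for illustration:

```python
# Minimal percentile-bootstrap sketch for the slope b1 on toy data.
import random

def slope(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))

xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [52, 55, 59, 60, 64, 66, 71, 73]
rng = random.Random(0)
boots = []
while len(boots) < 1000:
    idx = [rng.randrange(len(xs)) for _ in xs]   # resample cases with replacement
    bx = [xs[i] for i in idx]
    if len(set(bx)) > 1:                         # slope needs variation in x
        boots.append(slope(bx, [ys[i] for i in idx]))
boots.sort()
lo, hi = boots[25], boots[975]                   # 95% percentile interval
print(round(lo, 2), round(hi, 2))
```

If the interval (lo, hi) does not contain zero, we conclude, as in the slides, that the population slope is genuinely positive.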
Using the Model

⚫ If a company wanted to spend £100,000 on advertising, how many albums would we predict they would sell?
⚫ Hint: units are in 1,000s!
⚫ Sales = 134.14 + 0.096 (100)
⚫ Sales = 143.74 (in 1,000s)
⚫ Make a prediction: approximately 143,740 albums would be sold if the company spent £100,000 on advertising
Album Sales: More Predictors
⚫ Advertising only accounted for 33.5% of variance
in albums sales, leaving 66.5% variance
unaccounted for
⚫ AlbumSales.sav includes 2 additional predictors:
⚫ Amount of airplay the band receives on the radio
⚫ The attractiveness ratings of the band
⚫ Add these to the model to see if the model
improves
Album Sales: More Predictors
⚫ Analyse → Regression → Linear…
⚫ Add a second block
for new predictors
Interpretation: Multiple Regression
F(1, 198) = 99.59, p < .001 F(3, 196) = 129.50, p < .001

Both models significantly improved our ability to predict the outcome variable
compared to not fitting the model (using the mean model)
⚫ Assess the contribution of each predictor using t-tests
⚫ Advertising budget: t(196)= 12.26, p<.001
⚫ Did the other predictors contribute significantly to the model?
⚫ No. of radio plays: t(196) = 12.12, p < .001
⚫ Attractiveness of band: t(196)= 4.55, p<.001

⚫ Remember: significance tests are only reliable if we have met our assumptions!
⚫ Advertising budget: (b1= 0.09)
⚫ As advertising budget increases by 1 unit (£1000), album sales increase by
0.09 units
⚫ Airplay: (b2= 3.37)
⚫ As number of plays on radio 1 per week increases by 1 unit (1 play), album
sales increase by 3.37 units
⚫ Attractiveness: (b3= 11.09)
⚫ As attractiveness rating of band increases by 1 unit album sales increase by
11.09 units
⚫ If assumptions are not met use bootstrap CIs
⚫ Advertising: (b=0.09) [0.07, 0.10], p=.001
⚫ Number of radio plays (b=3.37) [2.80, 3.99], p=.001
⚫ Attractiveness of band (b=11.09) [6.25, 15.10], p=.001
⚫ Bootstrap CIs do not cross zero
⚫ Can conclude confidently that bs are positive (do contribute)
Regression Analysis in SPSS

⚫ Example 1: X = Supervisor Ratings, Y = Job Performance
⚫ Example 2: X = Arm Strength, Y = Job Performance
Sample SPSS Output

⚫ Here is the SPSS output for regressing Work Simulation Job Performance (Dependent Variable) against Supervisor Ratings (Independent Variable)
⚫ This information can be used to create a
prediction (regression) equation for predicting
work performance of future applicants from
supervisor ratings

Y’ = – 1.156 + 0.033 X

• Work Simulation Job Performance may also be
predicted from Arm Strength
• Here is the SPSS output:

⚫ This information can be used to create a prediction (regression) equation for predicting work performance of future applicants from arm strength

Y’ = – 4.095 + 0.055 X
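The two prediction equations above can be evaluated for a hypothetical applicant (the input scores of 80 and 150 below are made up for illustration):

```python
# Evaluating the two prediction equations from the SPSS examples.
def perf_from_ratings(x):
    return -1.156 + 0.033 * x   # Y' = -1.156 + 0.033 * Supervisor Rating

def perf_from_strength(x):
    return -4.095 + 0.055 * x   # Y' = -4.095 + 0.055 * Arm Strength

print(round(perf_from_ratings(80), 3))    # 1.484
print(round(perf_from_strength(150), 3))  # 4.155
```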

⚫ We now have two regression equations for
predicting Work Simulation Job Performance

⚫ Which is the better equation for accurate prediction?

• Standard error of prediction using Supervisor Ratings:

• Standard error of prediction using Arm Strength:

• Which is the better equation?

Example
