Marketing Research: Data Analysis VI: Regression Analysis (Part 2)
Lecture 13:
Data Analysis VI: Regression Analysis (Part 2)
SONG LIN
Today’s class
1. Multiple linear regression
2. Regression with categorical measures as IV
3. Regression with interaction effect
4. Regression with nonlinear relationship
Today’s class
1. Multiple linear regression
Simple linear regression
The population regression model: $Y = \beta_0 + \beta_1 X + \varepsilon$
Why do we need more explanatory variables?
1. Provide more accurate prediction
2. Control for confounding effects
Back to the diamond case
The 4 C’s of diamonds
• Carat
• Cut
• Color
• Clarity
Cut
• Cut determines a diamond’s brilliance, i.e., how well it reflects light
Fair cut Good cut Very good cut Premium cut Ideal cut
Color
• Less color means higher quality
K-Z J I H G F E D
Clarity
• Most diamonds have tiny imperfections called inclusions; fewer
inclusions mean better clarity
Included (lowest clarity) to Flawless (highest clarity)
Source: Blue Nile
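Multiple linear regression model
Simple: $Y = \beta_0 + \beta_1 X + \varepsilon$
Multiple: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k + \varepsilon$
where $Y$ = Price, $X_1$ = Carat, $X_2$ = Color, and so on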
Estimation of regression model
In SPSS: Analyze → Regression → Linear
• $R^2$ measures the proportion of
variation in Y we can explain with
the X's; adjusted $R^2$ also penalizes
the number of predictors
• Including more predictors never
decreases $R^2$, but adjusted $R^2$ can fall
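The difference between $R^2$ and adjusted $R^2$ can be checked numerically. The sketch below is an illustration in Python with NumPy rather than SPSS, and it uses synthetic data, not the lecture's diamond data; the variable names (`carat`, `irrelevant`, `price`) and the coefficient values are assumptions chosen for the example.

```python
import numpy as np


def r2_and_adjusted(X, y):
    """Fit y = b0 + X b by least squares; return (R^2, adjusted R^2)."""
    n, k = X.shape
    design = np.column_stack([np.ones(n), X])  # add the intercept column
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    ss_res = float(resid @ resid)                # unexplained variation
    ss_tot = float(((y - y.mean()) ** 2).sum())  # total variation in y
    r2 = 1.0 - ss_res / ss_tot
    adj_r2 = 1.0 - (1.0 - r2) * (n - 1) / (n - k - 1)
    return r2, adj_r2


# Synthetic data for illustration only (not the lecture's diamond data).
rng = np.random.default_rng(0)
n = 200
carat = rng.uniform(0.3, 2.0, n)
irrelevant = rng.normal(size=n)  # a predictor unrelated to price
price = 2.0 + 8.0 * carat + rng.normal(scale=1.5, size=n)

r2_one, adj_one = r2_and_adjusted(carat.reshape(-1, 1), price)
r2_two, adj_two = r2_and_adjusted(np.column_stack([carat, irrelevant]), price)

print(f"carat only:         R2={r2_one:.4f}  adjusted R2={adj_one:.4f}")
print(f"carat + irrelevant: R2={r2_two:.4f}  adjusted R2={adj_two:.4f}")
```

Adding the unrelated predictor cannot lower $R^2$, while adjusted $R^2$ is free to drop because of the penalty on the number of predictors.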
Estimation of regression model
Estimates of the coefficients $\beta_0, \beta_1, \beta_2, \dots$
Estimation of regression model
Standardized estimates of the coefficients: how many standard
deviations Y will change if an X increases by one standard
deviation.
The t statistic measures the size of $\hat{\beta}$ relative to its variation
(i.e., how many standard errors the estimate $\hat{\beta}$ is from the
hypothesized $\beta$, which is usually assumed to be zero):
$t = \hat{\beta} / SE(\hat{\beta})$
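As a rough illustration of where these columns come from, the sketch below computes coefficient estimates, standard errors, t statistics, and standardized coefficients with NumPy. It assumes the classical OLS formulas and synthetic data; the function name `ols_summary` and the data-generating values are hypothetical, and the numbers are not the lecture's SPSS output.

```python
import numpy as np


def ols_summary(X, y):
    """Return (beta, standard errors, t statistics, standardized slopes)."""
    n, k = X.shape
    design = np.column_stack([np.ones(n), X])  # intercept + predictors
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    sigma2 = float(resid @ resid) / (n - k - 1)      # residual variance
    cov = sigma2 * np.linalg.inv(design.T @ design)  # covariance of beta
    se = np.sqrt(np.diag(cov))
    t_stats = beta / se                              # tests H0: beta = 0
    # Standardized slope: change in Y (in SDs) per one-SD change in X.
    std_beta = beta[1:] * X.std(axis=0, ddof=1) / y.std(ddof=1)
    return beta, se, t_stats, std_beta


# Synthetic data for illustration only.
rng = np.random.default_rng(0)
n = 300
carat = rng.uniform(0.3, 2.0, n)
color = rng.integers(1, 8, n).astype(float)  # assumed numeric color grade
X = np.column_stack([carat, color])
y = 1.0 + 8.0 * carat + 0.5 * color + rng.normal(scale=1.0, size=n)

beta, se, t_stats, std_beta = ols_summary(X, y)
for name, b, s, t in zip(["const", "carat", "color"], beta, se, t_stats):
    print(f"{name:>5}: estimate={b:7.3f}  se={s:6.3f}  t={t:7.2f}")
print("standardized slopes:", np.round(std_beta, 3))
```

Standardized slopes are unit-free, so they can be compared across predictors measured on different scales.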
Do the 4 C’s affect diamond price?
How to predict price using the 4 C’s?
• How much should I expect to pay for a diamond of
o Carat: 1
o Cut: Good (=1)
o Color: I (=1)
o Clarity: SI2 (=1)
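• Same as asking: what is the expected price given these values of the 4 C’s?
• Since the error term has mean zero, we can substitute the estimates into the regression equation and obtain the predicted price (in thousands HKD)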
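The substitution step can be sketched as follows. The coefficient values here are hypothetical placeholders, since the estimated coefficients are not reproduced in this text; replace them with the estimates from the regression output.

```python
# Hypothetical coefficient estimates (placeholders, NOT the lecture's results).
coef = {
    "intercept": -5.0,
    "carat": 40.0,
    "cut": 1.0,
    "color": 2.0,
    "clarity": 3.0,
}


def predict_price(coef, carat, cut, color, clarity):
    """Plug attribute values into the fitted equation (error term set to 0)."""
    return (
        coef["intercept"]
        + coef["carat"] * carat
        + coef["cut"] * cut
        + coef["color"] * color
        + coef["clarity"] * clarity
    )


# The diamond from the slide: 1 carat, Good cut (=1), color I (=1), SI2 (=1).
predicted = predict_price(coef, carat=1, cut=1, color=1, clarity=1)
print(f"Predicted price: {predicted} thousand HKD")  # 41.0 with these placeholders
```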
Why multiple linear regression?
• What if we run a simple linear regression with Cut:
$\text{Price} = \beta_0 + \beta_1\,\text{Cut} + \varepsilon$
Why multiple linear regression?
• Omitted variable bias arises when both of the following hold:
• One or more important factors are left out of the model
• The omitted variable is correlated with an independent variable
specified in the model
Correlation coefficient = -0.135
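A small simulation can make omitted variable bias concrete. The sketch below assumes, purely for illustration, that carat is the omitted variable and that it is negatively correlated with cut; all numbers are synthetic and the true cut effect is set to 2.

```python
import numpy as np


def slopes(X, y):
    """Least-squares slopes of y on X (intercept fitted but not returned)."""
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta[1:]


# Synthetic data: cut is negatively correlated with carat (assumption).
rng = np.random.default_rng(1)
n = 5000
carat = rng.normal(size=n)
cut = -0.5 * carat + rng.normal(size=n)
price = 10.0 * carat + 2.0 * cut + rng.normal(size=n)  # true cut effect = 2

# Simple regression: price on cut only (carat omitted).
cut_only = slopes(cut.reshape(-1, 1), price)[0]
# Multiple regression: price on cut and carat.
cut_controlled = slopes(np.column_stack([cut, carat]), price)[0]

print(f"cut slope, carat omitted:  {cut_only:6.2f}")
print(f"cut slope, carat included: {cut_controlled:6.2f}")
```

With carat omitted, the cut coefficient absorbs part of the carat effect and can even take the wrong sign; including carat recovers a value close to the true effect.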
Today’s class
1. Multiple linear regression
2. Regression with categorical measures as IV
Regression with categorical measures as IV
• Problems with treating cut, color, clarity as continuous
Fair cut Good cut Very good cut Premium cut Ideal cut
Treating cut as continuous assumes that every one-level step has the same
effect on price, but the effect of moving from fair cut to good cut is not
necessarily the same as the effect of moving from good cut to very good cut
Regression with categorical measures as IV
• Problems with treating cut, color, clarity as continuous
• An alternative approach: dummy coding
• Each dummy variable indicates one attribute level
• Example:
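A minimal sketch of dummy coding for cut is shown below. It assumes the common convention of dropping one level as the baseline (here "Fair", an arbitrary choice for illustration), so that each remaining dummy measures the difference from that baseline. The function name `dummy_code` and the column prefix are hypothetical.

```python
CUT_LEVELS = ["Fair", "Good", "Very good", "Premium", "Ideal"]


def dummy_code(value, levels, baseline, prefix="cut"):
    """Return one 0/1 indicator per level, excluding the baseline level."""
    if value not in levels:
        raise ValueError(f"unknown level: {value!r}")
    return {
        f"{prefix}_{level}": int(value == level)
        for level in levels
        if level != baseline
    }


# Each diamond gets four dummies; a "Fair" diamond has all of them equal to 0.
for cut in ["Fair", "Good", "Ideal"]:
    print(cut, dummy_code(cut, CUT_LEVELS, baseline="Fair"))
```

Including a dummy for every level together with an intercept would make the predictors perfectly collinear, which is why one level is left out.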
Regression with categorical measures as IV
Today’s class
1. Multiple linear regression
2. Regression with categorical measures as IV
3. Regression with interaction effect
Regression with interaction effect
• So far we have assumed that the effect of each independent
variable is independent of others
• However, this is not always true in reality
• Example: the effect of a diamond's size on its price may depend
on the cut of the diamond
Effect of size on price: Fair cut ≠ Ideal cut
Regression with interaction effect
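$\text{Price} = \beta_0 + \beta_1\,\text{Carat} + \beta_2\,\text{Cut} + \beta_3\,(\text{Carat} \times \text{Cut}) + \varepsilon$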
Regression with interaction effect
• First create a new variable by multiplying the two variables of
interest
• Then run multiple linear regression
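The two steps above can be sketched in Python with NumPy. The example assumes a single 0/1 dummy `ideal` (1 = ideal cut, 0 = fair cut) and synthetic data; the names and coefficient values are illustrative assumptions, not the lecture's results.

```python
import numpy as np

# Synthetic data for illustration only.
rng = np.random.default_rng(2)
n = 2000
carat = rng.uniform(0.3, 2.0, n)
ideal = rng.integers(0, 2, n)  # 1 = ideal cut, 0 = fair cut (assumed coding)
price = (
    1.0 + 5.0 * carat + 0.5 * ideal + 3.0 * carat * ideal
    + rng.normal(scale=0.5, size=n)
)

# Step 1: create the interaction variable by multiplying the two variables.
carat_x_ideal = carat * ideal

# Step 2: run a multiple linear regression that includes the new variable.
design = np.column_stack([np.ones(n), carat, ideal, carat_x_ideal])
beta, *_ = np.linalg.lstsq(design, price, rcond=None)

slope_fair = beta[1]             # effect of carat when ideal = 0
slope_ideal = beta[1] + beta[3]  # effect of carat when ideal = 1
print(f"carat effect, fair cut:  {slope_fair:.2f}")
print(f"carat effect, ideal cut: {slope_ideal:.2f}")
```

The coefficient on the interaction term is the difference between the two carat slopes, so testing whether it is zero tests whether the size effect depends on cut.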
Today’s class
1. Multiple linear regression
2. Regression with categorical measures as IV
3. Regression with interaction effect
4. Regression with nonlinear relationship
Nonlinear relationship
• Recall the simple linear regression:
$\text{Price} = \beta_0 + \beta_1\,\text{Carat} + \varepsilon$
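Nonlinear relationship
• Adding a nonlinear term (for example, a squared term) lets the effect of carat change with size:
$\text{Price} = \beta_0 + \beta_1\,\text{Carat} + \beta_2\,\text{Carat}^2 + \varepsilon$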
Nonlinear relationship
• First create a new variable
• Then run multiple linear regression
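One possible version of these two steps is sketched below, assuming the new variable is the square of carat (an assumption for illustration; other transformations such as a logarithm follow the same pattern). The data are synthetic.

```python
import numpy as np


def fit(design, y):
    """Least-squares fit; return (coefficients, residual sum of squares)."""
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ beta
    return beta, float(resid @ resid)


# Synthetic data with a curved relationship between carat and price.
rng = np.random.default_rng(3)
n = 1000
carat = rng.uniform(0.3, 2.5, n)
price = 2.0 + 1.0 * carat + 4.0 * carat**2 + rng.normal(scale=0.5, size=n)

# Step 1: create the new variable (here, carat squared).
carat_sq = carat**2

# Step 2: run a multiple linear regression with carat and carat squared.
ones = np.ones(n)
beta_lin, ss_lin = fit(np.column_stack([ones, carat]), price)
beta_quad, ss_quad = fit(np.column_stack([ones, carat, carat_sq]), price)

print("linear fit coefficients:   ", np.round(beta_lin, 2))
print("quadratic fit coefficients:", np.round(beta_quad, 2))
print(f"residual SS: linear={ss_lin:.1f}, quadratic={ss_quad:.1f}")
```

The model stays linear in the coefficients, which is why ordinary multiple regression can still estimate it.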
Which regression model to use?
• If the goal is prediction:
• Including more predictors can enhance predictive accuracy as long as
they explain the dependent variable
• Generally, adjusted $R^2$ is a good indicator of fit
• But, there is a danger of overfitting:
• It occurs when the regression model becomes too complex (too
many parameters relative to the number of observations)
• Then the model will fit the noise ($\varepsilon$) instead of explaining the
underlying relationship (effect of X on Y)
• It fits (almost) perfectly to the observed data, but leads to poor
generalization
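Overfitting can be illustrated by comparing in-sample and out-of-sample error. The sketch below assumes a truly linear relationship, a small training sample, and polynomial fits of degree 1 and degree 6; the sample sizes, degrees, and noise level are arbitrary choices for the example.

```python
import numpy as np


def mse(coefs, x, y):
    """Mean squared prediction error of a fitted polynomial."""
    return float(np.mean((np.polyval(coefs, x) - y) ** 2))


rng = np.random.default_rng(4)


def sample(n):
    """Draw n points from a linear model with noise (synthetic data)."""
    x = rng.uniform(0.0, 1.0, n)
    y = 1.0 + 2.0 * x + rng.normal(scale=0.3, size=n)
    return x, y


x_train, y_train = sample(12)   # few observations
x_test, y_test = sample(200)    # fresh data the models have not seen

simple_fit = np.polyfit(x_train, y_train, deg=1)   # 2 parameters
complex_fit = np.polyfit(x_train, y_train, deg=6)  # 7 parameters

train_simple = mse(simple_fit, x_train, y_train)
train_complex = mse(complex_fit, x_train, y_train)
test_simple = mse(simple_fit, x_test, y_test)
test_complex = mse(complex_fit, x_test, y_test)

print(f"simple model:  train MSE={train_simple:.4f}  test MSE={test_simple:.4f}")
print(f"complex model: train MSE={train_complex:.4f}  test MSE={test_complex:.4f}")
```

The complex model always fits the training data at least as well as the simple one, but its error on new data is typically larger because part of what it fitted was noise.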
Overfitting can lead to poor generalization
[Figure: Y plotted against X, comparing a simple linear regression model with a complex nonlinear regression model (overfitting). The overfitting model fails to predict well.]
Which regression model to use?
• If the goal is to test the effect of a particular IV on the DV:
• Start with the simple linear regression (e.g., size of diamond on price)
• Then run (a lot of) regressions with other predictors included to
control for confounding effects (e.g., adding cut, color, clarity) or to
identify interaction effects (e.g., interaction between size and cut of
diamond on price)
• Go through advanced procedures to validate causality (e.g., checking
if X is exogenous)
Learning objectives
• More advanced statistics or econometrics courses will
cover topics such as:
• Model selection: which X to include and which model to select?
• Discrete dependent variables: such as choice over a set of products
• Causality: when can we use regression analysis to identify a causal
relationship?
• Examples: difference-in-differences, regression discontinuity,
instrumental variables, etc.
Learning objectives
• As a beginner, you should learn from this course:
• The objectives of regression analysis
• How to interpret the regression models
• How to interpret the estimation results from simple and multiple
linear regressions
• How to apply regression models in addressing marketing problems
• Applying regression analysis in your group project is highly
encouraged
Individual Assignment: Airbnb
• On Canvas:
• Data file “Airbnb customer data.sav”
• Case file “Case Study on Regression Analysis”
• Work on Q3 to Q6
• Deadline: Friday, Nov 16 (11:59 pm)
Summary
1. Multiple linear regression
2. Regression with categorical measures as IV
3. Regression with interaction effect
4. Regression with nonlinear relationship
Course schedule
• Next class: Conjoint Analysis
• Reading: New Way to Measure Consumers' Judgments (by Green
and Wind)