
Marketing Research: Data Analysis VI: Regression Analysis (Part 2)


Marketing Research

Lecture 13:
Data Analysis VI: Regression Analysis (Part 2)

SONG LIN
Today’s class
1. Multiple linear regression
2. Regression with categorical measures as IV
3. Regression with interaction effect
4. Regression with nonlinear relationship

Today’s class
1. Multiple linear regression

Simple linear regression
The population regression model:

$Y = \beta_0 + \beta_1 X + \varepsilon$

• $Y$ is the dependent variable (e.g., sales, taste)
• $X$ is the independent variable (e.g., ad spend, freshness)
• $\beta_0$ and $\beta_1$ are the parameters of the population model
• $\varepsilon$ represents random error with zero expectation
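The lecture estimates this model in SPSS. As a minimal sketch in plain Python (the function name `fit_simple_ols` and the toy data are hypothetical, not from the lecture), the least-squares estimates of $\beta_0$ and $\beta_1$ have a closed form: the slope is the covariance of X and Y divided by the variance of X, and the fitted line passes through the point of means.

```python
# Minimal sketch: ordinary least squares for Y = b0 + b1*X + e (closed form).
from statistics import mean

def fit_simple_ols(x, y):
    """Return (b0_hat, b1_hat) minimizing the sum of squared residuals."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    x_bar, y_bar = mean(x), mean(y)
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    b1 = s_xy / s_xx          # slope: co-variation of X and Y over variation of X
    b0 = y_bar - b1 * x_bar   # the fitted line passes through (x_bar, y_bar)
    return b0, b1

# Hypothetical toy data (not the diamond data): y = 1 + 2x exactly
b0, b1 = fit_simple_ols([0, 1, 2, 3], [1, 3, 5, 7])
print(b0, b1)  # 1.0 2.0
```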

Why do we need more explanatory variables?
1. Provide more accurate prediction
2. Control for confounding effects

Back to the diamond case

The 4 C’s of diamonds
• Carat
• Cut
• Color
• Clarity

Cut
• Cut determines a diamond’s brilliance by reflecting light

Grades (worst to best): Fair cut, Good cut, Very good cut, Premium cut, Ideal cut

Source: Blue Nile

Color
• Less color means higher quality

Grades (noticeable color to colorless): K-Z, J, I, H, G, F, E, D


Source: Blue Nile

Clarity
• Most diamonds have tiny imperfections called inclusions; fewer inclusions mean better clarity

Grades (included to flawless): I3, I2, I1, SI2, SI1, VS2, VS1, VVS2, VVS1, IF, FL
Source: Blue Nile

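Multiple linear regression model

Simple:
$Y = \beta_0 + \beta_1 X + \varepsilon$

Multiple:
$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \dots + \beta_k X_k + \varepsilon$

Example:
$\text{Price} = \beta_0 + \beta_1 \text{Carat} + \beta_2 \text{Cut} + \beta_3 \text{Color} + \beta_4 \text{Clarity} + \varepsilon$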

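Multiple linear regression model

• Interpretation of the parameters:
• $\beta_k$ measures the change in $Y$ when $X_k$ increases by one unit, holding all the other $X$'s fixed
• In other words, we focus on one effect while "controlling" for other confounding effects (the other $X$'s)

$\text{Price} = \beta_0 + \beta_1 \text{Carat} + \beta_2 \text{Cut} + \beta_3 \text{Color} + \beta_4 \text{Clarity} + \varepsilon$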

Estimation of regression model
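In SPSS: Analyze → Regression → Linear
• R² measures the proportion of the variation in Y that the X's explain; adjusted R² corrects it for the number of predictors
• Including more predictors never decreases R², whereas adjusted R² can fall when an added predictor explains little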
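SPSS reports these quantities directly. To make the computation concrete, here is a minimal pure-Python sketch (not SPSS's internal algorithm; the names `solve` and `fit_ols` and the toy data are hypothetical) that solves the normal equations $(X'X)\hat\beta = X'y$ and then computes $R^2 = 1 - SSE/SST$ and adjusted $R^2 = 1 - (1 - R^2)(n - 1)/(n - k - 1)$, where $k$ is the number of predictors.

```python
# Minimal sketch: multiple regression by solving the normal equations in pure Python.

def solve(a, b):
    """Solve the square system a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [list(row) + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors are perfectly collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_ols(rows, y):
    """rows: predictor values per observation (no intercept). Returns (coefs, r2, adj_r2)."""
    X = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    n, p = len(X), len(X[0])             # p = k predictors + 1 intercept
    xtx = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(p)] for a in range(p)]
    xty = [sum(X[i][a] * y[i] for i in range(n)) for a in range(p)]
    coefs = solve(xtx, xty)
    fitted = [sum(c * v for c, v in zip(coefs, row)) for row in X]
    y_bar = sum(y) / n
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # unexplained variation
    sst = sum((yi - y_bar) ** 2 for yi in y)                # total variation
    r2 = 1 - sse / sst
    k = p - 1
    adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)           # penalizes extra predictors
    return coefs, r2, adj_r2

# Hypothetical toy data: y = 1 + 2*x1 + 3*x2 plus a fixed (non-random) perturbation
rows = [(0, 1), (1, 0), (2, 1), (3, 0), (4, 1), (5, 0), (6, 1), (7, 0)]
perturb = [0.1, 0.1, -0.1, -0.1, 0.1, 0.1, -0.1, -0.1]
y = [1 + 2 * x1 + 3 * x2 + e for (x1, x2), e in zip(rows, perturb)]
coefs, r2, adj_r2 = fit_ols(rows, y)
print([round(c, 3) for c in coefs], round(r2, 4), round(adj_r2, 4))
```

Because the perturbation is not a combination of the predictors, the fit is not perfect, so adjusted R² comes out slightly below R².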

Estimation of regression model

Estimates of $\beta_1$, $\beta_2$, $\beta_3$, $\beta_4$

How to interpret each of these estimates?

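Estimation of regression model

• Standardized estimates of $\beta_1$, $\beta_2$, $\beta_3$, $\beta_4$: how many standard deviations Y will change if $X_k$ increases by one standard deviation
• t statistic: measures the size of $\hat\beta_k$ relative to its variation, i.e., how many standard errors (standard deviations of the estimate) $\hat\beta_k$ lies from the hypothesized value of $\beta_k$, which is often assumed to be zero: $t = \hat\beta_k / SE(\hat\beta_k)$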
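Both quantities on this slide are short formulas. A minimal sketch, assuming you already have the unstandardized estimate and its standard error from the regression output (the function names and the numbers in the usage lines are hypothetical): the standardized estimate rescales $\hat\beta_k$ by $sd(X_k)/sd(Y)$, and the t statistic divides the distance from the hypothesized value by the standard error.

```python
# Minimal sketch: standardized coefficient and t statistic from an estimate.
from statistics import stdev

def standardized_coef(b_hat, x, y):
    """SDs of change in Y for a one-SD increase in X (other X's held fixed)."""
    return b_hat * stdev(x) / stdev(y)

def t_statistic(b_hat, se, b_null=0.0):
    """How many standard errors the estimate lies from the hypothesized value."""
    return (b_hat - b_null) / se

# Hypothetical inputs for illustration only
beta_std = standardized_coef(2.0, [0, 1, 2, 3], [1, 3, 5, 7])
t_val = t_statistic(3.0, 1.5)
print(beta_std, t_val)
```

In a simple regression the standardized slope equals the correlation between X and Y, which is why the perfectly linear toy data above gives a value of 1.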

Do the 4 C's affect diamond price?

• Hypothesis testing: H0: $\beta_1 = 0$; Ha: $\beta_1 \neq 0$
• Decide whether to reject the null based on the p-value
• The p-value here is the probability of observing $|t| \geq 148.8$, given that the null hypothesis is true
• Since p < 5%, the chosen significance level, we conclude that $\beta_1$ is statistically different from zero
• In other words, we reject the null in favor of the alternative hypothesis
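The p-value is a tail probability of the t statistic under H0. SPSS uses the t distribution with $n - k - 1$ degrees of freedom; as a stdlib-only sketch, assuming a large sample, the standard normal is a close approximation (the function name is hypothetical, and for small samples you should use the t distribution instead).

```python
# Minimal sketch: two-sided p-value P(|T| >= |t|) under H0, normal approximation.
from statistics import NormalDist

def two_sided_p_normal(t_stat):
    """Approximate two-sided p-value; adequate when residual degrees of freedom are large."""
    # Use the lower tail of -|t| to avoid cancellation in 1 - cdf(|t|).
    return 2 * NormalDist().cdf(-abs(t_stat))

p_borderline = two_sided_p_normal(1.96)   # close to the 5% significance level
p_huge_t = two_sided_p_normal(148.8)      # numerically indistinguishable from zero
print(p_borderline, p_huge_t)
```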
• In other words, we reject the null in favor of the alternative hypothesis

How to predict price using the 4 C’s?
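• How much should I expect to pay for a diamond of
o Carat: 1
o Cut: Good (=1)
o Color: I (=1)
o Clarity: SI2 (=1)
• Same as asking: $E[\text{Price} \mid \text{Carat}=1, \text{Cut}=1, \text{Color}=1, \text{Clarity}=1] = \beta_0 + \beta_1 + \beta_2 + \beta_3 + \beta_4$
• Since $E[\varepsilon] = 0$, we can substitute the estimates and obtain:

$\widehat{\text{Price}} = \hat\beta_0 + \hat\beta_1 + \hat\beta_2 + \hat\beta_3 + \hat\beta_4$ (thousands HKD)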
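Prediction is just plugging the predictor values into the fitted equation. A minimal sketch, assuming the four-predictor model with numerically coded Cut, Color, and Clarity; the coefficient values below are HYPOTHETICAL placeholders, not the lecture's SPSS estimates, so substitute your own output.

```python
# Minimal sketch: point prediction from a fitted multiple regression.

def predict_price(coefs, carat, cut, color, clarity):
    """Fitted Price (same units as the data, e.g., thousands HKD)."""
    b0, b1, b2, b3, b4 = coefs
    return b0 + b1 * carat + b2 * cut + b3 * color + b4 * clarity

# HYPOTHETICAL estimates for illustration only (b0, b1, b2, b3, b4)
hypothetical = (-10.0, 20.0, 1.0, 2.0, 3.0)
print(predict_price(hypothetical, carat=1, cut=1, color=1, clarity=1))  # 16.0
```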

Why multiple linear regression?
• What if we run a simple linear regression with Cut only:

$\text{Price} = \beta_0 + \beta_1 \text{Cut} + \varepsilon$

The estimated coefficient on Cut is negative. Why? Does it make sense?

Why multiple linear regression?
• Omitted variable bias:
• One or more important factors are left out of the model
• The omitted variable is correlated with an independent variable specified in the model

Correlation coefficient = -0.135
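A small deterministic example can reproduce the sign flip. In the sketch below the toy data are HYPOTHETICAL: price is built as exactly 10 × Carat + 2 × Cut, and better cuts come with smaller stones. Regressing price on Cut alone gives a negative slope, while including Carat recovers the true positive effect. The check at the end uses the standard omitted-variable identity: the simple slope equals the Cut coefficient plus the Carat coefficient times the slope of Carat on Cut.

```python
# Minimal sketch: omitted variable bias with two predictors (pure Python).
from statistics import mean

def centered_cross(u, v):
    """Sum of cross-products of deviations from the means."""
    u_bar, v_bar = mean(u), mean(v)
    return sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v))

def simple_slope(x, y):
    return centered_cross(x, y) / centered_cross(x, x)

def two_predictor_slopes(x1, x2, y):
    """OLS slopes for y = b0 + b1*x1 + b2*x2 + e via the 2x2 normal equations (Cramer's rule)."""
    s11, s22, s12 = centered_cross(x1, x1), centered_cross(x2, x2), centered_cross(x1, x2)
    s1y, s2y = centered_cross(x1, y), centered_cross(x2, y)
    det = s11 * s22 - s12 ** 2
    return (s1y * s22 - s2y * s12) / det, (s2y * s11 - s1y * s12) / det

# HYPOTHETICAL data: better cut, smaller stone; true model Price = 10*Carat + 2*Cut
cut = [1, 2, 3, 4, 5]
carat = [2.0, 1.6, 1.0, 0.8, 0.6]
price = [10 * c + 2 * k for c, k in zip(carat, cut)]

naive = simple_slope(cut, price)                          # Carat omitted
b_cut, b_carat = two_predictor_slopes(cut, carat, price)  # Carat controlled for
ovb_check = b_cut + b_carat * simple_slope(cut, carat)    # omitted-variable identity
print(round(naive, 3), round(b_cut, 3), round(b_carat, 3), round(ovb_check, 3))
```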

Today’s class
1. Multiple linear regression
2. Regression with categorical measures as IV

Regression with categorical measures as IV
• Problems with treating cut, color, clarity as continuous

Fair cut → Good cut → Very good cut → Premium cut → Ideal cut

Treating cut as continuous forces the effect on price of moving from fair cut to good cut to equal the effect of moving from good cut to very good cut, which need not hold

Regression with categorical measures as IV
• Problems with treating cut, color, clarity as continuous
• An alternative approach: dummy coding
• Each dummy variable indicates one attribute level
• Example: five dummy variables are generated, corresponding to the five levels of cut
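Dummy coding turns one categorical column into several 0/1 columns. A minimal sketch in plain Python (the function name `dummy_code` and the sample data are hypothetical; SPSS users would create the variables with Transform → Recode or Compute): each row has exactly one 1 across the five columns, so the five dummies always sum to 1.

```python
# Minimal sketch: dummy (one-hot) coding of a categorical variable.
CUT_LEVELS = ["Fair", "Good", "Very good", "Premium", "Ideal"]

def dummy_code(values, levels, base=None):
    """One 0/1 column per level; the base level's column is dropped if base is given."""
    kept = [lvl for lvl in levels if lvl != base]
    return {lvl: [1 if v == lvl else 0 for v in values] for lvl in kept}

cuts = ["Fair", "Ideal", "Good", "Ideal"]   # hypothetical observations
all_five = dummy_code(cuts, CUT_LEVELS)
print(all_five["Ideal"])  # [0, 1, 0, 1]
row_sums = [sum(col[i] for col in all_five.values()) for i in range(len(cuts))]
print(row_sums)  # [1, 1, 1, 1]
```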

Regression with categorical measures as IV

• One dummy variable is omitted from the regression! Why?
• Because once we know the values of the four dummies, we know the value of the remaining one (the base level); including all five alongside the intercept would make the predictors perfectly collinear
• The interpretation is relative: everything is compared to the base level (i.e., fair cut)
• Example: the estimate for the good-cut dummy means that moving from a fair cut to a good cut raises price by about 7 thousand HKD, all else equal
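The relative interpretation is easiest to see in the simplest case, where cut is the only predictor. Then OLS with an intercept and dummies for every non-base level gives: intercept = mean price of the base group, and each dummy coefficient = (mean of that group) minus (mean of the base group). A minimal sketch with HYPOTHETICAL prices (with other predictors in the model, the coefficients are no longer raw mean differences but differences holding those predictors fixed):

```python
# Minimal sketch: dummy coefficients as differences from the base level (cut only).
from statistics import mean

def dummy_regression_coefs(groups, y, base):
    """Intercept = base-group mean; each coefficient = group mean minus base mean."""
    by_level = {}
    for g, yi in zip(groups, y):
        by_level.setdefault(g, []).append(yi)
    base_mean = mean(by_level[base])
    coefs = {"intercept": base_mean}
    for level, ys in by_level.items():
        if level != base:
            coefs[level] = mean(ys) - base_mean
    return coefs

# HYPOTHETICAL prices (thousands HKD)
groups = ["Fair", "Fair", "Good", "Good", "Ideal"]
prices = [10, 12, 18, 20, 30]
print(dummy_regression_coefs(groups, prices, base="Fair"))
```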

Today’s class
1. Multiple linear regression
2. Regression with categorical measures as IV
3. Regression with interaction effect

Regression with interaction effect
• So far we have assumed that the effect of each independent
variable is independent of others
• However, this is not always true in reality
• Example: the price increase from a larger diamond may depend on the diamond's cut

[Figure: Fair cut vs. Ideal cut]

• How to model such interaction?

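Regression with interaction effect

$\text{Price} = \beta_0 + \beta_1 \text{Carat} + \beta_2 \text{Cut} + \beta_3 (\text{Carat} \times \text{Cut}) + \varepsilon$

• $\beta_3 (\text{Carat} \times \text{Cut})$ is the interaction term
• Both variables also need to be included on their own in the model to capture the main effects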

Regression with interaction effect
• First create a new variable, Carat × Cut, by multiplying the two variables of interest
• Then run a multiple linear regression

• A positive and significant coefficient ($\beta_3$) on the interaction term means that the effect of larger size on price is larger for better cuts
• $\hat\beta_3 = 2.7$ means that if Cut increases by one unit, the marginal effect of diamond size on price increases by 2.7 thousand HKD
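Creating the interaction variable is a row-by-row product. A minimal sketch, assuming the data sit in a list of dicts; the column names `carat`, `cut`, and `carat_x_cut` are hypothetical (in SPSS this is a Transform → Compute step).

```python
# Minimal sketch: add an interaction column equal to the product of two predictors.

def add_interaction(rows, a="carat", b="cut", name="carat_x_cut"):
    """Return copies of the row dicts with a new column name = row[a] * row[b]."""
    return [{**r, name: r[a] * r[b]} for r in rows]

# Hypothetical observations; Cut assumed to be coded numerically
rows = [{"carat": 0.5, "cut": 1}, {"carat": 1.0, "cut": 4}]
with_inter = add_interaction(rows)
print([r["carat_x_cut"] for r in with_inter])  # [0.5, 4.0]
```

The original columns are kept, since the model needs Carat and Cut on their own as well as their product.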
Regression with interaction effect

• To calculate the marginal effect of diamond size on price:
• Let's fix the grade of Cut
• Marginal effect: how much does price increase if we increase Carat by one unit?
• Answer: $\beta_1 + \beta_3 \times \text{Cut}$
• Note: this quantity depends on Cut (due to the interaction)
• Therefore, the marginal effect increases by $\beta_3$ if Cut increases by one unit
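The marginal effect $\beta_1 + \beta_3 \times \text{Cut}$ can be tabulated across cut grades. A minimal sketch using $\hat\beta_3 = 2.7$ from the slide; the value of $\hat\beta_1$ is HYPOTHETICAL, and Cut is assumed to be coded as consecutive integers 0 to 4.

```python
# Minimal sketch: marginal effect of Carat when Carat interacts with Cut.

def marginal_effect_of_carat(b1, b3, cut):
    """Price change per extra carat in Price = b0 + b1*Carat + b2*Cut + b3*Carat*Cut + e."""
    return b1 + b3 * cut

B1_HYPOTHETICAL = 30.0  # placeholder, not a lecture estimate
B3 = 2.7                # interaction estimate reported on the slide
effects = [marginal_effect_of_carat(B1_HYPOTHETICAL, B3, cut) for cut in range(5)]
steps = [later - earlier for earlier, later in zip(effects, effects[1:])]
print([round(e, 1) for e in effects])  # [30.0, 32.7, 35.4, 38.1, 40.8]
```

Each one-unit step in Cut raises the marginal effect by exactly $\beta_3$, whatever value $\beta_1$ takes.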

Today’s class
1. Multiple linear regression
2. Regression with categorical measures as IV
3. Regression with interaction effect
4. Regression with nonlinear relationship

Nonlinear relationship
• Recall the simple linear regression:

$\text{Price} = \beta_0 + \beta_1 \text{Carat} + \varepsilon$

• But the relationship looks nonlinear:

Nonlinear relationship
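• Recall the simple linear regression:

$\text{Price} = \beta_0 + \beta_1 \text{Carat} + \varepsilon$

• To capture nonlinearity, we can add a quadratic term:

$\text{Price} = \beta_0 + \beta_1 \text{Carat} + \beta_2 \text{Carat}^2 + \varepsilon$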

Nonlinear relationship
• First create a new variable, $\text{Carat}^2$
• Then run a multiple linear regression

A positive and significant coefficient ($\beta_2$) on the quadratic term means that the effect of larger size on price is even larger for larger diamonds
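With the quadratic term, one extra unit of Carat no longer has a constant effect: the instantaneous marginal effect is $\beta_1 + 2\beta_2 \times \text{Carat}$. A minimal sketch with HYPOTHETICAL coefficient values (the function names are also hypothetical):

```python
# Minimal sketch: quadratic term and its marginal effect.

def add_square(values):
    """New predictor column Carat^2."""
    return [v * v for v in values]

def marginal_effect_quadratic(b1, b2, carat):
    """d(Price)/d(Carat) for Price = b0 + b1*Carat + b2*Carat^2 + e."""
    return b1 + 2 * b2 * carat

# HYPOTHETICAL estimates: b1 = 5, b2 = 10 (b2 > 0, so the effect grows with size)
print(add_square([0.5, 1.0, 2.0]))  # [0.25, 1.0, 4.0]
print([marginal_effect_quadratic(5.0, 10.0, c) for c in (0.5, 1.0, 2.0)])  # [15.0, 25.0, 45.0]
```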

Which regression model to use?
• If the goal is prediction:
• Including more predictors can enhance predictive accuracy, as long as they help explain the dependent variable
• Generally, adjusted R² is a good indicator of fit
• But there is a danger of overfitting:
• It occurs when the regression model becomes too complex (too many parameters relative to the number of observations)
• The model then fits the noise ($\varepsilon$) instead of the underlying relationship (the effect of X on Y)
• It fits the observed data (almost) perfectly, but generalizes poorly to new data
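Overfitting can be demonstrated with six points. A polynomial regression with as many parameters as observations passes through every point (zero training error), so it reproduces the noise too. The sketch below uses HYPOTHETICAL data: the true relationship is $Y = 2X$ with a fixed alternating perturbation standing in for $\varepsilon$. It compares that degree-5 fit (evaluated via Lagrange interpolation, which gives the same polynomial) with a straight line at a new point just outside the training range.

```python
# Minimal sketch: an interpolating (overfitted) polynomial vs. a straight line.

def lagrange_predict(xs, ys, x):
    """Evaluate the unique degree-(n-1) polynomial through all n training points."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        term = yi
        for j, xj in enumerate(xs):
            if j != i:
                term *= (x - xj) / (xi - xj)
        total += term
    return total

def linear_fit(xs, ys):
    """Simple OLS line; returns (intercept, slope)."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    return y_bar - slope * x_bar, slope

# HYPOTHETICAL training data: Y = 2X plus a fixed alternating perturbation
xs = [0, 1, 2, 3, 4, 5]
noise = [0.5, -0.5, 0.5, -0.5, 0.5, -0.5]
ys = [2 * x + e for x, e in zip(xs, noise)]

b0, b1 = linear_fit(xs, ys)
x_new, y_true = 6, 12.0  # a new point; compare against the noise-free truth
poly_train_err = max(abs(lagrange_predict(xs, ys, x) - y) for x, y in zip(xs, ys))
poly_new_err = abs(lagrange_predict(xs, ys, x_new) - y_true)
lin_new_err = abs(b0 + b1 * x_new - y_true)
print(poly_train_err, round(poly_new_err, 3), round(lin_new_err, 3))
```

The complex model wins on the training data and loses badly on the new observation, while the simple line stays close to the truth.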

Overfitting can lead to poor generalization
[Figure: Y vs. X, comparing a complex nonlinear regression model (overfitting), which fails to predict well, with a simple linear regression model]

Which regression model to use?
• If the goal is to test the effect of a particular IV on the DV:
• Start with the simple linear regression (e.g., size of diamond on price)
• Then run (a lot of) regressions with other predictors included to
control for confounding effects (e.g., adding cut, color, clarity) or to
identify interaction effects (e.g., interaction between size and cut of
diamond on price)
• Go through advanced procedures to validate causality (e.g., checking
if X is exogenous)

Learning objectives
• More advanced statistics or econometrics courses will
cover topics such as:
• Model selection: which X to include and which model to select?
• Discrete dependent variables: such as choice over a set of products
• Causality: when can we use regression analysis to identify causal relationships?
• Examples: difference-in-differences, regression discontinuity, instrumental variables, etc.

Learning objectives
• As a beginner, you should learn the following from this course:
• The objectives of regression analysis
• How to interpret the regression models
• How to interpret the estimation results from simple and multiple
linear regressions
• How to apply regression models in addressing marketing problems
• Applying regression analysis in your group project is highly
encouraged

Individual Assignment: Airbnb

• On Canvas:
• Data file “Airbnb customer data.sav”
• Case file “Case Study on Regression Analysis”

• Work on Q3 to Q6
• Deadline: Nov 16 Friday (11:59pm)

Summary
1. Multiple linear regression
2. Regression with categorical measures as IV
3. Regression with interaction effect
4. Regression with nonlinear relationship

Course schedule
• Next class: Conjoint Analysis
• Reading: New Way to Measure Consumers' Judgments (by Green
and Wind)

• Individual assignment: Case Study on Regression Analysis
