

MS6218 Data Science in Marketing

Homework 4
Prof. Gavin Feng
Due: Mar. 1st, 5pm

Our first scanner data analysis


In this assignment we will go through the steps to conduct a basic demand analysis. For this purpose, we
will use scanner data obtained from Kraft (now Kraft Heinz). We only use a small subset of all possible
product/geography/variable combinations available in this sort of data (you will see more detailed data sets
later). Our focus is on two levels of aggregation: Account level (Jewel-Osco), and market level (Kraft Central
region = Midwest region).
Products:
• Kraft 32oz mayo
• Hellman’s 32 oz mayo
Time: Weekly data
Variables: market, product, week, sales_units, sales_dollars
Aggregation: The data for the Jewel-Osco account cover sales at all Jewel-Osco stores in Chicago. The data
for the Kraft Central region cover sales in all stores in the Midwest region.

Inspecting the data


Load the data set and inspect the data frame mayo_DF using the summary function. The summary output
displays the two unique values in the variable market (“Jewel” and “Kraft Central”), and also the number
of observations corresponding to each unique value, provided that the variable is stored as a factor.
Similarly, you see the unique product names in the product variable. If a variable contains a large number
of unique values, then these values will not be shown in the summary output. Instead you can use the
function unique: unique(variable/column name), or the table function that we already encountered in the
Introduction to R tutorial.
Let’s summarize the data separately for both products in the Kraft Central region and at the Jewel-Osco
account. In order to do this we’ll learn how to subset data frames in R, first with logical indexing and
later with the subset function.
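As a small illustration of summary, unique, and table, here is a sketch on a made-up data frame. The name toy_DF and all of its values are hypothetical and only mimic the column names of mayo_DF.

```r
# Hypothetical toy data (not the Kraft scanner data); columns stored as factors
toy_DF = data.frame(
  market  = c("Jewel", "Jewel", "Kraft Central", "Kraft Central", "Kraft Central"),
  product = c("Kraft Mayo 32oz", "Hellmans Mayo 32oz",
              "Kraft Mayo 32oz", "Hellmans Mayo 32oz", "Kraft Mayo 32oz"),
  week    = c(17, 17, 1, 1, 2),
  stringsAsFactors = TRUE
)

# summary() lists the levels of factor columns together with their counts
print(summary(toy_DF))

# unique() returns the distinct values of one column
markets = unique(toy_DF$market)
print(markets)

# table() counts observations per value, or per combination of two columns
counts = table(toy_DF$market)
counts_2way = table(toy_DF$market, toy_DF$product)
print(counts)
print(counts_2way)
```

The two-way table is a quick way to check how many weeks are available for each product/market combination.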
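# stringsAsFactors = TRUE stores market and product as factors so that summary()
# lists their unique values (since R 4.0.0 read.csv keeps strings as character)
mayo_DF = read.csv("Mayo.csv", stringsAsFactors = TRUE)
# Keep only the rows for Hellman's mayo at Jewel-Osco
mayo_Hellmans_Jewel = mayo_DF[mayo_DF$product=="Hellmans Mayo 32oz"
                              & mayo_DF$market=="Jewel",]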

The indexing expression in the script above extracts all rows of mayo_DF for which the two conditions
product=="Hellmans Mayo 32oz" and market=="Jewel" are both TRUE. Note that & is the logical AND operator
(for other purposes you may use |, the logical OR). All of the rows satisfying these conditions are then
assigned to the new data frame mayo_Hellmans_Jewel. You can now summarize the newly created data frame.
Now calculate the summary statistics for all four product/market combinations. You’ll see that we don’t
have the same number of observations in the two markets (the first 16 weeks at Jewel-Osco are missing), and
that total unit and dollar sales are higher in the Kraft Central region compared to Jewel (as expected).
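One possible way to avoid writing four separate subsetting statements is to loop over the unique markets and products. The sketch below uses a hypothetical toy data frame (toy_DF, with made-up numbers) in place of mayo_DF; the helper list n_obs is also just an illustration.

```r
# Hypothetical toy data with two weeks per product/market combination
toy_DF = data.frame(
  market        = rep(c("Jewel", "Kraft Central"), each = 4),
  product       = rep(c("Kraft Mayo 32oz", "Hellmans Mayo 32oz"), times = 4),
  week          = rep(c(1, 1, 2, 2), times = 2),
  sales_units   = c(5, 20, 6, 22, 50, 200, 60, 210),
  sales_dollars = c(5.5, 20, 6.3, 21, 52, 205, 61, 214),
  stringsAsFactors = FALSE
)

n_obs = list()
for (m in unique(toy_DF$market)) {
  for (p in unique(toy_DF$product)) {
    # Logical indexing: keep rows where both conditions are TRUE
    sub_DF = toy_DF[toy_DF$product == p & toy_DF$market == m, ]
    cat("\n", p, "/", m, "\n")
    print(summary(sub_DF[, c("sales_units", "sales_dollars")]))
    # Record the number of observations for this combination
    n_obs[[paste(p, m, sep = " / ")]] = nrow(sub_DF)
  }
}
print(unlist(n_obs))
```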

Create a price variable
Construct an average price variable by dividing dollar sales by unit sales:
mayo_DF$price = mayo_DF$sales_dollars/mayo_DF$sales_units

1. Explain how to interpret this price variable — how does it differ from a product price at the store level?
2. Provide summary statistics (mean, median, and standard deviation) for the product prices, separately
for each product/market combination. Report the statistics in a table. Are the means of prices similar
across the Kraft Central region and Jewel-Osco? Is there more price variation at Jewel or in the Central
region? Why? What is the implication for our ability to estimate price elasticities with either
account-level data or data in a large geographic market?
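Grouped summary statistics can be computed in several ways. One option, sketched here on hypothetical toy data (toy_DF and its numbers are made up), is the aggregate function with a formula; the object names price_stats and price_table are illustrative only.

```r
# Hypothetical toy data with two weeks per product/market combination
toy_DF = data.frame(
  market        = rep(c("Jewel", "Kraft Central"), each = 4),
  product       = rep(c("Kraft Mayo 32oz", "Hellmans Mayo 32oz"), times = 4),
  sales_units   = c(5, 20, 6, 22, 50, 200, 60, 210),
  sales_dollars = c(5.5, 20, 6.3, 21, 52, 205, 61, 214),
  stringsAsFactors = FALSE
)

# Average price = dollar sales divided by unit sales
toy_DF$price = toy_DF$sales_dollars / toy_DF$sales_units

# Mean, median, and standard deviation of price for each product/market pair
price_stats = aggregate(price ~ product + market, data = toy_DF,
                        FUN = function(x) c(mean = mean(x),
                                            median = median(x),
                                            sd = sd(x)))

# aggregate() stores the three statistics in a matrix column; flatten it
price_table = cbind(price_stats[, c("product", "market")], price_stats$price)
print(price_table)
```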

Time-series plots of prices


A time-series plot is a graph with a variable indicating the progress of time on the x-axis and the variable of
interest on the y-axis. You can easily create time-series plots in R using the plot function:
plot(mayo_DF$week, mayo_DF$price, type = "o")
[Figure: time-series plot of mayo_DF$price (y-axis) against mayo_DF$week (x-axis), all observations]
Note the type = "o" option, which stands for “overplotted points and lines” and tells R to connect the
displayed data points with lines. The default for the type option is "p", and then only data points are
plotted, while "l" produces lines without data points.
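To see the three type options side by side, the following sketch draws the same hypothetical price series three times. The numbers are made up, and pdf(NULL) is used only so that the example runs without opening a plot window.

```r
# Hypothetical weekly prices (made-up values)
week  = 1:10
price = c(1.05, 1.04, 0.95, 1.06, 1.05, 0.93, 1.04, 1.05, 0.96, 1.05)

pdf(NULL)              # discard graphics output; remove this line to see the plots
par(mfrow = c(1, 3))   # three panels in one row
plot(week, price, type = "p", main = 'type = "p"')   # points only (default)
plot(week, price, type = "l", main = 'type = "l"')   # lines only
plot(week, price, type = "o", main = 'type = "o"')   # points connected by lines
invisible(dev.off())
```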
The graph created above is messy. Why? Because you plotted the price data for both products in both
markets on the same graph. Instead, we will create time-series plots separately for different product/market
combinations.
# Re-create the subset so that it includes the new price column
mayo_Hellmans_Jewel = subset(mayo_DF, product=="Hellmans Mayo 32oz" & market=="Jewel")
plot(mayo_Hellmans_Jewel$week, mayo_Hellmans_Jewel$price,
     type = "o", pch = 21, lwd = 0.4, bg = "limegreen",
     main = "Prices of Hellman's Mayo at Jewel-Osco", xlab = "Week", ylab = "Price")

[Figure: "Prices of Hellman's Mayo at Jewel-Osco"; x-axis: Week, y-axis: Price]
You can now repeat this process for all product/market combinations.
3. Provide time-series plots for all product/market combinations using your favorite method. There are
some visible differences between the prices at Jewel and the prices in the Kraft Central region. Why?
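One possible way to repeat the plot for every combination is a loop over the unique product/market pairs, with the panels arranged in a 2-by-2 grid. This is only a sketch on hypothetical data: toy_DF, its prices, and the counter n_plotted are made up for illustration.

```r
# Hypothetical toy data: three weeks for each of the four combinations
toy_DF = data.frame(
  market  = rep(c("Jewel", "Kraft Central"), each = 6),
  product = rep(rep(c("Kraft Mayo 32oz", "Hellmans Mayo 32oz"), each = 3), times = 2),
  week    = rep(1:3, times = 4),
  price   = c(1.08, 0.99, 1.07, 0.99, 1.01, 0.95,
              1.05, 1.04, 1.05, 1.02, 1.01, 1.02),
  stringsAsFactors = FALSE
)

combos = unique(toy_DF[, c("product", "market")])

pdf(NULL)              # discard graphics output; remove this line to see the plots
par(mfrow = c(2, 2))   # one panel per product/market combination
n_plotted = 0
for (i in seq_len(nrow(combos))) {
  sub_DF = subset(toy_DF, product == combos$product[i] & market == combos$market[i])
  sub_DF = sub_DF[order(sub_DF$week), ]   # make sure the lines follow time order
  plot(sub_DF$week, sub_DF$price, type = "o", pch = 21, bg = "limegreen",
       main = paste(combos$product[i], "-", combos$market[i]),
       xlab = "Week", ylab = "Price")
  n_plotted = n_plotted + 1
}
invisible(dev.off())
```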

Plotting a demand relationship


Construct scatter-plots of unit sales versus prices for all product/market combinations. You can do this using
either of the methods discussed above, method 2 of course being the easier. Note: If you create a scatter
plot, don’t use the option type = "o" in plot (see the explanation for this option in Section “Time-series
plots of prices”). For a time-series graph, on the other hand, the type = "o" option is useful to connect
the data points.
4. Provide scatter-plots of the demand relationship (unit sales versus prices) for all product/market
combinations. Is there evidence for a negatively sloped demand-curve in the data?
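A scatter plot for a single combination might look like the sketch below. The data frame toy_sub and its values are hypothetical; the correlation of log prices and log unit sales is added only as a rough numerical check on the direction of the relationship.

```r
# Hypothetical data for one product/market combination (made-up values)
toy_sub = data.frame(
  price       = c(1.10, 1.05, 0.95, 1.08, 0.92, 1.00),
  sales_units = c(5000, 6000, 12000, 5500, 14000, 8000)
)

pdf(NULL)   # discard graphics output; remove this line to see the plot
# Default type = "p": points only, which is what we want for a scatter plot
plot(toy_sub$price, toy_sub$sales_units,
     pch = 21, bg = "limegreen",
     main = "Unit sales versus price (toy data)",
     xlab = "Price", ylab = "Unit sales")
invisible(dev.off())

# A negative correlation is consistent with a downward-sloping demand curve
price_sales_cor = cor(log(toy_sub$price), log(toy_sub$sales_units))
print(price_sales_cor)
```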

Demand estimation: Own-price elasticities


Fit the log-linear demand model to the data. To estimate the regression model you need to create the logs of
prices and unit sales. There are different ways of doing this. First, you can add new variables/columns to the
data frame mayo_DF for the log of price and unit sales:
mayo_DF$log_price = log(mayo_DF$price)
mayo_DF$log_sales_units = log(mayo_DF$sales_units)
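Another way is to apply log directly inside the regression formula, which avoids creating extra columns. The sketch below compares both approaches on hypothetical data (toy_DF and its values are made up); the two fits should produce the same coefficients.

```r
# Hypothetical data for one product/market combination (made-up values)
toy_DF = data.frame(
  price       = c(1.10, 1.05, 0.95, 1.08, 0.92, 1.00),
  sales_units = c(5000, 6000, 12000, 5500, 14000, 8000)
)

# Way 1: create the log variables first, then regress
toy_DF$log_price       = log(toy_DF$price)
toy_DF$log_sales_units = log(toy_DF$sales_units)
fit_cols = lm(log_sales_units ~ log_price, data = toy_DF)

# Way 2: take logs inside the formula
fit_inline = lm(log(sales_units) ~ log(price), data = toy_DF)

# The slope coefficient is the estimated own-price elasticity
print(coef(fit_cols))
print(coef(fit_inline))
```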

5. Estimate the log-linear demand model and provide the regression results separately for all four
product/market combinations. Discuss the results. Is demand more elastic at Jewel-Osco or in the Kraft
Central region? What is the reason for the observed difference in the elasticities?
6. Using the regression results for the log-linear demand model, calculate the percentage change in unit
sales for a simultaneous 10 percent increase in the price of Kraft and Hellman’s mayo at Jewel-Osco.
Use the exact formula.
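As a reminder of what "exact" means here: in a log-linear model log(Q) = a + b log(P), multiplying the price by (1 + d) multiplies unit sales by (1 + d)^b, so the exact percentage change is 100 * ((1 + d)^b - 1), whereas 100 * b * d is only an approximation for small d. The sketch below uses a hypothetical elasticity of b = -2, which is not an estimate from the mayo data.

```r
beta  = -2     # hypothetical own-price elasticity (assumption, not an estimate)
delta = 0.10   # 10 percent price increase

# Exact percentage change in unit sales implied by the log-linear model
exact_pct = 100 * ((1 + delta)^beta - 1)

# First-order approximation: elasticity times percentage price change
approx_pct = 100 * beta * delta

print(c(exact = exact_pct, approx = approx_pct))
```

The gap between the two numbers grows with the size of the price change and with the magnitude of the elasticity.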

Cross-price elasticities
Now we allow for both own and cross-price effects in the log-linear demand model. We first need to reshape
the data such that we have columns with unit sales and price information for both products.

The first step is to extract only the data that we need to create the final data frame used for estimation:
mayo_DF_extract = mayo_DF[, c("market", "product", "week", "sales_units", "price")]
head(mayo_DF_extract,3)

  market            product week sales_units     price
1  Jewel Hellmans Mayo 32oz   17       23247 0.9909236
2  Jewel Hellmans Mayo 32oz   18       24131 1.0118934
3  Jewel Hellmans Mayo 32oz   19       24195 0.9959496
In general, for any data frame DF one can extract rows and columns using the syntax: DF[rows to extract,
columns to extract]. In the extraction statement above we did not specify any rows before the comma, which
R interprets as “use all rows in the data frame”. To extract columns we combined several column names
using the combine function c().
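A few variations of the DF[rows, columns] syntax, shown on a hypothetical toy data frame (toy_DF and its values are made up for illustration):

```r
# Hypothetical toy data
toy_DF = data.frame(
  market      = c("Jewel", "Jewel", "Kraft Central"),
  week        = c(17, 18, 17),
  sales_units = c(100, 110, 900),
  price       = c(1.00, 0.98, 1.03),
  stringsAsFactors = FALSE
)

# Nothing before the comma: all rows, selected columns
all_rows_two_cols = toy_DF[, c("week", "price")]

# Nothing after the comma: selected rows, all columns
first_two_rows = toy_DF[1:2, ]

# Rows chosen by a logical condition; a single column is returned as a vector
jewel_prices = toy_DF[toy_DF$market == "Jewel", "price"]

print(all_rows_two_cols)
print(first_two_rows)
print(jewel_prices)
```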
Now we reshape the data and create a new data frame with product-level sales unit and price information in
columns:
mayo_DF_wide = reshape(mayo_DF_extract, timevar = "product",
                       idvar = c("market", "week"), direction = "wide")
head(mayo_DF_wide,3)

  market week sales_units.Hellmans Mayo 32oz price.Hellmans Mayo 32oz
1  Jewel   17                          23247                0.9909236
2  Jewel   18                          24131                1.0118934
3  Jewel   19                          24195                0.9959496
  sales_units.Kraft Mayo 32oz price.Kraft Mayo 32oz
1                        5560             1.0769784
2                        5342             1.0885436
3                       13864             0.9948067
Fortunately we can quite easily change the annoyingly long variable names:
colnames(mayo_DF_wide) = c("market", "week", "sales_H", "price_H", "sales_K", "price_K")
head(mayo_DF_wide,3)

  market week sales_H   price_H sales_K   price_K
1  Jewel   17   23247 0.9909236    5560 1.0769784
2  Jewel   18   24131 1.0118934    5342 1.0885436
3  Jewel   19   24195 0.9959496   13864 0.9948067
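Because colnames assigns the new names by position, it can be worth checking the column order that reshape produced before renaming. The sketch below does this on a hypothetical long-format data frame; toy_long, the short product labels "H" and "K", and all numbers are made up for illustration.

```r
# Hypothetical long-format data: one row per market/product/week
toy_long = data.frame(
  market      = rep("Jewel", 4),
  product     = rep(c("H", "K"), times = 2),
  week        = rep(c(17, 18), each = 2),
  sales_units = c(100, 40, 110, 35),
  price       = c(1.00, 1.08, 0.98, 1.09),
  stringsAsFactors = FALSE
)

# Long to wide: one row per market/week, one set of columns per product
toy_wide = reshape(toy_long, timevar = "product",
                   idvar = c("market", "week"), direction = "wide")

# Guard against an unexpected column order before renaming by position
stopifnot(identical(names(toy_wide),
                    c("market", "week", "sales_units.H", "price.H",
                      "sales_units.K", "price.K")))
colnames(toy_wide) = c("market", "week", "sales_H", "price_H", "sales_K", "price_K")
print(toy_wide)
```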
Now you have the data ready to estimate the own and cross price elasticities, separately for Hellman’s and
Kraft mayo at Jewel-Osco and in the Kraft Central region.
7. Obtain estimates of the own and cross-price effects using the log-linear demand model for all
product/market combinations.
8. One purpose of demand estimation is to understand whether a brand is “vulnerable” to a competitor’s
pricing policies. That is, to what extent is the demand for a product affected by the price of a competing
brand? Based on your demand estimates, which brand is more vulnerable?
9. The price of Hellman’s mayo is cut by 10 percent at Jewel-Osco. Use your estimates to calculate by
what percent the price of Kraft mayo has to be changed at Jewel to obtain the level of sales before the
Hellman’s price cut.
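The mechanics of a regression with own and cross-price terms, and of an offsetting price change, can be sketched on simulated data. Everything below is hypothetical: the data are simulated with assumed elasticities (own -3, cross 0.5), and the object names are illustrative. If Kraft's sales follow Q_K = A * P_K^b_own * P_H^b_cross and the Hellman's price is multiplied by 0.9, the Kraft price factor x that keeps Q_K unchanged solves x^b_own * 0.9^b_cross = 1.

```r
set.seed(1)
n = 60

# Simulated prices and Kraft unit sales with assumed elasticities (not real data)
price_K = exp(rnorm(n, mean = 0, sd = 0.08))
price_H = exp(rnorm(n, mean = 0, sd = 0.08))
sales_K = exp(9 - 3 * log(price_K) + 0.5 * log(price_H) + rnorm(n, sd = 0.05))
toy_wide = data.frame(price_K, price_H, sales_K)

# Log-linear demand for Kraft with own and cross-price effects
fit_K = lm(log(sales_K) ~ log(price_K) + log(price_H), data = toy_wide)
b_own   = unname(coef(fit_K)["log(price_K)"])
b_cross = unname(coef(fit_K)["log(price_H)"])

# Kraft price factor that offsets a 10 percent Hellman's price cut:
# x^b_own * 0.9^b_cross = 1  implies  x = 0.9^(-b_cross / b_own)
x = 0.9^(-b_cross / b_own)
pct_change_K = 100 * (x - 1)

print(c(b_own = b_own, b_cross = b_cross, pct_change_K = pct_change_K))
```

The same steps would apply with the roles of the two brands reversed, using a regression of log(sales_H) on both log prices.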
