M4L 4 IE350 Courseware Topic3&4
COURSEWARE
IE350
ANALYTICAL
TOOLS AND TRENDS
Topics 3 & 4
Prepared by:
COURSE DESCRIPTION
Analytical Tools and Trends is divided into three major areas: analytical tools in
marketing, QM tools, and system dynamics. The course introduces approaches to analyzing
marketing data in order to characterize and predict it by building spreadsheet models
with Microsoft Excel® PivotTables and the Solver and Data Analysis add-ins. The course
also explores analytical methodologies in product research & development. System
Dynamics is also integrated into the course in order to understand how the variables
or elements of complex systems are interconnected over time.
TOPIC 3 & 4
MODULE 3: ASSOCIATIVE MODELS IN FORECASTING TRENDS
LEARNING OUTCOMES
LO1 Articulate the importance of correlation and determination coefficients to
characterize the relationship and variability of data.
LO2 Differentiate the necessity of conducting simple and multiple linear regression
analyses.
LO3 Defend the importance of predictive analytics to a more effective business
decision-making process.
INTRODUCTION 1,2
Often the marketing analyst needs to determine how variables of interest are related and
what the nature of those relationships is. Some common marketing questions that require
analyzing the relationship between two variables include:
• How does price affect demand?
• How does advertising affect sales?
• How does shelf space devoted to a product affect product sales?
This module introduces the simplest tools you can use to model relationships between
variables. It first covers finding the line that best fits a hypothesized causal relationship
between two variables. You then learn to use correlations to analyze the nature of non-causal
relationships between two or more variables.
Forecasting involves using past data to generate a number, set of numbers, or scenario that
corresponds to a future occurrence. It is absolutely essential to short-range and long-range
planning.
Time Series and Associative models are both quantitative forecast techniques. These are
more objective than qualitative techniques such as the Delphi Technique and market
research.
Suppose you manage a plant that manufactures small refrigerators. National headquarters tells
you how many refrigerators to produce each month. For budgeting purposes, you want to
forecast your monthly operating costs.
Every business analyst should have the ability to estimate the relationship between
important business variables. In Microsoft Excel, the trend curve or trendline feature is often
helpful in determining the relationship between two variables. The variable that analysts try
to predict is called the dependent variable. The variable you use for prediction is called the
independent variable. Here are some examples of business relationships you might want
to estimate:
Examples of Relationships
1. Open Microsoft Excel and go to File, located at the upper left corner.
2. Choose Options; a pop-up window opens.
3. Click Add-ins.
4. Select Analysis ToolPak and click Go.
5. Check Analysis ToolPak and Solver Add-in.
6. You will then see the new commands on the Data tab.
CORRELATION COEFFICIENT
COEFFICIENT OF DETERMINATION
COEFFICIENTS
You are interested in predicting monthly operating costs from units produced, which
helps the plant manager determine the operating budget and better understand the cost
of producing refrigerators.
[Figure: Scatter plot of operating cost versus units produced]
When you look at the scatter plot, it seems reasonable that a straight line (or linear
relationship) exists between units produced and monthly operating costs. To show the
trendline, go to the Chart Design tab, select Quick Layout, and choose a layout that
includes the equation.
[Figure: Operating cost plotted against Units Produced]
What is the predicted operating cost for the units produced in the 15th month?
Excel calculates the best-fitting straight line for predicting monthly operating costs from
monthly units produced as follows:
How accurately does this relationship explain the monthly variation in plant operating costs?
Clearly, both the operating cost and the units produced vary from month to month. The answer
to this question is the R² value (0.69): the linear relationship explains 69 percent of the
variation in monthly operating costs. This implies that the remaining 31 percent of the
variation is explained by other factors. Using multiple regression, you can try to determine
other factors that influence operating costs.
People often ask what a good R² value is. There is really no definitive answer to this
question. With one independent variable, of course, a larger R² value indicates a better fit
to the data than a smaller R² value.
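For readers who want to see the arithmetic behind the trendline and its R² value, the following is a minimal Python sketch. The `units` and `costs` lists are hypothetical numbers made up for illustration, not the plant data used in this module, and the function names `fit_line` and `r_squared` are our own. It assumes the ordinary least-squares line that a linear trendline represents.

```python
# Hypothetical monthly data (NOT the plant's actual figures).
units = [500, 650, 800, 950, 1100, 1250]
costs = [52000, 61000, 70500, 76000, 88000, 93500]


def fit_line(x, y):
    """Return (slope, intercept) of the least-squares line y = slope * x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(x, y, slope, intercept):
    """Share of the variation in y that the fitted line explains."""
    mean_y = sum(y) / len(y)
    ss_residual = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_total = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_residual / ss_total


slope, intercept = fit_line(units, costs)
r2 = r_squared(units, costs, slope, intercept)

# Predicted cost for a hypothetical month in which 1,000 units are produced.
predicted = slope * 1000 + intercept
print(f"cost = {slope:.2f} * units + {intercept:.2f}")
print(f"R squared = {r2:.3f}, predicted cost at 1000 units = {predicted:.2f}")
```

With a single independent variable, R² is the square of the correlation coefficient between the two variables, so the same calculation also tells you how strongly they move together.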
Multiple regression assumes that the relationship between $Y$ and $x_1, x_2, \dots, x_n$ has
the following form: $Y = \text{Constant} + B_1 x_1 + B_2 x_2 + \dots + B_n x_n$
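As an illustration of this form, here is a small Python sketch that estimates the Constant and the B coefficients by least squares. The data are hypothetical and were constructed from known coefficients (Constant = 10, B1 = 2, B2 = 3) so that the result can be checked by eye. The helper names are our own, and solving the normal equations directly is only one of several ways to do the fit; it is not a description of what Excel does internally.

```python
def solve_linear_system(a, b):
    """Solve a * z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix copy
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        known = sum(m[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (m[i][n] - known) / m[i][i]
    return z


def fit_multiple_regression(rows, y):
    """Return [Constant, B1, ..., Bn] that minimize the sum of squared errors."""
    design = [[1.0] + list(r) for r in rows]  # leading 1.0 carries the Constant
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * yi for d, yi in zip(design, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(coeffs, x):
    """Evaluate Y = Constant + B1*x1 + ... + Bn*xn."""
    return coeffs[0] + sum(b * xi for b, xi in zip(coeffs[1:], x))


# Hypothetical data generated from Y = 10 + 2*x1 + 3*x2.
rows = [(1, 2), (2, 1), (3, 4), (4, 3), (5, 6)]
y = [18, 17, 28, 27, 38]

coeffs = fit_multiple_regression(rows, y)
print("Constant, B1, B2 =", [round(c, 4) for c in coeffs])
print("Prediction for x1=6, x2=5:", round(predict(coeffs, (6, 5)), 4))
```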
Many marketing problems and decisions deal with understanding or estimating the
probability associated with certain events or behaviors, and frequently these events or
behaviors tend to be dichotomous – that is, of one type or of another type. When this is the
case, the marketing analyst must predict a binary dependent variable (one that assumes the
value 0 or 1, representing the inherent dichotomy) from a set of independent variables.
Some examples follow:
• Predicting from demographic behavior whether a person will (dependent
variable = 1) or will not (dependent variable = 0) subscribe to a magazine or use
a product.
• Predicting whether a person will or will not respond to a direct mail campaign.
Often the independent variables used are recency (time since last purchase),
frequency (how many orders placed in last year), and monetary value (total
amount purchased in last year).
Logistic regression (LR) is a statistical method similar to linear regression in that it finds
an equation that predicts an outcome for a binary variable, Y, from one or more predictor
variables, X. However, unlike linear regression, the predictor variables can be categorical
or continuous, as the model does not strictly require continuous data. To predict group
membership, LR uses the log odds rather than probabilities and an iterative maximum
likelihood method rather than least squares to fit the final model. This means you have
more freedom when using LR, and the method may be more appropriate for non-normally
distributed data or when the samples have unequal covariance matrices.
2. Next, we will have to create a few new columns that we will use to optimize for these
regression coefficients including the logit (score), probability of success & failure, the
likelihood and log likelihood.
3. Find the sum of the log likelihoods, which is the number we will attempt to maximize to
solve for the regression coefficients.
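The worksheet columns described above (logit, probability, likelihood, log likelihood, and their sum) can be sketched in Python as follows. The data are hypothetical, with one independent variable, and the names `probability`, `sum_log_likelihood`, and `fit` are our own. Plain gradient ascent stands in here for the search that Solver performs in the worksheet; it is an assumption chosen for simplicity, not the method Solver uses.

```python
import math

# Hypothetical data: x = orders placed last year, y = 1 if the customer responded.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [0, 0, 1, 0, 1, 0, 1, 1]


def probability(b0, b1, xi):
    """Probability of success from the logit (score) b0 + b1 * xi."""
    logit = b0 + b1 * xi
    return 1.0 / (1.0 + math.exp(-logit))


def sum_log_likelihood(b0, b1):
    """Sum of the log likelihoods, the number we attempt to maximize."""
    total = 0.0
    for xi, yi in zip(x, y):
        p = probability(b0, b1, xi)
        likelihood = p if yi == 1 else 1.0 - p
        total += math.log(likelihood)
    return total


def fit(steps=20000, rate=0.01):
    """Climb the log likelihood with small gradient steps, starting from zero."""
    b0, b1 = 0.0, 0.0
    for _ in range(steps):
        errors = [yi - probability(b0, b1, xi) for xi, yi in zip(x, y)]
        b0 += rate * sum(errors)
        b1 += rate * sum(e * xi for e, xi in zip(errors, x))
    return b0, b1


b0, b1 = fit()
print(f"intercept = {b0:.3f}, slope = {b1:.3f}")
print(f"sum of log likelihoods = {sum_log_likelihood(b0, b1):.3f}")
print(f"P(respond | 6 orders) = {probability(b0, b1, 6):.3f}")
```

Note that the hypothetical responses overlap (some low-order customers respond and some high-order customers do not). If the two groups were perfectly separated, the coefficients would grow without limit and no finite maximum would exist.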
CLUSTER ANALYSIS
Often the marketer needs to categorize objects into groups (or clusters) so that the objects
in each group are similar, and the objects in each group are substantially different from the
objects in the other groups.
• When Procter & Gamble (P&G) test markets a new cosmetic, it may want to
group U.S. cities into groups that are similar on demographic attributes such as
percentage of Asians, percentage of Blacks, percentage of Hispanics, median
age, unemployment rate, and median income level.
• A marketing analyst at Coca-Cola wants to segment the soft drink market based
on price sensitivity, preference for diet versus regular soda, and preference
for Coke versus Pepsi.
• Microsoft might cluster its corporate customers based on the price a given
customer is willing to pay for a product. For example, there might be a cluster of
construction companies that are willing to pay a lot for Microsoft Project but not
so much for PowerPoint.
In our discussion, we will use the first example to learn how Excel Solver makes it easy to
perform a cluster analysis. For example, in the U.S. city illustration, you can find that every
U.S. city is similar to Memphis, Omaha, Los Angeles, or San Francisco. You can also find,
for example, that the cities in the Memphis cluster are dissimilar to the cities in the other
clusters.
In the example, if you cluster using the attribute levels, the percentage of Blacks and
Hispanics in each city will drive the clusters because these values are more spread out than
the other demographic attributes. To remedy this problem, you can standardize each
demographic attribute by subtracting off the attribute’s mean and dividing by the attribute’s
standard deviation.
1. Add at least 5 rows above your data for other calculations. Start by computing
AVERAGE and STDEVA for each demographic attribute (%Black, %Hispanic, %Asian),
then proceed to STANDARDIZE.
2. Label the STANDARDIZE column for %Black as Z1, for %Hispanic as Z2, and for
%Asian as Z3.
3. Repeat step 2 for Z2 and Z3. The result should show the values in the illustration below.
To check that your STANDARDIZE values are correct, confirm that the mean of each of Z1, Z2,
and Z3 is 0 and the standard deviation is 1.
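The standardization and the mean and standard deviation check can be sketched in Python as follows. The percentages are hypothetical, and the function name `standardize` is our own. The sketch uses the sample standard deviation (dividing by n - 1), which is what STDEVA returns for purely numeric data.

```python
import statistics

# Hypothetical %Black values for six cities (illustration only).
pct_black = [12.0, 30.5, 8.2, 45.1, 22.4, 5.8]


def standardize(values):
    """Subtract the mean and divide by the sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)  # sample standard deviation (n - 1)
    return [(v - mean) / sd for v in values]


z1 = standardize(pct_black)

# The check described above: mean 0 and standard deviation 1 (up to rounding).
print("Z1 =", [round(z, 3) for z in z1])
print("mean =", round(statistics.mean(z1), 6))
print("standard deviation =", round(statistics.stdev(z1), 6))
```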
Note: The values of the anchors are just their respective Z1, Z2 and
Z3 values.
EXCEL HACKS: Before we proceed, let’s talk about a more visual, colorful way
to find the maximum value. We all know =MAX, right?
This time we will explore CONDITIONAL FORMATTING.
3. Choose any color theme. If you select the first one, for example, the value with the
darkest red is the maximum value. Alternatively, use MORE RULES to show only one color
in different shades that identify the minimum and maximum values.
Take, for example, the illustration below.
7. Run the Excel Solver, setting the sum as the objective to minimize by changing the
anchors. Solver could change the cities in the anchors and the sum. If it does not, then
we have already correctly identified the clusters.
Interpretation: The Detroit cluster consists of 15 of the 24 cities in the study. The El
Paso cluster consists of 8 cities, and Honolulu alone forms one cluster.
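To make the anchor idea concrete, here is a minimal Python sketch. It assumes that the sum being minimized is the total of each city's squared distance to its nearest anchor, which is one common way to set up this kind of worksheet; the exact worksheet formulas are not reproduced here. The city names and (Z1, Z2, Z3) values are made up, and trying every combination of anchors stands in for the Solver search, which is only practical because the example is tiny.

```python
from itertools import combinations

# Made-up (Z1, Z2, Z3) values for six fictional cities.
cities = {
    "City A": (1.2, -0.5, -0.3),
    "City B": (1.0, -0.6, -0.2),
    "City C": (-0.7, 1.4, -0.1),
    "City D": (-0.8, 1.1, 0.0),
    "City E": (-0.4, -0.7, 1.5),
    "City F": (-0.3, -0.7, 1.4),
}


def squared_distance(p, q):
    """Squared Euclidean distance between two standardized profiles."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def total_distance(anchors):
    """Sum over all cities of the squared distance to the nearest anchor."""
    return sum(
        min(squared_distance(z, cities[a]) for a in anchors)
        for z in cities.values()
    )


def best_anchors(k):
    """Try every set of k anchor cities and keep the one with the smallest sum."""
    return min(combinations(cities, k), key=total_distance)


def assign(anchors):
    """Map each city to its nearest anchor, that is, to its cluster."""
    return {
        name: min(anchors, key=lambda a: squared_distance(z, cities[a]))
        for name, z in cities.items()
    }


anchors = best_anchors(3)
clusters = assign(anchors)
print("Anchors:", anchors)
print("Sum of squared distances:", round(total_distance(anchors), 4))
for name, anchor in clusters.items():
    print(f"{name} belongs to the {anchor} cluster")
```

Each anchor is itself a city, so its own squared distance is 0 and it always lands in its own cluster.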
SOURCES
1. Wayne Winston, “Marketing Analytics: Data-Driven Techniques with
Microsoft Excel”, 2014
2. Wayne Winston, “Microsoft Excel 2013: Data Analysis and Business
Modeling”, 2014
3. Microsoft Support, “Load the Analysis ToolPak in Excel”. [Online]. Available:
https://support.microsoft.com/en-us/office/load-the-analysis-toolpak-in-excel-6a63e598-cd6d-42e3-9317-6b40ba1a66b4?ui=en-us&rs=en-us&ad=us#OfficeVersion=Windows
4. Julien I.E. Hoffman, in Basic Biostatistics for Medical and Biomedical
Practitioners (Second Edition), 2019. [Online]. Available:
https://www.sciencedirect.com/topics/medicine-and-dentistry/logistic-regression-analysis
5. Zach of Statology, “How to Perform Logistic Regression in Excel”, 2020.
[Online]. Available: https://www.statology.org/logistic-regression-excel/