M4L 4 IE350 Courseware Topic3&4

CEBU INSTITUTE OF TECHNOLOGY – UNIVERSITY
COLLEGE OF ENGINEERING AND ARCHITECTURE

DEPARTMENT OF INDUSTRIAL ENGINEERING
COURSEWARE
IE350
ANALYTICAL
TOOLS AND TRENDS
Topics 3 & 4
Prepared by:
Engr. KRISTAN IAN D. CABAÑA

Instructor
ABOUT THE COURSE
Course / Section: IE350 ANALYTICAL TOOLS & TRENDS
Credit Units: 3
Term Offered:
Total Hours: 54
Instructor: Engr. KRISTAN IAN D. CABAÑA
Pre-Requisite/s: Second Year Standing
Co-Requisite/s: None
COURSE DESCRIPTION
Analytical Tools and Trends is divided into three major areas: analytical tools in
marketing, QM tools, and system dynamics. The course introduces approaches to analyzing
marketing data in order to characterize and predict it, by building spreadsheet models
with the Microsoft Excel® PivotTable, Solver, and Data Analysis add-ins. It also explores
analytical methodologies in product research & development. System dynamics is
integrated into the course in order to understand the interconnectedness of the
variables or elements of complex systems over time.
TOPIC 3 & 4
MODULE 3: ASSOCIATIVE MODELS IN FORECASTING TRENDS
LEARNING OUTCOMES
L01 Articulate the importance of correlation and determination coefficients to
characterize the relationship and variability of data.
L02 Differentiate the necessity of conducting simple and multiple linear regression
analyses.
L03 Defend the importance of predictive analytics to a more effective business
decision-making process.
MODULE 4: PREFERENCE-BASED CUSTOMER SEGMENTATION

LEARNING OUTCOMES
L01 Predict discrete and dichotomous choices of market using logistic regression
L02 Evaluate the product characteristics that drive a consumer’s preference for products
using conjoint and discrete choice analyses
L03 Categorize market data into groups using cluster analysis and appraise differences
among clusters
IE350 ANALYTICAL TOOLS AND TRENDS

Version: EIOR-01 | KIDC-01
MODULE 3:
ASSOCIATIVE MODELS IN FORECASTING TRENDS
INTRODUCTION 1,2
Often the marketing analyst needs to determine how variables are related. They want to
determine the nature of relationships between variables of interest. Some commonly
important marketing questions that require analyzing the relationships between two
variables of interest include:
• How does price affect demand?
• How does advertising affect sales?
• How does shelf space devoted to a product affect product sales?
This module introduces the simplest tools you can use to model relationships between
variables. It first covers finding the line that best fits the hypothesized causal
relationship between two variables. You then learn to use correlations to analyze the
nature of non-causal relationships between two or more variables.
Forecasting involves using past data to generate a number, set of numbers, or scenario that
corresponds to a future occurrence. It is absolutely essential to short-range and long-range
planning.
Time Series and Associative models are both quantitative forecast techniques. These are
more objective than qualitative techniques such as the Delphi Technique and market
research.
SIMPLE LINEAR REGRESSION 1,2
Suppose you manage a plant that manufactures small refrigerators. National headquarters tells
you how many refrigerators to produce each month. For budgeting purposes, you want to
forecast your monthly operating costs.
Every business analyst should have the ability to estimate the relationship between
important business variables. In Microsoft Excel, the trend curve or trendline feature is often
helpful in determining the relationship between two variables. The variable that analysts try
to predict is called the dependent variable. The variable you use for prediction is called the
independent variable. Here are some examples of business relationships you might want
to estimate:
Examples of Relationships

Before we proceed to the steps in modelling regression in Microsoft Excel,
make sure that you have loaded the Analysis ToolPak ribbons.
Load the Analysis ToolPak in Excel3

If you need to develop complex statistical or engineering analyses, you can save steps and
time by using the Analysis ToolPak. You provide the data and parameters for each analysis,
and the tool uses the appropriate statistical or engineering macro functions to calculate and
display the results in an output table. Some tools generate charts in addition to output
tables.
1. Open your Microsoft Excel sheet and go to File, located at the upper left corner.
2. Choose Options and you will be led to a pop-up window.
3. Click Add-ins.
4. Click Analysis ToolPak and click Go.
5. Check Analysis ToolPak and Solver Add-in.
6. You will then see the new Data Analysis and Solver commands on the Data tab.
Now, you are all set!
Now, refer to Microsoft Excel file “02-Regression Analysis”, worksheet
a-costestimate. The worksheet contains data about the units produced
and the monthly plant operating cost for a 14-month period.

QUESTIONS TO ANSWER:
1. What is the predicted operating cost, given the units produced,
for the 15th month?
2. How accurately does this relationship explain the monthly
variation in plant operating costs?
3. When estimating a straight-line relationship, which functions
can I use to get the slope and intercept of the line that best fits
the data?
Regression Modelling in Excel

Steps to model a regression in Microsoft Excel:
1. Highlight all data. (You may also click any cell with data and press CTRL + A to
highlight all data.)
2. Go to the Data tab and click Data Analysis. A Data Analysis window will appear.
3. Choose Regression and click OK.

4. Now complete the inputs needed for the analysis, as indicated in the screenshot below.
Input Y Range -> dependent variable
Input X Range -> independent variable
Tick the Labels box to indicate that your ranges include the column labels.
5. For the output, choose where the results will be placed:
- Click Output Range -> click the up arrow and then click any cell of the worksheet
(but not a cell that holds data) to place the results there.
- Alternatively, choose New Worksheet Ply, which means that the results are
automatically placed on another sheet in the same Excel file.
6. Then, click OK.

Linear Regression Summary Output
[Figure: regression summary output with callouts for the correlation coefficient, the coefficient of determination, the coefficients, and the critical value (t test)]
You are interested in predicting monthly operating costs from units produced, which
helps the plant manager determine the operating budget and understand better the cost
to produce refrigerators.
[Figure: Scatter plot of operating cost versus units produced]
When you look at the scatter plot, it seems reasonable that a straight line (or linear
relationship) exists between units produced and monthly operating costs. To show the
trendline, go to the Chart Design tab, select Quick Layout, and choose the layout
that displays the trendline equation.
Linear Trend Curve

[Figure: Monthly Plant Cost versus Units Produced with the fitted trendline y = 64.269x + 37894, R² = 0.6882]
This is the completed trendline.

How does Excel determine the best-fitting line? It chooses the line that minimizes (over all
lines that could be drawn) the sum of the squared vertical distance from each point to the
line. The vertical distance from each point to the line is called an error, or residual. The line
Excel creates is called the least-squares line. You minimize the sum of squared errors
rather than the sum of the errors because in simply summing the errors, positive and
negative errors can cancel each other out. For example, if you add errors, a point 100 units
above the line and a point 100 units below the line cancel each other. If you square the
errors, however, every miss counts as a positive amount, so Excel uses the fact that your
predictions for each point are wrong to find the best-fitting line.
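The cancellation argument above can be checked with a minimal Python sketch. The five data points are hypothetical (they are not the worksheet data), and the function names are made up for illustration. A flat line at the average cost and the least-squares line both have errors that sum to zero, but only the sum of squared errors shows that the flat line fits far worse.

```python
# Hypothetical (units produced, monthly cost) pairs; NOT the worksheet data.
points = [(500, 70000), (700, 85000), (900, 93000), (1100, 110000), (1300, 120000)]


def errors(slope, intercept):
    """Vertical distance (error, or residual) from each point to the line."""
    return [y - (slope * x + intercept) for x, y in points]


def sum_of_errors(slope, intercept):
    return sum(errors(slope, intercept))


def sum_of_squared_errors(slope, intercept):
    return sum(e ** 2 for e in errors(slope, intercept))


# A flat line at the average cost ignores units produced entirely.
flat = (0.0, 95600.0)
# The least-squares line for these five hypothetical points.
best = (62.5, 39350.0)

for name, (slope, intercept) in [("flat line", flat), ("least-squares line", best)]:
    print(name, sum_of_errors(slope, intercept), sum_of_squared_errors(slope, intercept))
```

Summing plain errors cannot tell the two lines apart, which is why the squared errors are the quantity being minimized.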
What is the predicted operating cost, given the units produced, for the 15th month?
Excel calculates the best-fitting straight line for predicting monthly operating costs from
monthly units produced as follows:
Monthly Operating Costs (y) = 64.269x + 37894

where x is the monthly units produced
Thus, the predicted monthly operating cost on the 15th month is 108,589.67.
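As a rough check of the arithmetic, the trendline can be written as a small Python function. The slope and intercept are the rounded values shown on the chart. The 15th-month quantity of 1,100 units is an assumption made here for illustration, since this text does not state it. With the rounded coefficients the result is about 108,590, close to the figure above, which presumably comes from Excel's unrounded coefficients.

```python
SLOPE = 64.269     # rounded slope from the trendline equation
INTERCEPT = 37894  # rounded intercept from the trendline equation


def predict_cost(units_produced):
    """Predicted monthly operating cost for a given number of units."""
    return SLOPE * units_produced + INTERCEPT


# ASSUMPTION: 1,100 units in month 15; the text does not give this number.
units_month_15 = 1100
print(round(predict_cost(units_month_15), 2))
```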
How accurately does this relationship explain the monthly variation in plant operating costs?
Clearly, each month, both the operating cost and the units produced vary. The answer to
this question is the R² value (0.69). You can state that the linear relationship explains 69
percent of the variation in monthly operating costs. This implies that 31 percent of the
variation in monthly operating costs is explained by other factors. Using multiple regression
you can try to determine other factors that influence operating costs.
People always ask what a good R² value is. There is really no definitive answer to this
question. With one independent variable, of course, a larger R² value indicates a better fit of
the data than a smaller R² value.
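To make "percent of variation explained" concrete, here is a minimal Python sketch that computes R² as one minus the ratio of unexplained variation (squared errors around the line) to total variation (squared deviations around the mean). The data and the fitted line are hypothetical, not the worksheet values, so the result differs from the 0.69 above.

```python
# Hypothetical data; NOT the worksheet values.
units = [500, 700, 900, 1100, 1300]
costs = [70000, 85000, 93000, 110000, 120000]


def r_squared(xs, ys, slope, intercept):
    """R^2 = 1 - (sum of squared errors) / (total sum of squares)."""
    mean_y = sum(ys) / len(ys)
    sse = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    sst = sum((y - mean_y) ** 2 for y in ys)
    return 1 - sse / sst


# 62.5 and 39350 are the least-squares slope and intercept for this hypothetical data.
r2 = r_squared(units, costs, 62.5, 39350.0)
print(f"explained: {r2:.1%}, explained by other factors: {1 - r2:.1%}")
```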
When estimating a straight-line relationship, which functions can I use to get the slope
and intercept of the line that best fits the data?
The Excel SLOPE(yrange,xrange) and INTERCEPT(yrange,xrange) functions return the
slope and intercept, respectively, of the least-squares line.
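For readers who want to see what these two functions compute, the sketch below applies the standard least-squares formulas in Python. The function names mirror the Excel ones only for readability, the argument order (y first, then x) follows the Excel signatures quoted above, and the data are hypothetical.

```python
def slope(y_range, x_range):
    """Least-squares slope: sum of cross-deviations over sum of squared x-deviations."""
    mean_x = sum(x_range) / len(x_range)
    mean_y = sum(y_range) / len(y_range)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(x_range, y_range))
    sxx = sum((x - mean_x) ** 2 for x in x_range)
    return sxy / sxx


def intercept(y_range, x_range):
    """Least-squares intercept: the line passes through (mean x, mean y)."""
    mean_x = sum(x_range) / len(x_range)
    mean_y = sum(y_range) / len(y_range)
    return mean_y - slope(y_range, x_range) * mean_x


# Hypothetical data; NOT the worksheet values.
units = [500, 700, 900, 1100, 1300]
costs = [70000, 85000, 93000, 110000, 120000]
print(slope(costs, units), intercept(costs, units))
```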
MULTIPLE LINEAR REGRESSION

Simple linear regression describes how to use the trend curve to predict one variable
(called y, or the dependent variable) from another variable (called x, or the independent
variable). However, you often want to use more than one independent variable (called
independent variables x1, x2, . . . xn) to predict the value of a dependent variable. In these
cases, you can use either the multiple regression option in the Excel Data Analysis feature
or the LINEST function to estimate the relationship you want.
Multiple regression assumes that the relationship between y and x1, x2, . . . xn has the
following form: Y = Constant + B1x1 + B2x2 + ... + Bnxn
Refer to Excel file “02-Regression Analysis”, worksheet b-costestimate. It
contains the cost of running a plant over 19 months as well as the number of units of
Product A, B, and C produced during each month.
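The form Y = Constant + B1x1 + ... + Bnxn can also be estimated outside Excel. The following Python sketch solves the least-squares normal equations directly. This is a generic textbook approach, not necessarily how the Data Analysis tool or LINEST works internally. The six rows of Product A, B, and C quantities are hypothetical, and the costs are generated from known coefficients (1000, 2, 3, 5) so that the fit can be checked.

```python
def fit_multiple_regression(rows, y):
    """Return [constant, b1, ..., bn] by solving the normal equations (X'X)b = X'y."""
    X = [[1.0] + [float(v) for v in row] for row in rows]  # leading 1 for the constant
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [
        [sum(x[i] * x[j] for x in X) for j in range(k)]
        + [sum(x[i] * yi for x, yi in zip(X, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        p = A[col][col]
        A[col] = [v / p for v in A[col]]
        for r in range(k):
            if r != col:
                f = A[r][col]
                A[r] = [vr - f * vc for vr, vc in zip(A[r], A[col])]
    return [A[i][k] for i in range(k)]


# Hypothetical monthly units of Product A, B, C; NOT the worksheet data.
production = [(10, 20, 30), (15, 18, 25), (12, 25, 28),
              (20, 22, 35), (18, 30, 20), (25, 15, 40)]
# Costs generated from known coefficients so the result can be verified.
cost = [1000 + 2 * a + 3 * b + 5 * c for a, b, c in production]

coefficients = fit_multiple_regression(production, cost)
print([round(c, 4) for c in coefficients])
```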

LOGISTIC REGRESSION 4,5
Many marketing problems and decisions deal with understanding or estimating the
probability associated with certain events or behaviors, and frequently these events or
behaviors tend to be dichotomous – that is, of one type or of another type. When this is the
case, the marketing analyst must predict a binary dependent variable (one that assumes the
value 0 or 1, representing the inherent dichotomy) from a set of independent variables.
Some examples follow:
• Predicting from demographic behavior whether a person will (dependent
variable = 1) or will not (dependent variable = 0) subscribe to a magazine or use
a product.
• Predicting whether a person will or will not respond to a direct mail campaign.
Often the independent variables used are recency (time since last purchase),
frequency (how many orders placed in last year), and monetary value (total
amount purchased in last year).
Logistic regression (LR) is a statistical method similar to linear regression, since LR finds an
equation that predicts an outcome for a binary variable, Y, from one or more predictor
variables, X. However, unlike linear regression, the predictor variables can be categorical
or continuous, as the model does not strictly require continuous data. To predict group
membership, LR uses the log odds ratio rather than probabilities and an iterative maximum
likelihood method rather than least squares to fit the final model. This means you have
more freedom when using LR, and the method may be more appropriate for non-normally
distributed data or when the samples have unequal covariance matrices.
Now, refer to Microsoft Excel file “02-Regression Analysis”, worksheet
c-subscriber. Suppose you want to predict the chance (based on the
person’s age) that a person will subscribe to Cheradee Series YouTube
channel. The sheet contains the age and subscription status
(1=subscriber, 0=nonsubscriber) for 408 people.
Maximum Likelihood Estimate of Logistic Regression Model
1. Enter cells for the regression coefficients. (Set any dummy values for the intercept and
slope.) We will set each of these to 0.1 and optimize them later.
2. Next, create a few new columns that we will use to optimize these regression
coefficients: the logit (score), the probability of success and of failure, the
likelihood, and the log likelihood.

3. Next, create values for each column by using the following formula:
4. Find the sum of the log likelihoods, which is the number we will attempt to maximize to
solve for the regression coefficients.
5. Use the Solver to solve for the regression coefficients.

Once the Solver is installed, go to the Analysis group on the Data tab and
click Solver. Enter the following information:
• Set Objective: Choose the cell that contains the sum of the log likelihoods, and set it to Max.
• By Changing Variable Cells: Choose the cell range that contains the
regression coefficients.
• Make Unconstrained Variables Non-Negative: Uncheck this box.
• Select a Solving Method: Choose GRG Nonlinear.
Then click Solve.
The Solver automatically calculates the regression coefficient estimates:
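The spreadsheet columns and the Solver run described in the steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: the ten ages and subscription flags are hypothetical (not the 408 worksheet rows), the column formulas follow the standard logistic model, and Newton's method stands in for Solver's GRG Nonlinear engine. It also starts from 0 rather than the 0.1 dummy values, which is a choice made here for numerical stability.

```python
import math

# Hypothetical ages and subscription flags (1 = subscriber); NOT the worksheet rows.
ages = [20, 23, 25, 30, 35, 40, 45, 50, 55, 60]
subscribed = [0, 0, 1, 0, 0, 1, 1, 0, 1, 1]


def probability(intercept, slope, age):
    """Logit (score) = intercept + slope * age, converted to P(subscribe)."""
    logit = intercept + slope * age
    return 1.0 / (1.0 + math.exp(-logit))


def log_likelihood(intercept, slope):
    """Sum of ln(p) for subscribers and ln(1 - p) for nonsubscribers."""
    total = 0.0
    for age, y in zip(ages, subscribed):
        p = probability(intercept, slope, age)
        total += math.log(p) if y == 1 else math.log(1.0 - p)
    return total


def fit(iterations=25):
    """Maximize the log likelihood with Newton's method (a stand-in for Solver)."""
    b0, b1 = 0.0, 0.0  # starting guesses; the text uses 0.1 as dummy values
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for age, y in zip(ages, subscribed):
            p = probability(b0, b1, age)
            w = p * (1.0 - p)
            g0 += y - p              # gradient with respect to the intercept
            g1 += (y - p) * age      # gradient with respect to the slope
            h00 += w
            h01 += w * age
            h11 += w * age * age
        det = h00 * h11 - h01 * h01
        # Solve the 2x2 system H * step = gradient and move uphill.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


intercept, slope = fit()
print(intercept, slope, log_likelihood(intercept, slope))
```

The quantity printed last plays the role of the Set Objective cell: it is the sum of the log likelihoods at the fitted coefficients.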
Interpreting Logistic Regression Coefficients

In a multiple linear regression, you know how to interpret the coefficient of an independent
variable. In a logistic regression, the interpretation of the coefficient of an independent
variable is much more complex. Suppose in a logistic regression “b” is the coefficient of an
independent variable “x”. It can be shown in our model that a unit increase in “x” multiplies
the odds ratio (Probability y=1 / Probability y=0) by e^b. In our illustration, this means that for
any age a one-year increase in age increases the odds ratio by 1%.
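The e^b relationship can be verified numerically. In the Python sketch below, the intercept and slope are hypothetical values chosen for illustration (they are not the Solver output). A slope of 0.01 is used because e^0.01 is about 1.01, which corresponds to an increase of roughly 1% in the odds ratio per year of age.

```python
import math

# HYPOTHETICAL coefficients, chosen for illustration; not the Solver result.
INTERCEPT = -1.0
B = 0.01


def probability(age):
    """P(y = 1) from the logistic model."""
    return 1.0 / (1.0 + math.exp(-(INTERCEPT + B * age)))


def odds_ratio(age):
    """Probability y=1 divided by Probability y=0."""
    p = probability(age)
    return p / (1.0 - p)


# Ratio of the odds at age 31 to the odds at age 30 versus e^b.
ratio = odds_ratio(31) / odds_ratio(30)
print(ratio, math.exp(B))
print(f"percent increase per year: {(ratio - 1) * 100:.2f}%")
```

The same ratio results for any pair of consecutive ages, which is what "for any age" means in the paragraph above.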

MODULE 4:
PREFERENCE-BASED CUSTOMER SEGMENTATION
CLUSTER ANALYSIS
Often the marketer needs to categorize objects into groups (or clusters) so that the objects
in each group are similar, and the objects in each group are substantially different from the
objects in the other groups.
• When Procter & Gamble (P&G) test markets a new cosmetic, it may want to
group U.S. cities into groups that are similar on demographic attributes such as
percentage of Asians, percentage of Blacks, percentage of Hispanics, median
age, unemployment rate, and median income level.
• A marketing analyst at Coca-Cola wants to segment the soft drink market based
on consumer preferences for price sensitivity, preference of diet versus regular
soda, and preference of Coke versus Pepsi.
• Microsoft might cluster its corporate customers based on the price a given
customer is willing to pay for a product. For example, there might be a cluster of
construction companies that are willing to pay a lot for Microsoft Project but not
so much for PowerPoint.
In our discussion, we will use the first example to learn how Excel Solver makes it easy to
perform a cluster analysis. For example, in the U.S. city illustration, you can find that every
U.S. city is similar to Memphis, Omaha, Los Angeles, or San Francisco. You can also find,
for example, that the cities in the Memphis cluster are dissimilar to the cities in the other
clusters.
Now, refer to Microsoft Excel file “03-Cluster Analysis”, sheet 01-cities.
Suppose you want to cluster 24 of America’s largest cities. For each city
you have the following demographic data that will be used as the basis of
your cluster analysis: (1) percentage Black, (2) percentage Hispanic, and
(3) percentage Asian.
For example, Atlanta’s demographic information is as follows: Atlanta is 67 percent
Black, 2 percent Hispanic, and 1 percent Asian. For now, assume your goal is to group
the cities into three clusters. The basic idea used to identify the clusters is to
choose a city to “anchor”, or “center”, each cluster. You assign each city to the
“nearest” cluster center. Your target cell is then to minimize the sum of the squared
distances from each city to the closest cluster anchor.

Standardizing the Attributes
In the example, if you cluster using the attribute levels, the percentage of Blacks and
Hispanics in each city will drive the clusters because these values are more spread out than
the other demographic attributes. To remedy this problem, you can standardize each
demographic attribute by subtracting off the attribute’s mean and dividing by the attribute’s
standard deviation.
1. You may want to add at least 5 rows above your data for other calculations. Start by
computing the AVERAGE and STDEVA of each demographic attribute (%Black,
%Hispanic, %Asian), then proceed to STANDARDIZE.
2. Label the STANDARDIZE column of %Black as Z1, that of %Hispanic as Z2, and that of
%Asian as Z3. Enter the formula =STANDARDIZE(X,MEAN,STDDEV) in cell F3. Press F4 on the
mean and standard deviation references to fix them. Drag down, or double-click the box
at the lower right corner of the cell, to fill the column.
3. Repeat step #2 for Z2 and Z3. It should show the values in the illustration below.
To check that the STANDARDIZE values are correct, confirm that the mean of each of
Z1, Z2, and Z3 is 0 and the standard deviation is 1.
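The standardization and its check can be sketched in a few lines of Python. The six percentages are hypothetical apart from Atlanta's 67, which comes from the text. The sample standard deviation is used here on the assumption that it matches what STDEVA returns for purely numeric data.

```python
from statistics import mean, stdev

# Hypothetical %Black values for six cities; only the first (Atlanta, 67) is from the text.
pct_black = [67, 26, 13, 44, 8, 76]


def standardize(values):
    """Subtract the attribute's mean and divide by its (sample) standard deviation."""
    m = mean(values)
    s = stdev(values)
    return [(v - m) / s for v in values]


z1 = standardize(pct_black)
print([round(z, 3) for z in z1])
# The check from the text: mean close to 0 and standard deviation close to 1.
print(round(mean(z1), 6), round(stdev(z1), 6))
```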

Choosing the clusters
You can use the Solver to identify a given number of clusters. The key in doing so is to
ensure that the cities in each cluster are demographically similar and cities in different
clusters are demographically different. Using few clusters enables the marketing analyst to
reduce the 24 U.S. cities into a few (in your case three) easily interpreted market segments.
Since the tool is intended for getting the preference of customers, we can assume that the
city with the highest percentage (MAXIMUM) for each demographic attribute anchors one of
the clusters we are looking for. This means the anchors are Detroit for the highest %Black,
El Paso for %Hispanic, and Honolulu for %Asian. We call these your Anchors/Data Centers.
Note: The values of the anchors are just their respective Z1, Z2 and
Z3 values.
EXCEL HACKS: Before we proceed, let’s talk about a more visual, colorful way
to find the maximum value. We all know =MAX, right?
This time we will explore CONDITIONAL FORMATTING.
1. Highlight all values in the demographic attributes.
2. Go to Conditional Formatting on the Home tab and select Color Scales.
3. Choose any color theme. If you select the first one, for example, the value with the
darkest red is the maximum value. You may also try MORE RULES to use a single color
in different shades to identify the minimum and maximum values.
Take for example the illustration below.

Doing this gives the following result. You will see Detroit, El Paso, and Honolulu.
Now, back to the topic.

Finding Optimal Clusters
Let each point’s contribution to the target cell be the squared distance to the closest anchor.
Then choose one anchor from each group to minimize the target cell. This ensures each
point is “close” to an anchor. To do this, you have to compute the squared distances to
these anchors.
1. Compute the squared distance from each city to anchor 1 (D2 to 1).
=SUMXMY2(Z1 value to Z3 values, anchor values of 1)
2. Repeat step #1 for D2 to 2 and D2 to 3. It should show the following values.
3. Get the minimum among these distances. =MIN(I8:K8)
4. Match the minimum distance to its position among the three distances to identify the
cluster. =MATCH(MIN, D2 to 1 to D2 to 3, 0), where 0 requests an exact match. It should
show the values presented below.
5. Look up which anchor city each city is clustered with.
=VLOOKUP(CLUSTER,ANCHORS)
6. Compute the SUM of all the minimum distances.
7. Run the Excel Solver with the sum as the objective to minimize, changing the anchors.
Solver may change the anchor cities and the sum. If it does not, then we have already
correctly identified the clusters.
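The seven steps above can be mirrored in a short Python sketch. Everything in it is illustrative: the six cities and their standardized values are hypothetical, the helper names are made up (sumxmy2 simply imitates the Excel function of that name), and an exhaustive search over every possible set of three anchors stands in for the Solver run. Exhaustive search is only practical because the example is tiny.

```python
from itertools import combinations

# Hypothetical standardized attributes (Z1, Z2, Z3); NOT the worksheet values.
cities = {
    "City A": (1.8, -0.6, -0.4),
    "City B": (1.5, -0.5, -0.3),
    "City C": (-0.7, 2.0, -0.2),
    "City D": (-0.6, 1.7, -0.1),
    "City E": (-0.9, -0.4, 2.5),
    "City F": (-0.8, -0.3, 2.2),
}


def sumxmy2(a, b):
    """Sum of squared differences, like Excel's SUMXMY2 (steps 1 and 2)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(anchors):
    """Return ({city: nearest anchor}, sum of the minimum squared distances)."""
    assignment, total = {}, 0.0
    for city, z in cities.items():
        distances = [sumxmy2(z, cities[a]) for a in anchors]
        smallest = min(distances)                                # step 3: MIN
        assignment[city] = anchors[distances.index(smallest)]   # steps 4-5: MATCH, lookup
        total += smallest                                        # step 6: SUM
    return assignment, total


# Step 7: try every set of three anchors and keep the one with the smallest sum.
best_anchors = min(combinations(cities, 3), key=lambda a: assign(a)[1])
clusters, best_total = assign(best_anchors)
print(best_anchors, round(best_total, 4))
print(clusters)
```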

Lastly, summarize the clusters. Copy and paste the data to a blank sheet and SORT DATA
by the last column. The clusters now are
Interpretation: You can find that the Detroit cluster consists of 15 of the 24 cities in the
study. The El Paso cluster consists of 8 cities, and Honolulu alone is one cluster.
Determining the Correct Number of Clusters

After a while, adding clusters often yields a diminishing improvement in the target cell. To
determine the “correct” number of clusters, you can add one cluster at a time and see if the
additional complexity of adding a cluster yields improved insights into the demographics of
the cities. You can usually start by running three clusters, and in this case you have a sum
of 122.3770. If adding another cluster only leads to a sum close to 122.3770, then three
clusters is enough. But if it gives a clearly lower sum, such as 90, you may consider
adding another cluster.

END OF TOPIC 3
MODULE 3: ASSOCIATIVE MODELS IN FORECASTING
TRENDS
MODULE 4: PREFERENCE-BASED CUSTOMER
SEGMENTATION
SOURCES
1. Wayne Winston, “Marketing Analytics: Data Driven Techniques with Microsoft Excel”, 2014.
2. Wayne Winston, “Microsoft Excel 2013: Data Analysis and Business Modelling”, 2014.
3. Microsoft Support, “Load the Analysis ToolPak in Excel”. [Online]. Available:
https://support.microsoft.com/en-us/office/load-the-analysis-toolpak-in-excel-6a63e598-cd6d-42e3-9317-6b40ba1a66b4?ui=en-us&rs=en-us&ad=us#OfficeVersion=Windows
4. Julien I.E. Hoffman, “Basic Biostatistics for Medical and Biomedical Practitioners (Second Edition)”, 2019. [Online]. Available:
https://www.sciencedirect.com/topics/medicine-and-dentistry/logistic-regression-analysis
5. Zach of Statology, “How to Perform Logistic Regression in Excel”, 2020. [Online]. Available:
https://www.statology.org/logistic-regression-excel/

