Team ZAID
● ARNAV
GITHUB LINK https://github.com/Arnav-spec17/case-study
● MAYURESH
GITHUB LINK https://github.com/Mayureshwar-Shinde82/McD-/tree/main
Exploring Data
Step 1: First glimpse of the data
After data collection, exploratory data analysis cleans and, if necessary, pre-processes the data. This exploration stage also offers guidance on the most suitable algorithm for extracting meaningful market segments.
At a more technical level, data exploration helps to (1) identify the measurement levels of the variables; (2) investigate the univariate distributions of each of the variables; and (3) assess dependency structures between variables.
PCA is a statistical technique used to identify patterns in data by reducing the complexity of a large set of variables. It works by identifying the most important underlying factors, known as principal components, that explain the variance in the data. PCA transforms a set of correlated variables into a set of uncorrelated variables.
Selecting the target segment involves assessing the potential profitability and attractiveness of different market segments based on factors such as size, growth potential, competition, and compatibility with the company’s capabilities and resources.
Implications for marketing mix decisions concern how the chosen target market segment will shape the marketing mix strategy.
Step 2: Product
Product refers to developing a product or service that meets the needs and wants of the
target market segment.
Step 3: Price
Price involves determining the appropriate pricing strategy for the product or service, taking
into consideration factors such as production cost, competition and consumer demand.
Step 4: Place
Place refers to determining the most effective distribution channels to make the product or
service available to the market segment.
Step 5: Promotion
Promotion involves developing the advertising and communication strategy used to inform the target market segment about the product or service.
1. Exploring Data:
Data exploration can help businesses explore large amounts of data quickly to
better understand next steps in terms of further analysis. This gives the business a more
manageable starting point and a way to target areas of interest. In most cases, data
exploration involves using data visualizations to examine the data at a high level. By
taking this high-level approach, businesses can determine which data is most important
and which may distort the analysis and therefore should be removed. Data exploration
can also be helpful in decreasing time spent on less valuable analysis by selecting the
right path forward from the start.
After data collection, exploratory data analysis cleans and – if necessary – pre-processes
the data. This exploration stage also offers guidance on the most suitable algorithm for
extracting meaningful market segments.
At a more technical level, data exploration helps to do the following: (1) identify the measurement levels of the variables; (2) investigate the univariate distributions of each of the variables; and (3) assess dependency structures between variables.
The first step before commencing data analysis is to clean the data. This includes checking if
all values have been recorded correctly, and if consistent labels for the levels of categorical
variables have been used. For many metric variables, the range of plausible values is known
in advance. For example, age (in years) can be expected to lie between 0 and 110. It is easy
to check whether any implausible values are contained in the data, which might point to
errors during data collection or data entry.
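To make the cleaning step concrete, below is a minimal pandas sketch of the range and label checks described above; the file name and column names are assumptions chosen purely for illustration.

```python
# a hedged sketch of basic data cleaning checks; "survey_data.csv", "Age" and
# "Gender" are hypothetical names used only for illustration
import pandas as pd

df = pd.read_csv("survey_data.csv")

# check whether any implausible age values are contained in the data
implausible = ~df["Age"].between(0, 110)
print(df.loc[implausible])          # records that may point to entry errors

# check that consistent labels were used for a categorical variable
print(df["Gender"].unique())        # e.g. "female" vs "Female" would show up here
```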
Being familiar with the data avoids misinterpretation of results from complex
analyses. Descriptive numeric and graphic representations provide insights into the
data. Statistical software packages offer a wide variety of tools for descriptive
analysis.
Helpful graphical methods for numeric data are histograms, boxplots and scatter
plots. Bar plots of frequency counts are useful for the visualisation of categorical
variables. Mosaic plots illustrate the association of multiple categorical variables.
Histograms visualise the distribution of numeric variables. They show how often
observations within a certain value range occur. Histograms reveal if the distribution
of a variable is unimodal and symmetric or skewed. To obtain a histogram, we first
need to create categories of values. We call this binning. The bins must cover the
entire range of observations, and must be adjacent to one another. Usually, they are
of equal length.
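As a small illustration of binning, the sketch below draws a histogram of a simulated age variable with adjacent, equal-length bins; the data and bin width are assumptions for demonstration only.

```python
# a minimal sketch of a histogram with equal-length, adjacent bins; the age
# values are simulated and serve only as stand-in data
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(1234)
age = rng.normal(45, 15, size=500).clip(0, 110)   # simulated ages

plt.hist(age, bins=np.arange(0, 115, 5), edgecolor="black")  # 5-year bins covering 0-110
plt.xlabel("Age (years)")
plt.ylabel("Frequency")
plt.show()
```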
1.4.1 Categorical Variables:
Two pre-processing procedures are often used for categorical variables: one is merging levels of categorical variables before further analysis, the other is converting categorical variables to numeric ones, if it makes sense to do so.
Merging levels of categorical variables:
Merging levels of categorical variables is useful if the original categories are too differentiated (too many).
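A small pandas sketch of merging levels is shown below; the income categories and the coarser grouping are hypothetical and only illustrate the idea of collapsing overly differentiated categories.

```python
# a sketch of merging overly differentiated category levels; the categories
# and the mapping are hypothetical
import pandas as pd

income = pd.Series(["<20k", "20-30k", "30-40k", "40-50k", ">50k"] * 20)
merge_map = {"<20k": "low", "20-30k": "low",
             "30-40k": "medium", "40-50k": "medium",
             ">50k": "high"}
income_merged = income.map(merge_map)   # five levels collapsed into three
print(income_merged.value_counts())
```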
Principal components analysis (PCA) transforms a multivariate data set containing metric
variables to a new data set with variables referred to as principal components which are
uncorrelated and ordered by importance. The first variable (principal component) contains most of the variability, the second principal component contains the second most variability, and so on. After transformation, observations (consumers) still have the same relative
positions to one another, and the dimensionality of the new data set is the same because
principal components analysis generates as many new variables as there were old ones.
Principal components analysis basically keeps the data space unchanged, but looks at it
from a different angle.
Principal components analysis works off the covariance or correlation matrix of several
numeric variables. If all variables are measured on the same scale, and have similar data
ranges, it is not important which one to use. If the data ranges are different, the correlation
matrix should be used (which is equivalent to standardising the data).
In most cases, the transformation obtained from principal components analysis is used to
project high-dimensional data into lower dimensions for plotting purposes. In this case, only
a subset of principal components are used, typically the first few because they capture the
most variation. The first two principal components can easily be inspected in a scatter plot.
More than two principal components can be visualised in a scatter plot matrix.
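The sketch below shows one way to carry out this transformation with scikit-learn; the data are simulated stand-ins, and standardising first corresponds to working off the correlation matrix, as discussed above.

```python
# a minimal PCA sketch; the data matrix is simulated and stands in for real
# consumer data
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1234)
X = rng.normal(size=(200, 6))                 # stand-in consumer data

Z = StandardScaler().fit_transform(X)         # standardise -> correlation matrix
pca = PCA().fit(Z)
print(pca.explained_variance_ratio_)          # components ordered by importance

scores = pca.transform(Z)[:, :2]              # first two components for a scatter plot
```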
2. Extracting Segments:
2.1 Grouping Consumers:
Data-driven market segmentation analysis is exploratory by nature. Consumer data sets are
typically not well structured. Consumers come in all shapes and forms; a two-dimensional
plot of consumers’ product preferences typically does not contain clear groups of
consumers. Rather, consumer preferences are spread across the entire plot. The
combination of exploratory methods and unstructured consumer data means that results
from any method used to extract market segments from such data will strongly depend on
the assumptions made on the structure of the segments implied by the method. The result of
a market segmentation analysis, therefore, is determined as much by the underlying data as
it is by the extraction algorithm chosen. Segmentation methods shape the segmentation
solution.
Many segmentation methods used to extract market segments are taken from the field of
cluster analysis. In that case, market segments correspond to clusters. As pointed out by
Hennig and Liao (2013), selecting a suitable clustering method requires matching the data
analytic features of the resulting clustering with the context-dependent requirements that are
desired by the researcher (p. 315). It is, therefore, important to explore market segmentation
solutions derived from a range of different clustering methods. It is also important to
understand how different algorithms impose structure on the extracted segments.
2.2.1 Hierarchical Methods:
Hierarchical clustering methods are the most intuitive way of grouping data because they
mimic how a human would approach the task of dividing a set of n observations (consumers)
into k groups (segments). If the aim is to have one large market segment (k = 1), the only
possible solution is one big market segment containing all consumers in data X. At the other
extreme, if the aim is to have as many market segments as there are consumers in the data
set (k = n), the number of market segments has to be n, with each segment containing
exactly one consumer. Each consumer represents their own cluster. Market segmentation
analysis occurs between those two extremes. Divisive hierarchical clustering methods start with the complete data set X and split it into two market segments in a first step. Then, each
of the segments is again split into two segments. This process continues until each
consumer has their own market segment. Agglomerative hierarchical clustering approaches
the task from the other end. The starting point is each consumer representing their own
market segment (n singleton clusters). Step-by-step, the two market segments closest to
one another are merged until the complete data set forms one large market segment.
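A minimal sketch of the agglomerative variant using scipy is given below; the data, distance measure, and linkage method are illustrative assumptions.

```python
# a sketch of agglomerative hierarchical clustering; Ward linkage and k = 3
# are illustrative choices
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(1234)
X = rng.normal(size=(50, 4))                       # stand-in consumer data

Z = linkage(X, method="ward")                      # merge segments step by step
segments = fcluster(Z, t=3, criterion="maxclust")  # cut the hierarchy into 3 segments
print(segments)
```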
2.2.2 Partitioning Methods:
Hierarchical clustering methods are particularly well suited for the analysis of small data sets
with up to a few hundred observations. For larger data sets, dendrograms are hard to read,
and the matrix of pairwise distances usually does not fit into computer memory. For data sets
containing more than 1000 observations (consumers), clustering methods creating a single
partition are more suitable than a nested sequence of partitions. This means that – instead of computing all distances between all pairs of observations in the data set at the beginning of a hierarchical cluster analysis using a standard implementation – only distances between each consumer in the data set and the centre of the segments are
computed. For a data set including information about 1000 consumers, for example, the
agglomerative hierarchical clustering algorithm would have to calculate (1000×999)/2 =
499,500 distances for the pairwise distance matrix between all consumers in the data set.
A partitioning clustering algorithm aiming to extract five market segments, in contrast, would
only have to calculate between 5 and 5000 distances at each step of the iterative or
stepwise process (the exact number depends on the algorithm used). In addition, if only a
few segments are extracted, it is better to optimise specifically for that goal, rather than
building the complete dendrogram and then heuristically cutting it into segments.
Following are the partitioning methods:
● Self-Organising Maps
● Neural Networks
The strength of partitioning clustering algorithms is that they have minimal memory
requirements during calculation, and are therefore suitable for segmenting large data
sets. The disadvantage of partitioning clustering algorithms is that the number of
market segments to be extracted needs to be specified in advance. Partitioning
algorithms also do not enable the data analyst to track changes in segment
membership across segmentation solutions with different numbers of segments
because these segmentation solutions are not necessarily nested.
Following two are the hybrid approaches:
● Two-Step Clustering
● Bagged Clustering
Model-based methods can be seen as selecting a general structure, and then fine-tuning the
structure based on the consumer data. The model-based methods used in this section are
called finite mixture models because the number of market segments is finite, and the overall
model is a mixture of segment-specific models.
The two properties of the finite mixture model can be written down in a more formal way.
Property 1 implies that the segment membership z of a consumer is determined by the multinomial distribution with segment sizes π:

z ∼ Mult(π)
The simplest case of model-based clustering has no independent variables x, and simply fits
a distribution to y. To compare this with distance-based methods, finite mixtures of
distributions basically use the same segmentation variables: a number of pieces of
information about consumers, such as the activities they engage in when on vacation. No
additional information about these consumers, such as total travel expenditures, is
simultaneously included in the model.
Normal Distribution
Binary Distribution
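For the normal case listed above, the sketch below fits a finite mixture of normal distributions with scikit-learn's GaussianMixture; the data and the number of segments are illustrative assumptions.

```python
# a minimal sketch of a finite mixture of normal distributions; the data and
# n_components are stand-ins
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1234)
X = rng.normal(size=(300, 4))                    # stand-in metric data

gm = GaussianMixture(n_components=3, random_state=1234).fit(X)
z = gm.predict(X)              # segment membership of each consumer
probs = gm.predict_proba(X)    # posterior segment membership probabilities
```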
2.3.3 Extensions and Variations:
Finite mixture models are more complicated than distance-based methods. The additional
complexity makes finite mixture models very flexible. It allows using any
statistical model to describe a market segment. As a consequence, finite mixture models can
accommodate a wide range of different data characteristics: for metric data we can use
mixtures of normal distributions, for binary data we can use mixtures of binary distributions.
For nominal variables, we can use mixtures of multinomial distributions or multinomial logit
models (see Sect. 9.4.2). For ordinal variables, several models can be used as the basis of
mixtures (Agresti 2013). Ordinal variables are tricky because they are susceptible to
containing response styles. To address this problem, we can use mixture models
disentangling response style effects from content-specific responses while extracting market
segments (Grün and Dolnicar 2016). In combination with conjoint analysis, mixture models make it possible to account for differences in preferences (Frühwirth-Schnatter et al. 2004). An ongoing
conversation in the segmentation literature (e.g. Wedel and Kamakura 2000) is whether
differences between consumers should be modelled using a continuous distribution or
through modelling distinct, well-separated market segments. An extension to mixture models
can reconcile these positions by acknowledging that distinct segments exist, while members
of the same segment can still display variation. This extension is referred to as mixture of
mixed-effects models or heterogeneity model (Verbeke and Lesaffre 1996). It is used in the
marketing and business context to model demand (Allenby et al. 1998).
Most algorithms focus only on extracting segments from data. These algorithms assume that
each of the segmentation variables makes a contribution to determining the segmentation
solution. But this is not always the case. Sometimes, segmentation variables were not
carefully selected, and contain redundant or noisy variables. Preprocessing methods can
identify them. For example, the filtering approach proposed by Steinley and Brusco (2008a)
assesses the clusterability of single variables, and only includes variables above a certain
threshold as segmentation variables. This approach outperforms a range of alternative
variable selection methods (Steinley and Brusco 2008b), but requires metric variables.
Variable selection for binary data is more challenging because single variables are not
informative for clustering, making it impossible to pre-screen or pre-filter variables one by
one.
2.4.1 Biclustering Algorithms:
Biclustering algorithms cluster consumers and variables simultaneously, and are therefore particularly useful when many binary segmentation variables are available.
2.4.2 Variable Selection Procedure for Clustering Binary Data (VSBD):
Brusco (2004) proposed a variable selection procedure for clustering binary data sets. His
VSBD method is based on the k-means algorithm as clustering method, and assumes that
not all variables available are relevant to obtain a good clustering solution. In particular, the
method assumes the presence of masking variables. They need to be identified and
removed from the set of segmentation variables. Removing irrelevant variables helps to
identify the correct segment structure, and eases interpretation. The procedure first identifies
the best small subset of variables to extract segments. Because the procedure is based on
the k-means algorithm, the performance criterion used to assess a specific subset of
variables is the within-cluster sum-of-squares (the sum of squared Euclidean distances
between each observation and their segment representative). This is the criterion minimised
by the k-means algorithm. After having identified this subset, the procedure adds additional
variables one by one. The variable added is the one leading to the smallest increase in the
within-cluster sum-of-squares criterion. The procedure stops when the increase in
within-cluster sum-of-squares reaches a threshold.
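A simplified sketch of the VSBD idea is given below: it searches for a good small starting subset, then greedily adds the variable with the smallest increase in within-cluster sum-of-squares until a threshold is exceeded. The function names, the subset size, and the stopping threshold are assumptions for illustration; Brusco's original procedure includes further details not shown here.

```python
# a simplified, illustrative sketch of the VSBD procedure (after Brusco 2004);
# start_size and delta are hypothetical tuning values
import itertools
import numpy as np
from sklearn.cluster import KMeans

def wcss(X, k):
    # within-cluster sum-of-squares, the criterion minimised by k-means
    return KMeans(n_clusters=k, n_init=10, random_state=1234).fit(X).inertia_

def vsbd(X, k=3, start_size=2, delta=0.5):
    n_vars = X.shape[1]
    # step 1: exhaustive search for the best small starting subset
    best = min(itertools.combinations(range(n_vars), start_size),
               key=lambda s: wcss(X[:, list(s)], k))
    selected, current = list(best), wcss(X[:, list(best)], k)
    remaining = [v for v in range(n_vars) if v not in selected]
    # step 2: add, one by one, the variable with the smallest WCSS increase
    while remaining:
        cand = min(remaining, key=lambda v: wcss(X[:, selected + [v]], k))
        new = wcss(X[:, selected + [cand]], k)
        if new - current > delta:     # stop once the increase exceeds the threshold
            break
        selected.append(cand)
        remaining.remove(cand)
        current = new
    return selected
```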
2.4.3 Variable Reduction: Factor-Cluster Analysis:
Factor-cluster analysis first compresses the segmentation variables into a small number of factors, and then extracts segments using the factor scores instead of the original variables.
Data structure analysis provides valuable insights into the properties of the data. These
insights guide subsequent methodological decisions. Most importantly, stability-based data
structure analysis provides an indication of whether natural, distinct, and well-separated
market segments exist in the data or not. If they do, they can be revealed easily. If they do
not, users and data analysts need to explore a large
number of alternative solutions to identify the most useful segments for the organisation. If
there is structure in the data, be it cluster structure or structure of a different kind, data
structure analysis can also help to choose a suitable number of segments to extract.
2.5.1 Cluster Indices:
Because market segmentation analysis is exploratory, data analysts need guidance to make
some of the most critical decisions, such as selecting the number of market segments to
extract. So-called cluster indices represent the most common approach to obtaining such
guidance. Cluster indices provide insight into particular aspects of the market segmentation
solution. Which kind of insight depends on the nature of the cluster index used.
Generally, two groups of cluster indices are distinguished: internal cluster indices and
external cluster indices.
A simple method to assess how well segments are separated is to look at the distances of each consumer to all segment representatives. Let d_ih be the distance between consumer i and segment representative (centroid, cluster centre) h. Then

s_ih = exp(−γ d_ih) / Σ_l exp(−γ d_il), l = 1, ..., k

can be interpreted as the similarity of consumer i to the representative of segment h, with hyperparameter γ controlling how differences in distance translate into differences in similarity. These similarities are between 0 and 1, and sum to 1 for each consumer i over all segment representatives h, h = 1, ..., k.
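The computation described above can be written in a few lines of numpy; the distance matrix and γ value below are made-up illustrations.

```python
# a sketch of turning distances to segment representatives into similarities;
# s_ih = exp(-gamma * d_ih) / sum_l exp(-gamma * d_il)
import numpy as np

def similarities(d, gamma=1.0):
    e = np.exp(-gamma * d)
    return e / e.sum(axis=1, keepdims=True)   # rows sum to 1

# distances of two consumers to k = 3 segment representatives (illustrative)
d = np.array([[0.2, 1.5, 2.0],
              [1.0, 1.1, 0.9]])
print(similarities(d, gamma=2.0).round(3))
```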
2.5.3 Global Stability Analysis:
An alternative approach to data structure analysis that can be used for both distance
and model-based segment extraction techniques is based on resampling methods.
Resampling methods offer insight into the stability of a market segmentation solution
across repeated calculations. To assess the global stability of any given
segmentation
solution, several new data sets are generated using resampling methods, and a number of
segmentation solutions are extracted.
To understand the value of resampling methods for market segmentation analysis, it is
critical to accept that consumer data rarely contain distinct, well-separated market segments
like those in the artificial mobile phone data set. In the worst case, consumer data can be
totally unstructured. Unfortunately, the structure of any given empirical data set is not known
in advance.
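One possible implementation of this idea is sketched below: bootstrap samples are drawn, a segmentation is extracted from each, and agreement between the induced partitions of the full data set is measured with the adjusted Rand index. The choice of k-means, the number of replicates, and the agreement index are assumptions for illustration.

```python
# a hedged sketch of resampling-based global stability analysis
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import adjusted_rand_score

def bootstrap_stability(X, k, n_boot=20, seed=1234):
    rng = np.random.default_rng(seed)
    labelings = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(X), size=len(X))          # bootstrap sample
        km = KMeans(n_clusters=k, n_init=10).fit(X[idx])
        labelings.append(km.predict(X))                     # partition of the full data
    scores = [adjusted_rand_score(a, b)                     # pairwise agreement
              for i, a in enumerate(labelings) for b in labelings[i + 1:]]
    return float(np.mean(scores))                           # high = stable solution
```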
2.5.4 Segment Level Stability Analysis:
Choosing the globally best segmentation solution does not necessarily mean that this
particular segmentation solution contains the single best market segment. Relying on global
stability analysis could lead to selecting a segmentation solution with suitable global stability,
but without a single highly stable segment. It is advisable, therefore, to assess not only the global stability of alternative market segmentation solutions, but also the segment level stability of the market segments contained in those solutions, to protect solutions containing interesting individual segments from being prematurely discarded. After all, most organisations only need one single target segment.
EXTRACTING SEGMENTS
Data-driven market segmentation analysis requires the use of different clustering methods to extract market segments from unstructured consumer data. The choice of algorithm affects the segmentation solution because different algorithms impose different structures on the extracted segments. The chapter describes the advantages and disadvantages of the different extraction methods used in market segmentation.
The study concludes that no single best algorithm exists, and a good final solution is
achieved by investigating and comparing alternative segmentation solutions based
on data characteristics and expected or desired segment characteristics. This
chapter describes different methods for extracting market segments based on
similarity or distance between consumers.
DISTANCE-BASED METHODS
Distance-based methods aim to find groups of similar consumers based on a
particular notion of similarity or distance, while model-based methods formulate a
stochastic model for the market segments. The choice of extraction method depends
on the data characteristics and expected segment characteristics, such as the size of
the data set, the scale level of the segmentation variables, and the presence of
special structures in the data. The characteristics that consumers should have in
common to be placed in the same segment, and how they should differ from
consumers in other segments, have been specified in an earlier step and need to be
recalled when selecting the extraction method.
Observable characteristics, such as benefits sought, can be directly extracted from
the data, while indirect characteristics, such as price sensitivity, require the use of a
model-based method. In the case of binary segmentation variables, whether to treat
them symmetrically or asymmetrically depends on the desired characteristics of the
segments. Comparing and investigating alternative segmentation solutions is critical
to arriving at a good final solution.
Distance measures calculate the distance between the vectors of segmentation variables, and this distance acts as the basis of segmentation. There are many ways to calculate the distance, e.g. Manhattan distance, Euclidean distance, etc. Hierarchical clustering methods are the most intuitive way of grouping data because they mimic how a human would approach the task of dividing a set of n observations (consumers) into k groups (segments). Its two main types are divisive and agglomerative clustering. Underlying both divisive and agglomerative clustering is a measure of distance between groups of observations (segments). This measure is determined by specifying (1) a distance measure d(x, y) between observations (consumers) x and y, and (2) a linkage method. There is no correct combination of distance and linkage method; clustering in general, and hierarchical clustering in particular, are exploratory techniques. A very popular alternative hierarchical clustering method is named after Ward (1963), and is based on squared Euclidean distances. The hierarchical structure is typically represented by a dendrogram.
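The two building blocks named above, a distance measure and a linkage method, map directly onto scipy functions, as the sketch below shows; the data are simulated and the choices of metric and linkage are illustrative.

```python
# a sketch of combining a distance measure with a linkage method
import numpy as np
from scipy.spatial.distance import pdist
from scipy.cluster.hierarchy import linkage, dendrogram
import matplotlib.pyplot as plt

rng = np.random.default_rng(7)
X = rng.normal(size=(20, 3))                 # stand-in consumer data

d_euclidean = pdist(X, metric="euclidean")   # one choice of distance measure
d_manhattan = pdist(X, metric="cityblock")   # Manhattan distance as an alternative

dendrogram(linkage(d_euclidean, method="ward"))  # Ward linkage, drawn as a dendrogram
plt.show()
```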
K-MEANS
K-means clustering is an unsupervised learning algorithm that is used to solve clustering problems in machine learning and data science. The k-means clustering algorithm, how it works, and a Python implementation of k-means clustering were shown. The algorithm will always converge: the stepwise process used in a partitioning clustering algorithm will always lead to a solution. Reaching the solution may take longer for large data sets and large numbers of market segments, however. The starting point of the process is random: random initial segment representatives are chosen at the beginning of the process. Different random initial representatives (centroids) will inevitably lead to different market segmentation solutions. Keeping this in mind is critical to conducting high quality market segmentation analysis because it serves as a reminder that running one single calculation with one single algorithm leads to nothing more than one out of many possible segmentation solutions.
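A minimal scikit-learn sketch of this process is shown below; the data are simulated, and the number of segments is an illustrative choice. Changing random_state changes the random initial representatives and can therefore change the resulting solution.

```python
# a minimal k-means sketch; the data and n_clusters are stand-ins
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1234)
X = rng.normal(size=(500, 5))                # stand-in consumer data

km = KMeans(n_clusters=4, n_init=10, random_state=1234).fit(X)
print(km.labels_[:10])                       # segment membership of ten consumers
# a different random_state may lead to a different segmentation solution
```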
An additional difference from hierarchical methods is that, instead of computing all distances between all pairs of observations in the data set at the beginning of the analysis (as a standard implementation of hierarchical cluster analysis does), only the distances between each consumer in the data set and the centres of the segments are computed.
The key to a high quality segmentation analysis is systematic repetition, enabling the data analyst to weed out less useful solutions, and to present to the users of the segmentation solution (managers of the organisation wanting to adopt target marketing) the best available market segment or set of market segments. In addition, the algorithm requires the specification of the number of segments. The challenge of determining the optimal number of market segments is as old as the endeavour of grouping people into segments itself. A number of indices have been proposed to assist the data analyst.
In any case, partitioning clustering does require the data analyst to specify the number of market segments to be extracted in advance. Within the machine learning community, the term unsupervised learning is used to refer to clustering because groups of consumers are created without using an external variable. In fact, the choice of the distance measure typically has a bigger impact on the nature of the resulting market segmentation solution than the choice of algorithm.
Specifying the number of clusters is difficult because, typically, consumer data does not contain distinct, well-separated, naturally existing market segments. A popular approach is to repeat the clustering procedure for different numbers of market segments, and then compare, across those solutions, the sum of distances of all observations to their representative.
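This popular approach can be sketched as a short loop; the range of k and the data are illustrative assumptions, and the resulting values are typically inspected for an "elbow".

```python
# a sketch of repeating k-means for several numbers of segments and comparing
# the total within-cluster distances (inertia) across solutions
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1234)
X = rng.normal(size=(500, 5))                # stand-in consumer data

for k in range(2, 9):
    inertia = KMeans(n_clusters=k, n_init=10, random_state=1234).fit(X).inertia_
    print(k, round(inertia, 1))              # look for an elbow in these values
```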
INITIALISATION IN K-MEANS
The simplest improvement is to initialise k-means using smart starting values, rather
than randomly drawing k consumers from the data set and using them as starting
points. Using randomly drawn consumers is suboptimal because it may result in
some of those randomly drawn consumers being located very close to one another,
and thus not being representative of the data space. Using starting points that are not
representative of the data space increases the likelihood of the k-means algorithm
getting stuck in what is referred to as a local optimum. A local optimum is a good
solution, but not the best possible solution. One way of avoiding the problem of the
algorithm getting stuck in a local optimum is to initialise it using starting points evenly
spread across the entire data space. Such starting points better represent the entire
data set. Steinley and Brusco compare 12 different strategies proposed to initialise
the k-means algorithm. Based on an extensive simulation study using artificial data
sets of known structure, Steinley and Brusco conclude that the best approach is to
randomly draw many starting points, and select the best set. The best starting points
are those that best represent the data.
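The effect of initialisation can be illustrated by comparing a single random start with the best of many random starts; the data and settings below are stand-ins.

```python
# a sketch comparing one random start with the best of many random starts
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 5))                # stand-in consumer data

single = KMeans(n_clusters=4, init="random", n_init=1, random_state=7).fit(X)
many = KMeans(n_clusters=4, init="random", n_init=50, random_state=7).fit(X)
print(single.inertia_, many.inertia_)        # the best of many starts is never worse
```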
k-means uses all consumers in the data set at each iteration of the analysis to determine the new segment representatives. Hard competitive learning randomly picks one consumer and moves this consumer's closest segment representative a small step in the direction of the randomly chosen consumer. As a consequence of this procedural difference, different segmentation solutions can emerge, even if the same starting points are used to initialise the algorithm. It is also possible that hard competitive learning finds the globally optimal market segmentation solution, while k-means gets stuck in a local optimum. An application of hard competitive learning in market segmentation analysis can be found in Boztug and Reutterer, where the procedure is used for segment-specific market basket analysis. A variation of hard competitive learning is the neural gas algorithm. Here, not only the segment representative closest to the randomly selected consumer is moved towards it; the location of the second closest segment representative is also adjusted towards the randomly selected consumer.
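A bare-bones numpy sketch of hard competitive learning is given below; the learning rate schedule and iteration count are illustrative assumptions, and the neural gas variation (also moving the second closest representative) is omitted for brevity.

```python
# an illustrative sketch of hard competitive learning: one random consumer at
# a time pulls its closest segment representative towards itself
import numpy as np

def hard_competitive_learning(X, k, n_iter=10000, lr=0.1, seed=1234):
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)].astype(float)
    for t in range(n_iter):
        x = X[rng.integers(len(X))]                        # randomly picked consumer
        closest = np.argmin(((centroids - x) ** 2).sum(axis=1))
        step = lr * (1 - t / n_iter)                       # decaying step size
        centroids[closest] += step * (x - centroids[closest])
    return centroids
```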
MODEL-BASED METHODS
Distance-based methods have a long history of being used in market segmentation analysis. More recently, model-based methods have been proposed as an alternative. According to Wedel and Kamakura, the pioneers of model-based methods in market segmentation analysis, mixture methodologies have attracted great interest from applied marketing researchers and consultants. Wedel and Kamakura predict that, in terms of impact on academics and practitioners, next to conjoint analysis, mixture models will prove to be the most influential methodological development spawned by marketing problems to date. Having model-based methods available is particularly useful because these methods extract market segments in a very different way, thus genuinely offering an alternative extraction technique. As opposed to distance-based clustering methods, model-based segment extraction methods do not use similarities or distances to assess which consumers should be assigned to the same market segment. Model-based methods use the empirical data to find those values for segment sizes and segment-specific characteristics that best reflect the data. Model-based methods can be seen as selecting a general structure, and then fine-tuning the structure based on the consumer data. The model-based methods used are called finite mixture models because the number of market segments is finite, and the overall model is a mixture of segment-specific models.
The two properties of the finite mixture model can be written down in a more formal way. Property 2 states that members of each market segment have segment-specific characteristics: the segmentation variables y of a consumer in segment h follow the segment-specific model with parameters θ_h. Maximum likelihood estimation aims at determining the parameter values for which the observed data are most likely to occur. Iterative methods, such as the EM algorithm, are required. This approach regards the segment memberships z as missing data, and exploits the fact that the likelihood of the complete data is easier to maximise. If a Bayesian approach is pursued, mixture models are usually fitted using Markov chain Monte Carlo methods. The true number of segments is rarely known. A standard strategy to select a good number of market segments is to fit finite mixture models with varying numbers of segments and compare them. Selecting the correct number of segments is as problematic in model-based methods as selecting the correct number of clusters is when using partitioning methods.
INFORMATION CRITERIA
In the framework of maximum likelihood estimation, so-called information criteria are typically used to guide the data analyst in their choice of the number of market segments:

AIC = 2 df − 2 log(L)
BIC = log(n) df − 2 log(L)
ICL = log(n) df − 2 log(L) + 2 ent

where df is the number of all parameters of the model, log(L) is the maximised log-likelihood, n is the number of observations, and ent is the mean entropy of the posterior segment membership probabilities. Mean entropy decreases if the assignment of observations to segments is clear. The entropy is lowest if a consumer has a 100% probability of being assigned to a certain segment. The entropy is highest if a consumer has the same probability of being a member of each market segment. All criteria decrease if fewer parameters are used or the likelihood increases; in contrast, more parameters or smaller likelihoods will increase them. Because log(n) is larger than 2 for n larger than 7, BIC penalises additional parameters more strongly than AIC, and prefers smaller models in case different model sizes are recommended. ICL, in turn, takes the separateness of segments into account through the entropy term.
At first glance, finite mixture models may appear unnecessarily complicated. One possible extension of the presented finite mixture model includes a model where the segment-specific models differ not only in the segment characteristics, but also in the general structure. There is an extensive literature available on finite mixture models, including several research monographs.
Finite Mixtures of Distributions
The simplest case of model-based clustering has no independent variables x, and simply fits a distribution to y. The finite mixture model then reduces to Σ_h π_h f(y | θ_h), with π_h ≥ 0 and Σ_h π_h = 1, h = 1, ..., k; the formulae are otherwise the same as in the general mixture model. All these variables have an approximate univariate normal distribution individually, but are not independent of each other.
Also, the uncertainty plot is a useful visualisation alerting the data analyst to solutions
that do not induce clear partitions, and pointing to market segments being artificially
created, rather than reflecting the existence of natural market segments in the data.