
Customer Analytics
















2.1.1 R AND RSTUDIO

4.5 K-MEANS

5.1 BOOKS

5.2 WEBSITES


As seen in other manuals, Big Data refers to all the data available to companies
regarding their customers and their processes, as well as the use of those data to
address customers in a personalised manner and get better results.

This manual will focus on what is known as Customer Analytics. In other words, it will
focus on the study of customer data in order to help us to design customised actions,
plans and strategies to satisfy and retain our customers.

As seen in other manuals about business strategy, customers are an essential part of
companies, since companies could hardly exist without them. That is why our
customers should not be ignored: they should be offered products and promotions
adapted to their needs and expectations because, otherwise, they will buy from our
competitors.

This manual will discuss the different customer data available to companies and the
origin of these data, as well as the importance of using them in a way that helps us to
obtain greater benefits and to make our customers happy.

We will deal with how to analyse these data, as well as the importance of
segmentation according to certain variables that allow us to personalise actions.

Personalising our actions is fundamental, especially in the current competitive
business environment. If customers are not our priority and we do not try to make
them feel special and important, another company will do it for us.

In short, this manual will focus on three fundamental aspects:

- Customer analytics

- Customer segmentation

- Customer value management


These three topics cover the origin of our customer data, the tools that can be used to
analyse these data, the segmentation of our customers based on variables and the
management of customer value.

In addition, the annexes include practical examples for a broader understanding of
customer analytics.



As already mentioned in the prologue, customers are a fundamental part of any company
or organisation, so it is very important to always consider them when launching new
campaigns or products. The survival of our company depends to a large extent on their
satisfaction and loyalty.

That is why customer analytics should never be ignored since thanks to this analysis we
know what our customers are like and what they need and expect from us, and this
will help us to adapt to them and offer personalised and exclusive actions and


Before delving into customer analytics, we should first consider briefly the amount of
data available to companies today.

Some years ago, just before the rise of the internet and social media, companies only
had the customer data they had collected themselves. Thus, a company could create a
database with information about customers, such as their employment or address, as
well as information related to their purchases, such as frequency, amount spent or
satisfaction, among other aspects. These data were collected when customers
requested information, purchased a product or service, or by means of

In addition to these data, companies should now consider all the information coming
from the internet and social media, since users increasingly use these networks to
report or comment on companies and their products. The information that can be
extracted from these sources is really important. In addition to what users publish and
share with other people in the network about us or our products or services, other
variables should be taken into account, such as user activity, their friends or followers
and their power of influence.

Companies have a large amount of information about their customers that may or may
not be useful. Depending on how this information is analysed and used, it may or may
not help companies to design strategies suited to the profiles of their customers, to
acquire new customers and to retain current ones.

This new reality born through the democratisation of the internet is forcing companies
to completely change their marketing strategies. As just mentioned, the way this new
information is handled will influence our work positively or negatively.

Companies need to be aware of the importance of all this information; they should not
ignore it and should analyse it and use it to their benefit.

Thus, customer analytics can be defined as the process in which customer behaviour
data are used by companies to help them make key decisions through market
segmentation and predictive analysis. An important requirement of this process is that
the collected indicators come from customer actions or attitudes. This means that the
analysed data should come from a source that has recorded customer behaviour or
data related to it.

These data or this information about our customers can come from different sources:

- CRM or commercial and marketing databases.

- KPIs or metrics collected through surveys.

- Purchase receipts, product return rates, complaints or claims and level of


- Direct observation of customers.

- Web analytics.

- Measurement of in-shop customer behaviour.

- Information from communication campaigns.

- Etc.

As seen, companies have many sources to gather information and data about their
customers. However, these data are spread across different departments, which makes
them harder to bring together. Collecting, analysing and using them in a coordinated
way is therefore very important.

This analysis will allow companies to:

- Integrate and analyse the information collected from all these sources.

- Establish the behavioural patterns of customers.

- Develop the customer journey.

- Identify problems in the customer's experience with the product or service
to improve it.

- Segment customers based on variables such as their buying behaviours or


In short, all these data and information allow the company to describe, explain and
predict the behaviour of its customers.

In addition, the main benefit of customer analytics is being able to make decisions
based on data, which leads to specific results such as:


- Conducting campaigns directed exclusively to a particular segment of customers,
which reduces costs.

- Customising the offers and promotions of products and services in order to retain
customers.

- Preventing customer abandonment and winning back dissatisfied customers.

- Increasing customer loyalty and satisfaction.

It is important to address the two types of data that should be considered in our
analysis. Structured and unstructured data should be differentiated.

▪ Structured data

Structured data are the information usually found in most company databases. They
are typically stored as text displayed in rows and columns with titles.

These data can be ordered and easily processed by data mining tools.

Structured data have a predefined structure or form in which fields are fixed and, as
already said, they are stored in tables.

This data type may come from different sources such as:

- Created data: data that businesses purposely create, generally for market
research. This may consist of customer surveys or focus groups. It also
includes more modern methods of research, such as creating a loyalty
programme that collects consumer information or asking users to create an
account and log in while they are shopping online.


- Provoked data: data created indirectly through prior action, such as
reviews of the company, a product or service.

- Transactional data: businesses collect data on every transaction
completed, whether the purchase is completed through an online shopping
cart or in-store at the cash register. Businesses also collect data on the
steps that lead to a purchase online.

- Compiled data: summaries of data of companies or public services of a
group nature such as the electoral roll or the number of registered cars.

- Experimental data: created when businesses experiment with different
marketing pieces and messages to see which are most effective with
consumers.

▪ Unstructured data

Unstructured data do not have a specific format, so they cannot be stored in a table:
their information cannot be broken down into basic data types. Examples of these data
are multimedia files, PDF files and social media content.

The main sources of unstructured data are:

- Captured data: created passively based on user behaviour, such as the
biometric information from activity wristbands, activity tracking
applications or GPS position.

- User-generated data: consists of all the data individuals put on
the internet every day. From social media posts to videos uploaded to
YouTube, individuals are creating a huge amount of data that businesses
can use to better target consumers and get feedback on products.


These data cannot be organised in a predefined structure. However, they are the most
relevant data for companies, since they provide information on the behaviour,
preferences and satisfaction of users and customers. This information is very useful
when designing marketing strategies.

In addition to structured and unstructured data, it is interesting to mention
semi-structured or hybrid data. These data are not limited to fixed fields, although
they have markers that separate the different elements. This type of information is not
regular enough to be managed in a standard way. It can be found in XML or JSON files.
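
The difference between semi-structured and structured data can be illustrated with a
short sketch. It assumes a small, made-up JSON document (the field names "id",
"name", "email" and "purchases" are hypothetical, not taken from any real system) and
flattens it into rows with fixed columns, which is the form a table expects.

```python
import json

# Hypothetical semi-structured customer records: the keys act as markers,
# but not every record carries the same fields.
raw = '''
[
  {"id": 1, "name": "Ana", "purchases": [{"sku": "A1", "amount": 20.0}]},
  {"id": 2, "name": "Ben", "email": "ben@example.com", "purchases": []}
]
'''

def to_rows(records):
    """Flatten records into fixed-column rows (one row per purchase)."""
    rows = []
    for rec in records:
        for purchase in rec.get("purchases", []):
            rows.append({
                "customer_id": rec["id"],
                "name": rec["name"],
                "email": rec.get("email"),  # a missing field becomes None
                "sku": purchase["sku"],
                "amount": purchase["amount"],
            })
    return rows

rows = to_rows(json.loads(raw))
for row in rows:
    print(row)
```

Note that a customer without purchases produces no row in this sketch; whether such
customers should be kept is a design decision that depends on the analysis.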

Given these previous considerations, customer analytics will now be discussed in
detail. We will deal with its origins, definition and characteristics, its phases, the
technologies it employs, its relationship with big data, the customer journey model
and the implementation of a customer analytics project.

1.1.1 Customer analytics

Before discussing the definition and characteristics of customer analytics in detail, it is
important to highlight that companies, and particularly their marketing departments,
have information about their customers and users thanks to social networks.

It is increasingly common to find social media users who share their purchases, their
meals in a restaurant or their treatments in a beauty salon with their followers.
Although these interactions may seem very simple, they can provide a company with
very valuable information about customers and users. As is known, it is essential to
collect these data, analyse them and use them adequately to obtain benefits. These
are unstructured data, so extra work is needed to make sense of them.


However, to make sense of these data, it is important to understand the 3 Vs of Big
Data that were already studied in other manuals. These are:

▪ Volume: a really large volume of data and information is available, so
controlling collection and segmentation can be difficult.

▪ Variety: as already mentioned, two varieties of data should be considered:
structured and unstructured data.

▪ Velocity: data are generated at an ever-increasing speed and should be
processed as close to real time as possible in order to extract their value.

In short, companies have a huge volume of data that increases very rapidly every day
and data can be structured or unstructured.

These three Vs of Big Data force the company, to a certain extent, to carry out a data
quality process since this will allow us to reduce the risk that the data are not accurate
and to make sure that the information used in our analysis is not wrong.

It is also very important to have tools to meet the objectives set that can be used by
different people involved in data analysis. As just mentioned, companies have a large
volume of data that is spread out among all their departments so these tools must be
accessible to both business users, the managers of the marketing department or the

This helps us to monitor what is being done in order to understand what happened,
what is happening at this very moment and what is going to happen.


Customer analytics origin

Given these considerations regarding data, it is important to briefly discuss the origin
of customer analytics, as this will help us to better understand the importance of this

Customer analytics focuses on the interest that companies have in their customers in
order to understand them and adapt their strategies to them.

This is not new: the first company applications aimed at automating the activities
associated with sales appeared in the early 1990s. Little by little, these customer
information management applications multiplied, giving specific solutions to the
specific needs of customers. This gave rise to independent information systems for call
centres, customer service, help desks and customer and service support departments.

However, these systems were a significant barrier to the development of a single
customer strategy, since each area of the company held only a part of the customer
information and therefore had an incomplete view of the customer. As a result, the
strategies presented to customers were not entirely effective, since only one part of
the information about each customer was considered.

At the end of the 1990s, the first integrated solutions were created, which led to the
development of a comprehensive customer management strategy now known as CRM.

This strategy refers to two different concepts. On the one hand, it refers to a
management model of the entire organisation based on customer orientation, while,
on the other hand, it refers to the information system that supports customer
relationship management, sales force and marketing. By implementing a CRM,
companies have a broader view of the interactions with their customers, and they can
respond more efficiently.


Customer relationships

Basically, a positive relationship with customers should be built to benefit both parties:
on the one hand, customers get what they want from the company and, on the other
hand, the company acquires or retains a customer.

Companies need a project that allows them to predict how and when they will satisfy
the needs of their customers. For this, the company must know the preferences,
behaviours and attitudes of its customers, as this is the only way to achieve positive
results. It is crucial that all departments have this information: no department or area
should know only a part of the information related to customers. Having all the
information will help them to carry out complete actions oriented at these customers
and avoid generalised actions that do not take into account their preferences,
behaviours and attitudes. It is therefore very important to consider the data of all the
departments of the company, not only the data collected in one of them.

The benefits of customer analytics are:

- Acquiring customer value.

- Increasing lifetime value.

- Retaining customers who are at risk of abandonment.

- Increasing customer loyalty, as well as the recommendations and opinions
they generate.


For this, it is essential to have a 360-degree view of customers, which requires
considering four data types:

▪ Descriptive data: attributes, characteristics, socio-demographic data, etc.

▪ Behavioural data: transactions, order details, payment history, etc.

▪ Interaction data: interaction with communication channels such as email,
webpage, navigation information, apps, social media and traditional
communication channels.

▪ Attitudinal data: opinions, preferences and needs or desires that can be
found in opinions in social media or in survey answers.

Processing of data

Before moving on to discuss the main features of customer analytics, as well as its
process and the technologies that are used to carry out the analysis, it is important to
introduce the processing of the data.

As seen, companies have a very large volume of data regarding their customers so
their processing and analysis is key to finding the patterns and associations among
them in order to predict the potential behaviour of customers as best as possible.

When dealing with these data to achieve the best results, it should be noted that the
analysts who perform these tasks do not use a single methodology, but rather
different methodologies to group together customers with similar attributes, optimise
marketing efforts, handle service issues and assess brand awareness. They use the
following methods or techniques:

▪ Classification algorithms: it is very important to identify the attributes that
make something happen. With these algorithms we can create alerts that
warn that something may happen, such as the risk of abandonment of a
customer, so that actions can be undertaken to retain that customer and
prevent them from going to our competitors.

▪ Association algorithms: these algorithms are used to find associations,
relationships or sequences in our data. This technique reveals, for
example, which products are sold together, or which products other
customers bought or were interested in together with a specific product.
This allows us to create promotions with product packs or create a
recommended product space.

▪ Segmentation algorithms: they allow us to create different groups of
customers based on different metrics, such as the average receipt or the
frequency of purchase. This allows us to design different strategies or
promotions for each group. Thus, we will not create generalised strategies
or promotions, but instead we will focus on the different types of
customers we have in order to adapt to them in the best possible way. It is
very important to group our data first, find out what patterns they follow
and then design the strategy.
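
As an illustration of the association idea described above, the following minimal
sketch counts how often pairs of products appear in the same purchase. The
transactions are invented for the example, and the two measures computed (support,
the share of purchases containing the pair, and confidence, the share of purchases
containing the first product that also contain the second) are standard measures
assumed here; the manual does not prescribe a specific algorithm.

```python
from collections import Counter
from itertools import combinations

# Hypothetical transactions: each set holds the products of one purchase.
transactions = [
    {"bread", "butter", "jam"},
    {"bread", "butter"},
    {"bread", "milk"},
    {"butter", "jam"},
]

def pair_stats(baskets):
    """Return {(a, b): (support, confidence of a -> b)} for each product pair."""
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        item_counts.update(basket)
        # sorted() gives every pair a single canonical order
        pair_counts.update(combinations(sorted(basket), 2))
    total = len(baskets)
    return {
        (a, b): (count / total, count / item_counts[a])
        for (a, b), count in pair_counts.items()
    }

stats = pair_stats(transactions)
for pair, (support, confidence) in sorted(stats.items()):
    print(pair, f"support={support:.2f}", f"confidence={confidence:.2f}")
```

Pairs with high support and confidence are natural candidates for product packs or for
a recommended product space.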

Along with the processing of data, visualisation is very important. Data should not only
be collected and analysed, but this analysis should be understandable for all the
members of the organisation. Otherwise, actions and strategies will hardly be
successful. The visualisations of analysis should help us to understand, inform and
share information quickly and efficiently.

The following visualisation tools should be considered:


▪ Dashboards

A dashboard is a graphic representation of the main indicators (KPIs) involved in
achieving business objectives. It is aimed at making decisions to optimise the strategy
of the company.

Dashboards help us to quickly understand the complexity of the available data because
they allow us to create a picture of what is available and go into detail to analyse more
specific aspects.

Some of the characteristics or aspects that should be considered regarding dashboards
are:

- The number of indicators included should not be too high; only the
indicators that are really necessary should be included. This is very
important since, as already said, it is a visualisation tool that will help us to
quickly and easily understand the data available so we should avoid
including indicators that are not significant for what we want to analyse.

- These KPIs should also be actionable for the business, that is, they must
allow us to carry out strategic actions to improve.

- As mentioned, data and information must be shared throughout the
company, so their visualisation should be as clear and easy to interpret as
possible. Thus, a dashboard must be brief, speak a common language, offer
a graphic representation suited to the data it represents and be visual
enough for its study to be attractive.

- In addition to the clearly visualised indicators, the dashboard must be
accompanied by an analysis explaining what happened, the
recommendations to solve or improve the situation and the impact this
may have on the business.

Companies can use a wide range of tools to create a dashboard. Data are entered into
the tool and converted into graphic information that is easier to interpret.

The examples below show a dashboard made with Excel and another obtained through
the Google Analytics tool.

Other tools to create these panels are Cyfe, Chart.io, Klipfolio and Clicdata, among
others.

*Example of dashboard. Source: https://www.smartsheet.com/how-create-dashboard-excel


*Example of dashboard. Source: https://econsultancy.com/blog/62828-10-useful-google-


▪ Custom reports

As their name suggests, custom reports are reports created according to current
needs, containing information and data on the chosen topic. Thus, they only give the
specific information that our company wants.

Once we have all the information regarding our company, custom reports will limit this
information to the data that really concern us.

For example, we may have information related to the sales department for a specific
quarter or year. We have data such as the invoicing, the amount of the average
receipt, the number of daily, weekly and monthly purchases, the most and least sold
products, products that were sold together, the origin of the buyers, the payment
method, etc.

This is a huge amount of information, and which part of it is useful depends on what we
are interested in at that moment. So, if we want to make promotions for the purchase
of two products together, we need to know all the information relative to the products
that were sold together during the period of time analysed or the products that buyers
visited when they bought a specific product. For this, a custom report with that
information will be created, which allows us to assess whether the action we want to
carry out will have benefits.

The main benefits of this type of report are:

- The information distributed in several reports is unified in these reports
according to the metrics and parameters selected.

- They allow us to communicate and report this information in a much faster,
clearer and more accurate way, both to the members of the company and
to customers.

- In addition to selecting the type of information that we want to include,
they also allow us to establish parameters in relation to time.

Tools like those offered by Google Analytics can be used to create these reports.

In short, dashboards give us general information, while custom reports give us only the
information related to a topic.
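
The idea of a custom report, that is, restricting the available data to selected
metrics and a chosen period of time, can be sketched as follows. The sales records, the
field names and the function custom_report are hypothetical and only serve as an
illustration; a real report would normally be produced with a reporting tool.

```python
from datetime import date

# Hypothetical sales records; the field names are illustrative only.
sales = [
    {"day": date(2023, 1, 5), "product": "A", "amount": 30.0},
    {"day": date(2023, 1, 20), "product": "B", "amount": 50.0},
    {"day": date(2023, 2, 2), "product": "A", "amount": 20.0},
    {"day": date(2023, 4, 1), "product": "A", "amount": 99.0},
]

def custom_report(records, start, end):
    """Revenue, number of sales and average receipt per product in [start, end]."""
    report = {}
    for record in records:
        if start <= record["day"] <= end:
            entry = report.setdefault(record["product"], {"revenue": 0.0, "sales": 0})
            entry["revenue"] += record["amount"]
            entry["sales"] += 1
    for entry in report.values():
        entry["average_receipt"] = entry["revenue"] / entry["sales"]
    return report

# Report limited to the first quarter of the (made-up) year 2023.
q1 = custom_report(sales, date(2023, 1, 1), date(2023, 3, 31))
for product, figures in sorted(q1.items()):
    print(product, figures)
```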

Given this introduction, it can be said that customer analytics refers to the processes
used to capture, manage, analyse and generate strategic value from the customer data
of a company or organisation. In other words, it is about collecting all the information
and all the data related to our customers, visualising them and using them in a way
that provides value.


The features that differentiate customer analytics from other analytical systems are:

- Granularity of data: data must be considered at the individual level.

- Predictive: it not only describes what the data show, but also tries to predict
behaviour and the reasons behind it.

- Multi-platform: the different behaviours of customers should be combined
in the systems that gather their interactions and, as already mentioned, at
the individual level.

- Multi-sector/multi-application: customer analytics not only considers
consumers, but also analyses the internal audience of the organisation,
such as workers.

- Multi-disciplinary: as already mentioned, customer analytics should be
applied to all the departments and areas of the organisation to be
effective. This allows us to have a global vision and design global strategies.

- Behavioural study: although customer analytics includes descriptive
variables such as demographics, income and social status, unstructured
data also allow us to study and know the behaviour of customers, which is
essential to segment these customers and plan strategies and actions for
each group.

- Longitudinal: it allows us to study how behavioural patterns evolve.

According to the aforementioned characteristics, customer analytics refers to the
following five types of analysis:

▪ Descriptive analytics: as its name suggests, this first type of analysis
focuses on describing the interactions of customers, that is, on what
customers purchased, the purchase expense, the frequency, the number
of new customers, the number of customers who stopped buying, etc.

▪ Diagnostic analytics: focuses on understanding the reason why customers
act in a certain way.

▪ Predictive analytics: tries to forecast what might happen in specific


▪ Prescriptive analytics: identifies different groups or customer segments
and supports decisions about customer interactions based on scenarios.
Examples of prescriptive analytics include identifying customers to whom
retention or loyalty strategies can be applied.

▪ Preventive analytics: focuses on identifying the needs of customers even
before the customers themselves do, which allows us to anticipate them.

In addition to these five general types, there are many other analyses that can be
carried out in customer analytics depending on our objectives or strategies.

Below is a list of strategies along with some of the analyses that should be considered
for their implementation.

Customer segmentation:

- Homogeneous and heterogeneous clustering: the task of grouping a
set of objects in such a way that objects in the same group (called a
cluster) are more similar to each other than to those in other groups
(clusters). In other words, it is about grouping customers according to
criteria or variables such as address or job, among others.

- Demographic and psychographic segmentation: differentiating
customers on the basis of demographic and psychographic variables.
Demographic variables are age, gender, job, education, etc.
Psychographic variables are personality, lifestyle, interests, etc.

- Trends and patterns: involves the study of patterns and trends in
relation to customer behaviour. An example of a trend is buying only
during the Christmas period, during the sales season or when there
are discounts and promotions. This will allow us to segment customers
into regular customers and customers who only buy when they are
offered a discount or on special occasions.
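
The clustering task described in the list above can be sketched with k-means, one
common clustering technique (choosing k-means here is an assumption made for
illustration; the text above does not name a specific algorithm). The customers, the
two variables and the initial centroids are invented, and the initial centroids are
passed explicitly so that the result is reproducible.

```python
import math

def kmeans(points, centroids, iterations=10):
    """Plain k-means: assign each point to its nearest centroid, then move
    each centroid to the mean of the points assigned to it."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(iterations):
        labels = [
            min(range(len(centroids)), key=lambda k: math.dist(p, centroids[k]))
            for p in points
        ]
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                new_centroids.append(
                    tuple(sum(values) / len(members) for values in zip(*members))
                )
            else:
                new_centroids.append(old)  # an empty cluster keeps its centroid
        if new_centroids == centroids:
            break  # assignments are stable, so the algorithm has converged
        centroids = new_centroids
    return labels, centroids

# Hypothetical customers: (purchases per year, average receipt)
customers = [(2, 20), (3, 25), (2, 22), (30, 80), (28, 90), (32, 85)]
labels, centres = kmeans(customers, centroids=[(0, 0), (40, 100)])
print(labels)
print(centres)
```

In practice the variables would usually be rescaled first, since a variable with a larger
range would otherwise dominate the distance.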

Customer acquisition:

- Tendency to purchase: refers to the analysis of the willingness of
customers to buy or purchase products or services.

- Analysis of purchase factors: identifying and analysing factors or
elements that make someone buy or acquire a product or service.

- Affinity models: affinity analysis allows us to gather a large amount of
information and organise it based on natural affinities or relationships.

Benefits and customer loyalty:

- Tendency of improvement analysis: involves identifying the elements
that help us to retain our customers. This means that these customers
also obtain some benefit or improvement from the company.

- Cross sales and additional sales: involves knowing the products that are
sold together, as well as complementary or additional products that
customers consume or intend to consume.

- Channel preferences: based on studying our different purchasing
channels, as well as establishing the channels that our customers prefer.

- Shopping basket analysis: analysing the shopping cart of our customers
is an effective way to get to know them and their lifestyles since this is
the clearest indicator of what our customers consume. It is about seeing
what they buy and using it to design strategies.


- Optimisation: refers to the method used to establish the values of the
variables that take part in a process or system in order to obtain the
best possible result.

- Design and decision models: involves analysing strategies to achieve an
optimal design that gives us the best benefits.

- Price elasticity analysis: price elasticity is a measure used in economics
to show the responsiveness or elasticity of the quantity demanded of a
good or service to a change in its price. In other words, it deals with
how consumers react to price changes.

- RFV or RFM models: the abbreviation stands for Recency,
Frequency and (Monetary) Value. This model is used mostly in database
marketing. It creates segments based on each of the three variables of
the model. Combining the three segmentations gives the group that
best defines each of our customers.

- Sales and customer service improvement models: this type of analysis
allows us to reflect and pay attention to the aspects of our processes
that are not being taken care of or that are inadequately executed. It
gives us the opportunity to improve customer management and to get
to know the elements that can provide better benefits with better

- Social and sentiment analysis: this type of analysis tries to determine
consumer attitude towards a subject. This attitude can be their
evaluation, the affective or emotional state or the emotional
communicative intention. In other words, it is about studying what our
customers feel when they buy one of our products, when they need
assistance, etc.
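
The RFM model mentioned in the list above can be sketched as follows. The purchase
log, the reference date and the cut-off points used for scoring are all hypothetical;
in practice the cut-offs are often derived from the data (for example, from quantiles)
instead of being fixed by hand.

```python
from datetime import date

# Hypothetical purchase log: (customer, purchase date, amount)
purchases = [
    ("ana", date(2023, 12, 20), 120.0),
    ("ana", date(2023, 11, 2), 80.0),
    ("ben", date(2023, 3, 15), 40.0),
    ("cat", date(2023, 12, 28), 15.0),
    ("cat", date(2023, 12, 1), 10.0),
    ("cat", date(2023, 10, 9), 20.0),
]

def rfm_table(log, today):
    """Recency (days since last purchase), frequency and monetary value."""
    table = {}
    for customer, day, amount in log:
        row = table.setdefault(customer, {"last": day, "frequency": 0, "monetary": 0.0})
        row["last"] = max(row["last"], day)
        row["frequency"] += 1
        row["monetary"] += amount
    return {
        customer: {
            "recency": (today - row["last"]).days,
            "frequency": row["frequency"],
            "monetary": row["monetary"],
        }
        for customer, row in table.items()
    }

def score(value, thresholds, higher_is_better=True):
    """Score from 1 to 3 against two hypothetical cut-off points (low, high)."""
    low, high = thresholds
    points = 1 if value < low else 2 if value < high else 3
    return points if higher_is_better else 4 - points

def rfm_segments(table):
    """Combine the three scores into a segment code such as '333'."""
    return {
        customer: "".join(str(s) for s in (
            score(v["recency"], (30, 180), higher_is_better=False),
            score(v["frequency"], (2, 3)),
            score(v["monetary"], (50, 150)),
        ))
        for customer, v in table.items()
    }

table = rfm_table(purchases, today=date(2023, 12, 31))
segments = rfm_segments(table)
print(segments)
```

Each digit of the code refers to one variable, so a customer coded "331" in this sketch
bought recently and often but spent little, which suggests a different action than for a
customer coded "111".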


Customer retention:

- Customer churn analysis: churn refers to the customer attrition rate in a
company. This analysis helps us to identify the cause of the churn (when
a customer ceases his or her relationship with a company) and
implement effective strategies for retention. To determine the
percentage of customers that have churned, take all the customers
lost during a time frame, such as a month, and divide that number by the
total number of customers at the beginning of that month. Do not
include any new sales (new customers) from that month.

- Prediction: predicting what our customers want or need.

- What-if analysis: an improvement tool that allows us to evaluate in
advance the impact of our decisions, strategies or operations. It
establishes possible scenarios in order to analyse the impact of these
scenarios without implementing them, as this could be a risk for the


- Marketing mix modelling (MMM): a technique which
helps in quantifying the impact of several marketing inputs on sales or
market share. The purpose of using MMM is to understand how much
each marketing input contributes to sales, and how much to spend on
each marketing input. It analyses the 4 Ps of marketing: Product, Price,
Place, and Promotion.

- Attribution modelling: based on the actions that people carry out
before converting. That is, it focuses on the point through which the user
reached us, which can be an ad in the newspaper, a Facebook post, a
link on a website that talks about our sector, etc.
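
The churn calculation described in the list above (customers lost during a period
divided by the customers present at the start of that period, leaving out newly acquired
customers) can be written as a short function. The function name and the figures in the
example are hypothetical.

```python
def churn_rate(customers_at_start, customers_lost):
    """Share of the customers present at the start of the period who left.

    Customers acquired during the period are deliberately left out, as the
    definition above requires.
    """
    if customers_at_start <= 0:
        raise ValueError("customers_at_start must be positive")
    if not 0 <= customers_lost <= customers_at_start:
        raise ValueError("customers_lost must be between 0 and customers_at_start")
    return customers_lost / customers_at_start

# Made-up example: 500 customers on the first day of the month, 25 of them lost.
rate = churn_rate(customers_at_start=500, customers_lost=25)
print(f"Monthly churn: {rate:.1%}")
```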


In addition, the type of analysis used can be linked to the customer knowledge that is
intended to be created. The following categories or types of knowledge should be
considered:

- Behaviour: understanding customer patterns and behaviours.

- Profitability: understanding the monetary value of customers.

- Life cycle: understanding the relationship between customers and the
organisation in the long term.

- Loyalty: understanding customer loyalty to the company or


- Interest: understanding the probability of customers responding to


- Campaigns: understanding all the factors associated with a particular
event or campaign.

Before moving on to deal with the customer journey and the implementation of a
customer analytics project in our company, it is worth commenting briefly on customer
analytics and Big Data.

In the era of Big Data, given the amount of data and customer information available to
companies, it is very important that they consider the following:

- Good knowledge of Big Data:

o Be able to deal with multiple sources and data formats.

o Be able to work with large volumes of data.

o Be able to capture and extract data at high speed.

o Be able to focus on data quality and the analysis models.

- Identify the data that really impacts business:

o Be able to generate actionable information quickly.

o Be able to identify and focus on priority business opportunities.

o Be able to avoid paralysis in decision making due to excessive


- Execute this type of initiative in real time:

o Be able to transform the organisation to avoid informational


o Be able to acquire the right skills.

o Be able to deploy the adequate infrastructure.

Big Data technologies can play a fundamental role in the design of customer
experiences, not only in data analysis, but also in their creation since they enable
further personalisation and analysis of the customer journey.

Given these considerations, the customer journey will be addressed below. After that,
we will learn how to implement a customer analytics process.

1.1.2 Customer journey

The customer journey is a Design Thinking tool that captures each of the stages, interactions, channels and
elements that customers go through when interacting with our company and brand.
Instead of looking at just a part of a transaction or experience, the customer journey
documents the full experience of being a customer.

It is important to highlight that the customer journey does not have to start when
someone becomes our customer and end when the relationship ends. The map can
instead focus on a specific stage, such as the purchase process or the handling of a
problem by the customer service department. In this way, the start and the end can be
established as desired depending on our needs.

When mapping this journey, the following aspects that make up the process should be
considered:

- Define customers: in order to understand customers or consumers,
they should be identified first. The customer experience with the
company is seen as a journey so the objective is to put oneself in the
customer's shoes and understand and record the features of that

- Understand stages: We should observe all the interactions from the
customer perspective in order to determine how they feel and take
advantage of the results to improve. Examples of this include seeing
how the company's app is used. For us as a company, this app can be
intuitive and easy to use, but customers may not feel comfortable
because they do not know how to use it. Thus, understanding how
customers feel can help us to improve the application.

- Record indecisions and motivations: the company should know what
motivates consumers to act in one way or another, as well as their
concerns, indecisions and what they like or what makes them

- Map out touchpoints: A touchpoint refers to any time a customer
comes into contact with our company. We need to list every
touchpoint, from their first experience with us to the last. This way, we
will not miss out on any opportunities to listen to our customers and
make improvements that will keep them happy. Some examples of
these touchpoints include emails and purchases through the
website or through the app. In addition to establishing the
touchpoints that are repeated, it is also important to identify emotions,
classifying them as negative, positive or neutral, as this will help us
to undertake improvement actions.

- Analyse every key point: We should try to identify the moments in
which customers are happy, angry or lost during their journey. These
are crucial moments as they affect the decision to continue to the next
phase or leave. Thus, it is particularly important to identify those
aspects that make the consumer feel bad because in this way we can
improve them and make their experience better. An example would be
in the purchase process through the website or the application. There
may be some step that upsets consumers because they do not
understand it. So, we should identify what the consumer does not
understand and improve it since this may lead the customer to leave
without buying.

- Internal processes of our company: this is not a mandatory phase, since
we are creating a map of our customers’ journey. However, it can be
very helpful to add the internal processes of our company that allow
customers to carry out each process or interaction. In this way, when
emotions are negative at a specific touchpoint, we will be able to see
the processes that we carry out ourselves so that we can analyse them
to determine what is missing and solve it.

- Opportunities and customer sentiment: Finally, it is essential to
understand what customers feel at each stage. We must identify what
annoys them at the different touchpoints. By identifying negative
experiences, customers' experiences can be improved. Thus, if
consumers have more rewarding experiences, they will return since
they will want to repeat the interactions and they will become loyal
customers. Basically, it is about making their experience as pleasant as
possible so that this is not the only relationship they have with us.

Below are two graphic examples of a customer journey map:

Source: https://medium.com/@azviss4/customer-journey-mapping-in-7-bookmarks-

Examples: https://josecantero.com/2015/03/07/customer-journey-map-o-mapa-del-

As seen in the examples, the objective is basically to track the journey of consumers
during a process and to assess how they feel in each of the phases of this process. This
allows us to determine which processes work well and which should be improved to
achieve a positive experience.

1.1.3 Implementation of a customer analytics project

Now that we know how to map a customer journey, we will discuss the different
phases of a customer analytics project.

Projects of this type are usually carried out by an external company. However, the
essential steps and considerations will be discussed in order to analyse customer data
and information correctly and beneficially for the company.

The first thing to do when starting any kind of project is to have a strategic objective.
It is very important to clearly and accurately define what we want to achieve when
starting our analysis project.

To do this, two types of procedures can be used: setting SMART goals and a SWOT
analysis.

▪ SMART criteria

When we say that the objectives should be set using the SMART criteria, we mean that
these objectives should be:

- S → Specific

- M → Measurable

- A → Attainable

- R → Relevant

- T → Time-bound


Goals should be clear and specific, otherwise we will not be able to focus our efforts or
feel truly motivated to achieve them. When drafting goals, try to answer the five "W"
questions:

o What do we want to accomplish?

o Why is this goal important?
o Who is involved?
o Where is it located?
o Which resources or limits are involved?

It is important to have measurable goals so that progress can be tracked. Assessing
progress helps us to stay focused, meet our deadlines, and feel the excitement of
getting closer to achieving our goal. A measurable goal should address questions such
as: How much? How many? How will I know when it is accomplished?


Goals can be challenging but they also need to be realistic and attainable to be
successful. In other words, a goal should stretch our abilities, but still remain possible.
When setting an achievable goal, we may be able to identify previously overlooked
opportunities or resources that can bring us closer to it.


Relevance is about ensuring that our goal matters to us, and that it also aligns with
other relevant goals. For example, if the goal is to launch a new product, it should be
something that is in alignment with the overall business objectives. Our team may be
able to launch a new consumer product, but if our company is a B2B that is not
expanding into the consumer market, then the goal would not be relevant.


Every goal needs a target date, so that we have a deadline to focus on and something
to work toward. There are three types of objectives: long-term, mid-term and
short-term.

Some examples regarding data analysis are:

- Expand the market in an area where there are very few customers. Increase
presence from 10% to 25% in six months.

- Customised promotions for each customer within certain segments for the
Christmas campaign.

- Get an average score of 9/10 for our customer service.

- Increase online channel sales by 10% on the current percentage.
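
A measurable, time-bound goal such as the first example can be tracked with very little arithmetic. The sketch below is only an illustration: the function names, the assumption that progress should be linear over time and the reading of 16% after three months are all hypothetical.

```python
def goal_progress(baseline, target, current):
    """Fraction of the way from baseline to target (0.0 = start, 1.0 = goal met)."""
    if target == baseline:
        raise ValueError("target must differ from baseline")
    return (current - baseline) / (target - baseline)


def is_on_track(baseline, target, current, months_elapsed, months_total):
    """Compare progress so far with the share of time already used,
    assuming (for simplicity) that progress should be linear."""
    expected = months_elapsed / months_total
    return goal_progress(baseline, target, current) >= expected


# Goal from the text: raise presence from 10% to 25% in six months.
# The reading of 16% after three months is a made-up value.
progress = goal_progress(baseline=10, target=25, current=16)
on_track = is_on_track(10, 25, 16, months_elapsed=3, months_total=6)
```

With these made-up figures, about 40% of the goal has been covered after half of the time, so the goal would be flagged as behind schedule.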

Before setting objectives that help us improve, the company and its environment should
be researched in order to understand the current situation. A SWOT analysis will be
conducted for this.

▪ SWOT analysis

The SWOT analysis is a strategic planning technique used to evaluate a company's
competitive position by identifying its strengths, weaknesses, opportunities and
threats. It helps organisations assess what they can and cannot do, as well as their
potential opportunities and threats.

The SWOT analysis is usually represented as a matrix using the following methodology:

Attributes of the company     | Internal factors of the company that may be helpful to achieve the objectives. | Internal factors of the company that may be harmful to achieve the objectives.
Attributes of the environment | External factors of the company that may be helpful to achieve the objectives. | External factors of the company that may be harmful to achieve the objectives.

This analysis has two dimensions, internal and external, to identify the elements of the
company that can be strengths and the elements that are weaknesses, as well as the
elements and factors of the environment that can help or harm the company.

The main objective of this analysis is to help organisations or companies to identify
critical strategic factors and elements that are a competitive advantage over the
competition in order to establish their strategy and achieve their objectives.

In the external analysis, all elements of the environment that may have some type of
relationship with the company should be considered.

The following aspects should be analysed in the external analysis:

- Political: the stability of the country, the government system and restrictions on
imports and exports should be taken into account.
- Legal: fiscal trends and legislation should be taken into account.
- Economic: public debt, wage levels, prices and foreign investment should be taken into account.
- Social: demographic distribution, employment and unemployment, and the
sanitation and hygiene system should be taken into account.
- Technological: the speed of technological progress and changes in the systems
should be taken into account.

By analysing these factors, we should be able to identify the elements that can help
the company to achieve its objectives, and the elements that can be harmful.

The internal analysis focuses on identifying the strengths and weaknesses that the
company has internally regarding capital resources, personnel, assets, product quality,
internal and market structure, and consumer perception, among others.

It is very important to establish what we want to achieve before analysing the data we
have since, as stated, there is a huge amount of data available so we are more than
likely to get lost among all these data if we are not clear about what we want to
achieve with them.

Once our strategic objective is established, we should work on asking the right
questions. In this way, we should ask ourselves why things happen so that with each
answer we will outline the issues that need to be analysed, thus building the path to
reach the final result.

It is very important to make sure that our objectives are well defined, so that they are
aligned with our strategy. For this, it is very important to ask the right questions,
otherwise we will not be able to identify the aspects in which we must work in our
data analysis. This means that strategies may be designed for other objectives and not
for the established objectives.

In this way, if our goal is to ensure that our customer service is excellent and is well
valued by our customers, we should ask questions about the factors that involve
customer service so that they give us the data or information that should be studied
and analysed in order to design a strategy and improvement actions.

In this phase of questions, it is also important to prioritise the problems that arise by
importance so that the most urgent problems are dealt with first. Organisation is key
since we should not try to solve all problems at the same time, but rather gradually.

Once the objectives are set and the right questions are asked, the data and
information analysis will be conducted.

As mentioned throughout the manual, companies have a large amount of data, both
structured and unstructured, so we should be very clear about the data types that will
be analysed in order to achieve our goal. For example, if we want to increase the
loyalty of our customers, it is very important to analyse the data referring to their
transactions. Otherwise, we will not know their buying tendency and it will be
impossible to predict what their behaviour will be in future purchases.

Although the company has a large amount of data, sometimes, depending on what we
want to analyse, we may lack relevant information. For this, we can always send
surveys to our customers with questions whose answers can help us in the analysis.

As previously seen, there are several types of data. They will be chosen depending on
our objective.

Once the analysis is finished, conclusions are drawn and the actions and campaigns to
achieve the established objectives are defined, the results of our customer analytics
project will be presented. As seen, there are two tools for the visualisation and
presentation of results: dashboards and custom reports. Both will be used to present
results. It is particularly important to accompany the presentation with actions. These
actions will also involve a subsequent analysis to verify whether they had the desired
effect. As in any strategy, it is fundamental to monitor the development of the action
so that deviations can be identified and corrected.

In addition to guiding the company or organisation when carrying out strategies and
actions to achieve specific objectives, this process aims to generate a series of tangible
and intangible benefits for the organisation. These are:

Tangible benefits:

- Increase value for the company by deploying a wide range of strategies
aimed at increasing the customer lifetime value of each customer.

- Increase value for customers, focusing on increasing their satisfaction and
loyalty to the brand.

- Create new products based on customer preferences.

- Identify product packages that meet customer needs.

- Increase the effectiveness of marketing actions by generating more
revenue per action.

- Review price strategies.

- Reduce time and costs associated with decision making.

- Improve customer retention and acquisition costs thanks to strategies
aimed at specific groups of customers.

- Reduce the risk associated with customers by identifying profiles of
non-payment or fraud.

- Improve the productivity of sales force based on the optimisation of loyalty


Intangible benefits:

- Add knowledge and tools to management to make decisions about
customers, market and competition.

- Identify market changes and acquire the ability to react to them quickly
thanks to our knowledge.

- Train the organisation to create customer experiences that have an impact
on the value generated for customers, based on an in-depth knowledge of
customer interactions.

- Enable a 360-degree view of customers thanks to the possibility of
analysing the data and information generated in all channels, both online
and offline.

Although the analysis process may seem easy, it is important to note that we can often
come across problems or barriers that make our work more difficult. Customer
analytics strategies are not a simple process. Some of the most frequent barriers that
can be faced during this process are:

- Lack of understanding of the organisation about the customer concept.

- Lack of management support for the initiative.

- Lack of leadership and project management.

- Difficulty in identifying and quantifying the return on investment and the value
generated for the organisation and the customer.

- Lack of funds and resources to carry out the project.

- The quality of the available data is insufficient to successfully address the


- Difficulty in integrating the existing Business Intelligence resources,
Datawarehouse resources or the applications available in the organisation.

- Lack of qualified personnel or personnel able to train the existing staff.

- Difficulty in relating and integrating the result of the analysis with the different
business processes.

- Preference for personal experience instead of conclusions derived from data


- Concern for the security of the data or for their privacy.

- Immaturity of organisations in integrating their social media presence and
the data derived from it with the customer data available in other channels.

- Paralysis in decision making due to an over-analysis of the data.

Knowing and taking into account the barriers or problems that can be faced when
launching a customer analytics project, it is important to highlight some good practices
that can help us to implement this project. These practices are:

- Ensure the support and commitment of the management to the project.

- Create and establish a collaborative and communicative environment between
the team that develops the project and the beneficiaries.

- Focus on leadership, planning and project management.

- Coordinate the development of this type of project within a larger initiative to
develop a decision making culture oriented to facts and based on data.

- Identify realistic initiatives with measurable objectives.

- Establish and maintain a pragmatic approach to the use of customer analytics
in order to avoid excessive analysis.

- Pay special attention to the training of beneficiaries.

- Follow a methodology for the deployment of analytical projects.


Having discussed what customer analytics is, the different data that should be
considered and how to carry out a customer analytics project, this section will deal
with another of the main elements of data analysis: customers.

In our relationship with our customers, data are essential since they allow us to
segment customers in different groups, as well as to know what they want or need, or
the reasons why they came to us and not to the competition. This is why they are
incredibly helpful not only to retain customers with promotions and personalised
campaigns, but also to achieve what is known as the ideal customer.

This section will focus on two aspects: the ideal customer and segmentation, which will
be discussed more in depth in the second chapter.

1.2.1 The ideal customer

The objective of every business is to make money. This means that the company
should take care of certain aspects or elements such as its customers. Companies
should always pay attention to their customers in order to understand their interests
and needs so that they can adapt to them and make them become their ideal
customers.

For this, it is fundamental that, before launching any campaign or strategic action, the
customers of the company are analysed and studied to know their interests,
motivations, needs, the reason why they came to us, their behaviour when shopping,
their origin, social status, etc.

In addition, the results and impact of each campaign should be assessed so that we
can improve and achieve full satisfaction of our customers.

To achieve the ideal customer, the following actions should be carried out:

▪ Understand the attributes of our most valuable customers: the most valuable
customers are those who spend the most money with the company and therefore
bring the most benefits. It is important to know where they come from,
the shopping channels they use most, whether they have an influence on social
media or not, their opinion about the company, their needs, etc. It is basically
about knowing our most valuable customers in order to establish common
characteristics that define our ideal customer. This allows us to create a
customer profile that will be used to attract other customers that fit that
profile.

▪ Enhance our data thanks to marketing campaigns: it is important to take
advantage of the data that can be extracted from the different interactions
with our current and future customers since this will allow us to be able to
carry out specific campaigns or actions.

▪ Create profiles based on the characteristics of our customers’ segments:
through statistical analysis and data mining, segments within our customers
and potential users can be found. Thus, based on these segments, profiles can
be created so that our marketing team can use them to develop more effective

▪ Implement a marketing strategy for each individual: a mass email cannot be
expected to reach the right customers with the right message. Having a large amount
of data about our customers means that they can be segmented and classified, so
that each of them can receive a strategy or action through a specific channel
according to their characteristics.

▪ Analyse and compare the effectiveness of campaigns: every campaign,
regardless of its size, must be analysed in depth. This allows us to see the
response of each of our customers to that campaign in order to adapt and
improve the aspects that may have failed.

As seen, it is essential not to bother our customers with campaigns or offers they are
not interested in, or with campaigns or offers about products or services that they
have already purchased. We cannot forget that, most of the time, an excessive amount
of offers can diminish brand loyalty so we should not bombard customers, but only
send them what is necessary and may interest them.

Before analysing the data available, it is worth understanding what type of data we
have and how we obtained it. In this way, we must ask ourselves questions like:

- Were the users that are going to be analysed obtained when they subscribed or
when they made a purchase?

- Do we have recent information or is it from some time ago?

- Was the data obtained through surveys with closed questions or through the
user's own comments in surveys or in the networks?

This type of question will help us focus our analysis and our data sets.

Descriptive and behavioural data are good to start segmenting our customer base in
order to find behaviours of certain groups. However, to truly have a 360-degree view
of our customers, we should also focus on the study and analysis of interaction and
attitude data.

Regarding the achievement of the ideal customer, it is worth discussing customer
abandonment and retention. The data study and analysis will help us to avoid this
abandonment and to manage to retain our customers. In other words, we should
prevent them from going to the competition.

In order to increase customer retention, a proactive rather than a reactive attitude
should be adopted.

Customer analysis can achieve the following:

- Immediately identify small problems that could become bigger. This will be
achieved by analysing the differences between customers who have left and
our current customers, since deviations identify data points that are
significantly different from the rest.

- Identify customers that are at risk of leaving and classify them with a score
depending on the stage of abandonment they are in.

- Send offers or promotions to avoid abandonment and to retain customers who
are thinking of leaving.

- Monitor the success of customer retention efforts.

- Identify customers who need attention.
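
One simple way to spot such deviations is to look for customers whose inactivity is far above what is normal for the customer base. The sketch below is an assumption-laden illustration, not a prescribed method: it uses a z-score on days since the last purchase, and the function name, the threshold of 2 standard deviations and the sample data are all hypothetical.

```python
from statistics import mean, pstdev


def flag_deviations(days_since_last_purchase, threshold=2.0):
    """Return the customers whose inactivity is unusually high.

    days_since_last_purchase: dict mapping customer id -> days of inactivity.
    A customer is flagged when their value lies more than `threshold`
    standard deviations above the mean (a plain z-score rule).
    """
    values = list(days_since_last_purchase.values())
    if not values:
        return []
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []  # everyone behaves the same, nothing stands out
    return sorted(
        customer
        for customer, days in days_since_last_purchase.items()
        if (days - mu) / sigma > threshold
    )


# Hypothetical data: most customers bought about 10 days ago, one did not.
inactivity = {"A": 10, "B": 12, "C": 9, "D": 11, "E": 8, "F": 13, "G": 10, "H": 75}
at_risk = flag_deviations(inactivity)
```

Customers returned by `flag_deviations` would be candidates for a retention offer. A real project would normally combine several variables rather than a single one.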

In relation to important customers and their loyalty, Customer Lifetime Value (CLV)
should be mentioned.

It is an indicator used by companies to measure the present value of the future
benefits generated by a customer. This indicator consists of three elements or key
factors:

- Customer acquisition costs.

- Margin generated by the customer (annual benefit per customer less the
annual retention cost per customer).

- Retention rate.

Since marketing is now more focused and oriented to the use of data, a much more
dynamic approach for the analysis of CLV results has emerged. CLV is no longer
focused on the transactional historical information of a customer, but on the future
relationship that a company can have with that customer.

Calculating the CLV for each of our customers permits us to:

- Focus the objective of our campaigns to increase the profitability of our


- Plan the acquisition of new customers.

- Identify customers who will buy.

- Identify customers with whom our internal resources are being wasted.

- Quantify our customers.

The CLV calculation will make us better sellers since we will know how much we can
actually spend to acquire or retain a customer.

There are different ways to calculate the CLV. The easiest way is the following formula:

CLV = AP * PF * CL

- AP: average purchase value.

- PF: purchase frequency, that is, the average number of purchases of a
customer in a year.

- CL: customer lifespan, that is, the average number of years a customer
continues purchasing from the company.


For example:

- AP: 70

- PF: 5

- CL: 3

CLV = 70 * 5 * 3 = 1050
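
The same calculation can be sketched in a few lines of Python, which also makes it easy to rank several customers. The function name and the customers other than the worked example are hypothetical.

```python
def simple_clv(average_purchase, purchase_frequency, customer_lifespan):
    """Basic CLV: average purchase * purchases per year * years as a customer."""
    return average_purchase * purchase_frequency * customer_lifespan


# Worked example from the text: AP = 70, PF = 5, CL = 3.
clv = simple_clv(average_purchase=70, purchase_frequency=5, customer_lifespan=3)
print(clv)  # 1050

# Hypothetical customers as (AP, PF, CL), ranked by their CLV.
customers = {"Ana": (70, 5, 3), "Luis": (20, 12, 2), "Marta": (150, 1, 4)}
ranking = sorted(customers, key=lambda name: simple_clv(*customers[name]), reverse=True)
```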

An approximation to the CLV of our customers is obtained with this formula, but a
much more accurate result can be obtained by adding the margin in each sale. In that
case, the formula would be as follows: AP * PF * CL * margin.

Margin refers to the benefit obtained in each purchase.

However, we should bear in mind that the value of money changes over time and not
all customers are regular, so the previous formula becomes:

CLV = (AP * PF * CL * margin) * (r / (1 + i - r))

- i: discount rate of value of money over time (example: 5%)

- r: customer retention rate (example: 75%)
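
As a minimal sketch, the adjusted calculation can be written as follows, reading the correction factor as r / (1 + i - r), the usual retention and discount adjustment. The function name and the 30% margin are assumptions; the retention and discount rates are the example values given above.

```python
def adjusted_clv(average_purchase, purchase_frequency, customer_lifespan,
                 margin, retention_rate, discount_rate):
    """CLV with margin, corrected by the factor r / (1 + i - r)."""
    denominator = 1 + discount_rate - retention_rate
    if denominator <= 0:
        raise ValueError("1 + discount_rate - retention_rate must be positive")
    base = average_purchase * purchase_frequency * customer_lifespan * margin
    return base * (retention_rate / denominator)


# AP, PF and CL from the earlier example; the 30% margin is made up.
value = adjusted_clv(70, 5, 3, margin=0.30,
                     retention_rate=0.75, discount_rate=0.05)
```

With these inputs the factor is 0.75 / 0.30 = 2.5, so the result is approximately 787.5.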

Once the CLV is calculated, the way it is used is crucial to obtain good or bad results. In
order to achieve good results, the following five key points should be considered:

- Establish our acquisition budget

In most companies, almost all marketing budgets are aimed at acquiring new
customers. However, it is very important that the company determines how much it is
willing to spend for the acquisition of new customers, as this will help us to make
better decisions about where to allocate our marketing resources.

Acquisition tolerance can be determined using the net present value of future cash
flows, including the allocation of a percentage of the income of future promotions.
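
The net present value mentioned here can be sketched as follows. The function name, the yearly cash flows, the 5% discount rate and the acquisition cost are hypothetical values chosen only to show the calculation.

```python
def net_present_value(cash_flows, discount_rate):
    """Discount a list of future yearly cash flows back to today.

    cash_flows[0] is received at the end of year 1, cash_flows[1] at the
    end of year 2, and so on.
    """
    return sum(
        cash_flow / (1 + discount_rate) ** year
        for year, cash_flow in enumerate(cash_flows, start=1)
    )


# Hypothetical customer: 100 of margin per year for three years, 5% discount rate.
expected_margins = [100.0, 100.0, 100.0]
npv = net_present_value(expected_margins, discount_rate=0.05)

# Acquiring the customer only pays off if it costs less than the NPV.
acquisition_cost = 200.0
worth_acquiring = npv > acquisition_cost
```

In this example the three payments are worth roughly 272 today, so spending 200 to acquire the customer would still leave a positive return.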

The CLV and the acquisition budget should be recalculated on a regular basis, since
both the organisation and customer behaviour change.

- Find the best way to win the best customers

We should make sure to focus on high value customers. We should avoid making
efforts and spending resources on customers who waste our resources and reduce our
benefits. For this, a cost analysis should be conducted.

We can send a standard offer to all our customers, or send a specific personalised offer
to a specific segment of our customers. With the personalised offer, the same or a
better result will be obtained, while our marketing efforts will be focused, costs will
decrease and the ROI (return on investment) of our campaigns will increase.

- Find the best way to grow the relationship with our customers

Thanks to the information and analysis about each customer, the right message can be
sent to the right customer through the right channel.

Basically, this is about knowing our customers very well to be able to anticipate their
needs and offer them what they want. This will make them feel valued and they will
continue their relationship with us.

It is very important to update this information. Surveys can be periodically conducted
to know their level of satisfaction, or a question can be asked at the end of each
transaction, so that it allows us to improve our understanding of the behaviour and
preferences of each of our customers as the relationship progresses.

- Find the best way to retain our best customers longer

Thanks to the algorithms and calculation processes, deviations can be identified, as well
as behaviours that put a customer at risk of loss, so that we can take proactive actions
to prevent them from leaving. They also allow us to identify customers who waste our
resources and who do not provide us with benefits, so that those resources can be
allocated to other customers with higher value.

- Use predictions at each touchpoint

Each touchpoint of each department of our company can be analysed. With this, data
and key behavioural information at these points will be gathered, and we will also be
able to analyse in detail the message our customers are receiving to ensure that it is
aligned with our marketing strategy and the message of our campaigns and actions.

One of the best ways to increase CLV is to offer customers actions and promotions
based on their behaviour and preferences. That is, offering them personalised service
based on the results and conclusions drawn from the study and analysis of their data.

A good way to offer a promotion or discount to customers is right at the point of sale.
Some of the strategies that can be carried out at the point of sale are:

- Cross-selling: a technique used to get a customer to spend more by purchasing
a product that is related to what is being bought already.

- Up-selling: induces the customer to purchase more expensive items, upgrades
or other add-ons in an attempt to make a more profitable sale.

Adding unstructured data to our predictive models can help us to further personalise
the cross-selling and up-selling offers, making them more attractive. It is basically
about knowing customers very well to offer them products and services that they
usually consume.

*Example of customer lifetime cycle. Source:


1.2.2 Customer segmentation

This second section will introduce segmentation, which will be discussed in depth in
the next chapter.

Customer segmentation involves dividing our customers into parts; each part is
characterised by features that make it different from the rest.

The first step to segment our customers is using the available data, such as descriptive
data or behavioural data.

A simple way to segment our customers at the beginning is by performing an RFM
analysis, one of the simplest methods of customer segmentation and, at the same
time, one with the best results in the short term. This analysis is based on the Pareto
principle, which states that 20% of a company's customers generate 80% of the
revenue.

Pareto noticed that 80% of Italy's land was owned by 20% of the population. He then
carried out surveys on a variety of other countries and found to his surprise that a
similar distribution applied.

With this analysis, it can be seen to what extent this principle holds in our case, and
each customer can be placed at their level of the value pyramid.
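
Checking how far the Pareto principle holds for our own customer base takes only a few lines. The sketch below is illustrative: the function name and the spending figures are hypothetical, and total spend per customer is assumed to be available already.

```python
def revenue_share_of_top_customers(spend_by_customer, top_fraction=0.20):
    """Share of total spend generated by the top `top_fraction` of customers."""
    amounts = sorted(spend_by_customer.values(), reverse=True)
    total = sum(amounts)
    if not amounts or total == 0:
        raise ValueError("at least one customer with non-zero spend is required")
    # Always keep at least one customer in the top group.
    n_top = max(1, round(len(amounts) * top_fraction))
    return sum(amounts[:n_top]) / total


# Hypothetical yearly spend for ten customers.
spend = {
    "C01": 5000, "C02": 3000, "C03": 400, "C04": 300, "C05": 300,
    "C06": 250, "C07": 250, "C08": 200, "C09": 200, "C10": 100,
}
share = revenue_share_of_top_customers(spend)
```

In this made-up data set the top two customers (20%) account for 80% of the spend. Real data will rarely match the rule so exactly, which is precisely what the check is for.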

The RFM analysis consists of classifying customers by their value according to three
variables:

- Recency: how recently a customer has purchased.

- Frequency: how often a customer purchases.

- Monetary: how much a customer spends.

Scales should be created based on these three variables in order to classify customers.
Each customer is given a value according to their group (a number of groups of the
same size, or with the same number of customers, is made). We usually work with 5
values, even though 9-10 values are also commonly used, especially in direct and
online sales. Here is an example:

Example of RFM analysis. Source: https://www.shopify.com/blog/customer-lifetime-
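
The scoring described above can also be sketched in code. This is a minimal illustration under stated assumptions: purchases are given as (customer, date, amount) tuples, groups of equal size are built by ranking, ties are broken by position, and all names and data are hypothetical.

```python
from collections import defaultdict
from datetime import date


def quantile_scores(values, n_groups=5, higher_is_better=True):
    """Rank the values and split them into n_groups groups of (almost)
    equal size, scored from 1 (worst) to n_groups (best)."""
    order = sorted(range(len(values)), key=lambda i: values[i],
                   reverse=not higher_is_better)
    scores = [0] * len(values)
    for rank, index in enumerate(order):
        scores[index] = rank * n_groups // len(values) + 1
    return scores


def rfm_scores(purchases, today, n_groups=5):
    """Return {customer: (R, F, M)} from (customer, date, amount) tuples."""
    last, count, total = {}, defaultdict(int), defaultdict(float)
    for customer, day, amount in purchases:
        last[customer] = max(last.get(customer, day), day)
        count[customer] += 1
        total[customer] += amount
    customers = sorted(last)
    # Recency: fewer days since the last purchase is better.
    r = quantile_scores([(today - last[c]).days for c in customers],
                        n_groups, higher_is_better=False)
    f = quantile_scores([count[c] for c in customers], n_groups)
    m = quantile_scores([total[c] for c in customers], n_groups)
    return {c: (r[i], f[i], m[i]) for i, c in enumerate(customers)}


# Hypothetical purchase history for five customers.
purchases = (
    [("A", date(2024, m, d), 100.0) for m, d in [(3, 1), (4, 1), (5, 1), (6, 20)]]
    + [("B", date(2024, 1, 10), 40.0)]
    + [("C", date(2024, m, 1), 300.0) for m in (2, 4, 6)]
    + [("D", date(2024, m, 5), 75.0) for m in (2, 3)]
    + [("E", date(2024, m, 15), 50.0) for m in (1, 2, 3, 4, 5)]
)
scores = rfm_scores(purchases, today=date(2024, 6, 30))
```

Each customer ends up with three values between 1 and 5. A customer scoring high on all three is among the most valuable, while one scoring high on frequency and monetary value but low on recency is a candidate for a win-back promotion.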


It is worth mentioning that the RFM analysis has clear benefits both in strategic
segmentation and tactical segmentation of retail customers. These benefits are:

- Easy to interpret and apply.

- Easier calculation, compared to multivariate techniques.

- Easily integrated into the usual promotional dynamics in a marketing area and
ideal for direct marketing and relationship marketing.

- Allows ongoing work on the recency of purchase and is excellent for reducing
abandonment in the medium term.

- Excellent starting point when segmenting our audience, as it tells us who the
most valuable customers are.

As said, this is the first step in segmentation. To make promotions and offers as
personalised as possible, we should not only focus on group segmentation, but also on
segmenting individuals.

Micro-segmentation will be used for this, which will allow us to identify the
preferences, needs and behaviours of each of our customers. If offers are personalised
at this level, the revenue of our marketing campaign can be maximised and the
relationship with our customers can be improved.

To achieve the personalisation of each customer, it is necessary to score each of our
customers individually and create indicators that allow us to identify the preferences
of each of these customers. Once an RFM analysis is conducted and our customers are
segmented into different groups, there will be groups such as customers who buy very
frequently and spend a lot of money at each visit.

There may be customers who have not bought for a while within this group, but
because they did it frequently and used to spend a lot, we want them back. To make
them buy again, we send a specific promotion for them. However, not every customer
will react to our communication in the same way. Some of them may prefer to receive
it by post or by email, some others may prefer an online discount code, and some
others may prefer a discount or direct promotion in store, etc.

We should know these customers very well, so that they can be classified according to
specific criteria, which allow us to carry out strategies and actions as personalised as
possible. We should basically create profiles within the groups. There are two types of
analysis to create these profiles:

- Supervised analysis: validates specific hypotheses about customers. For
example, determine if customers of a certain age buy a certain product more
than another group.

- Unsupervised analysis: reveals patterns in the information from a minimum
definition of variables.

There are different techniques that are often combined:

- Clustering: a process that divides customers into groups with similar
characteristics. It allows identifying which variables and factors are relevant.

- Classification: a customer segmentation process based on variables and known
factors. This step follows clustering.

- Estimation: refers to the assignment of a numerical value to an entity as part of
decision making such as the intention to purchase or the risk of abandonment.

Having introduced segmentation, the second subject of this manual will deal with
customer segmentation in a more specific and detailed manner.


As seen, customer segmentation involves dividing the general group of customers into
different groups based on a wide range of characteristics and attributes.

This chapter will continue dealing with segmentation, but it is essential to address data
exploration first. We should know exactly what type of data we have before
segmenting our customers by groups, since this will help us to segment accurately. For
example, if we want to segment our customers according to their level of education, it
is important to verify that we have this information for most of our customers.


Before analysing our data, whether it is to segment or to carry out specific marketing
strategies, it is very important to perform data exploration tasks. These will allow us to
perfectly know the data that we have as well as to identify incorrect data.

Although there are several tools for this, here we will focus on the study of the R tool
and the RStudio work platform, which will facilitate the use of R.

2.1.1. R and RStudio

RStudio is an interface that allows us to easily access the full power of R. R must be
installed beforehand in order to use RStudio. R is an object-oriented language for
statistical calculation and graphic generation that has a wide range of statistical and
graphic techniques that will help us in our analysis and exploration.

R, also known as GNU S, implements a dialect of the award-winning S language,
developed at Bell Laboratories by John Chambers.

Basically, R provides relatively simple access to a wide variety of statistical and
graphical techniques, and also offers a complete programming language to add new
techniques by defining functions.

S and R are currently the two most widely used languages in statistical research. Their
main attractions are:

- The ability to combine "pre-packaged" analyses (example: a logistic regression)
with ad-hoc analyses specific to a situation: ability to manipulate and modify
data and functions.

- High quality graphics: data visualisation and graphic production for articles.

- A very dynamic community, with high growth in the number of packages and
made up of highly renowned statisticians.

- There are specific extensions to new areas such as bioinformatics, geostatistics
and graphical models.

- Object-oriented language.

2.1.2. Data exploration

Regardless of what our goal is, it is very important to explore our data before taking
any further step.

As already mentioned, companies have a very large amount of data so it is very
important to know the data available before creating the model.

No database is perfect no matter how much effort we put in, so we should check if
information is missing, if data are incorrect or if data have been entered incorrectly.
For this, data should be explored before starting, because otherwise we will either have
to correct wrong data in the middle of the process or we will create a model
whose results are incorrect.

By exploring the data, we will also realise what type of customers we have, which will
help us in the construction of the model.

This section will address the different ways to obtain information about our data. Data
exploration uses a combination of general statistics (means, medians, variance and
counters) and visualisations or data graphs.
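
As an illustration of these general statistics, the following minimal Python sketch
computes them for a list of customer ages. The ages are hypothetical values invented
for the example, and only the Python standard library is used.

```python
import statistics

# Hypothetical customer ages.
ages = [23, 35, 35, 41, 52, 29, 35, 64, 47, 38]

summary = {
    "count": len(ages),
    "mean": statistics.mean(ages),
    "median": statistics.median(ages),
    "variance": statistics.variance(ages),  # sample variance
    "min": min(ages),
    "max": max(ages),
}

for name, value in summary.items():
    print(f"{name}: {value}")
```

A large gap between the mean and the median, or a minimum or maximum that makes no
sense for the variable, is usually a first hint that the data deserve a closer look.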

▪ Problem identification

As already mentioned, our databases may not be perfect. That is why data exploration
is so important because it will help us to identify any problems that may arise in
relation to our data. The most frequent problems that can be found and how to
identify them graphically will be discussed here.

The following problems should be considered:

- Null values

A few null values or missing data in a variable are not a problem, but many of them in
the same variable are, since this will prevent us from performing a reliable analysis of
what interests us.

For example, if we want to segment our customers according to their level of
education, but there are many null values or missing data in that variable, the results
obtained will not reflect reality, so a model in line with reality cannot be established.

It is important to think about why we do not have these data, considering whether they
are difficult to obtain or whether it is information that the company has only recently
decided to store.

Regardless of the reason for having null values, we should decide what to do with
them. We can either try to obtain the data related to that variable or include a variable
with null values in our model. In this second case, we should decide whether to delete
all the records where the value is null or to convert these null values into an additional
category or 0 value.
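
The options above can be sketched in Python as follows. The records, the field names
and the "unknown" label are hypothetical; the sketch only shows how the share of null
values could be measured and how the two ways of handling them differ.

```python
# Hypothetical customer records; None marks a null value.
customers = [
    {"id": 1, "age": 34, "education": "university"},
    {"id": 2, "age": None, "education": None},
    {"id": 3, "age": 51, "education": "secondary"},
    {"id": 4, "age": 28, "education": None},
]


def missing_share(records, field):
    """Fraction of records in which `field` is null."""
    return sum(1 for r in records if r.get(field) is None) / len(records)


for field in ("age", "education"):
    print(f"{field}: {missing_share(customers, field):.0%} null values")

# Option 1: delete the records where the value is null.
complete = [r for r in customers if r["education"] is not None]

# Option 2: keep every record and convert nulls into an additional category.
labelled = [{**r, "education": r["education"] or "unknown"} for r in customers]
```

Deleting records reduces the amount of data available, while an additional category
keeps all customers but adds a group whose meaning must be interpreted with care.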

- Outliers

Not only should we check to see if there are null values or missing data, but also
determine that non-null values are correct.

The following should be considered:

- If there are negative data when all data should be positive.

- If there are data in text value when they should be expressed in numerical
value.

- If there are data out of range, such as a value of 0 or 120 in the
field of age.

In most cases, outliers are simply values that were entered incorrectly. As with null
data, we should decide what to do with this incorrect data.
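
The three checks listed above could be automated with a small Python sketch like the
following one. The function name and the accepted age range (1 to 110) are assumptions
made for the example; each company should set the limits that make sense for its data.

```python
def check_age(value, min_age=1, max_age=110):
    """Return a label describing the problem with an age value, or None if it looks valid."""
    # Text (or any non-numeric type) where a number is expected.
    if isinstance(value, bool) or not isinstance(value, (int, float)):
        return "not numeric"
    # Negative value where only positive values make sense.
    if value < 0:
        return "negative"
    # Value outside the accepted range (assumed limits).
    if not min_age <= value <= max_age:
        return "out of range"
    return None


# Hypothetical raw values of the age field.
raw_ages = [34, -5, "forty", 0, 120, 67]

problems = {}
for position, value in enumerate(raw_ages):
    label = check_age(value)
    if label is not None:
        problems[position] = label

print(problems)
```

The records flagged in this way can then be corrected, deleted or reviewed manually.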

- Range of a set of data

It is very important to pay attention to the variation of our data, which is to say, of the
data needed to segment our customers in relation to the objective or objectives set.
We should ensure that there is enough variation.

For example, if a company that sells medical insurance wants to establish a
relationship between the age of their customers and the types of insurance they sell,
they must ensure that the data regarding the age of their customers have enough
variation to be able to find a strong relationship. In this way, if they only have customers
who are 40 to 60 years old, they cannot establish any relationship, since there is not
enough data variation.

We should also consider the opposite case, in other words, that a range of a set of
data is too broad.

The important thing here is to determine if the range of the set of data available is very
broad or very narrow depending on the objectives.

A good way to identify these problems is visualisation. Images can offer us another
type of information different from the information given by numbers. For example, a
graph is a good way to see the customer age distribution.

*Customer age distribution graph. Source:


Graphs can give us the following information in relation to our data:

- What the maximum value of the distribution is.

- The peaks in distribution.

- If data are normally distributed.

- The variation of data and if it is concentrated in a certain interval or in a certain


These are the types of graphs that can be used to identify problems in our data:


Histogram

The histogram is a display of statistical information that represents the frequency of
occurrence of specific phenomena which lie within a specific range of values, which are
arranged in consecutive and fixed intervals. The frequency of the data occurrence is
represented by a bar; hence it looks very much like a bar graph.

If we want to determine the number of customers in each age range, the histogram
will show one column per age range, whose height is the number of customers in it.

*Example of histogram. Source: https://statistics.laerd.com/statistical-
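
The counting behind a histogram can be sketched in a few lines of Python. The ages
and the interval width of 10 years are hypothetical choices for the example, and the
bars are drawn as text so that no graphics library is needed.

```python
from collections import Counter


def age_histogram(ages, bin_width=10):
    """Count how many ages fall in each interval [start, start + bin_width)."""
    counts = Counter((age // bin_width) * bin_width for age in ages)
    return dict(sorted(counts.items()))


# Hypothetical customer ages.
ages = [23, 35, 35, 41, 52, 29, 35, 64, 47, 38]
width = 10

hist = age_histogram(ages, bin_width=width)
for start, count in hist.items():
    print(f"{start}-{start + width - 1}: {'#' * count}")
```

Changing the interval width changes the shape of the histogram, so it is worth trying
more than one width before drawing conclusions.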


Density plot

A density plot visualises the distribution of data over a continuous interval or time
period. It is a variation of a histogram that uses kernel smoothing to plot values,
allowing for smoother distributions by smoothing out the noise. The peaks of a density
plot help display where values are concentrated over the interval.

An advantage density plots have over histograms is that they are better at determining
the distribution shape because they are not affected by the number of bins used (each
bar used in a typical histogram).

*Example of density plot. Source: http://www.sthda.com/english/wiki/ggplot2-density-plot-
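
To show what kernel smoothing means in practice, here is a minimal Python sketch of a
density estimate with Gaussian kernels. It is an illustration under assumptions, not
the method of any specific tool: the ages, the bandwidth of 5 years and the function
name are hypothetical, and the bandwidth controls how smooth the curve is.

```python
import math


def gaussian_kde(data, bandwidth):
    """Return a function that estimates the density of `data` with Gaussian kernels."""
    n = len(data)
    norm = 1.0 / (n * bandwidth * math.sqrt(2 * math.pi))

    def density(x):
        # Each observation contributes a small bell curve centred on its value.
        return norm * sum(math.exp(-0.5 * ((x - xi) / bandwidth) ** 2) for xi in data)

    return density


# Hypothetical customer ages.
ages = [23, 35, 35, 41, 52, 29, 35, 64, 47, 38]
density = gaussian_kde(ages, bandwidth=5.0)

# Evaluate the curve every 5 years and find where values are most concentrated.
curve = [(x, density(x)) for x in range(15, 76, 5)]
peak_age = max(curve, key=lambda point: point[1])[0]

for x, d in curve:
    print(f"{x:>3}: {'*' * round(d * 400)}")
print("Peak around age", peak_age)
```

The peak of the curve indicates the age around which most customers are concentrated.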


Bar graphs

A bar chart presents categorical data with rectangular bars with heights or lengths
proportional to the values that they represent. The bars can be plotted vertically or
horizontally.

*Example of vertical bar graph. Source: http://onedigital.mx/ww3/2015/07/06/office-2016-


Bars are vertical in this example. This is due to the fact that few variables are
presented. However, when the number of variables is high, it is recommended to plot
bars horizontally.

*Example of horizontal bar graph. Source: https://www.spss-tutorials.com/spss-bar-charts-


In graphs with horizontal bars, it is best to order variables from the highest to the
lowest since this will help reading and interpreting the graph.

The previous graphs showed one variable only. However, there are cases in which a
relationship between two variables needs to be established as well as the type of
relationship. For this, line graphs, scatter charts, hexbin plots and bar graphs for two
categorical variables can be used.

Line graphs

A line graph is used to display quantitative values changing over a continuous interval
or time span.

They are particularly useful to establish if there is a relationship between two values or
variables, since we can see how they are evolving over time.

*Example of line graph.

Scatter plot

A scatter plot is a two-dimensional data visualisation that uses dots to represent the
values obtained for two different variables, one plotted along the x-axis and the other
plotted along the y-axis. It shows how strongly the two variables are related.

If the dots are almost aligned in the scatter plot, it means that the two variables
are highly correlated. If the dots are distributed evenly throughout the graph, the
correlation will be low or zero.

*Example of a scatter plot. Source: https://aprendiendocalidadyadr.com/diagrama-de-
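
What the scatter plot shows visually can also be expressed as a number. The following
Python sketch computes the Pearson correlation coefficient, a common measure of linear
correlation that ranges from -1 to 1. The variables and their values are hypothetical
and only illustrate one aligned case and one scattered case.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two lists of the same length."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data for six customers.
visits = [1, 2, 3, 4, 5, 6]
spend = [12, 19, 33, 41, 48, 62]   # grows with the number of visits
age = [40, 36, 44, 38, 43, 39]     # no clear pattern

r_aligned = pearson(visits, spend)
r_scattered = pearson(visits, age)

print(f"visits vs spend: {r_aligned:.2f}")
print(f"visits vs age:   {r_scattered:.2f}")
```

A value close to 1 or -1 corresponds to dots that are almost aligned, while a value
close to 0 corresponds to dots spread throughout the graph.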


Hexbin plots

This type of graph is like a two-dimensional histogram. Data are divided into hexagons
and the number of data points in each hexagon is represented by a colour.

*Example of a hexbin plot. Source: http://blog.52north.org/2015/04/22/advanced-time-series-


Bar graphs for two categorical variables

This type of graph is exactly the same as the previously discussed bar graph. The only
difference is that two different variables are presented so there should be two bars for
each value.

The placement of the information relative to each variable can be done in several
ways, either one bar on top of the other or one bar next to another, either horizontally
or vertically. The important thing is that the person that visualises the information
knows what variable each bar refers to, in order to be able to establish the existing
relationship between the two variables.
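
The figures behind a bar graph for two categorical variables are simply the counts of
each combination of categories. A minimal Python sketch, with hypothetical variables
(region and purchase channel) and invented records, could look like this:

```python
from collections import Counter

# Hypothetical (region, purchase channel) pairs, one per customer.
customers = [
    ("north", "online"), ("north", "store"), ("south", "store"),
    ("south", "store"), ("north", "online"), ("south", "online"),
]

counts = Counter(customers)
regions = sorted({region for region, _ in customers})
channels = sorted({channel for _, channel in customers})

# One row per region, one bar per channel within each region.
table = {r: {c: counts[(r, c)] for c in channels} for r in regions}

for region, row in table.items():
    for channel, count in row.items():
        print(f"{region:<6} {channel:<7} {'#' * count}")
```

Each row of the table corresponds to a group of bars in the graph, placed next to each
other or stacked.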

*Examples of bar graphs. Source: https://docs.tibco.com/pub/spotfire_web_player/6.0.0-


Although we have talked about establishing the relationship of two variables, it can
also be established with more variables.

*Example of bar graph. Source: https://www.tutorialspoint.com/javafx/bar_chart.htm

As seen, data can be explored through a wide range of graphs.

This is basically about gathering information about the data available in order to know
the types of data and their variations. Otherwise, customer segmentation and analysis
may be carried out with unreal or unrepresentative information, and the actions to be
undertaken once the analysis is completed will not be carried out successfully.

Given these considerations, segmentation will be discussed in more detail to know
what techniques should be used for a successful segmentation that will help us achieve
our objectives.


Once data are explored and verified, customer segmentation can be carried out. As
already seen, customer segmentation involves dividing our customers based on a wide
range of characteristics or attributes.

Segmentation helps us to:

- Know how many customers our business has.

- Know the volume of sales that each type of customer represents.

- Know what our customers and key potential customers are like: type, consumption,
means, profitability, habits, etc.

- Know which customers or potential customers represent real income generation


- Identify new niches or segments to introduce our business.

- Identify critical features of brand consumers.

When it comes to segmentation, it is very important to be clear about what we want
to segment, in order to define and limit the segmentation. Thus, we can segment:

- Markets: the set of potential consumers who share a need.

- Types: the set of consumers that share common features in terms of activity.

- Segments: the set of consumers that share common features, such as
geographic, demographic, behavioural, etc.

- Niches: a smaller part of the market, a very specific segment delimited by a
specific variable.

- One to one: each customer is considered unique and is offered completely
individualised products/services and marketing actions.

Before any segmentation, it is important to establish what is going to be segmented.
This will depend on the set objectives since the segmentation will not be the same if we
want to focus on carrying out specific promotional actions for each customer or if we
want to make global promotions according to the different types of customers we
have.
Once the general scope of action is defined, we should identify to whom the
segmentation is directed. In customer analytics, we will focus on our customers, but
depending on the objective, such as expanding market share, segmentation can be
directed to other audiences:

- Population: refers to the entire group of people who are part of the
established set.

- Potential customers: consumers who are not our customers but who match
the profile of our customers.

- Customers: people who buy our product or service frequently.

- Consumers: people who consume our product or service. For example, if we
have a children's clothing and toy store, our customers are the parents of the
children, but children will be the consumers since they wear the clothes and
play with the toys.

As for the features or attributes to segment, there are infinite possibilities. We should
always take into account the data we have, if they have enough variation and the
established objectives. Some of the most common and most used variables or criteria
for segmentation are:

- Geographic

- Based on the service

- Seniority

- Psychographic

- Sales volume

- Demographic

- Business profitability

- Based on benefits

- Based on use

- Strategic

- Based on behaviour

These are just some of the variables or attributes. As already said, their use will
depend on the set objectives and the available data.

When we have already decided what we are going to segment, to whom and according
to what variables, we must be clear about the strategy we want to carry out, that is,
we must be clear about the objective of the segmentation since it may be aimed at:

- Loyalty: maintain and protect our exclusive customers.

- Bond: increase business relationship.

- Maintain: increase growth rate.

- Attract: attract potential customers or inactive customers.

Given these previous considerations, we will focus on the most common classifications
of segmentation types, such as status, sales volume and frequency of purchase. We
will also discuss strategic segmentation, operational segmentation and will review RFM
analysis, previously addressed.

Customer segmentation based on status

Five types of customer status can be differentiated:

- Current customers: customers who buy regularly, both companies and
individuals. Basically, customers who support the business.

- Active customers: customers who buy frequently, and who bought recently or
during a specific period of time established by the company such as a period of
three months. This period of time depends on the type of company or the
product or service it offers.

- Inactive customers: customers whose last purchase was outside the period of time
established by the company. These customers can be targeted at some point to
encourage them to buy again, but before that, they should be analysed
to know the reasons why they no longer buy from us so that a solution can be
found.

- Potential customers: people who have not purchased, but who have shown
interest in the company and its products by requesting information or quotes.
They can become customers and income generators for the company at any
time.
- Probable customers: have never bought and have not shown interest in us.
However, due to their features, the company considers that they could become
generators of future income.
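
Part of this classification can be automated once the period of time is established.
The Python sketch below is a simplified illustration with assumptions: it uses the
three-month period mentioned above (taken as 90 days), the function and parameter
names are hypothetical, and it does not distinguish current customers, since that
status overlaps with active customers.

```python
from datetime import date, timedelta


def customer_status(last_purchase, requested_info, today, active_days=90):
    """Classify a contact as active, inactive, potential or probable."""
    if last_purchase is None:
        # Never bought: interest shown (information or quotes) makes the difference.
        return "potential" if requested_info else "probable"
    if today - last_purchase <= timedelta(days=active_days):
        return "active"
    return "inactive"


today = date(2024, 6, 30)

# Hypothetical contacts: (last purchase date or None, requested information?)
contacts = {
    "Ana": (date(2024, 6, 1), False),
    "Luis": (date(2023, 12, 1), False),
    "Marta": (None, True),
    "Pablo": (None, False),
}

statuses = {name: customer_status(last, info, today)
            for name, (last, info) in contacts.items()}
print(statuses)
```

The length of the period should be adjusted to the type of company and to the product
or service it offers.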

Customer segmentation based on sales volume

To segment our customers based on the sales volume, the 80/20 theory should be
taken into account, that is, 80% of our sales are made by 20% of our customers.

Our customers will be segmented in the following groups:

- Top customers: this first group includes customers that generate a volume that
is far above the average. It is important to know our top customers so that our
efforts and resources can be allocated accordingly.

- Big customers: customers that generate medium-high sales volume. They are
important customers, but they do not have the volume of top customers.

- Average customers: generate average sales volume.

- Low-value customers: their purchases are far below the average.
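
These four groups can be assigned by comparing the sales of each customer with the
average. In the Python sketch below, the thresholds (2 times, 1.25 times and 0.75
times the average) are assumptions chosen only for illustration, as are the function
name and the sales figures; each company should define its own limits.

```python
def volume_segment(sales, average, top=2.0, big=1.25, low=0.75):
    """Assign a sales-volume segment from the ratio between sales and the average."""
    ratio = sales / average
    if ratio >= top:
        return "top"
    if ratio >= big:
        return "big"
    if ratio >= low:
        return "average"
    return "low-value"


# Hypothetical sales per customer.
sales = {"C01": 9000, "C02": 5200, "C03": 4100, "C04": 1200, "C05": 500}
average = sum(sales.values()) / len(sales)

segments = {customer: volume_segment(amount, average)
            for customer, amount in sales.items()}
print(segments)
```

With other thresholds the same customers could fall into different groups, so the
limits should be reviewed against the objectives of the segmentation.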

Customer segmentation based on purchase frequency

An average purchase frequency should be established and our customers should be
classified and segmented according to that frequency.

Depending on our type of business, the average purchase frequency will vary. For
example, in the case of a supermarket, customers may buy every one or two
weeks on average, while in a clothing store the interval between purchases can be longer.

The following groups or segments can be distinguished in this case:

- Frequent customers: buy according to the established average. It is very
important to take care of this group of customers and give them preferential
treatment so that they feel valued and their level of purchase can be

- Regular customers: buy regularly, but less often than the established average. It is very
important to maintain a high level of satisfaction by generating actions and
strategies adapted to them in order to achieve an increase in frequency.

- Casual customers: buy from time to time in our company. They also deserve
good service from the company, although most efforts and resources will be
allocated to frequent and regular customers.

Once customers are segmented in these categories, these segments should also be
segmented according to other variables, such as age, income, level of education, place
of residence, the number of people living in the household, tastes and preferences,
etc. It is very important to set clear and specific objectives so that our customers are
segmented according to the ongoing needs.

Basically, customers need to be organised in groups in order to know the features,
expectations and needs of each group, adapt to them and follow
a specific strategy. This is known as strategic segmentation.

Strategic segmentation

Strategic segmentation enables strategic business decisions based on customer


Some types of strategic segmentation are:

- Segmentation based on profitability objectives per customer: involves
optimising the performance of customers that we already have by reducing
high impact variables such as the churn rate. In this way, customers who are
more likely to leave will be identified so that specific actions on this segment
can be carried out to prevent them from leaving.

- Segmentation to establish a format strategy: depending on our type of
company as well as our type of customers, we will work with a certain format.
This is about segmenting customers based on the format they prefer so that we
can adapt to the needs and preferences of each of them when we consider

For example, if we have a luxury fashion store and we want to grow, we can
open another store like the store we already have but in an area where we
have no presence, but where there is a large volume of potential customers or
even current customers that go to where we currently are. We can also open
an outlet offering clothes from past seasons at a lower price in order to win
customers who, due to the prices, cannot buy our products in our current
store. This will also allow us to get rid of clothing from past seasons that our
current customers do not want.

- Segmentation to optimise the assortment: this strategy is particularly useful
for companies in the retail sector that need to select the products presented to
their customers in order to increase the volume of the average shopping cart.
To do this, the variables that are most sensitive at the time of purchase should
be known to establish what type of customers should be targeted to achieve
greater profitability.

Basically, the type of product that customers want should be known, and they
should be offered that product. Zara is an example of this strategy. They
replace the pieces of clothes that are the biggest sellers in each store, which is
why the same clothes are not found in all stores. It depends on what each store
sells the most.

- Segmentation to launch new products or services: if the company decides to
launch a new product or service, the segments that may be interested in it
should be clear before launching it so that the launch is profitable. It is
basically about being almost certain that a significant segment of our
customers will be interested in this new product or service. For example, if our
customers are reluctant to use the web to buy, it will make no sense to create
an application to buy since it will not be used.

- Segmentation in a resizing strategy: this strategic segmentation will be carried
out in crisis environments when companies are forced to reduce either their
stores or their products and services. Customers should be segmented based
on the margin they provide to decide what is best for us.

Operational segmentation

There are five groups in operational segmentation:

- Segmentation based on CLV (customer lifetime value): companies often
calculate marketing ROI (return on investment) without taking into account
lifetime value, treating each conversion as an isolated value, as if it came from a
different customer. Being able to perform segmentations based on this value can offer
us much more information on ROI, since as seen before, the CLV is an indicator
that measures the current value of a customer’s future benefits.

- Segmentation based on product dimensions, profitability by customers,
socio-demographic dimensions, etc.: this type of segmentation can help us to carry
out customised customer relationship campaigns for each segment. These
campaigns result in better rates in the KPIs or corresponding indicators such as
CTR (click through rate), which improve ROI as a result.

- Segmentation of communications with customers: through a simple analysis
of frequency, recency (days since the last purchase), monetary value and other
variables, communications can be segmented and returns can be measured
more precisely.

- Segmentation through social data and geocoding: thanks to networks and
social media, a wide range of data and information about our customers that
was difficult to obtain previously can now be collected. Thanks to this
information, customers can be segmented using data regarding their interests,
opinions, preferences, tastes, etc.

- Segmentation combined with association algorithms: this segmentation is
particularly useful for online businesses. The interests of customers should be
identified so that products of their interest can be recommended to them in
future purchases. The information for segmentation is based on the history of
associated products in the transactions of customers. This will allow us to
personalise the most attractive offer for each customer, thus improving
conversion rates.
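
As a very simplified illustration of the idea behind association algorithms, the
Python sketch below counts which products appear together in the same transactions
and recommends the products most often bought with a given one. It is not a complete
association algorithm: the transactions, the product names and the function names are
hypothetical, and real implementations also use measures such as support and
confidence.

```python
from collections import Counter
from itertools import combinations


def pair_counts(transactions):
    """Count how many transactions contain each pair of products."""
    counts = Counter()
    for basket in transactions:
        for pair in combinations(sorted(set(basket)), 2):
            counts[pair] += 1
    return counts


def recommend(product, transactions, top_n=2):
    """Products most frequently bought together with `product`."""
    scores = Counter()
    for (a, b), n in pair_counts(transactions).items():
        if a == product:
            scores[b] += n
        elif b == product:
            scores[a] += n
    return [item for item, _ in scores.most_common(top_n)]


# Hypothetical transaction history.
transactions = [
    ["jeans", "belt", "shirt"],
    ["jeans", "belt"],
    ["shirt", "tie"],
    ["jeans", "shirt", "belt"],
    ["jeans", "socks"],
]

print(recommend("jeans", transactions))
```

A customer who buys jeans could therefore be offered the products that other customers
most often bought with them.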

RFM analysis

As already seen, RFM analysis is a simple and effective segmentation method based on
the Pareto principle, which states that 20% of a company's customers generate 80% of
the revenue.

With this analysis, we intend to verify to what extent this principle is real in our case so
that each customer can be placed in their step in the value pyramid. In other words, it
allows us to order our customers from higher to lower value so that we can identify
customers that generate higher profitability and customers who do not.

The RFM analysis consists of classifying customers by their value according to three
variables:

- Recency: how recently a customer has purchased.

- Frequency: how often a customer purchases.

- Monetary: how much a customer spends.


Knowing the types and segments of customers of our company or organisation as well
as the data referring to them, we should continue to what is known as customer value
management. This will help us to adapt to each type and each segment in the best way
possible in order to achieve the total satisfaction of our customers, their loyalty and,
therefore, increase our profits.

As has already been said, adapting to different types of customers as well as having a
good relationship with them is the key to lasting benefits. We are not only interested
in selling a lot with a specific campaign that attracts many new customers. In addition
to attracting new customers, we should also retain current customers so that they
become permanent or lasting customers who are happy with the company and with
the treatment they receive.


Customer value management focuses on taking advantage of customer relationship
management and using this to generate benefits and competitive advantage.

This process includes the following items:

- Customer identification

- Contact management

- Campaign management

- Advanced data modelling

- Customer score

In short, it is about managing the communications of the company with customers to
achieve the benefit of both parties.

The following components or elements that should be part of customer value
management should be considered:

- Customer identification

- Focus on Data Warehouse customers

- Assignment costs and calculation of profitability

- Campaign management

- Data market and advanced data analysis

- Appropriate retention

Below is what should be considered regarding each of these elements.

▪ Customer identification

As seen in the previous two chapters, for adequate customer management that gives
good results, it is essential to make an appropriate identification in order to know
everything about customers such as residence, education, age, economic status,
consumption habits, needs and satisfaction, among others.

Data and information about our customers should be collected in order to segment
them according to certain variables and know perfectly the different profiles of
customers we have.

As already mentioned, this is a fundamental stage because it will allow us to identify
and get to know our customers so that personalised campaigns and promotions can be
carried out depending on each type. This will allow us not to target all our customers
with a general campaign, but rather to customise campaigns according to some
factors. In this way, high-value customers who acquire our products or services weekly
will not be targeted with the same campaigns as customers with lower value who
purchase monthly.

In short, this is about knowing how to identify the customer types or segments of our
company to then offer them what best suits their characteristics and needs.

▪ Focus on Data Warehouse customers

Data Warehouse refers to an electronic warehouse where companies or organisations
keep a large amount of information and data about their customers. This means that
data in a Data Warehouse should be stored safely and reliably, and data should be
easy to retrieve and manage.

It is a data storage architecture that allows business managers to organise, understand
and use their data to make strategic decisions.

The architecture of these warehouses is usually divided into three simplified


- Basic structure: together with operating systems and flat files, it provides raw
data that are stored together with metadata. End users can access them for
analysis, report generation and mining.

- Test area: this area can be placed between the data sources and the
warehouse, and provides a place where data can be cleaned before entering
the warehouse.

- Data marts: systems designed for a particular business line. Data marts are
data subsets that aim to help a specific area of the business to make better
decisions. There may be separate data marts in different departments or
categories such as sales, purchases, inventory, etc., to only access the category
that is needed for the analysis.

In other words, we should focus on the analysis of the customer in relation to the
data available, that is to say, in relation to the data we have entered in our Data
Warehouse.

▪ Assignment costs and calculation of profitability

As mentioned, we should know our customers and differentiate high-value customers
from low-value customers since this will allow us to identify the customers that are
more profitable. We will then be able to allocate more resources to high-value
customers and fewer resources to customers that bring lower profits.

In short, the budget for the actions and marketing campaign for each group or
segment of customers should be established. Most resources are used for customers
that bring higher profits.
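
As a minimal sketch of this idea, the budget could be split across segments in
proportion to the profit each segment brings. The proportional rule, the segment names
and the figures below are illustrative assumptions and not a method prescribed by this
manual.

```python
def allocate_budget(total_budget, segment_profit):
    """Split a budget across segments in proportion to their profit.

    The proportional rule is an assumption made for this example.
    """
    total_profit = sum(segment_profit.values())
    if total_profit <= 0:
        raise ValueError("total profit must be positive")
    return {
        segment: total_budget * profit / total_profit
        for segment, profit in segment_profit.items()
    }


# Hypothetical annual profit per segment (illustrative figures only).
profit = {"high_value": 60_000, "medium_value": 30_000, "low_value": 10_000}
budget = allocate_budget(10_000, profit)
```

With these figures, the high-value segment receives the largest share of the budget
and the low-value segment the smallest.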

▪ Campaign management

Excellent campaign management is the result of successful customer identification.
Campaigns will not give positive results without this previous phase.

When a campaign or promotional action based on data collected over time is
launched, it is very important that all results, impressions, comments and
considerations are gathered so that data and information can be used in future
campaigns.

It is very important to carry out a continuous improvement process during the
campaign management process. That is, we should keep track of everything that
happens throughout the campaign so that, on the one hand, the elements that were
poorly planned can be rectified and, on the other hand, data and information can be
gathered that will help us in future campaigns.


▪ Data market and advanced data analysis

The success of customer value management lies mainly in the ability to analyse
customer data, predict their behaviour as well as their needs and focus customer
relationship management mainly on how to treat individual customers so that actions
and promotions for each type of customer can be personalised. It is an essential cycle
or process for successful marketing campaigns and actions.

In this process, advanced data analysis will provide us with a collection of prediction
models. When these models are executed, each customer is assigned a predictive score
that expresses the probability that the customer behaves or responds to a specific
offer, action or promotion in a particular way. These models, which are functions of a
reduced set of variables and are derived from a sample of customer data, should be
executed on the entire customer base.
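
The idea of deriving a model from a sample and then executing it on the whole customer
base can be sketched as follows. Logistic regression is used here only as one possible
prediction model; the two variables, the sample data and the customer identifiers are
hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(rows, labels, lr=0.1, epochs=2000):
    """Fit weights (bias first) by plain gradient descent on the log-loss."""
    weights = [0.0] * (len(rows[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for x, y in zip(rows, labels):
            error = score(weights, x) - y
            grad[0] += error
            for j, value in enumerate(x):
                grad[j + 1] += error * value
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad)]
    return weights


def score(weights, x):
    """Predictive score: estimated probability of responding to the offer."""
    return sigmoid(weights[0] + sum(w * v for w, v in zip(weights[1:], x)))


# Hypothetical sample: (purchases per month, opened last e-mail 0/1)
# and whether the customer responded to a past offer (1) or not (0).
sample_x = [(1, 0), (2, 0), (3, 1), (6, 1), (8, 1), (1, 1), (7, 0), (2, 1)]
sample_y = [0, 0, 1, 1, 1, 0, 1, 0]
weights = fit_logistic(sample_x, sample_y)

# The model derived from the sample is then run on the entire customer base.
customer_base = {"c1": (1, 0), "c2": (5, 1), "c3": (9, 1)}
scores = {cid: score(weights, x) for cid, x in customer_base.items()}
```

Each value in `scores` lies between 0 and 1 and can be used to rank customers for a
given offer.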

As already seen, our customers can be segmented based on many variables such as
demographic data, behavioural data or data referring to the frequency of purchase. A
prediction model for each type of segment should be established, since each group or
segment is different. This means that what works with some customers may not work
with some others so it is very important to adapt to each group.

Each of the segments or groups of customers that were identified in the first phase will
be assigned a basic customer marketing strategy. This will allow us to know what kind
of campaigns or actions are appropriate and will be profitable for each segment. As
said, personalisation of campaigns based on the type of customer is fundamental to
obtaining value and achieving our customers’ loyalty. This means that general
campaigns and strategies will not be carried out since they will not work for all
customers.

Customer relationship management is very important here. It is the main vehicle for
delivering the results of the prediction models and the treatment recommendations of
the decision engine to the workers who are in contact with customers. It also helps us
to track any interaction with customers as well as the reactions and responses of
customers to the different inputs they receive.

From a data storage perspective, customer relationship management processes are a
primary and transactional system. The information obtained and collected as a result
of interactions with customers is fed back to the data warehouse. This occurs in
combination with feedback from other systems.

In customer value management, customer relationship management is very important
as this will greatly help us to plan actions and campaigns focused on each type of
customer. In other words, if we do not analyse and know our relationship with our
customers, we can hardly manage the value of customers. This will have a negative
impact when launching our marketing campaigns because we will not be aware of
important aspects that can contribute to the success or failure of the campaign.

▪ Adequate retention

The design and implementation of specific actions and marketing campaigns for each
group or customer segment aim to retain customers.

Customer retention is effective when we manage to retain the right customers, that is,
the most valuable customers for the company.

As already mentioned, it is particularly important to identify and get to know all of our
customers in order to make an adequate segmentation based on certain criteria. This
will help us to know what customers have higher and lower value in order to retain
high-value customers and turn low-value customers into high-value customers.

In short, customer value management is a strategy that helps companies to acquire,
develop and retain customers.

The company should know the value that customers perceive of our company since
this will help us when planning and defining strategies. To achieve better results, we
will not only focus on the most valuable customers, but also on the elements that
make the customer perceive value of our company.

Customer Perceived Value is the value that a customer believes they obtain by buying
a product. It is the difference between the total benefits obtained, as perceived by
the customer, and the cost they had to pay for them. Customer perceived value is seen
in terms of the satisfaction of needs that a product or service can offer to a
potential customer. The customer will buy the same product again only if they perceive
that they are getting some value out of it. Hence, delivering this value becomes the
motto of marketers.

The following formula shows the components of perceived value:

Customer Perceived Value = Total Perceived Benefits – Total Perceived Costs

As seen, managing these elements will help us to contribute to our customers having a
good perception of our company so they should not be ignored in our customer value
management process.
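
A short worked example of the perceived value formula may help. The benefit and cost
components and their amounts are hypothetical and are expressed in arbitrary monetary
units; in practice these perceptions would have to be estimated, for example through
surveys.

```python
def customer_perceived_value(perceived_benefits, perceived_costs):
    """Customer Perceived Value = Total Perceived Benefits - Total Perceived Costs."""
    return sum(perceived_benefits.values()) - sum(perceived_costs.values())


# Hypothetical components as one customer might perceive them.
benefits = {"product_quality": 40, "service": 15, "brand_image": 10}
costs = {"price": 45, "time_and_effort": 8}

cpv = customer_perceived_value(benefits, costs)  # 65 - 53 = 12
```

A positive result suggests that the customer perceives more benefits than costs, and a
negative result suggests the opposite.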

Customers are the main element of any company or organisation, and that is why their
satisfaction and loyalty should be our main objectives. If they are happy, they will
consider our company has higher value and this will make them become a valuable
customer, which will have a positive impact on our profits.

Given these considerations in relation to customer value management, clusters and
their analysis will now be discussed. They will allow us to identify relationships in our
customers’ data in order to find new patterns to determine the value of our
customers.


The previous subject showed a series of techniques and graphic representations that
allow us to establish relationships between certain variables in order to segment our
customers and identify higher-value customers.

These techniques are visual so they show relationships between variables that are easy
to see. This section will focus on establishing the relationships between variables that
cannot be easily seen. The so-called unsupervised methods will be used for this.

3.2.1 Cluster analysis

The main objective of the cluster analysis is to group the observations of our data in
different clusters with similar data.

Customers are clustered based on variables that seem to be unrelated. For example, in
a company that sells package holidays, customers will not only be clustered
according to variables such as age or purchasing power, but also according to their
preferred means of transport, the destinations they have visited and would like to visit,
their preferred type of accommodation, or the leisure activities in which they
participate during trips, among others.

Thanks to dividing our customers into more complex groups, we will be able to offer
them the products that best suit their needs, tastes and preferences.

In regard to this analysis, we will focus on two approaches:

- Hierarchical clustering

- K-means clustering

Before dealing with these two approaches, cluster analysis will be further discussed. As
already mentioned, it is a set of techniques used to classify a group of individuals into
homogeneous groups. The main feature of this type of analysis is that groups are
unknown beforehand so these groups should be established.

The main objective of this analysis is to obtain classifications through an analysis with a
significant exploratory nature. Our customers are basically clustered according to
variables that are not easily seen, but require further analysis and exploration.

Cluster analysis seeks to find groups to assign individuals according to some uniformity
criterion so it is essential to establish a measure of similarity or divergence in order to
classify individuals in groups.

Before starting with this analysis, it is important to consider the following:

- We should decide whether the analysis will start from initial clusters already
made, or whether each individual element will be considered as an initial cluster
that will later be merged until the final clusters are obtained. That is, we should
decide if the analysis will start from groups or segments already made, such as
age segments, social class or place of residence, or if each customer will be
considered individually.

The complete process of analysis will be structured based on the following points:

- The analysis will be based on a certain number of individuals that, as said, may
already be clustered, or each member will be considered individually.

- A similarity criterion should be established to create a matrix that allows us to
relate the similarities of individuals with each other.

- A classification algorithm will be chosen in order to establish the clustering
structure of individuals.

- That structure will be represented through tree diagrams (dendrograms) or
other graphs.


To be able to make a good cluster analysis that allows us to obtain groups of
customers that can be used to create higher value, the following should be considered
since this is key for a good classification:

- A good selection of the variables that will describe individuals.

- The similarity criteria to be used.

- A proper selection of the classification algorithm.

Similarity criteria

As stated previously, similarity criteria are very important to analyse clusters. This is
the starting point for establishing clusters of our customers.

Once the selection variables to be considered in our analysis are established, similarity
criteria should be established. In other words, we should establish how similar or
different the individuals are to each other since groups will be created based on this.

To measure how similar the individuals we want to cluster are, there is a wide range of
similarity and dissimilarity indices. Here we will focus on the concept of distance. Four
types will be differentiated:

- Euclidean distance

- Hamming distance

- Manhattan distance

- Cosine similarity


*Example of a distance map. Source:


Below is a brief introduction to these distances.

- Euclidean distance

It is the most popular type of distance and it measures the straight-line distance
between two points.

This distance will be used when variables are homogeneous and are measured in
similar units and/or when the variance matrix is unknown.
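
As a small illustration, the Euclidean distance between two customers described by
numeric variables can be computed as the square root of the sum of squared
differences. The two customers and their variables (age in years, purchases per
month) are hypothetical.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two points with the same number of variables."""
    if len(a) != len(b):
        raise ValueError("both points need the same number of variables")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical customers: (age in years, purchases per month).
customer_a = (30, 4)
customer_b = (33, 8)
distance = euclidean_distance(customer_a, customer_b)  # sqrt(3**2 + 4**2)
```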

The following link is an example of the Euclidean distance:



- Hamming distance

This distance is used for categorical variables such as man/woman or
large/medium/small. Distance can be defined as 0 if two points are in the same category
and as 1 otherwise. Categories can also be sorted to know those that are closer to each
other so a number sequence like 1, 2 and 3 could be used.
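
A minimal sketch of this definition: each categorical variable contributes 0 when both
customers share the category and 1 otherwise, and the contributions are added up. The
variables and categories used here are hypothetical.

```python
def hamming_distance(a, b):
    """Number of categorical variables in which two records differ."""
    if len(a) != len(b):
        raise ValueError("both records need the same number of variables")
    return sum(1 for x, y in zip(a, b) if x != y)


# Hypothetical customers: (gender, company size, area).
customer_a = ("woman", "large", "urban")
customer_b = ("man", "large", "rural")
distance = hamming_distance(customer_a, customer_b)  # differs in gender and area
```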

The following link is an example of the Hamming distance:


- Manhattan distance

The distance between two points measured along axes at right angles. Manhattan
distance is often used in integrated circuits where wires only run parallel to the X or Y
axis.
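
As an illustration, the Manhattan distance adds up the absolute differences along each
axis instead of taking the straight line. The two points are arbitrary example values.

```python
def manhattan_distance(a, b):
    """Sum of absolute differences along each axis."""
    if len(a) != len(b):
        raise ValueError("both points need the same number of variables")
    return sum(abs(x - y) for x, y in zip(a, b))


point_a = (1, 2)
point_b = (4, 6)
distance = manhattan_distance(point_a, point_b)  # |1 - 4| + |2 - 6|
```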

The following link is an example of the Manhattan distance:


- Cosine similarity

Measures the cosine of the smallest angle between two vectors (the angle between two
vectors is assumed to be between 0 and 90 degrees).

The following should be considered with regard to this measure:

- Two perpendicular vectors (90 degrees) are the most dissimilar.

- Two parallel vectors are the most similar. They are even identical if both are
assumed to be based on the origin.
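
The two cases above can be checked with a short sketch that divides the dot product of
two vectors by the product of their lengths. The vectors are hypothetical counts of
purchases per product category.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Hypothetical purchases per category for three customers.
parallel = cosine_similarity((3, 0, 1), (6, 0, 2))       # same direction, close to 1
perpendicular = cosine_similarity((1, 0, 0), (0, 1, 0))  # nothing in common, 0
```

Note that only the direction of the vectors matters here: a customer who buys twice as
much of everything is considered fully similar.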


The following link is an example of the Cosine similarity:


Data preparation

Having discussed the distances in cluster analysis, it is worth dealing with data
preparation. As mentioned throughout the manual, it is not enough to just take the
data and analyse and segment them; it is also very important to prepare such data and
verify that there is enough correct and valid information for the analysis.

If a cosmetics company wants to establish age groups with gender differentiation for
their best-selling products, they should have the whole information since, otherwise,
they can hardly segment correctly. If they do not know the age of part of their
customers, that part will not be segmented and the analysis and the following
campaign will not have the expected benefits.

Another essential aspect to take into account in data preparation refers to the units or
metrics in which some of these data are expressed as well as their names. We must
ensure that data are expressed in the same unit of measure or with the same name.
Otherwise, they will be considered as different variables.

For example, it is not the same to measure the height of someone as 1.60 m and 160
cm, or to write the name of a city as Barcelona, Barna or Bcn; three names that refer
to the same city but that would be considered different variables.
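
A minimal sketch of this kind of preparation is shown below: heights are converted to
a single unit, city names are mapped to a single spelling and records with a missing
age are flagged. The alias table, the field names and the records are hypothetical.

```python
# Hypothetical alias table: every known spelling maps to one canonical name.
CITY_ALIASES = {"barcelona": "Barcelona", "barna": "Barcelona", "bcn": "Barcelona"}


def normalise_city(name):
    cleaned = name.strip()
    return CITY_ALIASES.get(cleaned.lower(), cleaned)


def height_in_cm(value, unit):
    """Express a height in centimetres, whatever unit it was recorded in."""
    if unit == "cm":
        return float(value)
    if unit == "m":
        return round(value * 100, 1)
    raise ValueError(f"unknown unit: {unit}")


records = [
    {"city": "Barna", "height": 1.60, "unit": "m", "age": 34},
    {"city": "BCN", "height": 160, "unit": "cm", "age": None},
]

clean = [
    {
        "city": normalise_city(r["city"]),
        "height_cm": height_in_cm(r["height"], r["unit"]),
        "age": r["age"],
    }
    for r in records
]

# Records that cannot be segmented by age until the value is completed.
missing_age = [i for i, r in enumerate(clean) if r["age"] is None]
```

After this step both records share the same city name and the same height, so they are
no longer treated as different.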

Given these considerations, the two approaches mentioned at the beginning will be
discussed. After these, the PCA (principal component analysis) will be addressed.

▪ Hierarchical clustering

This approach presents a wide range of methods that aim to group clusters in order to
make a new cluster or to divide an already existing cluster into two different clusters.
Thus, if this agglomeration or division process is carried out successively, some
distance is minimised or some similarity measure is maximised.


Strategies for hierarchical clustering generally fall into two types:

- Agglomerative: This is also called the bottom-up approach. Bottom-up
algorithms treat each observation as a singleton cluster at the outset and then
successively merge (or agglomerate) pairs of clusters until all clusters have
been merged into a single cluster that contains all observations.

- Divisive: This variant is also called top-down clustering. All observations start in
one cluster, and splits are performed recursively as one moves down the
hierarchy.

These approaches allow us to build a classification tree called a dendrogram, where
the clustering process can be followed graphically since it shows which clusters merge,
at what level they do so and the value of the measure between clusters when they are
merged.

* Example of a dendrogram. Source: https://dlegorreta.wordpress.com/2015/03/18/analisis-de-



The different types that can be found in the bottom up and top down methods will not
be discussed. Their calculation will not be addressed either since it is currently
automated through computer software.
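
To give an idea of what such software does, the bottom-up approach can be sketched in a
few lines. The sketch assumes Euclidean distance and single linkage (the distance
between two clusters is that of their closest pair of points); this is one of several
possible choices and the points are hypothetical.

```python
import math


def single_linkage(cluster_a, cluster_b, points):
    """Distance between two clusters: distance of their closest pair of points."""
    return min(math.dist(points[i], points[j]) for i in cluster_a for j in cluster_b)


def agglomerative(points):
    """Merge the two closest clusters repeatedly until one cluster remains.

    Returns the list of merges as (cluster, cluster, distance), which is the
    information a dendrogram displays.
    """
    clusters = [[i] for i in range(len(points))]  # every point starts alone
    merges = []
    while len(clusters) > 1:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = single_linkage(clusters[a], clusters[b], points)
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((sorted(clusters[a]), sorted(clusters[b]), d))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    return merges


# Hypothetical customers described by two numeric variables.
points = [(1, 1), (1, 2), (8, 8), (8, 10)]
merges = agglomerative(points)
```

Each entry of `merges` records which clusters were joined and at what distance, from
the first pair of individual customers to the final single cluster.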

▪ K-means clustering

K-means clustering is a type of unsupervised learning which is used when you have
unlabelled data (i.e., data without defined categories or groups). The goal of this
algorithm is to find groups in the data, with the number of groups represented by the
variable K. The algorithm works iteratively to assign each data point to one of K groups
based on the features that are provided. Data points are clustered based on feature
similarity.

The K-means clustering algorithm uses iterative refinement to produce a final result.
The algorithm inputs are the number of clusters K and the data set. The data set is a
collection of features for each data point. The algorithm starts with initial estimates
for the K centroids, which can either be randomly generated or randomly selected from
the data set. The algorithm then iterates between two steps:

- Data assignment step: Each centroid defines one of the clusters. In this step,
each data point is assigned to its nearest centroid based on the squared
Euclidean distance.

- Centroid update step: In this step, the centroids are recomputed. This is done
by taking the mean of all data points assigned to that centroid's cluster.

This is the most popular clustering algorithm when data are numerical and the
distance between them is Euclidean. Although the number of clusters should be
chosen in advance, it is easy to apply and implement, in addition to being faster than
hierarchical clustering in large data sets.
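
The two steps described above can be sketched as follows. This is a minimal version
that assumes numeric data, picks the initial centroids at random from the data set and
stops when the assignments no longer change; the `seed` parameter and the sample points
are assumptions made for the example.

```python
import random


def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def k_means(points, k, max_iter=100, seed=0):
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initial estimates taken from the data
    assignment = None
    for _ in range(max_iter):
        # Data assignment step: each point goes to its nearest centroid.
        new_assignment = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_assignment == assignment:
            break  # nothing changed, so the result is stable
        assignment = new_assignment
        # Centroid update step: mean of the points assigned to each cluster.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, assignment


# Hypothetical customers: two visibly separate groups.
points = [(1, 1), (1, 2), (2, 1), (9, 9), (9, 10), (10, 9)]
centroids, labels = k_means(points, k=2)
```

With this data the first three customers end up in one cluster and the last three in
the other; on less clearly separated data the result may depend on the initial
centroids.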


Before dealing with PCA in the development of clusters, it is important to consider the
following aspects regarding clustering:

- The main objective of clustering is to find the similarities between our data that
are not easily seen.

- In a good clustering, the points of the same cluster have to be more similar to
each other than to the points of other clusters.

- When clustering, the units of each variable matter. Different units cause
different distances and, potentially, different clusters.

- Clustering is used for data exploration or as a starting point for a more refined
algorithm.

- Different clustering algorithms give different results, so different views with
different numbers of clusters should be considered.
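
The effect of units can be illustrated with a small sketch in which the customer
nearest to a given one changes once every variable is put on a comparable scale.
Standardising each variable (subtracting its mean and dividing by its standard
deviation) is one common option assumed here; the customers and their values are
hypothetical.

```python
import math
from statistics import mean, pstdev


def standardise(rows):
    """Rescale every variable to mean 0 and standard deviation 1.

    Assumes that no variable is constant (its standard deviation is not 0).
    """
    columns = list(zip(*rows))
    means = [mean(c) for c in columns]
    deviations = [pstdev(c) for c in columns]
    return [
        tuple((v - m) / s for v, m, s in zip(row, means, deviations))
        for row in rows
    ]


def nearest(rows, index):
    """Index of the row closest (Euclidean distance) to rows[index]."""
    others = (j for j in range(len(rows)) if j != index)
    return min(others, key=lambda j: math.dist(rows[index], rows[j]))


# Hypothetical customers: (age in years, annual income in euros).
customers = [(25, 30_000), (27, 32_000), (60, 30_500), (45, 60_000)]

raw_nearest = nearest(customers, 0)                  # income dominates the distance
scaled_nearest = nearest(standardise(customers), 0)  # both variables count
```

With the raw data, the customer closest to the first one is the 60-year-old with a
similar income; after standardising, it is the 27-year-old with a slightly higher
income.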

Given these considerations, principal component analysis will be now discussed.

▪ PCA for clustering

It is common practice to apply PCA (principal component analysis) before a clustering
algorithm (such as k-means). It is believed that it improves the clustering results in
practice (noise reduction).

Principal component analysis is a statistical technique used to emphasise variation and
bring out strong patterns in a dataset. It reduces a large set of possibly correlated
variables into a smaller set that still contains most of the information in the
large set.

The new principal components after the reduction will be linear combinations of the
original variables and they will also be uncorrelated with each other.

Interpreting factors is fundamental in PCA since this is not given and should be
deduced after observing the relation of the factors with the initial variables. This
means that both the sign and the magnitude of the correlations should be studied. This
task is not always easy, so the knowledge of the expert on the subject of research will
be essential.

The following phases of the principal component analysis should be considered:

- Correlation analysis: principal component analysis makes sense provided there
are high correlations between the different variables since this is indicative of
redundant information and therefore, few factors will explain much of the total
variability.

- Selection of the factors: this selection is made in such a way that the first factor
gathers as much as possible of the original variability, the second factor should
collect the maximum possible variability not collected by the first factor and so
on. Of the total of these factors, those that collect the percentage of variability
that is considered sufficient will be chosen. These selected factors will be called
principal components.

- Factor analysis: when the principal components are already selected, they are
represented as a matrix. Each element of this matrix represents the factorial
coefficients of the variables (the correlations between the variables and the
principal components). This matrix will have as many columns as principal
components and as many rows as variables.

- Factor interpretation: for a factor to be easily interpretable, it must meet the
following characteristics:

o Factor coefficients should be close to 1.

o A variable should have high coefficients with only one factor.

o There should not be factors with similar coefficients.

- Calculation of factor scores: these are the scores of the principal components for
each case, which allow the cases to be represented graphically.
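
Some of these phases can be sketched for the first principal component only. The sketch
below works on the covariance matrix and finds its main direction by power iteration,
which is just one simple way of doing it; statistical software normally extracts all
components at once and often works on standardised variables instead. The data (visits
per month, monthly spend) are hypothetical.

```python
import math


def covariance_matrix(rows):
    """Return the sample covariance matrix and the mean-centred data."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centred = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [
        [sum(c[a] * c[b] for c in centred) / (n - 1) for b in range(d)]
        for a in range(d)
    ]
    return cov, centred


def first_component(cov, iterations=200):
    """Direction of largest variance and its variance, by power iteration.

    Assumes the data vary and that the starting vector is not perpendicular
    to the first component.
    """
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[a][b] * v[b] for b in range(d)) for a in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    variance = sum(v[a] * sum(cov[a][b] * v[b] for b in range(d)) for a in range(d))
    return v, variance


# Hypothetical customers: (visits per month, monthly spend in euros).
rows = [(2, 40), (4, 85), (6, 118), (8, 165), (10, 195)]
cov, centred = covariance_matrix(rows)
component, variance = first_component(cov)

# Share of the total variability collected by the first component.
explained_ratio = variance / sum(cov[j][j] for j in range(len(cov)))

# Factor scores: position of each customer along the first component.
scores = [sum(c[j] * component[j] for j in range(len(component))) for c in centred]
```

Because the two variables are highly correlated, a single component collects almost all
of the variability, which is the situation described in the correlation analysis phase.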



After dealing with the theory of how customer analytics works and all the elements and
considerations that should be taken into account for its implementation to be
beneficial for our company or organisation, this section will offer some practical
examples and information.


As seen, the customer journey is a tool that allows us to visually trace and see the
journey of customers in our company.

The following links offer practical examples on how to effectively create a customer
journey map.

Link: https://www.ngdata.com/how-to-create-a-customer-journey-map/

Link: https://business.tutsplus.com/tutorials/customer-journey-map--cms-27014

Link: https://www.youtube.com/watch?v=A2LFJF1SUBg


Introduction to R and RStudio

Link: http://ncss-

RStudio Tutorial For Beginners | RStudio Installation

Link: https://www.youtube.com/watch?v=mcYcjH-1giM



Conduct and interpret a cluster analysis

Link: http://www.statisticssolutions.com/cluster-analysis-2/

Cluster analysis, a practical example

Link: https://www.focus-


These links extend the information regarding hierarchical clustering. The different
types of methods and the formulas for their calculation are presented.

Hierarchical clustering methods

Link: https://www.saedsayad.com/clustering_hierarchical.htm

Link: https://uc-r.github.io/hc_clustering


The following links are practical examples of this algorithm that will help us to review
and expand the previous information.

Understanding K-Means clustering with examples

Link: https://www.edureka.co/blog/k-means-clustering/

Link: https://people.revoledu.com/kardi/tutorial/kMean/NumericalExample.htm


Link: https://www.utdallas.edu/~herve/abdi-awPCA2010.pdf

Link: https://www.youtube.com/watch?v=eJ08Gdl5LH0
