Customer Segmentation Using K-Means Clustering Report - ML3

This document is a project report submitted by four students on the topic of "Customer Segmentation Using K-Means Clustering". It discusses customer relationship management (CRM) and the importance of customer segmentation within a CRM framework. Customer segmentation involves using clustering techniques to group customers into homogeneous clusters based on similar characteristics. The report explores clustering models for segmentation, focusing on the K-Means and hierarchical clustering techniques. It aims to develop a hybrid model that combines the two approaches to improve performance over individual models.

Bansilal Ramnath Agarwal Charitable Trust’s

Vishwakarma Institute of Information Technology


(Department of Electronics & Telecommunication)

A
Project entitled

“Customer Segmentation Using K-Means Clustering”


Submitted by

Aditya Mandekar (21820105)
Sujitkumar Sah (17U675)
Naimish Sukhdeve (21820107)
Manasi Mathkari (21820054)

T.Y. Electronics & Tele-Communication

Under supervision of

Prof.Mrs.A.P.Navghane
Prof.Mr.Yogesh Dandawate

Year 2019-2020


CERTIFICATE

This is to certify that the project “Customer Segmentation Using K-means Clustering” has
been successfully completed by,

Aditya Mandekar
Sujitkumar Sah
Naimish Sukhdeve
Manasi Mathkari

It is a work done by the students and has not been submitted previously by any other
student/students.

The work is done, on the basis of the work allotted to these students, based on various Project
ideas presented by them.

This project report is being submitted as a part of the subject Project at T.Y.-E&TC

Prof. Mrs. A.P. Navghane (Project Guide)

Prof. (Dr) S.V Kulkarni (H.O.D. ENTC)

INDEX

Sr. no. Contents

1. Abstract

2. Introduction

3. Concept of CRM and usability of project

4. Clustering techniques

5. Merits & Demerits

6. Programming details & Graphs

7. Conclusion

8. References

Abstract

For any businessperson, the customer is God, so fulfilling customer needs is one of the topmost

priorities of a successful business. Rarely does one size fit all, and that is why we need customer

segmentation as the base of any marketing strategy. Every buyer has individual preferences,

needs, and behavioral patterns. The standards of good customer experience are ever increasing.

Segmentation helps to grow your business by building long-term relationships with your

customers and providing exceptional customer experiences.

Customer Relationship Management (CRM) has always played a

crucial role as a market strategy for providing organizations with the quintessential business

intelligence for building, managing and developing valuable long-term customer relationships. A

number of business enterprises have come to realize the significance of CRM and the application

of technical expertise to achieve competitive advantage.

This project explores the importance of Customer Segmentation as

a core function of CRM as well as the various models for segmenting customers using clustering

techniques. The available clustering models for customer segmentation, in general, and the major

models of K-Means and Hierarchical Clustering, in particular, are studied and the virtues and

vices of the techniques are pointed out. Finally, the possibility of developing a hybrid solution by

the combination of the above two techniques, having the ability to outperform the individual

models, is discussed.

Introduction

In the contemporary day and age, the importance of treating customers as the principal asset of

an organization is increasing. Organizations are rapidly investing in developing

strategies for better customer acquisition, maintenance and development. The concept of

business intelligence has a crucial role to play in making it possible for organizations to use

technical expertise for acquiring better customer insight for outreach programs. In this scenario,

the concept of CRM garners much attention since it is a comprehensive process of acquiring and

retaining customers, using business intelligence, to maximize the customer value for a business

enterprise.

One of the two most important objectives of CRM is customer development

through customer insight. This objective of CRM entails the usage of an analytical approach in

order to correctly assess customer information and analyze the value of customers for better

customer insight. Keeping up with the changing times, organizations are modifying their

business flow models by employing systems engineering as well as change management and

designing information technology (IT) solutions that aid them in acquiring new customers, help

retain the present customer base and boost the customer’s lifelong value. According to , due to

the diverse range of products and services available in the market as well as the intense

competition among organizations, customer relationship management has come to play a

significant role in the identification and analysis of a company’s best customers and the

adoption of best marketing strategies to achieve and sustain competitive advantage.

One of the most useful techniques in business analytics for the analysis of

consumer behavior and categorization is customer segmentation. By using clustering techniques,

customers with similar means, ends and behavior are grouped together into homogeneous clusters.

Customer Segmentation helps organizations in identifying or revealing distinct groups of

customers who think and function differently and follow varied approaches in their spending and

purchasing habits.

Clustering techniques reveal internally homogeneous and externally

heterogeneous groups. Customers vary in terms of behavior, needs, wants and characteristics and

the main goal of clustering techniques is to identify different customer types and segment the

customer base into clusters of similar profiles so that the process of target marketing can be

executed more efficiently. This study aims to explore the avenues of using customer

segmentation, as a business intelligence tool within the CRM framework as well as the use of

clustering techniques for helping organizations obtain a clearer picture of the valuable customer

base. The concepts of customer relationship management, customer segmentation as a core

function of CRM as well as the approach of segmenting customers using clustering techniques

are discussed.

The available clustering models for business analysis in the context of customer

segmentation, the advantages and disadvantages of the two main models chosen for our study,

K-Means and Hierarchical Clustering, as well as the possibility of developing a hybrid model

which can outperform the individual models, are surveyed.

Concept of CRM and usability of Project

A. Customer Relationship Management:

Customer Relationship Management is an important business approach for developing and

securing steady, long-term customer associations. The modern marketing approach promotes the

usage of CRM as part of the organization’s business strategy for enhancing customer service

satisfaction. CRM supports business enterprises in customer value analysis as well as the targeting

of those customers that prove of greater value. It also helps business organizations in developing

high-quality and long-term customer-company relationships that increase loyalty and profits. An

accurate evaluation of customer profitability and the targeting of high value customers are

important factors that contribute to the success of CRM.

CRM being a customer-centric strategy, it is important for business organizations

to be familiar with their customer base in terms of its characteristics and behavior. These insights

into the customer data can then become useful when it is employed in such Information

Technology (IT) solutions that provide valuable outputs for better targeting the profitable

customers.

CRM plays a major role in targeting the customer base, once it is identified using

essential segmentation strategies. According to the CRM strategy is a closed circular structure

with four dimensions: customer identification, customer attraction, customer retention, and

customer development. Thus, customer identification, which lays the foundation of this structure,

clearly implies that the act of grouping or segmenting customers according to their behavior and

characteristics, that is, customer segmentation, emerges as a core function of CRM.

B. Customer Segmentation :

As the market is widening, the rate of competition between all business entities is rapidly

growing. Hence, these business enterprises are increasing their expenditure on their marketing

strategies to achieve competitive advantage. In this context, the significance of employing

Information Technology (IT) solutions to marketing campaigns emerges as a pivotal step in a

modern approach to business. Customer Segmentation is a popular technique of partitioning the

customer base into externally distinct and internally uniform groups in order to create varied

marketing strategies for targeting each group according to its characteristics. Generally speaking,

it is defined as the process whereby the consumers of a business enterprise are divided into

groups according to their preferences, characteristics and purchasing behavior.

By studying and analyzing large volumes of collected customer data, businesses can improve

their marketing decisions based on the customer’s preferences. According to, maximum profits

can be generated for any business entity if the resources are utilized judiciously in order to

cultivate the most loyal and useful group of customers once customer segmentation and

clustering have enabled the allocation of customers to such groups. The total customer set can be

divided and grouped into clusters based on their buying behavior, frequency, demographics etc.

Hence, instead of studying each customer individually, firms can group similar customers

together so that their needs can be better understood.

C. Use of Customer Segmentation :

Target Marketing and Customer Segmentation are so closely related that they are often used

interchangeably. Target marketing refers to the grouping of buyers based on certain

characteristics which the firms intend to serve. It has been referred to as a personal branding

strategy in the context of a specific customer. There are three steps that ought to be followed in

order to devise segmentation-based marketing strategies. First, customers in the selected market

are segmented into different groups based on their characteristics. Secondly, the segments are

studied for their properties and the different ways in which marketing tactics can be applied to

that specific group. Lastly, the required comparisons of competing brands and studies of

customer behavior towards their products can be completed. Hence, a segmentation model which is

useful will be able to effectively increase the profitability and competitive value for a company.

The next section delves into the concept of using clustering for customer segmentation and the

various algorithms involved.

Clustering Techniques

A. Clustering for Segmentation Purposes

Clustering techniques reveal internally homogeneous and externally heterogeneous groups.

Customers vary in terms of behavior, needs, wants and characteristics and the main goal of

clustering techniques is to identify different customer types and segment the customer base into

clusters of similar profiles so that the process of target marketing can be executed more

efficiently. Both hierarchical and non-hierarchical clustering algorithms are widely used in

customer segmentation, most prominent among them being K-Means and Agglomerative

Hierarchical Clustering. In K-Means has been used as part of their clustering approach.

Although, hierarchical clustering algorithm seems unsuitable to many, have used it for intelligent

customer segmentation for their research and have made use of it for applying clustering

algorithms on the transaction data from a supermarket. K-means and Hierarchical Clustering

algorithms are useful for clustering data and find extensive usage in customer segmentation.

Hence, they will be our main focus of interest.

C. K-Means Clustering:

K-Means is one of the most widely used clustering algorithms, and is simple and efficient. The

aim of K-Means algorithm is to divide M points in N dimensions into K clusters (assume k

centroids) fixed a priori. These centroids should be placed carefully, because different initial

locations of the centroids can lead to different results. So, they should be

placed as far as possible from each other. Each data point is then taken and associated with the

nearest centroid until no data points are pending. This way an early grouping is done and at this

point, k new centroids have to be recalculated as these will be the centers of the clusters formed

earlier. After having calculated these centroids, the data points are then reallocated to the

nearest of the new centroids. Over these iterations, the centroids change their position stepwise until no

further modifications have to be done and the locations of the centroids remain fixed.

The K-Means algorithm is relatively simple.

The K cluster points, which will be the centroids, are placed in the space

among the data points. Each data point is assigned to the centroid for which the distance is the

least. After each data object has been assigned, centroids of the new groups are re-calculated.

The above two steps are repeated until the movement of the centroids ceases. This means that the

squared-error objective function has reached a local minimum and cannot be improved

further. Hence, we get K clusters as a result.
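
The two repeated steps described above (assign each point to its nearest centroid, then recompute each centroid) can be sketched in a few lines of Python. This is a minimal illustration and not the code used in this project: the function names, the toy points and the choice of random data points as initial centroids are all assumptions made for the example.

```python
import random


def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(axis) / n for axis in zip(*points))


def k_means(points, k, max_iter=100, seed=0):
    """Run the assign/update loop and return (centroids, labels)."""
    rng = random.Random(seed)
    # Assumption: k distinct data points serve as the initial centroids.
    centroids = rng.sample(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # no point changed cluster: converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = mean_point(members)
    return centroids, labels


# Hypothetical toy data: two visibly separate groups of three points each.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.0), (8.0, 9.5)]
centroids, labels = k_means(points, k=2)
print(labels)
print(centroids)
```

The loop stops as soon as an assignment step leaves every label unchanged, which corresponds to the point where the centroids no longer move.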

The K-Means algorithm aims at minimizing an objective function, which

here is the squared error. It is an indicator of the distance of the data points from their respective

cluster centers. The algorithm always terminates, but a globally optimal

configuration cannot be guaranteed even when the objective function stops improving.

The algorithm is also sensitive to the selection of the initial random cluster centers. That is why it

is usually run multiple times with different initial centers to reduce this effect. Even for a large number of data points, it tends to perform

very well even though it is iterative.
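
The squared-error objective can be written out explicitly. The notation is an assumption introduced here for illustration, since the report does not name these symbols: $C_j$ is the set of points in cluster $j$, $\mu_j$ is its centroid and $x_i$ is a data point.

```latex
J = \sum_{j=1}^{K} \sum_{x_i \in C_j} \lVert x_i - \mu_j \rVert^2,
\qquad
\mu_j = \frac{1}{|C_j|} \sum_{x_i \in C_j} x_i
```

Neither the assignment step nor the centroid update can increase $J$, which is why the iteration terminates. The value it settles on still depends on the initial centroids.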

Here we apply K-Means Clustering algorithm on a relatively small

dataset and the results are depicted. The dataset is based on customer information for a mall and

has 5 attributes named Customer, Genre, Age, Annual Income and Spending Score. It consists of

200 observations, each of which refers to a unique customer and the spending scores are decided

and calculated by the company, based on their spending habits. Hence annual income and

spending scores are the key indicators in this data. The age attribute of the customers can also be

experimented with, to analyze which age group works best for a business. Any business would

always keep the monetary values of any customer as top indicators. Thus, the annual income and

spending scores of the customers will be best suited for clustering.

As K-Means algorithm requires the number of clusters as input,

below we will use the elbow method to get the optimal number of clusters which can be formed.

It works on the principle that after a certain number of K clusters, the reduction in SSE (Sum of

Squared Errors) from adding another cluster becomes small and diminishes gradually. Here, the WCSS (Within-Cluster-

Sum-of-Squared-errors) metric is used as an indicator of the same. The K value at this point specifies

the number of clusters.

In Figure 1, it can be observed that an elbow point occurs at K=5. After K=5, the difference in

WCSS is not so visible. Hence, we will choose to have 5 clusters and provide the same as input

to the K-Means algorithm.
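
A rough sketch of how such a WCSS-versus-K curve can be computed is given below. It is not the project's code: the data are three synthetic blobs generated only for this example (not the mall dataset), and `kmeans_wcss`, `wcss_curve` and the `n_init` restart count are hypothetical names and choices. Each K is tried from several random starts and the lowest WCSS is kept, which also reduces the sensitivity to initial centers mentioned earlier.

```python
import random


def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans_wcss(points, k, rng, max_iter=100):
    """One K-Means run from random initial centroids; returns its WCSS."""
    centroids = rng.sample(points, k)
    for _ in range(max_iter):
        # Assign every point to the cluster of its nearest centroid.
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Recompute centroids; an empty cluster keeps its old centroid.
        new_centroids = [
            tuple(sum(axis) / len(members) for axis in zip(*members))
            if members else centroids[c]
            for c, members in enumerate(clusters)
        ]
        if new_centroids == centroids:  # centroids stopped moving
            break
        centroids = new_centroids
    return sum(min(sq_dist(p, c) for c in centroids) for p in points)


def wcss_curve(points, max_k, n_init=10, seed=0):
    """Best WCSS over n_init random starts, for K = 1 .. max_k."""
    rng = random.Random(seed)
    return [
        min(kmeans_wcss(points, k, rng) for _ in range(n_init))
        for k in range(1, max_k + 1)
    ]


# Synthetic example data: three Gaussian blobs of 20 points each.
data_rng = random.Random(42)
blob_centers = [(20, 80), (50, 50), (85, 20)]
points = [
    (data_rng.gauss(cx, 3), data_rng.gauss(cy, 3))
    for cx, cy in blob_centers
    for _ in range(20)
]

for k, wcss in enumerate(wcss_curve(points, max_k=6), start=1):
    print(k, round(wcss, 1))
```

For this synthetic data the drop in WCSS should flatten after K = 3, because that is how many blobs were generated. On the mall dataset the same kind of curve is what places the elbow at K = 5.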

As shown in Figure 2, the scatter plot of the clusters is created with Annual Income plotted

along the X-axis and Spending Score along the Y-axis. The data points under each cluster are

represented using distinct colours and the centroids are also highlighted.

D. Hierarchical Clustering :

Hierarchical clustering is a method of cluster analysis which builds a hierarchy of data points as

they move into a cluster or out of it. Strategies for this algorithm generally fall into two

categories.

1. Agglomerative - This is a bottom-up approach where each observation begins as its own

cluster and clusters are then merged as they move up the hierarchy.

2. Divisive - This is a top-down approach where there is only one cluster initially, which is

then split into finer clusters as they move down the hierarchy. Divisive methods are quite

rarely used in market research, and hence the agglomerative approach is the one that is

widely followed by practitioners.

The merging and splitting of clusters takes place in a greedy manner, and the hierarchical

algorithm yields a dendrogram which represents the nested grouping of patterns and the

levels at which groupings change. In each step of the agglomerative approach, the two

closest clusters are merged based on a specific linkage criterion. The linkage criterion

defines the distance between two clusters.

The grouping of data points depends very much on the choice of the linkage criterion.

Common choices are complete, single, Ward and average linkage. The time complexity of

linkage-based hierarchical clustering is high, generally O(n^3).

The algorithm for the agglomerative hierarchical clustering approach proceeds by taking

each observation in a cluster of its own. A pair of clusters with the shortest distance

between them is chosen. These two clusters are then merged and replaced with a single new

cluster. The previous two steps are repeated until

only one cluster remains and that cluster will contain all the observations.
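
The agglomerative procedure above can be sketched directly. This is a naive illustration under stated assumptions rather than an optimized or authoritative implementation: the function names and toy points are hypothetical, only single, complete and average linkage are included (Ward is omitted for brevity), and every pairwise cluster distance is recomputed at each step, which is what makes the approach expensive.

```python
import math

# Each linkage criterion reduces the list of point-to-point distances
# between two clusters to a single cluster-to-cluster distance.
LINKAGES = {
    "single": min,                                # closest pair of points
    "complete": max,                              # farthest pair of points
    "average": lambda ds: sum(ds) / len(ds),      # mean over all pairs
}


def cluster_distance(a, b, linkage):
    """Distance between clusters a and b under the given linkage."""
    distances = [math.dist(p, q) for p in a for q in b]
    return LINKAGES[linkage](distances)


def agglomerative(points, n_clusters=1, linkage="single"):
    """Merge the two closest clusters until n_clusters remain.

    Returns (clusters, merges), where merges records each merge as
    (cluster_a, cluster_b, distance) in the order it happened.
    """
    clusters = [[p] for p in points]  # every observation starts alone
    merges = []
    while len(clusters) > n_clusters:
        # Find the pair of clusters with the shortest distance.
        i, j = min(
            ((i, j) for i in range(len(clusters))
             for j in range(i + 1, len(clusters))),
            key=lambda ij: cluster_distance(
                clusters[ij[0]], clusters[ij[1]], linkage),
        )
        d = cluster_distance(clusters[i], clusters[j], linkage)
        merges.append((clusters[i], clusters[j], d))
        # Replace the two clusters with their union.
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters)
                    if idx not in (i, j)] + [merged]
    return clusters, merges


# Hypothetical toy data: two visibly separate groups of three points each.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.0), (8.0, 9.5)]
clusters, merges = agglomerative(points, n_clusters=2, linkage="single")
print(clusters)
```

The `merges` list holds the information a dendrogram displays: which clusters were joined and at what distance. Running with `n_clusters=1` reproduces the full hierarchy described above.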

Merits, Demerits & Applications

Merits-

◼ Determine appropriate product pricing.

◼ Develop customized marketing campaigns.

◼ Design an optimal distribution strategy.

◼ Choose specific product features for deployment.

◼ Prioritize new product development efforts.

Demerits-

◼ It is difficult to predict the value of K.

◼ It does not work well with global clusters.

◼ Different initial partitions can result in different final clusters.

◼ It does not work well with clusters (in the original data) of different size and different

density.

Programming details & Graphs

• We started with loading all the libraries and dependencies. The columns in the dataset are

customer id, gender, age, income and spending score.

• We dropped the id column as it is not relevant to the analysis. We also plotted

the age frequency of customers.

• Next we made a box plot of spending score and annual income to better visualize the

distribution range. The range of spending score is clearly more than the annual income

range.

• We made a bar plot to check the distribution of male and female population in the

dataset. The female population clearly outweighs the male counterpart.

• Next we made a bar plot to check the distribution of number of customers in each age

group. Clearly the 26–35 age group outweighs every other age group.

• We continued with making a bar plot to visualize the number of customers according to

their spending scores. The majority of the customers have spending score in the range

41–60.

• We also made a bar plot to visualize the number of customers according to their annual

income. The majority of the customers have annual income between 60000 and

90000.

• Next we plotted Within Cluster Sum Of Squares (WCSS) against the number of
clusters (K value) to figure out the optimal number of clusters. WCSS measures the
sum of squared distances of the n observations from their cluster centroids, which is given by the
formula

$$\mathrm{WCSS} = \sum_{i=1}^{n} \lVert X_i - Y_i \rVert^2$$

where Y_i is the centroid for observation X_i. WCSS keeps decreasing as the number of clusters
grows, and in the limiting case each data point becomes its own cluster centroid, so WCSS is zero.
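
The data preparation and counting steps from the list above (dropping the id column, then counting customers per gender, age group and spending-score range) can be sketched without any plotting library. The rows below are made-up placeholders in the column layout described in this report, not the real mall data. The header names, the helper names and every bin boundary other than the 26–35 and 41–60 ranges mentioned above are assumptions.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows; NOT taken from the actual dataset.
SAMPLE_CSV = """CustomerID,Gender,Age,Annual Income,Spending Score
1,Male,19,15000,39
2,Female,31,62000,55
3,Female,28,78000,47
4,Male,45,88000,17
5,Female,33,70000,60
6,Male,58,40000,52
"""

# Assumed bin edges (inclusive on both ends).
AGE_BINS = [(18, 25), (26, 35), (36, 45), (46, 55), (56, 120)]
SCORE_BINS = [(1, 20), (21, 40), (41, 60), (61, 80), (81, 100)]


def load_customers(text):
    """Parse the CSV text and drop the id column."""
    customers = []
    for row in csv.DictReader(io.StringIO(text)):
        row.pop("CustomerID")  # the id is not used in the analysis
        customers.append({
            "gender": row["Gender"],
            "age": int(row["Age"]),
            "income": int(row["Annual Income"]),
            "score": int(row["Spending Score"]),
        })
    return customers


def bin_label(value, bins):
    """Return the 'low-high' label of the bin containing value."""
    for low, high in bins:
        if low <= value <= high:
            return f"{low}-{high}"
    return "other"


customers = load_customers(SAMPLE_CSV)
gender_counts = Counter(c["gender"] for c in customers)
age_counts = Counter(bin_label(c["age"], AGE_BINS) for c in customers)
score_counts = Counter(bin_label(c["score"], SCORE_BINS) for c in customers)

print(gender_counts)
print(age_counts)
print(score_counts)
```

These counters hold the values that the bar plots display. The pairs `(c["income"], c["score"])` are the two features that would then be passed to the clustering step.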

The Elbow Method:
Calculate the Within Cluster Sum of Squared Errors (WSS) for different values of k, and choose
the k at which the decrease in WSS first starts to level off. In the plot of WSS versus k, this is visible as an
elbow.
The optimal K value is found to be 5 using the elbow method.
Finally, we made a 3D plot to visualize the spending score of the customers with their annual
income. The data points are separated into 5 classes which are represented in different colours as
shown in the 3D plot.

Final output:

Conclusion

• K-Means clustering is one of the most popular clustering algorithms and usually the first

thing practitioners apply when solving clustering tasks to get an idea of the structure of

the dataset.

• The goal of K-Means is to group data points into distinct non-overlapping subgroups.

• One of the major applications of K-Means clustering is segmentation of customers to get a

better understanding of them, which in turn could be used to increase the revenue of the

company.

References

▪ https://www.kdnuggets.com/2019/11/customer-segmentation-using-k-means-clustering.html

▪ www.wikipedia.org

▪ www.sciencepubco.com/index.php/IJET

▪ http://playwidtech.blogspot.com/2013/02/k-means-clustering-advantages-and.html
