Telecom_Customer_Churn
This work is protected by copyright. You are free to use this resource, with proper attribution, for
research and educational purposes. Other uses, such as reproduction or publication, may require the
permission of the copyright holder.
Telecom Customer Churn Prediction
Abhijit Sundararajan
Department of MSIS, Rutgers University
E-mail: abhijit.sundararajan@rutgers.edu
Kemal Gursoy
Department of MSIS, Rutgers University
E-mail: kgursoy@business.rutgers.edu
Abstract
Customer churn, also called customer attrition or customer defection, is the rate at which
customers are lost. Telecom companies often use customer churn as a key business metric to
predict the number of customers that will leave a telecom service provider. Churn is significant
in the telecommunication industry because it directly affects the competitiveness of the service
provider. Churn is the proportion of clients leaving the service provider. The service provider
should therefore find new clients to preserve profitability, but this is often impractical because
acquiring new customers is expensive and difficult. The cost of retaining an existing customer is
far less than that of acquiring a new one, so telephone service companies, Internet service
providers, pay TV companies, insurance firms and alarm monitoring services use customer churn
to predict profitability. Businesses typically consider a customer to have churned once a certain
amount of time has passed since the customer's last interaction with the site or service. The full
cost of customer churn includes both lost revenue and the marketing costs involved with replacing
those customers with new ones. Reducing customer churn is important because the cost of
acquiring a new customer is higher than that of retaining an existing one. This case study concerns
the telecom industry, where organizations want to know, given certain parameters, whether a
customer will churn or not.

1. Introduction
Customer churn is a common problem for businesses across many industries. Churn carries a
substantial financial loss, because businesses must invest large sums of money in acquiring new
customers, and time and effort must go into replacing the customers that are lost. Businesses
need to be able to predict when a client is likely to leave and to offer that client attractive
incentives to stay. The main purpose of predictive analysis here is to identify the customers that
are likely to churn.

Companies typically distinguish between voluntary churn and involuntary churn. Voluntary churn
occurs because of a decision by the customer to switch to another company or service provider.
Involuntary churn happens because of circumstances such as a customer's relocation to a
long-term care facility, death, or relocation to a distant location. In most applications,
involuntary reasons for churn are excluded for analytical purposes. Analysts tend to concentrate
on voluntary churn, as it normally occurs because of factors of the company-customer
relationship which companies control, such as how billing interactions are handled or how
after-sales assistance is provided. Predictive analytics uses churn prediction models that predict
customer churn by assessing each customer's propensity to churn. Since these models generate a
small prioritized list of potential defectors, they are effective at focusing customer retention
marketing programs on the subset of the customer base that is most likely to churn.
There are two types of churners: voluntary churners and involuntary churners.

Involuntary churners: This type includes the customers that the service provider removes from
the subscription list. Customers in this category include those who are removed for cheating or
fraudulent activity, those who do not pay their subscription charges, and those who do not use
the services.

Figure 1.
2. Random Forest
Random forest is used for classification problems. It is a combination of tree predictors such
that each tree depends on the values of a random vector sampled independently and with the
same distribution for all trees in the forest. After many trees are generated, they vote for the
most popular class. This procedure is called random forests. Growing an ensemble of trees and
letting them vote for the most popular class has produced significant improvements in
classification accuracy.

3. Support Vector Machine
Support Vector Machine is a supervised learning algorithm that can be used for both
classification and regression problems, although it is mostly used for classification. In this
algorithm, we plot each data item as a point in n-dimensional space (where n is the number of
features), with the value of each feature being the value of a coordinate. Then, we perform
classification by finding the hyperplane that best separates the two classes.

4. Decision Tree
Decision trees can be applied to both classification and regression problems. A classification
tree is used to predict a qualitative response rather than a quantitative one: we predict that each
observation belongs to the most commonly occurring class in the region of the tree into which it
falls. It is a type of supervised learning algorithm with a predefined target variable. While mostly
used in classification tasks, it can handle numeric data as well. The algorithm splits a data sample
into two or more homogeneous sets based on the most significant differentiator among the input
variables. Each split generates a part of the tree. As a result, a tree with decision nodes and leaf
nodes (which are decisions or classifications) is developed. A tree starts from a root node, which
is the best predictor. Of the three variables we use, Contract is the most important for predicting
whether a customer churns. If a customer is on a one-year or two-year contract, then whether or
not he or she uses Paperless Billing, he or she is less likely to churn. On the other hand, if a
customer is on a month-to-month contract, is in the 0-12 month tenure group, and uses Paperless
Billing, he or she is more likely to churn.

4. Experiments
Figure 2.
We can infer from Figure 2 that 27% of the customers in the Kaggle dataset have churned.

5. Comparison of Methods
Models                      Accuracy Score
Random Forest Classifier    0.9355
SVM                         0.8192
Decision Tree               0.762
Logistic Regression         0.7894

6. Results and Discussion
We observe that the Support Vector Machine model is a good fit, because the AUC values for
the training and test sets are high and similar. We may consider improving the SVM model by
performing a grid search, using k-fold cross-validation, for the values of C, gamma and the degree
of the kernel that give higher accuracy.
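The Results and Discussion section judges the SVM fit by checking that the training and test AUC values are both high and close to each other. The following is a minimal sketch of that check, not the authors' code: the function name `auc`, the labels and the scores are all hypothetical, and the AUC is computed by direct pairwise comparison rather than with a library routine.

```python
def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen churner (label 1) receives
    a higher score than a randomly chosen non-churner (label 0); ties count 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one example of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical labels and model scores, for illustration only.
train_labels = [1, 0, 1, 0, 1, 0]
train_scores = [0.9, 0.2, 0.7, 0.4, 0.6, 0.1]
test_labels = [1, 0, 1, 0]
test_scores = [0.8, 0.3, 0.4, 0.6]

train_auc = auc(train_labels, train_scores)
test_auc = auc(test_labels, test_scores)
gap = train_auc - test_auc
print(f"train AUC={train_auc:.2f}, test AUC={test_auc:.2f}, gap={gap:.2f}")
```

A small gap between the two values is what the paper describes as a good fit. A training AUC far above the test AUC would instead suggest that the model does not generalize well.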
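The grid search with k-fold cross-validation proposed for tuning the SVM can be sketched as below. This is an illustration under stated assumptions: to stay dependency-free it tunes a simple RBF-weighted voting classifier (a stand-in, not an SVM) that has only a `gamma` parameter, and the data are synthetic. All function names are hypothetical.

```python
import math
import random
from itertools import product


def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds of near-equal size."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def rbf_vote_predict(train_x, train_y, x, gamma):
    """Stand-in classifier (NOT an SVM): every training point votes for its
    class with weight exp(-gamma * squared distance to x)."""
    score = 0.0
    for xi, yi in zip(train_x, train_y):
        d2 = sum((a - b) ** 2 for a, b in zip(xi, x))
        score += (1 if yi == 1 else -1) * math.exp(-gamma * d2)
    return 1 if score >= 0 else 0


def cv_accuracy(xs, ys, gamma, k=5):
    """Mean held-out accuracy over k folds."""
    accs = []
    for fold in k_fold_indices(len(xs), k):
        held = set(fold)
        tr_x = [x for i, x in enumerate(xs) if i not in held]
        tr_y = [y for i, y in enumerate(ys) if i not in held]
        hits = sum(rbf_vote_predict(tr_x, tr_y, xs[i], gamma) == ys[i]
                   for i in fold)
        accs.append(hits / len(fold))
    return sum(accs) / len(accs)


def grid_search(xs, ys, grid, k=5):
    """Score every parameter combination; return (best_params, best_score)."""
    names = list(grid)
    best = (None, -1.0)
    for combo in product(*(grid[name] for name in names)):
        params = dict(zip(names, combo))
        score = cv_accuracy(xs, ys, k=k, **params)
        if score > best[1]:
            best = (params, score)
    return best


# Synthetic two-class data (two Gaussian blobs), shuffled so folds are mixed.
random.seed(0)
xs = [(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(30)]
xs += [(random.gauss(3, 1), random.gauss(3, 1)) for _ in range(30)]
ys = [0] * 30 + [1] * 30
order = list(range(len(xs)))
random.shuffle(order)
xs = [xs[i] for i in order]
ys = [ys[i] for i in order]

best_params, best_score = grid_search(xs, ys, {"gamma": [0.01, 0.1, 1.0, 10.0]})
print(best_params, round(best_score, 3))
```

For an actual SVM the grid would also include C and, for a polynomial kernel, the degree. In practice a library routine such as scikit-learn's `GridSearchCV` wrapped around `SVC` would normally replace the hand-written loops.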
Conclusion
From the comparison above, we can see that SVM, Logistic Regression and Random Forest
performed better than Decision Tree for customer churn analysis on this dataset. The following
can be observed from the dataset:
1. Attributes and features such as tenure group,
Contract, Paperless Billing, Monthly Charges and
Internet Service appear to play a role in customer
churn.
2. There seems to be no relationship between gender and the churn rate.
3. Customers on a month-to-month contract, with Paperless Billing and within 12 months of
tenure, are more likely to churn. On the other hand, customers on a one- or two-year contract,
with more than 12 months of tenure, who are not using Paperless Billing, are less likely to
churn.
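Observations like those above come from comparing churn rates across customer segments. The following is a minimal sketch of that comparison. The records are made-up toy rows, not values from the Kaggle dataset, and the field names (`contract`, `paperless`, `tenure`, `churn`) and the helper `churn_rate_by` are assumptions for illustration.

```python
from collections import defaultdict


def churn_rate_by(records, key):
    """Return {segment: fraction churned} for segments defined by key(record)."""
    totals = defaultdict(int)
    churned = defaultdict(int)
    for r in records:
        seg = key(r)
        totals[seg] += 1
        if r["churn"]:
            churned[seg] += 1
    return {seg: churned[seg] / totals[seg] for seg in totals}


# Toy records invented for this example only.
records = [
    {"contract": "Month-to-month", "paperless": True, "tenure": 3, "churn": True},
    {"contract": "Month-to-month", "paperless": True, "tenure": 8, "churn": True},
    {"contract": "Month-to-month", "paperless": False, "tenure": 10, "churn": False},
    {"contract": "One year", "paperless": False, "tenure": 30, "churn": False},
    {"contract": "One year", "paperless": True, "tenure": 20, "churn": True},
    {"contract": "Two year", "paperless": True, "tenure": 50, "churn": False},
]

by_contract = churn_rate_by(records, lambda r: r["contract"])
by_tenure_group = churn_rate_by(
    records, lambda r: "0-12 months" if r["tenure"] <= 12 else "over 12 months"
)
print(by_contract)
print(by_tenure_group)
```

The same helper accepts any segment definition, so a combined key such as `lambda r: (r["contract"], r["paperless"])` would examine the interaction described in item 3.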
References
1. https://www.datasciencecentral.com/profiles/blogs/customer-churn-logistic-regression-with-r
2. https://pdfs.semanticscholar.org/75d3/73f987be5c2fb5a3cb1830f417c63f09a68d.pdf
3. https://datascienceplus.com/predict-customer-churn-logistic-regression-decision-tree-and-random-forest/
4. https://www.kaggle.com/pavanraj159/telecom-customer-churn-prediction
5. https://towardsdatascience.com/hands-on-predict-customer-churn-5c2a42806266