ORIENTAL JOURNAL OF COMPUTER SCIENCE & TECHNOLOGY
An International Open Free Access, Peer Reviewed Research Journal
Published By: Oriental Scientific Publishing Co., India.
www.computerscijournal.org
ISSN: 0974-6471
March 2017, Vol. 10, No. (1): Pgs. 207-212

Churn Analysis in Telecommunication using Logistic Regression
Helen Treasa Sebastian* and Rupali Wagh

Department of Computer Science, Christ University, Bangalore, India.


*Corresponding author E-mail: helen.sebastian@cs.christuniversity.in

http://dx.doi.org/10.13005/ojcst/10.01.28

(Received: March 13, 2017; Accepted: March 16, 2017)

ABSTRACT

Since the beginning of data mining, the discovery of knowledge from databases has been used to solve various problems and has helped businesses arrive at practical solutions. Large companies are working to improve revenue in the face of increasing customer loss. The process in which a customer leaves one company and joins another is called churn. This paper discusses how to predict the customers who might churn; the prediction is done in R. R helps represent a large churn dataset in the form of graphs, depicting the outcome through various data visualizations. Churn is a very important area in which the telecom domain can gain or lose customers, and hence the industry spends a lot of time on prediction, which in turn helps it reach the necessary business conclusions. Churn can be avoided by studying the past history of the customers. Logistic regression is used to perform the analysis. Before logistic regression can be applied, the outliers present in the data must be eliminated; this has been achieved by cleaning the data (for redundancy, false data, etc.), and the result has been populated into a prediction Excel file, using which the analysis has been performed.

Keywords: Churn, R Tool, Telecommunication, Data Mining, Logistic Regression.

INTRODUCTION

There are a number of telecommunication networks available, and customers have the luxury of choosing one based on their requirements. The growing number of telecom operators is a challenge for telecom companies, and many of them are facing huge revenue losses. To win customers, many companies invest heavily at the outset, so it becomes very important to keep those customers in order to expand the business and recover the amount that has been invested.

The increase in the number of churn customers has become the present-day challenge for the telecom industry, and such customers create a financial burden for the company; identifying them is the objective of this research paper. Research indicates that the cost of acquiring a new customer is approximately five times higher than the cost of retaining an existing one.

Literature review
This paper provides an overview of performing logistic regression in RStudio to analyze CRM data and produce a churn prediction, which helps solve many business-related problems. It summarizes the prediction by representing the result graphically in Power BI, where the actuals and the predictions are shown together. From these, the accuracy of the model is computed, and based on that accuracy the business can decide whether this approach helps improve the business or whether a better approach must be followed. The voluntary and involuntary types of churn relevant to customer relationship management are also discussed briefly.

Chih-Ping Wei and I-Tang Chiu proposed a churn prediction technique for customer retention analysis; the authors applied the C4.5 decision tree approach to customer call details. Yi-Fan Wang, Ding-An Chiang and Mei-Hua Hsu discussed a recommender system for customer churn by proposing a decision tree algorithm; the data used for their analysis covered over 60,000 transactions from more than 4,000 members over a period of three months. Jadhav and Pawar designed a decision support system using a data mining technique with which the churn behavior of customers is predicted in advance; the authors used the back-propagation algorithm on customer billing data. Tomas Philip Rúnarsson, Ólafur Magnússon and Birgir Hrafnkelsson constructed a churn prediction model that outputs the probability that a customer will churn in the near future. In this paper, churn analysis is carried out for the telecom domain using logistic regression, and the result is then presented graphically in Power BI. The telecommunication industry can also apply this approach to customer retention activities within the context of its Customer Relationship Management efforts.

Churn prediction
In telecommunication, a customer leaving the current company and moving to another is called churn. In the present scenario, considering the surge in the number of churn customers, the industry is trying its best to retain its profitable customers; this is called churn management.

The objective of this paper is to classify customers according to whether they are likely to churn.

In many areas, statistical analysis is used to predict the customers who might churn.

The outcomes of churn analysis are as follows:
• Improved retention
• Propensity modelling
• Prioritized marketing
• Increased customer value

The types of churn can be classified as
I. Voluntary churn
II. Involuntary churn

When the company decides to terminate a customer, the churn is involuntary. The reasons for this may be fraudulent activity, non-payment of bills, or customers who do not use the services. When the customer decides to leave the company, the churn is voluntary, and this kind of churn happens abruptly and unexpectedly. It is therefore very important to identify the customers who might churn and prevent it, which can be achieved through churn analysis.

Churn management
Since acquiring new customers is challenging, it is very important to retain current customers.

Churn can be reduced by systematically analyzing the past history of potential churners. Large amounts of data are maintained about customers, and by analyzing this data properly it is possible to predict the customers who are likely to churn.

The available information can be analyzed in different ways, giving operators various means to anticipate churn and avoid it.
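The voluntary/involuntary distinction above is effectively a labelling rule based on who ended the contract. The Python sketch below (the paper itself works in R) encodes that rule for historical churn records; the `initiated_by` field and its values `"operator"` and `"customer"` are hypothetical, since the paper does not describe how termination events are recorded.

```python
from enum import Enum


class ChurnType(Enum):
    VOLUNTARY = "voluntary"      # the customer decides to leave
    INVOLUNTARY = "involuntary"  # the operator terminates the customer


def classify_churn(initiated_by: str) -> ChurnType:
    """Label a churn event by who ended the contract.

    `initiated_by` is a hypothetical field holding "operator" or "customer".
    Operator terminations (e.g. fraud, non-payment of bills, non-use of the
    service) are involuntary; customer departures are voluntary.
    """
    if initiated_by == "operator":
        return ChurnType.INVOLUNTARY
    if initiated_by == "customer":
        return ChurnType.VOLUNTARY
    raise ValueError(f"unknown initiator: {initiated_by!r}")


events = ["customer", "operator", "customer"]
print([classify_churn(e).value for e in events])
# ['voluntary', 'involuntary', 'voluntary']
```

Raising an error for unknown initiators, rather than defaulting to one type, keeps mislabelled records from silently entering the churn target.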

Data collection
For the analysis, the data available in the telecom dataset has been used, and the prediction has been made on it.

Data preparation
Before the data can be analyzed, it must be cleaned and made ready so that the desired results can be derived from it.

The data has to be cleaned so that redundancy and errors are removed, because such data would otherwise lead to incorrect results.

In this paper, churn analysis has been applied to telecom data; the aim is to identify the customers who might churn from the service provider. R programming is used for this, providing statistical computing on the available data, and backward logistic regression is used to achieve the goal. The end result gives the probability of churn for each customer.

Logistic regression is used here for the churn analysis. Logistic regression is a statistical method in which the outcome variable is categorical rather than continuous, and it restricts the prediction to the interval between zero and one.

In this paper, backward stepwise regression is used. This involves starting with all the variables, testing the deletion of each variable against a chosen criterion, and continuing until no further variable can be deleted without a statistically significant loss of fit.

The dataset used has 22 variables, related to Gender, customer_id, Phone Service, etc., and contains records for over 2000 customers.

After applying backward regression, the cleaned data was written to a new file called the "prediction" file, which has an extra column called "probability". This column gives the probability that each customer will churn from the telecom provider. Since the data is from 2016 and the customers who actually churned are known, the backward regression model is used to determine whether the model is accurate and, based on that accuracy, to decide whether it should be used for future predictions.

Prediction
The business is interested in the final product, so it is very important to present the result as a graphical representation that is understandable and helps the business make the predictions it needs, which in turn brings profit.

There are many tools that help achieve this, for example Tableau, Power BI, QlikView, etc.

Data visualization tools
The best way to get a message across is to use visualization tools: representing data visually makes it possible to uncover surprising patterns that would go unnoticed if we looked at the statistics alone.

Here, Power BI is the tool used for data visualization. Power BI is a business analytics tool provided by Microsoft with which reports can be created.

In this approach, the data has already been cleaned and the result populated into the file called "Prediction", which is used to show visually how the data appears and its impact.

The churn values are shown below.

The churn value in the graph (Fig. 1) is 20.93%; these are the customers who are likely to churn from the telecom service provider.

The graph (Fig. 2) shows that the remaining 79.07% would not churn from the service provider; they pose no risk to the business.

There are many factors on the basis of which we can conclude whether a customer would churn or not.
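The 20.93% / 79.07% split reported above is the share of customers on each side of a decision threshold applied to the "probability" column. A minimal Python sketch, assuming a cut-off of 0.5 (the paper does not state which threshold it used) and using made-up probabilities rather than the paper's data:

```python
def churn_split(probabilities, threshold=0.5):
    """Return (% predicted to churn, % predicted to stay), rounded to 2 decimals.

    The 0.5 threshold is an assumption; the paper does not state its cut-off.
    """
    if not probabilities:
        raise ValueError("need at least one probability")
    churners = sum(1 for p in probabilities if p >= threshold)
    churn_pct = 100.0 * churners / len(probabilities)
    return round(churn_pct, 2), round(100.0 - churn_pct, 2)


# Illustrative probabilities only, not the paper's data.
print(churn_split([0.81, 0.12, 0.35, 0.07, 0.44]))  # (20.0, 80.0)
```

Lowering the threshold flags more customers as likely churners, which trades extra retention effort for fewer missed churners.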

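The backward stepwise logistic regression described under Data preparation can be sketched in plain Python. This is an illustration under stated assumptions, not the authors' R code: the paper only says variables are deleted according to "a certain criterion", so the use of AIC (Akaike information criterion) here is an assumption. AIC is also the criterion used by R's `step()` function, which with `direction = "backward"` performs this kind of elimination on a `glm(..., family = binomial)` fit. The model below is fitted by simple batch gradient ascent, and the feature names in the usage example are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function 1 / (1 + exp(-z)); value lies between 0 and 1."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(rows, labels, features, lr=0.1, epochs=2000):
    """Fit an intercept and one weight per feature by batch gradient ascent
    on the mean log-likelihood. `rows` is a list of dicts (feature -> float)."""
    b = 0.0
    w = {f: 0.0 for f in features}
    n = len(labels)
    for _ in range(epochs):
        grad_b = 0.0
        grad_w = {f: 0.0 for f in features}
        for row, y in zip(rows, labels):
            err = y - sigmoid(b + sum(w[f] * row[f] for f in features))
            grad_b += err
            for f in features:
                grad_w[f] += err * row[f]
        b += lr * grad_b / n
        for f in features:
            w[f] += lr * grad_w[f] / n
    return b, w


def predict_proba(row, b, w):
    """Churn probability for one customer (what the "probability" column holds)."""
    return sigmoid(b + sum(w[f] * row[f] for f in w))


def aic(rows, labels, features):
    """Akaike information criterion of a fitted model: 2k - 2 * log-likelihood."""
    b, w = fit_logistic(rows, labels, features)
    log_lik = 0.0
    for row, y in zip(rows, labels):
        p = min(max(predict_proba(row, b, w), 1e-12), 1 - 1e-12)  # avoid log(0)
        log_lik += y * math.log(p) + (1 - y) * math.log(1 - p)
    k = len(features) + 1  # feature weights plus intercept
    return 2 * k - 2 * log_lik


def backward_stepwise(rows, labels, features):
    """Start from all features; repeatedly drop the one whose removal lowers
    AIC the most, and stop when no single removal improves AIC."""
    current = list(features)
    best = aic(rows, labels, current)
    while current:
        trials = [
            (aic(rows, labels, [f for f in current if f != drop]), drop)
            for drop in current
        ]
        score, drop = min(trials)
        if score >= best:
            break
        best = score
        current.remove(drop)
    return current


# Toy data (not the paper's): "always_zero" carries no information.
rows = [{"tenure_short": 1.0, "always_zero": 0.0}] * 5 + [
    {"tenure_short": 0.0, "always_zero": 0.0}
] * 5
labels = [1, 1, 1, 1, 0, 1, 0, 0, 0, 0]
kept = backward_stepwise(rows, labels, ["tenure_short", "always_zero"])
print(kept)  # ['tenure_short']
b, w = fit_logistic(rows, labels, kept)
print(round(predict_proba(rows[0], b, w), 2))  # 0.8
```

Gradient ascent keeps the sketch dependency-free; a real analysis would use a proper solver (such as R's `glm`), which converges faster and also reports standard errors.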
Churn can be predicted on the basis of various factors such as age, tenure, job, payment details, gender, call time, and tech support usage.

The following graphs are used to identify the segments, or kinds of customers, that are likely to churn.

Based on tenure
The graph (Fig. 3) shows that customers with a tenure of 0-30 months are the most likely to churn, those with 30-60 months are mostly unlikely to churn, and those with more than 60 months would ideally not churn.

Based on customers who use Tech Support
The graph (Fig. 4) shows that customers who use tech support tend not to churn, whereas customers who do not use technical support are likely to churn. This may be due to a lack of knowledge about the services the telecom provider offers, so it is very important to highlight the customer services that are available, so that they are put to the right use and customers are prevented from churning.

To check the accuracy of the model, this paper uses a confusion matrix table with the variables Actuals, Frequency, and Prediction; this table describes the performance of the model.

In this model, whenever "Actuals = 1" the prediction should also be 1, but as the confusion matrix (Figs. 5 and 6) shows, there are cases where "Actuals = 1" and "Predictions = 0". The model is therefore not fully reliable, since its accuracy is not 100%.

The accuracy of the model is 80.02%.

This accuracy is good enough for churn prediction, but it is not very high. Using SVM (support vector machines) in R may give more accurate probabilities and thus more reliable results; another way to increase accuracy is to increase the number of variables used.

When more variables are available for comparison, the result will be more precise, and the business can make near-real-time predictions with the given data.

Fig. 1: Churn prediction on the possible churn customers
Fig. 2: Churn prediction on the possible customers who would not churn
Fig. 3: Churn prediction based on tenure
Fig. 4: Churn prediction based on tech support
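The confusion-matrix check described above can be expressed in a few lines of Python. This is a hypothetical illustration, not the authors' R code: it counts (actual, predicted) pairs in the same spirit as the Actuals / Prediction / Frequency table and computes accuracy as the share of pairs on the diagonal, i.e. (TP + TN) / N. The 0.5 cut-off used to turn probabilities into 0/1 predictions is an assumption.

```python
from collections import Counter


def to_labels(probabilities, threshold=0.5):
    """Turn churn probabilities into 0/1 predictions (the 0.5 cut-off is assumed)."""
    return [1 if p >= threshold else 0 for p in probabilities]


def confusion_table(actuals, predictions):
    """Frequency of each (actual, predicted) pair."""
    if len(actuals) != len(predictions):
        raise ValueError("actuals and predictions must have the same length")
    return Counter(zip(actuals, predictions))


def accuracy(actuals, predictions):
    """(true negatives + true positives) / all customers."""
    table = confusion_table(actuals, predictions)
    total = sum(table.values())
    if total == 0:
        raise ValueError("no predictions to score")
    return (table[(0, 0)] + table[(1, 1)]) / total


# Illustrative values only, not the paper's data.
actuals = [1, 1, 0, 0, 1]
predicted = to_labels([0.9, 0.3, 0.2, 0.1, 0.7])
print(predicted)                                     # [1, 0, 0, 0, 1]
print(confusion_table(actuals, predicted)[(1, 0)])   # 1  (Actuals = 1, Predictions = 0)
print(accuracy(actuals, predicted))                  # 0.8
```

The (1, 0) cell counts churners the model missed; for retention work it is often worth tracking separately from overall accuracy.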

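The tenure and tech-support comparisons amount to computing a churn rate per customer segment. The Python sketch below is illustrative: the band edges follow the 0-30, 30-60 and above-60-month ranges in the text (assigning exactly 30 and 60 months to the lower band is an assumption), and the record fields `tenure`, `tech_support` and `churn` are hypothetical names, not necessarily those of the paper's dataset.

```python
from collections import defaultdict


def tenure_band(months):
    """Map tenure in months to the ranges discussed in the text."""
    if months < 0:
        raise ValueError("tenure cannot be negative")
    if months <= 30:
        return "0-30"
    if months <= 60:
        return "30-60"
    return ">60"


def churn_rate_by(records, segment):
    """Share of churned customers (churn == 1) within each segment."""
    totals = defaultdict(int)
    churned = defaultdict(int)
    for record in records:
        key = segment(record)
        totals[key] += 1
        churned[key] += record["churn"]
    return {key: churned[key] / totals[key] for key in totals}


# Illustrative records only, not the paper's data.
customers = [
    {"tenure": 4, "tech_support": "No", "churn": 1},
    {"tenure": 18, "tech_support": "No", "churn": 1},
    {"tenure": 25, "tech_support": "Yes", "churn": 0},
    {"tenure": 47, "tech_support": "Yes", "churn": 0},
    {"tenure": 72, "tech_support": "Yes", "churn": 0},
]
print(churn_rate_by(customers, lambda r: tenure_band(r["tenure"])))
print(churn_rate_by(customers, lambda r: r["tech_support"]))  # {'No': 1.0, 'Yes': 0.0}
```

Passing the segment as a function lets the same routine produce both the tenure view and the tech-support view.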
RESULTS
The telecommunication industry suffers from very high churn rates: when one operator offers a better plan than the customer's current one, there is a high probability that the customer will churn from the present provider. In such a scenario it is very difficult to avoid losses entirely, but prediction can keep them to a minimum. The method used in this paper is logistic regression (backward logistic regression), which helps identify the probable churn customers so that the necessary business decisions can be made. A decision tree may give a more appropriate result; with logistic regression, the result achieved is 80.02% accurate.

Future scope
The future scope of this project is to push the accuracy further toward 100%. For this, decision trees can be used, in particular hybrid classification techniques that bring out the relationship between churn prediction and customer lifetime value. The result and the accuracy can also be improved by using more variables in the data. The dynamic nature of the industry has made data mining increasingly significant, and the industry relies heavily on the predictions the data makes possible.

Fig. 5: Confusion Matrix
Fig. 6: Confusion Matrix
Fig. 7: Accuracy of the report

REFERENCES

1. Richard J. Roiger, Michael Geatz, Data Mining: A Tutorial-Based Primer.
2. http://www.adobe.com/in/solutions/digital-analytics/customer-churn-analysis.html
3. Business Intelligence and Insurance, White Paper, Wipro Technologies, Bangalore, 2001.
4. ftp://cran.r-project.org/pub/R/doc/contrib/Zhao_R_and_data_mining.pdf
5. Jiawei Han and Micheline Kamber, "Data Mining: Concepts and Techniques", http://www.cs.sfu.ca.
6. http://www.alteryx.com/solutions/customer-churn-analytics
7. L. Yangi, C. Chiu, Subscriber Churn Prediction in Telecommunications.
8. http://www.rdatamining.com/
9. N. Kamalraj, A. Malathi, "Applying Data Mining Techniques in Telecom Churn Prediction", International Journal of Advanced Research in Computer Science and Software Engineering, 10, October 2013.
10. http://www.ats.ucla.edu/stat/r/dae/logit.htm
11. Saad Ahmed Qureshi, Ammar Saleem, "Telecommunication Subscribers' Churn Prediction Model Using Machine Learning".
12. https://www.analyticsvidhya.com/blog/2015/11/beginners-guide-on-logistic-regression-in-r/
13. Rehman, Ali Mustafa Qamar, Aatif Kamal.
14. Gary M. Weiss, "Data Mining in the Telecommunications Industry", Fordham University, USA.
15. Allahyari Soeini and Keyvan Vahidy Rodpysh.
16. https://powerbi.microsoft.com/en-us/what-is-power-bi/
17. https://www.tutorialspoint.com/r/
18. Wei, I. Chiu, "Turning telecommunication call details to churn prediction: a data mining approach", Expert Systems with Applications.

You might also like