
Customer Churn Analysis - Jupyter Notebook


07/12/2023, 15:50 Customer Churn Analysis - Jupyter Notebook

Import necessary libraries


In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

Read dataset
In [2]:
df = pd.read_csv('Tel_Customer_Churn_Dataset.csv')
df.head()

Out[2]:
   customerID  gender  SeniorCitizen  Partner  Dependents  tenure  PhoneService     MultipleLines  InternetService  ...
0  7590-VHVEG  Female              0      Yes          No       1            No  No phone service              DSL  ...
1  5575-GNVDE    Male              0       No          No      34           Yes                No              DSL  ...
2  3668-QPYBK    Male              0       No          No       2           Yes                No              DSL  ...
3  7795-CFOCW    Male              0       No          No      45            No  No phone service              DSL  ...
4  9237-HQITU  Female              0       No          No       2           Yes                No      Fiber optic  ...

5 rows × 21 columns

In [3]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7043 entries, 0 to 7042
Data columns (total 21 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 customerID 7043 non-null object
1 gender 7043 non-null object
2 SeniorCitizen 7043 non-null int64
3 Partner 7043 non-null object
4 Dependents 7043 non-null object
5 tenure 7043 non-null int64
6 PhoneService 7043 non-null object
7 MultipleLines 7043 non-null object
8 InternetService 7043 non-null object
9 OnlineSecurity 7043 non-null object
10 OnlineBackup 7043 non-null object
11 DeviceProtection 7043 non-null object
12 TechSupport 7043 non-null object
13 StreamingTV 7043 non-null object
14 StreamingMovies 7043 non-null object
15 Contract 7043 non-null object
16 PaperlessBilling 7043 non-null object
17 PaymentMethod 7043 non-null object
18 MonthlyCharges 7043 non-null float64
19 TotalCharges 7043 non-null object
20 Churn 7043 non-null object
dtypes: float64(1), int64(2), object(18)
memory usage: 1.1+ MB

Dropping unwanted columns

localhost:8888/notebooks/KINGS LABS/4. Customer churn analysis/Customer Churn Analysis.ipynb 1/10



In [4]:
df = df.drop(["customerID"], axis=1)
df.head()

Out[4]:
   gender  SeniorCitizen  Partner  Dependents  tenure  PhoneService     MultipleLines  InternetService  OnlineSecurity  ...
0  Female              0      Yes          No       1            No  No phone service              DSL              No  ...
1    Male              0       No          No      34           Yes                No              DSL             Yes  ...
2    Male              0       No          No       2           Yes                No              DSL             Yes  ...
3    Male              0       No          No      45            No  No phone service              DSL             Yes  ...
4  Female              0       No          No       2           Yes                No      Fiber optic              No  ...

Converting the 'TotalCharges' column to numeric values

In [5]:
df['TotalCharges'] = pd.to_numeric(df['TotalCharges'], errors='coerce')
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7043 entries, 0 to 7042
Data columns (total 20 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 gender 7043 non-null object
1 SeniorCitizen 7043 non-null int64
2 Partner 7043 non-null object
3 Dependents 7043 non-null object
4 tenure 7043 non-null int64
5 PhoneService 7043 non-null object
6 MultipleLines 7043 non-null object
7 InternetService 7043 non-null object
8 OnlineSecurity 7043 non-null object
9 OnlineBackup 7043 non-null object
10 DeviceProtection 7043 non-null object
11 TechSupport 7043 non-null object
12 StreamingTV 7043 non-null object
13 StreamingMovies 7043 non-null object
14 Contract 7043 non-null object
15 PaperlessBilling 7043 non-null object
16 PaymentMethod 7043 non-null object
17 MonthlyCharges 7043 non-null float64
18 TotalCharges 7032 non-null float64
19 Churn 7043 non-null object
dtypes: float64(2), int64(2), object(16)
memory usage: 1.1+ MB
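
The non-null count for TotalCharges fell from 7043 to 7032, so `errors='coerce'` turned 11 entries into NaN. A minimal sketch of how such entries could be inspected before they are coerced. The frame `raw` and its values are made up for illustration and are not rows from the dataset:

```python
import pandas as pd

# Hypothetical stand-in for the raw column: numeric strings plus blank entries.
raw = pd.DataFrame({"TotalCharges": ["29.85", " ", "1889.5", "", "108.15"]})

# Coerce to numeric; anything that cannot be parsed becomes NaN.
converted = pd.to_numeric(raw["TotalCharges"], errors="coerce")

# Select the original (string) rows that failed to parse.
bad_rows = raw[converted.isna()]
n_bad = int(converted.isna().sum())

print(bad_rows)
print("unparseable entries:", n_bad)
```

Looking at the failing rows first makes it easier to decide whether dropping them is reasonable.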

Checking for null values


In [6]:
df.isnull().sum()

Out[6]: gender 0
SeniorCitizen 0
Partner 0
Dependents 0
tenure 0
PhoneService 0
MultipleLines 0
InternetService 0
OnlineSecurity 0
OnlineBackup 0
DeviceProtection 0
TechSupport 0
StreamingTV 0
StreamingMovies 0
Contract 0
PaperlessBilling 0
PaymentMethod 0
MonthlyCharges 0
TotalCharges 11
Churn 0
dtype: int64

Treating null values


In [7]:
df = df.dropna()

In [8]:
df.isnull().sum().sum()

Out[8]: 0
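
Dropping the 11 rows is one option. Another is to fill the missing totals so that no customers are discarded. A rough sketch on made-up data follows. It assumes that a missing TotalCharges belongs to a customer with tenure 0 who has not been billed yet, which this notebook does not verify, so the code checks that assumption before filling:

```python
import numpy as np
import pandas as pd

# Hypothetical rows, not taken from the dataset.
demo = pd.DataFrame({
    "tenure": [0, 12, 0, 5],
    "MonthlyCharges": [20.0, 50.0, 80.0, 30.0],
    "TotalCharges": [np.nan, 600.0, np.nan, 150.0],
})

# Check the assumption: do all rows with a missing total have tenure 0?
missing = demo["TotalCharges"].isna()
all_new_customers = bool((demo.loc[missing, "tenure"] == 0).all())

# Fill with 0 only if the assumption holds; otherwise fall back to dropping.
if all_new_customers:
    filled = demo.fillna({"TotalCharges": 0.0})
else:
    filled = demo.dropna()

print(filled)
```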

Exploratory data analysis (EDA)


In [9]:
# Churn distribution
# Labels and bar heights come from the same value_counts() result,
# so they cannot end up in a different order.
churn_counts = df['Churn'].value_counts()

plt.figure(figsize=(8, 6))
plt.bar(churn_counts.index, churn_counts.values, color=['green', 'yellow'])
plt.title('Churn Distribution', fontsize=16, fontweight='bold')
plt.xlabel('Churn')
plt.ylabel('Count')
plt.grid(axis='y', linestyle='--', alpha=0.7)
plt.show()
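
The bar chart shows the class balance visually. A small sketch of how the same information could be read off as numbers. The series `churn` is a made-up stand-in for `df['Churn']`:

```python
import pandas as pd

# Hypothetical labels standing in for df['Churn'].
churn = pd.Series(["No", "No", "Yes", "No", "Yes", "No", "No", "No"], name="Churn")

counts = churn.value_counts()                # absolute counts, largest first
shares = churn.value_counts(normalize=True)  # fractions that sum to 1

# Combine both views; the Series are aligned on the class label.
summary = pd.DataFrame({"count": counts, "share": shares.round(3)})
print(summary)
```

The share of the positive class is useful later when judging accuracy, since a model that always predicts the majority class already reaches the majority share.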


In [10]:
# Churn vs gender
sns.countplot(x='gender', hue="Churn", data=df, palette=['green', 'yellow'], edgecolor = 'Bl
sns.despine()
plt.grid(axis='y', linestyle='--', alpha=0.7)

In [11]:
# Churn vs SeniorCitizen
sns.countplot(x='SeniorCitizen', hue="Churn", data=df, palette=['green', 'yellow'], edgecolo
sns.despine()
plt.grid(axis='y', linestyle='--', alpha=0.7)


In [12]:
# Churn vs Partner
sns.countplot(x='Partner', hue="Churn", data=df, palette=['green', 'yellow'], edgecolor = 'B
sns.despine()
plt.grid(axis='y', linestyle='--', alpha=0.7)

In [13]:
# Churn vs Dependents
sns.countplot(x='Dependents', hue="Churn", data=df, palette=['green', 'yellow'], edgecolor =
sns.despine()
plt.grid(axis='y', linestyle='--', alpha=0.7)


In [14]:
# Churn vs Contract
sns.countplot(x='Contract', hue="Churn", data=df, palette=['green', 'yellow'], edgecolor =
sns.despine()
plt.grid(axis='y', linestyle='--', alpha=0.7)
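
Count plots compare absolute numbers, which can hide the churn rate when the groups differ in size. A minimal sketch of computing the churn rate per category with a row-normalised crosstab. The frame `demo` and its category labels are illustrative assumptions, not values from the dataset:

```python
import pandas as pd

# Hypothetical sample; in the notebook, df would take the place of demo.
demo = pd.DataFrame({
    "Contract": ["Month-to-month", "Month-to-month", "One year",
                 "Two year", "Month-to-month", "One year"],
    "Churn": ["Yes", "No", "No", "No", "Yes", "Yes"],
})

# normalize="index" makes each row sum to 1,
# so the 'Yes' column is the churn rate within that contract type.
rates = pd.crosstab(demo["Contract"], demo["Churn"], normalize="index")
print(rates["Yes"].sort_values(ascending=False))
```

The same call could be repeated for gender, SeniorCitizen, Partner and Dependents by changing the first argument.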

In [15]:
# Churn vs MonthlyCharges
ax = sns.kdeplot(df['MonthlyCharges'][df["Churn"] == 'No'], fill = True,color='green')
ax = sns.kdeplot(df['MonthlyCharges'][df["Churn"] == 'Yes'],ax =ax, fill= True,color='skyblu
ax.legend(["Not Churn","Churn"],loc='upper right')
ax.set_ylabel('Density')
ax.set_xlabel('Monthly Charges')
ax.set_title('Distribution of monthly charges by churn')

Out[15]: Text(0.5, 1.0, 'Distribution of monthly charges by churn')


In [16]:
# Churn vs TotalCharges
ax = sns.kdeplot(df['TotalCharges'][df["Churn"] == 'No'], fill = True,color='green')
ax = sns.kdeplot(df['TotalCharges'][df["Churn"] == 'Yes'],ax =ax,fill= True,color='skyblue'
ax.legend(["Not Churn","Churn"],loc='upper right')
ax.set_ylabel('Density')
ax.set_xlabel('TotalCharges')
ax.set_title('Distribution of Total Charges by churn')

Out[16]: Text(0.5, 1.0, 'Distribution of Total Charges by churn')
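
The density plots can be complemented with a per-group summary, which gives concrete numbers to quote. A small sketch using the median of each charge column by churn label. The values in `demo` are invented for illustration:

```python
import pandas as pd

# Hypothetical sample standing in for df.
demo = pd.DataFrame({
    "Churn": ["No", "No", "Yes", "Yes", "No"],
    "MonthlyCharges": [20.0, 60.0, 90.0, 80.0, 40.0],
    "TotalCharges": [1000.0, 3000.0, 180.0, 400.0, 2000.0],
})

# Median of each numeric column within each churn group.
stats = demo.groupby("Churn")[["MonthlyCharges", "TotalCharges"]].median()
print(stats)
```

The median is used here because charge distributions are often skewed; `mean()` or `describe()` would work the same way.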

Label encoding
In [17]:
df["gender"] = df["gender"].map({"Female": 0, "Male": 1})
df["Partner"] = df["Partner"].map({"No": 0, "Yes": 1})
df["Dependents"] = df["Dependents"].map({"No": 0, "Yes": 1})
df["PhoneService"] = df["PhoneService"].map({"No": 0, "Yes": 1})
df["PaperlessBilling"] = df["PaperlessBilling"].map({"No": 0, "Yes": 1})
df["Churn"] = df["Churn"].map({"No": 0, "Yes": 1})
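
`Series.map` with a dictionary returns NaN for any value that is not a key, so a misspelled or unexpected label would silently become a null. A sketch of the same Yes/No encoding written as a loop with a check. The frame `demo` and the name `yes_no` are illustrative:

```python
import pandas as pd

# Hypothetical sample with two of the binary columns.
demo = pd.DataFrame({
    "Partner": ["Yes", "No", "Yes"],
    "PhoneService": ["No", "Yes", "Yes"],
})

yes_no = {"No": 0, "Yes": 1}
for col in ["Partner", "PhoneService"]:
    mapped = demo[col].map(yes_no)
    # Fail loudly if a value was missing from the dictionary.
    if mapped.isna().any():
        raise ValueError(f"unexpected value in column {col}")
    demo[col] = mapped.astype(int)

print(demo)
```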

In [18]:
df = pd.get_dummies(df, drop_first=True)
df.head()

Out[18]:
   gender  SeniorCitizen  Partner  Dependents  tenure  PhoneService  PaperlessBilling  MonthlyCharges  TotalCharges
0       0              0        1           0       1             0                 1           29.85         29.85
1       1              0        0           0      34             1                 0           56.95       1889.50
2       1              0        0           0       2             1                 1           53.85        108.15
3       1              0        0           0      45             0                 0           42.30       1840.75
4       0              0        0           0       2             1                 1           70.70        151.65

5 rows × 31 columns
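
`pd.get_dummies` expands each remaining text column into indicator columns, and `drop_first=True` omits the first category of each one, since it is implied when all the other indicators are 0. A minimal sketch on a single made-up column:

```python
import pandas as pd

# Hypothetical column with three categories.
demo = pd.DataFrame({"InternetService": ["DSL", "Fiber optic", "No", "DSL"]})

full = pd.get_dummies(demo)                      # one indicator per category
reduced = pd.get_dummies(demo, drop_first=True)  # first category left out

print(list(full.columns))
print(list(reduced.columns))
```

Dropping one level avoids perfectly collinear indicator columns, which mainly matters for linear models such as logistic regression.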

Assigning dependent and independent variables



In [19]:
X = df.drop(columns="Churn")
y = df["Churn"]

Machine Learning classification model libraries


In [20]:
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn import metrics
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

Splitting the dataset into training and testing set


In [21]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3, random_state =12
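
Because churners are the minority class, a purely random split can leave the train and test sets with slightly different class ratios. One possible variation is a stratified split. The sketch below uses synthetic data, and `test_size=0.2` and `random_state=0` are arbitrary choices for the example rather than the notebook's settings:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced target: 75 negatives and 25 positives.
X_demo = pd.DataFrame({"feature": np.arange(100)})
y_demo = pd.Series([0] * 75 + [1] * 25)

# stratify=y_demo keeps the class ratio the same in both partitions.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, stratify=y_demo, random_state=0
)

print("positive share in train:", y_tr.mean())
print("positive share in test:", y_te.mean())
```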

Logistic regression
In [22]:
logmodel = LogisticRegression(random_state=50)
logmodel.fit(X_train, y_train)
pred = logmodel.predict(X_test)

print(classification_report(y_test, pred))

              precision    recall  f1-score   support

           0       0.85      0.88      0.87      1567
           1       0.63      0.57      0.60       543

    accuracy                           0.80      2110
   macro avg       0.74      0.73      0.73      2110
weighted avg       0.80      0.80      0.80      2110

C:\Users\msi\anaconda3\Lib\site-packages\sklearn\linear_model\_logistic.py:460: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
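
The warning itself suggests two remedies: scaling the features and raising `max_iter`. A minimal sketch that applies both through a pipeline. It runs on synthetic data from `make_classification` rather than the churn data, `max_iter=1000` is an assumed limit, and the scores it prints say nothing about the results reported above:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for X and y.
X_demo, y_demo = make_classification(n_samples=500, n_features=10, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=0
)

# Standardise the features, then fit logistic regression with a higher iteration limit.
pipe = make_pipeline(
    StandardScaler(),
    LogisticRegression(max_iter=1000, random_state=50),
)
pipe.fit(X_tr, y_tr)

# Number of lbfgs iterations actually used, and test accuracy.
n_iter = int(pipe.named_steps["logisticregression"].n_iter_[0])
accuracy = pipe.score(X_te, y_te)
print("iterations used:", n_iter)
print("test accuracy:", round(accuracy, 3))
```

Putting the scaler inside the pipeline means it is fitted on the training split only, so no information from the test split leaks into preprocessing.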

Decision Tree


In [23]:
dtmodel = DecisionTreeClassifier(criterion = "gini", random_state = 50)
dtmodel.fit(X_train, y_train)
dt_pred = dtmodel.predict(X_test)

print(classification_report(y_test, dt_pred))

              precision    recall  f1-score   support

           0       0.83      0.79      0.81      1567
           1       0.47      0.52      0.49       543

    accuracy                           0.72      2110
   macro avg       0.65      0.66      0.65      2110
weighted avg       0.73      0.72      0.73      2110

Random Forest
In [24]:
rfmodel = RandomForestClassifier(n_estimators = 100, criterion = 'entropy', random_state = 0
rfmodel.fit(X_train, y_train)
rf_pred = rfmodel.predict(X_test)

print(classification_report(y_test, rf_pred))

              precision    recall  f1-score   support

           0       0.84      0.89      0.86      1567
           1       0.62      0.50      0.55       543

    accuracy                           0.79      2110
   macro avg       0.73      0.69      0.71      2110
weighted avg       0.78      0.79      0.78      2110
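
A fitted random forest also exposes impurity-based feature importances, which give a rough, model-based view of which columns drive the predictions. A sketch on synthetic data, where the feature names `feature_0` to `feature_4` are placeholders. In the notebook, `rfmodel` and `X.columns` would take the place of `forest` and `X_demo.columns`:

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the encoded feature matrix and target.
X_arr, y_demo = make_classification(
    n_samples=300, n_features=5, n_informative=3, random_state=0
)
X_demo = pd.DataFrame(X_arr, columns=[f"feature_{i}" for i in range(5)])

forest = RandomForestClassifier(n_estimators=100, criterion="entropy", random_state=0)
forest.fit(X_demo, y_demo)

# Importances sum to 1; sort so the most influential features come first.
importances = pd.Series(forest.feature_importances_, index=X_demo.columns)
importances = importances.sort_values(ascending=False)
print(importances)
```

Impurity-based importances tend to favour continuous or high-cardinality features, so they are best read as a hint rather than a ranking to rely on.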

In [25]:
# Factors contributing to customer attrition:
# 1. Contract - the churn rate is high for 'Month to month' contracts
# 2. Monthly charges - the churn rate is high when they are between 65 and 110
# 3. Total charges - the churn rate is high when they are below 2000
