Customer Churn Analysis - Jupyter Notebook

07/12/2023, 15:50 Customer Churn Analysis - Jupyter Notebook
Import necessary libraries

In [1]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
Read dataset
In [2]:
df = pd.read_csv('Tel_Customer_Churn_Dataset.csv')
df.head()
Out[2]:
   customerID  gender  SeniorCitizen  Partner  Dependents  tenure  PhoneService  MultipleLines     InternetService  ...
0  7590-VHVEG  Female  0              Yes      No          1       No            No phone service  DSL              ...
1  5575-GNVDE  Male    0              No       No          34      Yes           No                DSL              ...
2  3668-QPYBK  Male    0              No       No          2       Yes           No                DSL              ...
3  7795-CFOCW  Male    0              No       No          45      No            No phone service  DSL              ...
4  9237-HQITU  Female  0              No       No          2       Yes           No                Fiber optic      ...
5 rows × 21 columns
In [3]:
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7043 entries, 0 to 7042
Data columns (total 21 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 customerID 7043 non-null object
1 gender 7043 non-null object
2 SeniorCitizen 7043 non-null int64
3 Partner 7043 non-null object
4 Dependents 7043 non-null object
5 tenure 7043 non-null int64
6 PhoneService 7043 non-null object
7 MultipleLines 7043 non-null object
8 InternetService 7043 non-null object
9 OnlineSecurity 7043 non-null object
10 OnlineBackup 7043 non-null object
11 DeviceProtection 7043 non-null object
12 TechSupport 7043 non-null object
13 StreamingTV 7043 non-null object
14 StreamingMovies 7043 non-null object
15 Contract 7043 non-null object
16 PaperlessBilling 7043 non-null object
17 PaymentMethod 7043 non-null object
18 MonthlyCharges 7043 non-null float64
19 TotalCharges 7043 non-null object
20 Churn 7043 non-null object
dtypes: float64(1), int64(2), object(18)
memory usage: 1.1+ MB
Dropping unwanted columns
localhost:8888/notebooks/KINGS LABS/4. Customer churn analysis/Customer Churn Analysis.ipynb 1/10

In [4]:
df = df.drop(["customerID"], axis=1)
df.head()
Out[4]:
   gender  SeniorCitizen  Partner  Dependents  tenure  PhoneService  MultipleLines     InternetService  OnlineSecurity  ...
0  Female  0              Yes      No          1       No            No phone service  DSL              No              ...
1  Male    0              No       No          34      Yes           No                DSL              Yes             ...
2  Male    0              No       No          2       Yes           No                DSL              Yes             ...
3  Male    0              No       No          45      No            No phone service  DSL              Yes             ...
4  Female  0              No       No          2       Yes           No                Fiber optic      No              ...
Converting the 'TotalCharges' column to numeric values
In [5]:
df['TotalCharges'] = pd.to_numeric(df['TotalCharges'], errors='coerce')
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7043 entries, 0 to 7042
Data columns (total 20 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 gender 7043 non-null object
1 SeniorCitizen 7043 non-null int64
2 Partner 7043 non-null object
3 Dependents 7043 non-null object
4 tenure 7043 non-null int64
5 PhoneService 7043 non-null object
6 MultipleLines 7043 non-null object
7 InternetService 7043 non-null object
8 OnlineSecurity 7043 non-null object
9 OnlineBackup 7043 non-null object
10 DeviceProtection 7043 non-null object
11 TechSupport 7043 non-null object
12 StreamingTV 7043 non-null object
13 StreamingMovies 7043 non-null object
14 Contract 7043 non-null object
15 PaperlessBilling 7043 non-null object
16 PaymentMethod 7043 non-null object
17 MonthlyCharges 7043 non-null float64
18 TotalCharges 7032 non-null float64
19 Churn 7043 non-null object
dtypes: float64(2), int64(2), object(16)
memory usage: 1.1+ MB
Checking for null values

In [6]:
df.isnull().sum()
Out[6]: gender 0
SeniorCitizen 0
Partner 0
Dependents 0
tenure 0
PhoneService 0
MultipleLines 0
InternetService 0
OnlineSecurity 0
OnlineBackup 0
DeviceProtection 0
TechSupport 0
StreamingTV 0
StreamingMovies 0
Contract 0
PaperlessBilling 0
PaymentMethod 0
MonthlyCharges 0
TotalCharges 11
Churn 0
dtype: int64
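
The 11 nulls in TotalCharges appear only after the pd.to_numeric(..., errors='coerce') step, which suggests those entries were strings that could not be parsed as numbers. A minimal sketch of that behaviour on a made-up Series (the blank-string values are an assumption for illustration, not taken from the dataset):

```python
import pandas as pd

# Made-up column mimicking a numeric field that was read as text.
raw = pd.Series(["29.85", "1889.5", " ", "108.15", ""])

# errors='coerce' turns anything unparseable into NaN instead of raising.
converted = pd.to_numeric(raw, errors="coerce")

# The rows that became NaN are the ones that were not valid numbers.
bad_rows = raw[converted.isna()]

print(converted.dtype)
print(int(converted.isna().sum()))
print(bad_rows.tolist())
```

Inspecting the raw values of the affected rows before converting is a cheap way to confirm why they failed to parse.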
Treating null values

In [7]:
df = df.dropna()
In [8]:
df.isnull().sum().sum()
Out[8]: 0
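
Dropping 11 of 7043 rows discards roughly 0.16% of the data, so dropna() costs little here. The sketch below, on a made-up frame, shows one way to measure that share before dropping and to restrict the drop to the affected column. The `subset` argument and the `reset_index` call are additions for illustration, not part of the notebook.

```python
import numpy as np
import pandas as pd

# Made-up frame standing in for the real data.
toy = pd.DataFrame({
    "MonthlyCharges": [29.85, 56.95, 53.85, 42.30, 70.70],
    "TotalCharges": [np.nan, 1889.50, 108.15, np.nan, 151.65],
})

# Share of rows that would be lost by dropping missing TotalCharges.
share_missing = toy["TotalCharges"].isna().mean()

# Drop only rows where TotalCharges is missing, then renumber the index.
cleaned = toy.dropna(subset=["TotalCharges"]).reset_index(drop=True)

print(f"{share_missing:.1%} of rows dropped, {len(cleaned)} rows kept")
```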
EDA

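In [9]:
# Churn distribution
# Take labels and bar heights from the same value_counts() result so they always line up;
# unique() and value_counts() are not guaranteed to return the classes in the same order.
churn_counts = df['Churn'].value_counts()
plt.figure(figsize=(8, 6))
plt.bar(churn_counts.index, churn_counts.values, color=['green', 'yellow'])
plt.title('Churn Distribution', fontsize=16, fontweight='bold')
plt.xlabel('Churn')
plt.ylabel('Count')
plt.grid(axis='y', linestyle='--', alpha=0.7)
plt.show()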
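
The bar chart shows the class imbalance visually; the class shares can also be read off directly. A small sketch on made-up labels, using value_counts(normalize=True):

```python
import pandas as pd

# Made-up labels standing in for df['Churn'].
churn = pd.Series(["No", "No", "Yes", "No", "Yes", "No", "No", "No"], name="Churn")

counts = churn.value_counts()                 # absolute counts per class
shares = churn.value_counts(normalize=True)   # same, as fractions of the total

summary = pd.DataFrame({"count": counts, "share": shares.round(3)})
print(summary)
```

Knowing the majority-class share is useful later, because it is the accuracy a model would reach by always predicting "No".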

In [10]:
# churn vs gender
sns.countplot(x='gender', hue="Churn", data=df, palette=['green', 'yellow'], edgecolor = 'Bl
sns.despine()

In [11]:
# churn vs SeniorCitizen
sns.countplot(x='SeniorCitizen', hue="Churn", data=df, palette=['green', 'yellow'], edgecolo
sns.despine()

In [12]:
# churn vs partner
sns.countplot(x='Partner', hue="Churn", data=df, palette=['green', 'yellow'], edgecolor = 'B
sns.despine()

In [13]:
# churn vs dependents
sns.countplot(x='Dependents', hue="Churn", data=df, palette=['green', 'yellow'], edgecolor =
sns.despine()

In [14]:
# churn vs contract
sns.countplot(x='Contract', hue="Churn", data=df, palette=['green', 'yellow'], edgecolor =
sns.despine()
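
Count plots compare absolute numbers, which can mislead when the categories differ in size. A churn rate per category is easier to compare. The sketch below uses pd.crosstab with normalize="index" on a made-up frame; the category names and values are invented for illustration.

```python
import pandas as pd

# Made-up data: contract type and churn label per customer.
toy = pd.DataFrame({
    "Contract": ["Month-to-month"] * 4 + ["One year"] * 3 + ["Two year"] * 2,
    "Churn": ["Yes", "Yes", "Yes", "No", "No", "No", "Yes", "No", "No"],
})

# normalize="index" makes each row sum to 1, so the "Yes" column is the churn rate.
rates = pd.crosstab(toy["Contract"], toy["Churn"], normalize="index")
churn_rate = rates["Yes"].sort_values(ascending=False)

print(churn_rate)
```

The same call works for gender, SeniorCitizen, Partner, or Dependents by swapping the first argument.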
In [15]:
# churn vs MonthlyCharges
ax = sns.kdeplot(df['MonthlyCharges'][df["Churn"] == 'No'], fill = True,color='green')
ax = sns.kdeplot(df['MonthlyCharges'][df["Churn"] == 'Yes'],ax =ax, fill= True,color='skyblu
ax.legend(["Not Churn","Churn"],loc='upper right')
ax.set_ylabel('Density')
ax.set_xlabel('Monthly Charges')
ax.set_title('Distribution of monthly charges by churn')
Out[15]: Text(0.5, 1.0, 'Distribution of monthly charges by churn')

In [16]:
# churn vs TotalCharges
ax = sns.kdeplot(df['TotalCharges'][df["Churn"] == 'No'], fill = True,color='green')
ax = sns.kdeplot(df['TotalCharges'][df["Churn"] == 'Yes'],ax =ax,fill= True,color='skyblue'
ax.legend(["Not Churn","Churn"],loc='upper right')
ax.set_ylabel('Density')
ax.set_xlabel('TotalCharges')
ax.set_title('Distribution of Total Charges by churn')
Out[16]: Text(0.5, 1.0, 'Distribution of Total Charges by churn')
Label encoding
In [17]:
df["gender"] = df["gender"].map({"Female": 0, "Male": 1})
df["Partner"] = df["Partner"].map({"No": 0, "Yes": 1})
df["Dependents"] = df["Dependents"].map({"No": 0, "Yes": 1})
df["PhoneService"] = df["PhoneService"].map({"No": 0, "Yes": 1})
df["PaperlessBilling"] = df["PaperlessBilling"].map({"No": 0, "Yes": 1})
df["Churn"] = df["Churn"].map({"No": 0, "Yes": 1})
In [18]:
df = pd.get_dummies(df, drop_first=True)
df.head()
Out[18]:
   gender  SeniorCitizen  Partner  Dependents  tenure  PhoneService  PaperlessBilling  MonthlyCharges  TotalCharges  ...
0  0       0              1        0           1       0             1                 29.85           29.85         ...
1  1       0              0        0           34      1             0                 56.95           1889.50       ...
2  1       0              0        0           2       1             1                 53.85           108.15        ...
3  1       0              0        0           45      0             0                 42.30           1840.75       ...
4  0       0              0        0           2       1             1                 70.70           151.65        ...
5 rows × 31 columns
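
The two encoding steps behave differently: map() replaces values in a binary column using an explicit dictionary, while get_dummies(drop_first=True) expands each remaining text column into one indicator column per category and drops the first category as the reference level. A minimal sketch on a made-up frame (the category names are invented for illustration):

```python
import pandas as pd

toy = pd.DataFrame({
    "Partner": ["Yes", "No", "Yes"],
    "Contract": ["Month-to-month", "One year", "Two year"],
})

# Binary Yes/No column: map to 0/1. Any value missing from the dict would become NaN.
toy["Partner"] = toy["Partner"].map({"No": 0, "Yes": 1})

# Remaining text columns become indicator columns; the first category
# (alphabetically) is dropped and acts as the baseline.
encoded = pd.get_dummies(toy, drop_first=True)

print(list(encoded.columns))
```

A row whose indicator columns are all zero therefore belongs to the dropped baseline category.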
Assigning dependent and independent variables

In [19]:
X = df.drop(columns="Churn")
y = df["Churn"]
Machine Learning classification model libraries

In [20]:
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn import metrics
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
Splitting the dataset into training and testing set

In [21]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.3, random_state =12
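
With 7032 rows remaining after dropna(), test_size = 0.3 gives 7032 × 0.3 = 2109.6, which rounds up to the 2110 test rows reported as support in the classification reports. Because the classes are imbalanced, a stratified split is one option for keeping the churn share equal in both parts. The sketch below uses made-up arrays; `stratify=y` and the random_state value are illustrative choices, not the notebook's settings.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up data: 200 samples, 25% positives.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = np.array([1] * 50 + [0] * 150)

# stratify=y keeps the class proportions (nearly) identical in train and test.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0, stratify=y
)

print(len(y_train), len(y_test))
print(y_train.mean(), y_test.mean())
```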
Logistic regression
In [22]:
logmodel = LogisticRegression(random_state=50)
logmodel.fit(X_train,y_train)
pred = logmodel.predict(X_test)

print(classification_report(y_test, pred))
              precision    recall  f1-score   support

           0       0.85      0.88      0.87      1567
           1       0.63      0.57      0.60       543

    accuracy                           0.80      2110
   macro avg       0.74      0.73      0.73      2110
weighted avg       0.80      0.80      0.80      2110

C:\Users\msi\anaconda3\Lib\site-packages\sklearn\linear_model\_logistic.py:460: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
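
The ConvergenceWarning itself names two remedies: raise max_iter or scale the features. A minimal sketch combining both in a scikit-learn pipeline, on synthetic data with one deliberately large-valued column (the data, max_iter=1000, and the random_state values are assumptions for illustration; the notebook's reported scores come from the unscaled model above):

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the churn features.
X, y = make_classification(n_samples=500, n_features=6, random_state=0)
X[:, 0] *= 5000.0  # mimic a column on a much larger scale than the others

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

# The scaler is fitted on the training data only, then applied to the test data.
model = make_pipeline(
    StandardScaler(),
    LogisticRegression(max_iter=1000, random_state=0),
)
model.fit(X_train, y_train)

n_iter = int(model[-1].n_iter_[0])       # iterations the solver actually used
test_accuracy = model.score(X_test, y_test)
print(n_iter, round(test_accuracy, 3))
```

Putting the scaler inside the pipeline avoids leaking test-set statistics into training.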
Decision Tree

In [23]:
dtmodel = DecisionTreeClassifier(criterion = "gini", random_state = 50)
dtmodel.fit(X_train, y_train)
dt_pred = dtmodel.predict(X_test)

print(classification_report(y_test, dt_pred))
              precision    recall  f1-score   support

           0       0.83      0.79      0.81      1567
           1       0.47      0.52      0.49       543

    accuracy                           0.72      2110
   macro avg       0.65      0.66      0.65      2110
weighted avg       0.73      0.72      0.73      2110
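
The tree above is grown without a depth limit, and a fully grown tree typically memorizes its training set. Comparing training and test accuracy is one way to check for that. The sketch below does so on noisy synthetic data; the data and max_depth=4 are illustrative assumptions, and whether a depth limit helps on the churn data would need to be verified there.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, deliberately noisy data (flip_y randomizes 20% of the labels).
X, y = make_classification(
    n_samples=1000, n_features=10, n_informative=4, flip_y=0.2, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

results = {}
for depth in (None, 4):  # None = grow until every leaf is pure
    tree = DecisionTreeClassifier(criterion="gini", max_depth=depth, random_state=0)
    tree.fit(X_train, y_train)
    results[depth] = (tree.score(X_train, y_train), tree.score(X_test, y_test))

for depth, (train_acc, test_acc) in results.items():
    print(f"max_depth={depth}: train={train_acc:.2f}, test={test_acc:.2f}")
```

A large gap between training and test accuracy is the usual sign of overfitting.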
Random Forest
In [24]:
rfmodel = RandomForestClassifier(n_estimators = 100, criterion = 'entropy', random_state = 0
rfmodel.fit(X_train, y_train)
rf_pred = rfmodel.predict(X_test)

print(classification_report(y_test, rf_pred))
              precision    recall  f1-score   support

           0       0.84      0.89      0.86      1567
           1       0.62      0.50      0.55       543

    accuracy                           0.79      2110
   macro avg       0.73      0.69      0.71      2110
weighted avg       0.78      0.79      0.78      2110
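
The accuracy in each report is the support-weighted average of the per-class recalls, so the three figures can be cross-checked by hand and compared with the accuracy of always predicting class 0. The sketch below uses only the recall and support values printed in the three reports:

```python
# Test-set class sizes, taken from the "support" column of the reports.
support = {0: 1567, 1: 543}
total = sum(support.values())

# Per-class recall for each model, copied from the reports.
recalls = {
    "Logistic regression": {0: 0.88, 1: 0.57},
    "Decision tree": {0: 0.79, 1: 0.52},
    "Random forest": {0: 0.89, 1: 0.50},
}

# Accuracy = correctly classified samples / all samples.
accuracy = {
    name: sum(r[c] * support[c] for c in support) / total
    for name, r in recalls.items()
}

# Baseline: always predict the majority class (no churn).
baseline = max(support.values()) / total

for name, acc in accuracy.items():
    print(f"{name}: {acc:.2f}")
print(f"Majority-class baseline: {baseline:.2f}")
```

The recomputed values match the reported 0.80, 0.72, and 0.79. The baseline works out to about 0.74, so the decision tree's accuracy falls below it, and the recall for class 1 (0.50 to 0.57) is the more informative number for a churn model.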
In [25]:
# Factors contributing to customer attrition:
# 1. Contract - churn rate is high for month-to-month contracts
# 2. Monthly charges - churn rate is high when they are between 65 and 110
# 3. Total charges - churn rate is high when they are below 2000
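
These thresholds are read off the density plots. One way to check them numerically is to bin the charges and compute the churn rate per band. A sketch on a made-up frame, assuming Churn is already encoded as 0/1 as in In [17] (the values and band labels are invented for illustration):

```python
import pandas as pd

# Made-up data: monthly charge and 0/1 churn flag per customer.
toy = pd.DataFrame({
    "MonthlyCharges": [20.0, 25.5, 45.0, 70.0, 80.5, 95.0, 105.0, 115.0],
    "Churn": [0, 0, 0, 1, 1, 0, 1, 0],
})

# Right-inclusive bands: (0, 65], (65, 110], (110, inf).
bands = pd.cut(
    toy["MonthlyCharges"],
    bins=[0, 65, 110, float("inf")],
    labels=["<=65", "65-110", ">110"],
)

# Mean of a 0/1 column is the churn rate within each band.
rate_by_band = toy.groupby(bands, observed=False)["Churn"].mean()
print(rate_by_band)
```

The same pattern applies to TotalCharges with a single cut point at 2000.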
