Customer Churn Analysis - Jupyter Notebook
Read dataset
In [2]: import pandas as pd

        # Load the Telco customer churn dataset and preview the first rows
        df = pd.read_csv('Tel_Customer_Churn_Dataset.csv')
        df.head()
Out[2]:
   customerID  gender  SeniorCitizen  Partner  Dependents  tenure  PhoneService     MultipleLines  InternetService  ...
0  7590-VHVEG  Female              0      Yes          No       1            No  No phone service              DSL  ...
1  5575-GNVDE    Male              0       No          No      34           Yes                No              DSL  ...
2  3668-QPYBK    Male              0       No          No       2           Yes                No              DSL  ...
3  7795-CFOCW    Male              0       No          No      45            No  No phone service              DSL  ...
4  9237-HQITU  Female              0       No          No       2           Yes                No      Fiber optic  ...
5 rows × 21 columns
In [3]: df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7043 entries, 0 to 7042
Data columns (total 21 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 customerID 7043 non-null object
1 gender 7043 non-null object
2 SeniorCitizen 7043 non-null int64
3 Partner 7043 non-null object
4 Dependents 7043 non-null object
5 tenure 7043 non-null int64
6 PhoneService 7043 non-null object
7 MultipleLines 7043 non-null object
8 InternetService 7043 non-null object
9 OnlineSecurity 7043 non-null object
10 OnlineBackup 7043 non-null object
11 DeviceProtection 7043 non-null object
12 TechSupport 7043 non-null object
13 StreamingTV 7043 non-null object
14 StreamingMovies 7043 non-null object
15 Contract 7043 non-null object
16 PaperlessBilling 7043 non-null object
17 PaymentMethod 7043 non-null object
18 MonthlyCharges 7043 non-null float64
19 TotalCharges 7043 non-null object
20 Churn 7043 non-null object
dtypes: float64(1), int64(2), object(18)
memory usage: 1.1+ MB
Out[4]:
   gender  SeniorCitizen  Partner  Dependents  tenure  PhoneService     MultipleLines  InternetService  OnlineSecurity  ...
0  Female              0      Yes          No       1            No  No phone service              DSL              No  ...
3    Male              0       No          No      45            No  No phone service              DSL             Yes  ...
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7043 entries, 0 to 7042
Data columns (total 20 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 gender 7043 non-null object
1 SeniorCitizen 7043 non-null int64
2 Partner 7043 non-null object
3 Dependents 7043 non-null object
4 tenure 7043 non-null int64
5 PhoneService 7043 non-null object
6 MultipleLines 7043 non-null object
7 InternetService 7043 non-null object
8 OnlineSecurity 7043 non-null object
9 OnlineBackup 7043 non-null object
10 DeviceProtection 7043 non-null object
11 TechSupport 7043 non-null object
12 StreamingTV 7043 non-null object
13 StreamingMovies 7043 non-null object
14 Contract 7043 non-null object
15 PaperlessBilling 7043 non-null object
16 PaymentMethod 7043 non-null object
17 MonthlyCharges 7043 non-null float64
18 TotalCharges 7032 non-null float64
19 Churn 7043 non-null object
dtypes: float64(2), int64(2), object(16)
memory usage: 1.1+ MB
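The cells that produced Out[4] and the second info() listing are not part of this export. The change from 21 to 20 columns, and TotalCharges going from object to float64 with 7032 non-null values, is consistent with dropping customerID and coercing TotalCharges to numeric. The following is a minimal sketch under that assumption. The frame `toy` and its values are made up for illustration and are not taken from the dataset.

```python
import pandas as pd

# Toy frame standing in for the real dataset (values are hypothetical).
toy = pd.DataFrame({
    "customerID": ["A-1", "B-2", "C-3"],
    "tenure": [1, 0, 34],
    # A blank string is one assumed reason the column loads as object dtype.
    "TotalCharges": ["29.85", " ", "1889.5"],
})

# Assumed step 1: drop the identifier column, which is not a model feature.
toy = toy.drop(columns=["customerID"])

# Assumed step 2: convert to float; entries that cannot be parsed become NaN.
toy["TotalCharges"] = pd.to_numeric(toy["TotalCharges"], errors="coerce")

print(toy.dtypes)
print(toy.isnull().sum())
```

With errors="coerce", unparseable entries surface as missing values, which would explain nulls appearing only after the conversion.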
In [6]: df.isnull().sum()
Out[6]: gender 0
SeniorCitizen 0
Partner 0
Dependents 0
tenure 0
PhoneService 0
MultipleLines 0
InternetService 0
OnlineSecurity 0
OnlineBackup 0
DeviceProtection 0
TechSupport 0
StreamingTV 0
StreamingMovies 0
Contract 0
PaperlessBilling 0
PaymentMethod 0
MonthlyCharges 0
TotalCharges 11
Churn 0
dtype: int64
In [8]: df.isnull().sum().sum()
Out[8]: 0
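The cell between In [6] and In [8] is missing from this export, so how the 11 missing TotalCharges values were handled is not shown. One common option is to drop the affected rows; imputing a value would also bring the count to 0. A minimal sketch of the row-dropping variant, on a made-up frame `toy`:

```python
import numpy as np
import pandas as pd

# Toy frame with one missing TotalCharges value (values are hypothetical).
toy = pd.DataFrame({
    "tenure": [1, 0, 34],
    "TotalCharges": [29.85, np.nan, 1889.5],
})

# Assumed approach: drop rows containing any missing value, then renumber the index.
toy = toy.dropna().reset_index(drop=True)

print(toy.isnull().sum().sum())
```

Dropping is usually reasonable when the affected rows are a tiny share of the data, as 11 of 7043 would be.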
EDA
Label encoding
In [17]: # Map the two-valued categorical columns to 0/1
         df["gender"] = df["gender"].map({"Female": 0, "Male": 1})
         df["Partner"] = df["Partner"].map({"No": 0, "Yes": 1})
         df["Dependents"] = df["Dependents"].map({"No": 0, "Yes": 1})
         df["PhoneService"] = df["PhoneService"].map({"No": 0, "Yes": 1})
         df["PaperlessBilling"] = df["PaperlessBilling"].map({"No": 0, "Yes": 1})
         df["Churn"] = df["Churn"].map({"No": 0, "Yes": 1})
In [18]: # One-hot encode the remaining categorical columns, dropping the first level of each
         df = pd.get_dummies(df, drop_first=True)
         df.head()
Out[18]:
gender SeniorCitizen Partner Dependents tenure PhoneService PaperlessBilling MonthlyCharges TotalCharges
0 0 0 1 0 1 0 1 29.85 29.85
1 1 0 0 0 34 1 0 56.95 1889.50
2 1 0 0 0 2 1 1 53.85 108.15
3 1 0 0 0 45 0 0 42.30 1840.75
4 0 0 0 0 2 1 1 70.70 151.65
5 rows × 31 columns
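The model cells below use X_train, X_test, y_train and y_test, but the cell that creates them is not in this export. A minimal sketch of how such a split could look, assuming Churn is the target and every other column is a feature. The frame `toy`, its Churn values, and the test_size, random_state and stratify settings are illustrative choices, not the notebook's.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy encoded frame standing in for the 31-column df (values are hypothetical).
toy = pd.DataFrame({
    "tenure": [1, 34, 2, 45, 2, 8, 22, 10],
    "MonthlyCharges": [29.85, 56.95, 53.85, 42.30, 70.70, 99.65, 89.10, 29.75],
    "Churn": [0, 0, 1, 0, 1, 1, 0, 0],
})

# Separate the features from the target column.
X = toy.drop(columns=["Churn"])
y = toy["Churn"]

# Hold out 25% of the rows for testing; stratify keeps the churn ratio
# similar in both parts. These settings are assumptions for illustration.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

print(X_train.shape, X_test.shape)
```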
Logistic regression
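In [22]: from sklearn.linear_model import LogisticRegression
         from sklearn.metrics import classification_report

         # Fit a logistic regression baseline and report per-class metrics on the test set
         logmodel = LogisticRegression(random_state=50)
         logmodel.fit(X_train, y_train)
         pred = logmodel.predict(X_test)

         print(classification_report(y_test, pred))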
C:\Users\msi\anaconda3\Lib\site-packages\sklearn\linear_model\_logistic.py:460: ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.

Increase the number of iterations (max_iter) or scale the data as shown in:
    https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
    https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression
  n_iter_i = _check_optimize_result(
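The warning above names two remedies: scale the features or raise max_iter. A minimal sketch that applies both through a scikit-learn pipeline, run on synthetic data rather than the churn features. The name `scaled_logmodel`, the synthetic dataset, and max_iter=1000 are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data standing in for the churn features.
X, y = make_classification(n_samples=500, n_features=10, random_state=50)
X[:, 0] *= 1000  # mimic a large-valued column such as TotalCharges

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=50)

# StandardScaler puts the features on comparable ranges before fitting;
# max_iter raises the solver's iteration cap above the default of 100.
scaled_logmodel = make_pipeline(
    StandardScaler(),
    LogisticRegression(max_iter=1000, random_state=50),
)
scaled_logmodel.fit(X_train, y_train)
pred = scaled_logmodel.predict(X_test)

print(classification_report(y_test, pred))
```

Putting the scaler inside the pipeline means it is fitted on the training rows only, so no information from the test set leaks into the preprocessing.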
Decision Tree
Random Forest
In [24]: from sklearn.ensemble import RandomForestClassifier

         # Fit a 100-tree random forest that uses entropy as the split criterion
         rfmodel = RandomForestClassifier(n_estimators=100, criterion='entropy', random_state=0)
         rfmodel.fit(X_train, y_train)
         rf_pred = rfmodel.predict(X_test)

         print(classification_report(y_test, rf_pred))