Practical 3 DS
      customerID  gender  SeniorCitizen  Partner  Dependents  tenure  PhoneService  MultipleLines
0     7590-VHVEG  Female  0              Yes      No          1       No            No phone service
1     5575-GNVDE  Male    0              No       No          34      Yes           No
2     3668-QPYBK  Male    0              No       No          2       Yes           No
3     7795-CFOCW  Male    0              No       No          45      No            No phone service
4     9237-HQITU  Female  0              No       No          2       Yes           No
...
7039  2234-XADUH  Female  0              Yes      Yes         72      Yes           Yes
7040  4801-JZAZL  Female  0              Yes      Yes         11      No            No phone service
7041  8361-LTMKD  Male    1              Yes      No          4       Yes           Yes
In [5]: print(data.head())
print(data.info())
[5 rows x 21 columns]
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 7043 entries, 0 to 7042
Data columns (total 21 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 customerID 7043 non-null object
1 gender 7043 non-null object
2 SeniorCitizen 7043 non-null int64
3 Partner 7043 non-null object
4 Dependents 7043 non-null object
5 tenure 7043 non-null int64
6 PhoneService 7043 non-null object
7 MultipleLines 7043 non-null object
8 InternetService 7043 non-null object
9 OnlineSecurity 7043 non-null object
10 OnlineBackup 7043 non-null object
11 DeviceProtection 7043 non-null object
12 TechSupport 7043 non-null object
13 StreamingTV 7043 non-null object
14 StreamingMovies 7043 non-null object
15 Contract 7043 non-null object
16 PaperlessBilling 7043 non-null object
17 PaymentMethod 7043 non-null object
18 MonthlyCharges 7043 non-null float64
19 TotalCharges 7043 non-null object
20 Churn 7043 non-null object
dtypes: float64(1), int64(2), object(18)
memory usage: 1.1+ MB
None
Out[7]:
customerID 7043
gender 2
SeniorCitizen 2
Partner 2
Dependents 2
tenure 73
PhoneService 2
MultipleLines 3
InternetService 3
OnlineSecurity 3
OnlineBackup 3
DeviceProtection 3
TechSupport 3
StreamingTV 3
StreamingMovies 3
Contract 3
PaperlessBilling 2
PaymentMethod 4
MonthlyCharges 1585
TotalCharges 6531
Churn 2
dtype: int64
In [9]: data.isnull().sum()
Out[9]:
customerID 0
gender 0
SeniorCitizen 0
Partner 0
Dependents 0
tenure 0
PhoneService 0
MultipleLines 0
InternetService 0
OnlineSecurity 0
OnlineBackup 0
DeviceProtection 0
TechSupport 0
StreamingTV 0
StreamingMovies 0
Contract 0
PaperlessBilling 0
PaymentMethod 0
MonthlyCharges 0
TotalCharges 0
Churn 0
dtype: int64
In [11]: data.describe()
[ 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23
24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47
48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71
72] [ 11 613 238 200 176 133 110 131 123 119 116 99 117 109 76 99 80 87
97 73 71 63 90 85 94 79 79 72 57 72 72 65 69 64 65 88
50 65 59 56 64 70 65 65 51 61 74 68 64 66 68 68 80 70
68 64 80 65 67 60 76 76 70 72 80 76 89 98 100 95 119 170
362]
[' ' '100.2' '100.25' ... '999.45' '999.8' '999.9'] [11 1 1 ... 1 1 1]
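The unique-value listing above shows that TotalCharges is stored as text and that 11 entries are a single space. This would explain why `data.info()` reports the column as `object` and why `data.isnull().sum()` reports 0 for it. A minimal sketch of one common way to handle this is given below; it is an assumption about a sensible next step, not a cell from the original notebook, and the three-row `sample` frame is a hypothetical stand-in for `data`.

```python
import pandas as pd

# Tiny illustrative frame; the values are taken from the unique-value listing above.
sample = pd.DataFrame({"TotalCharges": [" ", "100.2", "999.9"]})

# Strip whitespace, then parse as numbers. Entries that cannot be parsed
# (the blank strings) become NaN instead of raising an error.
sample["TotalCharges"] = pd.to_numeric(
    sample["TotalCharges"].str.strip(), errors="coerce"
)

print(sample["TotalCharges"].dtype)           # float64
print(sample["TotalCharges"].isnull().sum())  # 1
```

After a conversion like this, the blank entries show up as missing values and can be dropped or imputed as the analysis requires.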
In [16]: sns.pairplot(data)
Out[16]: <seaborn.axisgrid.PairGrid at 0x2981866dae0>
In [19]: plt.boxplot(data['MonthlyCharges'])
plt.show()
In [21]: X_train.shape
Out[21]: (5634, 20)
In [22]: y_train.shape
Out[22]: (5634,)
In [23]: X_test.shape
Out[23]: (1409, 20)
In [24]: y_test.shape
Out[24]: (1409,)
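The cell that produced these splits is not shown. The shapes are consistent with an 80/20 split of the 7043 rows in which the test size is rounded up, which is how scikit-learn's `train_test_split` behaves with `test_size=0.2`. The sketch below reproduces only that arithmetic in plain Python; `split_indices`, its `seed` parameter, and the 0.2 test fraction are assumptions for illustration, not the notebook's actual code.

```python
import math
import random


def split_indices(n_samples, test_size=0.2, seed=42):
    """Shuffle row indices and split them, rounding the test set size up."""
    n_test = math.ceil(test_size * n_samples)
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    # The first n_test shuffled indices form the test set, the rest the training set.
    return indices[n_test:], indices[:n_test]


train_idx, test_idx = split_indices(7043)
print(len(train_idx), len(test_idx))  # 5634 1409
```

The 20 in `(5634, 20)` is the number of feature columns left once the target is removed from the 21-column frame.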
In [27]: data
      customerID  gender  SeniorCitizen  Partner  Dependents  tenure  PhoneService  MultipleLines
0     7590-VHVEG  Female  0              Yes      No          1       No            No phone service
1     5575-GNVDE  Male    0              No       No          34      Yes           No
2     3668-QPYBK  Male    0              No       No          2       Yes           No
3     7795-CFOCW  Male    0              No       No          45      No            No phone service
4     9237-HQITU  Female  0              No       No          2       Yes           No
...
7039  2234-XADUH  Female  0              Yes      Yes         72      Yes           Yes
7040  4801-JZAZL  Female  0              Yes      Yes         11      No            No phone service
7041  8361-LTMKD  Male    1              Yes      No          4       Yes           Yes