Day 1 Python Notebook

The document covers the fundamentals of Python data types, including lists, tuples, strings, sets, and dictionaries. It then loads banking customer data from a CSV file and performs exploratory data analysis on the dataset: viewing the head and tail of the data, checking data types and dimensions, and generating descriptive statistics. It outlines the key components of exploratory data analysis, such as descriptive statistics, data visualization, and measures of central tendency, dispersion, and relationships between variables, and finishes with basic hypothesis tests (two-sample and paired t-tests, one-way ANOVA, and the chi-square test of independence).


In [134]: import pandas as pd


import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

In [135]: # LISTS/ARRAYS ARE A FUNDAMENTAL DATA TYPE


# LISTS/ARRAYS ARE WRITTEN WITH SQUARE BRACKETS
# LISTS ARE MUTABLE - THEY CAN BE APPENDED TO AND MODIFIED IN PLACE
numlist=[2,3,5,666,223,89.5]
print(numlist)
charlist=['aaa',"AAA",'c','john']
print(charlist)
booleanlist=[True,False,False,True]
print(booleanlist)

[2, 3, 5, 666, 223, 89.5]


['aaa', 'AAA', 'c', 'john']
[True, False, False, True]
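
To illustrate the mutability mentioned in the comments above, here is a minimal sketch (the list name fruits is introduced purely for this example):

fruits=['apple','banana']     # throwaway list for the example
fruits.append('cherry')       # lists grow in place
fruits[0]='apricot'           # elements can be reassigned
print(fruits)                 # ['apricot', 'banana', 'cherry']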

In [136]: # TUPLES ARE IMMUTABLE OBJECTS


# TUPLES ARE WRITTEN WITH PARENTHESES (ROUND BRACKETS)
numtupple=(23,34,24,25)
print(numtupple)
chartupple=('aaa',"BBBB",'CCC')
print(chartupple)

(23, 34, 24, 25)


('aaa', 'BBBB', 'CCC')
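
As a quick check of the immutability noted above (a small sketch using the numtupple defined in the cell above):

print(numtupple[0])      # reading elements works: 23
# numtupple[0]=99        # uncommenting this line raises
#                        # TypeError: 'tuple' object does not support item assignment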

In [137]: # STRINGS are character based/text based


string="hello good morning"
print(string)

hello good morning

In [138]: # SETS ARE CREATED BY PLACING ITEMS INSIDE CURLY BRACKETS


# A SET CAN MIX DATA TYPES: int, float, string, tuple
# SETS ARE MUTABLE, BUT THEIR ELEMENTS MUST BE HASHABLE (IMMUTABLE)
myset={1,"Hello",(1,2,3)}
print(myset)

# NOTE: SETS ARE UNORDERED - THE PRINTED ORDER IS NOT GUARANTEED

{1, 'Hello', (1, 2, 3)}
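
A short sketch of set mutability, continuing with the myset defined above (the printed order may differ from run to run):

myset.add(99)            # elements can be added
myset.discard("Hello")   # ...and removed (discard is silent if the element is absent)
print(myset)             # e.g. {1, 99, (1, 2, 3)} - order not guaranteed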


In [139]: # DICTIONARIES - KEY/VALUE PAIRS (KEYS ARE TYPICALLY STRINGS OR INTEGERS)


# ALSO WRITTEN WITH CURLY BRACKETS
dict1={'Name':'John','Age':24,'Gender':'M'}
print(dict1)
dict1['Name']='Pete'
print(dict1)

{'Name': 'John', 'Age': 24, 'Gender': 'M'}


{'Name': 'Pete', 'Age': 24, 'Gender': 'M'}

In [140]: mydict2={}
mydict2['Key1']=[1,2]
mydict2['key2']=['G','F']
print(mydict2)

{'Key1': [1, 2], 'key2': ['G', 'F']}

In [141]: # Indexing is done using square brackets


# PYTHON INDEXING STARTS AT 0
print(numlist[0]) # Positive Indexing
print(numlist[-5]) # Negative Indexing
print(dict1.get('Name')) # DICTIONARIES - the 'get' method

print(dict1['Name']) # DICTIONARIES - square brackets

print(numtupple[3])

2
3
Pete
Pete
25
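
Slicing uses the same square-bracket syntax with start:stop (and an optional step); a minimal sketch on the lists defined earlier:

print(numlist[1:4])      # elements at indexes 1-3 -> [3, 5, 666]
print(numlist[-2:])      # last two elements -> [223, 89.5]
print(charlist[::2])     # every second element -> ['aaa', 'c']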

In [142]: # TO IMPORT DATA, GIVE THE FULL FILE PATH WITH THE


# FILE NAME & EXTENSION, IN QUOTES; ON WINDOWS, CHANGE
# BACKSLASHES TO FORWARD SLASHES (OR USE A RAW STRING)
bankchurn=pd.read_csv("/Users/rajeshprabhakar/Downloads/Churn_Modelling.csv")
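
On Windows, a raw string or forward slashes avoid the backslash-escaping problem; a sketch with a hypothetical path:

# bankchurn=pd.read_csv(r"C:\Users\me\Downloads\Churn_Modelling.csv")   # raw string
# bankchurn=pd.read_csv("C:/Users/me/Downloads/Churn_Modelling.csv")    # forward slashes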


In [143]: bankchurn.head() # FIRST 5 ROWS of DATA DEFAULT

Out[143]:
   RowNumber  CustomerId   Surname  CreditScore Geography  Gender  Age  Tenure    Balance
0          1    15634602  Hargrave          619    France  Female   42       2       0.00
1          2    15647311      Hill          608     Spain  Female   41       1   83807.86
2          3    15619304      Onio          502    France  Female   42       8  159660.80
3          4    15701354      Boni          699    France  Female   39       1       0.00
4          5    15737888  Mitchell          850     Spain  Female   43       2  125510.82

In [144]: bankchurn.tail()

Out[144]:
      RowNumber  CustomerId    Surname  CreditScore Geography  Gender  Age  Tenure  Balance
9995       9996    15606229   Obijiaku          771    France    Male   39       5
9996       9997    15569892  Johnstone          516    France    Male   35      10   57369.
9997       9998    15584532        Liu          709    France  Female   36       7
9998       9999    15682355  Sabbatini          772   Germany    Male   42       3   75075.
9999      10000    15628319     Walker          792    France  Female   28       4  130142.

In [145]: bankchurn.columns

Out[145]: Index(['RowNumber', 'CustomerId', 'Surname', 'CreditScore', 'Geography',
                 'Gender', 'Age', 'Tenure', 'Balance', 'NumOfProducts', 'HasCrCard',
                 'IsActiveMember', 'EstimatedSalary', 'Exited'],
                dtype='object')


In [146]: bankchurn.dtypes
# DATA TYPES
# NUMBER - int64 & float64
# CHARACTER - object
# DATE - datetime64
# BOOLEAN - bool (True/False)

Out[146]: RowNumber int64


CustomerId int64
Surname object
CreditScore int64
Geography object
Gender object
Age int64
Tenure int64
Balance float64
NumOfProducts int64
HasCrCard int64
IsActiveMember int64
EstimatedSalary float64
Exited int64
dtype: object

In [147]: bankchurn.shape # NUM of ROWS/OBSERVATIONS


# NUM of COLUMNS/VARIABLES

Out[147]: (10000, 14)

In [148]: bankchurn.describe().transpose()
# describe() - Basic DESCRIPTIVE STATISTICS

Out[148]:
                   count          mean           std          min          25%           50%
RowNumber        10000.0  5.000500e+03   2886.895680         1.00      2500.75  5.000500e+03
CustomerId       10000.0  1.569094e+07  71936.186123  15565701.00  15628528.25  1.569074e+07
CreditScore      10000.0  6.505288e+02     96.653299       350.00       584.00  6.520000e+02
Age              10000.0  3.892180e+01     10.487806        18.00        32.00  3.700000e+01
Tenure           10000.0  5.012800e+00      2.892174         0.00         3.00  5.000000e+00
Balance          10000.0  7.648589e+04  62397.405202         0.00         0.00  9.719854e+04
NumOfProducts    10000.0  1.530200e+00      0.581654         1.00         1.00  1.000000e+00
HasCrCard        10000.0  7.055000e-01      0.455840         0.00         0.00  1.000000e+00
IsActiveMember   10000.0  5.151000e-01      0.499797         0.00         0.00  1.000000e+00
EstimatedSalary  10000.0  1.000902e+05  57510.492818        11.58     51002.11  1.001939e+05
Exited           10000.0  2.037000e-01      0.402769         0.00         0.00  0.000000e+00


In [149]: # EXPLORATORY DATA ANALYSIS (EDA) - BASIC


# UNDERSTANDING OF THE DATA
# EDA = DESCRIPTIVE STATISTICS + DATA VISUALIZATION
# DESCRIPTIVE STATISTICS:
# COUNT, MINIMUM, MAXIMUM
# MEASURES OF CENTRAL TENDENCY - MEAN, MEDIAN, MODE
# MEASURES OF DISPERSION (SPREAD AROUND THE MEAN) -
# RANGE, VARIANCE, STANDARD DEVIATION, QUARTILES,
# PERCENTILES, DECILES
# MEASURES OF ASYMMETRY - SKEWNESS & KURTOSIS
# MEASURES OF RELATIONSHIP - COVARIANCE & CORRELATION
# (A SHORT SKETCH COMPUTING THESE MEASURES FOLLOWS BELOW)
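
A minimal sketch of the measures listed above, computed on the Balance column with pandas Series methods (the notebook computes most of these with NumPy further down):

bal=bankchurn.Balance
print(bal.count(),bal.min(),bal.max())              # count, minimum, maximum
print(bal.mean(),bal.median(),bal.mode()[0])        # central tendency
print(bal.max()-bal.min(),bal.var(),bal.std())      # dispersion: range, variance, std dev
print(bal.quantile([0.25,0.5,0.75]))                # quartiles
print(bal.skew(),bal.kurt())                        # asymmetry: skewness & kurtosis
print(bal.cov(bankchurn.EstimatedSalary),
      bal.corr(bankchurn.EstimatedSalary))          # relationship: covariance & correlation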

In [150]: # INDEXING OF ROWS AND COLUMNS - DATAFRAME


# CHARACTER INDEXING - BY COLUMN NAME
bankchurn.Balance.describe()

Out[150]: count 10000.000000


mean 76485.889288
std 62397.405202
min 0.000000
25% 0.000000
50% 97198.540000
75% 127644.240000
max 250898.090000
Name: Balance, dtype: float64

In [151]: bankchurn['Balance'].describe()

Out[151]: count 10000.000000


mean 76485.889288
std 62397.405202
min 0.000000
25% 0.000000
50% 97198.540000
75% 127644.240000
max 250898.090000
Name: Balance, dtype: float64


In [152]: bankchurn[['Balance','Tenure','Age']].describe()
# multi column indexing - double square brackets

Out[152]:
Balance Tenure Age

count 10000.000000 10000.000000 10000.000000

mean 76485.889288 5.012800 38.921800

std 62397.405202 2.892174 10.487806

min 0.000000 0.000000 18.000000

25% 0.000000 3.000000 32.000000

50% 97198.540000 5.000000 37.000000

75% 127644.240000 7.000000 44.000000

max 250898.090000 10.000000 92.000000

In [153]: # NUMERICAL INDEXING - BY COLUMN NUMBER


# PYTHON INDEXING STARTS AT 0
bankchurn.iloc[:,[6,7,8]].describe()
# THE ':' BEFORE THE COMMA SELECTS ALL ROWS; THE LIST
# AFTER THE COMMA SELECTS COLUMNS BY POSITION
# WITHOUT THE ':,', iloc INDEXES ROWS BY DEFAULT

Out[153]:
Age Tenure Balance

count 10000.000000 10000.000000 10000.000000

mean 38.921800 5.012800 76485.889288

std 10.487806 2.892174 62397.405202

min 18.000000 0.000000 0.000000

25% 32.000000 3.000000 0.000000

50% 37.000000 5.000000 97198.540000

75% 44.000000 7.000000 127644.240000

max 92.000000 10.000000 250898.090000

In [154]: bankchurn.iloc[100:150,4:8] # ROWS 100-149 (FIRST INDEX)


# COLUMNS 4-7 (SECOND INDEX) - SUBSETTING THE DATA FRAME

Out[154]:
Geography Gender Age Tenure

100 France Female 40 6

101 France Female 44 6

102 France Male 31 9

103 Spain Male 36 7


104 Spain Female 65 1

105 Spain Female 46 4

106 Germany Male 32 1

107 Germany Female 36 2

108 Spain Male 33 5

109 Germany Male 35 9

110 Germany Male 30 3

111 Germany Male 39 7

112 France Male 42 2

113 Spain Male 36 9

114 Germany Male 28 9

115 Germany Female 30 9

116 Germany Female 37 6

117 France Female 41 1

118 Germany Female 31 8

119 Germany Male 34 4

120 France Male 34 8

121 Spain Male 39 6

122 France Female 39 6

123 Germany Female 48 10

124 France Female 28 3

125 France Male 42 9

126 France Female 52 1

127 Germany Male 56 0

128 France Female 41 7

129 France Male 24 9

130 France Female 34 3

131 Germany Female 33 9

132 France Male 38 9

133 France Male 25 1

134 Germany Male 39 7

135 Germany Female 50 5

136 Germany Female 34 5

137 France Male 40 2


138 Spain Female 48 2

139 Spain Female 35 1

140 Germany Male 44 10

141 France Male 34 7

142 France Female 43 5

143 Spain Male 52 2

144 France Female 31 5

145 Spain Female 21 5

146 Spain Female 29 8

147 France Male 37 5

148 France Male 44 9

149 France Male 32 0
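
For contrast with the positional (iloc) selections above, the same subsets can be taken by label with .loc (a minimal sketch; note that .loc slices include both endpoints):

bankchurn.loc[:,['Age','Tenure','Balance']].describe()
bankchurn.loc[100:149,'Geography':'Tenure']    # same rows/columns as iloc[100:150,4:8]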

In [155]: # INDIVIDUAL STATISTICAL FUNCTIONS - NUMPY


print(np.min(bankchurn.Balance))
print(np.max(bankchurn.Balance))
print(np.mean(bankchurn.Balance))
print(np.median(bankchurn.Balance))
print(np.var(bankchurn.Balance))
print(np.std(bankchurn.Balance))
print(np.quantile(bankchurn.Balance,0.25))
print(np.quantile(bankchurn.Balance,0.50))
print(np.quantile(bankchurn.Balance,0.75))
print(np.quantile(bankchurn.Balance,0.80)) # 80%
print(bankchurn.Balance.skew()) # Pandas
print(bankchurn.Balance.kurt()) # Pandas

0.0
250898.09
76485.88928799961
97198.54000000001
3893046832.3731775
62394.285254125454
0.0
97198.54000000001
127644.24
133710.358
-0.14110871094154384
-1.489411767941925

In [156]: print(np.quantile(bankchurn.Balance,0.50))# Q2
print(np.percentile(bankchurn.Balance,50))# Q2

97198.54000000001
97198.54000000001


In [157]: bankchurn.dtypes

Out[157]: RowNumber int64


CustomerId int64
Surname object
CreditScore int64
Geography object
Gender object
Age int64
Tenure int64
Balance float64
NumOfProducts int64
HasCrCard int64
IsActiveMember int64
EstimatedSalary float64
Exited int64
dtype: object

In [158]: # NON-NUMERIC DATA HAS dtype object


# SOME NUMERIC COLUMNS MAY BE CATEGORICAL IN
# NATURE - 0 & 1 (0 = No, 1 = Yes)

In [159]: pd.value_counts(bankchurn.HasCrCard)

Out[159]: 1 7055
0 2945
Name: HasCrCard, dtype: int64

In [160]: pd.value_counts(bankchurn.Exited)
# 1 - Customer Exited 0 - Customer Not Exited

Out[160]: 0 7963
1 2037
Name: Exited, dtype: int64

In [161]: pd.value_counts(bankchurn.IsActiveMember)

Out[161]: 1 5151
0 4849
Name: IsActiveMember, dtype: int64

In [162]: bankchurn.columns

Out[162]: Index(['RowNumber', 'CustomerId', 'Surname', 'CreditScore', 'Geography',
                 'Gender', 'Age', 'Tenure', 'Balance', 'NumOfProducts', 'HasCrCard',
                 'IsActiveMember', 'EstimatedSalary', 'Exited'],
                dtype='object')


In [163]: categorycols=['HasCrCard','IsActiveMember',
'Exited','Geography','Gender']

In [164]: for col in categorycols:


              freqcounts=pd.value_counts(bankchurn[col])
              print(freqcounts)

1 7055
0 2945
Name: HasCrCard, dtype: int64
1 5151
0 4849
Name: IsActiveMember, dtype: int64
0 7963
1 2037
Name: Exited, dtype: int64
France 5014
Germany 2509
Spain 2477
Name: Geography, dtype: int64
Male 5457
Female 4543
Name: Gender, dtype: int64

In [165]: # DATA VISUALIZATION


pd.value_counts(bankchurn.Gender).plot(
kind='pie')

Out[165]: <matplotlib.axes._subplots.AxesSubplot at 0x1a2adf6610>


In [166]: # ADVANCED PLOTS - HISTOGRAM, BOXPLOT & DENSITY


# CURVE - UNIVARIATE PLOTS
# THESE 3 PLOTS HELP IDENTIFY SKEWNESS, KURTOSIS, OUTLIERS,
# MISSING VALUES & HOW CLOSE THE DATA IS TO A NORMAL DISTRIBUTION
# HISTOGRAM - BASED ON A FREQUENCY DISTRIBUTION TABLE
# BOXPLOT - BASED ON QUARTILES
# DENSITY CURVE - BASED ON Z SCORES (STANDARD SCORES)

# BOXPLOT OUTLIER DETECTION - IQR = Q3 - Q1


# LOWER FENCE: Q1 - 1.5*IQR
# UPPER FENCE: Q3 + 1.5*IQR
# (A SHORT SKETCH OF THIS RULE FOLLOWS BELOW)
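
A minimal sketch of the IQR rule applied to the Balance column (q1, q3, iqr, lower, upper are local names introduced here):

q1=bankchurn.Balance.quantile(0.25)
q3=bankchurn.Balance.quantile(0.75)
iqr=q3-q1
lower=q1-1.5*iqr
upper=q3+1.5*iqr
outliers=bankchurn[(bankchurn.Balance<lower)|(bankchurn.Balance>upper)]
print(lower,upper,outliers.shape[0])   # fences and the number of rows flagged as outliers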

In [167]: bankchurn.Balance.plot(kind='hist',color="red")

Out[167]: <matplotlib.axes._subplots.AxesSubplot at 0x1a2d146110>

In [168]: bankchurn.EstimatedSalary.plot(kind='box',vert=False)

Out[168]: <matplotlib.axes._subplots.AxesSubplot at 0x1a2ecde950>


In [169]: bankchurn.Balance.plot(kind='density')

Out[169]: <matplotlib.axes._subplots.AxesSubplot at 0x1a3028f4d0>

In [170]: # BIVARIATE STATISTICS


# COVARIANCE & CORRELATION
bankchurn[['Balance','EstimatedSalary']].cov()
# Covariance

Out[170]:
Balance EstimatedSalary

Balance 3.893436e+09 4.592389e+07

EstimatedSalary 4.592389e+07 3.307457e+09

In [171]: bankchurn[['Balance','EstimatedSalary']].corr()
# Correlation

Out[171]:
Balance EstimatedSalary

Balance 1.000000 0.012797

EstimatedSalary 0.012797 1.000000


In [172]: bankchurn.plot(kind='scatter',x='Balance',
y='EstimatedSalary')

Out[172]: <matplotlib.axes._subplots.AxesSubplot at 0x1a2d9c6990>

In [173]: # GROUPING FUNCTION - "groupby" - A SLICING FUNCTION


# THE COLUMN BEING AGGREGATED (LEFT OF groupby) SHOULD BE NUMERICAL
# THE GROUPING COLUMN (INSIDE groupby) SHOULD BE CATEGORICAL
# A STATISTICAL FUNCTION MUST BE SPECIFIED
# 2 TYPES OF CUSTOMER - EXITED & NOT EXITED
# IS THE AVERAGE BALANCE MAINTAINED BY THESE 2 TYPES
# OF CUSTOMER THE SAME?

print(pd.value_counts(bankchurn.Exited))
bankchurn.Balance.groupby(bankchurn.Exited).mean()

0 7963
1 2037
Name: Exited, dtype: int64

Out[173]: Exited
0 72745.296779
1 91108.539337
Name: Balance, dtype: float64
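
The same aggregation can also be written with the DataFrame-level groupby syntax (a sketch; both forms give identical results):

bankchurn.groupby('Exited')['Balance'].mean()
bankchurn.groupby('Exited')['Balance'].agg(['mean','median','count'])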


In [174]: pd.crosstab(bankchurn.Exited,bankchurn.Gender,
normalize='index')
# CROSS TABULATION - FREQUENCY TABLE OF 2 CATEGORICAL
# VARIABLES

Out[174]:
Gender Female Male

Exited

0 0.427477 0.572523

1 0.559156 0.440844

In [175]: bankchurn.Balance.groupby([bankchurn.Exited,
bankchurn.Gender,
bankchurn.Geography]).mean()

Out[175]: Exited  Gender  Geography
          0       Female  France        58424.310061
                          Germany      118828.514362
                          Spain         56594.820688
                  Male    France        61774.503758
                          Germany      119896.529105
                          Spain         61871.626285
          1       Female  France        67755.162630
                          Germany      119673.872321
                          Spain         71997.673680
                  Male    France        75710.827800
                          Germany      121202.242432
                          Spain         73167.867802
          Name: Balance, dtype: float64

In [176]: bankchurn.Balance.groupby(
bankchurn.Exited).mean()

Out[176]: Exited
0 72745.296779
1 91108.539337
Name: Balance, dtype: float64


In [177]: # HYPOTHESIS TESTING - COMPARISON OF MEANS


# COMPARING 2 GROUPS - 2 SAMPLE INDEPENDENT T TEST
# COMPARING MORE THAN 2 GROUPS - ANOVA SINGLE
# FACTOR (ONE-WAY ANOVA)
# ASSUMPTIONS:
# THE GROUP MEANS SHOULD APPEAR DIFFERENT (CHECK WITH groupby FIRST)
# THE NUMERICAL VARIABLE MUST BE CONTINUOUS, CLOSE
# TO A NORMAL DISTRIBUTION, WITH NO OUTLIERS OR MISSING
# VALUES
# THE OTHER VARIABLE MUST BE CATEGORICAL WITH
# EXACTLY 2 GROUPS (T TEST) OR MORE THAN 2 GROUPS (ANOVA)

# NULL - THERE IS NO SIGNIFICANT DIFFERENCE IN THE


# AVERAGES OF GROUP 1 & GROUP 2
# ALT - THERE IS A SIGNIFICANT DIFFERENCE IN THE
# AVERAGES OF GROUP 1 & GROUP 2

# IF THE P-VALUE IS LESS THAN 0.05, REJECT NULL & ACCEPT


# ALTERNATE
# IF THE P-VALUE IS GREATER THAN 0.05, FAIL TO REJECT NULL
# (ACCEPT NULL) & REJECT ALTERNATE

# PROCESS IN PYTHON
# GROUPBY - TO IDENTIFY THE NUMBER OF GROUPS & CONFIRM
# THE MEAN DIFFERENCE
# SPLIT THE DATAFRAME INTO SUBSETS BASED ON THE
# GROUPS
# CONDUCT THE RELEVANT TEST & INFER BASED ON THE
# P-VALUE

In [178]: # SPLIT DATAFRAME INTO EXITED AND NONEXITED


exited=bankchurn[bankchurn.Exited==1]
nonexited=bankchurn[bankchurn.Exited==0]
print(exited.shape)
print(nonexited.shape)

(2037, 14)
(7963, 14)

In [179]: from scipy.stats import ttest_ind


# 2 Sample Independent T test
ttest_ind(exited.Balance,nonexited.Balance,
equal_var=False)
# SINCE p-value less than 0.05, REJECT NULL

Out[179]: Ttest_indResult(statistic=12.47128032005069, pvalue=6.318663518527793e-35)


In [180]: # IS THE AVERAGE BALANCE MAINTAINED BY MALE &


# FEMALE CUSTOMERS SAME or EQUAL?
# groupby
# split data into male & female
# Frame NULL & ALT
# Conduct Test & Infer

In [181]: bankchurn.Balance.groupby(
bankchurn.Gender).mean()

Out[181]: Gender
Female 75659.369139
Male 77173.974506
Name: Balance, dtype: float64

In [182]: # NULL - THERE IS NO SIGNIFICANT DIFFERENCE IN THE


# AVERAGE BALANCE OF MALE & FEMALE CUSTOMERS
# ALT - THERE IS A SIGNIFICANT DIFFERENCE IN THE
# AVERAGE BALANCE OF MALE & FEMALE CUSTOMERS
male=bankchurn[bankchurn.Gender=='Male']
female=bankchurn[bankchurn.Gender=='Female']

In [183]: ttest_ind(male.Balance,female.Balance,
equal_var=False)
# SINCE p-value greater than 0.05, Fail to Reject
# Null

Out[183]: Ttest_indResult(statistic=1.2095754533033731, pvalue=0.22647131768566747)

In [185]: # IS THE AVERAGE BALANCE MAINTAINED IN THE DIFFERENT


# GEOGRAPHIES THE SAME?
bankchurn.Balance.groupby(
bankchurn.Geography).mean()

Out[185]: Geography
France 62092.636516
Germany 119730.116134
Spain 61818.147763
Name: Balance, dtype: float64


In [188]: # SINCE THERE ARE MORE THAN 2 LEVELS - ONE-WAY ANOVA


# (ANOVA SINGLE FACTOR)
france=bankchurn[bankchurn.Geography=='France']
germany=bankchurn[bankchurn.Geography=='Germany']
spain=bankchurn[bankchurn.Geography=='Spain']

# NULL - THERE IS NO SIGNIFICANT DIFFERENCE IN THE


# AVERAGE BALANCE OF FRANCE, GERMANY & SPAIN
# ALT - THERE IS A SIGNIFICANT DIFFERENCE IN THE
# AVERAGE BALANCE OF FRANCE, GERMANY & SPAIN

In [189]: from scipy.stats import f_oneway


f_oneway(france.Balance,germany.Balance,
spain.Balance)
# SINCE p-value less than 0.05, REJECT NULL

Out[189]: F_onewayResult(statistic=958.4254463368385, pvalue=0.0)

In [191]: # DISTRIBUTIONS
# T TEST - STUDENT'S T DISTRIBUTION
# ANOVA SINGLE FACTOR - F DISTRIBUTION
# CHI SQUARE TEST - CHI-SQUARE DISTRIBUTION

# BERNOULLI / BINOMIAL DISTRIBUTION - BINARY


# YES/NO or TRUE/FALSE - USED IN LOGISTIC REGRESSION
# POISSON DISTRIBUTION - e.g. INSURANCE CLAIM DATA
# EXPONENTIAL DISTRIBUTION - e.g. TRAFFIC DATA
# NORMAL DISTRIBUTION or GAUSSIAN DISTRIBUTION

In [193]: # ONE NUMERIC VARIABLE, OTHER CATEGORICAL VARIABLE


# WITH EXACTLY 2 LEVELS - T TEST
# ONE NUMERIC VARIABLE, OTHER CATEGORICAL VARIABLE
# WITH MORE THAN 2 LEVELS - ANOVA SINGLE FACTOR

# BOTH VARIABLES NON-NUMERICAL (CATEGORICAL) -


# CHI SQUARE TEST OF INDEPENDENCE
# THE INPUT FOR THE CHI SQUARE TEST IS A CROSS TABULATION
# NULL - THERE IS NO ASSOCIATION/RELATIONSHIP
# BETWEEN THE TWO VARIABLES
# ALT - THERE IS AN ASSOCIATION/RELATIONSHIP
# BETWEEN THE TWO VARIABLES


In [194]: # IS THERE ASSOCIATION BETWEEN GENDER & EXITED?


pd.crosstab(bankchurn.Gender,bankchurn.Exited)

Out[194]:
Exited 0 1

Gender

Female 3404 1139

Male 4559 898

In [197]: from scipy.stats import chi2_contingency


chi2_contingency(pd.crosstab(bankchurn.Gender,
bankchurn.Exited))
# SINCE the p-value (2.248e-26) is less than 0.05, REJECT
# NULL

Out[197]: (112.91857062096116, 2.2482100097131755e-26, 1, array([[3617.5909,  925.4091],
          [4345.4091, 1111.5909]]))
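
chi2_contingency returns a 4-tuple: (test statistic, p-value, degrees of freedom, expected frequencies). Unpacking it makes the p-value easier to read (a small sketch reusing the import above):

chi2,pvalue,dof,expected=chi2_contingency(
    pd.crosstab(bankchurn.Gender,bankchurn.Exited))
print(pvalue)   # ~2.25e-26, well below 0.05 -> reject the null of no association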

In [198]: # IS THERE ASSOCIATION BETWEEN GENDER & HASCRCARD


chi2_contingency(pd.crosstab(bankchurn.Gender,
bankchurn.HasCrCard))
# SINCE p-value greater than 0.05, Fail to REJECT NULL

Out[198]: (0.30756077917984026, 0.5791808600631774, 1, array([[1337.9135, 3205.0865],
          [1607.0865, 3849.9135]]))

In [200]: # PAIRED SAMPLE T TEST - BEFORE & AFTER


# INDIA'S INFLATION BEFORE & AFTER DEMONETIZATION
befordemon=[6.59,6.13,6.46,5.30,4.14,3.35]
afterdemon=[2.23,1.86,2.62,2.61,2.21,1.09]
print(np.mean(befordemon))
print(np.mean(afterdemon))

# NULL - THERE IS NO SIGNIFICANT DIFFERENCE IN AVERAGE


# INFLATION BEFORE AND AFTER DEMONETIZATION
# ALT - THERE IS A SIGNIFICANT DIFFERENCE IN AVERAGE
# INFLATION BEFORE AND AFTER DEMONETIZATION

5.328333333333334
2.1033333333333335

In [202]: from scipy.stats import ttest_rel


ttest_rel(befordemon,afterdemon)
# SINCE p-value less than 0.05, REJECT NULL

Out[202]: Ttest_relResult(statistic=7.429383454110612, pvalue=0.0006961901453565623)


In [203]: # BEFORE & AFTER GST MONTHLY INFLATION


beforegst=[1.86,2.62,2.61,2.21,1.09,1.08]
aftergst=[2.52,2.89,3.24,3.97,4.00,5.11]
print(np.mean(beforegst))
print(np.mean(aftergst))

1.9116666666666668
3.6216666666666666

In [204]: ttest_rel(beforegst,aftergst)
# SINCE p-value less than 0.05, REJECT NULL - AVERAGE
# MONTHLY INFLATION BEFORE & AFTER GST DIFFERS SIGNIFICANTLY

Out[204]: Ttest_relResult(statistic=-2.8027263826421933, pvalue=0.03787161462210822)

In [ ]:
