Descriptive Analytics2.Ipynb - Colab
Descriptive analytics
1. statistics module
2. pandas
pandas
load dataset
import pandas as pd
import matplotlib.pyplot as plt  # needed for the plt.* calls in later cells
data = pd.read_csv('/content/sample_data/Inc_Exp_Data.csv')
print('Dataset dimension:',data.shape)
print('Columns :\n',data.columns)
data.head()
[data.head() output: first rows of the dataset; table truncated in export]
The dataset contains 50 rows and 7 columns. Six features are numeric (int64); the
Highest_Qualified_Member feature is a string (object). There are no null values.
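The note above can be checked programmatically rather than by reading the data.info() output. The following is a minimal sketch: the helper name summarize_structure and the toy frame are assumptions made for illustration (they are not in the notebook), and the toy values are made up, not taken from Inc_Exp_Data.csv.

```python
import pandas as pd

def summarize_structure(df: pd.DataFrame) -> dict:
    """Collect shape, numeric/non-numeric columns and the total null count."""
    numeric_cols = df.select_dtypes(include="number").columns.tolist()
    other_cols = [c for c in df.columns if c not in numeric_cols]
    return {
        "shape": df.shape,
        "numeric_columns": numeric_cols,
        "non_numeric_columns": other_cols,
        "total_nulls": int(df.isnull().sum().sum()),
    }

# Made-up toy frame (NOT the real data); it only reuses two column names.
toy = pd.DataFrame({
    "Mthly_HH_Income": [10000, 20000, 30000],
    "Highest_Qualified_Member": ["Graduate", "Professional", "Illiterate"],
})
summary = summarize_structure(toy)
print(summary)
```

In the notebook itself, calling summarize_structure(data) should report a shape of (50, 7), six numeric columns and zero nulls if the note above holds.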
data.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 50 entries, 0 to 49
Data columns (total 7 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Mthly_HH_Income 50 non-null int64
1 Mthly_HH_Expense 50 non-null int64
2 No_of_Fly_Members 50 non-null int64
3 Emi_or_Rent_Amt 50 non-null int64
https://colab.research.google.com/drive/1xk9I7-lnL2a6XcZmwsJ8BlsWA7Drd1do#scrollTo=MSxC8SClqcVL&printMode=true 1/9
8/23/24, 11:48 AM descriptive analytics.ipynb - Colab
4 Annual_HH_Income 50 non-null int64
5 Highest_Qualified_Member 50 non-null object
6 No_of_Earning_Members 50 non-null int64
dtypes: int64(6), object(1)
memory usage: 2.9+ KB
data.describe()
bp = data[['Mthly_HH_Income', 'Mthly_HH_Expense','No_of_Fly_Members','No_of_Earning_Mem
plt.show()
bp=plt.boxplot(data['Mthly_HH_Income'])
print(type(bp))
print(bp.keys())
<class 'dict'>
dict_keys(['whiskers', 'caps', 'boxes', 'medians', 'fliers', 'means'])
{'whiskers': [<matplotlib.lines.Line2D object at 0x7fb109d0e5c0>, <matplotlib.lines.L
remove outliers
# mx is the upper income cutoff; the cell that defines it is not shown in this export
data.drop(data[data['Mthly_HH_Income']>mx].index, inplace=True)
plt.boxplot(data['Mthly_HH_Income'])
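The cell that defines mx did not survive in this export. One plausible reading, and it is only an assumption, is that mx is the top of the boxplot's upper whisker: with matplotlib's default settings that is the largest observation not exceeding Q3 + 1.5 * IQR. The sketch below computes that value with pandas alone; the function name upper_whisker and the toy series are hypothetical and the numbers are made up.

```python
import pandas as pd

def upper_whisker(values: pd.Series, k: float = 1.5) -> float:
    """Largest observation that does not exceed Q3 + k * IQR."""
    q1 = values.quantile(0.25)
    q3 = values.quantile(0.75)
    fence = q3 + k * (q3 - q1)
    return float(values[values <= fence].max())

# Made-up toy values (NOT the real incomes); 100 is an obvious outlier.
toy_income = pd.Series([10, 12, 14, 15, 16, 18, 20, 100])
mx = upper_whisker(toy_income)          # assumed meaning of mx
kept = toy_income[toy_income <= mx]     # same filter as the drop above, inverted
print(mx, len(kept))
```

Filtering with a boolean mask, as in kept, returns a new object and leaves the original untouched, which makes it easier to rerun a cell than drop(..., inplace=True).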
Average earning members: 1.46
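The cell that printed this average is missing from the export. As a hedged sketch of how it could be reproduced with the statistics module listed at the top, the code below rebuilds the No_of_Earning_Members column from the value_counts() output printed later in this notebook (1: 33, 2: 12, 3: 4, 4: 1). The variable names are assumptions.

```python
import statistics

# Frequencies copied from the value_counts() output of No_of_Earning_Members.
counts = {1: 33, 2: 12, 3: 4, 4: 1}

# Expand the frequency table back into one value per family.
earning_members = [value for value, n in counts.items() for _ in range(n)]

avg = statistics.mean(earning_members)
med = statistics.median(earning_members)
mode = statistics.mode(earning_members)
print('Average earning members:', round(avg, 2))
```

With the full DataFrame loaded, data['No_of_Earning_Members'].mean() is the pandas equivalent.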
categorical fields
visualizations
earn_members = data['No_of_Earning_Members'].unique()
earn_members
array([1, 2, 3, 4])
earn_members = data['No_of_Earning_Members'].unique()
plt.hist(data['No_of_Earning_Members'])
plt.title('Number of earning members in the families')
plt.xlabel('No. of earning members')
plt.ylabel('Count of families')
plt.xticks(earn_members )
#plt.yticks(range(1,len(data),3))
plt.show()
family_members = data['No_of_Fly_Members'].unique()
plt.hist(data['No_of_Fly_Members'])
plt.title('Number of family members in the families')
plt.xlabel('No. of family Members')
plt.ylabel('Count of families')
plt.xticks(family_members )
plt.show()
bar chart
x = range(len(data))
idx = [i+0.4 for i in x]
plt.bar(x,data['No_of_Fly_Members'], width=0.4, label='Family members')
plt.bar(idx,data['No_of_Earning_Members'], width=0.4,label='Earning members')
plt.title('No. of family members & earning members in each family')
plt.legend()
plt.xticks(range(0,51,5))
plt.ylabel('Count')
plt.show()
plt.plot(data['Mthly_HH_Income'], label='Income')
plt.plot(data['Mthly_HH_Expense'], label='Expenditure')
plt.legend()
plt.title('Family Income vs Expenditure')
plt.ylabel('Amount')
plt.show()
x = data['No_of_Earning_Members'].value_counts()
print(x)
plt.pie(x,labels=x.index, autopct='%.0f%%' )
plt.title('Proportion of families by number of earning members')
plt.show()
1 33
2 12
3 4
4 1
Name: No_of_Earning_Members, dtype: int64
x = data['Highest_Qualified_Member'].value_counts()
print(x)
plt.pie(x,labels=x.index,autopct='%.0f%%' )
plt.title('Proportion of families by highest qualified member')
plt.show()
Graduate 19
Under-Graduate 10
Professional 10
Post-Graduate 6
Illiterate 5
Name: Highest_Qualified_Member, dtype: int64
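The pie chart itself is not reproduced in this export, but its percentage labels can be recovered from the counts printed above. A minimal sketch, assuming only the counts shown and the same '%.0f%%' format string passed to autopct; the variable names are illustrative.

```python
# Counts copied from the value_counts() output of Highest_Qualified_Member.
qualification_counts = {
    'Graduate': 19,
    'Under-Graduate': 10,
    'Professional': 10,
    'Post-Graduate': 6,
    'Illiterate': 5,
}

total = sum(qualification_counts.values())

# Format each share the same way autopct='%.0f%%' formats a wedge label.
labels = {
    name: '%.0f%%' % (100 * n / total)
    for name, n in qualification_counts.items()
}
for name, label in labels.items():
    print(name, label)
```

With the DataFrame loaded, data['Highest_Qualified_Member'].value_counts(normalize=True) gives the same shares as fractions.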