
Python for Machine Learning Visualization 1735231185

Uploaded by

karlTronxo
Copyright
© © All Rights Reserved
Machine Learning Data Visualization: From Basics to Advanced

Prepared By: Syed Afroz Ali (Kaggle Grandmaster)

All Types of Data Visualization Kaggle Notebook

Machine learning data visualization uses visual techniques like charts and graphs to explore,
understand, and communicate patterns, trends, and insights within the data used to train and
evaluate machine learning models.

Python for Machine Learning Visualization Part 01:
https://www.kaggle.com/code/pythonafroz/python-for-machine-learning-visualization-part-01
Python for Machine Learning Visualization Part 02:
Kaggle Code: https://www.kaggle.com/code/pythonafroz/python-for-machine-learning-visualization-part-01
https://www.kaggle.com/code/pythonafroz/python-for-machine-learning-visualization-part-03

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt  # needed by the seaborn/matplotlib plots below

import plotly.express as px
import plotly.graph_objects as go
import plotly.figure_factory as ff
from plotly.subplots import make_subplots

pd.set_option('display.precision', 2)

# Load the dataset and drop rows with missing values
df = pd.read_csv("heart_disease_uci.csv")
df = df.dropna()
df.head(2)

   id  age  sex   dataset    cp              trestbps  chol   fbs    restecg         thalch  exang  oldpeak  slope        ca   thal          num
0  1   63   Male  Cleveland  typical angina  145.0     233.0  True   lv hypertrophy  150.0   False  2.3      downsloping  0.0  fixed defect  0
1  2   67   Male  Cleveland  asymptomatic    160.0     286.0  False  lv hypertrophy  108.0   True   1.5      flat         3.0  normal        2

print(f"Records: {df.shape[0]}")
print(f"Columns: {df.shape[1]}")

Records: 299
Columns: 16

# The four most frequent chest pain types
top_cp_types = df['cp'].value_counts().nlargest(4).index
display(top_cp_types)

plt.figure(figsize=(15, 6))
sns.scatterplot(x='age', y='chol', data=df[df['cp'].isin(top_cp_types)], hue='cp')
plt.title('Age vs. Cholesterol for Top 4 Chest Pain Types')
plt.xlabel('Age')
plt.ylabel('Cholesterol')
plt.legend(title='Chest Pain Type', bbox_to_anchor=(1.05, 1), loc='upper left')
plt.show()

Index(['asymptomatic', 'non-anginal', 'atypical angina', 'typical angina'], dtype='object', name='cp')

import plotly.express as px

fig = px.scatter(df, x='chol', y='age', color='sex')

fig.update_layout(width=1000, height=500)
fig.update_layout(title_text='Scatter Plot of Cholesterol vs. Age (colored by Sex)')

fig.show()

[Figure: Scatter Plot of Cholesterol vs. Age (colored by Sex). x-axis: chol; y-axis: age]

from plotly.offline import iplot

fig = px.box(x=df["age"],
             labels={"x": "Age"},
             title="5-Number-Summary(Box Plot) of Age")
iplot(fig)
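
The five numbers a box plot draws can also be computed directly. The sketch below uses a small made-up list of ages (not the heart-disease data) and NumPy's default linear interpolation. Plotly's box trace uses its own quartile method, so its quartiles may differ slightly.

```python
import numpy as np

# Hypothetical ages, chosen only for illustration
ages = np.array([29, 40, 48, 55, 56, 61, 63, 67, 77])

# Minimum, first quartile, median, third quartile, maximum
minimum, q1, median, q3, maximum = np.percentile(ages, [0, 25, 50, 75, 100])

print(f"min={minimum}, Q1={q1}, median={median}, Q3={q3}, max={maximum}")
```

The box spans Q1 to Q3, the line inside it marks the median, and the whiskers reach toward the minimum and maximum.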

[Figure: 5-Number-Summary(Box Plot) of Age. x-axis: Age]

import plotly.express as px

fig = px.scatter(df, x='chol', y='age', color='cp', size = 'oldpeak', size_max = 30, hover_name = 'exang')

fig.update_layout(width=1000, height=500)

fig.update_layout(title_text='Scatter Plot of Cholesterol vs. Age (colored by cp)')


fig.show()

[Figure: Scatter Plot of Cholesterol vs. Age (colored by cp), marker size by oldpeak. x-axis: chol; y-axis: age]
Python Code Link: https://t.me/AIMLDeepThaught/573

fig = px.scatter(data_frame=df,
                 x="age",
                 y="chol",
                 color="cp",
                 size='ca',
                 hover_data=['oldpeak'])

# Close the bold tag and use title_font, the current name for the title font setting
fig.update_layout(title_text="<b>Cholesterol Vs Age</b>",
                  title_font={'size': 24, 'family': 'Serif'},
                  width=1000,
                  height=500)

fig.show()

[Figure: Cholesterol Vs Age; scatter plot colored by cp. x-axis: age; y-axis: chol]

import plotly.express as px

fig = px.scatter(df, x='chol', y='age', color='cp', size = 'oldpeak', size_max = 30, hover_name = 'exang',facet_col
fig.update_layout(width=1000, height=500)

fig.update_layout(title_text='Scatter Plot of Cholesterol vs. Age (colored by cp)')

fig.show()

[Figure: Scatter Plot of Cholesterol vs. Age (colored by cp), one panel per chest pain type (cp=typical angina, cp=asymptomatic, cp=non-anginal, cp=atypical angina). x-axis: chol; y-axis: age]

hover_name='exang' shows each point's value from the 'exang' column in bold at the top of its tooltip when you hover over
that point on the scatter plot. This adds detail about individual data points without cluttering the plot with labels.

fig=px.bar(df,x='age',y='chol',hover_data=['oldpeak'],color='sex',height=400)
fig.show()

[Figure: bar chart of chol by age, colored by sex (Male, Female). x-axis: age; y-axis: chol]

def generate_rating_df(df):
    # Count rows for every (cp, slope) combination
    rating_df = df.groupby(['cp', 'slope']).agg({'id': 'count'}).reset_index()
    rating_df = rating_df[rating_df['id'] != 0]
    rating_df.columns = ['cp', 'slope', 'counts']
    rating_df = rating_df.sort_values('slope')
    return rating_df

rating_df = generate_rating_df(df)
fig = px.bar(rating_df, x='cp', y='counts', color='slope')

fig.update_traces(textposition='auto',
                  textfont_size=20)

fig.update_layout(barmode='stack')
fig.show()

[Figure: stacked bar chart of counts by cp, colored by slope (downsloping, flat, upsloping). x-axis: cp; y-axis: counts]

def generate_rating_df(df):
    rating_df = df.groupby(['cp', 'slope']).agg({'id': 'count'}).reset_index()
    rating_df = rating_df[rating_df['id'] != 0]
    rating_df.columns = ['cp', 'slope', 'counts']
    rating_df = rating_df.sort_values('slope')
    return rating_df

rating_df = generate_rating_df(df)
fig = px.bar(rating_df, x='cp', y='counts', color='slope')

fig.update_traces(textposition='auto',
                  textfont_size=20)

fig.update_layout(barmode='group')

fig.show()

[Figure: grouped bar chart of counts by cp, colored by slope (downsloping, flat, upsloping). x-axis: cp; y-axis: counts]

import plotly.express as px

def generate_rating_df(df):
    rating_df = df.groupby(['cp', 'slope']).agg({'id': 'count'}).reset_index()
    rating_df = rating_df[rating_df['id'] != 0]
    rating_df.columns = ['cp', 'slope', 'counts']
    rating_df = rating_df.sort_values('slope')
    return rating_df

rating_df = generate_rating_df(df)

# text='counts' prints each bar's count on the bar
fig = px.bar(rating_df, x='cp', y='counts', color='slope', barmode='group',
             text='counts')

fig.update_traces(textposition='auto',
                  textfont_size=20)

fig.show()

[Figure: grouped bar chart of counts by cp, colored by slope (downsloping, flat, upsloping), with the count printed on each bar. x-axis: cp; y-axis: counts]

def generate_rating_df(df):
    rating_df = df.groupby(['cp', 'slope']).agg({'id': 'count'}).reset_index()
    rating_df = rating_df[rating_df['id'] != 0]
    rating_df.columns = ['cp', 'slope', 'counts']
    rating_df = rating_df.sort_values('slope')

    # Calculate percentages (each group's share of all rows)
    total_counts = rating_df['counts'].sum()
    rating_df['percentage'] = rating_df['counts'] / total_counts * 100

    return rating_df

rating_df = generate_rating_df(df)

fig = px.bar(rating_df, x='cp', y='counts', color='slope', text='percentage')

fig.update_traces(
    texttemplate='%{text:.1f}%',
    textposition='outside',
    textfont_size=16
)

fig.update_layout(
    barmode='group',
    yaxis_title='Count',
    xaxis_title='CP',
    legend_title='Slope'
)

fig.update_layout(
    height=550,
    width=1000,
    title_text="Distribution of Chest Pain Type by Percentage",
    title_font_size=24
)
fig.show()
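
The percentages printed on the bars are shares of all rows, not shares within one chest pain type. The sketch below contrasts the two readings on made-up (cp, slope) pairs, using only the standard library; the rows are illustrative and do not come from the dataset.

```python
from collections import Counter

# Made-up (cp, slope) pairs standing in for rows of the dataset
rows = [
    ("asymptomatic", "flat"), ("asymptomatic", "flat"), ("asymptomatic", "upsloping"),
    ("non-anginal", "upsloping"), ("non-anginal", "flat"),
]

counts = Counter(rows)                      # rows per (cp, slope) pair
total = sum(counts.values())                # all rows
per_cp = Counter(cp for cp, _ in rows)      # rows per chest pain type

for (cp, slope), n in sorted(counts.items()):
    share_of_total = n / total * 100         # what the chart labels show
    share_within_cp = n / per_cp[cp] * 100   # alternative: share inside one cp group
    print(f"{cp:13s} {slope:10s} {share_of_total:5.1f}% of all, {share_within_cp:5.1f}% within cp")
```

Shares of the total always add up to 100% across the whole chart, while shares within a group add up to 100% per chest pain type.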

[Figure: Distribution of Chest Pain Type by Percentage; grouped bar chart colored by Slope, each bar labeled with its share of all rows (largest label 28.1%). x-axis: CP; y-axis: Count]

fig = px.scatter(data_frame=df,
                 x="age",
                 y="chol",
                 color="cp",
                 size='ca',
                 hover_data=['oldpeak'],
                 marginal_x="histogram",  # histogram of age above the plot
                 marginal_y="box")        # box plot of chol beside the plot

fig.update_layout(title_text="<b>Age vs Cholesterol</b>",
                  title_font={'size': 24, 'family': 'Serif'},
                  width=1000,
                  height=550)

fig.show()

[Figure: Age vs Cholesterol; scatter plot colored by cp with a marginal histogram and a marginal box plot. x-axis: age; y-axis: chol]

fig = px.scatter(data_frame=df,
                 x="age",
                 y="chol",
                 color="thalch",  # numeric column, so points use a continuous color scale
                 size='ca',
                 hover_data=['oldpeak'],
                 marginal_x="histogram",
                 marginal_y="box")

fig.update_layout(title_text="<b>Age vs Cholesterol</b>",
                  title_font={'size': 24, 'family': 'Serif'},
                  width=1000,
                  height=500)

fig.show()

[Figure: Age vs Cholesterol; scatter plot colored by thalch with marginal plots. x-axis: age; y-axis: chol]

# trendline="ols" fits one least-squares line per color group (requires the statsmodels package)
fig = px.scatter(data_frame=df,
                 x="age",
                 y="chol",
                 size="ca",
                 size_max=30,
                 color="sex",
                 trendline="ols")
fig.update_layout(title_text="<b>Age vs Cholesterol</b>",
                  title_font={'size': 24, 'family': 'Serif'},
                  width=1000,
                  height=500)

fig.show()
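
An ordinary least squares (OLS) trendline is the straight line that minimizes the squared vertical distances to the points. As a rough illustration of that fit, the sketch below uses numpy.polyfit on made-up points; it is not Plotly's internal code, and the numbers are hypothetical.

```python
import numpy as np

# Made-up points that lie exactly on a line, so the fit is easy to check
age = np.array([30.0, 40.0, 50.0, 60.0])
chol = np.array([200.0, 220.0, 240.0, 260.0])

# Degree-1 polynomial fit = least-squares straight line
slope, intercept = np.polyfit(age, chol, deg=1)
predicted = slope * age + intercept

print(f"chol is approximately {slope:.2f} * age + {intercept:.2f}")
```

With real data the points scatter around the line, and the slope gives the average change in cholesterol per year of age.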

[Figure: Age vs Cholesterol; scatter plot colored by sex with OLS trendlines. x-axis: age; y-axis: chol]

fig = px.scatter(data_frame=df,
                 x="age",
                 y="chol",
                 size="ca",
                 size_max=30,
                 color="sex",
                 trendline="ols",
                 trendline_scope="overall",          # one line fitted to all points
                 trendline_color_override="black")

# The chart plots age against cholesterol, so the title says so
fig.update_layout(title_text="<b>Age vs Cholesterol (Overall Trendline)</b>",
                  title_font={'size': 24, 'family': 'Serif'},
                  width=1000,
                  height=550)
fig.show()

[Figure: Age vs Cholesterol (Overall Trendline); scatter plot colored by sex with a single black OLS trendline. x-axis: age; y-axis: chol]

fig = px.histogram(df, x='age', height=500, width=900, template='simple_white',
                   color='sex',  # split the bars by this categorical column
                   color_discrete_sequence=['purple', 'pink'])

fig.update_layout(title={'text': 'Histogram of Persons by Age', 'font': {'size': 25}},
                  title_font_family="Times New Roman",
                  title_font_color="darkgrey",
                  title_x=0.2)

fig.update_layout(
    font_family='classic-roman',
    font_color='grey',
    yaxis_title={'text': " count", 'font': {'size': 18}},
    xaxis_title={'text': " Age", 'font': {'size': 18}}
)
fig.show()
[Figure: Histogram of Persons by Age, colored by sex (Male, Female). x-axis: Age; y-axis: count]

import plotly.graph_objects as go
from plotly.subplots import make_subplots

# Assuming df is your DataFrame


asymptomatic = df[df['cp'] == 'asymptomatic']
non_anginal = df[df['cp'] == 'non-anginal']
atypical_angina = df[df['cp'] == 'atypical angina']
typical_angina = df[df['cp'] == 'typical angina']

fig = make_subplots(rows=2,
cols=2,
specs=[[{'type':'domain'}, {'type':'domain'}],
[{'type':'domain'}, {'type':'domain'}]],
subplot_titles=("Asymptomatic", "Non-Anginal",
"Atypical Angina", "Typical Angina"))

fig.add_trace(go.Pie(labels=asymptomatic["thal"], values=asymptomatic["chol"], name="asymptomatic"), 1, 1)


fig.add_trace(go.Pie(labels=non_anginal["thal"], values=non_anginal["chol"], name="non_anginal"), 1, 2)
fig.add_trace(go.Pie(labels=atypical_angina["thal"], values=atypical_angina["chol"], name="atypical_angina"), 2,
fig.add_trace(go.Pie(labels=typical_angina["thal"], values=typical_angina["chol"], name="typical_angina"), 2, 2)

# Update layout to increase the size of the plot and add main title
fig.update_layout(
height=800,
width=1000,
title_text="Distribution of Cholesterol Levels by Chest Pain Type",
title_font_size=24
)

# Update traces
fig.update_traces(textposition='inside', textfont_size=16)
fig.update_annotations(font_size=20)
fig.show()
[Figure: Distribution of Cholesterol Levels by Chest Pain Type; four pie charts titled Asymptomatic, Non-Anginal, Atypical Angina, and Typical Angina, each split by thal]

import plotly.express as px

fig = px.scatter(df, x='chol', y='age', color='cp', size = 'oldpeak', size_max = 30, hover_name = 'exang', range_x
labels = dict(oldpeak = 'oldpeak', chol = 'Cholestrol', age = "Age" ), animation_frame = "chol",

fig.update_layout(width=1000, height=600)

fig.update_layout(title_text='Scatter Plot of Cholesterol vs. Age (colored by cp) with Animation')

fig.show()
[Figure: Scatter Plot of Cholesterol vs. Age (colored by cp) with Animation; play/stop controls and a slider over Cholestrol values, starting at Cholestrol=233.0. x-axis: Cholestrol; y-axis: Age]

from plotly.offline import iplot

gender = df["sex"].value_counts()
display(gender.head().to_frame())

fig = px.bar(data_frame=gender,
x = gender.index,
y = gender,
color=gender.index,
text_auto="0.3s",
labels={"y": "Frequency", "index": "Gender"}

)
fig.update_traces(textfont_size=24)

iplot(fig)


        count
sex
Male      203
Female     96

[Figure: bar chart of frequency by sex (Male 203, Female 96). x-axis: sex; y-axis: Frequency]

from plotly.offline import iplot

category = df["cp"].value_counts()

fig = px.bar(category,
x = category.index,
y = (category / sum(category)) * 100,
color=category.index,
labels={"y" : "Frequency in (Percentage%)", "category":"Category"},
title="Frequency of Chest Pain Category in Percentage",
text = category.apply(lambda x: f'{(x / sum(category)) * 100:.1f}%'),
template="plotly_dark"
)

fig.update_layout(showlegend=False)
fig.update_traces(
textfont= {
"family": "consolas",
"size": 20,
}
)

iplot(fig)
[Figure: Frequency of Chest Pain Category in Percentage; bar chart: asymptomatic 48.2%, non-anginal 27.8%, atypical angina 16.4%, typical angina 7.7%. x-axis: cp; y-axis: Frequency in (Percentage%)]

from plotly.offline import iplot

ChestPain = df["cp"].value_counts()

fig = px.pie(values=ChestPain, names = ChestPain.index,


color_discrete_sequence= ["#98EECC", "#FFB6D9", "#99DBF5"],
template="plotly_dark"
)

fig.update_traces(textposition='inside', textfont_size= 20, textinfo='percent+label')


fig.update_layout(showlegend=True,width=1000, height=600)

iplot(fig)

[Figure: pie chart of chest pain type: asymptomatic 48.2%, non-anginal 27.8%, atypical angina 16.4%, typical angina 7.69%]

cp = df["cp"].value_counts()
fig = px.bar(cp,
y = cp.index,
x = (cp / sum(cp)) * 100,
color=cp.index,
labels={"x" : "Frequency in Percentage(%)", "cp":"Chest Pain"},
orientation="h",
title="Frequency of Chest Pain",
text = cp.apply(lambda x: f'{(x / sum(cp)) * 100:.1f}%'),
)

fig.update_layout(showlegend=True,width=1000, height=600)

fig.update_traces(
textfont= {
"family": "consolas",
"size": 20
}
)

iplot(fig)

[Figure: Frequency of Chest Pain; horizontal bar chart: asymptomatic 48.2%, non-anginal 27.8%, atypical angina 16.4%, typical angina 7.7%. x-axis: Frequency in Percentage(%); y-axis: Chest Pain]

fig=px.pie(df.groupby('cp',as_index=False)['sex'].count().sort_values(by='sex',ascending=False).reset_index(drop
names='cp',values='sex',color='sex',color_discrete_sequence=px.colors.sequential.Plasma_r,
labels={'cp':'Chest Pain','Sex':'Count'}, template='seaborn',hole=0.4)

fig.update_layout(autosize=False, width=1200, height=700,legend=dict(orientation='v', yanchor='bottom',y=0.40,xanchor


title_x=0.5, showlegend=True)

fig.update_traces(
textfont= {
"family": "consolas",
"size": 20
}
)

fig.show()
[Figure: donut chart of Chest Pain with slices of 48.2%, 27.8%, 16.4%, and 7.69%]

import numpy as np
import plotly.graph_objects as go
from plotly.offline import iplot
from plotly.subplots import make_subplots

fig = make_subplots(rows=1, cols=2, subplot_titles=('Age Distribution', 'Log Age Distribution'))

fig.add_trace(go.Histogram(x=df['age'],
                           name='Age Distribution'), row=1, col=1)

fig.add_trace(go.Histogram(x=np.log10(df['age']),
                           name='Log Age Distribution'), row=1, col=2)

# Pass the figure itself so the subplot layout and titles are kept
iplot(fig)
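
The second histogram plots log10(age) instead of age, which compresses the axis. A minimal sketch with a few made-up ages shows the mapping, using only the standard library:

```python
import math

# Made-up ages spanning roughly the range seen in the charts
ages = [29, 41, 56, 77]
log_ages = [math.log10(a) for a in ages]

raw_spread = max(ages) - min(ages)          # spread in years
log_spread = max(log_ages) - min(log_ages)  # spread in log10 units

for a, la in zip(ages, log_ages):
    print(f"age {a:2d} -> log10 = {la:.3f}")
```

Ages between about 30 and 80 all land between roughly 1.5 and 1.9, which matches the tick range of the log histogram.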
[Figure: two histograms side by side, titled Age Distribution and Log Age Distribution]

import numpy as np
import plotly.graph_objects as go
from plotly.offline import iplot

# Calculate quartiles and IQR
Q25 = np.quantile(df['chol'], q=0.25)
Q75 = np.quantile(df['chol'], q=0.75)
IQR = Q75 - Q25
cut_off = IQR * 1.5

# Print number of outliers (values at least 1.5 * IQR beyond the quartiles)
print('Number of Cholesterol Lower Outliers:', df[df['chol'] <= (Q25 - cut_off)]['chol'].count())
print('Number of Cholesterol Upper Outliers:', df[df['chol'] >= (Q75 + cut_off)]['chol'].count())

# Sum only the numeric columns that are plotted, per 'cp', and sort by 'age'
temp = df.groupby('cp')[['age', 'chol']].sum().sort_values('age', ascending=False)

# Create bar data
data = [
    go.Bar(x=temp.index, y=temp['age'], name='Age', text=temp['age'], textposition='auto'),
    go.Bar(x=temp.index, y=temp['chol'], name='Cholesterol', text=temp['chol'], textposition='auto')
]

# Define layout
layout = go.Layout(
    xaxis=dict(title=dict(text='Chest Pain', font=dict(size=25))),
    yaxis=dict(title=dict(text='Values', font=dict(size=25))),
    showlegend=True,
    width=1300,
    height=600
)

# Create figure and plot
fig = go.Figure(data=data, layout=layout)
iplot(fig)
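
The outlier count above follows the 1.5 x IQR rule: a value is flagged when it lies at least 1.5 interquartile ranges below the first quartile or above the third. A small sketch on made-up cholesterol values (not the dataset) makes the fences explicit:

```python
import numpy as np

# Made-up cholesterol readings with one obviously extreme value
chol = np.array([150, 200, 210, 220, 230, 240, 250, 260, 270, 600])

q25, q75 = np.quantile(chol, [0.25, 0.75])
iqr = q75 - q25
lower_fence = q25 - 1.5 * iqr
upper_fence = q75 + 1.5 * iqr

# Same comparisons as the notebook code: at or beyond the fence counts as an outlier
lower_outliers = chol[chol <= lower_fence]
upper_outliers = chol[chol >= upper_fence]

print(f"fences: [{lower_fence}, {upper_fence}]")
print("lower outliers:", lower_outliers.tolist())
print("upper outliers:", upper_outliers.tolist())
```

Only the reading of 600 falls outside the fences here; 150 is low but still inside the lower fence.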

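Number of Cholesterol Lower Outliers: 1
Number of Cholesterol Upper Outliers: 5

[Figure: grouped bar chart of summed Age and Cholesterol per chest pain type. Bars visible in the extract: asymptomatic (age 8032, cholesterol 35949), non-anginal (age 4475, cholesterol 20367), atypical angina (age 2510, cholesterol 12019). x-axis: Chest Pain; y-axis: Values]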

import plotly.graph_objects as go
from plotly.offline import iplot

# Top three chest pain types by summed age, with the matching cholesterol sums
top3 = df.groupby('cp')[['age', 'chol']].sum().sort_values('age', ascending=False).head(3)
top3_age = top3['age']
top3_chol = top3['chol']

data = [
    go.Bar(
        x=top3_age.index,
        y=top3_age,
        name='Top 3 age',
        text=top3_age,
        textposition='auto'
    ),
    go.Bar(
        x=top3_chol.index,
        y=top3_chol,
        name='Top 3 cholesterol',
        text=top3_chol,
        textposition='auto'
    )
]

layout = go.Layout(
    title="Grouped Bar Plot For Age and Cholesterol<br>(For The Top Three types of Chest Pain)",
    barmode='group'
)

iplot(go.Figure(data=data, layout=layout))
[Figure: Grouped Bar Plot For Age and Cholesterol (For The Top Three types of Chest Pain); series "Top 3 age" and "Top 3 cholesterol": asymptomatic (8032, 35949), non-anginal (4475, 20367), atypical angina (2510, 12019)]

gap_df = pd.read_csv("gapminder_full.csv")

display(gap_df.head(2))

fig = px.bar(data_frame=gap_df,
x="continent",
y="population",
color="continent",
animation_frame="year",
animation_group="country",
range_y=[0,4000000000])
fig.show()
       country  year  population continent  life_exp  gdp_cap
0  Afghanistan  1952     8425333      Asia     28.80   779.45
1  Afghanistan  1957     9240934      Asia     30.33   820.85

[Figure: animated bar chart of population by continent (Asia, Europe, Africa, Americas, Oceania), with a year slider from 1952 to 2007 in five-year steps. x-axis: continent; y-axis: population]

fig = px.scatter(gap_df,x='gdp_cap',y='life_exp',color='continent',size='population',size_max=60,hover_name="country"
animation_frame="year",animation_group='country',log_x=True,range_x=[100,100000],range_y=[25,90],
labels=dict(population="Population", gdp_cap="GDP per Capita", life_exp="Life Expectancy"))

fig.update_layout(
    height=550,
    width=1500,
    title_text="GDP per Capita vs. Life Expectancy",
    title_font_size=24
)

fig.show()

[Figure: GDP per Capita vs. Life Expectancy; animated bubble chart with a year slider starting at 1952. x-axis: GDP per Capita (log scale); y-axis: Life Expectancy]

# Group the data by chest pain type and sum the numeric columns
df1 = df[['cp', 'age', 'chol', 'num']]
df1.groupby('cp').sum().head(10).style.background_gradient(cmap='Blues')

                  age          chol  num
cp
asymptomatic     8032  35949.000000  225
atypical angina  2510  12019.000000   14
non-anginal      4475  20367.000000   33
typical angina   1285   5454.000000   11

import pandas as pd
import plotly.express as px

# Number of patients for each (cp, thal) combination
grouped_df = df.groupby(['cp', 'thal']).size().reset_index(name='count')

fig = px.bar(grouped_df,
             y="cp",
             x='count',
             color='thal',
             title='Count of Patients by cp and thal',
             labels={'count': 'Number of Patients'},
             text_auto=True)
fig.show()

[Figure: Count of Patients by cp and thal; horizontal stacked bar chart. x-axis: Number of Patients; y-axis: cp]

                 fixed defect  normal  reversable defect
typical angina              2      13                  8
non-anginal                 2      59                 22
atypical angina             2      39                  8
asymptomatic               12      53                 79

# color palette for visualizations


import matplotlib.pyplot as plt

colors = ['#2B2E4A', '#E84545', '#903749', '#53354A',]


palette = sns.color_palette( palette = colors)

sns.palplot(palette, size = 2.5)

plt.text(-0.5,
-0.7,
'Color Palette',
{'font':'monospace',
'size': 24,
'weight':'normal'}
)

plt.show()

def format_title(title, subtitle=None, subtitle_font=None, subtitle_font_size=None):
    # Bold title, optionally followed by a styled subtitle on a new line
    title = f'<b>{title}</b>'
    if not subtitle:
        return title
    subtitle = f'<span style="font-family: {subtitle_font}; font-size: {subtitle_font_size}px;">{subtitle}</span>'
    return f'{title}<br>{subtitle}'

import plotly.figure_factory as ff

_ = df.groupby(['cp', 'thal']).chol.size().unstack()
z = _.values.tolist()
x = _.columns.tolist()
y = _.index.tolist()

fig = ff.create_annotated_heatmap(z = z,
x = x,
y = y,
xgap = 3,
ygap = 3,
colorscale = ['#53354A', '#E84545']
)

title = format_title('cp',
'thal.',
'Chol',
12
)

fig.update_layout(title_text=title,
                  title_x=0.5,
                  title_font={'size': 24,
                              'family': 'Proxima Nova',
                              },
                  template='plotly_dark',
                  paper_bgcolor='#2B2E4A',
                  plot_bgcolor='#2B2E4A',
                  xaxis={'side': 'bottom'},
                  xaxis_showgrid=False,
                  yaxis_showgrid=False,
                  yaxis_autorange='reversed',
                  )

fig.show()

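[Figure: annotated heatmap titled "cp" with subtitle "thal."; rows are chest pain types, columns are thal values]

                 fixed defect  normal  reversable defect
asymptomatic               12      53                 79
atypical angina             2      39                  8
non-anginal                 2      59                 22
typical angina              2      13                  8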

# available templates
template = ['ggplot2', 'plotly_dark', 'seaborn', 'simple_white', 'plotly']

fig = px.histogram(df,
                   x="cp",
                   color="sex",
                   width=1200,
                   height=450,
                   histnorm='percent',
                   # keys must match the values in the 'sex' column exactly ('Male', 'Female')
                   color_discrete_map={
                       "Male": "RebeccaPurple", "Female": "lightsalmon"
                   },
                   template="plotly_dark"
                   )

fig.update_layout(title_text="Gender Chest Pain",
                  font_family="San Serif",
                  bargap=0.2,
                  barmode='group',
                  title_font={'size': 24},
                  legend=dict(
                      orientation="v", y=1, yanchor="top", x=1.25, xanchor="right")
                  )
fig.show()
[Figure: Gender Chest Pain; grouped percent histogram of cp by sex. x-axis: cp; y-axis: percent]

from plotly.subplots import make_subplots

# Two donut charts of chest pain type with different hole sizes
fig = make_subplots(rows=1, cols=2,
                    specs=[[{'type': 'domain'}, {'type': 'domain'}]])
fig.add_trace(
    go.Pie(
        labels=df['cp'],  # no values given, so slices are sized by how often each label occurs
        title=dict(text="Chest Pain", font={'size': 20, 'family': 'Serif'}),
        hole=0.85,
    ), row=1, col=1,
)
fig.add_trace(
    go.Pie(
        labels=df['cp'],
        title=dict(text="Chest Pain", font={'size': 20, 'family': 'Serif'}),
        hole=0.5,
    ), row=1, col=2,
)
# update_traces applies to every trace, so one call styles both pies
fig.update_traces(
    hoverinfo='label+value',
    textinfo='label+percent',
    textfont_size=12,
)
fig.update_layout(title_text="<b>Heart Disease</b>",
                  title_font={'size': 20, 'family': 'Serif'},
                  showlegend=False,
                  height=600,
                  width=1000,
                  template=None,
                  )

fig.show()

[Figure: Heart Disease; two donut charts of Chest Pain: asymptomatic 48.2%, non-anginal 27.8%, atypical angina 16.4%, typical angina 7.69%]

from plotly.subplots import make_subplots

# Donut charts of chest pain type and sex, side by side
fig = make_subplots(rows=1, cols=2,
                    specs=[[{'type': 'domain'}, {'type': 'domain'}]])
fig.add_trace(
    go.Pie(
        labels=df['cp'],
        hole=.4,
        title=dict(text='Chest Pain', font={'size': 24}),
    ),
    row=1, col=1
)
fig.add_trace(
    go.Pie(
        labels=df['sex'],
        hole=.4,
        title=dict(text='Sex', font={'size': 24}),
    ),
    row=1, col=2
)
# update_traces applies to every trace, so one call styles both pies
fig.update_traces(
    hoverinfo='label+value',
    textinfo='label+percent',
    textfont_size=16,
    marker=dict(
        colors=['lightgray', 'lightseagreen'],
        line=dict(color='#000000', width=2)
    )
)
fig.update_layout(title_text="<b>Heart Disease</b>",
                  title_font={'size': 24, 'family': 'San-Serif'},
                  showlegend=False,
                  height=600,
                  width=950,
                  )
fig.show()

[Figure: Heart Disease; donut chart of Chest Pain (asymptomatic 48.2%, non-anginal 27.8%, atypical angina 16.4%, typical angina 7.69%) next to a donut chart of Sex (Male 67.9%, Female 32.1%)]

# Sunburst: chest pain type in the inner ring, sex in the outer ring
fig = px.sunburst(df,
                  path=['cp', 'sex'])
fig.update_layout(title_text="<b>Chest Pain vs Gender</b>",
                  title_font={'size': 24, 'family': 'Serif'},
                  width=750,
                  height=750,
                  )
fig.show()

[Figure: Chest Pain vs Gender; sunburst chart with chest pain types (asymptomatic, non-anginal, atypical angina, typical angina) in the inner ring, each split into Male and Female in the outer ring]

fig = px.histogram(df, x="cp",
                   width=600,
                   height=400,
                   histnorm='percent',
                   category_orders={
                       "cp": ["asymptomatic", "non-anginal", "atypical angina", "typical angina"],
                       "sex": ["Male", "Female"]
                   },
                   color_discrete_map={
                       "Male": "RebeccaPurple", "Female": "lightsalmon",
                   },
                   template="simple_white"
                   )

fig.update_layout(title_text="Chest Pain Type",
                  font_family="San Serif",
                  title_font={'size': 20},
                  legend=dict(
                      orientation="v", y=1, yanchor="top", x=1.0, xanchor="right")
                  ).update_xaxes(categoryorder='total descending')

# custom colors: gray bars, with the first and last bars highlighted
colors = ['gray'] * 4
colors[3] = 'crimson'
colors[0] = 'lightseagreen'

fig.update_traces(marker_color=colors, marker_line_color=None,
                  marker_line_width=2.5, opacity=None)
fig.show()
[Figure: Chest Pain Type; percent histogram ordered asymptomatic, non-anginal, atypical angina, typical angina. x-axis: cp; y-axis: percent]

fig = px.histogram(df, x="cp",
                   width=600,
                   height=500,
                   histnorm='percent',
                   template="simple_white",
                   )
fig.update_layout(title_text="Types of Chest Pain",
                  font_family="San Serif",
                  title_font={'size': 20},
                  showlegend=True,
                  legend=dict(
                      orientation="v",
                      y=1.0,
                      yanchor="top",
                      x=1.0,
                      xanchor="right"
                  )
                  )
fig.update_traces(marker_color=None, marker_line_color='white',
                  marker_line_width=1.5, opacity=0.99)
fig.show()

[Figure: Types of Chest Pain; percent histogram ordered typical angina, asymptomatic, non-anginal, atypical angina. x-axis: cp; y-axis: percent]

colors = ['rgba(38, 24, 74, 0.8)', 'rgba(71, 58, 131, 0.8)',
          'rgba(122, 120, 168, 0.8)', 'rgba(164, 163, 204, 0.85)',
          'rgba(190, 192, 213, 1)']

# Percentage of each sex, computed from the data so the labels always match the bars
sex_pct = df['sex'].value_counts(normalize=True) * 100

fig = px.histogram(df,
                   y="sex",
                   orientation='h',
                   width=800,
                   height=350,
                   histnorm='percent',
                   template="plotly_dark"
                   )
fig.update_layout(title_text="<b>Heart Disease</b>",
                  font_family="San Serif",
                  bargap=0.2,
                  barmode='group',
                  title_font={'size': 28},
                  paper_bgcolor='lightgray',
                  plot_bgcolor='lightgray',
                  autosize=False,
                  margin=dict(l=50, r=50, b=50, t=120),
                  legend=dict(orientation="v", y=1, yanchor="top", x=1.250, xanchor="right")
                  )

label_font = dict(family='Arial', size=20, color=colors[2])
annotations = [
    dict(xref='paper', yref='paper', x=0.0, y=1.2, text='Heart Disease',
         font=dict(family='Arial', size=16, color=colors[2]), showarrow=False),
    dict(xref='paper', yref='paper', x=0.50, y=0.85, text=f"{sex_pct['Female']:.1f}%",
         font=label_font, showarrow=False),
    dict(xref='paper', yref='paper', x=1.08, y=0.19, text=f"{sex_pct['Male']:.1f}%",
         font=label_font, showarrow=False),
]

fig.update_layout(annotations=annotations)
fig.update_xaxes(showgrid=False)
fig.update_yaxes(showgrid=False)
fig.show()

[Figure: Heart Disease; horizontal percent bar chart by sex (Female, Male), each bar annotated with its percentage. x-axis: percent; y-axis: sex]

# Plotting the pie chart


plt.figure(figsize=(20, 5))

# Pie chart
plt.subplot(1, 2, 1)
quality_counts = df['cp'].value_counts()
plt.pie(quality_counts, labels=quality_counts.index, colors=sns.color_palette('PuBuGn', len(quality_counts)), autopct
plt.title('Chest Pain Distribution')

# Count plot
plt.subplot(1, 2, 2)
ax = sns.countplot(data=df, x='cp', palette='PuBuGn')
# Add count values above each bar
for container in ax.containers:
    ax.bar_label(container, label_type='edge')

plt.title('Chest Pain Distribution')
plt.xlabel('Chest Pain')
plt.ylabel('Count')
plt.tight_layout()
plt.show()

plt.figure(figsize=(20, 5))

for i, col in enumerate(['age', 'chol', 'oldpeak'], 1):
    plt.subplot(1, 3, i)
    ax = sns.barplot(x='sex', y=col, data=df)
    plt.title(f'{col} Comparison')
    plt.ylabel(col if i == 1 else '')

    # Label each bar with its height (the mean of the column for that sex);
    # a separate loop variable avoids overwriting the subplot index i
    for container in ax.containers:
        ax.bar_label(container, label_type='edge')

plt.show()

sns.pairplot(df[['cp', 'age', 'chol', 'thalch']], hue='cp', aspect=1.5, dropna=True, palette='bright')
plt.show()

# Group by chest pain type and calculate the mean of each feature
grouped_mean = df[['cp', 'age', 'trestbps', 'chol', 'thalch']].groupby('cp').mean().round(2)

plt.figure(figsize=(20, 6))

# Plot the grouped bars using Seaborn's barplot (melt reshapes the means to long format)
ax = sns.barplot(data=grouped_mean.reset_index().melt(id_vars='cp'),
                 x='variable', y='value', hue='cp', palette='CMRmap', alpha=0.8)

# Add the mean value above each bar
for container in ax.containers:
    ax.bar_label(container, label_type='edge')

plt.xlabel('Features')
plt.ylabel('Mean Value')
plt.title('Grouped Barplot by Chest Pain type')

# Rotate x-axis labels
plt.xticks(rotation=45, ha='right')

plt.legend(title='Chest Pain')
plt.show()
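
The grouped bar plot relies on reshaping the table of means from wide to long format with melt. A minimal sketch on a made-up frame (the values are hypothetical) shows what that reshaping produces:

```python
import pandas as pd

# Made-up rows standing in for the heart-disease data
demo = pd.DataFrame({
    "cp": ["asymptomatic", "asymptomatic", "non-anginal", "non-anginal"],
    "age": [60, 62, 50, 54],
    "chol": [250.0, 270.0, 230.0, 210.0],
})

# Wide: one row per cp, one column per feature
wide = demo.groupby("cp").mean().round(2)

# Long: one row per (cp, feature) pair, in columns 'variable' and 'value'
long = wide.reset_index().melt(id_vars="cp")

print(wide)
print(long)
```

The long table is what seaborn needs to draw one bar per feature and chest pain type, with 'variable' on the x-axis and 'cp' as the hue.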

# Violin plot: cholesterol distribution for each chest pain type
plt.figure(figsize=(12, 6))
sns.violinplot(x='cp', y='chol', data=df)
plt.title('Distribution of Cholesterol by Chest Pain Type')
plt.xlabel('Chest Pain Type')
plt.ylabel('Cholesterol')
plt.show()
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

# Assuming 'df' is your DataFrame


cp_attributes_comparison = df.loc[df['cp'].isin(['asymptomatic', 'non-anginal', 'atypical angina','typical angina'
attributes_to_compare = ['age', 'trestbps', 'chol', 'thalch', 'oldpeak', 'ca']

fig, ax = plt.subplots(figsize=(10, 10), subplot_kw=dict(polar=True))

# One evenly spaced angle per attribute; repeat the first to close the outline
angles = [n / float(len(attributes_to_compare)) * 2 * np.pi for n in range(len(attributes_to_compare))]
angles += angles[:1]

for cp in cp_attributes_comparison['cp'].unique():
    cp_data = cp_attributes_comparison.loc[cp_attributes_comparison['cp'] == cp]

    # Calculate mean values for each attribute
    values = cp_data[attributes_to_compare].mean().values.flatten().tolist()
    values += values[:1]  # Close the circle for radar plot

    ax.plot(angles, values, linewidth=2, linestyle='solid', label=cp)
    ax.fill(angles, values, alpha=0.25)

# Set the labels
ax.set_xticks(angles[:-1])
ax.set_xticklabels(attributes_to_compare)

# Add legend
plt.legend(loc='upper right', bbox_to_anchor=(1.3, 1))

# Add title
plt.title('Chest Pain Attributes Comparison')

# Show the plot
plt.tight_layout()
plt.show()
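
A radar chart places one attribute at each of n evenly spaced angles and repeats the first point at the end so the outline closes. The sketch below shows only that geometry, using the standard library and made-up mean values:

```python
import math

attributes = ["age", "trestbps", "chol", "thalch", "oldpeak", "ca"]
values = [54.0, 131.0, 247.0, 150.0, 1.0, 0.7]  # made-up group means

n = len(attributes)
# Evenly spaced angles around the full circle, one per attribute
angles = [i / n * 2 * math.pi for i in range(n)]

# Repeat the first point so the outline returns to where it started
closed_angles = angles + angles[:1]
closed_values = values + values[:1]

for name, angle in zip(attributes, angles):
    print(f"{name:9s} at {math.degrees(angle):5.1f} degrees")
```

Because the attributes here are on very different scales, the large ones (such as chol) dominate the shape; rescaling each attribute before plotting is a common way to make the outlines comparable.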
jobs = pd.read_csv("jobstreet_all_job_dataset.csv")
# Take a random sample of 5000 rows, drop the ID column, and renumber the rows
jobs = jobs.sample(5000).drop(columns=['job_id']).reset_index(drop=True)
display(jobs.shape)
jobs.head(2)

(5000, 10)

(first two rows, shown one column per line)
              0                                              1
job_title     MARKETING EXECUTIVE                            E-Commerce Sales Admin CS
company       AESD INTERNATIONAL (M) SDN. BHD.               JOBSGURU SDN. BHD.
descriptions  JOB DESCRIPTIONS\nWork closely with the sales ...   Job Description\nPerform activities by repl...
location      Petaling                                       Petaling
category      Marketing & Communications                     Administration & Office Support
subcategory   Marketing Assistants/Coordinators              Client & Sales Administration
role          marketing-executive                            sales-administration
type          Full time                                      Full time
salary        RM 3,000 – RM 4,000 per month                  RM 2,500 – RM 3,500 per month
listingDate   2024-03-21T08:08:18Z                           2024-05-24T12:59:40Z

import missingno as msno

# Create a figure with two subplots arranged in a 1x2 grid


fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(20, 6))

# Plot the original DataFrame with missing values


msno.matrix(jobs, ax=axes[0])
axes[0].set_title("Original DataFrame with Missing Values",fontsize=24,color='Red')

# Drop rows with missing values and plot the resulting DataFrame
job = jobs.dropna().copy()  # copy so that adding columns later does not trigger a SettingWithCopyWarning

msno.matrix(job, ax=axes[1])
axes[1].set_title("DataFrame after Dropping Missing Values",fontsize=24,color='Green')

plt.tight_layout()
plt.show()
import re

def clean_and_calculate_mean(salary):
    try:
        # Remove currency symbols, words, and extra characters
        salary = salary.replace('RM', '').replace('MYR', '').replace('$', '').replace('per month', '').replace('p.m.'

        # Handle ranges with different separators
        if '–' in salary:
            salary_range = salary.split('–')
        elif '-' in salary:
            salary_range = salary.split('-')
        elif '—' in salary:
            salary_range = salary.split('—')
        else:
            salary_range = [salary]

        # Convert values to integers, handling potential errors
        salary_values = []
        for value in salary_range:
            try:
                value = int(float(value.replace(',', '').strip()))
                salary_values.append(value)
            except ValueError:
                pass  # Ignore non-numeric values

        # Calculate mean if at least two valid values are found
        if len(salary_values) >= 2:
            salary_mean = sum(salary_values) / len(salary_values)
            return salary_mean
        else:
            return None

    except Exception as e:
        print(f"Error processing salary '{salary}': {e}")
        return None

# Apply the function to the salary column


job['Salary'] = job['salary'].apply(clean_and_calculate_mean)
job = job.drop('salary',axis=1)
job.head(2)

   job_title  company  descriptions  location  category  subcategory  role  type  listingDate  Salary
0  MARKETING EXECUTIVE  AESD INTERNATIONAL (M) SDN. BHD.  JOB DESCRIPTIONS\nWork closely with the sales ...  Petaling  Marketing & Communications  Marketing Assistants/Coordinators  marketing-executive  Full time  2024-03-21T08:08:18Z  3500.0
1  E-Commerce Sales Admin  JOBSGURU SDN. BHD.  Job Description\nPerform CS activities by repl...  Petaling  Administration & Office Support  Client & Sales Administration  sales-administration  Full time  2024-05-24T12:59:40Z  3000.0
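
The salary cleaning step can also be written with a regular expression. The sketch below is not part of the original notebook: `parse_salary_midpoint` is a hypothetical helper that pulls the numbers out of a string such as 'RM 3,000 – RM 4,000 per month' and averages the first two. Unlike `clean_and_calculate_mean`, it does not depend on which dash separates the range, and it keeps decimals rather than truncating to integers.

```python
import re

# Matches numbers with optional thousands separators and decimals, e.g. "3,000" or "2500.50"
_NUMBER = re.compile(r"\d[\d,]*(?:\.\d+)?")

def parse_salary_midpoint(salary):
    """Hypothetical helper: midpoint of the first two numbers in a salary string.

    Returns None for missing values or when fewer than two numbers are present.
    """
    if not isinstance(salary, str):
        return None  # NaN or other non-string input
    numbers = [float(token.replace(",", "")) for token in _NUMBER.findall(salary)]
    if len(numbers) < 2:
        return None
    low, high = numbers[0], numbers[1]
    return (low + high) / 2

print(parse_salary_midpoint("RM 3,000 – RM 4,000 per month"))
print(parse_salary_midpoint("RM 2,500 per month"))
```

Assuming the column holds strings in this style, it could be applied with `job['salary'].apply(parse_salary_midpoint)`.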

import plotly.express as px
from plotly.offline import init_notebook_mode
init_notebook_mode(connected=True)

jobType = job['type'].value_counts()

fig = px.pie(values=jobType.values, names=jobType.index.tolist(), color=jobType.index.tolist(),color_discrete_sequenc

fig.update_layout(width=1000, height=800)
fig.update_traces(textfont_size=20)
fig.update_traces(pull=[0.1, 0.1, 0.2], textposition='outside')

# Set layout properties
fig.update_layout(margin=dict(t=50, l=10, r=10, b=25),
                  title='Employment Types in the Job Market of Malaysia',
                  title_x=0.5,
                  title_y=0.98)

fig.show()

[Figure: pie chart "Employment Types in the Job Market of Malaysia". Slice shares: 94.9%, 4.63%, 0.408%, 0.0454%.]
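
The percentages in the pie chart are each category's count divided by the total number of rows. The sketch below shows that arithmetic with the standard library only. The counts are made up for illustration and are not taken from the dataset, and `percentage_shares` is a hypothetical helper name.

```python
from collections import Counter

# Made-up employment types for illustration only (not the real dataset counts)
types = (["Full time"] * 949 + ["Contract/Temp"] * 46
         + ["Part time"] * 4 + ["Casual/Vacation"] * 1)

def percentage_shares(labels):
    """Return {label: percentage of all rows}, largest share first."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: 100 * n / total for label, n in counts.most_common()}

shares = percentage_shares(types)
for label, pct in shares.items():
    print(f"{label}: {pct:.1f}%")
```

With pandas, `job['type'].value_counts(normalize=True) * 100` gives the same shares directly.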

top_n = 50

# Count the 50 most frequent job titles
filtered_data = job['job_title'].value_counts().head(top_n).reset_index()
filtered_data.columns = ['job_title', 'count']

fig = px.treemap(filtered_data, path=[px.Constant('all'), 'job_title'], values='count')

fig.update_traces(root_color='lightgrey')
fig.update_traces(textfont_size=16)

fig.update_layout(width=1000, height=600)
fig.update_layout(margin=dict(t=50, l=25, r=25, b=25),
                  title='Top Job Openings: Job Roles in Malaysia',
                  title_x=0.5,
                  title_y=0.98)

fig.show()
[Figure: treemap "Top Job Openings: Job Roles in Malaysia", showing the 50 most frequent job titles sized by count.]

# Normalize location names
job.replace('Kuala Lumpur Sentral', 'Kuala Lumpur', inplace=True)
job.replace('Bangsar South', 'Bangsar', inplace=True)
job.replace('Klang District', 'Klang/Port Klang', inplace=True)
job.replace('Penang Island', 'Penang', inplace=True)
job['location'] = job['location'].str.replace(' District', '')

# Keep only the three locations of interest
top_location = ['Kuala Lumpur', 'Petaling', 'Penang']
filtered_data = job[job['location'].isin(top_location)]

# Keep only the 10 most frequent job titles within those locations
job_title_counts = filtered_data['job_title'].value_counts()
top_n = 10
top_job_titles = job_title_counts.head(top_n).index.tolist()
filtered_data = filtered_data[filtered_data['job_title'].isin(top_job_titles)]

fig = px.treemap(filtered_data, path=[px.Constant('all'), 'location', 'job_title'])

fig.update_traces(root_color='lightgrey')
fig.update_layout(width=1000, height=600)
fig.update_layout(margin=dict(t=50, l=25, r=25, b=25),
                  title='Top Job Opportunities in Kuala Lumpur, Petaling, and Penang',
                  title_x=0.5,
                  title_y=0.98)
fig.show()
[Figure: treemap "Top Job Opportunities in Kuala Lumpur, Petaling, and Penang", nested by location (Petaling, Kuala Lumpur, Penang) and then by job title.]

# Plot a sunburst chart: locations on the inner ring, job categories on the outer ring
fig = px.sunburst(job, path=['location', 'category'])

# Configure the plot layout
fig.update_layout(
    margin=dict(t=50, l=25, r=25, b=25),
    width=900,   # Set the width of the plot
    height=800,  # Set the height of the plot
    title='Job Vacancies Available Across Malaysia by Category',
    title_x=0.48,
    title_y=0.98
)

fig.show()
[Figure: sunburst chart "Job Vacancies Available Across Malaysia by Category". Inner ring: location; outer ring: job category.]

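def remove_outliers_iqr(df, column):
    # Note: despite the names, the fences are built from the 10th and 85th
    # percentiles rather than the true quartiles (25th and 75th).
    Q1 = df[column].quantile(0.10)
    Q3 = df[column].quantile(0.85)
    IQR = Q3 - Q1

    lower_bound = Q1 - 1.5 * IQR
    upper_bound = Q3 + 1.5 * IQR

    # Keep rows inside [lower_bound, upper_bound]; rows with a missing value are dropped as well
    df_outlier_free = df[(df[column] >= lower_bound) & (df[column] <= upper_bound)]
    return df_outlier_free

# `jobs` now holds the outlier-free rows; `job` still holds all rows
jobs = remove_outliers_iqr(job, 'Salary')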
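
To see what the choice of percentiles does to the fences, the sketch below computes them by hand for a toy series. It is an illustration only: `linear_quantile` and `fence` are hypothetical helpers, the data are the integers 0 to 100 rather than real salaries, and the quantile uses linear interpolation, which is assumed to match the pandas default.

```python
def linear_quantile(values, q):
    """Quantile with linear interpolation between the two nearest ranks."""
    data = sorted(values)
    pos = q * (len(data) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(data) - 1)
    return data[lo] + (data[hi] - data[lo]) * (pos - lo)

def fence(values, q_low, q_high, k=1.5):
    """Lower and upper bounds: quantiles pushed outward by k times their spread."""
    low = linear_quantile(values, q_low)
    high = linear_quantile(values, q_high)
    spread = high - low
    return low - k * spread, high + k * spread

values = list(range(101))            # toy data: 0, 1, ..., 100
wide = fence(values, 0.10, 0.85)     # percentiles used in remove_outliers_iqr
tukey = fence(values, 0.25, 0.75)    # classic interquartile-range fences

print("10th/85th percentile fences:", wide)
print("25th/75th percentile fences:", tukey)
```

On this toy series the 10th/85th fences are wider than the classic quartile fences, so fewer points are flagged as outliers.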

#===========================================================================================================#

# Get the top 20 job titles from both DataFrames
top_titles_job = job['job_title'].value_counts().nlargest(20).index
top_titles_jobs = jobs['job_title'].value_counts().nlargest(20).index

# Combine the top job titles
top_titles = top_titles_job.union(top_titles_jobs)

# Salary box plots per job title, before and after outlier removal
plt.figure(figsize=(16, 6))
sns.boxplot(x='Salary', y='job_title', data=job[job['job_title'].isin(top_titles)])
plt.title('Salary Distribution for the Top Job Titles (Before Removing Outliers)')
plt.xlabel('Salary')
plt.ylabel('job_title')
plt.show()

plt.figure(figsize=(16, 6))
sns.boxplot(x='Salary', y='job_title', data=jobs[jobs['job_title'].isin(top_titles)])
plt.title('Salary Distribution for the Top Job Titles (After Removing Outliers)')
plt.xlabel('Salary')
plt.ylabel('job_title')
plt.show()

top_titles = jobs['job_title'].value_counts().nlargest(15)

plt.figure(figsize=(20, 8))
colors = sns.cubehelix_palette(len(top_titles), light=0.7, dark=0.2)
bar_plot = sns.barplot(x=top_titles.index, y=top_titles.values, palette=colors)

# Annotate each bar with its count
for index, value in enumerate(top_titles.values):
    label = f"{value:,}"
    plt.text(index, value + 0.1, label, ha='center', va='bottom', fontsize=15, color='#A52A2A')

plt.title('Top 15 Job Titles by Number of Listings')
plt.xlabel('Job Title')
plt.ylabel('Number of Listings')
plt.xticks(rotation=90)
plt.tight_layout()

plt.show()

tips_df = pd.read_csv("tip.csv")
display(tips_df.head(2))

# Grouped bars of total_bill by sex, colored by smoker, faceted by day (columns) and time (rows)
fig = px.bar(tips_df,
             x="sex",
             y="total_bill",
             color="smoker",
             barmode="group",
             facet_row="time",
             facet_col="day",
             category_orders={"day": ["Thur", "Fri", "Sat", "Sun"],
                              "time": ["Lunch", "Dinner"]})
fig.show()

   total_bill   tip     sex  smoker  day    time  size
0       16.99  1.01  Female      No  Sun  Dinner     2
1       10.34  1.66    Male      No  Sun  Dinner     3

[Figure: grouped bar chart of total_bill by sex, colored by smoker (No/Yes). Facet columns: day=Thur, Fri, Sat, Sun; facet rows: time=Lunch, Dinner.]

# Box plot with every observation drawn next to the box
fig = px.box(tips_df,
             x="time",
             y="total_bill",
             points="all")
fig.show()

[Figure: box plot of total_bill by time (Dinner, Lunch), all points shown.]

# Box plot showing only the outlying points
fig = px.box(tips_df,
             x="time",
             y="total_bill",
             points="outliers")
fig.show()

[Figure: box plot of total_bill by time (Dinner, Lunch), outliers only.]

fig = px.box(tips_df,
             x="day",
             y="total_bill",
             color="smoker")
fig.update_traces(quartilemethod="linear")
fig.show()

[Figure: box plot of total_bill by day (Sun, Sat, Thur, Fri), colored by smoker (No/Yes).]
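
Besides "linear", Plotly's `quartilemethod` also accepts "exclusive" and "inclusive". As I understand the Plotly documentation, both split the sorted data at the median and take the median of each half; they differ only in whether an odd-sized sample's median is left out of both halves or included in both. The sketch below illustrates that rule with the standard library. `median_split_quartiles` is a hypothetical helper, not Plotly's implementation, and the "linear" method is not covered here.

```python
from statistics import median

def median_split_quartiles(values, method="exclusive"):
    """Q1 and Q3 as medians of the lower and upper halves of the sorted data.

    For an odd number of points, 'exclusive' leaves the overall median out of
    both halves and 'inclusive' puts it in both. For an even number of points
    the two methods give the same result.
    """
    if method not in ("exclusive", "inclusive"):
        raise ValueError("method must be 'exclusive' or 'inclusive'")
    data = sorted(values)
    half = len(data) // 2
    if len(data) % 2 == 0:
        lower, upper = data[:half], data[half:]
    elif method == "exclusive":
        lower, upper = data[:half], data[half + 1:]
    else:
        lower, upper = data[:half + 1], data[half:]
    return median(lower), median(upper)

sample = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print("exclusive:", median_split_quartiles(sample, "exclusive"))
print("inclusive:", median_split_quartiles(sample, "inclusive"))
```

For small groups, such as the smokers on a single day, the choice of method can visibly move the box edges.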

fig = px.box(tips_df,
             x="time",
             y="total_bill",
             color="smoker",
             notched=True,
             hover_data=["day"]  # add day column to hover data
             )
fig.show()

[Figure: notched box plot of total_bill by time (Dinner, Lunch), colored by smoker (No/Yes).]

plt.figure(figsize=(20, 10))

x = jobs['job_title'].head(20)
y = jobs['Salary'].head(20)

# Scatter plot of the first 20 listings; marker size scales with salary
marker_sizes = jobs['Salary']
for i, title in enumerate(x):
    plt.scatter(title, y.iloc[i], s=marker_sizes.iloc[i] / 20, label=title, alpha=0.7)
    # The source line is cut off after fontsize=10; any further arguments are unknown
    plt.text(title, y.iloc[i], f'{y.iloc[i]:,.0f}', ha='center', va='bottom', rotation='vertical', fontsize=10)

# Show plain (non-scientific) numbers on the y-axis
plt.ticklabel_format(style='plain', axis='y', useOffset=False, scilimits=(9, 9))

plt.xlabel('Job Title')
plt.ylabel('Salary')
plt.title('Salary by Job Title (First 20 Listings)')
plt.xticks(rotation=90)
plt.grid(True)
plt.tight_layout()

plt.show()

# Reload the data and draw a fresh random sample
jobs = pd.read_csv("jobstreet_all_job_dataset.csv")
jobs = jobs.sample(5000).drop(columns=['job_id']).reset_index(drop=True)
display(jobs.shape)
display(jobs.head(2))

from wordcloud import WordCloud

# Join all category labels into one string for the word cloud
text = ' '.join(jobs['category'].astype(str))

wordcloud = WordCloud(background_color='white', width=1600, height=800, max_words=121).generate(text)

plt.imshow(wordcloud)
plt.axis('off')
plt.show()

(5000, 10)
   job_title  company  descriptions  location  category  subcategory  role  type  salary  listingDate
0  Manager, Government & Authority Liasion  Mass Rapid Transit Corporation Sdn Bhd  JOB PURPOSE :\nTo organize & participate in a ...  Kuala Lumpur  Construction  Project Management  manager  Contract/Temp  NaN  2024-05-10T04:06:31Z
1  Junior IT Executive  Saraya Goodmaid Sdn Bhd  Designing solutions, implementation, customiza...  Seremban District  Information & Communication Technology  Developers/Programmers  information-technology-executive  Full time  NaN  2024-04-08T00:14:09Z
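
A word cloud sizes each word roughly by how often it occurs in the input text. The sketch below approximates that count with the standard library. `word_frequencies` is a hypothetical helper and the sample list is illustrative; the real `WordCloud` class also applies its own stop-word and word-pair handling, so its sizes will not match a plain count exactly.

```python
import re
from collections import Counter

def word_frequencies(categories):
    """Count lowercase alphabetic words across all category labels."""
    words = []
    for category in categories:
        words.extend(re.findall(r"[A-Za-z]+", category.lower()))
    return Counter(words)

# Illustrative sample using category names that appear in the dataset
sample = ["Accounting", "Sales", "Marketing & Communications",
          "Administration & Office Support", "Sales", "Accounting", "Accounting"]

freq = word_frequencies(sample)
print(freq.most_common(3))
```

Checking the top counts this way is a quick sanity test that the largest words in the cloud are the ones expected.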
# Drop rows with missing values
job = jobs.dropna().copy()

# Apply clean_and_calculate_mean (defined earlier) to the salary column of the new sample
job['Salary'] = job['salary'].apply(clean_and_calculate_mean)
job = job.drop('salary', axis=1)
job.head(2)

   job_title  company  descriptions  location  category  subcategory  role  type  listingDate  Salary
4  Specialist, Contact Centre  CTOS Data Systems Sdn Bhd  Attend to all inbound and outbound calls/ emai...  Kuala Lumpur  Call Centre & Customer Service  Customer Service - Call Centre  call-centre-role  Full time  2024-04-19T02:57:09Z  2750.0
5  Multilingual | Customer Support Specialist  Private Advertiser  Skills and Abilities:\nSkilled communicator.\n...  Kampung Malaysia Raya  Call Centre & Customer Service  Customer Service - Call Centre  customer-support-specialist  Full time  2024-04-05T12:56:47Z  5500.0

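# Top 10 roles, locations and companies with their listing counts
# (rename_axis + reset_index(name=...) yields the same column names on any pandas version)
dfp = job['role'].value_counts().head(10).sort_values(ascending=True).rename_axis('role').reset_index(name='count')
dfl = job['location'].value_counts().head(10).sort_values(ascending=True).rename_axis('location').reset_index(name='count')
dfc = job['company'].value_counts().head(10).sort_values(ascending=True).rename_axis('company').reset_index(name='count')

fig = go.Figure()

# Pass the counts as x so that bar length reflects the number of listings
fig.add_trace(go.Bar(x=dfp['count'],
                     y=dfp['role'],
                     orientation='h',
                     name='Position',
                     marker=dict(color='LightCoral')))

fig.add_trace(go.Bar(x=dfl['count'],
                     y=dfl['location'],
                     orientation='h',
                     name='Location',
                     marker=dict(color='CadetBlue')))

fig.add_trace(go.Bar(x=dfc['count'],
                     y=dfc['company'],
                     orientation='h',
                     name='Company',
                     marker=dict(color='SteelBlue')))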

# Buttons that toggle which of the three traces is visible
fig.update_layout(
    updatemenus=[
        dict(
            type="buttons",
            direction="left",
            pad={"r": 10, "t": 10},
            showactive=True,
            x=0.16,
            xanchor="left",
            y=1.12,
            yanchor="top",
            font=dict(color='Indigo', size=14),
            buttons=[
                dict(label="All",
                     method="update",
                     args=[{"visible": [True, True, True]},
                           {'showlegend': True}]),
                dict(label="Position",
                     method="update",
                     args=[{"visible": [True, False, False]},
                           {'showlegend': True}]),
                dict(label='Location',
                     method="update",
                     args=[{"visible": [False, True, False]},
                           {'showlegend': True}]),
                dict(label='Company',
                     method="update",
                     args=[{"visible": [False, False, True]},
                           {'showlegend': True}]),
            ],
        )
    ])

fig.update_layout(
    annotations=[
        dict(text="Choose:", showarrow=False,
             x=0, y=1.075, yref="paper", align="right",
             font=dict(size=16, color='DarkSlateBlue'))])

fig.update_layout(title="Top 10 Positions, Locations and Companies",
                  title_x=0.5,
                  title_font=dict(size=20, color='MidnightBlue'))

fig.show()

[Figure: horizontal bar chart "Top 10 Positions, Locations and Companies" with buttons All / Position / Location / Company and one trace each for Position, Location and Company.]
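
The button definitions follow one pattern: an "All" button that shows every trace, then one button per trace whose visibility mask is True only at that trace's position. The sketch below generates that list with plain Python instead of writing each mask by hand. `visibility_buttons` is a hypothetical helper; it only builds dictionaries and does not import Plotly.

```python
def visibility_buttons(trace_names):
    """Build an 'All' button plus one button per trace for a Plotly updatemenu."""
    n = len(trace_names)
    buttons = [dict(label="All",
                    method="update",
                    args=[{"visible": [True] * n}, {"showlegend": True}])]
    for i, name in enumerate(trace_names):
        mask = [j == i for j in range(n)]  # True only for the i-th trace
        buttons.append(dict(label=name,
                            method="update",
                            args=[{"visible": mask}, {"showlegend": True}]))
    return buttons

buttons = visibility_buttons(["Position", "Location", "Company"])
for button in buttons:
    print(button["label"], button["args"][0]["visible"])
```

The result can be passed as `buttons=` inside the `updatemenus` entry, which keeps the masks correct when a trace is added or removed.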

# Read data
df = pd.read_csv('US_Job_Market.csv')
df.head(3)

   position  company  description  reviews  location
0  Development Director  ALS TDI  Development Director\nALS Therapy Development ...  NaN  Atlanta, GA 30301
1  An Ostentatiously-Excitable Principal Research...  The Hexagon Lavish  Job Description\n\n"The road that leads to acc...  NaN  Atlanta, GA
2  Data Scientist  Xpert Staffing  Growing company located in the Atlanta, GA are...  NaN  Atlanta, GA

#!pip install dash

import pandas as pd
from dash import Dash, dcc, html
from dash.dependencies import Input, Output
import plotly.graph_objects as go

# One DataFrame per position of interest
dfd1 = df[df['position'] == 'Data Scientist']
dfd2 = df[df['position'] == 'Senior Data Scientist']
dfd3 = df[df['position'] == 'Research Analyst']
dfd4 = df[df['position'] == 'Data Engineer']

# Count listings per location for each position and keep the 10 largest
# (reset_index(name='count') yields a 'count' column on any pandas version)
redf1 = dfd1[["location", "position"]].value_counts().nlargest(10).sort_values(ascending=True).reset_index(name='count')
redf2 = dfd2[["location", "position"]].value_counts().nlargest(10).sort_values(ascending=True).reset_index(name='count')
redf3 = dfd3[["location", "position"]].value_counts().nlargest(10).sort_values(ascending=True).reset_index(name='count')
redf4 = dfd4[["location", "position"]].value_counts().nlargest(10).sort_values(ascending=True).reset_index(name='count')

# Create Plotly figure
fig = go.Figure()

fig.add_trace(go.Bar(x=redf1['location'],
                     y=redf1['count'],
                     name='Data Scientist',
                     marker=dict(color='Tomato')))
fig.add_trace(go.Bar(x=redf2['location'],
                     y=redf2['count'],
                     name='Senior Data Scientist',
                     marker=dict(color='LightCoral')))
fig.add_trace(go.Bar(x=redf3['location'],
                     y=redf3['count'],
                     name='Research Analyst',
                     marker=dict(color='SteelBlue')))
fig.add_trace(go.Bar(x=redf4['location'],
                     y=redf4['count'],
                     name='Data Engineer',
                     marker=dict(color='CadetBlue')))
# Update layout with dropdown functionality
fig.update_layout(
    updatemenus=[
        dict(
            direction="down",
            pad={"r": 10, "t": 10},
            showactive=True,
            x=0.13,
            xanchor="left",
            y=1.12,
            yanchor="top",
            font=dict(color='Indigo', size=14),
            buttons=[
                dict(label="All",
                     method="update",
                     args=[{"visible": [True, True, True, True]},
                           {'showlegend': True}]),
                dict(label="Data Scientist",
                     method="update",
                     args=[{"visible": [True, False, False, False]},
                           {'showlegend': True}]),
                dict(label='Senior Data Scientist',
                     method="update",
                     args=[{"visible": [False, True, False, False]},
                           {'showlegend': True}]),
                dict(label='Research Analyst',
                     method="update",
                     args=[{"visible": [False, False, True, False]},
                           {'showlegend': True}]),
                dict(label='Data Engineer',
                     method="update",
                     args=[{"visible": [False, False, False, True]},
                           {'showlegend': True}]),
            ],
        )
    ])

fig.update_layout(
    annotations=[
        dict(text="Choose:", showarrow=False,
             x=0, y=1.075, yref="paper", align="right",
             font=dict(size=16, color='DarkSlateBlue'))])

fig.update_layout(title="The distribution of states by four Positions",
                  title_x=0.5,
                  title_font=dict(size=20, color='MidnightBlue'))

fig.show()
[Figure: bar chart "The distribution of states by four Positions" with a dropdown (All, Data Scientist, Senior Data Scientist, Research Analyst, Data Engineer). x-axis: location; y-axis: count.]

df = pd.read_csv("heart_disease_uci.csv")
df = df.dropna()
df.head(2)

   id  age   sex    dataset              cp  trestbps   chol    fbs         restecg  thalch  exang  oldpeak        slope   ca          thal  num
0   1   63  Male  Cleveland  typical angina     145.0  233.0   True  lv hypertrophy   150.0  False      2.3  downsloping  0.0  fixed defect    0
1   2   67  Male  Cleveland    asymptomatic     160.0  286.0  False  lv hypertrophy   108.0   True      1.5         flat  3.0        normal    2

import matplotlib.pyplot as plt

# KDE jointplot of cholesterol vs. maximum heart rate, with the raw points overlaid
g = sns.jointplot(x="chol", y="thalch", data=df, kind="kde", color="b")
g.plot_joint(plt.scatter, c="b", s=30, linewidth=1, marker="+")
g.ax_joint.collections[0].set_alpha(0)
g.set_axis_labels("chol", "thalch");

# --- Create Jointplot ---
jointplot = sns.jointplot(x = 'age', y = 'chol', data = df, hue = 'sex', palette = 'PuRd')

# --- Jointplot Titles & Text ---
jointplot.fig.suptitle('Jointplot between Age and Chol', fontweight = 'heavy', y = 1.05, fontsize = '14',
                       fontfamily = 'sans-serif', color = 'black');

import matplotlib.pyplot as plt

# Columns to summarize
labels = ['chol', 'age', 'thalch','oldpeak']

# Calculate the sum of each column
counts = [df[label].sum() for label in labels]

# Create the bar plot
plt.figure(figsize=(10, 6))
bars = plt.bar(labels, counts, color=['skyblue', 'Red', 'Green'])

# Add value labels on top of each bar with some vertical offset
for bar, count in zip(bars, counts):
    yval = bar.get_height() + 0.1  # Add a small offset
    plt.text(bar.get_x() + bar.get_width() / 2, yval, str(count), ha='center')

plt.title('Sum of Each Column')
plt.xlabel('Column')
plt.ylabel('Total')
plt.show()

plt.rcParams['figure.figsize'] = (15,5)
plt.subplot(1, 2, 1)
chart = df.groupby('cp')['age'].mean().sort_values(ascending = False).plot(kind = 'bar', color = 'orangered')
chart.set_xticklabels(chart.get_xticklabels(), rotation = 0)
plt.title('Chest Pain Based on Age', fontsize = 15, color = 'b', pad = 12)
plt.xlabel('Chest Pain')
plt.ylabel('Age')

plt.subplot(1, 2, 2)
chart = df.groupby('thal')['oldpeak'].mean().sort_values(ascending = False).plot(kind = 'bar', color = 'gold')
chart.set_xticklabels(chart.get_xticklabels(), rotation = 0)
plt.title('Thal from Old Peak', fontsize = 15, color = 'b', pad = 12)
plt.xlabel('Thal')
plt.ylabel('Old Peak')
plt.show()

plt.figure(figsize = (12,4))
ax = sns.countplot(x=df.cp)
for bars in ax.containers:
    ax.bar_label(bars)
plt.title("Count of Levels", fontsize = 15);

plt.figure(figsize = (8,5))
sns.kdeplot(df.age, fill = True, color = "r")  # fill replaces the deprecated shade argument
plt.title("Age Density (KDE)", fontsize = 20)
plt.show()
print("Histogram's skewness is {} and kurtosis is {}".format(df.age.skew(), df.age.kurtosis()))

Histogram's skewness is -0.21485314045391055 and kurtosis is -0.5174882052116159
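
The two numbers printed above are easier to read with the formulas in view. A negative skewness means the left tail is longer than the right, and a negative (excess) kurtosis means the tails are lighter than those of a normal distribution. The sketch below computes the bias-corrected sample skewness and excess kurtosis in plain Python; to my understanding these are the estimators that pandas reports from .skew() and .kurt(), but treat that correspondence as an assumption. The function names are illustrative only.

```python
from math import sqrt


def _central_moment(xs, k):
    """k-th central moment with a 1/n denominator."""
    n = len(xs)
    mean = sum(xs) / n
    return sum((x - mean) ** k for x in xs) / n


def sample_skewness(xs):
    """Adjusted Fisher-Pearson skewness (needs at least 3 values)."""
    n = len(xs)
    g1 = _central_moment(xs, 3) / _central_moment(xs, 2) ** 1.5
    return sqrt(n * (n - 1)) / (n - 2) * g1


def sample_excess_kurtosis(xs):
    """Bias-corrected excess kurtosis (needs at least 4 values)."""
    n = len(xs)
    g2 = _central_moment(xs, 4) / _central_moment(xs, 2) ** 2 - 3
    return (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)


symmetric = [1, 2, 3, 4, 5]
right_tailed = [1, 1, 1, 2, 10]
print(sample_skewness(symmetric), sample_excess_kurtosis(symmetric))
print(sample_skewness(right_tailed))
```
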

import scipy.stats as stats

df_numeric = df.select_dtypes(include='number')

results = []

for col in df_numeric.columns:
    skewness = df_numeric[col].skew()
    kurtosis = df_numeric[col].kurt()
    results.append([col, skewness, kurtosis])

df_stats = pd.DataFrame(results, columns=['Column', 'Skewness', 'Kurtosis'])
df_stats
     Column  Skewness  Kurtosis
0        id      0.90      3.95
1       age     -0.21     -0.52
2  trestbps      0.70      0.80
3      chol      1.03      4.35
4    thalch     -0.53     -0.09
5   oldpeak      1.24      1.52
6        ca      1.19      0.26
7       num      1.05     -0.16

from scipy.stats import norm

dfx = df[['chol', 'age', 'thalch','oldpeak']]

for col in dfx:
    stats.probplot(dfx[col], plot=plt)
    plt.title(col)
    plt.show();
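
A probability plot pairs each sorted sample value with the normal quantile expected at its rank; points close to a straight line suggest an approximately normal column. The sketch below reproduces that pairing with the standard library only. It assumes simple plotting positions (i + 0.5) / n, whereas scipy's probplot is documented as using Filliben's estimate, so the theoretical quantiles will differ slightly. The function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def normal_qq_points(values):
    """Return (theoretical quantile, ordered value) pairs for a normal Q-Q plot."""
    ordered = sorted(values)
    n = len(ordered)
    std_normal = NormalDist()
    theoretical = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, ordered))


def pearson_r(pairs):
    """Correlation of the Q-Q points; values near 1 mean a near-straight line."""
    n = len(pairs)
    mx = sum(x for x, _ in pairs) / n
    my = sum(y for _, y in pairs) / n
    sxy = sum((x - mx) * (y - my) for x, y in pairs)
    sxx = sum((x - mx) ** 2 for x, _ in pairs)
    syy = sum((y - my) ** 2 for _, y in pairs)
    return sxy / sqrt(sxx * syy)


# Synthetic data for illustration: exact normal quantiles vs. a heavily skewed series.
normal_like = [NormalDist(50, 10).inv_cdf((i + 0.5) / 20) for i in range(20)]
skewed = [2 ** i for i in range(20)]
r_normal = pearson_r(normal_qq_points(normal_like))
r_skewed = pearson_r(normal_qq_points(skewed))
print(r_normal, r_skewed)
```
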
sns.set(rc={'figure.figsize':(20,7)})
sns.set_style("white")
sns.scatterplot(data=df, x="chol", y="trestbps", size="oldpeak", hue='cp',legend=True, sizes=(10, 500));

sns.set(rc={'figure.figsize':(20,7)})
sns.relplot(y='trestbps',x='chol',data=df,kind='scatter',size='oldpeak',hue='cp',aspect=1.2);

# Filter out rows with 'oldpeak' equal to 0.0
df_filtered = df[df['oldpeak'] != 0.0]

# Create the countplot
plt.figure(figsize=(20, 7))
sns.countplot(data=df_filtered, x='oldpeak', order=sorted(df_filtered['oldpeak'].unique()))

# Access bars through the current axes
for bar in plt.gca().patches:
    plt.text(bar.get_x() + bar.get_width() / 2, bar.get_height() + 0.1, int(bar.get_height()), ha='center', va='botto

# Add labels and title
plt.title('Oldpeak Distribution (Excluding 0.0)')
plt.ylabel('Number of Patients')
plt.xlabel('Oldpeak Value')
plt.xticks(rotation=0)  # Keep x-axis labels horizontal
plt.show()

plt.figure(figsize=(20, 7))
# Create the countplot
sns.countplot(data=df, x='age', order=sorted(df['age'].unique()))

# Access bars through the current axes
for bar in plt.gca().patches:
    plt.text(bar.get_x() + bar.get_width() / 2, bar.get_height() + 0.1, int(bar.get_height()), ha='center', va='botto

# Add labels and title
plt.title('Age Distribution of Patients')
plt.ylabel('Number of Patients')
plt.xlabel('Age')
plt.show()

import matplotlib.pyplot as plt
import seaborn as sns
from scipy import stats  # provides probplot for the Q-Q plot

def diagnostic_plots(df, variable):
    """Show a histogram, a normal Q-Q plot and a boxplot for one column."""
    plt.figure(figsize=(17, 5))

    plt.subplot(1, 3, 1)
    sns.histplot(df[variable], kde=True, stat="density")  # distplot is deprecated
    plt.title('Histogram')

    plt.subplot(1, 3, 2)
    stats.probplot(df[variable], dist="norm", plot=plt)
    plt.ylabel(f'{variable} quantiles')

    plt.subplot(1, 3, 3)
    sns.boxplot(x=df[variable])
    plt.title('Boxplot')
    plt.show()

for col in ['age', 'chol']:
    diagnostic_plots(df, col)

corr = df.select_dtypes('number').drop('id',axis=1).corr()
# Generate a mask for the upper triangle
mask = np.zeros_like(corr)
mask[np.triu_indices_from(mask)] = True
fig, ax = plt.subplots(figsize=(15,10))
sns.heatmap(corr, cmap='Spectral_r', mask=mask, square=True, annot=True, linewidth=0.5, cbar_kws={"shrink" : 0.5
df = df.drop('id',axis=1)
df.head(2)

age sex dataset cp trestbps chol fbs restecg thalch exang oldpeak slope ca thal num

0 63 Male Cleveland typical angina 145.0 233.0 True lv hypertrophy 150.0 False 2.3 downsloping 0.0 fixed defect 0

1 67 Male Cleveland asymptomatic 160.0 286.0 False lv hypertrophy 108.0 True 1.5 flat 3.0 normal 2

plt.figure(figsize=(20, 5))
sns.set_context("paper")

kdeplt = sns.kdeplot(
data=df,
x="chol",
hue="sex",
palette='Dark2',
alpha=0.7,
lw=2,
)

kdeplt.set_title("Cholesterol values distribution\nMale VS Female", fontsize=12)


kdeplt.set_xlabel("Cholesterol", fontsize=12)

# Calculate mean cholesterol for each sex


mean_male = df[df['sex'] == 'Male']['chol'].mean()
mean_female = df[df['sex'] == 'Female']['chol'].mean()

# Add vertical lines for mean cholesterol


plt.axvline(x=mean_male, color="#2986cc", ls="--", lw=1.3)
plt.axvline(x=mean_female, color="#c90076", ls="--", lw=1.3)

# Add text annotations


plt.text(mean_male, plt.gca().get_ylim()[1], f"Mean Cholesterol / Male: {mean_male:.2f}",
fontsize=10, color="#2986cc", ha='right', va='top')
plt.text(mean_female, plt.gca().get_ylim()[1], f"Mean Cholesterol / Female: {mean_female:.2f}",
fontsize=10, color="#c90076", ha='left', va='top')

plt.show()


heart_df_fg = sns.FacetGrid(
data=df,
col="sex",
hue="sex",
row="cp",
height=4,
aspect=1.3,
palette='Dark2',
col_order=["Male", "Female"],
)
heart_df_fg.map_dataframe(sns.regplot, "age", "chol")
plt.show()
# Minimum cholesterol for each chest pain type
chol_min = df.groupby("cp")["chol"].min().reset_index()

fig = px.bar(chol_min,
             x='cp',
             y='chol',
             color='cp',  # one color per chest pain type
             title='Chol Value'
             )
fig.show()

[Figure: "Chol Value". Bar chart of minimum chol by cp (asymptomatic, atypical angina, non-anginal, typical angina); x-axis: cp; y-axis: chol, 0 to 180; legend: cp.]

df = pd.read_csv("heart_disease_uci.csv")
df = df.dropna()
df = df.drop('id',axis =1 )
df.head(2)

age sex dataset cp trestbps chol fbs restecg thalch exang oldpeak slope ca thal num

0 63 Male Cleveland typical angina 145.0 233.0 True lv hypertrophy 150.0 False 2.3 downsloping 0.0 fixed defect 0

1 67 Male Cleveland asymptomatic 160.0 286.0 False lv hypertrophy 108.0 True 1.5 flat 3.0 normal 2

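plt.figure(figsize=(18,5))

# Four panels, one per numeric column (histplot replaces the deprecated distplot)
plt.subplot(1,4,1)
sns.histplot(df['age'], kde=True, stat='density', color='DeepPink')
plt.subplot(1,4,2)
sns.histplot(df['chol'], kde=True, stat='density', color='Green')
plt.subplot(1,4,3)
sns.histplot(df['thalch'], kde=True, stat='density', color='Red')
plt.subplot(1,4,4)
sns.histplot(df['oldpeak'], kde=True, stat='density', color='Magenta')

plt.tight_layout()
plt.show()
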
df_cpy = df.copy(deep=True)
df_cpy = df_cpy.select_dtypes("number")
df_cpy = df_cpy[['age','chol','thalch','oldpeak']]

fig, axis = plt.subplots(ncols=4, nrows=1, figsize=(15,5))
axis = axis.flatten()

# One boxplot per column
for index, col in enumerate(df_cpy.columns):
    sns.boxplot(y=col, data=df_cpy, color='r', ax=axis[index])
plt.tight_layout(pad=0.5, w_pad=0.7, h_pad=5.0)

df_cpy = df.copy(deep=True)
df_cpy = df_cpy.select_dtypes("number")
df_cpy = df_cpy[['age','chol','thalch','oldpeak']]

flierprops = dict(markerfacecolor='g', color='g', alpha=0.5)

n_cols = 4
n_rows = int(np.ceil(df_cpy.shape[-1]*2 / n_cols))
fig, axes = plt.subplots(n_rows, n_cols, figsize=(4 * n_cols, 3 * n_rows))
for i, col in enumerate(df_cpy.columns):
    mean = df_cpy[col].mean()
    median = df_cpy[col].median()
    # Each column gets two adjacent panels: histogram, then boxplot
    sns.histplot(df_cpy[col], ax=axes.flatten()[2*i], kde=True)
    sns.boxplot(x=df_cpy[col], orient='h', ax=axes.flatten()[2*i+1], color='g')
    axes.flatten()[2*i+1].vlines(mean, ymin = -1, ymax = 1, color='r', label=f"For [{col}]\nMean: {mean:.2}\nMedian:
    axes.flatten()[2*i+1].legend()

    # Label the y-axis only on histograms in the leftmost grid column
    ax = axes.flatten()[2*i]
    if (2 * i) % n_cols == 0:
        ax.set_ylabel('Frequency')
    else:
        ax.set_ylabel('')
plt.tight_layout()

sns.set(style='whitegrid', palette="deep", font_scale=1.1, rc={"figure.figsize": [20, 6]})

sns.histplot(df['chol'], bins = 30).set(xlabel = "Chol");


df2 = df[['age','sex','chol']]

f, (ax_box, ax_hist) = plt.subplots(2, sharex=True, gridspec_kw={"height_ratios": (.15, .85)})

ax_box.title.set_text('Age Histogram and Boxplot')
sns.boxplot(x=df2["age"], orient="h", ax=ax_box)
sns.histplot(data=df2, x="age", ax=ax_hist)
ax_box.set(xlabel='')
plt.show()

# Share of each chest pain type (cp)


colors = ("darkorange", "green",'Red','Pink')
explodes = [0.5, 0.5,0.75, .50]
df["cp"].value_counts(sort=False).plot.pie(colors=colors,
textprops={'fontsize': 15},
autopct = '%4.1f',
startangle= 90,
radius =2,
rotatelabels=True,
shadow = True) ;
wine = pd.read_csv("WineQT.csv")
wine.head(2)

   fixed acidity  volatile acidity  citric acid  residual sugar  chlorides  free sulfur dioxide  total sulfur dioxide  density    pH  sulphates  alcohol  quality  Id
0            7.4              0.70          0.0             1.9       0.08                 11.0                  34.0      1.0  3.51       0.56      9.4        5   0
1            7.8              0.88          0.0             2.6       0.10                 25.0                  67.0      1.0  3.20       0.68      9.8        5   1

NUMERICAL = wine[['fixed acidity', 'volatile acidity', 'residual sugar',
                  'chlorides', 'free sulfur dioxide', 'total sulfur dioxide',
                  'pH', 'alcohol']]
fig, axes = plt.subplots(2, 4)
fig.set_figheight(12)
fig.set_figwidth(16)
for i, col in enumerate(NUMERICAL):
    ax = axes[i // 4, i % 4]  # row and column of the i-th panel
    sns.histplot(wine[col], ax=ax, kde=True)
    ax.axvline(wine[col].mean(), color='k', linestyle='dashed', linewidth=1)
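
Placing the i-th histogram on a grid of subplots comes down to converting a flat index into a (row, column) pair. A minimal sketch of that mapping follows; grid_position is a hypothetical helper. Note that an offset such as (i // n_cols) - 1 sends the first four panels to row -1, which NumPy indexing treats as the last row, so the panels still draw but the two rows appear swapped.

```python
def grid_position(i, n_cols):
    """Map a flat panel index to (row, column) on a grid with n_cols columns."""
    return divmod(i, n_cols)


positions = [grid_position(i, 4) for i in range(8)]
print(positions)

# Row indices produced when 1 is subtracted from the integer division
shifted_rows = [(i // 4) - 1 for i in range(8)]
print(shifted_rows)
```
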
# Set configuration for charts
plt.rcParams["figure.figsize"] = [18, 6]
plt.rcParams["font.size"] = 15
plt.rcParams["legend.fontsize"] = "medium"
plt.rcParams["figure.titlesize"] = "medium"

def plot_distribution(data, x, color, bins):
    """Plot a histogram, a boxplot and a swarm plot of column x."""
    mean = data[x].mean()
    std = data[x].std()
    info = dict(data=data, x=x, color=color)

    plt.subplot(1, 3, 1)
    sns.histplot(data[x], bins=bins, kde=True, stat="density", color=color)  # distplot is deprecated
    plt.xlabel(f"bins of {x}")
    plt.axvline(mean, label="mean", color="red")
    plt.ylabel("density")
    plt.legend([r"${\sigma}$ = %d" % std, f"mean = {mean:.2f}"])
    plt.title(f"histogram of {x} column")

    plt.subplot(1, 3, 2)
    sns.boxplot(**info)
    plt.xlabel(f"{x}")
    plt.title(f"box plot of {x} column")

    plt.subplot(1, 3, 3)
    sns.swarmplot(**info)
    plt.xlabel(f"{x}")
    plt.title(f"distribution of points in {x} column")

    plt.suptitle(f"Distribution of {x} column", fontsize=15, color="red")
    plt.show()

# 25-unit bins spanning the observed cholesterol range
chol_bins = np.arange(df["chol"].min(), df["chol"].max() + 25, 25)
base_color = sns.color_palette()[4]
plot_distribution(data=df, x="chol", color=base_color, bins=chol_bins)
plot , ax = plt.subplots(1 , 3 , figsize=(20,6))
sns.histplot(data = df.loc[df["thal"]== 'normal'] , x = "age" , hue = "sex",binwidth=2,ax = ax[0],palette = sns.
sns.histplot(data = df.loc[df["thal"]== 'reversable defect'] , x = "age" , hue = "sex",binwidth=2,ax = ax[1],palette
sns.histplot(data = df.loc[df["thal"]== 'fixed defect'] , x = "age" , hue = "sex",binwidth=2,ax = ax[2],palette
plt.show()

values = df["sex"].value_counts()
sex = values.index.tolist()  # labels in the same order as the counts
color = ["#FF0000", "#000000"]

plt.figure(figsize = (5, 7))


plt.pie(values, labels = sex, colors = color, explode = (0.1, 0), textprops = {"color":"w"}, autopct = "%.2f%%",

plt.legend();

#plotting
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(18, 9))
fig.suptitle(' Highest and Lowest Correlation ', size = 20, weight='bold')
axs = [ax1, ax2]
#kdeplot
sns.kdeplot(data=df, y='chol', x='thalch', ax=ax1, color="red")
ax1.set_title('Chol Vs Thalch', size = 14, weight='bold', pad=20)

#kdeplot
sns.kdeplot(data=df, y='chol', x='oldpeak', ax=ax2, color='Blue')
ax2.set_title('Chol Vs Oldpeak', size = 14, weight='bold', pad=20);

df1 = pd.read_csv('US_Job_Market.csv')
df1 = df1.dropna().reset_index(drop=True)
df1.head(2)
                                            position           company                                        description  reviews           location
0                                       Data Analyst    Operation HOPE  DEPARTMENT: Program OperationsPOSITION LOCATIO...     44.0  Atlanta, GA 30303
1  Assistant Professor -TT - Signal Processing & ...  Emory University  DESCRIPTION\nThe Emory University Department o...    550.0        Atlanta, GA

plt.figure(figsize=(20, 7))

# Filter for the top 10 most frequent companies


df_v = df1['company'].value_counts().head(10).reset_index()
df_v.columns = ['company', 'count']

# Calculate the percentage


total = df1['company'].value_counts().sum()
df_v['percentage'] = (df_v['count'] / total) * 100

# Create the bar plot


plot = sns.barplot(y='company', x='count', data=df_v)

# Annotate the bars with the percentage


for index, row in df_v.iterrows():
    plot.text(row['count'], index, f"{row['percentage']:.2f}%", color='black', ha="left")

plt.xticks(rotation=0)
plt.title('Top 10 Most Frequent Companies')
plt.show()
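
The percentages annotated on the bars above are shares of all postings, not shares within the top ten, because the denominator is the total count over every company. The same calculation can be sketched without pandas; top_n_with_share is a hypothetical helper and the company names below are made-up placeholders.

```python
from collections import Counter


def top_n_with_share(items, n=10):
    """Return (item, count, percent of all items) for the n most frequent items."""
    counts = Counter(items)
    total = sum(counts.values())
    return [(name, count, 100.0 * count / total)
            for name, count in counts.most_common(n)]


# Placeholder data for illustration only
companies = ["A", "A", "B", "C"]
rows = top_n_with_share(companies, n=2)
for name, count, share in rows:
    print(f"{name}: {count} postings ({share:.2f}%)")
```

Because the denominator covers every item, the shares of the top n entries generally sum to less than 100%.
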
from PIL import Image
import matplotlib.pyplot as plt
import numpy as np
from wordcloud import WordCloud, STOPWORDS
import pandas as pd
import requests
from io import BytesIO

# Download mask image


mask_url = "https://cdn.pixabay.com/photo/2013/07/12/17/47/test-pattern-152459_1280.png"
response = requests.get(mask_url)
mask_image = Image.open(BytesIO(response.content))
wordcloud_mask = np.array(mask_image)

# Generate word cloud


plt.figure(figsize=(15,15))
all_text = " ".join(df1['company'].values.tolist())
wordcloud = WordCloud(width=800,
height=800,
stopwords=STOPWORDS,
background_color='white',
max_words=800,
colormap="hsv",
mask=wordcloud_mask).generate(all_text)

# Display the word cloud


plt.imshow(wordcloud, interpolation='bilinear')
plt.axis('off')
plt.show()

from wordcloud import WordCloud

# Create a word cloud of company names for postings located in Atlanta, GA


positive_reviews = df1[df1['location'] == 'Atlanta, GA']['company'].str.cat(sep=' ')
positive_cloud = WordCloud(width=1500, height=800, max_words=100, background_color='white').generate(positive_reviews

plt.figure(figsize=(20, 6), facecolor=None)


plt.imshow(positive_cloud)
plt.axis("off")
plt.tight_layout(pad=0)
plt.show()

import plotly.graph_objects as go

# Donut chart of the ten companies with the most postings
values = df1['company'].value_counts()[:10]
labels = values.index
fig = go.Figure(data=[go.Pie(values=values, labels=labels, hole=.3)])
fig.update_traces(hoverinfo='label+percent', textinfo='value', textfont_size=20,
                  marker=dict(line=dict(color='#000000', width=3)))
fig.update_layout(title="Most popular Jobs in USA",
                  title_font={'size': 30},
                  )
fig.show()

[Figure: "Most popular Jobs in USA". Donut chart of posting counts for the ten most frequent companies; legend: Amazon.com, Ball Aerospace, Microsoft, Google, NYU Langone Health, Fred Hutchinson Cancer Research Center, KPMG, Broad Institute, Facebook, Walmart eCommerce.]

print("Count of unique Jobs in USA")
locationCount = df1['company'].value_counts().head(10).sort_values(ascending=True)
fig = plt.figure(figsize=(18,10))
locationCount.plot(kind="barh", fontsize=8)
plt.ylabel("Company", fontsize=25, color="red", fontweight='bold')
plt.title("Jobs Vs. COUNT GRAPH", fontsize=40, color="BLACK", fontweight='bold')
# Write each count at the end of its bar
for v in range(len(locationCount)):
    count = locationCount.iloc[v]
    plt.text(count, v, count, fontsize=10, color="BLACK", fontweight='bold')

Count of unique Jobs in USA

z=df1['position'].value_counts().head(10)
fig=px.bar(z,x=z.index,y=z.values,color=z.index,text=z.values,labels={'index':'job title','y':'count','text':'count'
fig.show()

[Figure: "Top 10 Popular Roles in Data Science". Bar chart; x-axis: position; y-axis: count. Data Scientist 204, Senior Data Scientist 53, Research Analyst 44, Data Engineer 39, Machine Learning Engineer 26, Sr. Data Scientist 22, Principal Data Scientist 20, Quantitative Analyst 20, Research Scientist 20, Lead Data Scientist 17.]

# Plotting outliers: one boxplot for each of the first ten columns
col = 1
plt.figure(figsize = (20, 10))
for i in wine.columns:
    if col < 11:
        plt.subplot(2, 5, col)
        plt.boxplot(wine[i])
        plt.xlabel(i)
    col = col + 1
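
The points drawn beyond the whiskers of these boxplots follow the usual 1.5 x IQR rule, which is matplotlib's default whisker setting. The sketch below lists such points explicitly, assuming quartiles computed with linear interpolation (the "inclusive" method of the standard library); iqr_outliers is a hypothetical helper and the sample values are invented for illustration.

```python
from statistics import quantiles


def iqr_outliers(values, k=1.5):
    """Return the values lying more than k * IQR outside the quartiles."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Invented sample for illustration only
sample = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]
flagged = iqr_outliers(sample)
print(flagged)
```
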

# Countplot of chest pain type, annotated with the percentage of patients
s = sns.countplot(x = 'cp', data = df)
sizes = []
for p in s.patches:
    height = p.get_height()
    sizes.append(height)
    s.text(p.get_x() + p.get_width()/2.,
           height + 3,
           '{:1.2f}%'.format(height/len(df)*100),
           ha="center", fontsize=16)

# Check the distribution of cholesterol, with the mean marked by a dashed line
sns.histplot(df['chol'], kde=True, stat='density', color='Red')
plt.axvline(x=df['chol'].mean(), color='Blue', linestyle='--', linewidth=2)
plt.title('Chol');

(wine.iloc[:, :-1].describe().T.sort_values(by='std', ascending=False)
     .style.background_gradient(cmap='GnBu')
     .bar(subset=["max"], color='#BB0000')
     .bar(subset=["mean"], color='green'))

count mean std min 25% 50% 75% max

total sulfur dioxide 1143.000000 45.914698 32.782130 6.000000 21.000000 37.000000 61.000000 289.000000

free sulfur dioxide 1143.000000 15.615486 10.250486 1.000000 7.000000 13.000000 21.000000 68.000000

fixed acidity 1143.000000 8.311111 1.747595 4.600000 7.100000 7.900000 9.100000 15.900000

residual sugar 1143.000000 2.532152 1.355917 0.900000 1.900000 2.200000 2.600000 15.500000

alcohol 1143.000000 10.442111 1.082196 8.400000 9.500000 10.200000 11.100000 14.900000

quality 1143.000000 5.657043 0.805824 3.000000 5.000000 6.000000 6.000000 8.000000

citric acid 1143.000000 0.268364 0.196686 0.000000 0.090000 0.250000 0.420000 1.000000

volatile acidity 1143.000000 0.531339 0.179633 0.120000 0.392500 0.520000 0.640000 1.580000

sulphates 1143.000000 0.657708 0.170399 0.330000 0.550000 0.620000 0.730000 2.000000

pH 1143.000000 3.311015 0.156664 2.740000 3.205000 3.310000 3.400000 4.010000

chlorides 1143.000000 0.086933 0.047267 0.012000 0.070000 0.079000 0.090000 0.611000

density 1143.000000 0.996730 0.001925 0.990070 0.995570 0.996680 0.997845 1.003690

df[df["age"] >= 50].describe().style.background_gradient(cmap='RdPu')

age trestbps chol thalch oldpeak ca num

count 213.000000 213.000000 213.000000 213.000000 213.000000 213.000000 213.000000

mean 59.159624 134.793427 252.220657 144.708920 1.214085 0.835681 1.103286

std 5.731645 18.575307 54.953695 22.377966 1.197004 0.969460 1.280713

min 50.000000 94.000000 100.000000 71.000000 0.000000 0.000000 0.000000

25% 55.000000 120.000000 214.000000 130.000000 0.100000 0.000000 0.000000

50% 58.000000 132.000000 246.000000 148.000000 1.000000 1.000000 1.000000

75% 63.000000 145.000000 283.000000 161.000000 1.800000 1.000000 2.000000

max 77.000000 200.000000 564.000000 195.000000 6.200000 3.000000 4.000000

def highlight_min(s, props=''):
    # Apply props to the cells holding the column minimum
    return np.where(s == np.nanmin(s.values), props, '')

df.describe().style.apply(highlight_min, props='color:yellow;background-color:Grey', axis=0)

age trestbps chol thalch oldpeak ca num

count 299.000000 299.000000 299.000000 299.000000 299.000000 299.000000 299.000000

mean 54.521739 131.715719 246.785953 149.327759 1.058528 0.672241 0.946488

std 9.030264 17.747751 52.532582 23.121062 1.162769 0.937438 1.230409

min 29.000000 94.000000 100.000000 71.000000 0.000000 0.000000 0.000000

25% 48.000000 120.000000 211.000000 132.500000 0.000000 0.000000 0.000000

50% 56.000000 130.000000 242.000000 152.000000 0.800000 0.000000 0.000000

75% 61.000000 140.000000 275.500000 165.500000 1.600000 1.000000 2.000000

max 77.000000 200.000000 564.000000 202.000000 6.200000 3.000000 4.000000
