Python for Machine Learning Visualization 1735231185

Uploaded by

karlTronxo

Machine Learning Data Visualization: From Basics to Advanced

Prepared By: Syed Afroz Ali (Kaggle Grand Master)

Machine learning data visualization uses visual techniques like charts and graphs to explore, understand, and communicate patterns, trends, and insights within the data used to train and evaluate machine learning models.

All Types of Data Visualization Kaggle Notebooks
Python for Machine Learning Visualization Part 01:
https://www.kaggle.com/code/pythonafroz/python-for-machine-learning-visualization-part-01
Python for Machine Learning Visualization Part 02:

Kaggle Code: https://www.kaggle.com/code/pythonafroz/python-for-machine-learning-visualization-part-01

https://www.kaggle.com/code/pythonafroz/python-for-machine-learning-visualization-part-03

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt  # used by the seaborn/matplotlib cells below
import seaborn as sns

import plotly.express as px
import plotly.graph_objects as go
import plotly.figure_factory as ff
from plotly.subplots import make_subplots

pd.set_option('display.precision', 2)

# Load the dataset

df = pd.read_csv("heart_disease_uci.csv")
df = df.dropna()
df.head(2)

   id  age   sex    dataset              cp  trestbps   chol    fbs         restecg  thalch  exang  oldpeak        slope   ca          thal  num
0   1   63  Male  Cleveland  typical angina     145.0  233.0   True  lv hypertrophy   150.0  False      2.3  downsloping  0.0  fixed defect    0
1   2   67  Male  Cleveland    asymptomatic     160.0  286.0  False  lv hypertrophy   108.0   True      1.5         flat  3.0        normal    2

print(f"Records: {df.shape[0]}")
print(f"Columns: {df.shape[1]}")

Records: 299
Columns: 16

# The four most frequent chest pain types
top_cp = df['cp'].value_counts().nlargest(4).index
display(top_cp)

plt.figure(figsize=(15, 6))
sns.scatterplot(x='age', y='chol', data=df[df['cp'].isin(top_cp)], hue='cp')
plt.title('Age vs. Cholesterol for the Top 4 Chest Pain Types')
plt.xlabel('Age')
plt.ylabel('Cholesterol')
plt.legend(title='Chest Pain Type', bbox_to_anchor=(1.05, 1), loc='upper left')
plt.show()

Index(['asymptomatic', 'non-anginal', 'atypical angina', 'typical angina'], dtype='object', name='cp')
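
The top-N filter used in the cell above can be tried in isolation. The following is a minimal sketch on a made-up Series (the values "a", "b", "c" are hypothetical stand-ins for chest pain types, not data from the notebook): `value_counts().nlargest(n).index` gives the labels of the most frequent categories, and `isin` turns them into a row mask.

```python
import pandas as pd

# Hypothetical toy column standing in for df['cp']
cp = pd.Series(["a", "a", "a", "b", "b", "c"], name="cp")

# Frequency of each category, then the labels of the two most frequent ones
top_two = cp.value_counts().nlargest(2).index

# Boolean mask: True where the row belongs to one of the top categories
mask = cp.isin(top_two)
filtered = cp[mask]

print(list(top_two))
print(len(filtered))
```

With only four chest pain types in the heart-disease data, `nlargest(4)` keeps every row, so the filter matters mainly for columns with many categories.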

import plotly.express as px

fig = px.scatter(df, x='chol', y='age', color='sex')

fig.update_layout(width=1000, height=500)
fig.update_layout(title_text='Scatter Plot of Cholesterol vs. Age (colored by Sex)')

fig.show()

[Figure: Scatter Plot of Cholesterol vs. Age (colored by Sex); x-axis: chol, y-axis: age]

from plotly.offline import iplot

fig = px.box(x = df["age"],

labels={"x":"Age"},
title="5-Number-Summary(Box Plot) of Age")
iplot(fig)

[Figure: 5-Number-Summary (Box Plot) of Age; x-axis: Age]

import plotly.express as px

fig = px.scatter(df, x='chol', y='age', color='cp', size = 'oldpeak', size_max = 30, hover_name = 'exang')

fig.update_layout(width=1000, height=500)

fig.update_layout(title_text='Scatter Plot of Cholesterol vs. Age (colored by cp)')

fig.show()

[Figure: Scatter Plot of Cholesterol vs. Age (colored by cp); x-axis: chol, y-axis: age; marker size: oldpeak]
Python Code Link: https://t.me/AIMLDeepThaught/573

fig = px.scatter(data_frame = df,

x="age",
y="chol",
color="cp",
size='ca',
hover_data=['oldpeak'])

fig.update_layout(title_text=" Cholesterol Vs Age ",

title_font={'size': 24, 'family':'Serif'},
width=1000,
height=500,
)

fig.show()

[Figure: Cholesterol Vs Age; x-axis: age, y-axis: chol; color: cp; marker size: ca]

import plotly.express as px

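# The source line is cut off after facet_col; 'cp' is inferred from the panel titles in the output
fig = px.scatter(df, x='chol', y='age', color='cp', size = 'oldpeak', size_max = 30, hover_name = 'exang', facet_col='cp')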
fig.update_layout(width=1000, height=500)

fig.update_layout(title_text='Scatter Plot of Cholesterol vs. Age (colored by cp)')

fig.show()

[Figure: Scatter Plot of Cholesterol vs. Age (colored by cp), one panel per chest pain type (cp=typical angina, cp=asymptomatic, cp=non-anginal, cp=atypical angina); x-axis: chol, y-axis: age]

hover_name='exang' makes the value of the 'exang' column the bold heading of the tooltip that appears when you hover over a data point
on the scatter plot. This is useful for providing additional information about each data point without cluttering the plot with
labels.

fig=px.bar(df,x='age',y='chol',hover_data=['oldpeak'],color='sex',height=400)
fig.show()

[Figure: stacked bar chart of chol by age, colored by sex (Male, Female); x-axis: age, y-axis: chol]

def generate_rating_df(df):
    rating_df = df.groupby(['cp', 'slope']).agg({'id': 'count'}).reset_index()
    rating_df = rating_df[rating_df['id'] != 0]
    rating_df.columns = ['cp', 'slope', 'counts']
    rating_df = rating_df.sort_values('slope')
    return rating_df

rating_df = generate_rating_df(df)
fig = px.bar(rating_df, x='cp', y='counts', color='slope')

fig.update_traces(textposition='auto',
textfont_size=20)

fig.update_layout(barmode='stack')
fig.show()

[Figure: stacked bar chart of counts by chest pain type, colored by slope (downsloping, flat, upsloping); x-axis: cp, y-axis: counts]

rating_df = generate_rating_df(df)
fig = px.bar(rating_df, x='cp', y='counts', color='slope')

fig.update_traces(textposition='auto',
textfont_size=20)

fig.update_layout(barmode='group')

fig.show()

[Figure: grouped bar chart of counts by chest pain type, colored by slope (downsloping, flat, upsloping); x-axis: cp, y-axis: counts]

import plotly.express as px
def generate_rating_df(df):
    rating_df = df.groupby(['cp', 'slope']).agg({'id': 'count'}).reset_index()
    rating_df = rating_df[rating_df['id'] != 0]
    rating_df.columns = ['cp', 'slope', 'counts']
    rating_df = rating_df.sort_values('slope')
    return rating_df

rating_df = generate_rating_df(df)

fig = px.bar(rating_df, x='cp', y='counts', color='slope', barmode='group',

text='counts',
)

fig.update_traces(textposition='auto',
textfont_size=20)

fig.show()

[Figure: grouped bar chart of counts by chest pain type, colored by slope (downsloping, flat, upsloping), each bar annotated with its count; x-axis: cp, y-axis: counts]

# Calculate percentages on top of the counts returned by generate_rating_df
rating_df = generate_rating_df(df)
total_counts = rating_df['counts'].sum()
rating_df['percentage'] = rating_df['counts'] / total_counts * 100

fig = px.bar(rating_df, x='cp', y='counts', color='slope', text='percentage')

fig.update_traces(
texttemplate='%{text:.1f}%',
textposition='outside',
textfont_size=16
)

fig.update_layout(
barmode='group',
yaxis_title='Count',
xaxis_title='CP',
legend_title='Slope'
)

fig.update_layout(
height=550,
width=1000,
title_text="Distribution of Chest Pain Type by Percentage",
title_font_size=24
)
fig.show()

[Figure: Distribution of Chest Pain Type by Percentage; grouped bars labeled with each bar's share of all records; x-axis: CP, y-axis: Count; legend: Slope]
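
The percentages in the chart above are shares of all records, not shares within each chest pain type. The sketch below, on a made-up four-row frame (the rows are hypothetical), computes both so the difference is visible; `pct_within_cp` is an illustrative extra column, not something the notebook itself calculates.

```python
import pandas as pd

# Toy stand-in for the heart-disease data (values are made up)
toy = pd.DataFrame({
    "cp":    ["asymptomatic"] * 3 + ["non-anginal"],
    "slope": ["flat", "flat", "upsloping", "flat"],
})

counts = toy.groupby(["cp", "slope"]).size().reset_index(name="counts")

# Share of all records, as in the chart above
counts["pct_of_total"] = counts["counts"] / counts["counts"].sum() * 100

# Share within each chest pain type (each cp group sums to 100)
group_totals = counts.groupby("cp")["counts"].transform("sum")
counts["pct_within_cp"] = counts["counts"] / group_totals * 100

print(counts)
```

Which denominator to use depends on the question: the first shows how common each combination is overall, the second how slope is distributed inside one chest pain type.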

fig = px.scatter(data_frame = df,

x="age",
y="chol",
color="cp",
size='ca',
hover_data=['oldpeak'],
marginal_x="histogram",
marginal_y="box",)

fig.update_layout(title_text=" Age vs Cholesterol ",

title_font={'size': 24, 'family':'Serif'},
width=1000,
height=550,
)

fig.show()


[Figure: Age vs Cholesterol, colored by cp, with a histogram of age on the top margin and a box plot of chol on the right margin; x-axis: age, y-axis: chol]

fig = px.scatter(data_frame = df,

x="age",
y="chol",
color="thalch",
size='ca',
hover_data=['oldpeak'],
marginal_x="histogram",
marginal_y="box")

fig.update_layout(title_text=" Age vs Cholesterol ",

title_font={'size': 24, 'family':'Serif'},
width=1000,
height=500,
)

fig.show()

[Figure: Age vs Cholesterol, colored by thalch, with a histogram of age on the top margin and a box plot of chol on the right margin; x-axis: age, y-axis: chol]

fig = px.scatter(data_frame = df,

x="age",
y="chol",
size ="ca",
size_max=30,
color= "sex",
trendline="ols")
fig.update_layout(title_text=" Age vs Cholesterol ",
title_font={'size': 24, 'family':'Serif'},
width=1000,
height=500,
)

fig.show()

[Figure: Age vs Cholesterol, colored by sex, with one OLS trendline per sex; x-axis: age, y-axis: chol]

fig = px.scatter(data_frame = df,

x="age",
y="chol",
size ="ca",
size_max=30,
color= "sex",
trendline="ols",
trendline_scope="overall",
trendline_color_override="black")

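# Title describes what is plotted: age against cholesterol with one overall trendline
fig.update_layout(title_text="Age vs Cholesterol (Overall Trendline)",
                  title_font={'size': 24, 'family':'Serif'},
                  width=1000,
                  height=550,
                  )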
fig.show()
[Figure: scatter plot of chol against age, colored by sex, with a single overall OLS trendline in black; x-axis: age, y-axis: chol]

fig= px.histogram(df, x='age',height=500,width=900,template='simple_white',

color='sex', # adding categorical column
color_discrete_sequence=['purple','pink'])

fig.update_layout(title={'text':'Histogram of Persons by Age','font':{'size':25}}

,title_font_family="Times New Roman",
title_font_color="darkgrey",

title_x=0.2)

fig.update_layout(
font_family='classic-roman',
font_color= 'grey',
yaxis_title={'text': " count", 'font': {'size':18}},
xaxis_title={'text': " Age", 'font': {'size':18}}
)
fig.show()
[Figure: Histogram of Persons by Age, colored by sex (Male, Female); x-axis: Age, y-axis: count]

import plotly.graph_objects as go
from plotly.subplots import make_subplots

# Assuming df is your DataFrame

asymptomatic = df[df['cp'] == 'asymptomatic']
non_anginal = df[df['cp'] == 'non-anginal']
atypical_angina = df[df['cp'] == 'atypical angina']
typical_angina = df[df['cp'] == 'typical angina']

fig = make_subplots(rows=2,
cols=2,
specs=[[{'type':'domain'}, {'type':'domain'}],
[{'type':'domain'}, {'type':'domain'}]],
subplot_titles=("Asymptomatic", "Non-Anginal",
"Atypical Angina", "Typical Angina"))

fig.add_trace(go.Pie(labels=asymptomatic["thal"], values=asymptomatic["chol"], name="asymptomatic"), 1, 1)

fig.add_trace(go.Pie(labels=non_anginal["thal"], values=non_anginal["chol"], name="non_anginal"), 1, 2)
fig.add_trace(go.Pie(labels=atypical_angina["thal"], values=atypical_angina["chol"], name="atypical_angina"), 2, 1)
fig.add_trace(go.Pie(labels=typical_angina["thal"], values=typical_angina["chol"], name="typical_angina"), 2, 2)

# Update layout to increase the size of the plot and add main title
fig.update_layout(
height=800,
width=1000,
title_text="Distribution of Cholesterol Levels by Chest Pain Type",
title_font_size=24
)

# Update traces
fig.update_traces(textposition='inside', textfont_size=16)
fig.update_annotations(font_size=20)
fig.show()
[Figure: Distribution of Cholesterol Levels by Chest Pain Type; four pie charts (Asymptomatic, Non-Anginal, Atypical Angina, Typical Angina), slices labeled by thal]

import plotly.express as px

fig = px.scatter(df, x='chol', y='age', color='cp', size = 'oldpeak', size_max = 30, hover_name = 'exang', range_x
labels = dict(oldpeak = 'oldpeak', chol = 'Cholesterol', age = "Age" ), animation_frame = "chol",

fig.update_layout(width=1000, height=600)

fig.update_layout(title_text='Scatter Plot of Cholesterol vs. Age (colored by cp) with Animation')

fig.show()
[Figure: Scatter Plot of Cholesterol vs. Age (colored by cp) with Animation; x-axis: Cholesterol, y-axis: Age; animation slider over chol values, starting at 233.0]

from plotly.offline import iplot

gender = df["sex"].value_counts()
display(gender.head().to_frame())

fig = px.bar(data_frame=gender,
x = gender.index,
y = gender,
color=gender.index,
text_auto="0.3s",
labels={"y": "Frequency", "index": "Gender"}

)
fig.update_traces(textfont_size=24)

iplot(fig)

        count
sex
Male      203
Female     96

[Figure: bar chart of record counts by sex (Male 203, Female 96); x-axis: sex, y-axis: Frequency]

from plotly.offline import iplot

category = df["cp"].value_counts()

fig = px.bar(category,
x = category.index,
y = (category / sum(category)) * 100,
color=category.index,
labels={"y" : "Frequency in (Percentage%)", "category":"Category"},
title="Frequency of Chest Pain Category in Percentage",
text = category.apply(lambda x: f'{(x / sum(category)) * 100:.1f}%'),
template="plotly_dark"
)

fig.update_layout(showlegend=False)
fig.update_traces(
textfont= {
"family": "consolas",
"size": 20,
}
)

iplot(fig)
[Figure: Frequency of Chest Pain Category in Percentage; bars: asymptomatic 48.2%, non-anginal 27.8%, atypical angina 16.4%, typical angina 7.7%; y-axis: Frequency in (Percentage%)]
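
The cell above derives percentages with `category / sum(category)`. As a hedged alternative, `value_counts(normalize=True)` returns the shares directly; the sketch below shows it on a made-up Series (the labels "a", "b", "c" are hypothetical), including the formatted text used for bar labels.

```python
import pandas as pd

cp = pd.Series(["a", "a", "b", "c"], name="cp")   # hypothetical toy data

# Shares in percent, sorted from most to least frequent
pct = cp.value_counts(normalize=True).mul(100)

# One-decimal strings suitable for the text= argument of px.bar
labels = pct.map(lambda v: f"{v:.1f}%")

print(pct)
print(labels)
```

Both approaches give the same numbers; the normalized form simply avoids recomputing the total inside a lambda for every row.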

from plotly.offline import iplot

ChestPain = df["cp"].value_counts()

fig = px.pie(values=ChestPain, names = ChestPain.index,

color_discrete_sequence= ["#98EECC", "#FFB6D9", "#99DBF5"],
template="plotly_dark"
)

fig.update_traces(textposition='inside', textfont_size= 20, textinfo='percent+label')

fig.update_layout(showlegend=True,width=1000, height=600)

iplot(fig)

[Figure: pie chart of chest pain types: asymptomatic 48.2%, non-anginal 27.8%, atypical angina 16.4%, typical angina 7.69%]

cp = df["cp"].value_counts()
fig = px.bar(cp,
y = cp.index,
x = (cp / sum(cp)) * 100,
color=cp.index,
labels={"x" : "Frequency in Percentage(%)", "cp":"Chest Pain"},
orientation="h",
title="Frequency of Chest Pain",
text = cp.apply(lambda x: f'{(x / sum(cp)) * 100:.1f}%'),
)

fig.update_layout(showlegend=True,width=1000, height=600)

fig.update_traces(
textfont= {
"family": "consolas",
"size": 20
}
)

iplot(fig)

[Figure: Frequency of Chest Pain; horizontal bars: asymptomatic 48.2%, non-anginal 27.8%, atypical angina 16.4%, typical angina 7.7%; x-axis: Frequency in Percentage(%), y-axis: Chest Pain]

fig=px.pie(df.groupby('cp',as_index=False)['sex'].count().sort_values(by='sex',ascending=False).reset_index(drop
names='cp',values='sex',color='sex',color_discrete_sequence=px.colors.sequential.Plasma_r,
labels={'cp':'Chest Pain','Sex':'Count'}, template='seaborn',hole=0.4)

fig.update_layout(autosize=False, width=1200, height=700,legend=dict(orientation='v', yanchor='bottom',y=0.40,xanchor

title_x=0.5, showlegend=True)

fig.update_traces(
textfont= {
"family": "consolas",
"size": 20
}
)

fig.show()
[Figure: donut chart of Chest Pain with slices of 48.2%, 27.8%, 16.4%, and 7.69%]

import plotly.express as px
from plotly.offline import iplot
import plotly.graph_objects as go
from plotly.subplots import make_subplots

fig = make_subplots(rows=1, cols=2, subplot_titles=('Age Distribution', 'Log Age Distribution'))

# add_trace with row/col replaces the deprecated append_trace
fig.add_trace(go.Histogram(x=df['age'],
                           name='Age Distribution'), row=1, col=1)

fig.add_trace(go.Histogram(x=np.log10(df['age']),
                           name='Log Age Distribution'), row=1, col=2)

iplot(fig)
[Figure: side-by-side histograms titled Age Distribution and Log Age Distribution]

import numpy as np
import plotly.graph_objs as go
from plotly.offline import iplot

# Calculate quartiles and IQR

Q25 = np.quantile(df['chol'], q=0.25)
Q75 = np.quantile(df['chol'], q=0.75)
IQR = Q75 - Q25
cut_off = IQR * 1.5

# Print number of outliers

print('Number of Cholesterol Lower Outliers:', df[df['chol'] <= (Q25 - cut_off)]['chol'].count())
print('Number of Cholesterol Upper Outliers:', df[df['chol'] >= (Q75 + cut_off)]['chol'].count())

# Sum 'age' and 'chol' per chest pain type (numeric columns only) and sort by total age

temp = df.groupby('cp')[['age', 'chol']].sum().sort_values('age', ascending=False)

# Create bar data

data = [
go.Bar(x=temp.index, y=temp['age'], name='Age', text=temp['age'], textposition='auto'),
go.Bar(x=temp.index, y=temp['chol'], name='Cholesterol', text=temp['chol'], textposition='auto')
]

# Define layout
layout = go.Layout(
xaxis=dict(title=dict(text='Chest Pain', font=dict(size=25))),
yaxis=dict(title=dict(text='Values', font=dict(size=25))),
showlegend=True,
width=1300,
height=600
)

# Create figure and plot

fig = go.Figure(data=data, layout=layout)
iplot(fig)

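Number of Cholesterol Lower Outliers: 1
Number of Cholesterol Upper Outliers: 5

[Figure: bar chart of summed age and cholesterol per chest pain type; x-axis: Chest Pain, y-axis: Values. Bar labels (Age / Cholesterol): asymptomatic 8032 / 35949, non-anginal 4475 / 20367, atypical angina 2510 / 12019]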
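
The outlier counts printed above come from the 1.5 × IQR rule. The sketch below applies the same rule to a small made-up sample so the arithmetic can be followed by hand; `iqr_bounds` is a hypothetical helper name, and the comparison uses `<=` and `>=` to mirror the notebook's code.

```python
import numpy as np

def iqr_bounds(values, k=1.5):
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR."""
    q25, q75 = np.quantile(values, [0.25, 0.75])
    cut_off = (q75 - q25) * k
    return float(q25 - cut_off), float(q75 + cut_off)

# Hypothetical sample with one obvious extreme value
sample = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 100])

lower, upper = iqr_bounds(sample)
outliers = sample[(sample <= lower) | (sample >= upper)]

print(lower, upper, outliers.tolist())  # -3.5 14.5 [100]
```

Here Q1 = 3.25 and Q3 = 7.75, so IQR = 4.5 and the cut-off is 6.75, which places the fences at -3.5 and 14.5.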

import plotly.graph_objs as go
from plotly.offline import iplot

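# Summed age and cholesterol for the three chest pain types with the largest total age

cp_sums = df.groupby('cp')[['age', 'chol']].sum().sort_values('age', ascending=False)[0:3]
top_03_cp = cp_sums['age']    # summed age per chest pain type
top_03_AGE = cp_sums['chol']  # summed cholesterol for the same three types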

data = [
go.Bar(
x=top_03_cp.index,
y=top_03_cp,
name='Top 3 age',
text=top_03_cp,
textposition='auto'
),
go.Bar(
x=top_03_AGE.index,
y=top_03_AGE,
name='Top 3 cholesterol',
text=top_03_AGE,
textposition='auto'
)
]

layout = go.Layout(
title="Grouped Bar Plot For Age and Cholesterol (For The Top Three types of Chest Pain)",
barmode='group'
)

iplot(dict(data=data, layout=layout))
[Figure: Grouped Bar Plot For Age and Cholesterol (For The Top Three types of Chest Pain). Bar labels (Top 3 age / Top 3 cholesterol): asymptomatic 8032 / 35949, non-anginal 4475 / 20367, atypical angina 2510 / 12019]

gap_df = pd.read_csv("gapminder_full.csv")

display(gap_df.head(2))

fig = px.bar(data_frame=gap_df,
x="continent",
y="population",
color="continent",
animation_frame="year",
animation_group="country",
range_y=[0,4000000000])
fig.show()
       country  year  population continent  life_exp  gdp_cap
0  Afghanistan  1952     8425333      Asia     28.80   779.45
1  Afghanistan  1957     9240934      Asia     30.33   820.85

[Figure: animated bar chart of population by continent (Asia, Europe, Africa, Americas, Oceania); x-axis: continent, y-axis: population; animation frames: year, 1952 to 2007 in 5-year steps]

fig = px.scatter(gap_df, x='gdp_cap', y='life_exp', color='continent', size='population', size_max=60, hover_name="country",
                 animation_frame="year", animation_group='country', log_x=True, range_x=[100, 100000], range_y=[25, 90],
                 labels=dict(population="Population", gdp_cap="GDP per Capita", life_exp="Life Expectancy"))

fig.update_layout(
height=550,
width=1500,
title_text="Distribution of GDP per Capita vs Life Expectancy",
title_font_size=24
)

fig.show()

[Figure: animated bubble chart of GDP per capita against life expectancy, colored by continent; x-axis: GDP per Capita (log scale), y-axis: Life Expectancy; animation frames: year, starting at 1952]

# Grouping the data by chest pain type

df1 = df[['cp','age','chol','num']]
df1.groupby('cp').sum().head(10).style.background_gradient(cmap='Blues')

                  age          chol  num
cp
asymptomatic     8032  35949.000000  225
atypical angina  2510  12019.000000   14
non-anginal      4475  20367.000000   33
typical angina   1285   5454.000000   11
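
Sums like these grow with group size, so the asymptomatic group dominates largely because it has the most patients. As a worked illustration, the sketch below divides the age sums from the table by the number of patients per chest pain type; the patient counts are taken from the cp-by-thal counts reported elsewhere in this document (they add up to the 299 records), and the variable names are hypothetical.

```python
# Column sums as printed in the table above
age_sum = {"asymptomatic": 8032, "atypical angina": 2510,
           "non-anginal": 4475, "typical angina": 1285}

# Patients per chest pain type, from the cp-by-thal counts shown in this document
n_patients = {"asymptomatic": 12 + 53 + 79, "atypical angina": 2 + 39 + 8,
              "non-anginal": 2 + 59 + 22, "typical angina": 2 + 13 + 8}

mean_age = {cp: round(age_sum[cp] / n_patients[cp], 2) for cp in age_sum}
print(mean_age)
# {'asymptomatic': 55.78, 'atypical angina': 51.22, 'non-anginal': 53.92, 'typical angina': 55.87}
```

On the DataFrame itself, `df.groupby('cp')[['age', 'chol']].mean()` would give the per-group means directly.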

import pandas as pd
import plotly.express as px

grouped_df = df.groupby(['cp', 'thal']).size().reset_index(name='count')

fig = px.bar(grouped_df,
y="cp",
x='count',
color='thal',
title='Count of Patients by cp and thal',
labels={'count': 'Number of Patients'},
text_auto=True)
fig.show()
[Figure: Count of Patients by cp and thal; horizontal stacked bars; x-axis: Number of Patients, y-axis: cp. Bar labels (fixed defect / normal / reversable defect): typical angina 2 / 13 / 8, non-anginal 2 / 59 / 22, atypical angina 2 / 39 / 8, asymptomatic 12 / 53 / 79]

# color palette for visualizations

import matplotlib.pyplot as plt

colors = ['#2B2E4A', '#E84545', '#903749', '#53354A',]

palette = sns.color_palette( palette = colors)

sns.palplot(palette, size = 2.5)

plt.text(-0.5,
-0.7,
'Color Palette',
{'font':'monospace',
'size': 24,
'weight':'normal'}
)

plt.show()

def format_title(title, subtitle=None, subtitle_font=None, subtitle_font_size=None):
    title = f'{title}'
    if not subtitle:
        return title
    subtitle = f'{subtitle}'
    return f'{title} {subtitle}'

import plotly.figure_factory as ff

_ = df.groupby(['cp', 'thal']).chol.size().unstack()
z = _.values.tolist()
x = _.columns.tolist()
y = _.index.tolist()

fig = ff.create_annotated_heatmap(z = z,
x = x,
y = y,
xgap = 3,
ygap = 3,
colorscale = ['#53354A', '#E84545']
)

title = format_title('cp',
'thal.',
'Chol',
12
)

fig.update_layout(title_text = title,
title_x = 0.5,
title_font={'size': 24,
            'family': 'Proxima Nova',
            },
template='plotly_dark',
paper_bgcolor='#2B2E4A',
plot_bgcolor='#2B2E4A',

xaxis = {'side': 'bottom'},

xaxis_showgrid = False,
yaxis_showgrid = False,
yaxis_autorange = 'reversed',
)

fig.show()

[Figure: annotated heatmap of patient counts; rows: cp, columns: thal]

                 fixed defect  normal  reversable defect
asymptomatic               12      53                 79
atypical angina             2      39                  8
non-anginal                 2      59                 22
typical angina              2      13                  8
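
The heatmap matrix above is built with `groupby(...).size().unstack()`. That works here because every cp and thal combination occurs, but a missing combination would become NaN. The sketch below, on a made-up three-row frame, contrasts this with `pd.crosstab`, which is offered as one possible alternative that fills absent combinations with zero.

```python
import pandas as pd

# Hypothetical toy data: the ("b", "fixed defect") combination never occurs
toy = pd.DataFrame({
    "cp":   ["a", "a", "b"],
    "thal": ["normal", "fixed defect", "normal"],
})

# groupby + unstack leaves NaN where a combination is absent
unstacked = toy.groupby(["cp", "thal"]).size().unstack()

# crosstab (or unstack(fill_value=0)) yields explicit zeros instead
table = pd.crosstab(toy["cp"], toy["thal"])

# Lists in the shape expected by an annotated heatmap
z = table.values.tolist()
x = table.columns.tolist()
y = table.index.tolist()

print(unstacked)
print(table)
```

Integer zeros also keep the annotation text free of `nan` labels and float formatting.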

# available templates
template = ['ggplot2','plotly_dark', 'seaborn', 'simple_white', 'plotly']

fig = px.histogram(df,
x="cp",
y=None,
color="sex",
width=1200,
height=450,
histnorm='percent',
color_discrete_map={
    # Keys must match the values in the 'sex' column exactly ('Male', 'Female')
    "Male": "RebeccaPurple", "Female": "lightsalmon"
},
template="plotly_dark"
)

fig.update_layout(title_text="Gender Chest Pain",
                  font_family="Sans-Serif",
                  bargap=0.2,
                  barmode='group',
                  title_font={'size': 24},
legend=dict(
orientation="v", y=1, yanchor="top", x=1.25, xanchor="right")
)
fig.show()
[Figure: Gender Chest Pain; grouped percent histogram of chest pain type by sex; x-axis: cp (typical angina, asymptomatic, non-anginal, atypical angina), y-axis: percent]

from plotly.subplots import make_subplots

# Chest pain type shown as two donut charts with different hole sizes

fig = make_subplots(rows=1, cols=2,
specs=[[{'type':'domain'}, {'type':'domain'}],
])
fig.add_trace(
go.Pie(
labels=df['cp'],
title={'text': "Chest Pain", 'font': {'size': 20, 'family': 'Serif'}},
values=None,
hole=0.85,
), col=1, row=1,
)
fig.update_traces(
hoverinfo='label+value',
textinfo='label+percent',
textfont_size=12,
)

fig.add_trace(
go.Pie(
labels=df['cp'],
title={'text': "Chest Pain", 'font': {'size': 20, 'family': 'Serif'}},
values=None,
hole=0.5,
), col=2, row=1,
)
fig.update_traces(
hoverinfo='label+value',
textinfo='label+percent',
textfont_size=12,
)
fig.update_layout(title_text=" Heart Disease ",
title_font={'size':20, 'family': 'Serif',},
showlegend=False,
height=600,
width=1000,
template=None,
)

fig.show()

[Figure: Heart Disease; two donut charts of Chest Pain with different hole sizes: asymptomatic 48.2%, non-anginal 27.8%, atypical angina 16.4%, typical angina 7.69%]

from plotly.subplots import make_subplots

# Donut charts of chest pain type and sex
fig = make_subplots(rows=1, cols=2,
specs=[[{'type':'domain'}, {'type':'domain'}],
])
fig.add_trace(
go.Pie(
labels=df['cp'],
values=None,
hole=.4,
title={'text': 'Chest Pain', 'font': {'size': 24}},

),
row=1,col=1
)
fig.update_traces(
hoverinfo='label+value',
textinfo='label+percent',
textfont_size=12,
marker=dict(
colors=['lightgray', 'lightseagreen'],
line=dict(color='#000000',
width=2)
)
)

fig.add_trace(
go.Pie(
labels=df['sex'],
values=None,
hole=.4,
title={'text': 'Sex', 'font': {'size': 24}},
),
row=1,col=2
)
fig.update_traces(
hoverinfo='label+value',
textinfo='label+percent',
textfont_size=16,
marker=dict(
colors=['lightgray', 'lightseagreen'],
line=dict(color='#000000',
width=2)
)
)
fig.update_layout(title_text=" Heart Disease ",
title_font={'size': 24, 'family': 'Sans-Serif'},
showlegend=False,
height=600,
width=950,
)
fig.show()

[Figure: Heart Disease; donut charts of Chest Pain (asymptomatic 48.2%, non-anginal 27.8%, atypical angina 16.4%, typical angina 7.69%) and Sex (Male 67.9%, Female 32.1%)]

# Sunburst of chest pain type split by sex

fig = px.sunburst(df,
path=['cp', 'sex'])
fig.update_layout(title_text="Chest Pain vs Gender",
title_font={'size': 24, 'family':'Serif'},
width=750,
height=750,
)
fig.show()
[Figure: Chest Pain vs Gender; sunburst with chest pain type (asymptomatic, non-anginal, atypical angina, typical angina) on the inner ring and sex (Male, Female) on the outer ring]

fig = px.histogram(df, x="cp",

width=600,
height=400,
histnorm='percent',
category_orders={
"cp": ["asymptomatic", "non-anginal", "atypical angina", "typical angina"],
"sex": ["Male", "Female"]
},
color_discrete_map={
"Male": "RebeccaPurple", "Female": "lightsalmon",
},
template="simple_white"
)

fig.update_layout(title_text="Chest Pain Type",
                  font_family="Sans-Serif",
                  title_font={'size': 20},
legend=dict(
orientation="v", y=1, yanchor="top", x=1.0, xanchor="right" )
).update_xaxes(categoryorder='total descending')
# custom color
colors = ['gray',] * 4
colors[3] = 'crimson'
colors[0] = 'lightseagreen'

fig.update_traces(marker_color=colors, marker_line_color=None,
marker_line_width=2.5, opacity=None)
fig.show()
[Figure: Chest Pain Type; percent histogram ordered asymptomatic, non-anginal, atypical angina, typical angina; y-axis: percent]

fig = px.histogram(df, x="cp",

width=600,
height=500,
histnorm='percent',
template="simple_white",
)
fig.update_layout(title_text="Types of Chest Pain",
                  font_family="Sans-Serif",
                  title_font={'size': 20},
showlegend=True,
legend=dict(
orientation="v",
y=1.0,
yanchor="top",
x=1.0,
xanchor="right"
)
)
fig.update_traces(marker_color=None, marker_line_color='white',
marker_line_width=1.5, opacity=0.99)
fig.show()

[Figure: Types of Chest Pain; percent histogram in data order (typical angina, asymptomatic, non-anginal, atypical angina); y-axis: percent]

colors = ['rgba(38, 24, 74, 0.8)', 'rgba(71, 58, 131, 0.8)',

'rgba(122, 120, 168, 0.8)', 'rgba(164, 163, 204, 0.85)',
'rgba(190, 192, 213, 1)']

data = df[['sex']]

fig = px.histogram(df,
y="sex",
orientation='h',
width=800,
height=350,
histnorm='percent',
template="plotly_dark"
)
fig.update_layout(title_text="Heart Disease",
                  font_family="Sans-Serif",
                  bargap=0.2,
                  barmode='group',
                  title_font={'size': 28},
paper_bgcolor='lightgray',
plot_bgcolor='lightgray',
legend=dict(
orientation="v",
y=1,
yanchor="top",
x=1.250,
xanchor="right",)
)
annotations = []
annotations.append(dict(xref='paper', yref='paper',
x=0.0, y=1.2,
text='Heart Disease',
font=dict(family='Arial', size=16, color=colors[2]),
showarrow=False))
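# Percentages computed from the data rather than hard-coded
sex_pct = df['sex'].value_counts(normalize=True).mul(100).round(1)
annotations.append(dict(xref='paper', yref='paper',
                        x=0.50, y=0.85,
                        text=f"{sex_pct['Female']}%",
                        font=dict(family='Arial', size=20, color=colors[2]),
                        showarrow=False))
annotations.append(dict(xref='paper', yref='paper',
                        x=1.08, y=0.19,
                        text=f"{sex_pct['Male']}%",
                        font=dict(family='Arial', size=20, color=colors[2]),
                        showarrow=False))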

fig.update_layout(
autosize=False,
width=800,
height=350,
margin=dict(
l=50,
r=50,
b=50,
t=120,
),
)

fig.update_layout(annotations=annotations)
fig.update_xaxes(showgrid=False)
fig.update_yaxes(showgrid=False)
fig.show()

[Figure: Heart Disease; horizontal percent histogram by sex, annotated Female 30.4% and Male 69.6%; x-axis: percent, y-axis: sex]

# Plotting the pie chart

plt.figure(figsize=(20, 5))

# Pie chart
plt.subplot(1, 2, 1)
quality_counts = df['cp'].value_counts()
plt.pie(quality_counts, labels=quality_counts.index, colors=sns.color_palette('PuBuGn', len(quality_counts)), autopct
plt.title('Chest Pain Distribution')

# Count plot
plt.subplot(1, 2, 2)
ax = sns.countplot(data=df, x='cp',palette='PuBuGn')
# Add count values above each bar
for container in ax.containers:
    ax.bar_label(container, label_type='edge')

plt.title('Chest Pain Distribution')

plt.xlabel('Chest Pain')
plt.ylabel('Count')
plt.tight_layout()
plt.show()

plt.figure(figsize=(20, 5))

for i, col in enumerate(['age', 'chol', 'oldpeak'], 1):
    plt.subplot(1, 3, i)
    ax = sns.barplot(x='sex', y=col, data=df)
    plt.title(f'{col} Comparison')
    plt.ylabel(col if i == 1 else '')

    # Add the mean value above each bar
    for container in ax.containers:
        ax.bar_label(container, label_type='edge')

plt.show()

sns.pairplot(df[['cp','age','chol','thalch']], hue='cp', aspect=1.5,dropna=True,palette='bright')

plt.show()
# Group by chest pain type and calculate the mean of each feature
grouped_mean = df[['cp','age','trestbps','chol','thalch']].groupby('cp').mean().round(2)

plt.figure(figsize=(20, 6))

# Plot the grouped bars using Seaborn's barplot

ax = sns.barplot(data=grouped_mean.reset_index().melt(id_vars='cp'),
x='variable', y='value', hue='cp', palette='CMRmap', alpha=0.8)

# Add the mean value above each bar

for container in ax.containers:
    ax.bar_label(container, label_type='edge')

plt.xlabel('Features')
plt.ylabel('Mean Value')
plt.title('Grouped Barplot by Chest Pain type')

# Rotate x-axis labels

plt.xticks(rotation=45, ha='right')

plt.legend(title='Chest Pain')
plt.show()

# Violin plot: cholesterol distribution by chest pain type

plt.figure(figsize=(12, 6))
sns.violinplot(x='cp', y='chol', data=df)
plt.title('Distribution of Cholesterol by Chest Pain Type')
plt.xlabel('Chest Pain Type')
plt.ylabel('Cholesterol')
plt.show()
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np

# Assuming 'df' is your DataFrame

cp_attributes_comparison = df.loc[df['cp'].isin(['asymptomatic', 'non-anginal', 'atypical angina', 'typical angina'])]
attributes_to_compare = ['age', 'trestbps', 'chol', 'thalch', 'oldpeak', 'ca']

fig, ax = plt.subplots(figsize=(10, 10), subplot_kw=dict(polar=True))

for cp in cp_attributes_comparison['cp'].unique():
    cp_data = cp_attributes_comparison.loc[cp_attributes_comparison['cp'] == cp]

    # Calculate mean values for each attribute
    values = cp_data[attributes_to_compare].mean().values.flatten().tolist()
    values += values[:1]  # Close the circle for radar plot

    # One angle per attribute, evenly spaced around the circle, then closed
    angles = [n / float(len(attributes_to_compare)) * 2 * np.pi for n in range(len(attributes_to_compare))]
    angles += angles[:1]

    ax.plot(angles, values, linewidth=2, linestyle='solid', label=cp)
    ax.fill(angles, values, alpha=0.25)

# Set the labels

ax.set_xticks(angles[:-1])
ax.set_xticklabels(attributes_to_compare)

# Add legend
plt.legend(loc='upper right', bbox_to_anchor=(1.3, 1))

# Add title
plt.title('Chest Pain Attributes Comparison')

# Show the plot

plt.tight_layout()
plt.show()
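
The radar chart relies on two small steps: spacing one angle per attribute evenly around the circle, and repeating the first point so the outline closes. The sketch below isolates that computation; `radar_coords` is a hypothetical helper name and the four input values are made up.

```python
import math

def radar_coords(values):
    """Return (angles, closed_values) for a radar polygon with len(values) axes."""
    n = len(values)
    angles = [i / n * 2 * math.pi for i in range(n)]
    # Repeat the first point so the outline ends where it started
    return angles + angles[:1], list(values) + list(values[:1])

angles, closed = radar_coords([54.0, 131.0, 246.0, 150.0])  # made-up means

print([round(a, 3) for a in angles])
print(closed)
```

Because the attributes in the notebook are on very different scales (for example chol versus oldpeak), rescaling each attribute before plotting may make the polygons easier to compare; that step is a suggestion and is not part of the original code.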
jobs = pd.read_csv("jobstreet_all_job_dataset.csv")
# Sample 5,000 rows, drop the identifier column, and renumber the index
jobs = jobs.sample(5000).drop(columns=['job_id']).reset_index(drop=True)
display(jobs.shape)
jobs.head(2)

(5000, 10)

jobs.head(2), transposed for width:

              0                                               1
job_title     MARKETING EXECUTIVE                             E-Commerce Sales Admin
company       AESD INTERNATIONAL (M) SDN. BHD.                JOBSGURU SDN. BHD.
descriptions  JOB DESCRIPTIONS\nWork closely with the sales ...  Job Description\nPerform CS activities by repl...
location      Petaling                                        Petaling
category      Marketing & Communications                      Administration & Office Support
subcategory   Marketing Assistants/Coordinators               Client & Sales Administration
role          marketing-executive                             sales-administration
type          Full time                                       Full time
salary        RM 3,000 – RM 4,000 per month                   RM 2,500 – RM 3,500 per month
listingDate   2024-03-21T08:08:18Z                            2024-05-24T12:59:40Z

import missingno as msno

# Create a figure with two subplots arranged in a 1x2 grid

fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(20, 6))

# Plot the original DataFrame with missing values

msno.matrix(jobs, ax=axes[0])
axes[0].set_title("Original DataFrame with Missing Values",fontsize=24,color='Red')

# Drop rows with missing values and plot the resulting DataFrame
job = jobs.dropna()

msno.matrix(job, ax=axes[1])
axes[1].set_title("DataFrame after Dropping Missing Values",fontsize=24,color='Green')

plt.tight_layout()
plt.show()
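
The missingno matrix gives a visual impression of the gaps; a numeric summary can complement it. The sketch below uses a made-up three-row frame whose column names mirror the jobs data (the rows are hypothetical) to count missing cells per column and the rows that survive `dropna()`.

```python
import pandas as pd

# Hypothetical toy frame with gaps; column names mirror the jobs data
toy = pd.DataFrame({
    "job_title": ["Analyst", "Engineer", "Clerk"],
    "salary": ["RM 3,000 – RM 4,000 per month", None, None],
    "location": ["Petaling", "Petaling", None],
})

missing_per_column = toy.isna().sum()   # number of missing cells in each column
rows_kept = len(toy.dropna())           # rows with no missing value at all

print(missing_per_column)
print(rows_kept)
```

Comparing `rows_kept` with the original row count shows how much data `dropna()` discards before any plots are drawn.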
import re

def clean_and_calculate_mean(salary):
    try:
        # Remove currency symbols, words, and extra characters
        salary = salary.replace('RM', '').replace('MYR', '').replace('$', '').replace('per month', '').replace('p.m.'

        # Handle ranges with different separators
        if '–' in salary:
            salary_range = salary.split('–')
        elif '-' in salary:
            salary_range = salary.split('-')
        elif '—' in salary:
            salary_range = salary.split('—')
        else:
            salary_range = [salary]

        # Convert values to integers, handling potential errors
        salary_values = []
        for value in salary_range:
            try:
                value = int(float(value.replace(',', '').strip()))
                salary_values.append(value)
            except ValueError:
                pass  # Ignore non-numeric values

        # Calculate mean if at least two valid values are found
        if len(salary_values) >= 2:
            salary_mean = sum(salary_values) / len(salary_values)
            return salary_mean
        else:
            return None

    except Exception as e:
        print(f"Error processing salary '{salary}': {e}")
        return None

# Apply the function to the salary column

job['Salary'] = job['salary'].apply(clean_and_calculate_mean)
job = job.drop('salary',axis=1)
job.head(2)

job.head(2), transposed for width:

              0                                               1
job_title     MARKETING EXECUTIVE                             E-Commerce Sales Admin
company       AESD INTERNATIONAL (M) SDN. BHD.                JOBSGURU SDN. BHD.
descriptions  JOB DESCRIPTIONS\nWork closely with the sales ...  Job Description\nPerform CS activities by repl...
location      Petaling                                        Petaling
category      Marketing & Communications                      Administration & Office Support
subcategory   Marketing Assistants/Coordinators               Client & Sales Administration
role          marketing-executive                             sales-administration
type          Full time                                       Full time
listingDate   2024-03-21T08:08:18Z                            2024-05-24T12:59:40Z
Salary        3500.0                                          3000.0
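
The salary values above are the midpoints of the advertised ranges. As a compact illustration of the same idea, the sketch below extracts the numbers with a regular expression instead of chained `replace` and `split` calls. It is a different technique from the notebook's function, `mean_salary` is a hypothetical name, and it would also pick up unrelated numbers if a salary string contained any.

```python
import re

# A number with optional thousands separators and an optional decimal part
NUMBER = re.compile(r"\d[\d,]*(?:\.\d+)?")

def mean_salary(text):
    """Mean of the numbers in a salary range string, or None if fewer than two are found."""
    values = [float(m.replace(",", "")) for m in NUMBER.findall(text)]
    if len(values) < 2:
        return None
    return sum(values) / len(values)

print(mean_salary("RM 3,000 – RM 4,000 per month"))  # 3500.0
print(mean_salary("RM 2,500 - RM 3,500 per month"))  # 3000.0
print(mean_salary("Negotiable"))                     # None
```

Like the notebook's function, this returns None when a string holds a single figure, so rows without a range end up as missing values.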

import plotly.express as px
from plotly.offline import init_notebook_mode
init_notebook_mode(connected=True)

jobType = job['type'].value_counts()

fig = px.pie(values=jobType.values, names=jobType.index.tolist(), color=jobType.index.tolist(),color_discrete_sequenc

fig.update_layout(width=1000, height=800)

fig.update_traces(textfont_size=20)

fig.update_traces(pull=[0.1, 0.1, 0.2], textposition='outside')

# Set layout properties

fig.update_layout(margin = dict(t=50, l=10, r=10, b=25),
title='Employment Types in the Job Market of Malaysia',
title_x=0.5,
title_y=0.98)

fig.show()

[Figure: pie chart "Employment Types in the Job Market of Malaysia"; slices of 94.9%, 4.63%, 0.408%, and 0.0454%]

top_n = 50

filtered_data = job['job_title'].value_counts().head(top_n).reset_index()
filtered_data.columns = ['job_title', 'count']

fig = px.treemap(filtered_data, path=[px.Constant('all'), 'job_title'], values='count')

fig.update_traces(root_color='lightgrey')

fig.update_traces(textfont_size=16)

fig.update_layout(width=1000, height=600)
fig.update_layout(margin=dict(t=50, l=25, r=25, b=25),
title='Top Job Openings: Job Roles in Malaysia',
title_x=0.5,
title_y=0.98)

fig.show()
Python Code Link: https://t.me/AIMLDeepThaught/573
[Figure: treemap "Top Job Openings: Job Roles in Malaysia"; one tile per job title for the 50 most frequent titles (e.g. Account Executive, Sales Executive, Marketing Executive, Admin Assistant)]

job.replace('Kuala Lumpur Sentral','Kuala Lumpur', inplace=True)

job.replace('Bangsar South', 'Bangsar', inplace=True)
job.replace('Klang District', 'Klang/Port Klang', inplace=True)
job.replace('Penang Island', 'Penang', inplace=True)
job['location'] = job['location'].str.replace(' District', '')

top_location = ['Kuala Lumpur', 'Petaling', 'Penang']

filtered_data = job[job['location'].isin(top_location)]

job_title_counts = filtered_data['job_title'].value_counts()

top_n = 10
top_job_titles = job_title_counts.head(top_n).index.tolist()

filtered_data = filtered_data[filtered_data['job_title'].isin(top_job_titles)]

fig = px.treemap(filtered_data, path=[px.Constant('all'), 'location', 'job_title'])

fig.update_traces(root_color='lightgrey')
fig.update_layout(width=1000, height=600)
fig.update_layout(margin=dict(t=50, l=25, r=25, b=25),
title='Top Job Opportunities in Kuala Lumpur, Petaling, and Penang',
title_x=0.5,
title_y=0.98)
fig.show()
[Figure: treemap "Top Job Opportunities in Kuala Lumpur, Petaling, and Penang"; tiles grouped by location (Petaling, Kuala Lumpur, Penang), then by job title (e.g. Sales Executive, Account Executive, Admin Assistant)]

# plot a sunburst chart

fig = px.sunburst(job, path=['location','category'])

# configure the plot layout

fig.update_layout(
margin=dict(t=50, l=25, r=25, b=25),
width=900, # Set the width of the plot
height=800, # Set the height of the plot
title='Job Vacancies Available Across Malaysia by Category',
title_x=0.48,
title_y=0.98
)

fig.show()
[Figure: sunburst chart "Job Vacancies Available Across Malaysia by Category"; inner ring shows locations (e.g. Petaling, Kuala Lumpur, Johor Bahru, Hulu Langat, Penang), outer ring shows job categories within each location (e.g. Accounting, Sales, Administration & Office Support)]

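def remove_outliers_iqr(df, column):
    # Note: despite the Q1/Q3 names, the cut-offs are the 10th and 85th
    # percentiles, so the range is wider than a true interquartile range
    Q1 = df[column].quantile(0.10)
    Q3 = df[column].quantile(0.85)
    IQR = Q3 - Q1

    lower_bound = Q1 - 1.5 * IQR
    upper_bound = Q3 + 1.5 * IQR

    # Keep only rows whose value lies within the bounds
    df_outlier_free = df[(df[column] >= lower_bound) & (df[column] <= upper_bound)]
    return df_outlier_free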

jobs = remove_outliers_iqr(job, 'Salary')
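
The function above applies the usual fence rule (1.5 times the spread beyond each cut-off), but its cut-offs are the 10th and 85th percentiles rather than the quartiles. For comparison, here is a minimal sketch of the textbook version with the 25th and 75th percentiles. The `salaries` values are made up for illustration, and `iqr_bounds` is a hypothetical helper name.

```python
import pandas as pd

def iqr_bounds(series, k=1.5):
    """Return (lower, upper) fences from the 25th and 75th percentiles."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Made-up monthly salaries with one extreme value
salaries = pd.Series([2500, 3000, 3200, 3500, 4000, 4500, 50000])
lower, upper = iqr_bounds(salaries)

# Keep only values inside the fences
kept = salaries[(salaries >= lower) & (salaries <= upper)]
print(lower, upper, len(kept))
```

Wider percentiles, as in the notebook, remove fewer rows; the quartile version is stricter.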

#===========================================================================================================#

# Get top 20 job titles from both DataFrames

top_leagues_job = job['job_title'].value_counts().nlargest(20).index
top_leagues_jobs = jobs['job_title'].value_counts().nlargest(20).index

# Combine the top job titles

top_leagues = top_leagues_job.union(top_leagues_jobs)

# Plotting
plt.figure(figsize=(16, 6))
sns.boxplot(x='Salary', y='job_title', data=job[job['job_title'].isin(top_leagues)])
plt.title('Distribution of Salary by Top Job Titles (Before Removing Outliers)')
plt.xlabel('Salary')
plt.ylabel('job_title')
plt.show()

plt.figure(figsize=(16, 6))
sns.boxplot(x='Salary', y='job_title', data=jobs[jobs['job_title'].isin(top_leagues)])
plt.title('Distribution of Salary by Top Job Titles (After Removing Outliers)')
plt.xlabel('Salary')
plt.ylabel('job_title')
plt.show()

top_leagues = jobs['job_title'].value_counts().nlargest(15)

plt.figure(figsize=(20, 8))

colors = sns.cubehelix_palette(len(top_leagues), light=0.7, dark=0.2)

bar_plot = sns.barplot(x=top_leagues.index, y=top_leagues.values, palette=colors)

# Annotate each bar with its count
for index, value in enumerate(top_leagues.values):
    label = f"{value:,}"
    plt.text(index, value + 0.1, label, ha='center', va='bottom', fontsize=15, color='#A52A2A')

plt.title('Top 15 Job Titles by Number of Postings')

plt.xlabel('Job Title')
plt.ylabel('Number of Postings')
plt.xticks(rotation=90)
plt.tight_layout()

plt.show()

tips_df = pd.read_csv("tip.csv")

display(tips_df.head(2))
fig = px.bar(tips_df,
x="sex",
y="total_bill",
color="smoker",
barmode="group",
facet_row="time",
facet_col="day",
category_orders={"day": ["Thur", "Fri", "Sat", "Sun"],
"time": ["Lunch", "Dinner"]})
fig.show()

total_bill tip sex smoker day time size

0 16.99 1.01 Female No Sun Dinner 2

1 10.34 1.66 Male No Sun Dinner 3

[Figure: grouped bar chart of total_bill by sex (Male, Female), colored by smoker (No, Yes); facet columns day=Thur, Fri, Sat, Sun; facet rows time=Lunch, Dinner]

fig = px.box(tips_df,
x="time",
y="total_bill",
points="all")
fig.show()

[Figure: box plot of total_bill by time (Dinner, Lunch), with all points shown]

fig = px.box(tips_df,
x="time",
y="total_bill",
points="outliers")
fig.show()

[Figure: box plot of total_bill by time (Dinner, Lunch), with only outlier points shown]

fig = px.box(tips_df,
x="day",
y="total_bill",
color="smoker" )
fig.update_traces(quartilemethod="linear")
fig.show()

[Figure: box plot of total_bill by day (Sun, Sat, Thur, Fri), colored by smoker (No, Yes)]

fig = px.box(tips_df,
x="time",
y="total_bill",
color="smoker",
notched=True,
hover_data=["day"] # add day column to hover data
)
fig.show()

[Figure: notched box plot of total_bill by time (Dinner, Lunch), colored by smoker (No, Yes)]

plt.figure(figsize=(20, 10))

x = jobs['job_title'].head(20)
y = jobs['Salary'].head(20)

# Plot one labelled marker per job posting, sized by salary
marker_sizes = jobs['Salary']
for i, title in enumerate(x):
    plt.scatter(title, y.iloc[i], s=(marker_sizes.iloc[i])/20, label=title, alpha=0.7)
    # (the source line is cut off after fontsize=10; any further arguments are not recoverable)
    plt.text(title, y.iloc[i], f'{y.iloc[i]:,.0f}', ha='center', va='bottom', rotation='vertical', fontsize=10)
# Show plain numbers (no scientific notation) on the y-axis
plt.ticklabel_format(style='plain', axis='y', useOffset=False, scilimits=(9, 9))

plt.xlabel('Job Title')
plt.ylabel('Salary')
plt.title('Scatter Plot Job Title for Salary')
plt.xticks(rotation=90)
plt.grid(True)
plt.tight_layout()

plt.show()

jobs = pd.read_csv("jobstreet_all_job_dataset.csv")
jobs = jobs.sample(5000)
jobs = jobs.drop(columns=['job_id'])
jobs = jobs.reset_index(drop=True)
display(jobs.shape)
display(jobs.head(2))

from wordcloud import WordCloud

text = str(list(jobs['category'])).replace(',', '').replace('[', '').replace("'", '').replace(']', '')

wordcloud = WordCloud(background_color = 'white', width = 1600, height = 800, max_words = 121).generate(text)

plt.imshow(wordcloud)

plt.axis('off')
plt.show()

(5000, 10)
  | job_title | company | descriptions | location | category | subcategory | role | type | salary | listingDate
0 | Manager, Government & Authority Liasion | Mass Rapid Transit Corporation Sdn Bhd | JOB PURPOSE :\nTo organize & participate in a ... | Kuala Lumpur | Construction | Project Management | manager | Contract/Temp | NaN | 2024-05-10T04:06:31Z
1 | Junior IT Executive | Saraya Goodmaid Sdn Bhd | Designing solutions, implementation, customiza... | Seremban District | Information & Communication Technology | Developers/Programmers | information-technology-executive | Full time | NaN | 2024-04-08T00:14:09Z

# Drop rows with missing values
job = jobs.dropna()

# Reuse clean_and_calculate_mean, defined earlier, on the reloaded sample

# Apply the function to the salary column

job['Salary'] = job['salary'].apply(clean_and_calculate_mean)
job = job.drop('salary',axis=1)
job.head(2)

  | job_title | company | descriptions | location | category | subcategory | role | type | listingDate | Salary
4 | Specialist, Contact Centre | CTOS Data Systems Sdn Bhd | Attend to all inbound and outbound calls/ emai... | Kuala Lumpur | Call Centre & Customer Service | Customer Service - Call Centre | call-centre-role | Full time | 2024-04-19T02:57:09Z | 2750.0
5 | Multilingual | Customer Support Specialist | Private Advertiser | Skills and Abilities:\nSkilled communicator.\n... | Kampung Malaysia Raya | Call Centre & Customer Service | Customer Service - Call Centre | customer-support-specialist | Full time | 2024-04-05T12:56:47Z | 5500.0

dfp = job['role'].value_counts().head(10).sort_values(ascending = True).reset_index()

dfl = job['location'].value_counts().head(10).sort_values(ascending = True).reset_index()
dfc = job['company'].value_counts().head(10).sort_values(ascending = True).reset_index()

fig = go.Figure()

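# value_counts().reset_index() yields a 'count' column in pandas 2.x;
# pass it as x so the bar lengths reflect the counts
fig.add_trace(go.Bar(x = dfp['count'],
                     y = dfp['role'],
                     orientation='h',
                     name = 'Position',
                     marker = dict(color = 'LightCoral')))

fig.add_trace(go.Bar(x = dfl['count'],
                     y = dfl['location'],
                     orientation='h',
                     name = 'Location',
                     marker = dict(color = 'CadetBlue')))

fig.add_trace(go.Bar(x = dfc['count'],
                     y = dfc['company'],
                     orientation='h',
                     name = 'Company',
                     marker = dict(color = 'SteelBlue')))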

fig.update_layout(
updatemenus=[
dict(
type = "buttons",
direction="left",
pad={"r": 10, "t": 10},
showactive=True,
x=0.16,
xanchor="left",
y=1.12,
yanchor="top",
font = dict(color = 'Indigo',size = 14),
buttons=list([
dict(label="All",
method="update",
args=[ {"visible": [True, True, True]},
{'showlegend' : True}
]),
dict(label="Position",
method="update",
args=[ {"visible": [True, False, False]},
{'showlegend' : True}
]),
dict(label='Location',
method="update",
args=[ {"visible": [False, True, False]},
{'showlegend' : True}
]),
dict(label='Company',
method="update",
args=[ {"visible": [False, False, True]},
{'showlegend' : True}]),
]),
)])

fig.update_layout(
annotations=[
dict(text="Choose:", showarrow=False,
x=0, y=1.075, yref="paper", align="right",
font=dict(size=16,color = 'DarkSlateBlue'))])

fig.update_layout(title ="Top 10 Positions, Locations and Companies",

title_x = 0.5,
title_font = dict(size = 20, color = 'MidnightBlue'))

fig.show()

[Figure: horizontal bar chart "Top 10 Positions, Locations and Companies" with buttons All, Position, Location, Company; y-axis lists companies (e.g. Michael Page International (Malaysia) Sdn Bhd), locations (e.g. Petaling, Selangor), and roles (e.g. sales-executive, accounts-executive)]
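
The button definitions above repeat one pattern: an "All" entry, then one entry per trace whose visibility mask has a single True. As a sketch, the same list can be generated from the trace names. `make_visibility_buttons` is a hypothetical helper, not part of the original notebook; the dictionaries it returns are meant to be passed as `buttons=` inside `updatemenus`.

```python
def make_visibility_buttons(trace_names):
    """Build 'update' buttons: one showing all traces, then one per trace."""
    n = len(trace_names)
    buttons = [dict(label="All",
                    method="update",
                    args=[{"visible": [True] * n}, {"showlegend": True}])]
    for i, name in enumerate(trace_names):
        mask = [j == i for j in range(n)]  # True only for trace i
        buttons.append(dict(label=name,
                            method="update",
                            args=[{"visible": mask}, {"showlegend": True}]))
    return buttons

buttons = make_visibility_buttons(["Position", "Location", "Company"])
for b in buttons:
    print(b["label"], b["args"][0]["visible"])
```

Generating the masks keeps them in step with the number of traces, which is easy to get wrong when a fourth or fifth trace is added by hand.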

# Read data
df = pd.read_csv('US_Job_Market.csv')
df.head(3)

  | position | company | description | reviews | location
0 | Development Director | ALS TDI | Development Director\nALS Therapy Development ... | NaN | Atlanta, GA 30301
1 | An Ostentatiously-Excitable Principal Research... | The Hexagon Lavish | Job Description\n\n"The road that leads to acc... | NaN | Atlanta, GA
2 | Data Scientist | Xpert Staffing | Growing company located in the Atlanta, GA are... | NaN | Atlanta, GA

#!pip install dash

import pandas as pd
from dash import Dash, dcc, html
from dash.dependencies import Input, Output
import plotly.graph_objects as go

dfd1 = df[df['position']== 'Data Scientist']

dfd2 = df[df['position']== 'Senior Data Scientist']
dfd3 = df[df['position']== 'Research Analyst']
dfd4 = df[df['position']== 'Data Engineer']

# Count the 10 most common locations for each position

redf1 = dfd1[["location", "position"]].value_counts().nlargest(10).sort_values(ascending = True).reset_index()
redf2 = dfd2[["location", "position"]].value_counts().nlargest(10).sort_values(ascending = True).reset_index()
redf3 = dfd3[["location", "position"]].value_counts().nlargest(10).sort_values(ascending = True).reset_index()
redf4 = dfd4[["location", "position"]].value_counts().nlargest(10).sort_values(ascending = True).reset_index()
# Create Plotly figure
fig = go.Figure()

fig.add_trace(go.Bar(x = redf1["location"],
y = redf1["count"],
marker = dict(color = 'Tomato'),
name = 'Data Scientist'))
fig.add_trace(go.Bar(x = redf2['location'],
y = redf2['count'],
name = 'Senior Data Scientist',
marker = dict(color = 'LightCoral')))
fig.add_trace(go.Bar(x = redf3['location'],
y = redf3['count'],
name = 'Research Analyst',
marker = dict(color = 'SteelBlue')))
fig.add_trace(go.Bar(x = redf4['location'],
y = redf4['count'],
name = 'Data Engineer',
marker = dict(color = 'CadetBlue')))

# Update Layout with dropdown functionality

fig.update_layout(
updatemenus=[
dict(
direction="down",
pad={"r": 10, "t": 10},
showactive=True,
x=0.13,
xanchor="left",
y=1.12,
yanchor="top",
font = dict(color = 'Indigo',size = 14),
buttons=list([
dict(label="All",
method="update",
args=[ {"visible": [True, True, True, True]},
{'showlegend' : True}
]),
dict(label="Data Scientist",
method="update",
args=[ {"visible": [True, False, False, False]},
{'showlegend' : True}
]),
dict(label='Senior Data Scientist',
method="update",
args=[ {"visible": [False, True, False, False]},
{'showlegend' : True}
]),
dict(label='Research Analyst',
method="update",
args=[ {"visible": [False, False, True, False]},
{'showlegend' : True}
]),
dict(label='Data Engineer',
method="update",
args=[ {"visible": [False, False, False, True]},
{'showlegend' : True}]),
]),
)])

fig.update_layout(
annotations=[
dict(text="Choose:", showarrow=False,
x=0, y=1.075, yref="paper", align="right",
font=dict(size=16,color = 'DarkSlateBlue'))])

fig.update_layout(title ="The distribution of states by four Positions",

title_x = 0.5,
title_font = dict(size = 20, color = 'MidnightBlue'))

fig.show()

[Figure: bar chart "The distribution of states by four Positions" with a dropdown (All, Data Scientist, Senior Data Scientist, Research Analyst, Data Engineer); x-axis lists locations (e.g. Austin, TX; San Diego, CA; Seattle, WA; Atlanta, GA; San Francisco, CA; New York, NY)]

df = pd.read_csv("heart_disease_uci.csv")
df = df.dropna()
df.head(2)

  | id | age | sex | dataset | cp | trestbps | chol | fbs | restecg | thalch | exang | oldpeak | slope | ca | thal | num
0 | 1 | 63 | Male | Cleveland | typical angina | 145.0 | 233.0 | True | lv hypertrophy | 150.0 | False | 2.3 | downsloping | 0.0 | fixed defect | 0
1 | 2 | 67 | Male | Cleveland | asymptomatic | 160.0 | 286.0 | False | lv hypertrophy | 108.0 | True | 1.5 | flat | 3.0 | normal | 2

g = sns.jointplot(x="chol", y="thalch", data=df, kind="kde", color="b")

g.plot_joint(plt.scatter, c="b", s=30, linewidth=1, marker="+")
g.ax_joint.collections[0].set_alpha(0)
g.set_axis_labels("chol", "thalch");

# --- Create Jointplot ---
jointplot = sns.jointplot(x = 'age', y = 'chol', data = df, hue = 'sex', palette = 'PuRd')

# --- Jointplot Titles & Text ---

jointplot.fig.suptitle('Jointplot between Age and Chol', fontweight = 'heavy', y = 1.05, fontsize = '14',
fontfamily = 'sans-serif', color = 'black');

import matplotlib.pyplot as plt

# Define the labels

labels = ['chol', 'age', 'thalch','oldpeak']

# Sum each column (these are totals, not counts)
counts = [df[label].sum() for label in labels]

# Create the bar plot

plt.figure(figsize=(10, 6))
bars = plt.bar(labels, counts, color=['skyblue', 'Red', 'Green'])

# Add value labels on top of each bar with some vertical offset
for bar, count in zip(bars, counts):
    yval = bar.get_height() + 0.1  # Add a small offset
    plt.text(bar.get_x() + bar.get_width() / 2, yval, str(count), ha='center')

plt.title('Sum of Each Class')

plt.xlabel('Class')
plt.ylabel('Total')
plt.show()

plt.rcParams['figure.figsize'] = (15,5)
plt.subplot(1, 2, 1)
chart = df.groupby('cp')['age'].mean().sort_values(ascending = False).plot(kind = 'bar', color = 'orangered')
chart.set_xticklabels(chart.get_xticklabels(), rotation = 0)
plt.title('Chest Pain Based on Age', fontsize = 15, color = 'b', pad = 12)
plt.xlabel('Chest Pain')
plt.ylabel('Age')

plt.subplot(1, 2, 2)
chart = df.groupby('thal')['oldpeak'].mean().sort_values(ascending = False).plot(kind = 'bar', color = 'gold')
chart.set_xticklabels(chart.get_xticklabels(), rotation = 0)
plt.title('Thal from Old Peak', fontsize = 15, color = 'b', pad = 12)
plt.xlabel('Thal')
plt.ylabel('Old Peak')
plt.show()

plt.figure(figsize = (12,4))
ax = sns.countplot(x=df.cp)
for bars in ax.containers:
    ax.bar_label(bars)
plt.title("Count of Levels", fontsize = 15);
plt.figure(figsize = (8,5))
sns.kdeplot(df.age, fill = True, color = "r")  # fill replaces the deprecated shade argument
plt.title("Age Density (KDE)", fontsize = 20)
plt.show()
print("Histogram's skewness is {} and kurtosis is {}".format(df.age.skew(), df.age.kurtosis()))

Histogram's skewness is -0.21485314045391055 and kurtosis is -0.5174882052116159

import scipy.stats as stats

df_numeric = df.select_dtypes(include='number')

results = []

for col in df_numeric.columns:
    skewness = df_numeric[col].skew()
    kurtosis = df_numeric[col].kurt()
    results.append([col, skewness, kurtosis])

df_stats = pd.DataFrame(results, columns=['Column', 'Skewness', 'Kurtosis'])

df_stats
     Column  Skewness  Kurtosis
0        id      0.90      3.95
1       age     -0.21     -0.52
2  trestbps      0.70      0.80
3      chol      1.03      4.35
4    thalch     -0.53     -0.09
5   oldpeak      1.24      1.52
6        ca      1.19      0.26
7       num      1.05     -0.16
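
The skewness and kurtosis values above come from pandas' `skew()` and `kurt()`. As a rough guide to what those numbers mean, the sketch below computes the bias-corrected sample skewness and excess kurtosis by hand, which is the form pandas is understood to use. The helper name `sample_skew_kurt` and the toy data are assumptions made for illustration, not part of the original notebook.

```python
import math

def sample_skew_kurt(values):
    """Return (skewness, excess kurtosis) using bias-corrected sample estimators."""
    n = len(values)
    if n < 4:
        raise ValueError("need at least 4 observations")
    mean = sum(values) / n
    # Central moments in population form (divided by n)
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    # Adjusted Fisher-Pearson skewness: 0 for symmetric data
    skew = math.sqrt(n * (n - 1)) / (n - 2) * m3 / m2 ** 1.5
    # Bias-corrected excess kurtosis: 0 for a normal distribution
    g2 = m4 / m2 ** 2 - 3
    kurt = (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)
    return skew, kurt

# Toy, perfectly symmetric sample (hypothetical data)
skew, kurt = sample_skew_kurt([1, 2, 3, 4, 5])
print(f"skewness={skew:.2f}, kurtosis={kurt:.2f}")
```

A positive skewness (as for `chol` and `oldpeak`) indicates a longer right tail, and a positive excess kurtosis indicates heavier tails than a normal distribution. The helper does not guard against constant columns, where the variance is zero.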

from scipy.stats import norm

dfx = df[['chol', 'age', 'thalch','oldpeak']]

for col in dfx:
    stats.probplot(dfx[col], plot=plt)
    plt.title(col)
    plt.show()
sns.set(rc={'figure.figsize':(20,7)})
sns.set_style("white")
sns.scatterplot(data=df, x="chol", y="trestbps", size="oldpeak", hue='cp',legend=True, sizes=(10, 500));

sns.set(rc={'figure.figsize':(20,7)})
sns.relplot(y='trestbps',x='chol',data=df,kind='scatter',size='oldpeak',hue='cp',aspect=1.2);

# Filter out rows with 'oldpeak' equal to 0.0

df_filtered = df[df['oldpeak'] != 0.0]

# Create the countplot

plt.figure(figsize=(20, 7))
sns.countplot(data=df_filtered, x='oldpeak', order=sorted(df_filtered['oldpeak'].unique()))

# Access bars through the current axes

for bar in plt.gca().patches:
    plt.text(bar.get_x() + bar.get_width() / 2, bar.get_height() + 0.1, int(bar.get_height()), ha='center', va='bottom')

# Add labels and title

plt.title('Oldpeak Distribution (Excluding 0.0)')
plt.ylabel('Number of Patients')
plt.xlabel('Oldpeak Value')
plt.xticks(rotation=0)  # Keep x-axis labels horizontal
plt.show()

plt.figure(figsize=(20, 7))
# Create the countplot
sns.countplot(data=df, x='age', order=sorted(df['age'].unique()))

# Access bars through the current axes

for bar in plt.gca().patches:
    plt.text(bar.get_x() + bar.get_width() / 2, bar.get_height() + 0.1, int(bar.get_height()), ha='center', va='bottom')

# Add labels and title

plt.title('Age Distribution of Patients')
plt.ylabel('Number of Patients')
plt.xlabel('Age')
plt.show()

import matplotlib.pyplot as plt

import seaborn as sns
from scipy import stats # Import stats module

def diagnostic_plots(df, variable):
    """Show a histogram, a normal Q-Q plot, and a boxplot for one column."""
    plt.figure(figsize=(17, 5))
    plt.subplot(1, 3, 1)
    # histplot replaces distplot, which is deprecated in recent seaborn releases
    sns.histplot(df[variable], kde=True, stat="density")
    plt.title('Histogram')

    plt.subplot(1, 3, 2)
    stats.probplot(df[variable], dist="norm", plot=plt)  # Use stats.probplot
    plt.ylabel(f'{variable} quantiles')

    plt.subplot(1, 3, 3)
    sns.boxplot(x=df[variable])
    plt.title('Boxplot')
    plt.show()

for col in df[['age','chol']].select_dtypes(exclude="O").columns[:20].to_list():
    diagnostic_plots(df, col)

corr = df.select_dtypes('number').drop('id', axis=1).corr()
# Generate a mask for the upper triangle
mask = np.zeros_like(corr, dtype=bool)
mask[np.triu_indices_from(mask)] = True
fig, ax = plt.subplots(figsize=(15, 10))
# The rest of this call is cut off in the source; it is closed here right after cbar_kws.
sns.heatmap(corr, cmap='Spectral_r', mask=mask, square=True, annot=True, linewidths=0.5, cbar_kws={"shrink": 0.5})
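
The heatmap above hides the upper triangle of the correlation matrix, because a correlation matrix is symmetric and the upper half repeats the lower half. The sketch below shows the masking step on its own, assuming a small made-up 3 x 3 matrix; `np.triu` over a boolean array is one common way to build the mask and is not necessarily how the original notebook continued.

```python
import numpy as np

# Made-up symmetric correlation matrix, for illustration only
corr = np.array([[1.0, 0.2, -0.4],
                 [0.2, 1.0, 0.1],
                 [-0.4, 0.1, 1.0]])

# True marks cells to hide: the diagonal and everything above it
mask = np.triu(np.ones_like(corr, dtype=bool))

# (row, column) positions that stay visible in the heatmap
visible = np.argwhere(~mask)
print(mask)
print(visible)
```

Only the strictly lower triangle remains, so each pair of variables is shown once and the trivial diagonal of ones is dropped.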
df = df.drop('id',axis=1)
df.head(2)

   age   sex    dataset              cp  trestbps   chol    fbs         restecg  thalch  exang  oldpeak        slope   ca          thal  num
0   63  Male  Cleveland  typical angina     145.0  233.0   True  lv hypertrophy   150.0  False      2.3  downsloping  0.0  fixed defect    0
1   67  Male  Cleveland    asymptomatic     160.0  286.0  False  lv hypertrophy   108.0   True      1.5         flat  3.0        normal    2

plt.figure(figsize=(20, 5))
sns.set_context("paper")

kdeplt = sns.kdeplot(
    data=df,
    x="chol",
    hue="sex",
    palette='Dark2',
    alpha=0.7,
    lw=2,
)

kdeplt.set_title("Cholesterol values distribution\nMale VS Female", fontsize=12)

kdeplt.set_xlabel("Cholesterol", fontsize=12)

# Calculate mean cholesterol for each sex

mean_male = df[df['sex'] == 'Male']['chol'].mean()
mean_female = df[df['sex'] == 'Female']['chol'].mean()

# Add vertical lines for mean cholesterol

plt.axvline(x=mean_male, color="#2986cc", ls="--", lw=1.3)
plt.axvline(x=mean_female, color="#c90076", ls="--", lw=1.3)

# Add text annotations

plt.text(mean_male, plt.gca().get_ylim()[1], f"Mean Cholesterol / Male: {mean_male:.2f}",
         fontsize=10, color="#2986cc", ha='right', va='top')
plt.text(mean_female, plt.gca().get_ylim()[1], f"Mean Cholesterol / Female: {mean_female:.2f}",
         fontsize=10, color="#c90076", ha='left', va='top')

plt.show()

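heart_df_fg = sns.FacetGrid(
    data=df,
    col="sex",
    hue="sex",
    row="cp",
    height=4,
    aspect=1.3,
    palette='Dark2',
    col_order=["Male", "Female"],
)
# Pass x and y by keyword; recent seaborn releases treat the first positional argument of regplot as data
heart_df_fg.map_dataframe(sns.regplot, x="age", y="chol")
plt.show()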
# Minimum cholesterol per chest pain type (note: this overwrites df)
x = df.groupby("cp")["chol"].min().index
y = df.groupby("cp")["chol"].min().values
df = pd.DataFrame({'cp': x,
                   'chol': y})

fig = px.bar(df,
             x='cp',
             y='chol',
             color='cp',  # color represents chest pain type
             title='Chol Value'
             )
fig.show()

[Figure: bar chart "Chol Value". x-axis: cp (asymptomatic, atypical angina, non-anginal, typical angina); y-axis: chol; bars colored by cp.]

df = pd.read_csv("heart_disease_uci.csv")
df = df.dropna()
df = df.drop('id',axis =1 )
df.head(2)

   age   sex    dataset              cp  trestbps   chol    fbs         restecg  thalch  exang  oldpeak        slope   ca          thal  num
0   63  Male  Cleveland  typical angina     145.0  233.0   True  lv hypertrophy   150.0  False      2.3  downsloping  0.0  fixed defect    0
1   67  Male  Cleveland    asymptomatic     160.0  286.0  False  lv hypertrophy   108.0   True      1.5         flat  3.0        normal    2

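plt.figure(figsize=(18,5))

# histplot with kde=True replaces the deprecated distplot; four panels, so a 1 x 4 grid
plt.subplot(1,4,1)
sns.histplot(df['age'], kde=True, stat='density', color='DeepPink')
plt.subplot(1,4,2)
sns.histplot(df['chol'], kde=True, stat='density', color='Green')
plt.subplot(1,4,3)
sns.histplot(df['thalch'], kde=True, stat='density', color='Red')
plt.subplot(1,4,4)
sns.histplot(df['oldpeak'], kde=True, stat='density', color='Magenta')

plt.tight_layout()
plt.show()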
df_cpy = df.copy(deep=True)
df_cpy = df_cpy.select_dtypes("number")
df_cpy = df_cpy[['age','chol','thalch','oldpeak']]

fig, axis = plt.subplots(ncols=4, nrows=1, figsize=(15,5))
axis = axis.flatten()

# One boxplot per numeric column
for index, col in enumerate(df_cpy.columns):
    sns.boxplot(y=col, data=df_cpy, color='r', ax=axis[index])
plt.tight_layout(pad=0.5, w_pad=0.7, h_pad=5.0)

df_cpy = df.copy(deep=True)
df_cpy = df_cpy.select_dtypes("number")
df_cpy = df_cpy[['age','chol','thalch','oldpeak']]

flierprops = dict(markerfacecolor='g', color='g', alpha=0.5)

n_cols = 4
n_rows = int(np.ceil(df_cpy.shape[-1]*2 / n_cols))
fig, axes = plt.subplots(n_rows, n_cols, figsize=(4 * n_cols, 3 * n_rows))
for i, col in enumerate(df_cpy.columns):
    mean = df_cpy[col].mean()
    median = df_cpy[col].median()
    # Each column gets two adjacent panels: a histogram, then a boxplot
    ax_hist = axes.flatten()[2*i]
    ax_box = axes.flatten()[2*i+1]
    sns.histplot(df_cpy[col], ax=ax_hist, kde=True)
    sns.boxplot(x=df_cpy[col], orient='h', ax=ax_box, color='g')
    ax_box.vlines(mean, ymin=-1, ymax=1, color='r', label=f"For [{col}]\nMean: {mean:.2f}\nMedian: {median:.2f}")
    ax_box.legend()

    # Label the y-axis only on histograms in the first grid column
    if (2*i) % n_cols == 0:
        ax_hist.set_ylabel('Frequency')
    else:
        ax_hist.set_ylabel('')
plt.tight_layout()

sns.set(style='whitegrid', palette="deep", font_scale=1.1, rc={"figure.figsize": [20, 6]})

sns.histplot(df['chol'], bins = 30).set(xlabel = "Chol");

df2 = df[['age','sex','chol']]

f, (ax_box, ax_hist) = plt.subplots(2, sharex=True, gridspec_kw={"height_ratios": (.15, .85)})

ax_box.title.set_text('Age Histogram and Boxplot')

sns.boxplot(x=df2["age"], orient="h", ax=ax_box)
sns.histplot(data=df2, x="age", ax=ax_hist)
ax_box.set(xlabel='')
plt.show()

# Share of each chest pain type

colors = ("darkorange", "green",'Red','Pink')
explodes = [0.5, 0.5,0.75, .50]
df["cp"].value_counts(sort=False).plot.pie(colors=colors,
                                           textprops={'fontsize': 15},
                                           autopct='%4.1f',
                                           startangle=90,
                                           radius=2,
                                           rotatelabels=True,
                                           shadow=True);
wine = pd.read_csv("WineQT.csv")
wine.head(2)

   fixed acidity  volatile acidity  citric acid  residual sugar  chlorides  free sulfur dioxide  total sulfur dioxide  density    pH  sulphates  alcohol  quality  Id
0            7.4              0.70          0.0             1.9       0.08                 11.0                  34.0      1.0  3.51       0.56      9.4        5   0
1            7.8              0.88          0.0             2.6       0.10                 25.0                  67.0      1.0  3.20       0.68      9.8        5   1

NUMERICAL = wine[['fixed acidity', 'volatile acidity', 'residual sugar',
                  'chlorides', 'free sulfur dioxide', 'total sulfur dioxide',
                  'pH', 'alcohol']]
fig, axes = plt.subplots(2, 4)
fig.set_figheight(12)
fig.set_figwidth(16)
for i, col in enumerate(NUMERICAL):
    # Row and column of the i-th panel in the 2 x 4 grid
    ax = axes[i // 4, i % 4]
    sns.histplot(wine[col], ax=ax, kde=True)
    ax.axvline(wine[col].mean(), color='k', linestyle='dashed', linewidth=1)
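
A flat loop index, such as the `i` from `enumerate`, can be mapped onto a rows-by-columns subplot grid with integer division and remainder. The sketch below is a minimal illustration, assuming a 2 x 4 grid like the one created by `plt.subplots(2, 4)` above; the variable names are chosen for the example.

```python
n_rows, n_cols = 2, 4  # grid shape assumed for illustration

# divmod(i, n_cols) returns (row, column) for the i-th panel in row-major order
cells = [divmod(i, n_cols) for i in range(n_rows * n_cols)]
print(cells)
```

Indexing the axes array with these pairs fills the first row from left to right before moving on to the second row.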
#set configuration for charts
plt.rcParams["figure.figsize"]=[18 , 6]
plt.rcParams["font.size"]=15
plt.rcParams["legend.fontsize"]="medium"
plt.rcParams["figure.titlesize"]="medium"

def plot_distribution(data, x, color, bins):
    """Plot a histogram, a box plot, and a swarm plot for column x."""
    mean = data[x].mean()
    std = data[x].std()
    info = dict(data=data, x=x, color=color)
    plt.subplot(1, 3, 1)
    # histplot replaces distplot, which is deprecated in recent seaborn releases
    sns.histplot(data[x], bins=bins, kde=True, stat="density")
    plt.xlabel(f"bins of {x}")
    plt.axvline(mean, color="red", label=rf"mean = {mean:.2f}, $\sigma$ = {std:.2f}")
    plt.ylabel("density")
    plt.legend()
    plt.title(f"histogram of {x} column")
    plt.subplot(1, 3, 2)
    sns.boxplot(**info)
    plt.xlabel(f"{x}")
    plt.title(f"box plot of {x} column")
    plt.subplot(1, 3, 3)
    sns.swarmplot(**info)
    plt.xlabel(f"{x}")
    plt.title(f"distribution of points in {x} column")
    plt.suptitle(f"Distribution of {x} column", fontsize=15, color="red")
    plt.show()

# Bin edges must span the plotted column; age-based edges (29 to 77) would miss every chol value
chol_bins = np.arange(df["chol"].min(), df["chol"].max() + 25, 25)

base_color = sns.color_palette()[4]
plot_distribution(data=df, x="chol", color=base_color, bins=chol_bins)
fig, ax = plt.subplots(1, 3, figsize=(20, 6))
# The palette argument is cut off in the source, so seaborn's default palette is used here.
for axis, thal in zip(ax, ['normal', 'reversable defect', 'fixed defect']):
    sns.histplot(data=df.loc[df["thal"] == thal], x="age", hue="sex", binwidth=2, ax=axis)
    axis.set_title(thal)
plt.show()

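values = df["sex"].value_counts()
sex = values.index  # take the labels from the counts so they cannot be mismatched
color = ["#FF0000", "#000000"]

plt.figure(figsize = (5, 7))

# The remaining arguments of this call are cut off in the source; it is closed after autopct.
plt.pie(values, labels = sex, colors = color, explode = (0.1, 0), textprops = {"color":"w"}, autopct = "%.2f%%")

plt.legend();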

#plotting
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(18, 9))
fig.suptitle(' Highest and Lowest Correlation ', size = 20, weight='bold')
axs = [ax1, ax2]
#kdeplot
sns.kdeplot(data=df, y='chol', x='thalch', ax=ax1, color="red")
ax1.set_title('Chol Vs Thalch', size = 14, weight='bold', pad=20)

#kdeplot
sns.kdeplot(data=df, y='chol', x='oldpeak', ax=ax2, color='Blue')
ax2.set_title('Chol Vs Oldpeak', size = 14, weight='bold', pad=20);

df1 = pd.read_csv('US_Job_Market.csv')
df1 = df1.dropna().reset_index(drop=True)
df1.head(2)
                                            position           company                                        description  reviews           location
0                                       Data Analyst    Operation HOPE  DEPARTMENT: Program OperationsPOSITION LOCATIO...     44.0  Atlanta, GA 30303
1  Assistant Professor -TT - Signal Processing & ...  Emory University  DESCRIPTION\nThe Emory University Department o...    550.0        Atlanta, GA

plt.figure(figsize=(20, 7))

# Filter for the top 10 most frequent companies

df_v = df1['company'].value_counts().head(10).reset_index()
df_v.columns = ['company', 'count']

# Calculate the percentage

total = df1['company'].value_counts().sum()
df_v['percentage'] = (df_v['count'] / total) * 100

# Create the bar plot

plot = sns.barplot(y='company', x='count', data=df_v)

# Annotate the bars with the percentage

for index, row in df_v.iterrows():
    plot.text(row['count'], index, f"{row['percentage']:.2f}%", color='black', ha="left")

plt.xticks(rotation=0)
plt.title('Top 10 Most Frequent Companies')
plt.show()
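
The percentages annotated on the bars above are shares of all postings, not shares within the top ten, because `total` is summed over every company. The sketch below reproduces that calculation with the standard library only; the company names are hypothetical and stand in for the `company` column.

```python
from collections import Counter

# Hypothetical company column; the names are made up for illustration
companies = ["Acme", "Globex", "Acme", "Initech", "Acme", "Globex"]

counts = Counter(companies)
total = sum(counts.values())  # all postings, not just the top entries

# Share of all postings held by each of the two most frequent companies
top_shares = [(name, n, round(100 * n / total, 2)) for name, n in counts.most_common(2)]
print(top_shares)
```

Because the denominator covers every company, the shares of the top entries do not need to add up to 100.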
from PIL import Image
import matplotlib.pyplot as plt
import numpy as np
from wordcloud import WordCloud, STOPWORDS
import pandas as pd
import requests
from io import BytesIO

# Download mask image

mask_url = "https://cdn.pixabay.com/photo/2013/07/12/17/47/test-pattern-152459_1280.png"
response = requests.get(mask_url)
mask_image = Image.open(BytesIO(response.content))
wordcloud_mask = np.array(mask_image)

# Generate word cloud

plt.figure(figsize=(15,15))
all_text = " ".join(df1['company'].values.tolist())
wordcloud = WordCloud(width=800,
                      height=800,
                      stopwords=STOPWORDS,
                      background_color='white',
                      max_words=800,
                      colormap="hsv",
                      mask=wordcloud_mask).generate(all_text)

# Display the word cloud

plt.imshow(wordcloud, interpolation='bilinear')
plt.axis('off')
plt.show()

from wordcloud import WordCloud

# create a word cloud of companies with postings in Atlanta, GA

atlanta_companies = df1[df1['location'] == 'Atlanta, GA']['company'].str.cat(sep=' ')
atlanta_cloud = WordCloud(width=1500, height=800, max_words=100, background_color='white').generate(atlanta_companies)

plt.figure(figsize=(20, 6), facecolor=None)

plt.imshow(atlanta_cloud)
plt.axis("off")
plt.tight_layout(pad=0)
plt.show()

import plotly.graph_objects as go
values = df1['company'].value_counts()[:10]
labels = values.index
fig = go.Figure(data=[go.Pie(values=values, labels=labels, hole=.3)])
fig.update_traces(hoverinfo='label+percent', textinfo='value', textfont_size=20,
                  marker=dict(line=dict(color='#000000', width=3)))
# The title font is set through title.font, which replaces the deprecated titlefont attribute
fig.update_layout(title=dict(text="Most popular Jobs in USA", font=dict(size=30)))
fig.show()

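[Figure: donut chart "Most popular Jobs in USA" showing posting counts for the ten most frequent companies. Legend: Amazon.com, Ball Aerospace, Microsoft, Google, NYU Langone Health, Fred Hutchinson Cancer Research Center, KPMG, Broad Institute, Facebook, Walmart eCommerce. Slice values: 357, 187, 137, 134, 76, 70, 66, 49, 49, 45.]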

print("Count of unique Jobs in USA")

locationCount = df1['company'].value_counts().head(10).sort_values(ascending=True)
fig = plt.figure(figsize=(18,10))
locationCount.plot(kind="barh", fontsize=8)
plt.ylabel("Company names", fontsize=25, color="red", fontweight='bold')
plt.title("Jobs Vs. COUNT GRAPH", fontsize=40, color="BLACK", fontweight='bold')
# Place each count at the end of its bar
for v, count in enumerate(locationCount):
    plt.text(count, v, count, fontsize=10, color="BLACK", fontweight='bold')

Count of unique Jobs in USA

z=df1['position'].value_counts().head(10)
fig=px.bar(z,x=z.index,y=z.values,color=z.index,text=z.values,labels={'index':'job title','y':'count','text':'count'},title='Top 10 Popular Roles in Data Science')
fig.show()

[Figure: bar chart "Top 10 Popular Roles in Data Science". x-axis: position; y-axis: count. Bars from left to right: Data Scientist 204, Senior Data Scientist 53, Research Analyst 44, Data Engineer 39, Machine Learning Engineer 26, Sr. Data Scientist 22, Principal Data Scientist 20, Quantitative Analyst 20, Research Scientist 20, Lead Data Scientist 17.]

# Plotting Outliers
plt.figure(figsize = (20, 10))
# Box plots for the first ten columns of the wine data
for col, name in enumerate(wine.columns[:10], start=1):
    plt.subplot(2, 5, col)
    plt.boxplot(wine[name])
    plt.xlabel(name)
s = sns.countplot(x = 'cp',data = df)
sizes=[]
for p in s.patches:
    height = p.get_height()
    sizes.append(height)
    # Annotate each bar with its share of all rows
    s.text(p.get_x()+p.get_width()/2.,
           height + 3,
           '{:1.2f}%'.format(height/len(df)*100),
           ha="center", fontsize=16)

# checking the distribution of chol

# histplot with kde=True replaces the deprecated distplot
sns.histplot(df['chol'], kde=True, stat='density', color='Red')
plt.axvline(x=df['chol'].mean(), color='Blue', linestyle='--', linewidth=2)
plt.title('Chol');

(wine.iloc[:, :-1].describe().T.sort_values(by='std', ascending=False)
     .style.background_gradient(cmap='GnBu')
     .bar(subset=["max"], color='#BB0000')
     .bar(subset=["mean"], color='green'))

                            count       mean        std       min        25%        50%        75%         max
total sulfur dioxide  1143.000000  45.914698  32.782130  6.000000  21.000000  37.000000  61.000000  289.000000
free sulfur dioxide   1143.000000  15.615486  10.250486  1.000000   7.000000  13.000000  21.000000   68.000000
fixed acidity         1143.000000   8.311111   1.747595  4.600000   7.100000   7.900000   9.100000   15.900000
residual sugar        1143.000000   2.532152   1.355917  0.900000   1.900000   2.200000   2.600000   15.500000
alcohol               1143.000000  10.442111   1.082196  8.400000   9.500000  10.200000  11.100000   14.900000
quality               1143.000000   5.657043   0.805824  3.000000   5.000000   6.000000   6.000000    8.000000
citric acid           1143.000000   0.268364   0.196686  0.000000   0.090000   0.250000   0.420000    1.000000
volatile acidity      1143.000000   0.531339   0.179633  0.120000   0.392500   0.520000   0.640000    1.580000
sulphates             1143.000000   0.657708   0.170399  0.330000   0.550000   0.620000   0.730000    2.000000
pH                    1143.000000   3.311015   0.156664  2.740000   3.205000   3.310000   3.400000    4.010000
chlorides             1143.000000   0.086933   0.047267  0.012000   0.070000   0.079000   0.090000    0.611000
density               1143.000000   0.996730   0.001925  0.990070   0.995570   0.996680   0.997845    1.003690

df[df["age"] >= 50].describe().style.background_gradient(cmap='RdPu')

              age    trestbps        chol      thalch     oldpeak          ca         num
count  213.000000  213.000000  213.000000  213.000000  213.000000  213.000000  213.000000
mean    59.159624  134.793427  252.220657  144.708920    1.214085    0.835681    1.103286
std      5.731645   18.575307   54.953695   22.377966    1.197004    0.969460    1.280713
min     50.000000   94.000000  100.000000   71.000000    0.000000    0.000000    0.000000
25%     55.000000  120.000000  214.000000  130.000000    0.100000    0.000000    0.000000
50%     58.000000  132.000000  246.000000  148.000000    1.000000    1.000000    1.000000
75%     63.000000  145.000000  283.000000  161.000000    1.800000    1.000000    2.000000
max     77.000000  200.000000  564.000000  195.000000    6.200000    3.000000    4.000000

def highlight_min(s, props=''):
    # Return the style string for the minimum of each column, an empty string elsewhere
    return np.where(s == np.nanmin(s.values), props, '')

df.describe().style.apply(highlight_min, props='color:yellow;background-color:Grey', axis=0)

              age    trestbps        chol      thalch     oldpeak          ca         num
count  299.000000  299.000000  299.000000  299.000000  299.000000  299.000000  299.000000
mean    54.521739  131.715719  246.785953  149.327759    1.058528    0.672241    0.946488
std      9.030264   17.747751   52.532582   23.121062    1.162769    0.937438    1.230409
min     29.000000   94.000000  100.000000   71.000000    0.000000    0.000000    0.000000
25%     48.000000  120.000000  211.000000  132.500000    0.000000    0.000000    0.000000
50%     56.000000  130.000000  242.000000  152.000000    0.800000    0.000000    0.000000
75%     61.000000  140.000000  275.500000  165.500000    1.600000    1.000000    2.000000
max     77.000000  200.000000  564.000000  202.000000    6.200000    3.000000    4.000000

Prepared By: Syed Afroz Ali
