Data Analysis Using Python
Data Analysis Using Python
~import libraries
In [1]: import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import datetime
%matplotlib inline
sns.set()
In [2]: Hr = pd.read_csv(r'C:\Users\compucity\Downloads\Book1.csv')
Hr.head()
Out[2]:
E_ID Name Age Address Telephone Salary Department Hire Date
task of Data
1- Find the number of employees in each governorate
2- Find the number of employees in each Department
3- Average age of employees in each department
4- Average Salary of employees in each department
5- Retrieving the data of employees who work in the computer department only,
as well as the rest of the employees in other departments
6- Find the number of employees in each department who work in Cairo Governorate only
7- Search for the employee who receives the highest salary and retrieve his complete data+
8- Number of employees by department in each governorate
9- Bonus ... Based on Hire Date
Hire Date1 >= -1-2005 5% of Salary
1-1-2003 10%
1-1-2000 15%
1-1-1995 20%
1-1-1990 25%
Else 30%
Based on Hire Date
10 - Find some suitable graph for the data
localhost:8889/notebooks/mahmoud1/Untitled11.ipynb 1/12
3/27/24, 11:20 PM Untitled11 - Jupyter Notebook
In [4]: Hr.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 24 entries, 0 to 23
Data columns (total 8 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 E_ID 24 non-null int64
1 Name 24 non-null object
2 Age 24 non-null int64
3 Address 24 non-null object
4 Telephone 24 non-null int64
5 Salary 24 non-null int64
6 Department 24 non-null object
7 Hire Date 24 non-null object
dtypes: int64(4), object(4)
memory usage: 1.6+ KB
In [5]: Hr.dtypes
Out[6]: E_ID 0
Name 0
Age 0
Address 0
Telephone 0
Salary 0
Department 0
Hire Date 0
dtype: int64
In [7]: # duplicated
Hr.duplicated().sum()
Out[7]: 0
In [8]: Hr['Address'].value_counts()
Out[8]: Cairo 8
Alex 6
Giza 5
Milan 2
Alexandria 1
Alixandria 1
milan 1
Name: Address, dtype: int64
localhost:8889/notebooks/mahmoud1/Untitled11.ipynb 2/12
3/27/24, 11:20 PM Untitled11 - Jupyter Notebook
In [9]: Hr['Department'].value_counts()
Out[9]: Sales 9
Computer 8
Account 7
Name: Department, dtype: int64
Out[11]: Department
Account 2143.0
Computer 4375.0
Sales 2778.0
Name: Salary, dtype: float64
Out[12]: E_ID Name Age Address Telephone Salary Department Hire Date
localhost:8889/notebooks/mahmoud1/Untitled11.ipynb 3/12
3/27/24, 11:20 PM Untitled11 - Jupyter Notebook
In [16]: # Find the number of employees in each department who work in Cairo Governorate only
Hr[Hr['Address']=='Cairo']['Department'].value_counts()
Out[16]: Computer 4
Account 2
Sales 2
Name: Department, dtype: int64
In [36]: # Search for the employee who receives the highest salary and retrieve his complete data
Hr[Hr['Salary']==Hr['Salary']].max()[['Name']+['Address']+['Department']+['Age']+['Salary']
Out[36]: Name Zeiad
Address milan
Department Sales
Age 57
Salary 7000
dtype: object
In [37]: # Search for the employee who receives the min salary and retrieve his complete data
Hr[Hr['Salary']==Hr['Salary']].min()[['Name']+['Address']+['Department']+['Age']+['Salary']
Out[37]: Name Ahmed
Address Alex
Department Account
Age 24
Salary 1000
dtype: object
localhost:8889/notebooks/mahmoud1/Untitled11.ipynb 4/12
3/27/24, 11:20 PM Untitled11 - Jupyter Notebook
#Two different ways to find the solution and how to deal with history to do the calculation
localhost:8889/notebooks/mahmoud1/Untitled11.ipynb 5/12
3/27/24, 11:20 PM Untitled11 - Jupyter Notebook
0 300.0
1 100.0
2 750.0
3 250.0
4 400.0
5 200.0
6 250.0
7 450.0
8 700.0
9 450.0
10 800.0
11 450.0
12 250.0
13 100.0
14 150.0
15 300.0
16 400.0
17 400.0
18 500.0
19 150.0
20 50.0
21 1200.0
22 1000.0
23 400.0
Name: Bonus, dtype: float64
localhost:8889/notebooks/mahmoud1/Untitled11.ipynb 6/12
3/27/24, 11:20 PM Untitled11 - Jupyter Notebook
localhost:8889/notebooks/mahmoud1/Untitled11.ipynb 7/12
3/27/24, 11:20 PM Untitled11 - Jupyter Notebook
In [18]: xa = sns.countplot(x=Hr['Department'],palette='PuBu')
for bar in xa.containers:
xa.bar_label(bar)
localhost:8889/notebooks/mahmoud1/Untitled11.ipynb 8/12
3/27/24, 11:20 PM Untitled11 - Jupyter Notebook
In [22]: xa = sns.countplot(x=Hr['Address'],palette='Set1')
for bar in xa.containers:
xa.bar_label(bar)
In [19]: plt.figure(figsize=(10,5))
sns.countplot(x='Department',hue='Address',data=Hr,palette='hsv')
Out[19]: <AxesSubplot:xlabel='Department', ylabel='count'>
localhost:8889/notebooks/mahmoud1/Untitled11.ipynb 9/12
3/27/24, 11:20 PM Untitled11 - Jupyter Notebook
localhost:8889/notebooks/mahmoud1/Untitled11.ipynb 10/12
3/27/24, 11:20 PM Untitled11 - Jupyter Notebook
localhost:8889/notebooks/mahmoud1/Untitled11.ipynb 11/12
3/27/24, 11:20 PM Untitled11 - Jupyter Notebook
In [ ]:
localhost:8889/notebooks/mahmoud1/Untitled11.ipynb 12/12