IP (12) Proj File Pandas&Matplotlib
Pandas Programs
Ques 1). Create a pandas Series from a dictionary of values and an ndarray.
Soln ).
# Through Dictionary
import pandas as pd
dic={'a':'Apple', 'b':'Ball', 'c':'Cat', 1:'Sumit', 2:'Rohit'}
sr_dir=pd.Series(dic)
print('Series Through Dictionary:\n', sr_dir)
# Through ndarray
import numpy as np
sr_np=pd.Series(np.array(['a', 2, 'e', 4, 'i', 6, 'o', 8, 'u']))
print('\nSeries Through ndarray:\n', sr_np)
Output :
Series Through Dictionary:
a Apple
b Ball
c Cat
1 Sumit
2 Rohit
dtype: object
Ques 2). Write a pandas program to perform arithmetic operations on two pandas Series.
Soln ).
import pandas as pd
s1=pd.Series([3,5,6,3,4])
s2=pd.Series([2,4,6,8,0])
print('Addition of Series:\n', s1+s2)
print('\nSubtraction of Series:\n', s1-s2)
print('\nMultiplication of Series:\n', s1*s2)
print('\nDivision of Series:\n', s1/s2)
Output :
Addition of Series:
0 5
1 9
2 12
3 11
4 4
dtype: int64
Subtraction of Series:
0 1
1 1
2 0
3 -5
4 4
dtype: int64
Multiplication of Series:
0 6
1 20
2 36
3 24
4 0
dtype: int64
Division of Series:
0 1.500
1 1.250
2 1.000
3 0.375
4 inf
dtype: float64
# Series.append was removed in pandas 2.0; pd.concat gives the same result
new_sr=pd.concat([sr, pd.Series(['Female', 'Pass'])])
print('\nAfter adding some data to the Series:\n', new_sr)
Output :
Original Data Series:
0 Suman
1 CS
2 12
dtype: object
Ques 4). On a given Series, print all the elements that are above the 75th percentile.
Soln ).
import pandas as pd
import numpy as np
sr=pd.Series([-2,8,9,12,13,14,13,11,12,7,54])
p_sr=sr.quantile(q=0.75)
# Print every value strictly above the 75th percentile
for val in sr:
    if val>p_sr:
        print(val)
Output :
14
54
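The same selection can also be written without a loop. The following is a minimal sketch using a boolean mask on the Series from the solution above; the names p75 and above are illustrative and not part of the original program.

```python
import pandas as pd

sr = pd.Series([-2, 8, 9, 12, 13, 14, 13, 11, 12, 7, 54])

# 75th percentile of the Series (pandas interpolates linearly by default)
p75 = sr.quantile(q=0.75)

# The comparison yields a boolean Series; indexing with it keeps
# only the values strictly above the percentile
above = sr[sr > p75]
print(above.tolist())
```

For this data the percentile is 13, so only 14 and 54 are kept, which matches the output above.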
Ques 5). Write a pandas program to select the rows where the percentage is greater than 70.
Soln ).
import pandas as pd
data={'Name':['Aman', 'Roman', 'Ruhi', 'Salu', 'Abhi', 'Manu'],
      'Gender':['Male', 'Male', 'Female', 'Female', 'Male', 'Female'],
      'Percentage':[65.8, 71, 89, 77, 93.7, 57.1]}
labels=['a', 'b', 'c', 'd', 'e', 'f']
df=pd.DataFrame(data, index=labels)
Output :
The percentage data, which is more than 70:
Name Gender Percentage
b Roman Male 71.0
c Ruhi Female 89.0
d Salu Female 77.0
e Abhi Male 93.7
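The listing above stops after the DataFrame is created, so the selection step itself is not shown. A minimal sketch of how the rows in the output could be selected, assuming a plain boolean filter on the Percentage column (the name above_70 is illustrative):

```python
import pandas as pd

data = {'Name': ['Aman', 'Roman', 'Ruhi', 'Salu', 'Abhi', 'Manu'],
        'Gender': ['Male', 'Male', 'Female', 'Female', 'Male', 'Female'],
        'Percentage': [65.8, 71, 89, 77, 93.7, 57.1]}
labels = ['a', 'b', 'c', 'd', 'e', 'f']
df = pd.DataFrame(data, index=labels)

# Keep only the rows whose Percentage is strictly greater than 70
above_70 = df[df['Percentage'] > 70]
print('The percentage data, which is more than 70:\n', above_70)
```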
Ques 6). Write a pandas program to select the rows where the percentage is between 70 and 90.
Soln ).
import pandas as pd
data={'Name':['Aman', 'Roman', 'Ruhi', 'Salu', 'Abhi', 'Manu'],
      'Gender':['Male', 'Male', 'Female', 'Female', 'Male', 'Female'],
      'Percentage':[65.8, 70, 90, 77, 93.7, 57.1]}
labels=['a', 'b', 'c', 'd', 'e', 'f']
df=pd.DataFrame(data, index=labels)
Output :
The students whose percentage is between 70 and 90:
Name Gender Percentage
b Roman Male 70.0
c Ruhi Female 90.0
d Salu Female 77.0
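The output includes both 70.0 and 90.0, so the range appears to be inclusive at both ends. A minimal sketch of the missing selection step under that assumption, using Series.between (the name in_range is illustrative):

```python
import pandas as pd

data = {'Name': ['Aman', 'Roman', 'Ruhi', 'Salu', 'Abhi', 'Manu'],
        'Gender': ['Male', 'Male', 'Female', 'Female', 'Male', 'Female'],
        'Percentage': [65.8, 70, 90, 77, 93.7, 57.1]}
labels = ['a', 'b', 'c', 'd', 'e', 'f']
df = pd.DataFrame(data, index=labels)

# between() includes both end points by default, so 70 and 90 are kept
in_range = df[df['Percentage'].between(70, 90)]
print('The students whose percentage is between 70 and 90:\n', in_range)
```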
Ques 7). Write a pandas program to change the percentage in a row chosen by the user.
Soln ).
import pandas as pd
data={'Name':['Aman', 'Roman', 'Ruhi', 'Salu', 'Abhi', 'Manu'],
      'Gender':['Male', 'Male', 'Female', 'Female', 'Male', 'Female'],
      'Percentage':[65.8, 70, 90, 77, 93.7, 57.1]}
df=pd.DataFrame(data)
print('Original DataFrame:\n',df)
Output :
Original DataFrame:
Name Gender Percentage
0 Aman Male 65.8
1 Roman Male 70.0
2 Ruhi Female 90.0
3 Salu Female 77.0
4 Abhi Male 93.7
5 Manu Female 57.1
Enter the index of row:3
Enter the percentage to be changed:79.82
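The update step is missing from the listing above. A minimal sketch of it follows; the helper change_percentage is hypothetical, and the two values that the original reads from the user are fixed here so that the sketch runs without keyboard input.

```python
import pandas as pd


def change_percentage(df, row, new_value):
    """Set the 'Percentage' of the row with index label `row` to `new_value`."""
    df.loc[row, 'Percentage'] = new_value
    return df


data = {'Name': ['Aman', 'Roman', 'Ruhi', 'Salu', 'Abhi', 'Manu'],
        'Gender': ['Male', 'Male', 'Female', 'Female', 'Male', 'Female'],
        'Percentage': [65.8, 70, 90, 77, 93.7, 57.1]}
df = pd.DataFrame(data)

# Values taken from the sample run above; in an interactive program they
# could come from int(input(...)) and float(input(...))
row, new_value = 3, 79.82
change_percentage(df, row, new_value)
print('DataFrame after change:\n', df)
```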
Ques 8). Create a DataFrame quarterly sales where each row contains the item category,
item name and expenditure. Locate the three largest values of expenditure in this
DataFrame.
Soln ).
import pandas as pd
Output :
QtrSales DataFrame is:
Item Category Item Name Expenditure
0 Food Biscuit 100
1 Drink Pepsi 80
2 Food Bread 40
3 Drink Cocacola 150
4 Sweet Rasgulla 120
5 Food Butter 180
6 Sweet Milkcake 165
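The code for this solution is cut off after the import. A minimal sketch that rebuilds the DataFrame from the output above and locates the three largest expenditures, assuming DataFrame.nlargest is acceptable for the task (the names qtr_sales and top3 are illustrative):

```python
import pandas as pd

# Data copied from the printed QtrSales DataFrame
qtr_sales = pd.DataFrame({
    'Item Category': ['Food', 'Drink', 'Food', 'Drink', 'Sweet', 'Food', 'Sweet'],
    'Item Name': ['Biscuit', 'Pepsi', 'Bread', 'Cocacola', 'Rasgulla',
                  'Butter', 'Milkcake'],
    'Expenditure': [100, 80, 40, 150, 120, 180, 165]})

# nlargest returns the rows with the 3 highest Expenditure values,
# ordered from largest to smallest
top3 = qtr_sales.nlargest(3, 'Expenditure')
print(top3)
```

For this data the three rows are Butter (180), Milkcake (165) and Cocacola (150).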
Ques 9). Create a DataFrame quarterly sales where each row contains the item category,
item name and expenditure. Group the rows by the category and print the total expenditure
per category.
Soln ).
import pandas as pd
df = pd.DataFrame(data, columns=col)
print(df)
Output :
item_category item_name expenditure
0 Car Maruti 1050000
1 AC Hitachi 106000
2 Air Cooler Bajaj 16000
3 Washing Machine LG 15600
4 Car Ford 2450000
5 AC Samsung 166000
6 Air Cooler Symphony 15500
7 Washing Machine Wirlpool 15800
8 Car Thar 2650000
9 AC LG 168000
10 Air Cooler Usha 12500
11 Washing Machine Samsung 12600
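The grouping step is not shown in the listing above. A minimal sketch of it, using the category and expenditure columns copied from the printed DataFrame (the name total is illustrative, and the item names are left out because they are not needed for the totals):

```python
import pandas as pd

df = pd.DataFrame({
    'item_category': ['Car', 'AC', 'Air Cooler', 'Washing Machine'] * 3,
    'expenditure': [1050000, 106000, 16000, 15600,
                    2450000, 166000, 15500, 15800,
                    2650000, 168000, 12500, 12600]})

# Group the rows by category and add up the expenditure in each group
total = df.groupby('item_category')['expenditure'].sum()
print(total)
```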
Ques 10). Create a DataFrame for Examination result and display the row labels, column
labels, data type of each column, shape, size and dimension.
Soln ).
import pandas as pd
data = {'Eng':[76, 79, 97, 69, 89], 'Phy':[68, 88, 69, 59, 87],
        'Chem':[59, 85, 74, 85, 83], 'Maths':[96, 94, 89, 99, 97],
        'CS':[92, 68, 83, 89, 93], 'Ttl':[391, 414, 412, 401, 449],
        'Per':[78.2, 82.8, 82.4, 80.2, 89.8]}
idx = ['Amit', 'Sumit', 'Rekha', 'Suman', 'Rupam']
df = pd.DataFrame(data, index=idx)
print(df)
Output :
Eng Phy Chem Maths CS Ttl Per
Amit 76 68 59 96 92 391 78.2
Sumit 79 88 85 94 68 414 82.8
Rekha 97 69 74 89 83 412 82.4
Suman 69 59 85 99 89 401 80.2
Rupam 89 87 83 97 93 449 89.8
Size is: 35
Dimension is: 2
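The listing above only prints the DataFrame, and the output keeps just the size and dimension lines. A minimal sketch of how each requested property could be displayed; the labels other than "Size is:" and "Dimension is:" are assumptions.

```python
import pandas as pd

data = {'Eng': [76, 79, 97, 69, 89], 'Phy': [68, 88, 69, 59, 87],
        'Chem': [59, 85, 74, 85, 83], 'Maths': [96, 94, 89, 99, 97],
        'CS': [92, 68, 83, 89, 93], 'Ttl': [391, 414, 412, 401, 449],
        'Per': [78.2, 82.8, 82.4, 80.2, 89.8]}
idx = ['Amit', 'Sumit', 'Rekha', 'Suman', 'Rupam']
df = pd.DataFrame(data, index=idx)

print('Row labels:', df.index.tolist())
print('Column labels:', df.columns.tolist())
print('Data types:\n', df.dtypes)   # one dtype per column
print('Shape is:', df.shape)        # (rows, columns)
print('Size is:', df.size)          # rows * columns
print('Dimension is:', df.ndim)     # 2 for a DataFrame
```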
Ques 11). Write a Pandas program to join the two given DataFrames along columns and
assign all data.
Soln ).
import pandas as pd
import numpy as np
Output :
Original DataFrames 1 exam:
name perc qualify
0 Aman 79.5 yes
1 Kamal 29.0 no
2 Amjad 90.5 yes
3 Rohan NaN no
4 Amit 32.0 no
5 Sumit 65.0 yes
6 Matthew NaN no
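The joining code and the second DataFrame are not present in this listing. A minimal sketch of a column-wise join with pd.concat follows; exam_data1 is copied from the output above, while exam_data2 and its columns are purely hypothetical stand-ins for the missing second DataFrame.

```python
import numpy as np
import pandas as pd

# First frame, copied from the printed output
exam_data1 = pd.DataFrame({
    'name': ['Aman', 'Kamal', 'Amjad', 'Rohan', 'Amit', 'Sumit', 'Matthew'],
    'perc': [79.5, 29.0, 90.5, np.nan, 32.0, 65.0, np.nan],
    'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'no']})

# Hypothetical second frame (NOT from the original document)
exam_data2 = pd.DataFrame({
    'city': ['Delhi', 'Agra', 'Pune'],
    'grade': ['B', 'D', 'A']})

# axis=1 places the frames side by side and aligns rows on the index;
# rows that exist in only one frame are filled with NaN
joined = pd.concat([exam_data1, exam_data2], axis=1)
print(joined)
```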
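# Add Series (DataFrame.append was removed in pandas 2.0; pd.concat is used instead)
combined_data_sr = pd.concat([exam_data1, pd.DataFrame([s.to_dict()])], ignore_index=True)
# Add Dictionary
combined_info_dicts = pd.concat([combined_data_sr, pd.DataFrame(dicts)], ignore_index=True)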
Output :
Original DataFrames:
name perc qualify
0 Aman 79.5 yes
1 Kamal 29.0 no
2 Amjad 90.5 yes
3 Rohan NaN no
4 Amit 32.0 no
5 Sumit 65.0 yes
6 Matthew 56.0 yes
Series:
name Sukhvir
perc 54
qualify yes
dtype: object
Dictionary:
[{'name': 'Krish', 'perc': 45, 'qualify': 'yes'}, {'name': 'Kumar', 'perc': 67,
'qualify': 'yes'}]
Combined Data:
name perc qualify
0 Aman 79.5 yes
1 Kamal 29.0 no
2 Amjad 90.5 yes
3 Rohan NaN no
4 Amit 32.0 no
5 Sumit 65.0 yes
6 Matthew 56.0 yes
7 Sukhvir 54.0 yes
8 Krish 45.0 yes
9 Kumar 67.0 yes
Output :
Data Frame:
sales1 sales2
0 10 20
1 20 15
2 -4 10
3 5 -1
4 -1 12
5 15 -2
Ques 14). Replace all missing values in a data frame with 999.
Soln ).
import pandas as pd
import numpy as np
Output :
Data Frame:
sales1 sales2
0 NaN 20.0
1 20.0 NaN
2 NaN -10.0
3 5.0 1.0
4 -1.0 12.0
5 15.0 NaN
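The replacement step is missing from the listing above. A minimal sketch using DataFrame.fillna, with the data copied from the printed DataFrame (the name filled is illustrative):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'sales1': [np.nan, 20, np.nan, 5, -1, 15],
                   'sales2': [20, np.nan, -10, 1, 12, np.nan]})

# fillna returns a new DataFrame in which every NaN is replaced by 999
filled = df.fillna(999)
print('After replacing missing values with 999:\n', filled)
```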
Ques 15). Filter out rows based on different criteria such as duplicate rows.
Soln ).
import pandas as pd
data={'Name':['Aman','Rohit','Deepika','Aman','Deepika','Sohit','Geeta'],
'Sales':[8500,4500,9200,8500,9200,9600,8400]}
sales=pd.DataFrame(data)
# Find duplicate rows
duplicated = sales[sales.duplicated(keep=False)]
print("Duplicate rows:\n",duplicated)
Output :
Duplicate rows:
Name Sales
0 Aman 8500
2 Deepika 9200
3 Aman 8500
4 Deepika 9200
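The solution above lists the duplicated rows. To filter them out instead, one option is drop_duplicates; the following is a minimal sketch on the same data (the name unique_rows is illustrative).

```python
import pandas as pd

data = {'Name': ['Aman', 'Rohit', 'Deepika', 'Aman', 'Deepika', 'Sohit', 'Geeta'],
        'Sales': [8500, 4500, 9200, 8500, 9200, 9600, 8400]}
sales = pd.DataFrame(data)

# Keep the first occurrence of every row and drop its later repeats
unique_rows = sales.drop_duplicates()
print('Rows after removing duplicates:\n', unique_rows)
```

Here rows 3 and 4 are dropped because they repeat rows 0 and 2.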
Ques 16). Import and export data between pandas and a CSV file.
Soln ).
import pandas as pd
csv_to_df = pd.read_csv('C://Users/Mukesh-Sahu/OneDrive/Desktop/St_data.csv',
                        sep=',', header=0)
print('Reading CSV file in dataframe:\n', csv_to_df)
# to_csv() returns None when it is given a file path, so the result is not assigned
csv_to_df.to_csv('St_Record.csv')
Output :
Reading CSV file in dataframe:
Name Eng Phy Chem Maths IP
0 Amit 76 79 97 69 89
1 Sumit 68 88 69 59 87
2 Rekha 59 85 74 85 83
3 Suman 96 94 89 99 97
4 Rupam 92 68 83 89 93
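The program above depends on a CSV file at a path on the author's machine. A minimal self-contained sketch of the same export and import round trip is shown here; it rebuilds the data from the printed output and writes to a temporary folder, and passing index=False is an assumption so that the row numbers are not stored as an extra column.

```python
import os
import tempfile

import pandas as pd

# Data copied from the printed DataFrame
df = pd.DataFrame({'Name': ['Amit', 'Sumit', 'Rekha', 'Suman', 'Rupam'],
                   'Eng': [76, 68, 59, 96, 92],
                   'Phy': [79, 88, 85, 94, 68],
                   'Chem': [97, 69, 74, 89, 83],
                   'Maths': [69, 59, 85, 99, 89],
                   'IP': [89, 87, 83, 97, 93]})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'St_Record.csv')
    df.to_csv(path, index=False)   # export: DataFrame -> CSV file
    restored = pd.read_csv(path)   # import: CSV file -> DataFrame

print('Reading CSV file in dataframe:\n', restored)
```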
Matplotlib Programs
Ques 17). Given the school result data, analyse the performance of the students on
different parameters, e.g. subject-wise or class-wise.
Soln ).
import matplotlib.pyplot as plt
Output :
Ques 18). For the DataFrames created above, analyze and plot appropriate charts with a title
and legend.
Soln ).
import matplotlib.pyplot as plt
import numpy as np
x = np.arange(len(s))
plt.bar(x, mark_sc, label='Science', width=0.2, color='r')
plt.bar(x+0.2, mark_com, label='Commerce', width=0.2, color='b')
Output :
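The fragment above uses s, mark_sc and mark_com without defining them, and the chart itself is not preserved. A minimal sketch of a complete grouped bar chart with a title and legend follows; the subject names, the marks and the title text are hypothetical values chosen only so that the code runs.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; drop this line to use plt.show()
import matplotlib.pyplot as plt
import numpy as np

# Hypothetical data (NOT from the original document)
s = ['Eng', 'Maths', 'IP']
mark_sc = [78, 85, 90]    # assumed average marks, Science stream
mark_com = [72, 80, 88]   # assumed average marks, Commerce stream

x = np.arange(len(s))
plt.figure()
# Two bars per subject: the second set is shifted right by one bar width
plt.bar(x, mark_sc, label='Science', width=0.2, color='r')
plt.bar(x + 0.2, mark_com, label='Commerce', width=0.2, color='b')
plt.xticks(x + 0.1, s)    # centre each tick between the two bars
plt.xlabel('Subject')
plt.ylabel('Marks')
plt.title('Stream-wise result')
plt.legend()
# plt.show()  # uncomment in an interactive session to display the chart
```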
Ques 19). Write a program to plot a bar chart in Python to display the result of a school for
five consecutive years.
Soln ).
import matplotlib.pyplot as plt
year=['2015','2016','2017','2018','2019']
p=[98.50,70.25,55.20,90.5,61.50]
j=['b','g','r','m','c']
Output :
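The plotting calls are missing from the listing above. A minimal sketch that draws the bar chart from the three lists already defined there; the axis labels and title are assumptions, as is the reading of p as a pass percentage.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; drop this line to use plt.show()
import matplotlib.pyplot as plt

year = ['2015', '2016', '2017', '2018', '2019']
p = [98.50, 70.25, 55.20, 90.5, 61.50]
j = ['b', 'g', 'r', 'm', 'c']

plt.figure()
# One bar per year, each drawn in its own colour from the list j
plt.bar(year, p, width=0.4, color=j)
plt.xlabel('Year')
plt.ylabel('Pass percentage')
plt.title('School result for five consecutive years')
# plt.show()  # uncomment in an interactive session to display the chart
```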
Ques 20). Take data of your interest from an open source (e.g. data.gov.in), aggregate and
summarize it. Then plot it using the plotting functions of the matplotlib library.
Soln ).
import matplotlib.pyplot as plt
import pandas as pd
data = pd.read_csv('C:/Users/Mukesh-Sahu/Downloads/census.csv')
df = data.iloc[1:-1].sort_values('Population', ascending=False).head(10)
print(df)
Output :
Sr. No. State/UT Population
32 33 Uttar Pradesh 137465
1 2 Andra Pradesh 43769
19 20 Maharashtra 40891
4 5 Bihar 40827
34 35 West Bangal 30349
18 19 Madhya Pradesh 29597
30 31 Tamil Nadu 22364
25 26 Odisha 20332
15 16 Karnataka 20266
28 29 Rajasthan 16517
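The summary and plotting steps are not shown in the listing above, and the CSV file is not available here. A minimal sketch that works from the ten rows printed in the output, copied as they appear; the column Share (%), the chart title and the choice of a horizontal bar chart are assumptions.

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; drop this line to use plt.show()
import matplotlib.pyplot as plt
import pandas as pd

# Rows copied from the printed output (already sorted by Population)
df = pd.DataFrame({
    'State/UT': ['Uttar Pradesh', 'Andra Pradesh', 'Maharashtra', 'Bihar',
                 'West Bangal', 'Madhya Pradesh', 'Tamil Nadu', 'Odisha',
                 'Karnataka', 'Rajasthan'],
    'Population': [137465, 43769, 40891, 40827, 30349,
                   29597, 22364, 20332, 20266, 16517]})

# Summarize: total of the ten rows and each row's share of that total
total = df['Population'].sum()
df['Share (%)'] = (df['Population'] / total * 100).round(1)
print(df)

plt.figure(figsize=(8, 5))
plt.barh(df['State/UT'], df['Population'], color='c')
plt.gca().invert_yaxis()   # largest value at the top
plt.xlabel('Population (as given in the CSV)')
plt.title('Ten most populous States/UTs')
plt.tight_layout()
# plt.show()  # uncomment in an interactive session to display the chart
```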