
IP (12) Proj File Pandas&Matplotlib

The document contains solutions to 11 questions related to pandas programs. The programs demonstrate how to create pandas Series from dictionaries and arrays, perform arithmetic operations on Series, add/select data from Series, calculate percentiles and filter rows in DataFrames based on conditions. It also shows how to group and aggregate DataFrame data, join DataFrames, and display properties of a DataFrame.

Uploaded by

akarshsahu417
Copyright
© All Rights Reserved

Informatics Practices Practical File (2023-24)

Pandas Programs
Ques 1). Create a pandas Series from a dictionary of values and an ndarray.
Soln ).
# Through dictionary
import pandas as pd
import numpy as np

dic={'a':'Apple', 'b':'Boll', 'c':'Cat', 1:'Sumit', 2:'Rohit'}
sr_dir=pd.Series(dic)
print('Series Through Dictionary:\n', sr_dir)

# Through ndarray
sr_np=pd.Series(np.array(['a', 2, 'e', 4, 'i', 6, 'o', 8, 'u']))
print('\nSeries Through ndarray:\n', sr_np)

Output :
Series Through Dictionary:
a Apple
b Boll
c Cat
1 Sumit
2 Rohit
dtype: object

Series Through ndarray:
0 a
1 2
2 e
3 4
4 i
5 6
6 o
7 8
8 u
dtype: object

Ques 2). Write a pandas program to perform arithmetic operations on two pandas Series.
Soln ).
import pandas as pd
s1=pd.Series([3,5,6,3,4])
s2=pd.Series([2,4,6,8,0])
print('Addition of Series:\n', s1+s2)
print('\nSubtraction of Series:\n', s1-s2)
print('\nMultiplication of Series:\n', s1*s2)
print('\nDivision of Series:\n', s1/s2)

Output :
Addition of Series:
0 5
1 9
2 12
3 11
4 4
dtype: int64

Subtraction of Series:
0 1
1 1
2 0
3 -5
4 4
dtype: int64

Multiplication of Series:
0 6
1 20
2 36
3 24
4 0
dtype: int64

Division of Series:
0 1.500
1 1.250
2 1.000
3 0.375
4 inf
dtype: float64
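
The last row of the division is inf because s2 holds 0 at that position. A minimal sketch, assuming such results should be treated as missing rather than infinite (that choice is an assumption, not part of the question), replaces them with NaN:

```python
import numpy as np
import pandas as pd

s1 = pd.Series([3, 5, 6, 3, 4])
s2 = pd.Series([2, 4, 6, 8, 0])

# Element-wise division; dividing by 0 yields inf instead of raising an error
quotient = s1 / s2

# Treat infinite results as missing values
safe_quotient = quotient.replace([np.inf, -np.inf], np.nan)
print(safe_quotient)
```

The other four rows are unchanged; only the position where s2 is 0 becomes NaN.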

Ques 3). Write a program to add some data to an existing Series.


Soln ).
import pandas as pd
sr=pd.Series(['Suman', 'CS', 12])
print('Original Data Series:\n', sr)

# Series.append() was removed in pandas 2.0; pd.concat() does the same job
new_sr=pd.concat([sr, pd.Series(['Female', 'Pass'])])
print('\nAfter adding some data to the Series:\n', new_sr)

Output :
Original Data Series:
0 Suman
1 CS
2 12
dtype: object

After adding some data to the Series:
0 Suman
1 CS
2 12
0 Female
1 Pass
dtype: object
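
In the output above the added values keep their own labels 0 and 1, so the result has duplicate index labels. A minimal sketch, assuming a fresh 0 to n-1 index is wanted, passes ignore_index=True to pd.concat (the variable names extra and combined are illustrative):

```python
import pandas as pd

sr = pd.Series(['Suman', 'CS', 12])
extra = pd.Series(['Female', 'Pass'])

# ignore_index=True discards the old labels and renumbers the result
combined = pd.concat([sr, extra], ignore_index=True)
print(combined)
```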

Ques 4). On a given Series, print all the elements that are above the 75th percentile.
Soln ).
import pandas as pd

sr=pd.Series([-2,8,9,12,13,14,13,11,12,7,54])
p_sr=sr.quantile(q=0.75)   # value at the 75th percentile
for val in sr:
    if val>p_sr:
        print(val)

Output :
14
54
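
The same selection can be sketched without an explicit loop by using a boolean mask, which is the more usual pandas style. The names threshold and above are illustrative:

```python
import pandas as pd

sr = pd.Series([-2, 8, 9, 12, 13, 14, 13, 11, 12, 7, 54])

# Value at the 75th percentile of the Series
threshold = sr.quantile(q=0.75)

# Boolean mask keeps only the elements strictly above the threshold
above = sr[sr > threshold]
print(above.tolist())
```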

Ques 5). Write a pandas program to select the rows where the percentage is greater than 70.
Soln ).
import pandas as pd
data={'Name':['Aman', 'Roman', 'Ruhi', 'Salu', 'Abhi', 'Manu'],
      'Gender':['Male', 'Male', 'Female', 'Female', 'Male', 'Female'],
      'Percentage':[65.8, 71, 89, 77, 93.7, 57.1]}
labels=['a', 'b', 'c', 'd', 'e', 'f']
df=pd.DataFrame(data, index=labels)

print('The percentage data, which is more than 70:')
print(df[df['Percentage']>70])

Output :
The percentage data, which is more than 70:
Name Gender Percentage
b Roman Male 71.0
c Ruhi Female 89.0
d Salu Female 77.0
e Abhi Male 93.7

Ques 6). Write a pandas program to select the rows where the percentage is between 70 and 90.
Soln ).
import pandas as pd
data={'Name':['Aman', 'Roman', 'Ruhi', 'Salu', 'Abhi', 'Manu'],\
'Gender':['Male', 'Male', 'Female', 'Female', 'Male', 'Female'],\
'Percentage':[65.8, 70, 90, 77, 93.7, 57.1]}
labels=['a', 'b', 'c', 'd', 'e', 'f']
df=pd.DataFrame(data, index=labels)

print('The students whose percentage is in between 70 to 90:')


print(df[df['Percentage'].between(70,90)])

Output :
The students whose percentage is in between 70 to 90:
Name Gender Percentage
b Roman Male 70.0
c Ruhi Female 90.0
d Salu Female 77.0
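
Note that between() includes both end points by default, which is why the rows with exactly 70 and 90 appear above. A short sketch, assuming the end points should instead be excluded, uses two explicit comparisons joined with &:

```python
import pandas as pd

data = {'Name': ['Aman', 'Roman', 'Ruhi', 'Salu', 'Abhi', 'Manu'],
        'Percentage': [65.8, 70, 90, 77, 93.7, 57.1]}
df = pd.DataFrame(data, index=['a', 'b', 'c', 'd', 'e', 'f'])

# between() keeps 70 and 90 themselves
inclusive_rows = df[df['Percentage'].between(70, 90)]

# Strict comparisons drop the end points; each condition needs its own parentheses
strict_rows = df[(df['Percentage'] > 70) & (df['Percentage'] < 90)]
print(strict_rows)
```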

Ques 7). Write a pandas program to change the percentage in a given row by the user.
Soln ).
import pandas as pd
data={'Name':['Aman', 'Roman', 'Ruhi', 'Salu', 'Abhi', 'Manu'],\
'Gender':['Male', 'Male', 'Female', 'Female', 'Male', 'Female'],\
'Percentage':[65.8, 70, 90, 77, 93.7, 57.1]}
df=pd.DataFrame(data)
print('Original DataFrame:\n',df)

# int()/float() are used instead of eval(), which would execute any text the user types
idx=int(input('\nEnter the index of row:'))
per=float(input('Enter the percentage to be changed:'))
print('\nChange the percentage in row', idx, 'to', per)
df.loc[idx,'Percentage']=per
print(df)

Output :
Original DataFrame:
Name Gender Percentage
0 Aman Male 65.8
1 Roman Male 70.0
2 Ruhi Female 90.0
3 Salu Female 77.0
4 Abhi Male 93.7
5 Manu Female 57.1
Enter the index of row:3
Enter the percentage to be changed:79.82

Change the percentage in row 3 to 79.82
Name Gender Percentage
0 Aman Male 65.80
1 Roman Male 70.00
2 Ruhi Female 90.00
3 Salu Female 79.82
4 Abhi Male 93.70
5 Manu Female 57.10

Ques 8). Create a DataFrame quarterly sales where each row contains the item category,
item name and expenditure. Locate the three largest values of expenditure in this
DataFrame.
Soln ).
import pandas as pd

qtr_sales_df = pd.DataFrame({
    'Item Category':['Food', 'Drink', 'Food', 'Drink', 'Sweet', 'Food', 'Sweet'],
    'Item Name':['Biscuit', 'Pepsi', 'Bread', 'Cocacola', 'Rasgulla', 'Butter', 'Milkcake'],
    'Expenditure':[100, 80, 40, 150, 120, 180, 165]})
print('QtrSales DataFrame is:\n', qtr_sales_df)

print('\n3 largest expenditure values of DataFrame:')


print(qtr_sales_df.sort_values('Expenditure', ascending=False).head(3))

Output :
QtrSales DataFrame is:
Item Category Item Name Expenditure
0 Food Biscuit 100
1 Drink Pepsi 80
2 Food Bread 40
3 Drink Cocacola 150
4 Sweet Rasgulla 120
5 Food Butter 180
6 Sweet Milkcake 165

3 largest expenditure values of DataFrame:


Item Category Item Name Expenditure
5 Food Butter 180
6 Sweet Milkcake 165
3 Drink Cocacola 150
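
Sorting and then taking head(3) works; pandas also offers DataFrame.nlargest, which expresses the same intent in one call. A minimal sketch using the same data (top3 is an illustrative name):

```python
import pandas as pd

qtr_sales_df = pd.DataFrame({
    'Item Category': ['Food', 'Drink', 'Food', 'Drink', 'Sweet', 'Food', 'Sweet'],
    'Item Name': ['Biscuit', 'Pepsi', 'Bread', 'Cocacola', 'Rasgulla', 'Butter', 'Milkcake'],
    'Expenditure': [100, 80, 40, 150, 120, 180, 165]})

# Three rows with the largest Expenditure, already in descending order
top3 = qtr_sales_df.nlargest(3, 'Expenditure')
print(top3)
```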

Ques 9). Create a DataFrame quarterly sales where each row contains the item category,
item name and expenditure. Group the rows by the category and print the total expenditure
per category.
Soln ).
import pandas as pd

data = [['Car', 'Maruti', 1050000], ['AC', 'Hitachi', 106000],
        ['Air Cooler', 'Bajaj', 16000], ['Washing Machine', 'LG', 15600],
        ['Car', 'Ford', 2450000], ['AC', 'Samsung', 166000],
        ['Air Cooler', 'Symphony', 15500], ['Washing Machine', 'Wirlpool', 15800],
        ['Car', 'Thar', 2650000], ['AC', 'LG', 168000],
        ['Air Cooler', 'Usha', 12500], ['Washing Machine', 'Samsung', 12600]]
col = ['item_category', 'item_name', 'expenditure']

df = pd.DataFrame(data, columns=col)
print(df)

print('\nGrouping the rows by item category:')


grouped_df = df.groupby('item_category')
# print(grouped_df.groups)
res_df = grouped_df['expenditure'].sum()
print(res_df)

Output :
item_category item_name expenditure
0 Car Maruti 1050000
1 AC Hitachi 106000
2 Air Cooler Bajaj 16000
3 Washing Machine LG 15600
4 Car Ford 2450000
5 AC Samsung 166000
6 Air Cooler Symphony 15500
7 Washing Machine Wirlpool 15800
8 Car Thar 2650000
9 AC LG 168000
10 Air Cooler Usha 12500
11 Washing Machine Samsung 12600

Grouping the rows by item category:


item_category
AC 440000
Air Cooler 44000
Car 6150000
Washing Machine 44000
Name: expenditure, dtype: int64
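
Beyond the total, a grouped column can be summarised with several functions at once through agg(). The sketch below reuses the same data; the choice of sum, mean and count is only an illustration:

```python
import pandas as pd

data = [['Car', 'Maruti', 1050000], ['AC', 'Hitachi', 106000],
        ['Air Cooler', 'Bajaj', 16000], ['Washing Machine', 'LG', 15600],
        ['Car', 'Ford', 2450000], ['AC', 'Samsung', 166000],
        ['Air Cooler', 'Symphony', 15500], ['Washing Machine', 'Wirlpool', 15800],
        ['Car', 'Thar', 2650000], ['AC', 'LG', 168000],
        ['Air Cooler', 'Usha', 12500], ['Washing Machine', 'Samsung', 12600]]
df = pd.DataFrame(data, columns=['item_category', 'item_name', 'expenditure'])

# One row per category, one column per aggregate
summary = df.groupby('item_category')['expenditure'].agg(['sum', 'mean', 'count'])
print(summary)
```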

Ques 10). Create a DataFrame for Examination result and display the row labels, column
labels, data type of each column, shape, size and dimension.
Soln ).
import pandas as pd
data = {'Eng':[76, 79, 97, 69, 89], 'Phy':[68, 88, 69, 59, 87],
        'Chem':[59, 85, 74, 85, 83], 'Maths':[96, 94, 89, 99, 97],
        'CS':[92, 68, 83, 89, 93], 'Ttl':[391, 414, 412, 401, 449],
        'Per':[78.2, 82.8, 82.4, 80.2, 89.8]}
idx = ['Amit', 'Sumit', 'Rekha', 'Suman', 'Rupam']
df = pd.DataFrame(data, index=idx)
print(df)

print('\nindexes are:', df.index)


print('\nColumns are:', df.columns)
print('\nData type of each columns are:', df.dtypes)
print('\nShape is:', df.shape)
print('\nSize is:', df.size)
print('\nDimension is:', df.ndim)

Output :
Eng Phy Chem Maths CS Ttl Per
Amit 76 68 59 96 92 391 78.2
Sumit 79 88 85 94 68 414 82.8
Rekha 97 69 74 89 83 412 82.4
Suman 69 59 85 99 89 401 80.2
Rupam 89 87 83 97 93 449 89.8

indexes are: Index(['Amit', 'Sumit', 'Rekha', 'Suman', 'Rupam'], dtype='object')

Columns are: Index(['Eng', 'Phy', 'Chem', 'Maths', 'CS', 'Ttl', 'Per'], dtype='object')

Data type of each columns are: Eng int64
Phy int64
Chem int64
Maths int64
CS int64
Ttl int64
Per float64
dtype: object

Shape is: (5, 7)

Size is: 35

Dimension is: 2

Ques 11). Write a Pandas program to join the two given DataFrames along columns and
assign all data.
Soln ).
import pandas as pd
import numpy as np

exam_dic1 = {'name': ['Aman', 'Kamal', 'Amjad', 'Rohan', 'Amit', 'Sumit', 'Matthew'],
             'perc': [79.5, 29, 90.5, np.nan, 32, 65, np.nan],
             'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'no']}
exam_data1 = pd.DataFrame(exam_dic1)
print("Original DataFrames 1 exam:\n", exam_data1)

exam_dic2 = {'name': ['Parveen', 'Ahil', 'Ashaz', 'Shifin', 'Hanash'],
             'perc': [89.5, 92, 90.5, 91.5, 90],
             'qualify': ['yes', 'yes', 'yes', 'yes', 'yes']}
exam_data2 = pd.DataFrame(exam_dic2)
print("\nOriginal DataFrames 2 exam:\n", exam_data2)

print("\nJoined DataFrames along columns:")
result_data = pd.concat([exam_data1, exam_data2], axis=1)   # axis=1 places the frames side by side
print(result_data)

Output :
Original DataFrames 1 exam:
name perc qualify
0 Aman 79.5 yes
1 Kamal 29.0 no
2 Amjad 90.5 yes
3 Rohan NaN no
4 Amit 32.0 no
5 Sumit 65.0 yes
6 Matthew NaN no

Original DataFrames 2 exam:
name perc qualify
0 Parveen 89.5 yes
1 Ahil 92.0 yes
2 Ashaz 90.5 yes
3 Shifin 91.5 yes
4 Hanash 90.0 yes

Joined DataFrames along columns:


name perc qualify name perc qualify
0 Aman 79.5 yes Parveen 89.5 yes
1 Kamal 29.0 no Ahil 92.0 yes
2 Amjad 90.5 yes Ashaz 90.5 yes
3 Rohan NaN no Shifin 91.5 yes
4 Amit 32.0 no Hanash 90.0 yes
5 Sumit 65.0 yes NaN NaN NaN
6 Matthew NaN no NaN NaN NaN
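
With axis=1 the two frames sit side by side and the shorter one is padded with NaN. For contrast, a small sketch of stacking the same frames along rows with axis=0; the names side_by_side and stacked are illustrative:

```python
import numpy as np
import pandas as pd

exam_data1 = pd.DataFrame({
    'name': ['Aman', 'Kamal', 'Amjad', 'Rohan', 'Amit', 'Sumit', 'Matthew'],
    'perc': [79.5, 29, 90.5, np.nan, 32, 65, np.nan],
    'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'no']})
exam_data2 = pd.DataFrame({
    'name': ['Parveen', 'Ahil', 'Ashaz', 'Shifin', 'Hanash'],
    'perc': [89.5, 92, 90.5, 91.5, 90],
    'qualify': ['yes', 'yes', 'yes', 'yes', 'yes']})

# axis=1: columns are placed next to each other, rows aligned on the index
side_by_side = pd.concat([exam_data1, exam_data2], axis=1)

# axis=0: rows of the second frame follow the rows of the first
stacked = pd.concat([exam_data1, exam_data2], axis=0, ignore_index=True)

print(side_by_side.shape, stacked.shape)
```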

Ques 12). Write a Pandas program to append a list of dictionaries or series to an existing
DataFrame and display the combined data.
Soln ).
import pandas as pd
import numpy as np

exam_dic1 = {'name': ['Aman', 'Kamal', 'Amjad', 'Rohan', 'Amit', 'Sumit', 'Matthew'],
             'perc': [79.5, 29, 90.5, np.nan, 32, 65, 56],
             'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes']}
exam_data1 = pd.DataFrame(exam_dic1)
print("Original DataFrames:\n", exam_data1)

s = pd.Series(['Sukhvir', 54, 'yes'], index=['name', 'perc', 'qualify'])
print("\nSeries:\n", s)

dicts = [{'name':'Krish', 'perc': 45, 'qualify':'yes'},
         {'name':'Kumar', 'perc': 67, 'qualify':'yes'}]
print("\nDictionary:\n", dicts)

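# DataFrame.append() was removed in pandas 2.0, so pd.concat() is used instead.
# Add Series (converted to a one-row DataFrame first)
combined_data_sr = pd.concat([exam_data1, pd.DataFrame([s.to_dict()])], ignore_index=True)

# Add list of dictionaries
combined_info_dicts = pd.concat([combined_data_sr, pd.DataFrame(dicts)], ignore_index=True)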

print("\nCombined Data:\n", combined_info_dicts)

Output :
Original DataFrames:
name perc qualify
0 Aman 79.5 yes
1 Kamal 29.0 no
2 Amjad 90.5 yes
3 Rohan NaN no
4 Amit 32.0 no
5 Sumit 65.0 yes
6 Matthew 56.0 yes

Series:
name Sukhvir
perc 54
qualify yes
dtype: object

Dictionary:
[{'name': 'Krish', 'perc': 45, 'qualify': 'yes'}, {'name': 'Kumar', 'perc': 67,
'qualify': 'yes'}]

Combined Data:
name perc qualify
0 Aman 79.5 yes
1 Kamal 29.0 no
2 Amjad 90.5 yes
3 Rohan NaN no
4 Amit 32.0 no
5 Sumit 65.0 yes
6 Matthew 56.0 yes
7 Sukhvir 54.0 yes
8 Krish 45.0 yes
9 Kumar 67.0 yes

Ques 13). Replace all negative values in a data frame with 0.


Soln ).
import pandas as pd

data = {'sales1':[10,20,-4,5,-1,15], 'sales2':[20,15,10,-1,12,-2]}


df = pd.DataFrame(data)
print("Data Frame:\n", df)

print('\nDisplay DataFrame after replacing every negative value with 0:')


df[df<0]=0
print(df)

Output :
Data Frame:
sales1 sales2
0 10 20
1 20 15
2 -4 10
3 5 -1
4 -1 12
5 15 -2

Display DataFrame after replacing every negative value with 0:


sales1 sales2
0 10 20
1 20 15
2 0 10
3 5 0
4 0 12
5 15 0
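
The boolean-mask assignment above changes df in place. As an alternative sketch, assuming the original frame should be kept, clip(lower=0) returns a new DataFrame with every value below 0 raised to 0 (the name clipped is illustrative):

```python
import pandas as pd

df = pd.DataFrame({'sales1': [10, 20, -4, 5, -1, 15],
                   'sales2': [20, 15, 10, -1, 12, -2]})

# clip() returns a copy; df itself still holds the negative values
clipped = df.clip(lower=0)
print(clipped)
```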

Ques 14). Replace all missing values in a data frame with 999.
Soln ).
import pandas as pd
import numpy as np

data = {'sales1':[np.nan, 20, np.nan, 5, -1, 15],
        'sales2':[20, np.nan, -10, 1, 12, np.nan]}
df = pd.DataFrame(data)
print("Data Frame:\n", df)

print("\nAfter Replacing missing value with 999:")


df=df.fillna(999)
print(df)

Output :
Data Frame:
sales1 sales2
0 NaN 20.0
1 20.0 NaN
2 NaN -10.0
3 5.0 1.0
4 -1.0 12.0
5 15.0 NaN

After Replacing missing value with 999:


sales1 sales2
0 999.0 20.0
1 20.0 999.0
2 999.0 -10.0
3 5.0 1.0
4 -1.0 12.0
5 15.0 999.0
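
fillna() also accepts a dictionary, so each column can receive its own replacement. A minimal sketch; the fill values 0 and 999 are chosen only for illustration:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'sales1': [np.nan, 20, np.nan, 5, -1, 15],
                   'sales2': [20, np.nan, -10, 1, 12, np.nan]})

# Count the missing values per column before filling
missing_per_column = df.isna().sum()

# A dict maps each column name to its own fill value
filled = df.fillna({'sales1': 0, 'sales2': 999})
print(missing_per_column)
print(filled)
```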

Ques 15). Filter out rows based on different criteria such as duplicate rows.
Soln ).
import pandas as pd
data={'Name':['Aman','Rohit','Deepika','Aman','Deepika','Sohit','Geeta'],
'Sales':[8500,4500,9200,8500,9200,9600,8400]}
sales=pd.DataFrame(data)
# Find duplicate rows
duplicated = sales[sales.duplicated(keep=False)]
print("duplicate Row:\n",duplicated)

Output :
duplicate Row:
Name Sales
0 Aman 8500
2 Deepika 9200
3 Aman 8500
4 Deepika 9200
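
duplicated(keep=False) marks every copy of a repeated row. To filter the duplicates out instead of listing them, a short sketch with drop_duplicates(), which by default keeps the first occurrence (unique_sales is an illustrative name):

```python
import pandas as pd

data = {'Name': ['Aman', 'Rohit', 'Deepika', 'Aman', 'Deepika', 'Sohit', 'Geeta'],
        'Sales': [8500, 4500, 9200, 8500, 9200, 9600, 8400]}
sales = pd.DataFrame(data)

# Keep the first occurrence of each row and drop later repeats
unique_sales = sales.drop_duplicates()
print(unique_sales)
```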

Ques 16). Import and export data between pandas and a CSV file.
Soln ).
import pandas as pd

csv_to_df = pd.read_csv('C://Users/Mukesh-Sahu/OneDrive/Desktop/St_data.csv',
                        sep=',', header=0)
print('Reading CSV file in dataframe:\n', csv_to_df)

# to_csv() returns None when a file path is given, so the result is not assigned
csv_to_df.to_csv('St_Record.csv')

Output :
Reading CSV file in dataframe:
Name Eng Phy Chem Maths IP
0 Amit 76 79 97 69 89
1 Sumit 68 88 69 59 87
2 Rekha 59 85 74 85 83
3 Suman 96 94 89 99 97
4 Rupam 92 68 83 89 93
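
The export and import steps can be tried without any file on disk. The sketch below is an illustration only: it assumes a small sample of the marks shown above and writes to an in-memory io.StringIO buffer instead of St_Record.csv. Passing index=False keeps the row numbers out of the CSV, so reading it back gives the same columns.

```python
import io
import pandas as pd

# Small sample frame (first three rows of the output above)
df = pd.DataFrame({'Name': ['Amit', 'Sumit', 'Rekha'],
                   'Eng': [76, 68, 59],
                   'Phy': [79, 88, 85]})

# Export: write CSV text into an in-memory buffer, without the index column
buffer = io.StringIO()
df.to_csv(buffer, index=False)

# Import: rewind the buffer and read the CSV text back into a DataFrame
buffer.seek(0)
restored = pd.read_csv(buffer)
print(restored)
```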

Matplotlib Programs
Ques 17). Given the school result data, analyse the performance of the students on
different parameters, e.g. subject wise or class wise.
Soln ).
import matplotlib.pyplot as plt

subject = ['Physics','Chemistry','Mathematics', 'Biology','Computer']
marks =[80,75,70,78,82]

plt.plot(subject, marks, 'r', marker ='*', markeredgecolor='b')


plt.title('Marks Scored')
plt.xlabel('SUBJECT')
plt.ylabel('MARKS')
plt.show()

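Output :
[Figure not reproduced: line chart titled 'Marks Scored', SUBJECT on the x-axis and MARKS on the y-axis]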

Ques 18). For the Data frames created above, analyze, and plot appropriate charts with title
and legend.
Soln ).
import matplotlib.pyplot as plt
import numpy as np

s = ['1st', '2nd', '3rd']


mark_sc = [95,87,76]
mark_com = [98,91,68]

x = np.arange(len(s))
plt.bar(x, mark_sc, label='Science', width=0.2, color='r')
plt.bar(x+0.2, mark_com, label='Commerce', width=0.2, color='b')

plt.title("Bar Graph of Science & Commerce Student's Marks", color='g',
          fontsize=13, loc='center')
plt.legend()
plt.xticks(x+0.1, s)
plt.xlabel('Position', color='g')
plt.ylabel('Marks', color='g')
plt.show()

Output :
[Figure not reproduced: grouped bar chart of Science and Commerce marks by position, with legend]
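
The question asks for a chart of a DataFrame, while the solution plots plain lists. A minimal sketch, assuming the same marks are held in a DataFrame, uses the DataFrame's own plot() method, which draws one group of bars per row and builds the legend from the column names. The non-interactive Agg backend and the file name marks_bar.png are assumptions made so the example runs without a display.

```python
import matplotlib
matplotlib.use('Agg')  # assumption: draw off-screen and save to a file
import matplotlib.pyplot as plt
import pandas as pd

marks = pd.DataFrame({'Science': [95, 87, 76],
                      'Commerce': [98, 91, 68]},
                     index=['1st', '2nd', '3rd'])

# One bar per column for each row; legend labels come from the column names
ax = marks.plot(kind='bar', rot=0,
                title="Bar Graph of Science & Commerce Student's Marks")
ax.set_xlabel('Position')
ax.set_ylabel('Marks')

# Hypothetical output file name
ax.get_figure().savefig('marks_bar.png')
```

In an interactive session, plt.show() can be called instead of savefig().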

Ques 19). Write a program to plot a bar chart in python to display the result of a school for
five consecutive years.
Soln ).
import matplotlib.pyplot as plt

year=['2015','2016','2017','2018','2019']
p=[98.50,70.25,55.20,90.5,61.50]
j=['b','g','r','m','c']

plt.bar(year, p, width=0.4, color=j)


plt.xlabel("year")
plt.ylabel("Passing %")
plt.show()

Output :
[Figure not reproduced: bar chart of Passing % for the years 2015 to 2019]

Ques 20). Take data of your interest from an open source (e.g. data.gov.in), aggregate and
summarize it. Then plot it using plotting functions of the matplotlib library.
Soln ).
import matplotlib.pyplot as plt
import pandas as pd

data = pd.read_csv('C:/Users/Mukesh-Sahu/Downloads/census.csv')
df = data.iloc[1:-1].sort_values('Population', ascending=False).head(10)
print(df)

plt.bar(df['State/UT'], df['Population'], width=0.4)


plt.xticks(rotation=90)
plt.title('Top 10 populated states', color='r', fontsize=14)
plt.xlabel('State')
plt.ylabel('Population')
plt.show()

Output :
Sr. No. State/UT Population
32 33 Uttar Pradesh 137465
1 2 Andra Pradesh 43769
19 20 Maharashtra 40891
4 5 Bihar 40827
34 35 West Bangal 30349
18 19 Madhya Pradesh 29597
30 31 Tamil Nadu 22364
25 26 Odisha 20332
15 16 Karnataka 20266
28 29 Rajasthan 16517
