
DAV Practical


Practical 1

def list_of_dict(Heights):
    keys = Heights.keys()
    values = zip(*[Heights[k] for k in keys])
    result = [dict(zip(keys, v)) for v in values]
    return result

Heights = {'Boys':[72,68,70,69,74], 'Girls':[63,65,69,62,61]}


print("\n ORIGINAL DICTIONARY OF LISTS :",Heights)
print("\n NOW LIST OF DICTIONARIES : \n",list_of_dict(Heights))

Practical 2
Write programs in Python using the NumPy library to do the following:
a. Compute the mean, standard deviation, and variance of a two-dimensional random integer array along the second axis.
b. Get the indices of the sorted elements of a given array.
   B = [56, 48, 22, 41, 78, 91, 24, 46, 8, 33]
c. Create a 2-dimensional array of size m x n integer elements; also print the shape, type and data type of the array, and then reshape it into an n x m array, where n and m are user inputs given at run time.
d. Test whether the elements of a given array are zero, non-zero and NaN. Record the indices of these elements in three separate arrays.

import numpy as np
arr = np.array([[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]])
print(arr)
r1 = np.mean(arr, 1)
print("\nMean: ", r1)

r2 = np.std(arr, 1)
print("\nstd: ", r2)

r3 = np.var(arr, 1)
print("\nvariance: ", r3)

print('')

arr2=np.array([56, 48, 22, 41, 78, 91, 24, 46, 8, 33])


print('original array: ', arr2)
indices = np.argsort(arr2)
print('indices of sorted array: ', indices)

arr3 = np.array([1, 0, 2, np.nan, 3, 0, 0, 5, 6, 7, np.nan, 0, 8])


print('Original array: ', arr3)

print("\nIndices of elements equal to zero:")


res1 = np.where(arr3 == 0)[0]
print(res1)

print("\nIndices of elements not equal to zero:")


res2 = np.where(arr3 != 0)[0]
print(res2)

print("\nIndices of elements equal to NaN:")


res3 = np.where(np.isnan(arr3))
print(res3)

print('')
m = int(input("Enter the number of rows:"))
n = int(input("Enter the number of columns:"))

print("Enter the entries in a single line (separated by space): ")


entries = list(map(int, input().split()))

matrix = np.array(entries).reshape(m, n)
print('Matrix: ', matrix)

print('Type: ',type(matrix))
print('Shape: ',matrix.shape)
print('Data Type: ',matrix.dtype)

reshaped_matrix = matrix.reshape(n,m)
print('Reshaped Matrix: ', reshaped_matrix)

Practical 3
Create a dataframe having at least 3 columns and 50 rows to store numeric data generated using a random function. Replace 10% of the values by null values whose index positions are generated using a random function. Do the following:
a. Identify and count missing values in the dataframe.
b. Drop the columns having more than 5 null values.
c. Identify the row label having the maximum sum of all values in a row and drop that row.
d. Sort the dataframe on the basis of the first column.
e. Remove all duplicates from the first column.
f. Find the correlation between the first and second columns and the covariance between the second and third columns.
g. Detect the outliers and remove the rows having outliers.
h. Discretize the second column and create 5 bins.

import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randint(0,100,size=(50,3)),
columns=list('123'))
df.head()

for c in df.sample(int(df.shape[0]*df.shape[1]*0.10)).index:
    df.loc[c, str(np.random.randint(1, 4))] = np.nan
df

#part A
print(df.isnull().sum().sum())
#part B
for col in df.columns:
    print(col, df[col].isnull().sum())
df.dropna(axis=1, thresh=(df.shape[0]-5)).head()

#part C
row_sums = df.sum(axis=1)   # avoid shadowing the built-in sum
print("SUM IS :\n", row_sums)
print("\nMAXIMUM SUM IS :", row_sums.max())
max_sum_row = df.sum(axis=1).idxmax()
print("\nRow index having maximum sum is :" ,max_sum_row)

df = df.drop(max_sum_row ,axis =0)


print("\nDATA Frame AFTER REMOVING THE ROW HAVING MAXIMUM SUM VALUE")
df

#part D
sortdf=df.sort_values('1')
sortdf.head()

# PART E
df =df.drop_duplicates(subset='1',keep = "first")
print(df)

#part F
correlation = df['1'].corr(df['2'])
print("CORRELATION between column 1 and 2 : ", correlation)
covariance = df['2'].cov(df['3'])
print("COVARIANCE between column 2 and 3 :",covariance)
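Parts g and h are not covered by the code above. A minimal sketch, assuming the same kind of frame with columns '1', '2', '3' (recreated here so the snippet is self-contained), using the 1.5×IQR rule for outliers and pd.cut for binning:

```python
import pandas as pd
import numpy as np

# Stand-in for the practical's df (same shape and column names).
df = pd.DataFrame(np.random.randint(0, 100, size=(50, 3)), columns=list('123'))

# Part g: flag values outside 1.5*IQR of their column and drop those rows.
q1, q3 = df.quantile(0.25), df.quantile(0.75)
iqr = q3 - q1
mask = ((df < q1 - 1.5 * iqr) | (df > q3 + 1.5 * iqr)).any(axis=1)
df_no_outliers = df[~mask]

# Part h: discretize the second column into 5 equal-width bins.
bins = pd.cut(df['2'], bins=5)
print(bins.value_counts())
```

The IQR rule is one common convention; a z-score threshold would work equally well here.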

Practical 4
Consider two excel files having attendance of a workshop’s participants for two days. Each file has three fields ‘Name’, ‘Time of joining’, ‘Duration’ (in minutes), where names are unique within a file. Note that duration may take one of three values (30, 40, 50) only. Import the data into two dataframes and do the following:
a. Perform merging of the two dataframes to find the names of students who attended the workshop on both days.
b. Find names of all students who attended the workshop on either of the days.
c. Merge two data frames row-wise and find the total number of records in the data frame.
d. Merge two data frames and use the two columns Name and Duration as multi-row indexes. Generate descriptive statistics for this multi-index.

from google.colab import files


uploaded = files.upload()
dfDay1 = pd.read_excel('Day1_deeksha.xlsx')
dfDay2 = pd.read_excel('Day2_deeksha.xlsx')
print(dfDay1.head(),"\n")
print(dfDay2.head())
# Perform merging of the two dataframes to find the names of students who attended the workshop on both days.

pd.merge(dfDay1,dfDay2,how='inner',on='Name')
# Find names of all students who attended the workshop on either of the days.

either_day = pd.merge(dfDay1,dfDay2,how='outer',on='Name')
either_day
# Merge two data frames row-wise (concatenate) and find the total number of records.

row_wise = pd.concat([dfDay1, dfDay2])
len(row_wise)
# Merge two data frames and use the two columns Name and Duration as multi-row indexes.
# Generate descriptive statistics for this multi-index.

both_days = pd.merge(dfDay1, dfDay2, how='outer', on=['Name', 'Duration']).copy()  # work on a copy of the merged frame

both_days.fillna(value='-', inplace=True)  # fill the missing values in the merged frame

both_days.set_index(['Name', 'Duration'])  # set the two columns as a multi-row index
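Part d also asks for descriptive statistics, which the cell above never generates; set_index only builds the multi-index. A sketch with made-up attendance data standing in for the two Excel files:

```python
import pandas as pd

# Hypothetical attendance frames standing in for Day1/Day2 Excel files.
dfDay1 = pd.DataFrame({'Name': ['Asha', 'Ravi', 'Meena'],
                       'Time of joining': ['09:00', '09:10', '09:05'],
                       'Duration': [30, 40, 50]})
dfDay2 = pd.DataFrame({'Name': ['Ravi', 'Meena', 'Sunil'],
                       'Time of joining': ['09:02', '09:15', '09:00'],
                       'Duration': [40, 50, 30]})

merged = pd.merge(dfDay1, dfDay2, how='outer', on=['Name', 'Duration'])
indexed = merged.set_index(['Name', 'Duration'])
print(indexed.describe())   # descriptive statistics over the multi-indexed frame
```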

Practical 5
Taking Iris data, plot the following with proper legend and axis labels (download IRIS data from https://archive.ics.uci.edu/ml/datasets/iris or import it from sklearn.datasets):

a. Plot bar chart to show the frequency of each class label in the data.
b. Draw a scatter plot for Petal width vs sepal width.
c. Plot density distribution for feature petal length.
d. Use a pair plot to show pairwise bivariate distribution in the Iris Dataset.

from sklearn import datasets


import pandas as pd
import matplotlib.pyplot as plt

iris=datasets.load_iris()
print(iris.keys(), "\n\n")
df=pd.DataFrame(data=iris.data,columns=iris.feature_names)
df["Species"]=iris.target
print(df)
print("\n\n")
df['Species']=df['Species'].apply(lambda x:iris.target_names[x])
print(df)

print("\n\n")

c=df['Species'].value_counts()
print(c)

c.plot(kind="bar",color="red",edgecolor="green")
plt.title("Frequency of each class label")
plt.xlabel("Species")
plt.ylabel("Frequency")
plt.plot()

print("\n\n")

b = plt.scatter(df['petal width (cm)'], df['sepal width (cm)'], c=iris.target)
plt.title("Petal width vs Sepal Width")
plt.xlabel("Petal Width")
plt.ylabel("Sepal Width")
names=[x for x in iris.target_names]
plt.legend(handles=b.legend_elements()[0],labels=names)
plt.grid()
plt.show()
print("\n\n")

df['petal length (cm)'].plot(kind="density")


plt.title("Density distribution for petal length")
plt.xlabel("Petal length")
plt.plot()

import seaborn as sns


sns.pairplot(df, hue="Species")
plt.show()

Practical 6
Consider any sales training / weather forecasting dataset.
a. Compute the mean of a series grouped by another series.
b. Fill an intermittent time series to replace all missing dates with values of the previous non-missing date.
c. Perform appropriate year-month string to dates conversion.
d. Split a dataset to group by two columns and then sort the aggregated results within the groups.
e. Split a given dataframe into groups with bin counts.

import pandas as pd
import numpy as np
from dateutil.parser import parse

df1 = pd.read_csv("/content/weatherHistory.csv")
df1.head()
df1.groupby('Summary')['Temperature (C)'].mean()                        # part a
df1['Formatted Date'] = df1['Formatted Date'].map(lambda d: parse(d))   # part c
df1.groupby(['Summary', 'Temperature (C)']).agg({'Humidity': sum})      # part d
groups = df1.groupby(["Summary", pd.cut(df1.Humidity, 3)])              # part e
groups.size().unstack()
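Part b (filling an intermittent time series from the previous non-missing date) is missing from the cell above. A minimal sketch on a hypothetical three-point series:

```python
import pandas as pd

# Hypothetical intermittent series standing in for part b's data.
s = pd.Series([10.0, 20.0, 30.0],
              index=pd.to_datetime(['2021-01-01', '2021-01-04', '2021-01-06']))

# Reindex onto a complete daily range, then forward-fill each gap
# with the value of the previous non-missing date.
full = s.asfreq('D').ffill()
print(full)
```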

Practical 7
Consider a data frame containing data about students, i.e. name, birth month, gender and passing division:

Name Birth_Month Gender Pass_Division

0 Mudit Chauhan December M III
1 Seema Chopra January F II
2 Rani Gupta March F I
3 Aditya Narayan October M I
4 Sanjeev Sahni February M II
5 Prakash Kumar December M III
6 Ritu Agarwal September F I
7 Akshay Goel August M I
8 Meeta Kulkarni July F II
9 Preeti Ahuja November F II
10 Sunil Das Gupta April M III
11 Sonali Sapre January F I
12 Rashmi Talwar June F III
13 Ashish Dubey May M II
14 Kiran Sharma February F II
15 Sameer Bansal October M I
a. Perform one hot encoding of the last two columns of categorical data using the
get_dummies() function.
b. Sort this data frame on the “Birth Month” column (i.e. January to December). Hint:
Convert Month to Categorical.

data = [['Mudit Chauhan', 'December', 'M', 'III'],
        ['Seema Chopra', 'January', 'F', 'II'],
        ['Rani Gupta', 'March', 'F', 'I'],
        ['Aditya Narayan', 'October', 'M', 'I'],
        ['Sanjeev Sahni', 'February', 'M', 'II'],
        ['Prakash Kumar', 'December', 'M', 'III'],
        ['Ritu Agarwal', 'September', 'F', 'I'],
        ['Akshay Goel', 'August', 'M', 'I'],
        ['Meeta Kulkarni', 'July', 'F', 'II'],
        ['Preeti Ahuja', 'November', 'F', 'II'],
        ['Sunil Das Gupta', 'April', 'M', 'III'],
        ['Sonali Sapre', 'January', 'F', 'I'],
        ['Rashmi Talwar', 'June', 'F', 'III'],
        ['Ashish Dubey', 'May', 'M', 'II'],
        ['Kiran Sharma', 'February', 'F', 'II'],
        ['Sameer Bansal', 'October', 'M', 'I']]

df = pd.DataFrame(data, columns=['Name', 'Birth-Month', 'Gender', 'Pass-Division'])
df
pd.get_dummies(data=df, columns=['Gender', 'Pass-Division'])
months = ["January", "February", "March", "April", "May", "June",
"July", "August", "September", "October", "November",
"December"]
df.sort_values('Birth-Month', key=lambda x: pd.Categorical(x, categories=months, ordered=True))

Practical 8
Consider the following data frame containing a family name, gender of the family member and her/his monthly income in each record.

Name Gender MonthlyIncome (Rs.)


Shah Male 114000.00
Vats Male 65000.00
Vats Female 43150.00
Kumar Female 69500.00
Vats Female 155000.00
Kumar Male 103000.00
Shah Male 55000.00
Shah Female 112400.00
Kumar Female 81030.00
Vats Male 71900.00
Write a program in Python using Pandas to perform the following:
a. Calculate and display familywise gross monthly income.
b. Calculate and display the member with the highest monthly income in a family.
c. Calculate and display the monthly income of all members with income greater than Rs. 60000.00.
d. Calculate and display the average monthly income of the female members in the Shah family.

data = [['Shah', 'Male', 114000.00],
        ['Vats', 'Male', 65000.00],
        ['Vats', 'Female', 43150.00],
        ['Kumar', 'Female', 69500.00],
        ['Vats', 'Female', 155000.00],
        ['Kumar', 'Male', 103000.00],
        ['Shah', 'Male', 55000.00],
        ['Shah', 'Female', 112400.00],
        ['Kumar', 'Female', 81030.00],
        ['Vats', 'Male', 71900.00]]

df = pd.DataFrame(data, columns=['Name', 'Gender', 'MonthlyIncome'])
df

df.groupby('Name')['MonthlyIncome'].sum()   # part a: familywise gross monthly income

df.groupby('Name')['MonthlyIncome'].max()   # part b: highest monthly income per family

df[df.MonthlyIncome > 60000]                # part c: members earning more than Rs. 60000

res = df.groupby(['Name', 'Gender'])['MonthlyIncome'].mean()
res

res.loc[('Shah', 'Female')]                 # part d: average income of Shah family females

Practical Assignment 1
Consider the following DataFrame EXERCISE to answer the given questions
where ‘Kind’ attribute indicates the type of exercise regime followed.

ID Name Diet Pulse Time (min) Kind
0 A low fat 85 40 walking
1 A low fat 85 45 walking
2 A no fat 88 30 running
3 B no fat 90 10 walking
4 B no fat 92 15 rest
5 B low fat 93 30 rest
6 C low fat 97 15 rest
7 C low fat 97 15 rest
8 C low fat 94 30 walking
9 D low fat 80 10 walking
10 D low fat 82 15 rest
11 D low fat 83 30 rest
12 E no fat 91 10 rest
13 E low fat 92 15 running
14 E low fat 91 30 running

a) Create a new DataFrame SELECTED having a hierarchical index on columns “Name” and “Diet”. Then, find the maximum pulse rate for each individual in the SELECTED DataFrame.
b) Count the total number of records of individuals having names ‘A’ or ‘B’ and who are following a low fat diet plan from the data frame SELECTED created in part (a). Also, sort DataFrame SELECTED on the index at the first level in descending order.
c) Using DataFrame EXERCISE, create a figure with two subplots and save the figure with the name ‘exerciseplot.jpeg’. Set the title of the figure as ‘EXERCISE’. The first subplot compares the average pulse rate of individuals and the second subplot shows the relationship between variables ‘Pulse’ and ‘Time’. Do colour encoding using variable ‘Kind’ in the scatter plot.

data = np.array([[0, 'A', 'low fat', 85, 40, 'walking'],
                 [1, 'A', 'low fat', 85, 45, 'walking'],
                 [2, 'A', 'no fat', 88, 30, 'running'],
                 [3, 'B', 'no fat', 90, 10, 'walking'],
                 [4, 'B', 'no fat', 92, 15, 'rest'],
                 [5, 'B', 'low fat', 93, 30, 'rest'],
                 [6, 'C', 'low fat', 97, 15, 'rest'],
                 [7, 'C', 'low fat', 97, 15, 'rest'],
                 [8, 'C', 'low fat', 94, 30, 'walking'],
                 [9, 'D', 'low fat', 80, 10, 'walking'],
                 [10, 'D', 'low fat', 82, 15, 'rest'],
                 [11, 'D', 'low fat', 83, 30, 'rest'],
                 [12, 'E', 'no fat', 91, 10, 'rest'],
                 [13, 'E', 'low fat', 92, 15, 'running'],
                 [14, 'E', 'low fat', 91, 30, 'running']])

exercise = pd.DataFrame(data, columns=('ID', 'Name', 'Diet', 'Pulse', 'Time', 'Kind'))

exercise

exercise['ID'] = exercise['ID'].astype('int32')
exercise['Pulse'] = exercise['Pulse'].astype('int32')
exercise['Time'] = exercise['Time'].astype('int32')

"""question 1"""

selected = exercise.set_index(['Name','Diet'], drop = False)

selected

selected.loc['A','Pulse'].max()

for i in selected['Name'].unique():
    print(i, " ", selected['Pulse'].loc[i].max())

"""Question 2"""

print("A ", (selected.loc['A', 'Diet'] == "low fat").sum())
print("B ", (selected['Diet'].loc['B'] == "low fat").sum())

selected.sort_index(level=1, ascending = False)
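Part (c) of the assignment (the two-subplot figure saved as ‘exerciseplot.jpeg’) has no code above. A sketch using matplotlib, on a hypothetical subset of EXERCISE:

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical subset of the EXERCISE frame, for illustration only.
exercise = pd.DataFrame({'Name': ['A', 'A', 'B', 'B', 'C'],
                         'Pulse': [85, 88, 90, 92, 97],
                         'Time': [40, 30, 10, 15, 15],
                         'Kind': ['walking', 'running', 'walking', 'rest', 'rest']})

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
fig.suptitle('EXERCISE')

# Subplot 1: average pulse rate per individual.
exercise.groupby('Name')['Pulse'].mean().plot(kind='bar', ax=ax1)
ax1.set_ylabel('Average Pulse')

# Subplot 2: Pulse vs Time, colour-coded by 'Kind' (category codes).
codes = exercise['Kind'].astype('category').cat.codes
ax2.scatter(exercise['Time'], exercise['Pulse'], c=codes)
ax2.set_xlabel('Time (min)')
ax2.set_ylabel('Pulse')

fig.savefig('exerciseplot.jpeg')
```

Saving as JPEG assumes Pillow is available; PNG works out of the box otherwise.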

Practical Assignment 2
Q3 a) Given the following commands to create series sr

import numpy as np
import pandas as pd
sr = pd.Series(['Madhuri', 'AjaySh@rma', 'R@ni', 'Radha', np.nan, 'Smita', '3567'])
Write separate commands to compute the length of each string in the series, replace @ with ‘a’ in all strings in the series, count the occurrences of ‘a’ in each string, change the case of all letters, find all strings with pattern ‘adh’ in them, and find all strings that end with the letter ‘i’.
b) Create a DataFrame of 7 rows and 7 columns containing random integers in the range of 1 to 100. Compute the correlation of each row with the preceding row.
c) Write NumPy code to generate a random list of 100 integers (range of 55 to 150) and identify the index of the largest element and the smallest element. Change this list into a 10 x 10 matrix and replace all diagonal elements with 1.

Q4 Using the data frame EXERCISE provided in Q2, attempt the following questions:
a) What is a map function? Use the map function to convert all values in the ‘Diet’ attribute to uppercase.
b) Assuming the data is stored in a csv file “Exercise.csv”, give appropriate commands to read this file, indexed on ‘Name’ and ‘Diet’, into a dataframe named EXERCISE. Modify this command to read only the first 5 rows of the file. If the file contains millions of records then give the command to read the file in small pieces of uniform size.
c) Differentiate between qcut and cut methods. Use the appropriate method to create 4 bins on the ‘Pulse’ attribute. Store the corresponding bin value of the ‘Pulse’ attribute as a new attribute ‘Pbin’ in the original DataFrame. Display the count of values of each bin.

Ans 3
sr = pd.Series(['Madhuri', 'AjaySh@rma', 'R@ni', 'Radha', np.nan, 'Smita', '3567'])
sr.str.len()
sr = sr.str.replace('@', 'a')
sr
sr.str.count('a')                       # occurrences of 'a' in each string (NaN-safe)
sr.str.swapcase()                       # change the case of all letters
sr[sr.str.contains('adh', na=False)]    # strings containing the pattern 'adh'
print(sr.str.endswith('i'))

df = pd.DataFrame(np.random.randint(1, 100, size=(7, 7)))
df
df.corrwith(df.shift(), axis=1)         # correlation of each row with the preceding row

rand_num = np.random.randint(55, 151, 100)    # 100 integers in the range 55 to 150
print(rand_num.argmax(), rand_num.argmin())   # indices of largest and smallest elements
rand_num = rand_num.reshape(10, 10)
np.fill_diagonal(rand_num, 1)
rand_num

Ans 4
exercise['Diet'] = list(map(lambda x: x.upper(), exercise['Diet']))
exercise

import sys
exercise.to_csv(sys.stdout, index=False,
                columns=['ID', 'Name', 'Diet', 'Pulse', 'Time', 'Kind'])

parsed = pd.read_csv('examples/csv_mindex.csv', index_col=['Name', 'Diet'])
parsed
pd.read_csv('examples/csv_mindex.csv', index_col=['Name', 'Diet'], nrows=5)         # first 5 rows only
pd.read_csv('examples/csv_mindex.csv', index_col=['Name', 'Diet'], chunksize=1000)  # read in uniform pieces
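Part (c) of Q4 is not answered above. In short, cut makes equal-width bins while qcut makes equal-frequency (quantile) bins; a sketch using pd.cut on the Pulse values from the EXERCISE table:

```python
import pandas as pd

# Pulse values taken from the EXERCISE table above.
exercise = pd.DataFrame({'Pulse': [85, 85, 88, 90, 92, 93, 97, 97, 94,
                                   80, 82, 83, 91, 92, 91]})

# cut: 4 equal-width bins over the Pulse range; qcut(exercise['Pulse'], 4)
# would instead put roughly the same number of rows in each bin.
exercise['Pbin'] = pd.cut(exercise['Pulse'], bins=4)
print(exercise['Pbin'].value_counts())   # count of values in each bin
```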

Practical Assignment 3
Q5
Consider the following DataFrame ADM containing data of freshly admitted students in a
college during various rounds of admission. The DataFrame consists of the student’s name,
cut off list in which he/she has taken admission, date of admission, his/her % of marks,
course code and gender.
Sid Name List DateAdm Marks % CourseCode Gender
S1 Amit Jaiswal I 01-07-2021 97 C001 Male
S2 Pradeep Dubey II 09-07-2021 95 C009 Male
S3 Rinky Arora I 04-07-2021 90 C112 Female
S4 Sonia Shah IV 01-08-2021 96 C001 Female
S5 Sushil Negi III 20-07-2021 96.5 C001 Male
S6 Neeraj Gaur II 11-07-2021 94.5 C009 Male
S7 Preeti Sharma IV 03-08-21 89 C112 Female
S8 Deep Gupta III 23-07-2021 95.75 C001 Male
S9 Priya Bansal II 10-7-2021 93.5 C009 Female
S10 Anand Ahuja I 01-07-2021 88.5 C112 Male

Perform the following:


a) Set the first column ‘Sid’ as the row index of the given DataFrame ADM.
Create a pivot table of the DataFrame to display the total number of admissions as per
‘Course Code’ and ‘Gender’.
b) For each ‘List’, find the total number of admissions, minimum ‘Marks%’ and maximum
‘Marks%’ in each course.
c) Calculate and display the average ‘Marks%’ of all Female students of course ‘C112’.
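Q5 has no accompanying code in this file. A sketch on a hypothetical four-row slice of ADM (the full data is in the table above):

```python
import pandas as pd

# Hypothetical slice of the ADM frame for illustration.
adm = pd.DataFrame({'Sid': ['S1', 'S2', 'S3', 'S4'],
                    'Name': ['Amit Jaiswal', 'Pradeep Dubey', 'Rinky Arora', 'Sonia Shah'],
                    'List': ['I', 'II', 'I', 'IV'],
                    'Marks %': [97, 95, 90, 96],
                    'CourseCode': ['C001', 'C009', 'C112', 'C001'],
                    'Gender': ['Male', 'Male', 'Female', 'Female']})
adm = adm.set_index('Sid')                      # part (a): Sid as the row index

# Part (a): pivot table of admission counts per CourseCode and Gender.
pivot = adm.pivot_table(index='CourseCode', columns='Gender',
                        values='Marks %', aggfunc='count')
print(pivot)

# Part (b): per List and CourseCode, count of admissions plus min/max Marks %.
print(adm.groupby(['List', 'CourseCode'])['Marks %'].agg(['count', 'min', 'max']))

# Part (c): average Marks % of Female students of course C112.
avg_female_c112 = adm[(adm.Gender == 'Female') & (adm.CourseCode == 'C112')]['Marks %'].mean()
print(avg_female_c112)
```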

Q6
a) Give Pandas statements to create two data series of random floating-point
numbers where the first data series has a datetime index of all second
Tuesdays of every month of 2021 and the second data series has a datetime
index of 20 continuous dates ending at 31/01/2021.
b) What is resampling? Write python code depicting the usage of resample
method.
c) Create a DataFrame DS with two columns ‘Dates’ and ‘Sale’ containing all
dates of January 2021 and 31 random integers between 500 and 1000
respectively. Add another column ‘Moving Avg’ to DS containing the
rolling average of 5 consecutive values in the ‘Sale’ column. Plot simple
line plots between ‘Dates’ and ‘Sale’ as well as ‘Dates’ and ‘Moving Avg’.
Explain the utility of the rolling method with respect to these plots.
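Q6 likewise has no accompanying code. A sketch under the stated assumptions (random data; 'WOM-2TUE' is pandas' offset alias for the second Tuesday of each month):

```python
import pandas as pd
import numpy as np

# Part (a): series indexed on all second Tuesdays of 2021, and a series
# of 20 continuous dates ending at 31/01/2021.
idx = pd.date_range('2021-01-01', '2021-12-31', freq='WOM-2TUE')
tue2 = pd.Series(np.random.rand(len(idx)), index=idx)
s2 = pd.Series(np.random.rand(20), index=pd.date_range(end='2021-01-31', periods=20))

# Part (c): January 2021 sales with a 5-value moving average.
ds = pd.DataFrame({'Dates': pd.date_range('2021-01-01', '2021-01-31'),
                   'Sale': np.random.randint(500, 1001, 31)})
ds['Moving Avg'] = ds['Sale'].rolling(window=5).mean()

# Part (b): resampling converts a time series to another frequency,
# e.g. daily sales aggregated to weekly means.
print(ds.set_index('Dates')['Sale'].resample('W').mean())

# Line plots of Sale and its moving average against Dates; the rolling
# mean smooths day-to-day noise so the trend is easier to read.
ax = ds.plot(x='Dates', y=['Sale', 'Moving Avg'])
```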
