DAV Practical
def list_of_dict(Heights):
    keys = Heights.keys()
    values = zip(*[Heights[k] for k in keys])
    result = [dict(zip(keys, v)) for v in values]
    return result
Practical 2
Write programs in Python using NumPy library to do the following:
a. Compute the mean, standard deviation, and variance of a two-dimensional
random integer array along the second axis.
b. Get the indices of the sorted elements of a given array,
e.g. B = [56, 48, 22, 41, 78, 91, 24, 46, 8, 33].
c. Create a 2-dimensional array of m x n integer elements; also print the
shape, type and data type of the array, and then reshape it into an n x m
array, where n and m are user inputs given at run time.
d. Test whether the elements of a given array are zero, non-zero and NaN.
Record the indices of these elements in three separate arrays.
import numpy as np
arr = np.array([[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]])
print(arr)
r1 = np.mean(arr, 1)
print("\nMean: ", r1)
r2 = np.std(arr, 1)
print("\nstd: ", r2)
r3 = np.var(arr, 1)
print("\nvariance: ", r3)
print('')
print('')
m = int(input("Enter the number of rows: "))
n = int(input("Enter the number of columns: "))
entries = list(map(int, input("Enter the entries separated by space: ").split()))
matrix = np.array(entries).reshape(m, n)
print('Matrix: ', matrix)
print('Type: ', type(matrix))
print('Shape: ', matrix.shape)
print('Data Type: ', matrix.dtype)
reshaped_matrix = matrix.reshape(n, m)
print('Reshaped Matrix: ', reshaped_matrix)
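Parts (b) and (d) are not covered by the code above; a self-contained sketch using np.argsort, np.where, and np.isnan (the array values for part d are illustrative):

```python
import numpy as np

# Part b: indices that would sort the given array
B = np.array([56, 48, 22, 41, 78, 91, 24, 46, 8, 33])
sorted_indices = np.argsort(B)
print("Indices of sorted elements:", sorted_indices)

# Part d: record the indices of zero, non-zero and NaN elements
# in three separate arrays
A = np.array([0.0, 1.5, np.nan, 3.0, 0.0, np.nan, 7.0])
zero_idx = np.where(A == 0)[0]
nonzero_idx = np.where((A != 0) & ~np.isnan(A))[0]
nan_idx = np.where(np.isnan(A))[0]
print("Zero indices:", zero_idx)
print("Non-zero indices:", nonzero_idx)
print("NaN indices:", nan_idx)
```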
Practical 3
Create a dataframe having at least 3 columns and 50 rows to store
numeric data generated using a random
function. Replace 10% of the values by null values whose index
positions are generated using random function.
Do the following:
a. Identify and count missing values in a dataframe.
b. Drop the column having more than 5 null values.
c. Identify the row label having maximum of the sum of all values in a
row and drop that row.
d. Sort the dataframe on the basis of the first column.
e. Remove all duplicates from the first column.
f. Find the correlation between first and second column and covariance
between second and third
column.
g. Detect the outliers and remove the rows having outliers.
h. Discretize second column and create 5 bins
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randint(0,100,size=(50,3)),
columns=list('123'))
df.head()
# Replace ~10% of the 150 values (15 cells) with NaN at random positions
for c in df.sample(int(df.shape[0] * df.shape[1] * 0.10)).index:
    df.loc[c, str(np.random.randint(1, 4))] = np.nan
df
#part A
print(df.isnull().sum().sum())
#part B
for col in df.columns:
    print(col, df[col].isnull().sum())
df.dropna(axis=1, thresh=(df.shape[0] - 5)).head()
#part C
row_sums = df.sum(axis=1)  # avoid shadowing the built-in sum
print("SUM IS :\n", row_sums)
print("\nMAXIMUM SUM IS :", row_sums.max())
max_sum_row = row_sums.idxmax()
print("\nRow index having maximum sum is :", max_sum_row)
df = df.drop(max_sum_row)  # drop that row, as the question asks
#part D
sortdf=df.sort_values('1')
sortdf.head()
# PART E
df = df.drop_duplicates(subset='1', keep="first")
print(df)
#part F
correlation = df['1'].corr(df['2'])
print("CORRELATION between column 1 and 2 : ", correlation)
covariance = df['2'].cov(df['3'])
print("COVARIANCE between column 2 and 3 :",covariance)
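Parts (g) and (h) have no code above; a self-contained sketch (it rebuilds a fresh df so it runs on its own) using the common 1.5×IQR rule for outlier detection and pd.cut for equal-width binning:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randint(0, 100, size=(50, 3)), columns=list('123'))

# Part g: flag rows where any value lies outside 1.5*IQR of its column,
# then drop those rows
Q1, Q3 = df.quantile(0.25), df.quantile(0.75)
IQR = Q3 - Q1
mask = ((df < Q1 - 1.5 * IQR) | (df > Q3 + 1.5 * IQR)).any(axis=1)
df_no_outliers = df[~mask]
print("Rows dropped as outliers:", int(mask.sum()))

# Part h: discretize the second column into 5 equal-width bins
df['2_binned'] = pd.cut(df['2'], bins=5)
print(df['2_binned'].value_counts())
```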
Practical 4
Consider two excel files having attendance of a workshop’s participants for two days. Each
file has three
fields ‘Name’, ‘Time of joining’, duration (in minutes) where names are unique within a file.
Note that duration
may take one of three values (30, 40, 50) only. Import the data into two dataframes and do
the following:
a. Perform merging of the two dataframes to find the names of students who had attended
the
workshop on both days.
b. Find names of all students who have attended workshop on either of the days.
c. Merge two data frames row-wise and find the total number of records in the data frame.
d. Merge two data frames and use two columns names and duration as multi-row indexes.
Generate descriptive statistics for this multi-index.
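The merge snippets below need the two dataframes loaded first. Since the Excel files are not included with the handout, a self-contained sketch with made-up attendance records (the file names in the comment are hypothetical):

```python
import pandas as pd

# In practice:
#   dfDay1 = pd.read_excel('day1.xlsx'); dfDay2 = pd.read_excel('day2.xlsx')
# Sample data standing in for the two Excel files (values are illustrative);
# names are unique within a file and duration is one of 30, 40, 50
dfDay1 = pd.DataFrame({'Name': ['Asha', 'Ravi', 'Meena'],
                       'Time of joining': ['10:00', '10:05', '10:10'],
                       'Duration': [30, 40, 50]})
dfDay2 = pd.DataFrame({'Name': ['Ravi', 'Meena', 'Karan'],
                       'Time of joining': ['10:02', '10:00', '10:15'],
                       'Duration': [40, 30, 50]})
```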
#Find the names of students who attended the workshop on both days.
pd.merge(dfDay1, dfDay2, how='inner', on='Name')
#Find names of all students who have attended workshop on either of the days.
either_day = pd.merge(dfDay1,dfDay2,how='outer',on='Name')
either_day
#Merge two data frames row-wise and find the total number of records in the data frame.
row_wise = pd.concat([dfDay1, dfDay2], axis=0, ignore_index=True)
len(row_wise)
# Merge two data frames and use the two columns Name and Duration as multi-row indexes.
# Generate descriptive statistics for this multi-index.
both_days = pd.merge(dfDay1, dfDay2, how='outer', on=['Name', 'Duration'])
both_days = both_days.set_index(['Name', 'Duration'])
both_days.describe()
Practical 5
Taking Iris data, plot the following with proper legend and axis labels: (Download IRIS data
from:
https://archive.ics.uci.edu/ml/datasets/iris or import it from sklearn.datasets)
a. Plot bar chart to show the frequency of each class label in the data.
b. Draw a scatter plot for Petal width vs sepal width.
c. Plot density distribution for feature petal length.
d. Use a pair plot to show pairwise bivariate distribution in the Iris Dataset.
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import datasets

iris = datasets.load_iris()
print(iris.keys(), "\n\n")
df=pd.DataFrame(data=iris.data,columns=iris.feature_names)
df["Species"]=iris.target
print(df)
print("\n\n")
df['Species']=df['Species'].apply(lambda x:iris.target_names[x])
print(df)
print("\n\n")
c=df['Species'].value_counts()
print(c)
c.plot(kind="bar",color="red",edgecolor="green")
plt.title("Frequency of each class label")
plt.xlabel("Species")
plt.ylabel("Frequency")
plt.show()
print("\n\n")
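Parts (b)–(d) are missing above; a self-contained sketch (it reloads the Iris data so it runs on its own). The pair plot here uses pandas' scatter_matrix; seaborn's pairplot(df, hue='Species') is a common alternative:

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; drop this line when running interactively
import matplotlib.pyplot as plt
import pandas as pd
from sklearn import datasets

iris = datasets.load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df['Species'] = [iris.target_names[t] for t in iris.target]

# Part b: scatter plot of petal width vs sepal width
plt.scatter(df['sepal width (cm)'], df['petal width (cm)'])
plt.xlabel('Sepal width (cm)')
plt.ylabel('Petal width (cm)')
plt.title('Petal width vs Sepal width')
plt.show()

# Part c: density distribution of petal length
df['petal length (cm)'].plot(kind='density')
plt.xlabel('Petal length (cm)')
plt.show()

# Part d: pairwise bivariate distributions in the Iris dataset
pd.plotting.scatter_matrix(df, figsize=(10, 10))
plt.show()
```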
Practical 6
Consider any sales training/weather forecasting dataset
a. Compute mean of a series grouped by another series
b. Fill an intermittent time series to replace all missing dates with values of previous non-
missing date.
c. Perform appropriate year-month string to dates conversion.
d. Split a dataset to group by two columns and then sort the aggregated results within the
groups.
e. Split a given dataframe into groups with bin counts.
import pandas as pd
import numpy as np
from dateutil.parser import parse

df1 = pd.read_csv("/content/weatherHistory.csv")
df1.head()
# Part a: mean of a series grouped by another series
df1.groupby('Summary')['Temperature (C)'].mean()
# Part c: year-month string to date conversion
df1['Formatted Date'] = df1['Formatted Date'].map(lambda d: parse(d))
# Part d: group by two columns, then sort the aggregated results within each group
(df1.groupby(['Summary', 'Temperature (C)'])
    .agg({'Humidity': 'sum'})
    .groupby(level='Summary', group_keys=False)
    .apply(lambda g: g.sort_values('Humidity', ascending=False)))
groups = df1.groupby(["Summary", pd.cut(df1.Humidity, 3)])
groups.size().unstack()
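Part (b) has no code above; a self-contained sketch on a small made-up series (the weatherHistory.csv data is not needed to show the technique), using asfreq to reinsert the missing dates and ffill to carry each previous non-missing value forward:

```python
import pandas as pd

# Intermittent daily series with gaps on Jan 2 and Jan 4
s = pd.Series([10.0, 20.0, 30.0],
              index=pd.to_datetime(['2021-01-01', '2021-01-03', '2021-01-05']))

# Reinsert the missing dates at daily frequency, then fill each gap
# with the value of the previous non-missing date
filled = s.asfreq('D').ffill()
print(filled)
```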
Practical 7
Consider a data frame containing data about students i.e. name, gender and passing
division:
Practical 8
Consider the following data frame containing a family name, gender of the family member
and her/his monthly
income in each record.
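The dataframe itself is not reproduced in this copy; a sketch with assumed sample records (family names, genders, and incomes are made up, with 'Vats' included so the slice res['Vats':] below has something to match):

```python
import pandas as pd

# Hypothetical family-income records standing in for the missing table
df = pd.DataFrame({
    'Name':          ['Sharma', 'Sharma', 'Vats', 'Vats', 'Sharma', 'Vats'],
    'Gender':        ['Male', 'Female', 'Male', 'Female', 'Male', 'Female'],
    'MonthlyIncome': [44000.0, 65000.0, 43150.0, 66500.0, 62500.0, 34000.0],
})
```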
df.groupby('Name').sum(numeric_only=True)
df.groupby('Name').max()
res = df.groupby(["Name", "Gender"]).mean(numeric_only=True)
res
res['Vats':]
Practical Assignment 1
Consider the following DataFrame EXERCISE to answer the given questions
where ‘Kind’ attribute indicates the type of exercise regime followed.
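The EXERCISE table is not reproduced in this copy; a sketch with assumed sample rows (IDs, names, readings, and Kind values are made up) so the astype and loc commands below have data to run on:

```python
import pandas as pd

# Hypothetical EXERCISE data; 'Kind' is the exercise regime followed
exercise = pd.DataFrame({
    'ID':    [1, 2, 3, 4, 5, 6],
    'Name':  ['A', 'A', 'B', 'B', 'C', 'C'],
    'Diet':  ['low fat', 'no fat', 'low fat', 'no fat', 'low fat', 'no fat'],
    'Pulse': [85, 90, 88, 97, 93, 110],
    'Time':  [1, 15, 1, 15, 1, 15],
    'Kind':  ['rest', 'walking', 'rest', 'running', 'rest', 'walking'],
})
```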
exercise
exercise['ID'] = exercise['ID'].astype('int32')
exercise['Pulse'] = exercise['Pulse'].astype('int32')
exercise['Time'] = exercise['Time'].astype('int32')
"""question 1"""
selected = exercise.set_index('Name')  # 'selected' was undefined in the handout; Name-indexing is assumed
selected
selected.loc['A', 'Pulse'].max()
for i in selected.index.unique():
    print(i, " ", selected.loc[i, 'Pulse'].max())
"""Question 2"""
Practical assignment 2
Q3 a) Given the following commands to create series sr
import numpy as np
import pandas as pd
sr = pd.Series(['Madhuri', 'AjaySh@rma', 'R@ni', 'Radha', np.nan, 'Smita', '3567'])
Write separate commands to compute the length of each string in the
series, replace @ with ‘a’ in all strings in the series, count the
occurrences of ‘a’ in each string, change the case of all letters, find
all
strings with pattern ‘adh’ in them and find all strings that end with
letter ‘i’.
b) Create a DataFrame of 7 rows and 7 columns containing random
integers in the range of 1 to 100. Compute the correlation of each row
with the preceding row.
c) Write Numpy code to generate a random list of 100 integers (range of
55 to 150) and identify the index of the largest element and smallest
element. Change this list into a 10 x 10 matrix and replace all
diagonal
elements with 1.
Ans3
sr = pd.Series(['Madhuri', 'AjaySh@rma', 'R@ni', 'Radha', np.nan, 'Smita', '3567'])
sr.str.len()
sr = sr.str.replace('@', 'a')
sr
sr.str.count('a')       # occurrences of 'a' in each string (NaN-safe)
sr.str.swapcase()       # change the case of all letters
print(sr[sr.str.contains('adh', na=False)])  # strings with pattern 'adh'
print(sr.str.endswith('i'))
df=pd.DataFrame(np.random.randint(1,100,size=(7,7)))
df
# Correlation of each row with the preceding row (shift moves each row down by one)
df.corrwith(df.shift(1), axis=1)
rand_num = np.random.randint(55, 151, 100)  # 100 random integers in [55, 150]
rand_num
print("Index of largest element:", rand_num.argmax())
print("Index of smallest element:", rand_num.argmin())
rand_num = rand_num.reshape(10, 10)
np.fill_diagonal(rand_num, 1)
rand_num
Ans 4
exercise['Diet'] = list(map(lambda x: x.upper(), exercise['Diet']))
exercise
import sys
exercise.to_csv(sys.stdout, index=False,
                columns=['ID', 'Name', 'Diet', 'Pulse', 'Time', 'Kind'])
parsed = pd.read_csv('examples/csv_mindex.csv',
                     index_col=['Name', 'Diet'])
parsed
Practical assignment 3
Q5
Consider the following DataFrame ADM containing data of freshly admitted students in a
college during various rounds of admission. The DataFrame consists of the student’s name,
cut off list in which he/she has taken admission, date of admission, his/her % of marks,
course code and gender.
Sid Name List DateAdm Marks % CourseCode Gender
S1 Amit Jaiswal I 01-07-2021 97 C001 Male
S2 Pradeep Dubey II 09-07-2021 95 C009 Male
S3 Rinky Arora I 04-07-2021 90 C112 Female
S4 Sonia Shah IV 01-08-2021 96 C001 Female
S5 Sushil Negi III 20-07-2021 96.5 C001 Male
S6 Neeraj Gaur II 11-07-2021 94.5 C009 Male
S7 Preeti Sharma IV 03-08-21 89 C112 Female
S8 Deep Gupta III 23-07-2021 95.75 C001 Male
S9 Priya Bansal II 10-7-2021 93.5 C009 Female
S10 Anand Ahuja I 01-07-2021 88.5 C112 Male
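The questions for Q5 are not included in this copy, but the ADM dataframe from the table above can be built as follows (dates are kept exactly as printed, including the two inconsistently formatted ones):

```python
import pandas as pd

ADM = pd.DataFrame({
    'Sid': ['S1', 'S2', 'S3', 'S4', 'S5', 'S6', 'S7', 'S8', 'S9', 'S10'],
    'Name': ['Amit Jaiswal', 'Pradeep Dubey', 'Rinky Arora', 'Sonia Shah',
             'Sushil Negi', 'Neeraj Gaur', 'Preeti Sharma', 'Deep Gupta',
             'Priya Bansal', 'Anand Ahuja'],
    'List': ['I', 'II', 'I', 'IV', 'III', 'II', 'IV', 'III', 'II', 'I'],
    'DateAdm': ['01-07-2021', '09-07-2021', '04-07-2021', '01-08-2021',
                '20-07-2021', '11-07-2021', '03-08-21', '23-07-2021',
                '10-7-2021', '01-07-2021'],
    'Marks %': [97, 95, 90, 96, 96.5, 94.5, 89, 95.75, 93.5, 88.5],
    'CourseCode': ['C001', 'C009', 'C112', 'C001', 'C001', 'C009', 'C112',
                   'C001', 'C009', 'C112'],
    'Gender': ['Male', 'Male', 'Female', 'Female', 'Male', 'Male', 'Female',
               'Male', 'Female', 'Male'],
})
```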
Q6
a) Give Pandas statements to create two data series of random floating-point
numbers where the first data series has a datetime index of all second
Tuesdays of every month of 2021 and the second data series has a datetime
index of 20 continuous dates ending at 31/01/2021.
b) What is resampling? Write python code depicting the usage of resample
method.
c) Create a DataFrame DS with two columns ‘Dates’ and ‘Sale’ containing all
dates of January 2021 and 31 random integers between 500 and 1000
respectively. Add another column ‘Moving Avg’ to DS containing the
rolling average of 5 consecutive values in the ‘Sale’ column. Plot simple
line plots between ‘Dates’ and ‘Sale’ as well as ‘Dates’ and ‘Moving Avg’.
Explain the utility of the rolling method with respect to these plots.
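A sketch for Q6 (the names DS and 'Moving Avg' are from the question; everything else is illustrative). pd.date_range's 'WOM-2TUE' offset alias yields the second Tuesday of each month:

```python
import pandas as pd
import numpy as np
import matplotlib
matplotlib.use('Agg')  # non-interactive backend; drop this line when running interactively

# a) first series: second Tuesday of every month of 2021
s1 = pd.Series(np.random.rand(12),
               index=pd.date_range('2021-01-01', '2021-12-31', freq='WOM-2TUE'))
# second series: 20 continuous dates ending at 31/01/2021
s2 = pd.Series(np.random.rand(20),
               index=pd.date_range(end='2021-01-31', periods=20, freq='D'))

# b) resampling converts a time series to a different frequency and
# aggregates the values, e.g. daily observations to weekly means
weekly_mean = s2.resample('W').mean()

# c) daily sales for January 2021 with a 5-value rolling (moving) average
DS = pd.DataFrame({'Dates': pd.date_range('2021-01-01', periods=31, freq='D'),
                   'Sale': np.random.randint(500, 1001, 31)})
DS['Moving Avg'] = DS['Sale'].rolling(5).mean()
DS.plot(x='Dates', y='Sale')
DS.plot(x='Dates', y='Moving Avg')
```

The rolling method smooths day-to-day fluctuation: the raw 'Sale' line is jagged, while the 'Moving Avg' line (each point the mean of the 5 most recent sales) shows the underlying trend.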