Fundamentals of Data Science Lab Manual
CS3361
Year/Sem: ___________________
DEPARTMENT OF INFORMATION TECHNOLOGY
LIST OF EXPERIMENTS
S.No NAME OF THE EXPERIMENT
4. Frequency distributions
5. Averages
6. Variability
7. Normal curves
9. Correlation coefficient
10. Regression
CONTENTS
5
Develop a Python program for frequency distributions
AIM
ALGORITHM
Step1: Start
Step2: Import numpy module
Step3: Print the basic characteristics and operations of the array
Step4: Stop
PROGRAM
import numpy as np
# Creating array object
arr = np.array([[1, 2, 3],
                [4, 2, 5]])
# Printing type of arr object
print("Array is of type: ", type(arr))
# Printing array dimensions (axes)
print("No. of dimensions: ", arr.ndim)
# Printing shape of array
print("Shape of array: ", arr.shape)
# Printing size (total number of elements) of array
print("Size of array: ", arr.size)
# Printing type of elements in array
print("Array stores elements of type: ", arr.dtype)
OUTPUT
a = np.array([[1,2,3],[3,4,5],[4,5,6]])
print(a)
print("After slicing")
print(a[1:])
Output
[[1 2 3]
 [3 4 5]
 [4 5 6]]
After slicing
[[3 4 5]
 [4 5 6]]
Output:
Our array is:
[[1 2 3]
 [3 4 5]
 [4 5 6]]
The items in the second column are:
[2 4 5]
The items in the second row are:
[3 4 5]
The items column 1 onwards are:
[[2 3]
 [4 5]
 [5 6]]
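The second output above lists a column, a row, and a sub-block of the same 3x3 array, but the listing that produced it is not shown. A minimal sketch that would print those lines, assuming the array `a` from the slicing program and standard NumPy index expressions (the variable names are hypothetical):

```python
import numpy as np

# Same 3x3 array as in the slicing program above
a = np.array([[1, 2, 3], [3, 4, 5], [4, 5, 6]])

print("Our array is:")
print(a)

# All rows, column index 1 (the second column)
second_column = a[:, 1]
print("The items in the second column are:")
print(second_column)

# Row index 1 (the second row), all columns
second_row = a[1, :]
print("The items in the second row are:")
print(second_row)

# All rows, columns from index 1 onwards
from_column_one = a[:, 1:]
print("The items column 1 onwards are:")
print(from_column_one)
```

Indexing with a single integer drops that axis, which is why the column and the row both print as one-dimensional arrays, while the slice `1:` keeps the result two-dimensional.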
Result:
Thus, working with NumPy arrays was successfully completed.
Aim:
ALGORITHM
Step1: Start
Step2: Import the numpy and pandas modules
Step3: Create a DataFrame using the dictionary
Step4: Print the output
Step5: Stop
PROGRAM
import numpy as np
import pandas as pd
data = np.array([['', 'Col1', 'Col2'],
                 ['Row1', 1, 2],
                 ['Row2', 3, 4]])
print(pd.DataFrame(data=data[1:, 1:],
                   index=data[1:, 0],
                   columns=data[0, 1:]))
# Take a 2D array as input to your DataFrame
my_2darray = np.array([[1, 2, 3], [4, 5, 6]])
print(pd.DataFrame(my_2darray))
Output:
     Col1 Col2
Row1    1    2
Row2    3    4
   0  1  2
0  1  2  3
1  4  5  6
   1  2  3
0  1  1  2
1  3  2  4
   A
0  4
1  5
2  6
3  7
                         0
United Kingdom      London
India            New Delhi
United States   Washington
Belgium           Brussels
(2, 3)
2
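Only the first two frames in the output above come from the listing shown. The remaining blocks look like a DataFrame built from a dictionary (Step3 of the algorithm), a one-column frame, a Series of capitals, and a shape and row-count query. The sketch here is one way to produce them; the variable names are hypothetical and the input values are read back from the printed output, so treat it as an assumption rather than the original program.

```python
import numpy as np
import pandas as pd

# DataFrame from a dictionary: keys become column labels (assumed input)
my_dict = {1: ['1', '3'], 2: ['1', '2'], 3: ['2', '4']}
dict_df = pd.DataFrame(my_dict)
print(dict_df)

# One-column DataFrame with an explicit index and column name
single_col_df = pd.DataFrame(data=[4, 5, 6, 7], index=range(4), columns=['A'])
print(single_col_df)

# A Series built from a dictionary, then wrapped in a DataFrame
capitals = pd.Series({"United Kingdom": "London", "India": "New Delhi",
                      "United States": "Washington", "Belgium": "Brussels"})
capitals_df = pd.DataFrame(capitals)
print(capitals_df)

# Shape (rows, columns) and number of rows of a 2x3 frame
df = pd.DataFrame(np.array([[1, 2, 3], [4, 5, 6]]))
print(df.shape)
print(len(df.index))
```

An unnamed Series becomes a frame with a single column labelled 0, which matches the lone `0` header above the list of capitals.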
Result:
Thus, working with Pandas data frames was successfully completed.
Matplotlib
Aim:
ALGORITHM
Step1: Start
Step2: Import the Matplotlib module
Step3: Create basic plots using Matplotlib
Step4: Print the output
Step5: Stop
Program:3a
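import matplotlib.pyplot as plt
# x axis values
x = [1,2,3]
# corresponding y axis values
y = [2,4,1]
# plotting the points and showing the figure
plt.plot(x, y)
plt.show()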
Output:
Program:4b
import matplotlib.pyplot as plt
a = [1, 2, 3, 4, 5]
b = [0, 0.6, 0.2, 15, 10, 8, 16, 21]
plt.plot(a)
# naming the x-axis
plt.xlabel('Day ->')
Output:
Program:4c
import matplotlib.pyplot as plt
a = [1, 2, 3, 4, 5]
b = [0, 0.6, 0.2, 15, 10, 8, 16, 21]
c = [4, 2, 6, 8, 3, 20, 13, 15]
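# use fig whenever you want the
# output in a new window; also
# specify the window size you
# want the answer to be displayed in
fig = plt.figure(figsize =(10, 10))
# The subplot definitions are missing from this listing;
# a 2 x 2 grid is assumed here
sub1 = fig.add_subplot(2, 2, 1)
sub2 = fig.add_subplot(2, 2, 2)
sub4 = fig.add_subplot(2, 2, 4)
sub1.plot(a, 'sb')
sub2.plot(b, 'or')
sub4.plot(c, 'Dm')
plt.show()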
Output:
Result:
Thus, the Python program for basic plots using Matplotlib was successfully completed.
Aim:
To count the frequency of occurrence of words in a body of text, a task that is often needed
during text processing.
ALGORITHM
Program:
from nltk.tokenize import word_tokenize
from nltk.corpus import gutenberg
sample = gutenberg.raw("blake-poems.txt")
token = word_tokenize(sample)
wlist = []
for i in range(50):
    wlist.append(token[i])
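The listing stops after collecting the first 50 tokens. One way to finish the count is with `collections.Counter` from the standard library, sketched here on a short hypothetical sentence so that it runs without downloading the NLTK corpus. With NLTK installed, `nltk.FreqDist(wlist)` plays the same role, since `FreqDist` is a `Counter` subclass.

```python
from collections import Counter

# Hypothetical sample text; in the lab program this would be the token list `wlist`
text = "the cat sat on the mat and the dog sat on the rug"
tokens = text.split()  # simple whitespace tokenizer, standing in for word_tokenize

# Count how often each token occurs
freq = Counter(tokens)

# Print tokens from most to least frequent
for word, count in freq.most_common():
    print(f"{word}: {count}")
```

`most_common()` returns `(word, count)` pairs sorted by descending count, which is the usual shape of a frequency distribution table.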
Result:
Thus, the Python program to count the frequency of occurrence of words in a body of text,
together with the conditional frequency distribution, was successfully completed.
Aim:
To compute weighted averages in Python, either by defining your own functions or by using NumPy.
ALGORITHM
Program:6c
df['employees_number']),2)
weighted_avg_m3
Output:
44225.35
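Only the tail of the listing survives, so this sketch shows both approaches named in the aim: a hand-written function and `numpy.average` with its `weights` argument. The function name, the salary figures and the head counts are hypothetical and are not the data behind the output above.

```python
import numpy as np

def weighted_average(values, weights):
    """Return sum(v * w) / sum(w) for paired values and weights."""
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight

# Hypothetical data: salary per group and number of employees in each group
salaries = [30000, 50000, 80000]
employees = [10, 5, 1]

own_result = round(weighted_average(salaries, employees), 2)
numpy_result = round(float(np.average(salaries, weights=employees)), 2)

print(own_result)
print(numpy_result)
```

Both lines print the same value, because `np.average` with `weights` computes the same ratio of the weighted sum to the total weight.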
Result:
Thus, the computation of weighted averages in Python, either by defining your own functions or
by using NumPy, was successfully completed.
Aim:
To write a Python program to calculate the variance.
ALGORITHM
Program:
# Python code to demonstrate variance()
# function on varying range of data-types
# importing statistics module
from statistics import variance
# Print the variance of each sample
print("Variance of Sample1 is % s " % (variance(sample1)))
print("Variance of Sample2 is % s " % (variance(sample2)))
print("Variance of Sample3 is % s " % (variance(sample3)))
print("Variance of Sample4 is % s " % (variance(sample4)))
print("Variance of Sample5 is % s " % (variance(sample5)))
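The sample definitions are missing from the listing, so here is a self-contained sketch with a hypothetical sample. It compares `statistics.variance` (sample variance, divisor n - 1) with a hand-written version and with `statistics.pvariance` (population variance, divisor n). The helper name `sample_variance` is an assumption for illustration.

```python
from statistics import mean, pvariance, variance

def sample_variance(data):
    """Sample variance: squared deviations from the mean, divided by n - 1."""
    if len(data) < 2:
        raise ValueError("need at least two data points")
    m = mean(data)
    return sum((x - m) ** 2 for x in data) / (len(data) - 1)

# Hypothetical sample (not from the lab manual)
sample = [2, 4, 4, 4, 5, 5, 7, 9]

print("Sample variance (statistics):", variance(sample))
print("Sample variance (by hand):  ", sample_variance(sample))
print("Population variance:        ", pvariance(sample))
```

For this sample the mean is 5 and the squared deviations sum to 32, so the sample variance is 32/7 and the population variance is 32/8 = 4.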
Output :
Result:
Thus, the computation of variance was successfully completed.
Ex. No.:8 Normal Curve
Aim:
To create a normal curve using a Python program.
ALGORITHM
Program:
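import matplotlib.pyplot as plt
import seaborn as sb
# data (x values) and pdf (density values) must be defined before this point
sb.set_style('whitegrid')
sb.lineplot(x=data, y=pdf, color='black')
plt.xlabel('Heights')
plt.ylabel('Probability Density')
plt.show()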
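The plotting lines above rely on `data` (the x values) and `pdf` (the density at each x) already existing. A minimal sketch of how they could be computed with NumPy alone, using the normal density f(x) = exp(-(x - mu)^2 / (2 sigma^2)) / (sigma sqrt(2 pi)). The mean 5.3, the standard deviation 1, the plotting range and the helper names are hypothetical choices.

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    """Density of the normal distribution N(mu, sigma^2) evaluated at x."""
    z = (x - mu) / sigma
    return np.exp(-0.5 * z ** 2) / (sigma * np.sqrt(2.0 * np.pi))

# Hypothetical parameters: heights with mean 5.3 and standard deviation 1
mu, sigma = 5.3, 1.0
data = np.arange(1, 10, 0.01)      # x values for the curve
pdf = normal_pdf(data, mu, sigma)  # matching y values

def plot_normal_curve(x, y):
    """Draw the curve; matplotlib is imported here so the maths runs without it."""
    import matplotlib.pyplot as plt
    plt.plot(x, y, color='black')
    plt.xlabel('Heights')
    plt.ylabel('Probability Density')
    plt.show()

# plot_normal_curve(data, pdf)  # uncomment to display the figure
```

The curve peaks at the mean, and the area under it over this range is very close to 1, which is a quick sanity check on the formula.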
Output:
Result:
Thus, the Python program to create a normal curve was successfully completed.
Ex. No.: 9 Correlation and scatter plots
Aim:
To write a Python program for correlation with a scatter plot.
ALGORITHM
Program:
# Data
#Plot
# Plot
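Only the comment lines of this program survive. As a stand-in, this sketch builds hypothetical data, computes the Pearson correlation with `numpy.corrcoef`, and wraps the scatter plot in a function. None of the names or values come from the original listing.

```python
import numpy as np

# Data (hypothetical): y rises roughly linearly with x
x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 12.2, 13.8, 16.1])

# Pearson correlation: off-diagonal entry of the 2x2 correlation matrix
r = float(np.corrcoef(x, y)[0, 1])
print("Correlation coefficient:", round(r, 4))

def scatter_plot(x, y, r):
    """Plot the points; matplotlib is imported lazily so the numbers run without it."""
    import matplotlib.pyplot as plt
    plt.scatter(x, y)
    plt.title(f"Scatter plot (r = {r:.3f})")
    plt.xlabel('x')
    plt.ylabel('y')
    plt.show()

# scatter_plot(x, y, r)  # uncomment to display the figure
```

Points that lie close to a rising straight line give a coefficient near +1; a cloud with no trend would give a value near 0.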
Result:
Thus, the Python program for correlation and scatter plots was successfully completed.
Ex. No.: 10 Correlation coefficient
Aim:
To write a Python program to compute the correlation coefficient.
ALGORITHM
Program:
i = 0
while i < n :
    # sum of elements of array X.
    sum_X = sum_X + X[i]
    i = i + 1
# Driver function
X = [15, 18, 21, 24, 27]
Y = [25, 25, 27, 31, 32]
Output :
0.953463
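The loop in the listing is only a fragment of the computation. A complete sketch using the standard sum-based formula r = (n Σxy - Σx Σy) / sqrt((n Σx² - (Σx)²)(n Σy² - (Σy)²)) follows. The function name is hypothetical; the `X` and `Y` values are the ones from the listing.

```python
import math

def correlation_coefficient(X, Y):
    """Pearson correlation coefficient from running sums."""
    n = len(X)
    if n != len(Y) or n == 0:
        raise ValueError("X and Y must be non-empty and of equal length")
    sum_X = sum_Y = sum_XY = 0.0
    square_sum_X = square_sum_Y = 0.0
    for x, y in zip(X, Y):
        sum_X += x                 # sum of elements of X
        sum_Y += y                 # sum of elements of Y
        sum_XY += x * y            # sum of products
        square_sum_X += x * x      # sum of squares of X
        square_sum_Y += y * y      # sum of squares of Y
    numerator = n * sum_XY - sum_X * sum_Y
    denominator = math.sqrt((n * square_sum_X - sum_X ** 2) *
                            (n * square_sum_Y - sum_Y ** 2))
    return numerator / denominator

X = [15, 18, 21, 24, 27]
Y = [25, 25, 27, 31, 32]
r = correlation_coefficient(X, Y)
print("{0:.6f}".format(r))
```

For these arrays the numerator is 5 * 3000 - 105 * 140 = 300 and the denominator is sqrt(450 * 220) = sqrt(99000), so the script prints 0.953463, in agreement with the output above.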
Result:
Thus, the computation of the correlation coefficient was successfully completed.
Ex. No.: 11 Simple Linear Regression
Aim:
To write a Python program for Simple Linear Regression.
ALGORITHM
Program:
import numpy as np
import matplotlib.pyplot as plt
# mean of x and y vector
m_x = np.mean(x)
m_y = np.mean(y)
# predicted response vector
y_pred = b[0] + b[1]*x
# putting labels
plt.xlabel('x')
plt.ylabel('y')
# function to show plot
plt.show()
def main():
    # observations / data
    x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
    y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])
    # estimating coefficients
    b = estimate_coef(x, y)
    print("Estimated coefficients:\nb_0 = {} \
    \nb_1 = {}".format(b[0], b[1]))
Output:
Estimated coefficients:
b_0 = -0.0586206896552
b_1 = 1.45747126437
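The listing calls `estimate_coef` without showing its body. A minimal sketch of such a function, assuming the usual least-squares formulas b_1 = SS_xy / SS_xx and b_0 = mean(y) - b_1 * mean(x). The body is an assumption; only the function name and the data arrays come from the listing.

```python
import numpy as np

def estimate_coef(x, y):
    """Return (b_0, b_1) of the least-squares line y = b_0 + b_1 * x."""
    n = np.size(x)
    # mean of x and y vector
    m_x = np.mean(x)
    m_y = np.mean(y)
    # cross-deviation and deviation about x
    SS_xy = np.sum(y * x) - n * m_y * m_x
    SS_xx = np.sum(x * x) - n * m_x * m_x
    # regression coefficients
    b_1 = SS_xy / SS_xx
    b_0 = m_y - b_1 * m_x
    return (b_0, b_1)

# observations / data, as in main() in the listing
x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])

b = estimate_coef(x, y)
print("Estimated coefficients:\nb_0 = {}\nb_1 = {}".format(b[0], b[1]))
```

On these arrays the formulas give b_1 = 96.5 / 82.5 ≈ 1.1697 and b_0 ≈ 1.2364, which differs from the coefficients printed above; those printed values may belong to a different data set.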
Graph:
Result:
Thus, the computation of Simple Linear Regression was successfully completed.