Lab Manual
Tools: Python, NumPy, SciPy, Matplotlib, Pandas, statsmodels, seaborn, plotly, bokeh
SUGGESTED EXERCISE:
TOTAL: 60 PERIODS
HARDWARE
• Standalone desktops with Windows OS
SOFTWARE
• Python with statistical Packages
S.No   Date   Name of the Experiment                       Page No   Marks (100)   Staff Signature
5             Regression
6             Z-test
7             T-test
8             ANOVA
9             Building and validating linear models
10            Building and validating logistic models
Experiment No: 1
WORKING WITH PANDAS DATA FRAMES
Date :
AIM:
To work with Pandas data frames in Python.
ALGORITHM:
Step1: Start
Step2: Import the pandas module
Step3: Create a dataframe using the dictionary
Step4: Print the output
Step5: Stop
PROGRAM:
import pandas as pd

data = {"calories": [420, 380, 390], "duration": [50, 40, 45]}

# load data into a DataFrame object:
df = pd.DataFrame(data)
print(df.loc[0])
OUTPUT:
calories    420
duration     50
Name: 0, dtype: int64
RESULT:
Thus working with Pandas data frames was successfully completed.
Experiment No: 2
BASIC PLOTS USING MATPLOTLIB
Date :
AIM:
To draw basic plots in Python using Matplotlib.
ALGORITHM:
Step1: Start
Step2: import Matplotlib module
Step3: Create basic plots using Matplotlib
Step4: Print the output
Step5: Stop
PROGRAM:
import matplotlib.pyplot as plt

a = [1, 2, 3, 4, 5]
b = [0, 0.6, 0.2, 15, 10, 8, 16, 21]
plt.plot(a)

# "o" is for circles and "r" is for red
plt.plot(b, "or")
plt.plot(list(range(0, 22, 3)))

# naming the x-axis and the y-axis
plt.xlabel('Day ->')
plt.ylabel('Temp ->')

c = [4, 2, 6, 8, 3, 20, 13, 15]
plt.plot(c, label='4th Rep')

# get the current axes to control the boundary lines of the graph body
ax = plt.gca()
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)

# set the bounds of the left boundary line to a fixed range
ax.spines['left'].set_bounds(-3, 40)

# set the intervals at which the x-axis and y-axis place their marks
plt.xticks(list(range(-3, 10)))
plt.yticks(list(range(-3, 20, 3)))

plt.legend()
plt.show()
RESULT:
Thus drawing basic plots using Matplotlib in Python was successfully completed.
Experiment No: 3a
FREQUENCY DISTRIBUTIONS
Date :
AIM:
To count the frequency of occurrence of each word in a body of text, as is often needed during text
processing.
ALGORITHM:
Step1: Start
Step2: Import the NLTK corpus and tokenizer modules
Step3: Tokenize the text and count how often each of the first 50 tokens occurs
Step4: Print the output
Step5: Stop
PROGRAM:
from nltk.corpus import gutenberg
from nltk.tokenize import word_tokenize

sample = gutenberg.raw("blake-poems.txt")
token = word_tokenize(sample)

wlist = []
for i in range(50):
    wlist.append(token[i])

# frequency of each of the first 50 tokens within that slice
wordfreq = [wlist.count(w) for w in wlist]
print(list(zip(wlist, wordfreq)))
OUTPUT:
[('[', 1), ('Poems', 1), ('by', 1), ('William', 1), ('Blake', 1), ('1789', 1), (']', 1), ('SONGS', 2),
('OF', 3), ('INNOCENCE', 2), ('AND', 1), ('OF', 3), ('EXPERIENCE', 1), ('and', 1), ('THE', 1),
('BOOK', 1), ('of', 2), ('THEL', 1), ('SONGS', 2), ('OF', 3), ('INNOCENCE', 2), ('INTRODUCTION', 1),
('Piping', 2), ('down', 1), ('the', 1), ('valleys', 1), ('wild', 1), (',', 3), ('Piping', 2), ('songs', 1),
('of', 2), ('pleasant', 1), ('glee', 1), (',', 3), ('On', 1), ('a', 2), ('cloud', 1), ('I', 1), ('saw', 1),
('a', 2), ('child', 1), (',', 3), ('And', 1), ('he', 1), ('laughing', 1), ('said', 1), ('to', 1), ('me', 1),
(':', 1), ('``', 1)]
RESULT:
Thus the frequency distribution of words in a body of text was computed successfully.
Experiment No: 3b AVERAGES
Date :
AIM:
To compute weighted averages in Python, either by defining your own functions or by using NumPy.
ALGORITHM:
Step1: Start
Step2: Import numpy
Step3: Define the values and weights and compute the weighted average
Step4: Print the output
Step5: Stop
PROGRAM:
weighted_avg_m3
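The listing is truncated to the identifier weighted_avg_m3; a minimal sketch of computing a weighted average both by hand and with np.average is given below. The salary and worker counts are assumed sample data, so the value it prints will differ from the OUTPUT shown next.

import numpy as np

# assumed sample data: per-band salary and the number of workers at that salary
salary = np.array([30000, 45000, 50000, 60000])
workers = np.array([10, 5, 3, 2])

# method 1: weighted average written out by hand
weighted_avg_m1 = (salary * workers).sum() / workers.sum()

# method 3: let NumPy do the same computation
weighted_avg_m3 = np.average(salary, weights=workers)

print(weighted_avg_m1, weighted_avg_m3)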
OUTPUT:
44225.35
RESULT:
Thus computing weighted averages in Python, either by defining your own functions or by using
NumPy, was successfully completed.
Experiment No: 3c VARIABILITY
Date :
AIM:
To write a Python program to compute measures of variability (range, variance, and standard deviation).
ALGORITHM:
Step1: Start
Step2: Import the statistics module and numpy
Step3: Compute the range, variance, and standard deviation of the data
Step4: Print the output
Step5: Stop
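No program is reproduced for this experiment; a minimal sketch of computing the range, sample variance, and standard deviation with the statistics module and NumPy is shown below. The sample list is assumed data.

import statistics
import numpy as np

# assumed sample data
sample = [12, 15, 17, 20, 22, 25, 28, 30]

# range: difference between the largest and smallest observation
print("Range              :", max(sample) - min(sample))

# sample variance and standard deviation with the statistics module
print("Variance           :", statistics.variance(sample))
print("Standard deviation :", statistics.stdev(sample))

# the same with NumPy (ddof=1 gives the sample, not population, estimates)
print("Variance (NumPy)   :", np.var(sample, ddof=1))
print("Std. dev. (NumPy)  :", np.std(sample, ddof=1))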
RESULT:
Thus the measures of variability were computed successfully.
Experiment No: 4a
NORMAL CURVES
Date :
AIM:
To create a normal curve using a Python program.
ALGORITHM:
Step1: Start
Step2: Import scipy.stats, numpy, matplotlib and seaborn
Step3: Create the distribution and plot its probability density function
Step4: Print the output
Step5: Stop
PROGRAM:
# import required libraries
from scipy.stats import norm
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sb

# creating the distribution
data = np.arange(1, 10, 0.01)
pdf = norm.pdf(data, loc=5.3, scale=1)

# plotting the probability density function
sb.set_style('whitegrid')
sb.lineplot(x=data, y=pdf, color='black')
plt.xlabel('Heights')
plt.ylabel('Probability Density')
plt.show()
OUTPUT:
RESULT:
Thus drawing the normal curve using a Python program was successfully completed.
Experiment No: 4b
CORRELATION AND SCATTER PLOTS
Date :
AIM:
To create correlation and scatter plots using a Python program.
ALGORITHM:
Step1: Start
Step2: Import numpy and matplotlib
Step3: Generate the data and draw the scatter plots
Step4: Stop
PROGRAM:
import numpy as np
import matplotlib.pyplot as plt

# Data
x = np.random.randn(100)
y1 = x * 5 + 9
y2 = -5 * x
y3 = np.random.randn(100)

# Plot
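The plotting portion of the listing was lost in this copy; a minimal sketch that scatter-plots each series against x and reports the Pearson correlation from np.corrcoef is shown below. It reuses the variables defined in the data block above.

# correlation of each series with x, read from the 2 x 2 correlation matrix
for label, y in [('y1', y1), ('y2', y2), ('y3', y3)]:
    r = np.corrcoef(x, y)[0, 1]
    plt.scatter(x, y)
    plt.title('%s vs x (correlation = %.2f)' % (label, r))
    plt.xlabel('x')
    plt.ylabel(label)
    plt.show()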
RESULT:
Thus creating correlation and scatter plots using a Python program was successfully completed.
Experiment No: 4c
CORRELATION COEFFICIENT
Date :
AIM:
To write a python program to compute correlation coefficient.
ALGORITHM:
Step1: Start
Step2: Define a function that accumulates the sums needed by Pearson's formula
Step3: Apply the formula to the X and Y lists and print the result
Step4: Stop
PROGRAM:
import math

def correlationCoefficient(X, Y, n):
    sum_X = sum_Y = sum_XY = 0
    squareSum_X = squareSum_Y = 0
    i = 0
    while i < n:
        # sum of elements of array X (and likewise for Y, X*Y and the squares)
        sum_X = sum_X + X[i]
        sum_Y = sum_Y + Y[i]
        sum_XY = sum_XY + X[i] * Y[i]
        squareSum_X = squareSum_X + X[i] * X[i]
        squareSum_Y = squareSum_Y + Y[i] * Y[i]
        i = i + 1
    # Pearson correlation coefficient formula
    return (n * sum_XY - sum_X * sum_Y) / math.sqrt(
        (n * squareSum_X - sum_X * sum_X) * (n * squareSum_Y - sum_Y * sum_Y))

# Driver function
X = [15, 18, 21, 24, 27]
Y = [25, 25, 27, 31, 32]

# Find the size of the arrays
n = len(X)
print('{0:.6f}'.format(correlationCoefficient(X, Y, n)))

OUTPUT:
0.953463
RESULT:
Thus the computation of the correlation coefficient was successfully completed.
Experiment No: 5
REGRESSION
Date :
AIM:
To write a Python program for simple linear regression.
ALGORITHM:
Step1: Start
Step2: Import numpy and matplotlib
Step3: Estimate the regression coefficients and plot the regression line
Step4: Print the output
Step5: Stop
PROGRAM:
import numpy as np
import matplotlib.pyplot as plt

# mean of x and y vector
m_x = np.mean(x)
m_y = np.mean(y)

# calculating regression coefficients
b_1 = SS_xy / SS_xx
b_0 = m_y - b_1 * m_x

# predicted response vector
y_pred = b[0] + b[1] * x

# putting labels
plt.xlabel('x')
plt.ylabel('y')
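Only fragments of the program survive above (x, y, SS_xy, SS_xx and b are never defined in this copy); a self-contained sketch of the same least-squares estimation is given below. The x and y arrays are assumed sample data, so the coefficients it prints will differ from the OUTPUT shown next.

import numpy as np
import matplotlib.pyplot as plt

# assumed sample data
x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])
n = np.size(x)

# means of the x and y vectors
m_x = np.mean(x)
m_y = np.mean(y)

# sums of cross-deviations and squared deviations about the means
SS_xy = np.sum(y * x) - n * m_y * m_x
SS_xx = np.sum(x * x) - n * m_x * m_x

# regression coefficients
b_1 = SS_xy / SS_xx
b_0 = m_y - b_1 * m_x
print("Estimated coefficients:\nb_0 = {}\nb_1 = {}".format(b_0, b_1))

# predicted response vector and plot
y_pred = b_0 + b_1 * x
plt.scatter(x, y, color="m", marker="o", s=30)
plt.plot(x, y_pred, color="g")
plt.xlabel('x')
plt.ylabel('y')
plt.show()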
OUTPUT:
Estimated coefficients:
b_0 = -0.0586206896552
b_1 = 1.45747126437
Graph:
RESULT:
Thus the computation for Simple Linear Regression was successfully completed.
Experiment No: 6 Z-TEST
Date :
AIM:
To Perform Z-test
ALGORITHM:
Step1: Start
Step2: Import math, numpy, statsmodels & ztest
Step3: Create the data & print the z-test result
Step4: Stop
PROGRAM:
else:
    print("Fail to Reject Null Hypothesis")
OUTPUT:
RESULT:
Thus the program for Z-Test case studies has been executed and verified successfully.
Experiment No: 7
T-TEST
Date :
AIM:
To Perform T-test for sampling distribution.
ALGORITHM:
Step1: Start
Step2: Import random & numpy
Step3: Calculate the standard deviation
Step4: Stop
PROGRAM:
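No program listing is reproduced in this copy; a minimal sketch of an independent two-sample t-test computed from the sample means and standard deviations, in the spirit of the algorithm above, is shown below. The two random samples are assumed data.

import numpy as np
from numpy.random import randn, seed
from scipy.stats import t

# assumed data: two independent random samples with slightly different means
seed(1)
data1 = 5 * randn(100) + 50
data2 = 5 * randn(100) + 51

# sample means and standard deviations (ddof=1 for sample estimates)
mean1, mean2 = np.mean(data1), np.mean(data2)
std1, std2 = np.std(data1, ddof=1), np.std(data2, ddof=1)

# standard error of the difference between the two means
se1, se2 = std1 / np.sqrt(len(data1)), std2 / np.sqrt(len(data2))
sed = np.sqrt(se1 ** 2 + se2 ** 2)

# t statistic, degrees of freedom and two-tailed p-value
t_stat = (mean1 - mean2) / sed
df = len(data1) + len(data2) - 2
p = (1.0 - t.cdf(abs(t_stat), df)) * 2.0

print('t = %.3f, df = %d, p = %.3f' % (t_stat, df, p))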
RESULT:
Thus the program for T-test case studies has been executed and verified successfully.
Experiment No: 8
ANOVA
Date :
AIM:
To Perform ANOVA test.
ALGORITHM:
Step1: Start
Step2: Import scipy
Step3: Import statsmodels
Step4: Calculate ANOVA F and p value
Step 5: Stop
PROGRAM:
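The program listing is missing here; a minimal sketch of a one-way ANOVA with scipy is shown below. The three sample groups are assumed data, so the F and p values are only illustrative.

from scipy.stats import f_oneway

# assumed data: measurements for three independent groups
group1 = [89, 89, 88, 78, 79]
group2 = [93, 92, 94, 89, 88]
group3 = [89, 88, 89, 93, 90]

# one-way ANOVA: F statistic and p-value
f_stat, p_value = f_oneway(group1, group2, group3)
print('F =', f_stat, 'p =', p_value)

if p_value < 0.05:
    print("Reject Null Hypothesis: at least one group mean differs")
else:
    print("Fail to Reject Null Hypothesis")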
RESULT:
Thus the program for ANOVA case studies has been executed and verified successfully.
Experiment No: 9
BUILDING AND VALIDATING LINEAR MODELS
Date :
AIM:
To Perform Linear Regression
ALGORITHM
Step1: Start
Step2: Import numpy, pandas, seaborn, matplotlib & sklearn
Step3: Calculate linear regression using the appropriate functions
Step4: display the result
Step 5: Stop
PROGRAM:
So let's plot a regression plot to see the correlation between RM and MEDV.
sns.lmplot(x='RM', y='MEDV', data=df)
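The fitting and validation steps are not shown in this copy; a minimal sketch of building and validating the RM-to-MEDV model with scikit-learn is given below. It assumes the housing DataFrame df used above, with columns 'RM' and 'MEDV'.

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# assumes df is the housing DataFrame used above, with columns 'RM' and 'MEDV'
X = df[['RM']]
y = df['MEDV']

# hold out 20% of the rows for validation
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# fit the linear model on the training split
model = LinearRegression()
model.fit(X_train, y_train)

# validate on the held-out split
y_pred = model.predict(X_test)
print('R^2 :', r2_score(y_test, y_pred))
print('RMSE:', np.sqrt(mean_squared_error(y_test, y_pred)))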
RESULT:
Thus the program for Linear Regression has been executed and verified successfully.
Experiment No: 10
BUILDING AND VALIDATING LOGISTIC MODELS
Date :
AIM:
To Perform Logistic Regression
ALGORITHM:
Step1: Start
Step2: Import numpy, pandas, seaborn, matplotlib & sklearn
Step3: Calculate logistic regression using the appropriate functions
Step4: Display the result
Step 5: Stop
PROGRAM:
Building the Logistic Regression model:
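The model-building code is not reproduced in this copy; a minimal sketch of fitting a logistic model with statsmodels is shown below. The DataFrame df and its binary target column 'y' are assumptions, not taken from the original, so the printed values are only illustrative.

import statsmodels.api as sm
from sklearn.model_selection import train_test_split

# assumed: df is a pandas DataFrame with feature columns and a binary target column 'y'
X = df.drop('y', axis=1)
y = df['y']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# fitting the model prints the optimisation summary shown in the OUTPUT below
log_reg = sm.Logit(y_train, sm.add_constant(X_train)).fit()

# predicted probabilities on the test split, rounded to 0/1 class labels
yhat = log_reg.predict(sm.add_constant(X_test))
predictions = list(map(round, yhat))

print('Actual values', list(y_test.values))
print('Predictions :', predictions)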
OUTPUT:
Optimization terminated successfully.
Current function value: 0.352707
Iterations 8
Actual values [0, 0, 0, 0, 0, 1, 1, 0, 1, 1]
Predictions : [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
Testing the accuracy of the model:
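Continuing the sketch above, the accuracy check could look like the following; again this only illustrates the metrics used, it is not the original listing.

from sklearn.metrics import confusion_matrix, accuracy_score

# confusion matrix and accuracy on the held-out test split
print('Confusion Matrix :\n', confusion_matrix(y_test, predictions))
print('Test accuracy =', accuracy_score(y_test, predictions))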
OUTPUT:
Confusion Matrix :
[[6 0]
[2 2]]
Test accuracy = 0.8
RESULT:
Thus the program for Logistic Regression has been executed and verified successfully.
Experiment No: 11
TIME SERIES ANALYSIS
Date:
AIM:
To perform time series analysis on sales data.
ALGORITHM:
Step1: Start
Step2: Import pandas and matplotlib
Step3: Load the sales data and preprocess it
Step4: Index the data by date and resample it by month
Step 5: Stop
PROGRAM:
We start with time series analysis and forecasting for furniture sales.
import pandas as pd

df = pd.read_excel("Superstore.xls")
furniture = df.loc[df['Category'] == 'Furniture']

We have a good 4 years' worth of furniture sales data.

furniture['Order Date'].min(), furniture['Order Date'].max()
(Timestamp('2014-01-06 00:00:00'), Timestamp('2017-12-30 00:00:00'))
Data Preprocessing
This step includes removing columns we do not need, checking for missing values, and aggregating
sales by date.
cols = ['Row ID', 'Order ID', 'Ship Date', 'Ship Mode', 'Customer ID',
        'Customer Name', 'Segment', 'Country', 'City', 'State', 'Postal Code',
        'Region', 'Product ID', 'Category', 'Sub-Category', 'Product Name',
        'Quantity', 'Discount', 'Profit']
furniture.drop(cols, axis=1, inplace=True)
furniture = furniture.sort_values('Order Date')
furniture.isnull().sum()
furniture = furniture.groupby('Order Date')['Sales'].sum().reset_index()

Order Date    0
Sales         0
dtype: int64
Figure 1
Indexing with Time Series Data
furniture = furniture.set_index('Order Date')
furniture.index
Figure 2
We will use the average daily sales value for each month instead, and we will use the start of each
month as the timestamp.

y = furniture['Sales'].resample('MS').mean()

Have a quick peek at the 2017 furniture sales data.
y['2017':]
Figure 3
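To inspect the resampled monthly series before any modelling, a simple plot of y can be drawn as sketched below; it reuses the variables defined above (the figure itself is not reproduced here).

import matplotlib.pyplot as plt

# visualise the monthly furniture sales series produced above
y.plot(figsize=(15, 6))
plt.xlabel('Order Date')
plt.ylabel('Monthly mean sales')
plt.show()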
OUTPUT:
RESULT:
Thus the program for Time series analysis has been executed and verified successfully.