
Lab Manual


SYLLABUS

AD3411 DATA SCIENCE AND ANALYTICS LABORATORY                           L T P C
                                                                       0 0 4 2

COURSE OBJECTIVES


• To develop data analytic code in Python
• To be able to use Python libraries for handling data
• To develop analytical applications using Python
• To perform data visualization using plots

Tools: Python, NumPy, SciPy, Matplotlib, Pandas, Statsmodels, Seaborn, Plotly, Bokeh

SUGGESTED EXERCISES:

1. Working with Pandas data frames.
2. Basic plots using Matplotlib.
3. Frequency distributions, Averages, Variability.
4. Normal curves, Correlation and scatter plots, Correlation coefficient.
5. Regression.
6. Z-test.
7. T-test.
8. ANOVA.
9. Building and validating linear models.
10. Building and validating logistic models.
11. Time Series Analysis.

TOTAL: 60 PERIODS
HARDWARE
• Standalone Desktops with Windows OS

SOFTWARE
• Python with statistical Packages
S.No.   Date   Name of the Experiment                                        Page No.   Marks (100)   Staff Signature

Tools: Python, NumPy, SciPy, Matplotlib, Pandas, Statsmodels, Seaborn, Plotly, Bokeh; working with NumPy arrays

1       Working with Pandas data frames
2       Basic Plots using Matplotlib
3       Frequency distributions, Averages, Variability
4       Normal Curves, Correlation and scatter plots, Correlation coefficient
5       Regression
6       Z-test
7       T-test
8       ANOVA
9       Building and validating linear models
10      Building and validating logistic models
11      Time series analysis
Experiment No:1
WORKING WITH PANDAS DATA FRAMES
Date:

AIM:

To work with Pandas data frames

ALGORITHM:

Step1: Start
Step2: import numpy and pandas module
Step3: Create a dataframe using the dictionary
Step4: Print the output
Step5: Stop

PROGRAM:
import pandas as pd

data = {"calories": [420, 380, 390], "duration": [50, 40, 45]}

# load data into a DataFrame object:
df = pd.DataFrame(data)
print(df.loc[0])
OUTPUT:
calories    420
duration     50
Name: 0, dtype: int64

RESULT:

Thus the working with Pandas data frames was successfully completed.
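As a supplementary sketch (not part of the prescribed record), the same dictionary can also be loaded with a labelled row index so that rows are selected by name rather than by position; the index labels "day1", "day2", "day3" below are illustrative assumptions, not part of the syllabus program.

import pandas as pd

data = {"calories": [420, 380, 390], "duration": [50, 40, 45]}

# illustrative row labels (assumed, not from the original program)
df = pd.DataFrame(data, index=["day1", "day2", "day3"])

print(df)              # the whole data frame
print(df.loc["day2"])  # a single row selected by its label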
Experiment No: 2
BASIC PLOTS USING MATPLOTLIB
Date :

AIM:
To draw basic plots in Python using Matplotlib

ALGORITHM:
Step1: Start
Step2: Import the Matplotlib module
Step3: Create basic plots using Matplotlib
Step4: Print the output
Step5: Stop
PROGRAM:
import matplotlib.pyplot as plt

a = [1, 2, 3, 4, 5]
b = [0, 0.6, 0.2, 15, 10, 8, 16, 21]
plt.plot(a)

# o is for circles and r is for red
plt.plot(b, "or")
plt.plot(list(range(0, 22, 3)))

# naming the x-axis
plt.xlabel('Day ->')
# naming the y-axis
plt.ylabel('Temp ->')

c = [4, 2, 6, 8, 3, 20, 13, 15]
plt.plot(c, label='4th Rep')

# get current axes command
ax = plt.gca()

# get command over the individual
# boundary lines of the graph body
ax.spines['right'].set_visible(False)
ax.spines['top'].set_visible(False)

# set the range or the bounds of
# the left boundary line to a fixed range
ax.spines['left'].set_bounds(-3, 40)

# set the interval by which
# the x-axis sets the marks
plt.xticks(list(range(-3, 10)))

# set the intervals by which
# the y-axis sets the marks
plt.yticks(list(range(-3, 20, 3)))

# legend denotes what color
# signifies what
ax.legend(['1st Rep', '2nd Rep', '3rd Rep', '4th Rep'])

# annotate command helps to write
# ON THE GRAPH any text; xy denotes
# the position on the graph
plt.annotate('Temperature V / s Days', xy=(1.01, -2.15))

# gives a title to the graph
plt.title('All Features Discussed')
plt.show()
OUTPUT:

RESULT:

Thus the basic plots using Matplotlib in Python program was successfully completed.
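As a side note, the same kind of labelled plot can be written with Matplotlib's object-oriented interface (fig, ax = plt.subplots()), which is often preferred for new code; the data values below are illustrative only and are not the series used in the program above.

import matplotlib.pyplot as plt

days = [1, 2, 3, 4, 5]
temps = [30, 32, 31, 35, 33]   # illustrative values

fig, ax = plt.subplots()
ax.plot(days, temps, "or-", label="Temperature")   # red circles joined by a line
ax.set_xlabel("Day ->")
ax.set_ylabel("Temp ->")
ax.set_title("Temperature V / s Days")
ax.legend()
plt.show()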
Experiment No: 3a
FREQUENCY DISTRIBUTIONS
Date :

AIM:

To count the frequency of occurrence of words in a body of text, as is often needed during text
processing.

ALGORITHM:

Step 1: Start the Program


Step 2: Make the Gutenberg corpus (which contains blake-poems.txt) available through NLTK
Step 3: Import the word_tokenize function and gutenberg
Step 4: Write the code to count the frequency of occurrence of a word in a body of text
Step 5: Print the result
Step 6: Stop the process
PROGRAM:

from nltk.tokenize import word_tokenize
from nltk.corpus import gutenberg

sample = gutenberg.raw("blake-poems.txt")

token = word_tokenize(sample)
wlist = []

# take the first 50 tokens of the text
for i in range(50):
    wlist.append(token[i])

# frequency of each of those tokens within the list
wordfreq = [wlist.count(w) for w in wlist]

print("Pairs\n" + str(list(zip(token, wordfreq))))
OUTPUT:

Pairs
[('[', 1), ('Poems', 1), ('by', 1), ('William', 1), ('Blake', 1), ('1789', 1), (']', 1), ('SONGS', 2),
('OF', 3), ('INNOCENCE', 2), ('AND', 1), ('OF', 3), ('EXPERIENCE', 1), ('and', 1), ('THE', 1),
('BOOK', 1), ('of', 2), ('THEL', 1), ('SONGS', 2), ('OF', 3), ('INNOCENCE', 2), ('INTRODUCTION', 1),
('Piping', 2), ('down', 1), ('the', 1), ('valleys', 1), ('wild', 1), (',', 3), ('Piping', 2),
('songs', 1), ('of', 2), ('pleasant', 1), ('glee', 1), (',', 3), ('On', 1), ('a', 2), ('cloud', 1),
('I', 1), ('saw', 1), ('a', 2), ('child', 1), (',', 3), ('And', 1), ('he', 1), ('laughing', 1),
('said', 1), ('to', 1), ('me', 1), (':', 1), ('``', 1)]

RESULT:
Thus the frequency of occurrence of words in a body of text was counted successfully.
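For reference, NLTK also provides a FreqDist class that computes the same word frequencies directly; a minimal sketch, assuming the punkt tokenizer and the Gutenberg corpus have already been downloaded with nltk.download().

from nltk import FreqDist
from nltk.tokenize import word_tokenize
from nltk.corpus import gutenberg

sample = gutenberg.raw("blake-poems.txt")
tokens = word_tokenize(sample)

# frequency distribution over the first 50 tokens, as in the program above
fdist = FreqDist(tokens[:50])
print(fdist.most_common(10))   # the 10 most frequent tokens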
Experiment No: 3b AVERAGES

Date :

AIM:
To compute weighted averages in Python, either by defining your own functions or by using NumPy

ALGORITHM :

Step 1: Start the Program


Step 2: Create the employees_salary table and save as .csv file
Step 3: Import packages (pandas and numpy) and the employees_salary table itself:
Step 4: Calculate weighted sum and average using Numpy Average() Function
Step 5 : Stop the process

PROGRAM:

# Method: using the NumPy average() function
import numpy as np
import pandas as pd

# load the employees_salary table created in Step 2 (file name assumed)
df = pd.read_csv('employees_salary.csv')

weighted_avg_m3 = round(np.average(df['salary_p_year'],
                                   weights=df['employees_number']), 2)
print(weighted_avg_m3)
OUTPUT:

44225.35

RESULT:
Thus computing weighted averages in Python, either by defining your own functions or by using
NumPy, was successfully completed.
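Because the .csv file is prepared separately in Step 2, a fully self-contained sketch is given here with an inline table; the column names salary_p_year and employees_number follow the program above, but the numbers are illustrative. It computes the weighted average both by hand and with np.average, which is the comparison the AIM asks for.

import numpy as np
import pandas as pd

# illustrative data in the same shape as the employees_salary table
df = pd.DataFrame({
    "salary_p_year":    [30000, 50000, 70000],
    "employees_number": [10, 5, 2],
})

# Method 1: weighted average defined by hand
manual = (df["salary_p_year"] * df["employees_number"]).sum() / df["employees_number"].sum()

# Method 2: NumPy average() with weights
with_numpy = np.average(df["salary_p_year"], weights=df["employees_number"])

print(round(manual, 2), round(with_numpy, 2))   # both methods agree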
Experiment No: 3c VARIABILITY

Date :

AIM:

To write a python program to calculate the variance.

ALGORITHM :

Step 1: Start the Program


Step 2: Import statistics module from statistics import variance
Step 3: Import fractions as parameter values from fractions import Fraction as fr
Step 4: Create tuple of a set of positive and negative numbers
Step 5: Print the variance of each samples
Step 6: Stop the process
PROGRAM:

# Python code to demonstrate variance()
# function on varying range of data-types

# importing statistics module
from statistics import variance

# importing fractions as parameter values
from fractions import Fraction as fr

# tuple of a set of positive integers
# numbers are spread apart but not very much
sample1 = (1, 2, 5, 4, 8, 9, 12)

# tuple of a set of negative integers
sample2 = (-2, -4, -3, -1, -5, -6)

# tuple of a set of positive and negative numbers
# data-points are spread apart considerably
sample3 = (-9, -1, -0, 2, 1, 3, 4, 19)

# tuple of a set of fractional numbers
sample4 = (fr(1, 2), fr(2, 3), fr(3, 4), fr(5, 6), fr(7, 8))

# tuple of a set of floating point values
sample5 = (1.23, 1.45, 2.1, 2.2, 1.9)

# Print the variance of each sample
print("Variance of Sample1 is %s" % (variance(sample1)))
print("Variance of Sample2 is %s" % (variance(sample2)))
print("Variance of Sample3 is %s" % (variance(sample3)))
print("Variance of Sample4 is %s" % (variance(sample4)))
print("Variance of Sample5 is %s" % (variance(sample5)))
OUTPUT:

Variance of Sample 1 is 15.80952380952381


Variance of Sample 2 is 3.5
Variance of Sample 3 is 61.125
Variance of Sample 4 is 1/45
Variance of Sample 5 is 0.17613000000000006

RESULT:

Thus the computation for variance was successfully completed.
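A short companion note: statistics.variance() is the sample variance (n - 1 in the denominator). The population variance and the NumPy equivalents are shown below for comparison, reusing sample1 from the program above.

from statistics import variance, pvariance
import numpy as np

sample1 = (1, 2, 5, 4, 8, 9, 12)

print(variance(sample1))         # sample variance (divides by n - 1)
print(pvariance(sample1))        # population variance (divides by n)
print(np.var(sample1, ddof=1))   # NumPy sample variance, matches variance()
print(np.var(sample1))           # NumPy population variance, matches pvariance()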


Experiment No: 4a
NORMAL CURVES
Date :

AIM:

To create a normal curve using a Python program.

ALGORITHM:

Step 1: Start the Program


Step 2: Import packages scipy and call function scipy.stats
Step 3: Import packages numpy, matplotlib and seaborn
Step 4: Create the distribution
Step 5: Visualizing the distribution
Step 6: Stop the process
PROGRAM:

# import required libraries
from scipy.stats import norm
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sb

# Creating the distribution
data = np.arange(1, 10, 0.01)
pdf = norm.pdf(data, loc=5.3, scale=1)

# Visualizing the distribution
sb.set_style('whitegrid')
sb.lineplot(x=data, y=pdf, color='black')
plt.xlabel('Heights')
plt.ylabel('Probability Density')
plt.show()
OUTPUT:

RESULT:

Thus the normal curve using python program was successfully completed.
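To relate the curve to actual data, a hedged sketch is given below that overlays the same norm.pdf curve on a histogram of random samples drawn from the same distribution (loc = 5.3, scale = 1, both taken from the program above).

import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

# random samples from the same normal distribution as the fitted curve
samples = np.random.normal(loc=5.3, scale=1, size=1000)

x = np.arange(1, 10, 0.01)
plt.hist(samples, bins=30, density=True, alpha=0.5, label="samples")
plt.plot(x, norm.pdf(x, loc=5.3, scale=1), color="black", label="normal pdf")
plt.xlabel("Heights")
plt.ylabel("Probability Density")
plt.legend()
plt.show()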
Experiment No: 4b
CORRELATION AND SCATTER PLOTS
Date :

AIM:

To write a python program for correlation with scatter plot .

ALGORITHM :

Step 1: Start the Program


Step 2: Create variable x using the random function
Step 3: Create variables y1, y2 and y3
Step 4: plot the scatter plot
Step 5: Print the result
Step 6: Stop the process
PROGRAM:

# Scatterplot and Correlations
import numpy as np
import matplotlib.pyplot as plt

# Data
x = np.random.randn(100)
y1 = x * 5 + 9
y2 = -5 * x
y3 = np.random.randn(100)

# Plot
plt.rcParams.update({'figure.figsize': (10, 8), 'figure.dpi': 100})
plt.scatter(x, y1, label=f'y1, Correlation = {np.round(np.corrcoef(x, y1)[0, 1], 2)}')
plt.scatter(x, y2, label=f'y2, Correlation = {np.round(np.corrcoef(x, y2)[0, 1], 2)}')
plt.scatter(x, y3, label=f'y3, Correlation = {np.round(np.corrcoef(x, y3)[0, 1], 2)}')

# Plot title, legend and display
plt.title('Scatterplot and Correlations')
plt.legend()
plt.show()
OUTPUT:

RESULT:

Thus the Correlation and scatter plots using python program was successfully completed.
Experiment No: 4c
CORRELATION COEFFICIENT
Date :

AIM:
To write a python program to compute correlation coefficient.

ALGORITHM :

Step 1: Start the Program


Step 2: Import math package
Step 3: Define correlation coefficient function
Step 4: Calculate correlation using formula
Step 5:Print the result
Step 6 : Stop the process
PROGRAM:

# Python Program to find correlation coefficient.


import math

# function that returns correlation coefficient.
def correlationCoefficient(X, Y, n):
    sum_X = 0
    sum_Y = 0
    sum_XY = 0
    squareSum_X = 0
    squareSum_Y = 0

    i = 0
    while i < n:
        # sum of elements of array X.
        sum_X = sum_X + X[i]

        # sum of elements of array Y.
        sum_Y = sum_Y + Y[i]

        # sum of X[i] * Y[i].
        sum_XY = sum_XY + X[i] * Y[i]

        # sum of square of array elements.
        squareSum_X = squareSum_X + X[i] * X[i]
        squareSum_Y = squareSum_Y + Y[i] * Y[i]

        i = i + 1

    # use formula for calculating correlation coefficient.
    corr = (float)(n * sum_XY - sum_X * sum_Y) / \
           (float)(math.sqrt((n * squareSum_X - sum_X * sum_X) *
                             (n * squareSum_Y - sum_Y * sum_Y)))
    return corr

# Driver function
X = [15, 18, 21, 24, 27]
Y = [25, 25, 27, 31, 32]

# Find the size of array.
n = len(X)

# Function call to correlationCoefficient.
print('{0:.6f}'.format(correlationCoefficient(X, Y, n)))
OUTPUT:

0.953463

RESULT:
Thus the computation for the correlation coefficient was successfully completed.
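The hand-computed value can be cross-checked with NumPy's corrcoef function, which returns the full correlation matrix; a minimal sketch using the same X and Y.

import numpy as np

X = [15, 18, 21, 24, 27]
Y = [25, 25, 27, 31, 32]

# the off-diagonal entry of the 2x2 correlation matrix is the Pearson coefficient
r = np.corrcoef(X, Y)[0, 1]
print('{0:.6f}'.format(r))   # should agree with the value printed above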
Experiment No: 5
REGRESSION
Date :

AIM:

To write a python program for Simple Linear Regression .

ALGORITHM :

Step 1: Start the Program


Step 2: Import numpy and matplotlib package
Step 3: Define coefficient function
Step 4: Calculate cross-deviation and deviation about x
Step 5: Calculate regression coefficients
Step 6: Plot the Linear regression and define main function
Step 7: Print the result
Step 8: Stop the process
PROGRAM:

import numpy as np
import matplotlib.pyplot as plt

def estimate_coef(x, y):
    # number of observations/points
    n = np.size(x)

    # mean of x and y vector
    m_x = np.mean(x)
    m_y = np.mean(y)

    # calculating cross-deviation and deviation about x
    SS_xy = np.sum(y*x) - n*m_y*m_x
    SS_xx = np.sum(x*x) - n*m_x*m_x

    # calculating regression coefficients
    b_1 = SS_xy / SS_xx
    b_0 = m_y - b_1*m_x

    return (b_0, b_1)

def plot_regression_line(x, y, b):
    # plotting the actual points as scatter plot
    plt.scatter(x, y, color="m", marker="o", s=30)

    # predicted response vector
    y_pred = b[0] + b[1]*x

    # plotting the regression line
    plt.plot(x, y_pred, color="g")

    # putting labels
    plt.xlabel('x')
    plt.ylabel('y')

    # function to show plot
    plt.show()

def main():
    # observations / data
    x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
    y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])

    # estimating coefficients
    b = estimate_coef(x, y)
    print("Estimated coefficients:\nb_0 = {}\nb_1 = {}".format(b[0], b[1]))

    # plotting regression line
    plot_regression_line(x, y, b)

if __name__ == "__main__":
    main()
OUTPUT:

Estimated coefficients:
b_0 = -0.0586206896552
b_1 = 1.45747126437

Graph:

RESULT:
Thus the computation for Simple Linear Regression was successfully completed.
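The estimated coefficients can be cross-checked with np.polyfit, which fits the same least-squares line; a minimal sketch using the same observations.

import numpy as np

x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12])

# degree-1 polynomial fit returns (slope, intercept)
b_1, b_0 = np.polyfit(x, y, 1)
print("b_0 =", b_0, "b_1 =", b_1)   # should match the estimates above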
Experiment No: 6 Z-TEST
Date :

AIM:

To Perform Z-test

ALGORITHM:

Step1: Start
Step2: Import math, numpy and ztest from statsmodels
Step3: Generate the sample data, perform the z-test and print the result
Step4: Stop
PROGRAM:

# imports
import math
import numpy as np
from numpy.random import randn
from statsmodels.stats.weightstats import ztest

# Generate a random array of 50 numbers having mean 110 and sd 15,
# similar to the IQ scores data we assume above
mean_iq = 110
sd_iq = 15 / math.sqrt(50)
alpha = 0.05
null_mean = 100
data = sd_iq * randn(50) + mean_iq

# print mean and sd
print('mean=%.2f stdv=%.2f' % (np.mean(data), np.std(data)))

# now we perform the test. In this function, we pass the data; in the value parameter
# we pass the mean value of the null hypothesis; with the alternative hypothesis we check
# whether the mean is larger
ztest_Score, p_value = ztest(data, value=null_mean, alternative='larger')

# the function outputs a p_value and z-score corresponding to that value; we compare the
# p-value with alpha: if it is less than alpha we reject the null hypothesis,
# else we fail to reject it.
if (p_value < alpha):
    print("Reject Null Hypothesis")
else:
    print("Fail to Reject Null Hypothesis")
OUTPUT:

Reject Null Hypothesis

RESULT:
Thus the program for Z-Test case studies has been executed and verified successfully.
Experiment No: 7
T-TEST
Date :

AIM:
To Perform T-test for sampling distribution.

ALGORITHM:

Step1: Start
Step2: Import numpy and scipy.stats
Step3: Calculate the standard deviation and the T-statistic
Step4: Stop
PROGRAM:

# Importing the required libraries and packages
import numpy as np
from scipy import stats

# Defining two random distributions
# Sample Size
N = 10

# Gaussian distributed data with mean = 2 and var = 1
x = np.random.randn(N) + 2
# Gaussian distributed data with mean = 0 and var = 1
y = np.random.randn(N)

# Calculating the Standard Deviation
# Calculating the variance to get the standard deviation
var_x = x.var(ddof=1)
var_y = y.var(ddof=1)

# Standard Deviation
SD = np.sqrt((var_x + var_y) / 2)
print("Standard Deviation =", SD)

# Calculating the T-Statistics
tval = (x.mean() - y.mean()) / (SD * np.sqrt(2 / N))

# Comparing with the critical T-Value
# Degrees of freedom
dof = 2 * N - 2

# p-value after comparison with the T-Statistics
pval = 1 - stats.t.cdf(tval, df=dof)
print("t = " + str(tval))
print("p = " + str(2 * pval))

# Cross Checking using the internal function from SciPy Package
tval2, pval2 = stats.ttest_ind(x, y)
print("t = " + str(tval2))
print("p = " + str(pval2))
OUTPUT:
Standard Deviation = 0.7642398582227466
t = 4.87688162540348
p = 0.0001212767169695983
t = 4.876881625403479
p = 0.00012127671696957205

RESULT:
Thus the program for T-test case studies has been executed and verified successfully.
Experiment No: 8
ANOVA
Date :

AIM:
To Perform ANOVA test.

ALGORITHM:
Step1: Start
Step2: Import scipy
Step3: Import statsmodels
Step4: Calculate ANOVA F and p value
Step 5: Stop
PROGRAM (in R):

# Installing the package
install.packages("dplyr")

# Loading the package
library(dplyr)

# Variance in mean within group and between group
boxplot(mtcars$disp ~ factor(mtcars$gear),
        xlab = "gear", ylab = "disp")

# Step 1: Setup Null Hypothesis and Alternate Hypothesis
# H0 = mu = mu01 = mu02 (There is no difference
# between average displacement for different gear)
# H1 = Not all means are equal

# Step 2: Calculate test statistics using aov function
mtcars_aov <- aov(mtcars$disp ~ factor(mtcars$gear))
summary(mtcars_aov)

# Step 3: Calculate F-Critical Value
# For 0.05 significance level, critical value = alpha = 0.05

# Step 4: Compare test statistics with F-Critical value
# and conclude the test: if p < alpha, reject the Null Hypothesis
OUTPUT:

RESULT:
Thus the program for ANOVA case studies has been executed and verified successfully.
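Note that the program above is written in R (dplyr, aov, the mtcars data set), while the algorithm lists SciPy and statsmodels. A hedged Python equivalent using scipy.stats.f_oneway is sketched below on three illustrative groups; the numbers are made up for illustration and are not the mtcars data, which ships with R rather than Python.

from scipy import stats

# three illustrative groups of measurements (assumed data)
group1 = [160, 168, 172, 158, 170]
group2 = [145, 150, 148, 152, 147]
group3 = [130, 128, 135, 132, 129]

# one-way ANOVA: F statistic and p-value
f_stat, p_value = stats.f_oneway(group1, group2, group3)
print("F =", f_stat, "p =", p_value)

alpha = 0.05
if p_value < alpha:
    print("Reject Null Hypothesis (not all group means are equal)")
else:
    print("Fail to Reject Null Hypothesis")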
Experiment No: 9
BUILDING AND VALIDATING LINEAR MODELS
Date :

AIM:
To Perform Linear Regression

ALGORITHM

Step1: Start
Step2: Import numpy, pandas, seaborn, matplotlib & sklearn
Step3: Calculate linear regression using the appropriate functions
Step4: Display the result
Step 5: Stop
PROGRAM:

# Importing the necessary libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.datasets import load_boston

sns.set(style="ticks", color_codes=True)
plt.rcParams['figure.figsize'] = (8, 5)
plt.rcParams['figure.dpi'] = 150

# loading the data
boston = load_boston()

You can check the keys of the loaded dataset with the following code:

print(boston.keys())

The output will be as follows:

dict_keys(['data', 'target', 'feature_names', 'DESCR', 'filename'])

print(boston.DESCR)

You will find these details in the output:

Attribute Information (in order):
- CRIM     per capita crime rate by town
- ZN       proportion of residential land zoned for lots over 25,000 sq.ft.
- INDUS    proportion of non-retail business acres per town
- CHAS     Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
- NOX      nitric oxides concentration (parts per 10 million)
- RM       average number of rooms per dwelling
- AGE      proportion of owner-occupied units built prior to 1940
- DIS      weighted distances to five Boston employment centres
- RAD      index of accessibility to radial highways
- TAX      full-value property-tax rate per $10,000
- PTRATIO  pupil-teacher ratio by town
- B        1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town
- LSTAT    % lower status of the population
- MEDV     Median value of owner-occupied homes in $1000's
:Missing Attribute Values: None

df = pd.DataFrame(boston.data, columns=boston.feature_names)
# add the target column so that MEDV can be plotted later
df['MEDV'] = boston.target
df.head()

# print the columns present in the dataset
print(df.columns)

# print the top 5 rows in the dataset
print(df.head())
OUTPUT:
First five records from data set

# plotting heatmap for the overall data set
sns.heatmap(df.corr(), square=True, cmap='RdYlGn')

Heat map of overall data set

So let's plot a regression plot to see the correlation between RM and MEDV.

sns.lmplot(x='RM', y='MEDV', data=df)

Regression plot with RM and MEDV

RESULT:
Thus the program for Linear Regression has been executed and verified successfully.
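The exercise title asks for building and validating a linear model, while the program above stops at exploring the data. A hedged sketch of the natural next step is given below: fitting MEDV on RM with scikit-learn and reporting R^2 on a held-out split. It assumes a scikit-learn version that still provides load_boston (the loader was removed in scikit-learn 1.2); with a newer version, build df and the MEDV column from another source before the fit.

import pandas as pd
from sklearn.datasets import load_boston
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

# rebuild the DataFrame from the program above and append the target column
boston = load_boston()
df = pd.DataFrame(boston.data, columns=boston.feature_names)
df['MEDV'] = boston.target

X = df[['RM']]    # average number of rooms per dwelling
y = df['MEDV']    # median home value

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

model = LinearRegression().fit(X_train, y_train)
print("R^2 on held-out data:", r2_score(y_test, model.predict(X_test)))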
Experiment No: 10
BUILDING AND VALIDATING LOGISTIC MODELS
Date :

AIM:
To Perform Logistic Regression

ALGORITHM:

Step1: Start
Step2: Import numpy, pandas, seaborn, matplotlib & sklearn
Step3: Calculate logistic regression using the appropriate functions
Step4: Display the result
Step 5: Stop

PROGRAM:
Building the Logistic Regression model:

# importing libraries
import statsmodels.api as sm
import pandas as pd

# loading the training dataset
df = pd.read_csv('logit_train1.csv', index_col=0)

# defining the dependent and independent variables
Xtrain = df[['gmat', 'gpa', 'work_experience']]
ytrain = df[['admitted']]

# building the model and fitting the data
log_reg = sm.Logit(ytrain, Xtrain).fit()
OUTPUT :
Optimization terminated successfully.
         Current function value: 0.352707
         Iterations 8

# printing the summary table
print(log_reg.summary())

                           Logit Regression Results
==============================================================================
Dep. Variable:               admitted   No. Observations:                   30
Model:                          Logit   Df Residuals:                       27
Method:                           MLE   Df Model:                            2
Date:                Wed, 15 Jul 2020   Pseudo R-squ.:                  0.4912
Time:                        16:09:17   Log-Likelihood:                -10.581
converged:                       True   LL-Null:                       -20.794
Covariance Type:            nonrobust   LLR p-value:                 3.668e-05
===================================================================================
                      coef    std err          z      P>|z|      [0.025      0.975]
-----------------------------------------------------------------------------------
gmat               -0.0262      0.011     -2.383      0.017      -0.048      -0.005
gpa                 3.9422      1.964      2.007      0.045       0.092       7.792
work_experience     1.1983      0.482      2.487      0.013       0.254       2.143

Predicting on New Data :

# loading the testing dataset
df = pd.read_csv('logit_test1.csv', index_col=0)

# defining the dependent and independent variables
Xtest = df[['gmat', 'gpa', 'work_experience']]
ytest = df['admitted']

# performing predictions on the test dataset
yhat = log_reg.predict(Xtest)
prediction = list(map(round, yhat))

# comparing original and predicted values of y
print('Actual values', list(ytest.values))
print('Predictions :', prediction)

OUTPUT:
Optimization terminated successfully.
Current function value: 0.352707
Iterations 8
Actual values [0, 0, 0, 0, 0, 1, 1, 0, 1, 1]
Predictions : [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
Testing the accuracy of the model :

from sklearn.metrics import (confusion_matrix, accuracy_score)

# confusion matrix
cm = confusion_matrix(ytest, prediction)
print("Confusion Matrix : \n", cm)

# accuracy score of the model
print('Test accuracy = ', accuracy_score(ytest, prediction))

OUTPUT:

Confusion Matrix :
[[6 0]
[2 2]]
Test accuracy = 0.8

RESULT:
Thus the program for Logistics Regression has been executed and verified successfully
Experiment No: 11
TIME SERIES ANALYSIS

Date:

AIM:

To Perform Time series analysis.

ALGORITHM:

Step1: Start

Step2: Import numpy, pandas, matplotlib & seaborn

Step3: Draw the plot

Step4: Display the plot

Step 5: Stop
PROGRAM:

We are using Superstore sales data.

import warnings
import itertools
import numpy as np
import matplotlib.pyplot as plt
warnings.filterwarnings("ignore")
plt.style.use('fivethirtyeight')
import pandas as pd
import statsmodels.api as sm
import matplotlib

matplotlib.rcParams['axes.labelsize'] = 14
matplotlib.rcParams['xtick.labelsize'] = 12
matplotlib.rcParams['ytick.labelsize'] = 12
matplotlib.rcParams['text.color'] = 'k'

We start with time series analysis and forecasting for furniture sales.

df = pd.read_excel("Superstore.xls")
furniture = df.loc[df['Category'] == 'Furniture']

We have a good four years of furniture sales data:

furniture['Order Date'].min(), furniture['Order Date'].max()
(Timestamp('2014-01-06 00:00:00'), Timestamp('2017-12-30 00:00:00'))

Data Preprocessing
This step includes removing columns we do not need, checking for missing values, aggregating sales by
date, and so on.

cols = ['Row ID', 'Order ID', 'Ship Date', 'Ship Mode', 'Customer ID',
        'Customer Name', 'Segment', 'Country', 'City', 'State', 'Postal Code',
        'Region', 'Product ID', 'Category', 'Sub-Category', 'Product Name',
        'Quantity', 'Discount', 'Profit']

furniture.drop(cols, axis=1, inplace=True)
furniture = furniture.sort_values('Order Date')
furniture.isnull().sum()
furniture = furniture.groupby('Order Date')['Sales'].sum().reset_index()

Order Date    0
Sales         0
dtype: int64

Figure 1

Indexing with Time Series Data

furniture = furniture.set_index('Order Date')
furniture.index

Figure 2

We will use the average daily sales value for each month instead, and we use the start of each
month as the timestamp.

y = furniture['Sales'].resample('MS').mean()

Have a quick peek at the 2017 furniture sales data:

y['2017':]

Figure 3
OUTPUT:

Visualizing Furniture Sales Time Series Data

y.plot(figsize=(15, 6))
plt.show()

RESULT:

Thus the program for Time series analysis has been executed and verified successfully
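A common next step in this kind of analysis is to decompose the monthly series into trend, seasonal and residual components; a minimal sketch, assuming the same Superstore.xls file used in the program above is available.

import pandas as pd
import statsmodels.api as sm
import matplotlib.pyplot as plt

# rebuild the monthly furniture sales series as in the program above
df = pd.read_excel("Superstore.xls")
furniture = df.loc[df['Category'] == 'Furniture']
furniture = furniture.groupby('Order Date')['Sales'].sum().reset_index()
furniture = furniture.set_index('Order Date')
y = furniture['Sales'].resample('MS').mean()

# additive decomposition into trend, seasonal and residual components
decomposition = sm.tsa.seasonal_decompose(y, model='additive')
decomposition.plot()
plt.show()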
