
Walmart (Project)


# Capstone Project - Walmart #


Table of Contents

Problem Statement
Project Objective
Data Description
Data Pre-processing Steps and Inspiration
Choosing the Algorithm for the Project
Motivation and Reasons For Choosing the Algorithm
Assumptions
Model Evaluation and Techniques
Inferences from the Same
Future Possibilities of the Project
Conclusion
References

# Problem Statement
A retail store that has multiple outlets across the country is facing issues in managing its inventory - matching demand with supply.

# Data Description
Data description, various insights from the data.

The Walmart DataSet.csv contains 6435 rows and 8 columns.

You are provided with the weekly sales data for their various outlets. Use statistical analysis, EDA, outlier analysis, and handling of missing values to come up with various insights that can give them a clear perspective on the following:

Are weekly sales affected by the unemployment rate? If yes, which stores are suffering the most?
Do weekly sales show a seasonal trend? If so, when, and what could be the reason?

Does temperature affect the weekly sales in any manner?
How is the Consumer Price Index (CPI) affecting the weekly sales of various stores?
Top-performing stores according to the historical data.
The worst-performing store, and how significant the difference is between the highest and lowest performing stores.

2. Use predictive modeling techniques to forecast the sales for each store for the next 12 weeks.

# Data Preprocessing Steps And Inspiration

The preprocessing of the data included the following steps:

First, import the libraries.
Second, read the data file Walmart DataSet.csv.
There are 45 different stores in this dataset.
Select any store ID from 1-45.
Check the data's information and shape, duplicates, null values, etc.

In [1]: import pandas as pd
        import numpy as np
        import matplotlib.pyplot as plt
        import seaborn as sns

In [2]: data = pd.read_csv('Walmart DataSet.csv')
        data.set_index('Date', inplace=True)

        # There are about 45 different stores in this dataset. Select any store id from 1-45.
        a = int(input("Enter the store id:"))
        store = data[data.Store == a]
        sales = pd.DataFrame(store.Weekly_Sales.groupby(store.index).sum())
        sales.dtypes

Enter the store id:11

Out[2]: Weekly_Sales    float64
        dtype: object


In [3]: sales.head(30)


Out[3]:
            Weekly_Sales
Date
01-04-2011    1258674.12
01-06-2012    1361595.33
01-07-2011    1297472.06
01-10-2010    1182490.46
02-03-2012    1438383.44
02-04-2010    1446210.26
02-07-2010    1302600.14
02-09-2011    1297792.41
02-12-2011    1399322.44
03-02-2012    1376732.18
03-06-2011    1343637.00
03-08-2012    1399341.07
03-09-2010    1303914.27
03-12-2010    1380522.64
04-02-2011    1422546.05
04-03-2011    1399456.99
04-05-2012    1370251.22
04-06-2010    1396322.19
04-11-2011    1458287.38
05-02-2010    1528008.64
05-03-2010    1426622.65
05-08-2011    1403198.94
05-10-2012    1422794.26
05-11-2010    1332759.13
06-01-2012    1283885.55
06-04-2012    1596325.01
06-05-2011    1331453.41
06-07-2012    1461129.94
06-08-2010    1369634.92
07-01-2011    1178905.44
In [4]: data.info()
        print(data.shape)

<class 'pandas.core.frame.DataFrame'>
Index: 6435 entries, 05-02-2010 to 26-10-2012
Data columns (total 7 columns):
 #   Column        Non-Null Count  Dtype
---  ------        --------------  -----
 0   Store         6435 non-null   int64
 1   Weekly_Sales  6435 non-null   float64
 2   Holiday_Flag  6435 non-null   int64
 3   Temperature   6435 non-null   float64
 4   Fuel_Price    6435 non-null   float64
 5   CPI           6435 non-null   float64
 6   Unemployment  6435 non-null   float64
dtypes: float64(5), int64(2)
memory usage: 402.2+ KB
(6435, 7)


In [5]: data.head()

Out[5]:
            Store  Weekly_Sales  Holiday_Flag  Temperature  Fuel_Price         CPI  Unemployment
Date
05-02-2010      1    1643690.90             0        42.31       2.572  211.096358         8.106
12-02-2010      1    1641957.44             1        38.51       2.548  211.242170         8.106
19-02-2010      1    1611968.17             0        39.93       2.514  211.289143         8.106
26-02-2010      1    1409727.59             0        46.63       2.561  211.319643         8.106
05-03-2010      1    1554806.68             0        46.50       2.625  211.350143         8.106

In [6]: data.tail()

Out[6]:
            Store  Weekly_Sales  Holiday_Flag  Temperature  Fuel_Price         CPI  Unemployment
Date
28-09-2012     45     713173.95             0        64.88       3.997  192.013558         8.684
05-10-2012     45     733455.07             0        64.89       3.985  192.170412         8.667
12-10-2012     45     734464.36             0        54.47       4.000  192.327265         8.667
19-10-2012     45     718125.53             0        56.47       3.969  192.330854         8.667
26-10-2012     45     760281.43             0        58.85       3.882  192.308899         8.667

In [7]: print(data.isnull().sum())

Store 0
Weekly_Sales 0
Holiday_Flag 0
Temperature 0
Fuel_Price 0
CPI 0
Unemployment 0
dtype: int64


In [8]: print(data.duplicated().sum())

In [9]: data['Store'].count()

Out[9]: 6435

# The objective of this project is to determine how sales can be increased and costs reduced.

Analyze sales trends: By analyzing the weekly sales data for each store, we can identify trends and patterns in sales over time and manage inventory more effectively.

In [10]: # Total weekly sales from all stores
         data['Weekly_Sales'].sum()

Out[10]: 6737218987.11

In [11]: # remove Date from the index to change its dtype, since a string index isn't acceptable
         sales.reset_index(inplace=True)
         # convert the 'Date' column to a datetime type
         sales['Date'] = pd.to_datetime(sales['Date'])
         # set 'Date' back as the index
         sales.set_index('Date', inplace=True)

C:\Users\HP\AppData\Local\Temp\ipykernel_6312\1042104552.py:4: UserWarning: Parsing dates in DD/MM/YYYY format when dayfirst=False (the default) was specified. This may lead to inconsistently parsed dates! Specify a format to ensure consistent parsing.
  sales['Date'] = pd.to_datetime(sales['Date'])
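As the warning suggests, the conversion step above would be safer with an explicit format. A minimal sketch (not in the original notebook), assuming the dates are DD-MM-YYYY as the head()/tail() output indicates:

In [ ]: # Hypothetical alternative for the conversion line, with an explicit format
        sales['Date'] = pd.to_datetime(sales['Date'], format='%d-%m-%Y')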


In [12]: sales.Weekly_Sales.plot(figsize=(10, 6), title='Weekly Sales of a Store', fontsize=14, color='blue')
         plt.show()

[Output: line plot of weekly sales for the selected store]

In [13]: from statsmodels.tsa.seasonal import seasonal_decompose

         decomposition = seasonal_decompose(sales.Weekly_Sales, period=12)
         fig = decomposition.plot()
         fig.set_size_inches(12, 10)
         plt.show()

[Output: seasonal decomposition plot with observed, trend, seasonal, and residual components]

Understand the impact of holidays: The Holiday_Flag column in the dataset indicates whether a given week is a holiday week or not. Analyzing the sales data
for holiday weeks vs. non-holiday weeks can help stores to understand the impact of holidays on their sales and plan accordingly.
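A quick way to quantify this is to compare average weekly sales across the two groups; a minimal sketch, not part of the original notebook:

In [ ]: # Compare average weekly sales in holiday vs. non-holiday weeks
        print(data.groupby('Holiday_Flag')['Weekly_Sales'].mean())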

# Choosing the Algorithm For the Project

To analyze sales trends using the weekly sales data for a selected store, we can follow these steps:

Load the Walmart dataset into a pandas DataFrame.
Convert the 'Date' column to a datetime format.
Group the data by store and date, and calculate the total sales for each week.
Pivot the data to create a table with stores as columns and weekly sales as rows (see the sketch below).
Plot the trend of sales for the selected store.
Plot the distribution of sales for the selected store.
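The pivot step is not shown later in the notebook; a minimal sketch of what it could look like:

In [ ]: # Hypothetical pivot: one column of weekly sales per store
        pivoted = data.reset_index().pivot_table(index='Date', columns='Store',
                                                 values='Weekly_Sales', aggfunc='sum')
        pivoted.head()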


In [14]: # Let's compare the 2012 data of two stores. Take store 5 for analysis.
         store5 = data[data.Store == 5]

         # Grouped weekly sales for store 5
         sales5 = pd.DataFrame(store5.Weekly_Sales.groupby(store5.index).sum())
         sales5.dtypes

         # remove Date from the index to change its dtype
         sales5.reset_index(inplace=True)
         # convert the 'Date' column to a datetime type
         sales5['Date'] = pd.to_datetime(sales5['Date'])
         # set 'Date' back as the index
         sales5.set_index('Date', inplace=True)

C:\Users\HP\AppData\Local\Temp\ipykernel_6312\430818428.py:14: UserWarning: Parsing dates in DD/MM/YYYY format when dayfirst=False (the default) was specified. This may lead to inconsistently parsed dates! Specify a format to ensure consistent parsing.
  sales5['Date'] = pd.to_datetime(sales5['Date'])

In [15]: y1 = sales.Weekly_Sales
         y2 = sales5.Weekly_Sales


In [16]: y1['2012'].plot(figsize=(15, 6), legend=True, color='chocolate')
         y2['2012'].plot(figsize=(15, 6), legend=True, color='orange')
         plt.ylabel('Weekly Sales')
         plt.title('Selected store vs Store 5 in 2012', fontsize=16)
         plt.show()

# Choosing the Algorithm For the Project & Identifying the Impact of External Factors
The Temperature, Fuel_Price, CPI, and Unemployment columns in the dataset provide information about external factors that may influence sales.

Analyzing the relationship between these factors and sales can help stores to better understand their customer base and adjust their strategies accordingly.

To identify the impact of external factors on sales using the Walmart dataset, we can follow these steps:

Convert the 'Date' column to a datetime format.

Plot the correlation matrix of the dataset to visualize the relationships between variables.

Create scatter plots of the external factors against weekly sales to visualize the relationship between each factor and sales.

Calculate the correlation coefficients between each external factor and weekly sales to quantify the strength of the relationships.

Create a multiple regression model to analyze the impact of multiple external factors on weekly sales.

In [ ]: # 'Date' was set as the index in In [2], so convert the index itself rather than a column
        data.index = pd.to_datetime(data.index, format='%d-%m-%Y')


In [51]: # Plot the correlation matrix of the dataset
         corr = data.corr()
         sns.heatmap(corr, annot=True, cmap='coolwarm')
         plt.title('Correlation Matrix')
         plt.show()

[Output: annotated correlation heatmap of all numeric columns]

In [56]: # Create scatter plots of external factors against weekly sales
         sns.pairplot(data[['Weekly_Sales', 'Temperature', 'Fuel_Price', 'CPI', 'Unemployment']])
         plt.show()

[Output: pair plot of Weekly_Sales against Temperature, Fuel_Price, CPI, and Unemployment]

In [58]: # Calculate the correlation coefficients between each external factor and weekly sales
         corr_sales = data[['Weekly_Sales', 'Temperature', 'Fuel_Price', 'CPI', 'Unemployment']].corr()['Weekly_Sales']
         print(corr_sales)

Weekly_Sales 1.000000
Temperature -0.063810
Fuel_Price 0.009464
CPI -0.072634
Unemployment -0.106176
Name: Weekly_Sales, dtype: float64

In [59]: from sklearn.linear_model import LinearRegression

In [60]: # Create a multiple regression model to analyze the impact of multiple external factors on weekly sales
         X = data[['Temperature', 'Fuel_Price', 'CPI', 'Unemployment']]
         y = data['Weekly_Sales']
         model = LinearRegression().fit(X, y)
         r_sq = model.score(X, y)
         coefficients = model.coef_
         intercept = model.intercept_
         print(f"R-squared: {r_sq}")
         print(f"Coefficients: {coefficients}")
         print(f"Intercept: {intercept}")

R-squared: 0.024330716534334385
Coefficients: [ -885.66992595 -12248.42446553 -1585.81799199 -41214.98725744]
Intercept: 1743607.6199776107

# Assumptions
The following assumptions were made in order to create the model for the Walmart project.

This code generates several plots and prints out the correlation coefficients and regression coefficients:

A plot of the correlation matrix of the dataset. This plot shows the strength and direction of the relationships
between variables.

A grid of scatter plots showing the relationship between each external factor and weekly sales.

The correlation coefficients between each external factor and weekly sales. These coefficients quantify the strength
and direction of the relationship.

The regression coefficients of a multiple regression model that analyzes the impact of multiple external factors on
weekly sales. The R-squared value indicates the proportion of variance in weekly sales that can be explained by the
external factors, and the coefficients indicate the strength and direction of the relationship between each factor
and sales.

The multiple regression model that was built to analyze the impact of external factors on weekly sales has an R-squared value of about 0.024, meaning only around 2.4% of the variance in weekly sales is explained by the external factors in the model. This means that there are other factors, not included in the model, that also have an impact on weekly sales.

The coefficients of the model represent the strength and direction of the relationship between each external factor and weekly sales:

Temperature: -885.67

Fuel_Price: -12,248.42

CPI: -1,585.82

Unemployment: -41,214.99

These coefficients indicate that an increase in temperature, fuel price, CPI, or unemployment is associated with a decrease in weekly sales.

The intercept of the model is 1,743,607.62, which represents the estimated weekly sales when all external factors are at 0. This is a statistical baseline rather than a realistic scenario, since the external factors never actually take the value 0 in the data.

# Optimize pricing strategies: By analyzing the relationship between weekly sales and price-related factors such as CPI and fuel price, stores can adjust their pricing strategies.


In [61]: # Filter the dataset to only include holiday weeks
         data_holiday = data[data['Holiday_Flag'] == 1]

         # Create a scatter plot to visualize the relationship between weekly sales and CPI
         sns.scatterplot(x='CPI', y='Weekly_Sales', data=data_holiday)
         plt.title('Weekly Sales vs. CPI')
         plt.xlabel('CPI')
         plt.ylabel('Weekly Sales')
         plt.show()

         # Create a scatter plot to visualize the relationship between weekly sales and Fuel_Price
         sns.scatterplot(x='Fuel_Price', y='Weekly_Sales', data=data_holiday)
         plt.title('Weekly Sales vs. Fuel_Price')
         plt.xlabel('Fuel_Price')
         plt.ylabel('Weekly Sales')
         plt.show()

         # Build a linear regression model to predict weekly sales based on CPI and Fuel_Price
         X = data_holiday[['CPI', 'Fuel_Price']]
         y = data_holiday['Weekly_Sales']
         reg = LinearRegression().fit(X, y)

         # Print the coefficients of the linear regression model
         print('Coefficients:', reg.coef_)
         print('Intercept:', reg.intercept_)

         # Use the model to make predictions for different values of CPI and Fuel_Price
         new_data = pd.DataFrame({'CPI': [220, 230, 240], 'Fuel_Price': [3.50, 3.60, 3.70]})
         print('Predictions:', reg.predict(new_data))
[Output: scatter plots of Weekly Sales vs. CPI and vs. Fuel_Price for holiday weeks]

Coefficients: [-1194.64849703 46674.84850851]
Intercept: 1176851.6465749654
Predictions: [1077390.94700861 1070111.94688918 1062832.94676976]

# Model Evaluation and Techniques

The following techniques and steps were involved in the evaluation of the model:
Technique 1 - coefficients
Technique 2 - linear regression model and prediction model for time series, and so on.

The coefficients of the linear regression model represent the change in weekly sales for a one-unit increase in each
predictor variable, while holding all other variables constant.


In this case, the first coefficient (-1194.64849703) represents the change in weekly sales for a one-unit increase in
CPI, while holding Fuel_Price constant. The negative sign indicates that there is a negative correlation between CPI
and weekly sales - as CPI increases, weekly sales tend to decrease. The magnitude of the coefficient (-1194.65)
indicates the strength of the relationship.

The second coefficient (46674.84850851) represents the change in weekly sales for a one-unit increase in Fuel_Price,
while holding CPI constant. The positive sign indicates that there is a positive correlation between Fuel_Price and
weekly sales - as Fuel_Price increases, weekly sales tend to increase. The magnitude of the coefficient (46674.85)
indicates the strength of the relationship.

The intercept (1176851.6465749654) represents the predicted weekly sales when both CPI and Fuel_Price are equal to
zero.

The predictions ([1077390.94700861, 1070111.94688918, 1062832.94676976]) are the predicted weekly sales for new data
points with different values of CPI and Fuel_Price, based on the coefficients of the linear regression model. For
example, the first prediction (1077390.94700861) represents the predicted weekly sales for a new data point with a
CPI of 220 and a Fuel_Price of 3.5.
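As a sanity check, the first prediction can be reproduced by hand from the reported coefficients (this check is not in the original notebook):

In [ ]: # Reproduce the first prediction: intercept + coef_CPI * 220 + coef_Fuel * 3.50
        intercept = 1176851.6465749654
        coef_cpi, coef_fuel = -1194.64849703, 46674.84850851
        print(intercept + coef_cpi * 220 + coef_fuel * 3.50)  # ~ 1077390.95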


# Manage inventory levels: By analyzing sales trends and understanding the impact of external factors, stores can adjust their inventory levels to match expected demand.


In [63]: # create a linear regression model to predict weekly sales
         X = data[['Temperature', 'Fuel_Price', 'CPI', 'Unemployment', 'Holiday_Flag']]
         y = data['Weekly_Sales']
         model = LinearRegression().fit(X, y)

         # use the model to predict future sales
         future_data = pd.DataFrame({
             'Temperature': [70, 75, 80],
             'Fuel_Price': [3.5, 3.6, 3.7],
             'CPI': [220, 222, 224],
             'Unemployment': [6.0, 6.2, 6.4],
             'Holiday_Flag': [0, 1, 0]
         })
         predicted_sales = model.predict(future_data)

         # adjust inventory levels based on predicted sales
         for i, predicted_sale in enumerate(predicted_sales):
             if predicted_sale > 100000:
                 print(f"Order more inventory for week {i+1}")
             elif predicted_sale < 50000:
                 print(f"Reduce inventory for week {i+1}")
             else:
                 print(f"Inventory levels are appropriate for week {i+1}")

Order more inventory for week 1
Order more inventory for week 2
Order more inventory for week 3

# The evaluation report suggests the following:

Inferences from the evaluation

This code uses a linear regression model to predict future sales based on external factors such as temperature, fuel price, CPI, unemployment, and the holiday flag.

Based on the predicted sales, the code adjusts the inventory levels. If the predicted sales are high, the code recommends ordering more inventory. If the predicted sales are low, the code recommends reducing inventory levels. If the predicted sales fall in between, inventory levels are deemed appropriate.

# Identify underperforming stores: By comparing the sales data across all stores, we can identify the stores that are underperforming.


In [64]: # calculate total sales for each store
         store_sales = data.groupby('Store')['Weekly_Sales'].sum().reset_index()

         # calculate average sales per store
         avg_sales = store_sales['Weekly_Sales'].mean()

         # identify underperforming stores
         underperforming_stores = store_sales[store_sales['Weekly_Sales'] < avg_sales]

         # print the list of underperforming stores
         print("Underperforming stores:")
         for store in underperforming_stores['Store']:
             print(store)


Underperforming stores:
3
5
7
8
9
12
15
16
17
21
22
25
26
29
30
33
34
35
36
37
38
40
42
43
44
45

This code first groups the sales data by store and calculates the total sales for each store. It then calculates the average sales per store.

Stores with total sales below the average are identified as underperforming stores. The code prints the list of underperforming stores.

You can adjust the definition of underperforming stores by changing the criteria; for example, you could identify stores whose sales fall below a given percentile rather than below the mean, as in the sketch below.
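A minimal sketch of such an alternative criterion (hypothetical, not in the original notebook):

In [ ]: # Flag stores below the 25th percentile of total sales instead of below the mean
        threshold = store_sales['Weekly_Sales'].quantile(0.25)
        print(store_sales.loc[store_sales['Weekly_Sales'] < threshold, 'Store'].tolist())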

# Forecast future sales: By using the historical sales data, we can develop predictive models to forecast sales for the coming weeks.

Inferences from the Walmart Project

The model performance, the inferences drawn, and the future sales forecasts can help stores to better plan ahead; the details follow below.

In [33]: # Clearly we can see the irregularities

In [34]: import itertools

         # Define the p, d and q parameters to take any value between 0 and 4
         p = d = q = range(0, 5)

         # Generate all different combinations of p, d and q triplets
         pdq = list(itertools.product(p, d, q))

         # Generate all different combinations of seasonal p, d and q triplets (weekly data, so period 52)
         seasonal_pdq = [(x[0], x[1], x[2], 52) for x in itertools.product(p, d, q)]
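The grid search over these combinations is not shown in the notebook; a minimal sketch of the usual AIC-based selection (in practice the ranges should be restricted, since 125 x 125 combinations is very expensive to fit):

In [ ]: import statsmodels.api as sm

        # Sketch: pick the (p,d,q)(P,D,Q,52) combination with the lowest AIC
        best_aic, best_order = float('inf'), None
        for order in pdq:
            for seasonal_order in seasonal_pdq:
                try:
                    res = sm.tsa.statespace.SARIMAX(y1, order=order,
                                                    seasonal_order=seasonal_order,
                                                    enforce_invertibility=False).fit(disp=False)
                    if res.aic < best_aic:
                        best_aic, best_order = res.aic, (order, seasonal_order)
                except Exception:
                    continue
        print(best_order, best_aic)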


In [35]: import statsmodels.api as sm

         mod = sm.tsa.statespace.SARIMAX(y1,
                                         order=(4, 4, 3),
                                         seasonal_order=(1, 1, 0, 52),
                                         # enforce_stationarity=False,
                                         enforce_invertibility=False)

         results = mod.fit()

         print(results.summary().tables[1])

C:\Users\HP\anaconda3\lib\site-packages\statsmodels\tsa\base\tsa_model.py:471: ValueWarning: A date index has been provided, but it has no associated frequency information and so will be ignored when e.g. forecasting.
  self._init_dates(dates, freq)
C:\Users\HP\anaconda3\lib\site-packages\statsmodels\tsa\base\tsa_model.py:471: ValueWarning: A date index has been provided, but it is not monotonic and so will be ignored when e.g. forecasting.
  self._init_dates(dates, freq)
C:\Users\HP\anaconda3\lib\site-packages\statsmodels\base\model.py:604: ConvergenceWarning: Maximum Likelihood optimization failed to converge. Check mle_retvals
  warnings.warn("Maximum Likelihood optimization failed to "

==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
ar.L1         -0.9229      0.269     -3.431      0.001      -1.450      -0.396
ar.L2         -0.7813      0.359     -2.174      0.030      -1.486      -0.077
ar.L3         -0.7113      0.352     -2.020      0.043      -1.402      -0.021
ar.L4         -0.5688      0.171     -3.324      0.001      -0.904      -0.233
ma.L1         -2.4727      0.321     -7.696      0.000      -3.102      -1.843
ma.L2          1.9628      0.635      3.089      0.002       0.717       3.208
ma.L3         -0.4882      0.320     -1.524      0.127      -1.116       0.140
ar.S.L52      -0.4898      0.129     -3.803      0.000      -0.742      -0.237
sigma2      1.174e+11   2.26e-12   5.18e+22      0.000    1.17e+11    1.17e+11
==============================================================================


In [49]: plt.style.use('seaborn-pastel')
         results.plot_diagnostics(figsize=(15, 12))
         plt.show()

C:\Users\HP\AppData\Local\Temp\ipykernel_6312\3809637464.py:1: MatplotlibDeprecationWarning: The seaborn styles shipped by Matplotlib are deprecated since 3.6, as they no longer correspond to the styles shipped by seaborn. However, they will remain available as 'seaborn-v0_8-<style>'. Alternatively, directly use the seaborn API instead.
  plt.style.use('seaborn-pastel')

[Output: model diagnostic plots - standardized residuals, histogram/KDE, normal Q-Q plot, and correlogram]

In [50]: pred = results.get_prediction(start=pd.to_datetime('2012-07-27'), dynamic=False)
         pred_ci = pred.conf_int()

In [38]: ax = y1['2010':].plot(label='observed')
         pred.predicted_mean.plot(ax=ax, label='One-step ahead Forecast', alpha=.7)

         ax.fill_between(pred_ci.index,
                         pred_ci.iloc[:, 0],
                         pred_ci.iloc[:, 1], color='k', alpha=.2)

         ax.set_xlabel('Time Period')
         ax.set_ylabel('Sales')

         plt.legend()
         plt.show()

C:\Users\HP\AppData\Local\Temp\ipykernel_6312\2689329424.py:1: FutureWarning: Value based partial slicing on non-monotonic DatetimeIndexes with non-existing keys is deprecated and will raise a KeyError in a future version.
  ax = y1['2010':].plot(label='observed')

[Output: observed series with one-step-ahead forecast and confidence band]

In [39]: y_forecasted = pred.predicted_mean
         y_truth = y1['2012-7-27':]

         # Compute the mean squared error
         mse = ((y_forecasted - y_truth) ** 2).mean()
         print('The Mean Squared Error of our forecasts is {}'.format(round(mse, 2)))

The Mean Squared Error of our forecasts is 31777933846.48

In [40]: pred_dynamic = results.get_prediction(start=pd.to_datetime('2012-7-27'), dynamic=True, full_results=True)
         pred_dynamic_ci = pred_dynamic.conf_int()

In [41]: ax = y1['2010':].plot(label='observed', figsize=(12, 8))
         pred_dynamic.predicted_mean.plot(label='Dynamic Forecast', ax=ax)

         ax.fill_between(pred_dynamic_ci.index,
                         pred_dynamic_ci.iloc[:, 0],
                         pred_dynamic_ci.iloc[:, 1], color='k', alpha=.25)

         ax.fill_betweenx(ax.get_ylim(), pd.to_datetime('2012-7-26'), y1.index[-1], alpha=.1, zorder=-1)

         ax.set_xlabel('Time Period')
         ax.set_ylabel('Sales')

         plt.legend()
         plt.show()

C:\Users\HP\AppData\Local\Temp\ipykernel_6312\4127785946.py:1: FutureWarning: Value based partial slicing on non-monotonic DatetimeIndexes with non-existing keys is deprecated and will raise a KeyError in a future version.
  ax = y1['2010':].plot(label='observed', figsize=(12, 8))

[Output: observed series with dynamic forecast, confidence band, and shaded forecast region]

In [42]: import numpy as np

         # Extract the predicted values of our time series
         y_forecasted = pred_dynamic.predicted_mean
         print(y_forecasted)

Date
2012-07-27 1.470892e+06
2010-08-27 1.462495e+06
2011-01-28 1.020830e+06
2010-05-28 1.287399e+06
2012-09-28 1.019062e+06
2011-10-28 1.394298e+06
2011-04-29 1.234777e+06
2012-06-29 8.901318e+05
2011-07-29 9.546755e+05
2010-10-29 8.946962e+05
2012-03-30 8.830751e+05
2010-04-30 9.490994e+05
2010-07-30 8.111185e+05
2011-09-30 8.153588e+05
2011-12-30 6.083340e+05
2012-08-31 6.582176e+05
2010-12-31 5.924099e+05
Name: predicted_mean, dtype: float64


In [43]: y_truth = y1['2012-7-27':]

print(y_truth)

Date
2012-08-06 1414343.53
2012-09-03 1413382.76
2012-10-02 1574287.76
2012-10-08 1388973.65
2012-11-05 1300147.07
2012-12-10 1311965.09
2012-09-14 1267675.05
2012-08-17 1421307.20
2012-10-19 1232073.18
2012-09-21 1326132.98
2012-08-24 1409515.73
2012-10-26 1200729.45
2012-07-27 1272395.02
2012-09-28 1227430.73
2012-08-31 1372872.35
Name: Weekly_Sales, dtype: float64

# Future Possibilities
The future possibilities, limitations, and other considerations:

This code first groups the sales data by store and date and calculates the total sales for each store on each date.
It then converts the date column to a datetime format and splits the data into training and testing sets, with the
training data being all dates before January 1, 2012 and the testing data being all dates on or after January 1,
2012.

A linear regression model is created using the store number as the predictor variable and the weekly sales as the
response variable. The model is fit on the training data and used to make predictions on the testing data.

The code then calculates the R-squared value, which measures the goodness of fit of the model to the testing data.

You can adjust the model by using different predictor variables, such as the CPI, fuel price, or unemployment rate,
or by using different models, such as a polynomial regression or a time series model.
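The code described above is not included in this excerpt; a minimal sketch consistent with the description (the split date and predictor choice follow the text, the details are assumptions):

In [ ]: from sklearn.linear_model import LinearRegression
        from sklearn.metrics import r2_score

        # Sketch: store-number regression with a date-based train/test split
        df = data.reset_index()
        df['Date'] = pd.to_datetime(df['Date'])
        weekly = df.groupby(['Store', 'Date'])['Weekly_Sales'].sum().reset_index()

        train = weekly[weekly['Date'] < '2012-01-01']
        test = weekly[weekly['Date'] >= '2012-01-01']

        lr = LinearRegression().fit(train[['Store']], train['Weekly_Sales'])
        preds = lr.predict(test[['Store']])
        print('Predictions:', preds)
        print('R-squared value:', r2_score(test['Weekly_Sales'], preds))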

Predictions: [1376252.78477092 1376252.78477092 1376252.78477092 ... 729118.82756686 729118.82756686 729118.82756686]

R-squared value: 0.1176607873663219


The output shows the predictions made by a model that forecasts future sales from historical sales data; the model predicts the weekly sales for each store in the test period.

An R-squared value of 0.1176607873663219 means that the model explains about 11.77% of the variance in the data, which is relatively low, so further improvements are
necessary to improve the accuracy of the sales predictions.


In [66]: # Compute the root mean squared error
         # (note: y_forecasted was reassigned in In [42], so this RMSE is for the dynamic forecast, not the one-step MSE above)
         rmse = np.sqrt(((y_forecasted - y_truth) ** 2).mean())
         print('The Root Mean Squared Error of our forecasts is {}'.format(round(rmse, 2)))

The Root Mean Squared Error of our forecasts is 444803.25

In [67]: Residual = y_forecasted - y_truth
         print("Residual for Store1", np.abs(Residual).sum())

Residual for Store1 1121519.9632811893

In [69]: # Get forecast 12 weeks ahead in future
         pred_uc = results.get_forecast(steps=12)
         print(pred_uc)

<statsmodels.tsa.statespace.mlemodel.PredictionResultsWrapper object at 0x0000024F3DF5D5A0>

C:\Users\HP\anaconda3\lib\site-packages\statsmodels\tsa\base\tsa_model.py:834: ValueWarning: No supported index is available. Prediction results will be given with an integer index beginning at `start`.
  return get_prediction_index(

In [70]: # Get confidence intervals of forecasts
         pred_ci = pred_uc.conf_int()


In [48]: ax = y1.plot(label='observed', figsize=(12, 8))
         pred_uc.predicted_mean.plot(ax=ax, label='Forecast')
         ax.fill_between(pred_ci.index,
                         pred_ci.iloc[:, 0],
                         pred_ci.iloc[:, 1], color='k', alpha=.25)
         ax.set_xlabel('Time Period')
         ax.set_ylabel('Sales')

         plt.legend()
         plt.show()

[Output: observed series with 12-week-ahead forecast and confidence band]

# Conclusion
The Walmart project shows the relationships between stores and their sales: weekly sales are affected when fuel prices increase, and when temperature, CPI, or unemployment rise, sales tend to decrease.

In the future, correlating sales across the different Walmart stores could help increase sales and reduce costs, thereby improving income.

# References
Some data was downloaded, and some content was adapted, from Google and other resources.
