Walmart (Project)
Walmart (Project)
Walmart (Project)
Problem Statement
Project Objective
Data Description
Data Pre-processing Steps and Inspiration
Choosing the Algorithm for the Project
Motivation and Reasons For Choosing the Algorithm
Assumptions
Model Evaluation and Techniques
Inferences from the Same
Future Possibilities of the Project
Conclusion
References
# Problem Statement
A retail store that has multiple outlets across the country are facing issues in managing the inventory - to match the demand w
# Data Description
Data description, various insights from the data.
You are provided with the weekly sales data for their various outlets. Use statistical analysis, EDA, outlier analysis, and han
insights that can give them a clear perspective on the following:
If the weekly sales are affected by the unemployment rate, if yes - which stores are suffering the most?
If the weekly sales show a seasonal trend, when and what could be the reason?
localhost:8888/notebooks/Project/walmart(Project).ipynb 1/
12/23/23, 12:21 PM walmart(Project) - Jupyter Notebook
Does temperature affect the weekly sales in any manner?
How is the Consumer Price index affecting the weekly sales of various stores?
Top performing stores according to the historical data.
The worst performing store, and how significant is the difference between the highest and lowest performing stores.
2. Use predictive modeling techniques to forecast the sales for each store for the next 12 weeks.
localhost:8888/notebooks/Project/walmart(Project).ipynb 2/
12/23/23, 12:21 PM walmart(Project) - Jupyter Notebook
In [3]: sales.head(30)
localhost:8888/notebooks/Project/walmart(Project).ipynb 3/
12/23/23, 12:21 PM walmart(Project) - Jupyter Notebook
Out[3]:
Weekly_Sales
Date
01-04-2011 1258674.12
01-06-2012 1361595.33
01-07-2011 1297472.06
01-10-2010 1182490.46
02-03-2012 1438383.44
02-04-2010 1446210.26
02-07-2010 1302600.14
02-09-2011 1297792.41
02-12-2011 1399322.44
03-02-2012 1376732.18
03-06-2011 1343637.00
03-08-2012 1399341.07
03-09-2010 1303914.27
03-12-2010 1380522.64
04-02-2011 1422546.05
04-03-2011 1399456.99
04-05-2012 1370251.22
04-06-2010 1396322.19
04-11-2011 1458287.38
05-02-2010 1528008.64
05-03-2010 1426622.65
05-08-2011 1403198.94
05-10-2012 1422794.26
05-11-2010 1332759.13
localhost:8888/notebooks/Project/walmart(Project).ipynb 4/
12/23/23, 12:21 PM walmart(Project) - Jupyter Notebook
Weekly_Sales
Date
06-01-2012 1283885.55
06-04-2012 1596325.01
06-05-2011 1331453.41
06-07-2012 1461129.94
06-08-2010 1369634.92
07-01-2011 1178905.44
In [4]: data.info()
print(data.shape)
<class 'pandas.core.frame.DataFrame'>
Index: 6435 entries, 05-02-2010 to 26-10-2012
Data columns (total 7 columns):
# Column Non-Null Count Dtype
localhost:8888/notebooks/Project/walmart(Project).ipynb 5/
12/23/23, 12:21 PM walmart(Project) - Jupyter Notebook
In [5]: data.head()
Out[5]:
Store Weekly_Sales Holiday_Flag Temperature Fuel_Price CPI Unemployment
Date
In [6]: data.tail()
Out[6]:
Store Weekly_Sales Holiday_Flag Temperature Fuel_Price CPI Unemployment
Date
In [7]: print(data.isnull().sum())
Store 0
Weekly_Sales 0
Holiday_Flag 0
Temperature 0
Fuel_Price 0
CPI 0
Unemployment 0
dtype: int64
localhost:8888/notebooks/Project/walmart(Project).ipynb 6/
12/23/23, 12:21 walmart(Project) - Jupyter Notebook
In [8]: print(data.duplicated().sum())
In [9]: data['Store'].count()
Out[9]: 6435
# The objective of this project is to how can increase the sales day by day & red
Analyze sales trends: By analyzing the weekly sales data for each store, we can identify the trends and patterns in sales over
effectively.
Out[10]: 6737218987.11
In [11]: #remove date from index to change its dtype because it clearly isnt acceptable.
sales.reset_index(inplace = True)
#converting 'date' column to a datetime type sales['Date'] = pd.to_datetime(sales['Date']) # resetting date back to the index
sales.set_index('Date',inplace = True)
localhost:8888/notebooks/Project/walmart(Project).ipynb 7/
12/23/23, 12:21 walmart(Project) - Jupyter Notebook
In [12]: sales.Weekly_Sales.plot(figsize=(10,6), title= 'Weekly Sales of a Store', fontsize=14, color = 'blue') plt.show()
localhost:8888/notebooks/Project/walmart(Project).ipynb 8/
12/23/23, 12:21 walmart(Project) - Jupyter Notebook
localhost:8888/notebooks/Project/walmart(Project).ipynb 9/
12/23/23, 12:21 walmart(Project) - Jupyter Notebook
localhost:8888/notebooks/Project/walmart(Project).ipynb 10/
12/23/23, 12:21 walmart(Project) - Jupyter Notebook
localhost:8888/notebooks/Project/walmart(Project).ipynb 11/
12/23/23, 12:21 walmart(Project) - Jupyter Notebook
Understand the impact of holidays: The Holiday_Flag column in the dataset indicates whether a given week is a holiday week or not. Analyzing the sales data
for holiday weeks vs. non-holiday weeks can help stores to understand the impact of holidays on their sales and plan accordingly.
To analyze sales trends using the weekly sales data for selected store, we can follow these steps: Load the Walmart dataset int
Convert the 'Date' column to a datetime format.
Group the data by store and date, and calculate the total sales for each week.
Pivot the data to create a table with stores as columns and weekly sales as rows. Plot the trend of sales for selected store.
Plot the distribution of sales for selected store.
localhost:8888/notebooks/Project/walmart(Project).ipynb 12/
12/23/23, 12:21 walmart(Project) - Jupyter Notebook
In [14]: #lets compare the 2012 data of two stores # Lets take store 5 data for analysis
store5 = data[data.Store == 5]
# there are about 45 different stores in this dataset.
#remove date from index to change its dtype because it clearly isnt acceptable.
sales5.reset_index(inplace = True)
localhost:8888/notebooks/Project/walmart(Project).ipynb 13/
12/23/23, 12:21 walmart(Project) - Jupyter Notebook
In [16]: #y1['2012'].plot(figsize=(15, 6),legend=True, color = 'chocolate') #y2['2012'].plot(figsize=(15, 6), legend=True, color = 'turq
y2['2012'].plot(figsize=(15, 6), legend=True, color = 'Orange') plt.ylabel('Weekly Sales')
plt.title('Store4 vs Store5 on 2012', fontsize = '16') plt.show()
# Choosing the Algorithm For the Project & Identify the impact of external facto
The Temperature, Fuel_Price, CPI, and Unemployment columns in the dataset provide information about external factors that may i
Analyzing the relationship between these factors and sales can help stores to better understand their customer base and adjust
localhost:8888/notebooks/Project/walmart(Project).ipynb 14/
12/23/23, 12:21 walmart(Project) - Jupyter Notebook
To identify the impact of external factors on sales using the Walmart dataset, we can follow these steps:
Plot the correlation matrix of the dataset to visualize the relationships between variables.
Create scatter plots of the external factors against weekly sales to visualize the relationship between each factor and sales
Calculate the correlation coefficients between each external factor and weekly sales to quantify the strength of the relation
Create a multiple regression model to analyze the impact of multiple external factors on weekly sales.
In [ ]: data['Date'] = pd.to_datetime(data['Date'])
localhost:8888/notebooks/Project/walmart(Project).ipynb 15/
12/23/23, 12:21 walmart(Project) - Jupyter Notebook
localhost:8888/notebooks/Project/walmart(Project).ipynb 16/
12/23/23, 12:21 walmart(Project) - Jupyter Notebook
localhost:8888/notebooks/Project/walmart(Project).ipynb 17/
12/23/23, 12:21 walmart(Project) - Jupyter Notebook
localhost:8888/notebooks/Project/walmart(Project).ipynb 18/
12/23/23, 12:21 walmart(Project) - Jupyter Notebook
localhost:8888/notebooks/Project/walmart(Project).ipynb 19/
12/23/23, 12:21 walmart(Project) - Jupyter Notebook
In [58]: # Calculate the correlation coefficients between each external factor and weekly sales
corr_sales = data[['Weekly_Sales', 'Temperature', 'Fuel_Price', 'CPI', 'Unemployment']].corr()['Weekly_Sales'] print(corr_sales
Weekly_Sales 1.000000
Temperature -0.063810
Fuel_Price 0.009464
CPI -0.072634
Unemployment -0.106176
Name: Weekly_Sales, dtype: float64
In [60]: # Create a multiple regression model to analyze the impact of multiple external factors on weekly sales
X = data[['Temperature', 'Fuel_Price', 'CPI', 'Unemployment']] y = data['Weekly_Sales']
model = LinearRegression().fit(X, y) r_sq = model.score(X, y)
coefficients = model.coef_ intercept = model.intercept_ print(f"R-squared: {r_sq}")
print(f"Coefficients: {coefficients}") print(f"Intercept: {intercept}")
R-squared: 0.024330716534334385
Coefficients: [ -885.66992595 -12248.42446553 -1585.81799199 -41214.98725744]
Intercept: 1743607.6199776107
localhost:8888/notebooks/Project/walmart(Project).ipynb 20/
12/23/23, 12:21 walmart(Project) - Jupyter Notebook
# Assumptions
The following assumptions were made in order to create the model for Walmart project.
This code generates several plots and prints out the correlation coefficients and regression coefficients:
A plot of the correlation matrix of the dataset. This plot shows the strength and direction of the relationships
between variables.
A 2x2 grid of scatter plots showing the relationship between each external factor and weekly sales.
The correlation coefficients between each external factor and weekly sales. These coefficients quantify the strength
and direction of the relationship.
The regression coefficients of a multiple regression model that analyzes the impact of multiple external factors on
weekly sales. The R-squared value indicates the proportion of variance in weekly sales that can be explained by the
external factors, and the coefficients indicate the strength and direction of the relationship between each factor
and sales.
The multiple regression model that was built to analyze the impact of external factors on weekly sales has an R- squared value
external factors in the model. This means that there are other factors that are not included in the model that also have an imp
The coefficients of the model represent the strength and direction of the relationship between each external factor and weekly
Temperature: -885.67
Fuel_Price: -12,248.42
CPI: -1,585.82
Unemployment: -41,214.99
These coefficients indicate that an increase in temperature, fuel price, CPI, and unemployment is associated with a decrease in
localhost:8888/notebooks/Project/walmart(Project).ipynb 21/
12/23/23, 12:21 walmart(Project) - Jupyter Notebook
The intercept of the model is 1,743,607.62, which represents the estimated weekly sales when all external factors are at 0. Th
In [ ]:
localhost:8888/notebooks/Project/walmart(Project).ipynb 22/
12/23/23, 12:21 walmart(Project) - Jupyter Notebook
In [61]: # Filter the dataset to only include stores with holiday weeks
data_holiday = data[data['Holiday_Flag'] == 1]
# Create a scatter plot to visualize the relationship between weekly sales and CPI
sns.scatterplot(x='CPI', y='Weekly_Sales',
data=data_holiday) plt.title('Weekly Sales vs. CPI')
plt.xlabel('CPI')
plt.ylabel('Weekly
Sales') plt.show()
# Create a scatter plot to visualize the relationship between weekly sales and Fuel_Price
sns.scatterplot(x='Fuel_Price', y='Weekly_Sales',
data=data_holiday) plt.title('Weekly Sales vs. Fuel_Price')
plt.xlabel('Fuel_Price')
plt.ylabel('Weekly
Sales') plt.show()
# Build a linear regression model to predict weekly sales based on CPI and Fuel_Price
X = data_holiday[['CPI',
'Fuel_Price']] y =
data_holiday['Weekly_Sales']
reg = LinearRegression().fit(X, y)
# Use the linear regression model to make predictions for different values of CPI and Fuel_Price
new_data = pd.DataFrame({'CPI': [220, 230, 240], 'Fuel_Price': [3.50, 3.60,
localhost:8888/notebooks/Project/walmart(Project).ipynb 23/
12/23/23, 12:21 walmart(Project) - Jupyter Notebook
localhost:8888/notebooks/Project/walmart(Project).ipynb 24/
12/23/23, 12:21 walmart(Project) - Jupyter Notebook
He coefficients of the linear regression model represent the change in weekly sales for a one-unit increase in each
predictor variable, while holding all other variables constant.
localhost:8888/notebooks/Project/walmart(Project).ipynb 25/
12/23/23, 12:21 walmart(Project) - Jupyter Notebook
In this case, the first coefficient (-1194.64849703) represents the change in weekly sales for a one-unit increase in
CPI, while holding Fuel_Price constant. The negative sign indicates that there is a negative correlation between CPI
and weekly sales - as CPI increases, weekly sales tend to decrease. The magnitude of the coefficient (-1194.65)
indicates the strength of the relationship.
The second coefficient (46674.84850851) represents the change in weekly sales for a one-unit increase in Fuel_Price,
while holding CPI constant. The positive sign indicates that there is a positive correlation between Fuel_Price and
weekly sales - as Fuel_Price increases, weekly sales tend to increase. The magnitude of the coefficient (46674.85)
indicates the strength of the relationship.
The intercept (1176851.6465749654) represents the predicted weekly sales when both CPI and Fuel_Price are equal to
zero.
The predictions ([1077390.94700861, 1070111.94688918, 1062832.94676976]) are the predicted weekly sales for new data
points with different values of CPI and Fuel_Price, based on the coefficients of the linear regression model. For
example, the first prediction (1077390.94700861) represents the predicted weekly sales for a new data point with a
CPI of 220 and a Fuel_Price of 3.5.
Regenerate response
localhost:8888/notebooks/Project/walmart(Project).ipynb 26/
12/23/23, 12:21 walmart(Project) - Jupyter Notebook
In [63]:
# create a linear regression model to predict weekly sales
X = data[['Temperature', 'Fuel_Price', 'CPI', 'Unemployment', 'Holiday_Flag']] y = data['Weekly_Sales']
model = LinearRegression().fit(X, y)
This code uses a linear regression model to predict future sales based on external factors such as temperature, fuel price, CPI
localhost:8888/notebooks/Project/walmart(Project).ipynb 27/
12/23/23, 12:21 walmart(Project) - Jupyter Notebook
Based on the predicted sales, the code adjusts the inventory levels. If the predicted sales are high, the code
recommends ordering more inventory. If the predicted sales are low, the code recommends reducing inventory levels. If the pred
# Identify underperforming stores: By comparing the sales data across all store
localhost:8888/notebooks/Project/walmart(Project).ipynb 28/
12/23/23, 12:21 walmart(Project) - Jupyter Notebook
In [64]:
# calculate total sales for each store
store_sales = data.groupby('Store')['Weekly_Sales'].sum().reset_index()
localhost:8888/notebooks/Project/walmart(Project).ipynb 29/
12/23/23, 12:21 walmart(Project) - Jupyter Notebook
Underperforming stores:
3
5
7
8
9
12
15
16
17
21
22
25
26
29
30
33
34
35
36
37
38
40
42
43
44
45
This code first groups the sales data by store and calculates the total sales for each store. It then calculates the average sa
Stores with total sales below the average are identified as underperforming stores. The code prints the list of underperforming
You can adjust the definition of underperforming stores by changing the criteria, for example, you could identify stores that h
localhost:8888/notebooks/Project/walmart(Project).ipynb 30/
12/23/23, 12:21 walmart(Project) - Jupyter Notebook
# Forecast future sales: By using the historical sales data, we can develop pre
In [34]: # Define the p, d and q parameters to take any value between 0 and 2
p = d = q = range(0, 5)
import itertools
# Generate all different combinations of p, d and q triplets
pdq = list(itertools.product(p, d, q))
localhost:8888/notebooks/Project/walmart(Project).ipynb 31/
12/23/23, 12:21 walmart(Project) - Jupyter Notebook
mod = sm.tsa.statespace.SARIMAX(y1,
order=(4, 4, 3),
seasonal_order=(1, 1, 0, 52),#enforce_stationarity=False,
enforce_invertibility=False)
results = mod.fit()
print(results.summary().tables[1])
==============================================================================
coef std err z P>|z| [0.025 0.975]
localhost:8888/notebooks/Project/walmart(Project).ipynb 32/
12/23/23, 12:21 walmart(Project) - Jupyter Notebook
In [49]: plt.style.use('seaborn-pastel')
results.plot_diagnostics(figsize=(15, 12)) plt.show()
localhost:8888/notebooks/Project/walmart(Project).ipynb 33/
12/23/23, 12:21 walmart(Project) - Jupyter Notebook
localhost:8888/notebooks/Project/walmart(Project).ipynb 34/
12/23/23, 12:21 walmart(Project) - Jupyter Notebook
localhost:8888/notebooks/Project/walmart(Project).ipynb 35/
12/23/23, 12:21 walmart(Project) - Jupyter Notebook
localhost:8888/notebooks/Project/walmart(Project).ipynb 36/
12/23/23, 12:21 walmart(Project) - Jupyter Notebook
In [38]: ax = y1['2010':].plot(label='observed')
pred.predicted_mean.plot(ax=ax, label='One-step ahead Forecast', alpha=.7)
ax.fill_between(pred_ci.index,
pred_ci.iloc[:, 0],
pred_ci.iloc[:, 1], color='k', alpha=.2)
plt.show()
localhost:8888/notebooks/Project/walmart(Project).ipynb 37/
12/23/23, 12:21 walmart(Project) - Jupyter Notebook
localhost:8888/notebooks/Project/walmart(Project).ipynb 38/
12/23/23, 12:21 walmart(Project) - Jupyter Notebook
In [41]:
ax.fill_between(pred_dynamic_ci.index,
pred_dynamic_ci.iloc[:, 0],
pred_dynamic_ci.iloc[:, 1], color='k', alpha=.25)
plt.legend() plt.show()
localhost:8888/notebooks/Project/walmart(Project).ipynb 39/
12/23/23, 12:21 walmart(Project) - Jupyter Notebook
localhost:8888/notebooks/Project/walmart(Project).ipynb 40/
12/23/23, 12:21 walmart(Project) - Jupyter Notebook
Date
2012-07-27 1.470892e+06
2010-08-27 1.462495e+06
2011-01-28 1.020830e+06
2010-05-28 1.287399e+06
2012-09-28 1.019062e+06
2011-10-28 1.394298e+06
2011-04-29 1.234777e+06
2012-06-29 8.901318e+05
2011-07-29 9.546755e+05
2010-10-29 8.946962e+05
2012-03-30 8.830751e+05
2010-04-30 9.490994e+05
2010-07-30 8.111185e+05
2011-09-30 8.153588e+05
2011-12-30 6.083340e+05
2012-08-31 6.582176e+05
2010-12-31 5.924099e+05
Name: predicted_mean, dtype: float64
localhost:8888/notebooks/Project/walmart(Project).ipynb 41/
12/23/23, 12:21 walmart(Project) - Jupyter Notebook
print(y_truth)
Date
2012-08-06 1414343.53
2012-09-03 1413382.76
2012-10-02 1574287.76
2012-10-08 1388973.65
2012-11-05 1300147.07
2012-12-10 1311965.09
2012-09-14 1267675.05
2012-08-17 1421307.20
2012-10-19 1232073.18
2012-09-21 1326132.98
2012-08-24 1409515.73
2012-10-26 1200729.45
2012-07-27 1272395.02
2012-09-28 1227430.73
2012-08-31 1372872.35
Name: Weekly_Sales, dtype: float64
# Future Possibilities-
The future possibilities, limitations and other-
This code first groups the sales data by store and date and calculates the total sales for each store on each date.
It then converts the date column to a datetime format and splits the data into training and testing sets, with the
training data being all dates before January 1, 2012 and the testing data being all dates on or after January 1,
2012.
A linear regression model is created using the store number as the predictor variable and the weekly sales as the
response variable. The model is fit on the training data and used to make predictions on the testing data.
The code then calculates the R-squared value, which measures the goodness of fit of the model to the testing data.
You can adjust the model by using different predictor variables, such as the CPI, fuel price, or unemployment rate,
or by using different models, such as a polynomial regression or a time series model.
localhost:8888/notebooks/Project/walmart(Project).ipynb 42/
12/23/23, 12:21 walmart(Project) - Jupyter Notebook
The output is the predictions made by a model to forecast future sales based on historical sales data. The model has predicted
An R-squared value of 0.1176607873663219 means that the model explains 11.77% of the variance in the data, which is relatively
necessary to improve the accuracy of the sales predictions.
In [ ]:
print(pred_uc)
localhost:8888/notebooks/Project/walmart(Project).ipynb 43/
12/23/23, 12:21 walmart(Project) - Jupyter Notebook
In [48]:
plt.legend() plt.show()
localhost:8888/notebooks/Project/walmart(Project).ipynb 44/
12/23/23, 12:21 walmart(Project) - Jupyter Notebook
# Conclusion-
The Walmart project is good relation between each stores, sales effected when fuel price increase, tempers are increase- then s
localhost:8888/notebooks/Project/walmart(Project).ipynb 45/
12/23/23, 12:21 walmart(Project) - Jupyter Notebook
In future corelate the each another Walmart stores then incresed the sales and become decrised the costing, whereby income wil
# References-
Some data downloaded & contantet copy form google and other resorces.
localhost:8888/notebooks/Project/walmart(Project).ipynb 46/