
Exploratory Data Analytics With SQL

The document provides an overview of Noufal's SQL portfolio and examples of analyzing e-commerce data from TheLook dataset. It includes four sections: an overview of the data, a simple exploratory analysis calculating order metrics by status and month, an analysis of inventory stock levels including a query to calculate monthly growth rates, and a cohort analysis of monthly retention. Noufal demonstrates SQL queries and provides insights and recommendations from the results.

Uploaded by Honey Bee
Copyright © All Rights Reserved

DATA ANALYST PORTFOLIO
Structured Query Language

NOUFAL ZHAFIRA
ABOUT ME
Hi, I am Noufal, a data analyst. In this portfolio, allow me to give some examples of how to analyze a dataset using Structured Query Language (SQL). I am going to show you a simple exploratory data analysis and some more advanced analyses. Stay tuned!
TABLE OF CONTENTS

01 Data Overview
02 Simple Exploratory Data Analysis
03 Inventory Stock Analysis
04 Cohort Analysis
01 DATA OVERVIEW

TheLook eCommerce is a fictitious e-commerce clothing store.

This dataset is provided to industry practitioners for the purpose of testing, evaluation, and education. The dataset contains information about customers, products, orders, events, and campaigns.
Entity Relationship Diagram
Important!!
Before diving into the analysis, it is strongly recommended to visualize the dataset with an entity relationship diagram. Doing so lets us determine how the tables in the dataset relate to each other.

In this analysis, we will use only these five tables:
● users
● orders
● products
● inventory_items
● order_items
Tools

● Serverless infrastructure with a cloud-based data warehouse and powerful analytical tools using the SQL programming language.
● Visualize the data in an informative, easy-to-read, easy-to-share, and fully customizable way, and share our insights.
02 EXPLORATORY DATA ANALYSIS
#1 Calculate the number of unique users, number of orders, and total sale price per status and month from January 2019 to August 2022

Query
## Monthly unique users, orders, and sale price per status
SELECT
  FORMAT_DATE("%B %Y", DATE(created_at)) AS Month_Year,
  status AS Order_Status,
  COUNT(DISTINCT user_id) AS Unique_Users,    -- buyers in the month/status group
  COUNT(DISTINCT order_id) AS Total_Orders,   -- an order with several items counts once
  ROUND(SUM(sale_price), 2) AS Total_Sale_Price
FROM `bigquery-public-data.thelook_ecommerce.order_items`
WHERE DATE(created_at) BETWEEN '2019-01-01' AND '2022-08-31'
GROUP BY 1, 2
-- sort chronologically: the "%B %Y" text labels would otherwise sort alphabetically
ORDER BY PARSE_DATE('%B %Y', Month_Year), Order_Status;
13,600 orders were cancelled in 2021 and 2022

Insights:
Total sales increased significantly every year. Unfortunately, the number of cancelled orders reached up to 60% of the number of completed orders.

Recommendation:
Minimizing cancelled orders is strongly recommended in order to increase total sales.
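The grouping in query #1 can be mirrored outside BigQuery for a quick sanity check. The Python sketch below is not part of the original workflow: the sample rows in `ORDER_ITEMS` and the function name `monthly_status_metrics` are hypothetical. It buckets each order item by month and status, then counts distinct users and orders and sums the sale price, as `COUNT(DISTINCT ...)` and `SUM(...)` do in the query.

```python
from collections import defaultdict
from datetime import date

# Hypothetical stand-in rows for order_items:
# (created_at, status, user_id, order_id, sale_price). Values are made up.
ORDER_ITEMS = [
    (date(2022, 1, 5), "Complete", 1, 100, 20.0),
    (date(2022, 1, 5), "Complete", 1, 100, 15.5),
    (date(2022, 1, 9), "Cancelled", 2, 101, 40.0),
    (date(2022, 2, 3), "Complete", 1, 102, 30.0),
    (date(2022, 2, 7), "Complete", 3, 103, 12.25),
]


def monthly_status_metrics(rows, start, end):
    """Group by (month, status) and return
    (distinct users, distinct orders, rounded total sale price) per group."""
    users = defaultdict(set)
    orders = defaultdict(set)
    sales = defaultdict(float)
    for created_at, status, user_id, order_id, sale_price in rows:
        if not start <= created_at <= end:         # WHERE ... BETWEEN
            continue
        key = (created_at.replace(day=1), status)  # month bucket + status
        users[key].add(user_id)                    # COUNT(DISTINCT user_id)
        orders[key].add(order_id)                  # COUNT(DISTINCT order_id)
        sales[key] += sale_price                   # SUM(sale_price)
    return {
        key: (len(users[key]), len(orders[key]), round(sales[key], 2))
        for key in sorted(users)                   # month first, then status
    }


result = monthly_status_metrics(ORDER_ITEMS, date(2019, 1, 1), date(2022, 8, 31))
for (month, status), metrics in result.items():
    print(month.strftime("%B %Y"), status, metrics)
```

Because distinct counts are taken per group, one order with two items still counts once in the order total, which is why the sketch keeps sets rather than plain counters.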
#2 Calculate frequencies, average order value (AOV), and the total number of unique users with 'Complete' status, grouped by month

Query
## Monthly frequencies, AOV, and unique users with 'Complete' status
SELECT
  FORMAT_DATE("%B %Y", DATE(created_at)) AS Month_Year,
  -- order_items has one row per item, so this is items per buyer;
  -- COUNT(DISTINCT order_id) would give orders per buyer instead
  ROUND(COUNT(order_id) / COUNT(DISTINCT user_id), 2) AS Frequencies,
  -- AOV = total sales / number of distinct orders
  ROUND(SUM(sale_price) / COUNT(DISTINCT order_id), 2) AS AOV,
  COUNT(DISTINCT user_id) AS Unique_Buyers
FROM `bigquery-public-data.thelook_ecommerce.order_items`
WHERE DATE(created_at) BETWEEN '2019-01-01' AND '2022-08-31'
  AND status = 'Complete'
GROUP BY 1
ORDER BY PARSE_DATE('%B %Y', Month_Year);
Average order value (AOV) was STABLE

Insights:
Average order value was stable, between 60 and 67.

Recommendation:
Keep up the good performance. If the company wants to increase its orders, bundling programs or marketing campaigns would likely increase AOV significantly.
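To see how `Frequencies` and `AOV` are built from item-level rows, here is a minimal Python sketch on hypothetical data (`ITEMS` and `monthly_kpis` are illustrative names, not from the original). Since `order_items` holds one row per item, `COUNT(order_id) / COUNT(DISTINCT user_id)` measures items per buyer. The extra `orders_per_buyer` figure is an assumption about what a purchase frequency might alternatively be meant to capture.

```python
# Hypothetical 'Complete' order-item rows for one month: (user_id, order_id, sale_price).
ITEMS = [
    (1, 100, 20.0),
    (1, 100, 40.0),
    (1, 104, 50.0),
    (2, 101, 70.0),
]


def monthly_kpis(items):
    """Compute the query #2 metrics for one month of item rows."""
    users = {user for user, _, _ in items}
    orders = {order for _, order, _ in items}
    total = sum(price for _, _, price in items)
    return {
        # COUNT(order_id) / COUNT(DISTINCT user_id): one row per item -> items per buyer
        "frequency": round(len(items) / len(users), 2),
        # Alternative (assumption): distinct orders per buyer
        "orders_per_buyer": round(len(orders) / len(users), 2),
        # SUM(sale_price) / COUNT(DISTINCT order_id)
        "aov": round(total / len(orders), 2),
        # COUNT(DISTINCT user_id)
        "unique_buyers": len(users),
    }


print(monthly_kpis(ITEMS))
```

With these rows, user 1 bought three items across two orders, so the item-based frequency (2.0) is higher than the order-based one (1.5), while AOV divides total sales by the three distinct orders.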
#3 Top 5 Least and Most Profitable Products

Query
WITH product_profit_table AS (
  -- profit per product from completed order items
  SELECT
    orders.product_id AS Product_ID,
    product.name AS Product_Name,
    product.category AS Category,
    ROUND(product.cost, 2) AS Cost_Price,
    ROUND(orders.sale_price, 2) AS Retail_Price,
    COUNT(*) AS Items_Sold,
    ROUND(SUM(orders.sale_price) - SUM(product.cost), 2) AS Total_Profit
  FROM `bigquery-public-data.thelook_ecommerce.order_items` AS orders
  JOIN `bigquery-public-data.thelook_ecommerce.products` AS product
    ON orders.product_id = product.id
  WHERE orders.status = 'Complete'
  GROUP BY 1, 2, 3, 4, 5
)
-- Main query
SELECT
  Product_ID, Product_Name, Category,
  Cost_Price, Retail_Price, Items_Sold,
  Total_Profit
FROM (
  -- rank products by profit in both directions (tied values share a rank)
  SELECT *,
    RANK() OVER (ORDER BY Total_Profit DESC) AS rank_profit_desc,
    RANK() OVER (ORDER BY Total_Profit ASC) AS rank_profit_asc
  FROM product_profit_table
)
WHERE rank_profit_desc BETWEEN 1 AND 5
   OR rank_profit_asc BETWEEN 1 AND 5
ORDER BY Total_Profit DESC;
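The ranking step can be sketched in Python as follows, assuming per-product profits have already been computed (the `PROFITS` values and the function names `rank` and `top_and_bottom` are hypothetical). `rank` reproduces SQL `RANK()` semantics: tied values share a rank and the next rank is skipped, so the filter can return more than k products per side.

```python
def rank(values, descending):
    """SQL RANK(): tied values share the same rank and later ranks skip ahead."""
    ordered = sorted(values, reverse=descending)
    # Position of the first occurrence (1-based) is the rank of every tied copy.
    return {v: ordered.index(v) + 1 for v in set(values)}


def top_and_bottom(profits, k=5):
    """profits: dict of product_id -> total_profit.
    Keep products ranked 1..k by profit in either direction, highest profit first."""
    values = list(profits.values())
    desc = rank(values, descending=True)
    asc = rank(values, descending=False)
    kept = {
        pid: p for pid, p in profits.items()
        if desc[p] <= k or asc[p] <= k  # WHERE rank_desc <= k OR rank_asc <= k
    }
    # ORDER BY Total_Profit DESC (sorted() is stable, so ties keep input order)
    return sorted(kept.items(), key=lambda item: item[1], reverse=True)


# Hypothetical profits for seven products; k=2 keeps the output short.
PROFITS = {"A": 120.0, "B": 5.0, "C": 80.0, "D": 80.0, "E": -3.5, "F": 40.0, "G": 60.0}
print(top_and_bottom(PROFITS, k=2))
```

Here products C and D tie for rank 2, so both are kept even though k=2. `ROW_NUMBER()` would instead cut ties arbitrarily to exactly k rows.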
#3 Top 5 Least and Most Profitable Products

Top 5 Least and Most Profitable Products

Recommendation:
Focus sales on these 5 most profitable products. Meanwhile, consider discontinuing the 5 least profitable products.
03 INVENTORY STOCK ANALYSIS
Monthly Growth Rate of Inventory

Before we continue with querying and analysis, let's align our understanding of inventory.
In this case, the inventory level is the total number of items in stock at the end of the month.
We will use these formulas to calculate the inventory level and its growth rate:

Inventory Level = Initial Qty + Goods Received - Goods Issued (items sold)

Growth Rate = (Inventory Level - Prev. Inventory Level) / Prev. Inventory Level
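The two formulas can be checked with a short Python sketch for a single product category. The monthly counts are hypothetical. The initial quantity defaults to 0, which is an assumption matching a query that starts counting from the first month of data. A growth rate is undefined for the first month, which has no previous level, and for any month whose previous level is 0; both cases return `None`, mirroring `NULL` in SQL.

```python
from itertools import accumulate


def inventory_levels(received, sold, initial=0):
    """End-of-month level = initial qty + cumulative receipts - cumulative items sold.
    `received` and `sold` are per-month counts for one category, oldest month first."""
    cum_in = accumulate(received)
    cum_out = accumulate(sold)
    return [initial + i - o for i, o in zip(cum_in, cum_out)]


def growth_rates(levels):
    """(level - previous level) / previous level, like LAG() in SQL.
    No previous month, or a previous level of 0, gives None."""
    rates = [None]
    for prev, cur in zip(levels, levels[1:]):
        rates.append(None if prev == 0 else round((cur - prev) / prev, 2))
    return rates


# Hypothetical monthly counts for a single category.
RECEIVED = [10, 20, 20, 5]
SOLD = [10, 5, 5, 10]
levels = inventory_levels(RECEIVED, SOLD)
print(levels)
print(growth_rates(levels))
```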
Monthly Growth Rate of Inventory

Query
## Monthly growth rate of inventory
WITH sum_invent_in AS (
  -- items received per month and category
  SELECT
    DATE_TRUNC(DATE(created_at), MONTH) AS months,
    product_category,
    COUNT(id) AS invent_in
  FROM `bigquery-public-data.thelook_ecommerce.inventory_items`
  WHERE DATE(created_at) BETWEEN '2019-01-01' AND '2022-04-30'
  GROUP BY 1, 2
),
sum_invent_out AS (
  -- items sold per month and category
  SELECT
    DATE_TRUNC(DATE(sold_at), MONTH) AS months,
    product_category,
    COUNT(id) AS invent_out
  FROM `bigquery-public-data.thelook_ecommerce.inventory_items`
  WHERE DATE(created_at) BETWEEN '2019-01-01' AND '2022-04-30'
    AND sold_at IS NOT NULL
  GROUP BY 1, 2
),
cumulative_in_out AS (
  -- running totals of items in and out per category;
  -- assumes every category receives stock every month, since sales in a
  -- month without receipts would be dropped by this LEFT JOIN
  SELECT
    a.months,
    a.product_category,
    SUM(a.invent_in) OVER (
      PARTITION BY a.product_category ORDER BY a.months) AS cum_invent_in,
    SUM(IFNULL(b.invent_out, 0)) OVER (
      PARTITION BY a.product_category ORDER BY a.months) AS cum_invent_out
  FROM sum_invent_in AS a
  LEFT JOIN sum_invent_out AS b
    ON a.months = b.months
   AND a.product_category = b.product_category
),
inventory_level AS (
  -- stock on hand at the end of each month
  SELECT
    months, product_category,
    cum_invent_in, cum_invent_out,
    cum_invent_in - cum_invent_out AS invent_level
  FROM cumulative_in_out
),
lag_invent_level AS (
  -- previous month's level within the same category
  SELECT
    months, product_category, invent_level,
    LAG(invent_level) OVER (
      PARTITION BY product_category ORDER BY months ASC) AS level_prev_month
  FROM inventory_level
)
-- Main query
SELECT *,
  CASE WHEN level_prev_month = 0 THEN NULL
       ELSE ROUND((invent_level - level_prev_month) / level_prev_month, 2)
  END AS growth_rate
FROM lag_invent_level
ORDER BY 2, 1;
Monthly Growth Rate of Inventory

Inventory Level was PILING UP in the warehouse

Insights:
There was massive inventory growth in early 2020, and the inventory level kept increasing afterward. This means that products weren't selling and were piling up in the warehouse.

Recommendation:
● Give discounts and prioritize selling old stock.
● Evaluate the inventory control system. Implement FIFO (first in, first out) if necessary.
● Forecast demand properly and benchmark inventory stock against it.
04 COHORT ANALYSIS
Monthly Retention Cohorts in 2022

Query
## Monthly Retention Cohorts in 2022
WITH cohort_item AS (
  -- take the first date on which each user made a complete purchase
  SELECT
    EXTRACT(DATE FROM MIN(created_at)) AS cohort_month,
    user_id
  FROM `bigquery-public-data.thelook_ecommerce.orders`
  WHERE status = 'Complete'
  GROUP BY user_id
),
cohort_size AS (
  -- total users per starting month
  SELECT
    DATE_TRUNC(cohort_month, MONTH) AS cohort_month,
    COUNT(user_id) AS num_users
  FROM cohort_item
  GROUP BY 1
),
user_activities AS (
  -- take all users' subsequent purchases after the first month
  SELECT
    DATE_DIFF(EXTRACT(DATE FROM a.created_at), b.cohort_month, MONTH) AS month_number,
    a.user_id
  FROM `bigquery-public-data.thelook_ecommerce.orders` AS a
  LEFT JOIN cohort_item AS b
    ON a.user_id = b.user_id
  WHERE EXTRACT(YEAR FROM a.created_at) = 2022
    AND a.status = 'Complete'
  GROUP BY 2, 1
),
retention_table AS (
  -- combine cohort month with subsequent months
  SELECT
    DATE_TRUNC(b.cohort_month, MONTH) AS cohort_month,
    CONCAT('M', CAST(c.month_number AS STRING)) AS month_number,
    COUNT(c.user_id) AS num_users
  FROM user_activities AS c
  LEFT JOIN cohort_item AS b
    ON c.user_id = b.user_id
  GROUP BY 1, 2
)
-- Main query
SELECT
  d.cohort_month,
  d.month_number,
  e.num_users AS cohort_size,
  d.num_users,
  d.num_users / e.num_users AS ratio
FROM retention_table AS d
LEFT JOIN cohort_size AS e
  ON d.cohort_month = e.cohort_month
WHERE EXTRACT(YEAR FROM d.cohort_month) = 2022  -- keep only cohorts that start in 2022
ORDER BY 1, 2;
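The cohort logic can be sketched in plain Python on a handful of hypothetical completed purchases (`PURCHASES`, `retention` and `month_index` are illustrative names). In this sketch, the cohort is the month of each user's first completed purchase over the whole history, and the month number counts month boundaries since that first purchase. Only purchases made in the analysis year are counted, and cohorts that start before that year are dropped so the result stays within 2022.

```python
from collections import defaultdict
from datetime import date


def month_index(d):
    """Months since year 0, so a difference counts month boundaries."""
    return d.year * 12 + d.month


def retention(purchases, year):
    """purchases: (user_id, purchase_date) pairs for completed orders.
    Returns {(cohort_month, month_number): ratio} for cohorts starting in `year`."""
    # Cohort date: each user's first completed purchase over the whole history.
    first = {}
    for user, when in purchases:
        if user not in first or when < first[user]:
            first[user] = when

    # Cohort size: number of users per first-purchase month.
    cohort_size = defaultdict(int)
    for when in first.values():
        cohort_size[when.replace(day=1)] += 1

    # Distinct active users per (cohort month, months since first purchase),
    # counting only purchases made in `year`.
    active = defaultdict(set)
    for user, when in purchases:
        if when.year != year:
            continue
        cohort = first[user].replace(day=1)
        month_number = month_index(when) - month_index(first[user])
        active[(cohort, month_number)].add(user)

    # Ratio = active users / cohort size, keeping only cohorts that start in `year`.
    return {
        key: len(users) / cohort_size[key[0]]
        for key, users in sorted(active.items())
        if key[0].year == year
    }


# Hypothetical completed purchases.
PURCHASES = [
    ("u1", date(2022, 1, 3)), ("u1", date(2022, 2, 10)),
    ("u2", date(2022, 1, 20)),
    ("u3", date(2022, 1, 25)), ("u3", date(2022, 3, 1)),
    ("u4", date(2022, 2, 14)),
    ("u5", date(2021, 12, 30)), ("u5", date(2022, 1, 2)),
]
for (cohort, number), ratio in retention(PURCHASES, 2022).items():
    print(cohort.strftime("%B %Y"), f"M{number}", f"{ratio:.2%}")
```

M0 is always 100% in this construction, because every user in a cohort is active in the month of their first purchase. User u5 belongs to a December 2021 cohort and is left out of the 2022 result.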
Monthly Retention Cohorts in 2022

SUM of ratio by Month_Number (2022 cohorts)

Cohort (2022)  M0    M1     M2     M3     M4     M5     M6     M7     M8     M9     M10    M11
January        100%  0.94%  1.29%  0.82%  1.29%  1.05%  0.47%  0.70%  1.64%  1.52%  0.70%  0.94%
February       100%  1.78%  1.78%  1.40%  1.27%  1.52%  1.27%  1.14%  1.27%  1.14%  1.91%
March          100%  1.35%  2.07%  1.35%  1.45%  1.24%  1.35%  1.76%  1.24%  1.56%
April          100%  1.33%  1.33%  1.73%  1.84%  1.84%  1.43%  1.22%  1.33%
May            100%  1.29%  0.99%  1.19%  1.69%  1.99%  1.89%  0.60%
June           100%  1.64%  1.64%  1.89%  1.72%  1.81%  2.24%
July           100%  2.12%  2.12%  1.78%  2.12%  2.46%
August         100%  1.97%  2.42%  1.66%  2.65%
September      100%  3.08%  2.57%  2.57%
October        100%  2.74%  3.14%
November       100%  3.68%
December       100%
Grand Total    100%  1.99%  1.93%  1.60%  1.75%  1.70%  1.44%  1.09%  1.37%  1.41%  1.30%  0.94%

Average retention rate per month: 1.50%
Maximum retention rate: 3.68%
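The summary figures on this slide can be recomputed from the table itself. In this Python sketch, the ratios are copied from the table and the function name `column_means` is hypothetical. The Grand Total row appears to be the mean of each Month_Number column over the cohorts that reached that month. The 1.50% headline matches the mean of those column averages for M1 to M11. Because the inputs are already rounded to two decimals, a recomputed column can differ from the table in the last digit: M7 comes out as 1.08% rather than 1.09%.

```python
from statistics import mean

# Retention ratios (%) copied from the 2022 cohort table, M1 onward (M0 is always 100%).
RETENTION = {
    "January":   [0.94, 1.29, 0.82, 1.29, 1.05, 0.47, 0.70, 1.64, 1.52, 0.70, 0.94],
    "February":  [1.78, 1.78, 1.40, 1.27, 1.52, 1.27, 1.14, 1.27, 1.14, 1.91],
    "March":     [1.35, 2.07, 1.35, 1.45, 1.24, 1.35, 1.76, 1.24, 1.56],
    "April":     [1.33, 1.33, 1.73, 1.84, 1.84, 1.43, 1.22, 1.33],
    "May":       [1.29, 0.99, 1.19, 1.69, 1.99, 1.89, 0.60],
    "June":      [1.64, 1.64, 1.89, 1.72, 1.81, 2.24],
    "July":      [2.12, 2.12, 1.78, 2.12, 2.46],
    "August":    [1.97, 2.42, 1.66, 2.65],
    "September": [3.08, 2.57, 2.57],
    "October":   [2.74, 3.14],
    "November":  [3.68],
    "December":  [],
}


def column_means(table):
    """Mean of each Month_Number column (M1, M2, ...) over the cohorts that reached it."""
    width = max(len(row) for row in table.values())
    return [mean(row[i] for row in table.values() if len(row) > i) for i in range(width)]


means = column_means(RETENTION)                     # approximates the Grand Total row
average_rate = mean(means)                          # average retention rate per month
max_rate = max(v for row in RETENTION.values() for v in row)  # best single cell
print([round(m, 2) for m in means])
print(round(average_rate, 2), max_rate)
```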
Monthly Retention Cohorts in 2022

Insights:
The number of returning customers decreased continuously from January 2022 until the end of the year, as shown by the month-to-month decline in repeat purchases per customer. We can conclude that most of our customers do not place repeat orders in our store, and the retention rate is extremely low.

Recommendation:
TheLook eCommerce needs to improve its customer engagement and retention immediately, since this decrease in activity has become a trend (downtrend).
THANKS
Do you have any questions?
noufalzhafira@gmail.com
+62 821 7048 7070
https://linkedin.com/in/noufal-zhafira

CREDITS: This presentation template was created by Slidesgo, including icons by Flaticon and infographics & images by Freepik
CONTACT DETAILS

ADDRESS Grogol, West Jakarta

MOBILE +62 821 7048 7070

EMAIL noufalzhafira@gmail.com

INSTAGRAM @noufalzhafira

LINKEDIN @noufalzhafira
