Exploratory Data Analytics With SQL
PORTFOLIO
Structured Query Language
NOUFAL ZHAFIRA
ABOUT ME
Hi, I am Noufal and I am a Data Analyst.
In this section, allow me to give some examples of how to analyze a set of data using Structured Query Language (SQL).
I am going to show you simple exploratory data analysis and some advanced analysis.
Stay tuned!!
TABLE OF CONTENTS
01 Data Overview
02 Simple Exploratory Data Analysis
03 Inventory Stock Analysis
04 Cohort Analysis
01 DATA OVERVIEW
Insights:
Total sales increased significantly every year. Unfortunately, cancelled orders reached up to 60% of completed orders.
Recommendation:
Minimizing cancelled orders is strongly recommended, since it would substantially increase total sales.
#2 Calculate order frequency, average order value, and total number of unique users per month, where status is 'Complete'
Insights:
Average order values were stable between 60 and 67.
Recommendation:
Keep up the current performance. If the company wants to increase its orders, bundling programs or marketing campaigns would likely increase AOV significantly.
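The monthly metrics named in #2 can be illustrated with a minimal Python sketch. The rows below are hypothetical, not taken from the dataset, and the sketch assumes that frequency means the number of distinct completed orders and that AOV is total sale price divided by that number; the original query may define them differently.

```python
from collections import defaultdict

# Hypothetical rows: (month, order_id, user_id, sale_price, status).
rows = [
    ("2022-01", 1, "u1", 40.0, "Complete"),
    ("2022-01", 1, "u1", 20.0, "Complete"),
    ("2022-01", 2, "u2", 70.0, "Complete"),
    ("2022-01", 3, "u3", 99.0, "Cancelled"),
    ("2022-02", 4, "u1", 64.0, "Complete"),
]

def monthly_metrics(rows):
    """Per month: order frequency, average order value, unique users."""
    orders = defaultdict(set)     # distinct completed orders per month
    users = defaultdict(set)      # distinct buyers per month
    revenue = defaultdict(float)  # summed sale price per month
    for month, order_id, user_id, price, status in rows:
        if status != "Complete":  # only completed orders are counted
            continue
        orders[month].add(order_id)
        users[month].add(user_id)
        revenue[month] += price
    return {
        m: {
            "frequency": len(orders[m]),
            "aov": round(revenue[m] / len(orders[m]), 2),
            "unique_users": len(users[m]),
        }
        for m in sorted(orders)
    }

metrics = monthly_metrics(rows)
for month, values in metrics.items():
    print(month, values)
```

The cancelled order is excluded before any aggregation, which mirrors filtering on status before grouping by month.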
#3 Top 5 Least and Most Profitable Products
Query
WITH product_profit_table AS (
  SELECT
    orders.product_id AS Product_ID,
    product.name AS Product_Name,
    product.category AS Category,
    ROUND(product.cost, 2) AS Cost_Price,
    ROUND(orders.sale_price, 2) AS Retail_Price,
    COUNT(*) AS Items_Sold,
    ROUND(SUM(orders.sale_price) - SUM(product.cost), 2) AS Total_Profit
  FROM
    `bigquery-public-data.thelook_ecommerce.order_items` AS orders
  JOIN
    `bigquery-public-data.thelook_ecommerce.products` AS product
  ON orders.product_id = product.id
  WHERE status = 'Complete'
  GROUP BY 1, 2, 3, 4, 5
)
-- Main query
SELECT
  Product_ID, Product_Name, Category,
  Cost_Price, Retail_Price, Items_Sold,
  Total_Profit
FROM
  (
    SELECT *,
      RANK() OVER(ORDER BY Total_Profit DESC) AS rank_profit_desc,
      RANK() OVER(ORDER BY Total_Profit ASC) AS rank_profit_asc
    FROM product_profit_table
  )
WHERE rank_profit_desc BETWEEN 1 AND 5
  OR rank_profit_asc BETWEEN 1 AND 5
ORDER BY Total_Profit DESC;
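The double-ranking filter in this query (rank once descending, once ascending, keep rows that land in either top N) can be sketched in Python. The product names and profit figures are hypothetical, and `rank` and `top_and_bottom` are illustrative helpers that imitate SQL's RANK(), not part of any library.

```python
def rank(values, descending):
    """SQL-style RANK(): ties share a rank and the next rank skips ahead."""
    ordered = sorted(values, reverse=descending)
    # The first position of a value in the sorted list is its rank.
    return {v: ordered.index(v) + 1 for v in set(values)}

def top_and_bottom(profits, n=5):
    """Keep products ranked in the top n or bottom n by profit."""
    desc = rank(profits.values(), descending=True)
    asc = rank(profits.values(), descending=False)
    keep = {p: v for p, v in profits.items() if desc[v] <= n or asc[v] <= n}
    # Most profitable first, like ORDER BY Total_Profit DESC.
    return sorted(keep.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical total profit per product.
profits = {"A": 50, "B": 40, "C": 40, "D": 10, "E": 5, "F": -3}
result = top_and_bottom(profits, n=2)
names = [name for name, _ in result]
print(result)
```

With n=2, products B and C tie for second place and are both kept, so a RANK()-based filter can return more than 2n rows when profits tie.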
Recommendation:
Focus sales efforts on the 5 most profitable products. Meanwhile, take down the 5 least profitable products.
03 INVENTORY STOCK ANALYSIS
Monthly Growth Rate of Inventory
Growth Rate = (Inventory level - Prev. Inventory Level) / Prev. Inventory Level
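As a worked example of the formula, the sketch below applies it to a hypothetical series of monthly inventory levels. Returning `None` for the first month and for a previous level of 0 is a choice of this sketch, made so that no division by zero occurs.

```python
def growth_rates(levels):
    """Month-over-month growth rate of a series of inventory levels."""
    rates = []
    prev = None
    for level in levels:
        if prev is None or prev == 0:
            # No previous month, or a previous level of 0: rate undefined.
            rates.append(None)
        else:
            rates.append(round((level - prev) / prev, 2))
        prev = level
    return rates

# Hypothetical inventory levels for five consecutive months.
levels = [100, 150, 120, 0, 30]
rates = growth_rates(levels)
print(rates)
```

Going from 100 to 150 units gives (150 - 100) / 100 = 0.5, a 50% increase; going from 150 to 120 gives (120 - 150) / 150 = -0.2, a 20% decrease.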
Query
## Monthly growth rate of inventory
WITH sum_invent_in AS (
  SELECT
    DATE_TRUNC(DATE(created_at), MONTH) AS months,
    product_category, COUNT(id) AS invent_in
  FROM
    `bigquery-public-data.thelook_ecommerce.inventory_items`
  WHERE created_at BETWEEN '2019-01-01' AND '2022-04-30'
  GROUP BY 1, 2
  ORDER BY 2, 1),
sum_invent_out AS (
  SELECT
    DATE_TRUNC(DATE(sold_at), MONTH) AS months,
    product_category, COUNT(id) AS invent_out
  FROM
    `bigquery-public-data.thelook_ecommerce.inventory_items`
  WHERE created_at BETWEEN '2019-01-01' AND '2022-04-30'
    AND sold_at IS NOT NULL
  GROUP BY 1, 2
  ORDER BY 2, 1),
cumulative_in_out AS (
  SELECT
    a.months,
    a.product_category,
    SUM(a.invent_in) OVER(
      PARTITION BY a.product_category
      ORDER BY a.months) AS cum_invent_in,
    SUM(IFNULL(b.invent_out,0)) OVER(
      PARTITION BY a.product_category
      ORDER BY a.months) AS cum_invent_out,
  FROM sum_invent_in AS a
  LEFT JOIN sum_invent_out AS b
    ON a.months = b.months
    AND a.product_category = b.product_category
  GROUP BY 1, 2, a.invent_in, b.invent_out
  ORDER BY 2, 1
),
inventory_level AS (
  SELECT
    months, product_category,
    cum_invent_in, cum_invent_out,
    cum_invent_in - cum_invent_out AS invent_level
  FROM cumulative_in_out
  ORDER BY 2, 1
),
lag_invent_level AS (
  SELECT
    months, product_category, invent_level,
    LAG(invent_level) OVER(
      PARTITION BY product_category
      ORDER BY months ASC) AS level_prev_month,
  FROM inventory_level
)
-- Main query
SELECT *,
  CASE WHEN level_prev_month = 0 THEN NULL
    ELSE ROUND((invent_level - level_prev_month) /
      level_prev_month, 2)
  END AS growth_rate
FROM lag_invent_level
ORDER BY 2, 1;
Insights:
Inventory grew massively in early 2020, and the inventory level kept increasing afterwards. This means that products weren't selling and were piling up in the warehouse.
Recommendation:
● Give discounts and prioritize selling old stock.
● Evaluate the inventory control system. Implement FIFO if necessary.
● Forecast demand properly and use the forecast as a benchmark for inventory stock.
04 COHORT ANALYSIS
Monthly Retention Cohorts in 2022
Query
## Monthly Retention Cohorts in 2022
WITH cohort_item AS (
  -- take the date of each user's first complete purchase
  SELECT
    EXTRACT(DATE FROM MIN(created_at)) AS cohort_month,
    user_id
  FROM `bigquery-public-data.thelook_ecommerce.orders`
  WHERE status = 'Complete'
  GROUP BY user_id
  ORDER BY cohort_month
),
cohort_size AS (
  -- total users per starting month
  SELECT
    DATE_TRUNC(DATE(cohort_month), MONTH) AS cohort_month,
    COUNT(user_id) AS num_users
  FROM cohort_item
  GROUP BY cohort_month
  ORDER BY cohort_month
),
user_activities AS (
  -- take all users' subsequent purchases after the first month
  SELECT
    DATE_DIFF(EXTRACT(DATE FROM (a.created_at)),
      b.cohort_month, MONTH) AS month_number,
    a.user_id
  FROM
    `bigquery-public-data.thelook_ecommerce.orders` AS a
  LEFT JOIN cohort_item AS b
    ON a.user_id = b.user_id
  WHERE EXTRACT(YEAR FROM created_at) = 2022
    AND status = 'Complete'
  GROUP BY 2, 1
),
retention_table AS (
  -- combine cohort month with subsequent months
  SELECT
    DATE_TRUNC(DATE(b.cohort_month), MONTH)
      AS cohort_month,
    CONCAT('M', c.month_number)
      AS month_number,
    COUNT(cohort_month) AS num_users
  FROM user_activities AS c
  LEFT JOIN cohort_item AS b
    ON c.user_id = b.user_id
  GROUP BY 1, 2
  ORDER BY 1, 2
)
-- Main query
SELECT
  d.cohort_month,
  d.month_number,
  e.num_users AS cohort_size,
  d.num_users,
  d.num_users/e.num_users AS ratio
FROM retention_table AS d
LEFT JOIN cohort_size AS e
  ON d.cohort_month = e.cohort_month
GROUP BY 1, 2, 3, 4
ORDER BY 1, 2;
Monthly Retention Cohorts in 2022
Cohort month  M0     M1     M2     M3     M4     M5     M6     M7     M8     M9     M10    M11
January       100%   0.94%  1.29%  0.82%  1.29%  1.05%  0.47%  0.70%  1.64%  1.52%  0.70%  0.94%
February      100%   1.78%  1.78%  1.40%  1.27%  1.52%  1.27%  1.14%  1.27%  1.14%  1.91%
March         100%   1.35%  2.07%  1.35%  1.45%  1.24%  1.35%  1.76%  1.24%  1.56%
April         100%   1.33%  1.33%  1.73%  1.84%  1.84%  1.43%  1.22%  1.33%
August        100%   1.97%  2.42%  1.66%  2.65%
September     100%   3.08%  2.57%  2.57%
October       100%   2.74%  3.14%
November      100%   3.68%
December      100%
Grand Total   100%   1.99%  1.93%  1.60%  1.75%  1.70%  1.44%  1.09%  1.37%  1.41%  1.30%  0.94%
Maximum retention rate: 3.68%
Average retention rate per month: 1.50%
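Each cell in a retention table like this is the number of users from a cohort who ordered again N months after their first purchase, divided by the cohort size. The sketch below shows that calculation on hypothetical orders; the user IDs and month numbers are made up, and months are plain integers here rather than dates.

```python
from collections import defaultdict

# Hypothetical completed orders: (user_id, month number within the year).
orders = [
    ("u1", 1), ("u1", 2),
    ("u2", 1),
    ("u3", 1), ("u3", 3),
    ("u4", 2), ("u4", 3),
]

def retention(orders):
    """Map (cohort_month, months_since_first_order) to a retention ratio."""
    # Cohort month = month of each user's first order.
    first = {}
    for user, month in orders:
        first[user] = min(month, first.get(user, month))
    # Cohort size = number of users whose first order fell in that month.
    cohort_size = defaultdict(int)
    for month in first.values():
        cohort_size[month] += 1
    # Distinct active users per (cohort month, month offset).
    active = defaultdict(set)
    for user, month in orders:
        active[(first[user], month - first[user])].add(user)
    return {
        key: len(users) / cohort_size[key[0]]
        for key, users in sorted(active.items())
    }

table = retention(orders)
for (cohort, offset), ratio in table.items():
    print(f"cohort {cohort}, M{offset}: {ratio:.2%}")
```

Offset 0 is always 100% because every user in a cohort ordered in their own first month, which matches the first column of the table.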
Insights: Recommendation:
EMAIL noufalzhafira@gmail.com
INSTAGRAM @noufalzhafira
LINKEDIN @noufalzhafira