Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                
0% found this document useful (0 votes)
2K views

Target SQL Case Study

The document discusses analyzing e-commerce order data from Target in Brazil. It includes queries to analyze trends over time in orders and revenue, customer preferences and patterns, and economic impacts. Key analyses include monthly/yearly sales trends, preferred customer purchase times, distribution of customers across states, costs of orders year-over-year, and delivery times. The goal is to gain business insights from exploration of the dataset.
Copyright
© © All Rights Reserved
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
2K views

Target SQL Case Study

The document discusses analyzing e-commerce order data from Target in Brazil. It includes queries to analyze trends over time in orders and revenue, customer preferences and patterns, and economic impacts. Key analyses include monthly/yearly sales trends, preferred customer purchase times, distribution of customers across states, costs of orders year-over-year, and delivery times. The goal is to gain business insights from exploration of the dataset.
Copyright
© © All Rights Reserved
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
You are on page 1/ 19

Problem statement:-Data type of columns in a table?

select 
(select DDL from Target.INFORMATION_SCHEMA.TABLES where table_name = 'customers') csutom
ers,
(select DDL from Target.INFORMATION_SCHEMA.TABLES where table_name = 'geolocation') geolo
cation,
(select DDL from Target.INFORMATION_SCHEMA.TABLES where table_name = 'order_items') order
_items,
(select DDL from Target.INFORMATION_SCHEMA.TABLES where table_name = 'orders') orders,
(select DDL from Target.INFORMATION_SCHEMA.TABLES where table_name = 'order_reviews') ord
er_reviews,
(select DDL from Target.INFORMATION_SCHEMA.TABLES where table_name = 'payments') paymen
ts,
(select DDL from Target.INFORMATION_SCHEMA.TABLES where table_name = 'products') products
,
(select DDL from Target.INFORMATION_SCHEMA.TABLES where table_name = 'sellers') sellers

Result-

Problem statement:- Time Period for which the data is provided

SELECT
  MIN(DATE(ordr.order_purchase_timestamp)) AS start_date,
  MAX(DATE(ordr.order_purchase_timestamp)) AS end_date
  FROM
Target.orders AS ordr;

Initial Exploration of the Dataset


 Cities and States of customers ordered during the given period

SELECT
  DISTINCT(cust.customer_city) AS customer_city,
  brzls.string_field_1 as customer_state_short_codes,
  brzls.string_field_0 AS customer_states
FROM
  Target.customers AS cust
JOIN
  Target.Brazil_states AS brzls
ON
  cust.customer_state = brzls.string_field_1

In-Depth Exploration
Seasonality and general Trend in Target’s sales as well as in the industry

Query 1 -> MOM Trend of Revenue for the Years 2017 and 2018
WITH
  Trends AS (
  SELECT
    EXTRACT(YEAR
    FROM
      ord.order_purchase_timestamp) AS Years,
    EXTRACT(MONTH
    FROM
      ord.order_purchase_timestamp) AS Months,
    COUNT(ord.order_id) AS Total_Orders,
    ROUND(SUM(pay.payment_value),2) AS Revenue
  FROM
    Target.orders AS ord
  JOIN
    Target.payments AS pay
  ON
    ord.order_id = pay.order_id
  GROUP BY
    Months,
    Years 
  HAVING
    Years in (2017,2018)
  ORDER BY
    Years,
    Months)
SELECT
  *,
  DENSE_RANK() OVER(PARTITION BY Years ORDER BY Revenue DESC) AS mom_rank_re
venue_year_wise,
  ifnull(round((100 * (Revenue/LAG(Revenue,1) OVER(PARTITION BY Years ORDER BY Mont
hs) - 1)),2),0) as percentage_change_revenue
FROM
  Trends
ORDER BY Years,Months
Result-
QUERY 2 -> YOY Trend of Revenue for the months of Jan to Aug

WITH
  Trends AS (
  SELECT
    EXTRACT(YEAR
    FROM
      ord.order_purchase_timestamp) AS Years,
    EXTRACT(MONTH
    FROM
      ord.order_purchase_timestamp) AS Months,
    COUNT(ord.order_id) AS Total_Orders,
    ROUND(SUM(pay.payment_value),2) AS Revenue
  FROM
    Target.orders AS ord
  JOIN
    Target.payments AS pay
  ON
    ord.order_id = pay.order_id
  GROUP BY
    Months,
    Years 
  HAVING
    Years in (2017,2018)
  ORDER BY
    Years,
    Months)
SELECT
 round((select sum(Revenue) from temp1 where Years = 2017 and Months BETWEEN 1 and 8),
2) revenue_2017,
 round((select sum(Revenue) from temp1 where Years = 2018 and Months BETWEEN 1 and 8),
2) revenue_2018,
 round(100*((select sum(Revenue) from temp1 where Years = 2018 and Months BETWEEN 1 a
nd 8)/(select sum(Revenue) from Trends where Years = 2017 and Months BETWEEN 1 and 8)-
1),2) percentage_change_in_revenue

RESULT 2:-

We considered months 1-8 for this analysis as per the business requirement.

In-Depth Exploration

Problem Statement:-

- Preferred Time of Buying for Brazilian Customers


- We have divided a day into four parts in accordance with the 24 hour Time Format.

[0 to 6] -Dawn

(6 to 12] - Morning

(12 to 17] - Afternoon

[17 to 24) – Night

Result-

WITH
  time_of_buying AS (
  SELECT
    ord.order_purchase_timestamp,
    EXTRACT(HOUR
    FROM
      ord.order_purchase_timestamp) AS hour_of_buying
  FROM
    Target.orders ord)
SELECT
   EXTRACT(YEAR
  FROM
    ord.order_purchase_timestamp) AS Years,
  CASE
    WHEN hour_of_buying >= 0 AND hour_of_buying <= 6 THEN 'DAWN'
    WHEN hour_of_buying > 6
  AND hour_of_buying <= 12 THEN 'MORNING'
    WHEN hour_of_buying > 12 AND hour_of_buying <= 17 THEN 'AFTERNOON'
  ELSE
  'NIGHT'
END
  AS period_of_day,
 COUNT(ord.order_id) AS total_order_count

FROM
  Target.orders ord
JOIN
  time_of_buying tob
ON
  ord.order_purchase_timestamp = tob.order_purchase_timestamp
GROUP BY
  Years,
  period_of_day
ORDER BY
  Years,
  total_order_count desc

Evolution of E-commerce orders in the Brazil region:

1. Get month on month orders by states

with Target_1 as 
(SELECT
  EXTRACT(Year
  FROM
    ord.order_purchase_timestamp) AS Years,
  EXTRACT(MONTH
  FROM
    ord.order_purchase_timestamp) AS Months,
  bzs.string_field_0 AS state_name,
  COUNT(ord.order_id) AS Mom_Orders_By_States
FROM
  Target.orders ord
JOIN
  Target.customers cust
ON
  ord.customer_id = cust.customer_id
JOIN
  Target.Brazil_states bzs
ON
  cust.customer_state = bzs.string_field_1
GROUP BY
  Years,
  Months,
  bzs.string_field_0
ORDER BY
  Years,Months)
select
*,  
DENSE_RANK() OVER(partition by Years,Months ORDER BY MOM_orders_by_states DE
SC) AS
  Mom_by_no_of_orders
from Target_1
order by Years,Months,mom_by_no_of_orders

P.S.2)--Distribution of customers across the states in Brazil

SELECT
  cust.customer_state AS state_code,
  bzs.string_field_0 state_name,
  COUNT(DISTINCT cust.customer_unique_id) total_customers_by_state,
  SUM(COUNT(DISTINCT cust.customer_unique_id)) OVER() AS Total_Customer_of_Brazil,
  ROUND(COUNT(DISTINCT cust.customer_unique_id)/SUM(COUNT(DISTINCT cust.custome
r_unique_id)) OVER() * 100,2) AS contribution_percentage
FROM
  Target.customers cust
JOIN
  `Target.Brazil_states` AS bzs
ON
  cust.customer_state = bzs.string_field_1
GROUP BY
  cust.customer_state,
  bzs.string_field_0
ORDER BY
  contribution_percentage DESC

Impact on Economy: Analyze the money movement by e-commerce by looking at order prices,
freight and others.

- Percentage increase in cost of orders from 2017 to 2018 (including months between
Jan to Aug only)

- It shows the total cost of order increment YOY

- Through this analysis, we get to know the growth of the sales and revenue by the
company
WITH sum_of_2017 AS (
  SELECT
    EXTRACT(year
    FROM
      ord.order_purchase_timestamp) AS Years,
    ROUND(SUM(pay.payment_value),2) AS cost_of_orders_2017
  FROM
    Target.orders ord
  JOIN
    Target.payments pay
  ON
    ord.order_id = pay.order_id
  WHERE
    EXTRACT(month
    FROM
      ord.order_purchase_timestamp) < 9
    AND EXTRACT(year
    FROM
      ord.order_purchase_timestamp) = 2017
  GROUP BY
    Years ),
  sum_of_2018 AS (
  SELECT
    EXTRACT(year
    FROM
      ord.order_purchase_timestamp) AS Years,
    ROUND(SUM(pay.payment_value),2) AS cost_of_orders_2018
  FROM
    Target.orders ord
  JOIN
    Target.payments pay
  ON
    ord.order_id = pay.order_id
  WHERE
    EXTRACT(month
    FROM
      ord.order_purchase_timestamp) < 9
    AND EXTRACT(year
   
FROM
      ord.order_purchase_timestamp) = 2018
  GROUP BY
    Years)
SELECT
  ROUND((
    SELECT
      cost_of_orders_2017
    FROM
      sum_of_2017),2) total_cost_2017,
  ROUND((
    SELECT
      cost_of_orders_2018
    FROM
      sum_of_2018),2) total_cost_2018,
  ROUND(100*((
      SELECT
        cost_of_orders_2018
      FROM
        sum_of_2018)/(
      SELECT
        cost_of_orders_2017
      FROM
        sum_of_2017) - 1),2) AS percentage_change_cost_of_orders
`Result-

--2) Mean & Sum of price and freight value by customer state

with Tempt as
(SELECT
  bzs.string_field_1 AS state_code,
  bzs.string_field_0 AS state_name,
  ROUND(AVG(ordit.price),2) mean_price,
  ROUND(AVG(ordit.freight_value),2) mean_freight_value,
  COUNT(ordit.order_id) as no_of_orders
FROM
  `Target.order_items` ordit
JOIN
  Target.orders ord
ON
  ordit.order_id = ord.order_id
JOIN
  Target.customers cust
ON
  ord.customer_id = cust.customer_id
JOIN
  `Target.Brazil_states` bzs
ON
  cust.customer_state = bzs.string_field_1
GROUP BY
  bzs.string_field_1,
  bzs.string_field_0
)
select 
*,
DENSE_RANK() OVER(ORDER BY mean_price desc) as mean_price_by_state_rank,
DENSE_RANK() OVER(ORDER BY mean_freight_value asc) as mean_freight_value_by_state
_rank
from Tempt

5. Analysis on sales, freight and delivery time

1)Days between purchasing, delivering and estimated delivery


Result:-
SELECT
  ord.order_id,
  DATETIME_DIFF(EXTRACT(date
    FROM
      ord.order_delivered_carrier_date),EXTRACT(date
    FROM
      ord.order_purchase_timestamp), day) AS diff_betwn_order_delivery,
  DATETIME_DIFF(EXTRACT(date
    FROM
      ord.order_delivered_carrier_date),EXTRACT(date
    FROM
      ord.order_estimated_delivery_date), day) AS diff_betwn_delivery_est_delivery,
  DATETIME_DIFF(EXTRACT(date
    FROM
      ord.order_estimated_delivery_date),EXTRACT(date
    FROM
      ord.order_purchase_timestamp), day) AS diff_betwnn_purchase_est_delivery
FROM
  Target.orders ord
QUERY 2 -> Count of orders based on the payment installments.

Overall Number of Orders in every installment type

SELECT
  pay.payment_installments payment_installments,
  COUNT(distinct pay.order_id) AS no_of_orders
FROM
  Target.payments pay
GROUP BY
  pay.payment_installments
ORDER BY
  pay.payment_installments
 Grouped data by state, with mean of freight_value, time_to_delivery,
diff_estimated_delivery

 Top 5 states with highest/lowest average freight value - sort in desc/asc limit 5

 Top 5 states with highest/lowest average time to delivery

 Top 5 states where delivery is really fast/ not so fast compared to estimated date

QUERY 5]1] – Top 5 states with highest average freight value

WITH
  delivery_data AS (
  SELECT
    ord.order_id AS order_id,
    DATETIME_DIFF(EXTRACT(date
      FROM
        ord.order_purchase_timestamp),EXTRACT(date
      FROM
        ord.order_delivered_customer_date), day) AS time_to_delivery,
    DATETIME_DIFF(EXTRACT(date
      FROM
        ord.order_estimated_delivery_date),EXTRACT(date
      FROM
        ord.order_delivered_customer_date), day) AS diff_estimated_delivery
  FROM
    Target.orders ord)
SELECT
  bzs.string_field_1 AS state_code,
  bzs.string_field_0 AS state_name,
  ROUND(AVG(ordit.freight_value),2) mean_freight_value,
  ROUND(AVG(time_to_delivery),2) mean_time_to_delivery,
  ROUND(AVG(diff_estimated_delivery),2) mean_diff_estimated_delivery
FROM
  `Target.order_items` ordit
JOIN
  Target.orders ord
ON
  ordit.order_id = ord.order_id
JOIN
  Target.customers cust
ON

  ord.customer_id = cust.customer_id
JOIN
  `Target.Brazil_states` bzs
ON

QUERY 2 ]– Top 5 states with lowest average freight value

WITH
  delivery_data AS (
  SELECT
    ord.order_id AS order_id,
    DATETIME_DIFF(EXTRACT(date
      FROM
        ord.order_purchase_timestamp),EXTRACT(date
      
FROM
        ord.order_delivered_customer_date), day) AS time_to_delivery,
    DATETIME_DIFF(EXTRACT(date
      FROM
        ord.order_estimated_delivery_date),EXTRACT(date
      FROM
        ord.order_delivered_customer_date), day) AS diff_estimated_delivery
  FROM
    Target.orders ord)
SELECT
  bzs.string_field_1 AS state_code,
  bzs.string_field_0 AS state_name,
  ROUND(AVG(ordit.freight_value),2) mean_freight_value,
  ROUND(AVG(time_to_delivery),2) mean_time_to_delivery,
  ROUND(AVG(diff_estimated_delivery),2) mean_diff_estimated_delivery
FROM
  `Target.order_items` ordit
JOIN
  Target.orders ord
ON
  ordit.order_id = ord.order_id
JOIN
  Target.customers cust
ON
  ord.customer_id = cust.customer_id
JOIN
  `Target.Brazil_states` bzs
ON
  cust.customer_state = bzs.string_field_1
JOIN
  delivery_data dvd
ON
  ord.order_id = dvd.order_id
GROUP BY
  bzs.string_field_1,
  bzs.string_field_0
ORDER BY
  mean_time_to_delivery desc
LIMIT 5

QUERY 5]6] – Top 5 states with highest average time to delivery (Also, top 5 states where deliver is
really fast)

WITH
  delivery_data AS (
  SELECT
    ord.order_id AS order_id,
    DATETIME_DIFF(EXTRACT(date
      FROM
        ord.order_purchase_timestamp),EXTRACT(date
      
FROM
        ord.order_delivered_customer_date), day) AS time_to_delivery,
    DATETIME_DIFF(EXTRACT(date
      FROM
        ord.order_estimated_delivery_date),EXTRACT(date
      FROM
        ord.order_delivered_customer_date), day) AS diff_estimated_delivery
  FROM
    Target.orders ord)
SELECT
  bzs.string_field_1 AS state_code,
  bzs.string_field_0 AS state_name,
  ROUND(AVG(ordit.freight_value),2) mean_freight_value,
  ROUND(AVG(time_to_delivery),2) mean_time_to_delivery,
  ROUND(AVG(diff_estimated_delivery),2) mean_diff_estimated_delivery
FROM
  `Target.order_items` ordit
JOIN
  Target.orders ord
ON
  ordit.order_id = ord.order_id
JOIN
  Target.customers cust
ON
  ord.customer_id = cust.customer_id
JOIN
  `Target.Brazil_states` bzs
ON
  cust.customer_state = bzs.string_field_1
JOIN
  delivery_data dvd
ON
  ord.order_id = dvd.order_id
GROUP BY
  bzs.string_field_1,
  bzs.string_field_0
ORDER BY
  mean_time_to_delivery desc
LIMIT 5

QUERY 4 - Top 5 states with highest average time to delivery (Also, top 5 states where delivery is
really slow)

WITH
  delivery_data AS (
  SELECT
    ord.order_id AS order_id,
    DATETIME_DIFF(EXTRACT(date
      FROM
        ord.order_purchase_timestamp),EXTRACT(date
      FROM
        ord.order_delivered_customer_date), day) AS time_to_delivery,
    DATETIME_DIFF(EXTRACT(date
      FROM
        ord.order_estimated_delivery_date),EXTRACT(date
      FROM
        ord.order_delivered_customer_date), day) AS diff_estimated_delivery
  FROM
    Target.orders ord)
SELECT
  bzs.string_field_1 AS state_code,
  bzs.string_field_0 AS state_name,
  ROUND(AVG(ordit.freight_value),2) mean_freight_value,
  ROUND(AVG(time_to_delivery),2) mean_time_to_delivery,
  ROUND(AVG(diff_estimated_delivery),2) mean_diff_estimated_delivery
FROM
  `Target.order_items` ordit
JOIN
  Target.orders ord
ON
  ordit.order_id = ord.order_id
JOIN
  Target.customers cust
ON
  ord.customer_id = cust.customer_id
JOIN
  `Target.Brazil_states` bzs  
ON cust.customer_state = bzs.string_field_1
JOIN
  delivery_data dvd

ON
  ord.order_id = dvd.order_id
GROUP BY
  bzs.string_field_1,
  bzs.string_field_0
ORDER BY
  mean_time_to_delivery
LIMIT 5

Payment Type Analysis


- QUERY- Month over Months count of orders for different payment types

WITH
  temp1 AS (
  SELECT
    EXTRACT(Year
    FROM
      ord.order_purchase_timestamp) AS Years,
    EXTRACT(MONTH
    FROM
      ord.order_purchase_timestamp) AS Months,
    pay.payment_type,
    COUNT( DISTINCT ord.order_id) AS no_of_payments
  FROM
    Target.orders ord
  JOIN
    Target.payments pay
  ON
    ord.order_id = pay.order_id
  GROUP BY
    pay.payment_type,
    Months,
    Years
  ORDER BY
    Years,
    Months,
    pay.payment_type)
SELECT
  *,
  DENSE_RANK() OVER(PARTITION BY Years, Months ORDER BY no_of_payments DES
C) rank_by_no_of_payments
FROM
  temp1
ORDER BY
  Years,Months

You might also like