Most_Confusing_SQL_Functions
4. JOIN vs UNION
7. SUBQUERIES vs CTE
8. ISNULL vs COALESCE
You’re not alone! Below is a concise guide to help visualize the key differences between
these SQL commands and functions, complete with sample input data, SQL scripts, and
expected outputs.
• RANK(): Leaves gaps after ties (1,1,3). This function gives tied rows the same
rank and then skips subsequent numbers.
• DENSE_RANK(): No gaps (1,1,2). This function gives tied rows the same rank
without skipping numbers.
• ROW_NUMBER(): No ties (1,2,3). This function assigns a unique, sequential number
to each row, even when values are equal.
RANK:
• Assigns the same rank to tied values but leaves gaps afterward.
Result:
| name  | score | rank |
|-------|-------|------|
| Alice | 95    | 1    |
| Bob   | 95    | 1    |
| Carol | 90    | 3    |
DENSE_RANK:
• Assigns the same rank to ties and does not leave gaps.
Result:
| name  | score | dense_rank |
|-------|-------|------------|
| Alice | 95    | 1          |
| Bob   | 95    | 1          |
| Carol | 90    | 2          |
ROW_NUMBER:
Result:
| name  | score | row_num |
|-------|-------|---------|
| Alice | 95    | 1       |
| Bob   | 95    | 2       |
| Carol | 90    | 3       |
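The three result tables above can be reproduced in one pass. The following is a minimal sketch, not the guide's original query: it assumes a hypothetical table named scores, runs on an in-memory SQLite database through Python's sqlite3 module (window functions need SQLite 3.25 or newer), and adds name as a tie-breaker so that ROW_NUMBER() is deterministic.

```python
import sqlite3


def rank_demo():
    """Return (name, score, rank, dense_rank, row_num) for the sample rows."""
    conn = sqlite3.connect(":memory:")
    # 'scores' is a hypothetical table name chosen for this sketch.
    conn.execute("CREATE TABLE scores (name TEXT, score INTEGER)")
    conn.executemany(
        "INSERT INTO scores VALUES (?, ?)",
        [("Alice", 95), ("Bob", 95), ("Carol", 90)],
    )
    rows = conn.execute(
        """
        SELECT name,
               score,
               RANK()       OVER (ORDER BY score DESC)       AS rnk,
               DENSE_RANK() OVER (ORDER BY score DESC)       AS dense_rnk,
               -- name breaks the tie so the numbering is repeatable
               ROW_NUMBER() OVER (ORDER BY score DESC, name) AS row_num
        FROM scores
        ORDER BY row_num
        """
    ).fetchall()
    conn.close()
    return rows


for row in rank_demo():
    print(row)
# ('Alice', 95, 1, 1, 1)
# ('Bob', 95, 1, 1, 2)
# ('Carol', 90, 3, 2, 3)
```

Without a tie-breaker in the ORDER BY, the database is free to number Alice and Bob in either order.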
"Show departments WHERE salary > 5000" vs "Show departments HAVING AVG(salary)
> 5000."
Expected Output:
| department | count |
|------------|-------|
| HR         | 1     |
| IT         | 2     |
Expected Output:
| department | emp_count |
|------------|-----------|
| HR         | 2         |
| IT         | 3         |
| Sales      | 2         |
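To make the WHERE vs HAVING distinction concrete, here is a minimal sketch on an in-memory SQLite database. The employees table and its rows are hypothetical and chosen only for illustration; they are not the guide's sample data. WHERE filters individual rows before grouping, while HAVING filters whole groups after aggregation.

```python
import sqlite3


def where_vs_having():
    """Return (rows filtered with WHERE, groups filtered with HAVING)."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical sample data, assumed for this sketch only.
    conn.execute("CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER)")
    conn.executemany(
        "INSERT INTO employees VALUES (?, ?, ?)",
        [
            ("Ann", "HR", 4000), ("Ben", "HR", 6500),
            ("Cy", "IT", 7000), ("Di", "IT", 8000), ("Ed", "IT", 3000),
            ("Flo", "Sales", 2000), ("Gus", "Sales", 3000),
        ],
    )
    # WHERE: drop low-paid rows first, then count what is left per department.
    where_rows = conn.execute(
        """
        SELECT department, COUNT(*) FROM employees
        WHERE salary > 5000
        GROUP BY department ORDER BY department
        """
    ).fetchall()
    # HAVING: count everyone, then keep only departments whose average is high.
    having_rows = conn.execute(
        """
        SELECT department, COUNT(*) FROM employees
        GROUP BY department
        HAVING AVG(salary) > 5000
        ORDER BY department
        """
    ).fetchall()
    conn.close()
    return where_rows, having_rows


where_rows, having_rows = where_vs_having()
print(where_rows)   # [('HR', 1), ('IT', 2)]
print(having_rows)  # [('HR', 2), ('IT', 3)]
```

Sales disappears from both results for different reasons: no single Sales row passes the WHERE filter, and the Sales group average (2500) fails the HAVING filter.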
Sample Data:
Table1 (names):
| name    |
|---------|
| Alice   |
| Bob     |
| Charlie |
Table2 (names):
| name  |
|-------|
| Alice |
| David |
Expected Output:
| name    |
|---------|
| Alice   |
| Bob     |
| Charlie |
| Alice   |
| David   |
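The output above keeps Alice twice, which is the behavior of UNION ALL. As a minimal sketch (in-memory SQLite through Python's sqlite3; the ORDER BY is added here only to make the row order deterministic), the two operators can be compared on the same sample tables:

```python
import sqlite3


def union_demo():
    """Return (UNION ALL names, UNION names) for the two sample tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Table1 (name TEXT)")
    conn.execute("CREATE TABLE Table2 (name TEXT)")
    conn.executemany("INSERT INTO Table1 VALUES (?)", [("Alice",), ("Bob",), ("Charlie",)])
    conn.executemany("INSERT INTO Table2 VALUES (?)", [("Alice",), ("David",)])
    # UNION ALL simply appends the second result to the first.
    union_all = [r[0] for r in conn.execute(
        "SELECT name FROM Table1 UNION ALL SELECT name FROM Table2 ORDER BY name"
    )]
    # UNION additionally removes duplicate rows.
    union = [r[0] for r in conn.execute(
        "SELECT name FROM Table1 UNION SELECT name FROM Table2 ORDER BY name"
    )]
    conn.close()
    return union_all, union


union_all, union = union_demo()
print(union_all)  # ['Alice', 'Alice', 'Bob', 'Charlie', 'David']
print(union)      # ['Alice', 'Bob', 'Charlie', 'David']
```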
4. JOIN vs UNION
salaries
| emp_id | salary |
|--------|--------|
| 1      | 50000  |
| 2      | 60000  |
| 3      | 70000  |
Expected Output:
| emp_id | name  | salary |
|--------|-------|--------|
| 1      | Alice | 50000  |
| 2      | Bob   | 60000  |
Sample Queries:
SELECT name, 'employee' AS type FROM employees
UNION
SELECT name, 'manager' FROM (
    SELECT 2 AS emp_id, 'Bob' AS name
    UNION
    SELECT 3, 'Carol'
) AS managers;  -- alias added: many engines require a name for a derived table
Expected Output:
| name  | type     |
|-------|----------|
| Alice | employee |
| Bob   | employee |
| Bob   | manager  |
| Carol | manager  |
• DELETE: "I’ll remove these specific rows (and log every change)."
• TRUNCATE: "I’ll wipe ALL rows (and reset the counter)."
• DROP: "I’ll nuke the entire table (RIP)."
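The three statements can be contrasted in a short sketch. It runs on in-memory SQLite through Python's sqlite3, which is an assumption of this example; SQLite has no TRUNCATE statement, so an unqualified DELETE stands in for it here and the real statement appears only as a comment. The logs table is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# 'logs' is a hypothetical table used only for this sketch.
conn.execute("CREATE TABLE logs (id INTEGER PRIMARY KEY, msg TEXT)")
conn.executemany("INSERT INTO logs (msg) VALUES (?)", [("a",), ("b",), ("c",)])

# DELETE with a WHERE clause removes only the matching rows.
conn.execute("DELETE FROM logs WHERE msg = 'b'")
after_delete = conn.execute("SELECT msg FROM logs ORDER BY id").fetchall()

# TRUNCATE TABLE logs;  -- engines that support it empty the table but keep it.
# SQLite does not support TRUNCATE, so an unqualified DELETE is used instead.
conn.execute("DELETE FROM logs")
after_wipe = conn.execute("SELECT COUNT(*) FROM logs").fetchone()[0]

# DROP removes the table definition itself, not just its rows.
conn.execute("DROP TABLE logs")
table_exists = conn.execute(
    "SELECT COUNT(*) FROM sqlite_master WHERE type = 'table' AND name = 'logs'"
).fetchone()[0]
conn.close()

print(after_delete)  # [('a',), ('c',)]
print(after_wipe)    # 0
print(table_exists)  # 0
```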
CTE Example:
WITH HighEarners AS (
    SELECT * FROM employees WHERE salary > 100000
)
SELECT * FROM HighEarners;
Expected Output:
| emp_id | name  | salary |
|--------|-------|--------|
| 1      | Alice | 120000 |
| 3      | Carol | 110000 |
Expected Output:
7. SUBQUERIES vs CTE
Using SUBQUERIES:
departments
| dept_id | location |
|---------|----------|
| 10      | NY       |
| 20      | LA       |
Subquery Example:
SELECT name FROM employees
WHERE dept_id IN (SELECT dept_id FROM departments WHERE location = 'NY');
Expected Output:
| name  |
|-------|
| Alice |
| Carol |
CTE Example:
WITH NYDepts AS (
    SELECT dept_id FROM departments WHERE location = 'NY'
)
SELECT name FROM employees WHERE dept_id IN (SELECT dept_id FROM NYDepts);
Expected Output (the same rows as the subquery version):
| name  |
|-------|
| Alice |
| Carol |
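Both forms can be run side by side to confirm that they return the same rows. This is a minimal sketch on in-memory SQLite; the departments rows come from the guide, while the employees rows (Alice and Carol in department 10, Bob in department 20) are an assumption inferred from the expected output.

```python
import sqlite3


def subquery_vs_cte():
    """Return (subquery result, CTE result) as lists of names."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE departments (dept_id INTEGER, location TEXT)")
    conn.execute("CREATE TABLE employees (name TEXT, dept_id INTEGER)")
    conn.executemany("INSERT INTO departments VALUES (?, ?)", [(10, "NY"), (20, "LA")])
    # Assumed employee rows; the guide does not show this table.
    conn.executemany(
        "INSERT INTO employees VALUES (?, ?)",
        [("Alice", 10), ("Bob", 20), ("Carol", 10)],
    )
    subquery = [r[0] for r in conn.execute(
        """
        SELECT name FROM employees
        WHERE dept_id IN (SELECT dept_id FROM departments WHERE location = 'NY')
        ORDER BY name
        """
    )]
    cte = [r[0] for r in conn.execute(
        """
        WITH NYDepts AS (
            SELECT dept_id FROM departments WHERE location = 'NY'
        )
        SELECT name FROM employees
        WHERE dept_id IN (SELECT dept_id FROM NYDepts)
        ORDER BY name
        """
    )]
    conn.close()
    return subquery, cte


subquery, cte = subquery_vs_cte()
print(subquery)  # ['Alice', 'Carol']
print(cte)       # ['Alice', 'Carol']
```

The CTE does not change the result; it only gives the inner query a name, which helps readability when the same intermediate result is needed more than once.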
8. ISNULL vs COALESCE
Expected Output:
| contact_id | phone  |
|------------|--------|
| 1          | N/A    |
| 2          | 987654 |
| 3          | N/A    |
Expected Output:
| contact_id | contact_number |
|------------|----------------|
| 1          | 123456         |
| 2          | 987654         |
| 3          | N/A            |
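The two outputs above are consistent with a contacts table that has two nullable columns, where ISNULL checks one column and COALESCE falls through several. The sketch below is built on that assumption: the table name contacts and the column mobile are hypothetical. It runs on in-memory SQLite, which has no two-argument ISNULL(), so SQLite's IFNULL() stands in for it.

```python
import sqlite3


def null_handling_demo():
    """Return (single-fallback rows, multi-fallback rows)."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: 'mobile' is an assumed second phone column.
    conn.execute("CREATE TABLE contacts (contact_id INTEGER, phone TEXT, mobile TEXT)")
    conn.executemany(
        "INSERT INTO contacts VALUES (?, ?, ?)",
        [(1, None, "123456"), (2, "987654", None), (3, None, None)],
    )
    # IFNULL (SQLite's counterpart of SQL Server's ISNULL) takes exactly two arguments.
    isnull_rows = conn.execute(
        "SELECT contact_id, IFNULL(phone, 'N/A') FROM contacts ORDER BY contact_id"
    ).fetchall()
    # COALESCE returns the first non-NULL value from any number of arguments.
    coalesce_rows = conn.execute(
        "SELECT contact_id, COALESCE(phone, mobile, 'N/A') FROM contacts ORDER BY contact_id"
    ).fetchall()
    conn.close()
    return isnull_rows, coalesce_rows


isnull_rows, coalesce_rows = null_handling_demo()
print(isnull_rows)    # [(1, 'N/A'), (2, '987654'), (3, 'N/A')]
print(coalesce_rows)  # [(1, '123456'), (2, '987654'), (3, 'N/A')]
```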
Table1
| id |
|----|
| 1  |
| 2  |
| 3  |
Table2
| id |
|----|
| 2  |
| 3  |
| 4  |
INTERSECT Example:
SELECT id FROM Table1
INTERSECT
SELECT id FROM Table2;
Expected Output:
| id |
|----|
| 2  |
| 3  |
Table1
| id | name |
|----|------|
| 1  | A    |
| 2  | B    |
| 3  | C    |
Table2
| id | value |
|----|-------|
| 2  | X     |
| 3  | Y     |
| 4  | Z     |
Expected Output:
| id | name | value |
|----|------|-------|
| 2  | B    | X     |
| 3  | C    | Y     |
Table1
| id |
|----|
| 1  |
| 2  |
| 3  |
| 4  |
Table2
| id |
|----|
| 2  |
| 4  |
EXCEPT Example:
SELECT id FROM Table1
EXCEPT
SELECT id FROM Table2;
Expected Output:
| id |
|----|
| 1  |
| 3  |
NOT IN Example:
SELECT id FROM Table1
WHERE id NOT IN (SELECT id FROM Table2);
Expected Output:
| id |
|----|
| 1  |
| 3  |
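On this sample data EXCEPT and NOT IN agree, but they diverge as soon as the second table contains a NULL: NOT IN then compares every id against an unknown value and returns no rows, while EXCEPT is unaffected. A minimal sketch of both cases, assuming in-memory SQLite through Python's sqlite3:

```python
import sqlite3


def except_vs_not_in(with_null):
    """Return (EXCEPT ids, NOT IN ids); optionally add a NULL to Table2."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Table1 (id INTEGER)")
    conn.execute("CREATE TABLE Table2 (id INTEGER)")
    conn.executemany("INSERT INTO Table1 VALUES (?)", [(1,), (2,), (3,), (4,)])
    conn.executemany("INSERT INTO Table2 VALUES (?)", [(2,), (4,)])
    if with_null:
        conn.execute("INSERT INTO Table2 VALUES (NULL)")
    except_ids = [r[0] for r in conn.execute(
        "SELECT id FROM Table1 EXCEPT SELECT id FROM Table2 ORDER BY id"
    )]
    not_in_ids = [r[0] for r in conn.execute(
        "SELECT id FROM Table1 WHERE id NOT IN (SELECT id FROM Table2) ORDER BY id"
    )]
    conn.close()
    return except_ids, not_in_ids


print(except_vs_not_in(with_null=False))  # ([1, 3], [1, 3])
print(except_vs_not_in(with_null=True))   # ([1, 3], [])
```

EXCEPT also removes duplicate rows from its result, whereas NOT IN keeps every qualifying row of Table1.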
Definitions:
• INNER JOIN: Returns only the rows that have a match in both tables.
• LEFT JOIN: Returns all rows from the left table and the matching rows from the right
table. Left rows with no match get NULL in the right table's columns.
• RIGHT JOIN: Returns all rows from the right table and the matching rows from the left
table. Right rows with no match get NULL in the left table's columns.
• FULL JOIN: Returns all rows from both tables, matching them where possible and
filling the columns of the missing side with NULL.
Sample Tables:
Customers Table:
| CustomerID | Name    | City  |
|------------|---------|-------|
| 1          | Alice   | Paris |
| 2          | Bob     | Tokyo |
| 3          | Charlie | Delhi |
Orders Table:
| OrderID | CustomerID | Product |
|---------|------------|---------|
| 101 | 1 | Laptop |
| 102 | 2 | Phone |
| 103 | 4 | Camera |
INNER JOIN
SELECT Customers.Name, Orders.Product
FROM Customers
INNER JOIN Orders ON Customers.CustomerID = Orders.CustomerID;
Output:
| Name | Product |
|-------|---------|
| Alice | Laptop |
| Bob | Phone |
LEFT JOIN
SELECT Customers.Name, Orders.Product
FROM Customers
LEFT JOIN Orders ON Customers.CustomerID = Orders.CustomerID;
Output:
| Name | Product |
|---------|---------|
| Alice | Laptop |
| Bob | Phone |
| Charlie | NULL |
RIGHT JOIN
SELECT Customers.Name, Orders.Product
FROM Customers
RIGHT JOIN Orders ON Customers.CustomerID = Orders.CustomerID;
Output:
| Name | Product |
|-------|---------|
| Alice | Laptop |
| Bob | Phone |
| NULL | Camera |
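FULL JOIN
SELECT Customers.Name, Orders.Product
FROM Customers
FULL JOIN Orders ON Customers.CustomerID = Orders.CustomerID;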
Output:
| Name | Product |
|---------|---------|
| Alice | Laptop |
| Bob | Phone |
| Charlie | NULL |
| NULL | Camera |
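The FULL JOIN output above can be reproduced even on engines that lack FULL JOIN (SQLite before 3.39, for example). The sketch below swaps in a common emulation rather than the FULL JOIN keyword itself: a LEFT JOIN for all customers, plus the orders that have no customer, combined with UNION ALL. It assumes in-memory SQLite through Python's sqlite3.

```python
import sqlite3


def full_join_emulated():
    """Return (Name, Product) pairs equivalent to a FULL JOIN of the sample tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Customers (CustomerID INTEGER, Name TEXT, City TEXT)")
    conn.execute("CREATE TABLE Orders (OrderID INTEGER, CustomerID INTEGER, Product TEXT)")
    conn.executemany(
        "INSERT INTO Customers VALUES (?, ?, ?)",
        [(1, "Alice", "Paris"), (2, "Bob", "Tokyo"), (3, "Charlie", "Delhi")],
    )
    conn.executemany(
        "INSERT INTO Orders VALUES (?, ?, ?)",
        [(101, 1, "Laptop"), (102, 2, "Phone"), (103, 4, "Camera")],
    )
    rows = conn.execute(
        """
        -- Part 1: every customer, with NULL Product when there is no order
        SELECT c.Name, o.Product
        FROM Customers AS c
        LEFT JOIN Orders AS o ON c.CustomerID = o.CustomerID
        UNION ALL
        -- Part 2: orders whose CustomerID matches no customer
        SELECT NULL, o.Product
        FROM Orders AS o
        WHERE NOT EXISTS (
            SELECT 1 FROM Customers AS c WHERE c.CustomerID = o.CustomerID
        )
        """
    ).fetchall()
    conn.close()
    return rows


for row in full_join_emulated():
    print(row)
```

The result contains the same four pairs as the FULL JOIN output; the row order is not guaranteed because the query has no ORDER BY.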
Definitions:
• LAG(): Accesses data from the previous row in the result set.
• LEAD(): Accesses data from the next row in the result set.
Sample Table:
Sales Table:
| SaleID | Product | SaleAmount | SaleDate |
|--------|---------|------------|------------|
| 1 | Laptop | 1000 | 2023-01-01 |
| 2 | Phone | 500 | 2023-01-02 |
| 3 | Tablet | 700 | 2023-01-03 |
LAG() Example
SELECT Product, SaleAmount,
       LAG(SaleAmount) OVER (ORDER BY SaleDate) AS PreviousSale
FROM Sales;
Output:
| Product | SaleAmount | PreviousSale |
|---------|------------|--------------|
| Laptop | 1000 | NULL |
| Phone | 500 | 1000 |
| Tablet | 700 | 500 |
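LEAD() Example
SELECT Product, SaleAmount,
       LEAD(SaleAmount) OVER (ORDER BY SaleDate) AS NextSale
FROM Sales;
Output:
| Product | SaleAmount | NextSale |
|---------|------------|----------|
| Laptop  | 1000       | 500      |
| Phone   | 500        | 700      |
| Tablet  | 700        | NULL     |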
Even though LAG() and LEAD() return NULL at the boundaries of the data (the first and
last rows of the ordering), they are still valuable for many use cases.
In datasets where order matters (like time-series data), you often need to compare the
current row with a previous or next row to analyze trends.
E.g., "Has the sale increased or decreased compared to the previous day?"
2. Trend Analysis:
They are essential for calculating moving averages, growth rates, or detecting anomalies.
Even when NULL is returned, it's a helpful indicator that there is no relevant value
available (like missing events or boundary conditions).
4. Flexible Calculation:
You can handle NULL gracefully using functions like COALESCE() to replace them with
default values (like 0 or the current row’s value).
Practical Demonstration
Sales Data Example
| SaleDate   | Product | SaleAmount |
|------------|---------|------------|
| 2023-01-01 | Laptop  | 1000       |
| 2023-01-02 | Phone   | 800        |
| 2023-01-03 | Tablet  | 1200       |
SQL Query
SELECT
    SaleDate, Product, SaleAmount,
    LAG(SaleAmount) OVER (ORDER BY SaleDate) AS PreviousSale,
    LEAD(SaleAmount) OVER (ORDER BY SaleDate) AS NextSale,
    SaleAmount - LAG(SaleAmount) OVER (ORDER BY SaleDate) AS SaleChange
FROM Sales;
Result:
| SaleDate   | Product | SaleAmount | PreviousSale | NextSale | SaleChange |
|------------|---------|------------|--------------|----------|------------|
| 2023-01-01 | Laptop  | 1000       | NULL         | 800      | NULL       |
| 2023-01-02 | Phone   | 800        | 1000         | 1200     | -200       |
| 2023-01-03 | Tablet  | 1200       | 800          | NULL     | 400        |
• In row 1, NULL for PreviousSale clearly shows it's the first row with no earlier
reference.
• In row 3, NULL for NextSale indicates it's the last row, so there's no subsequent
sale to compare.
• The SaleChange column provides meaningful insights for the rows that have
valid comparisons.
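The boundary NULLs described above can be replaced with a default where that suits the analysis. As a minimal sketch (in-memory SQLite through Python's sqlite3, window functions need SQLite 3.25 or newer), the query below keeps PreviousSale and NextSale as they are and wraps only SaleChange in COALESCE() so that the first row reports 0 instead of NULL. Choosing 0 as the default is an assumption of this example, not a rule.

```python
import sqlite3


def lag_lead_demo():
    """Return the demonstration rows with SaleChange defaulted to 0."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Sales (SaleDate TEXT, Product TEXT, SaleAmount INTEGER)")
    conn.executemany(
        "INSERT INTO Sales VALUES (?, ?, ?)",
        [
            ("2023-01-01", "Laptop", 1000),
            ("2023-01-02", "Phone", 800),
            ("2023-01-03", "Tablet", 1200),
        ],
    )
    rows = conn.execute(
        """
        SELECT SaleDate, Product, SaleAmount,
               LAG(SaleAmount)  OVER (ORDER BY SaleDate) AS PreviousSale,
               LEAD(SaleAmount) OVER (ORDER BY SaleDate) AS NextSale,
               -- NULL on the first row becomes 0
               COALESCE(SaleAmount - LAG(SaleAmount) OVER (ORDER BY SaleDate), 0)
                   AS SaleChange
        FROM Sales
        ORDER BY SaleDate
        """
    ).fetchall()
    conn.close()
    return rows


for row in lag_lead_demo():
    print(row)
# ('2023-01-01', 'Laptop', 1000, None, 800, 0)
# ('2023-01-02', 'Phone', 800, 1000, 1200, -200)
# ('2023-01-03', 'Tablet', 1200, 800, None, 400)
```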
Even though NULLs appear at boundaries, they are indicators of valid structural limits in
your dataset. Handling them with logic (like COALESCE() or conditional checks) ensures
these functions remain powerful for insights.
1. Moving Averages
A moving average is a calculation that takes the average of a fixed number of recent data
points. It's useful for smoothing out short-term fluctuations and identifying long-term
trends.
Sample Data:
| SaleDate | SaleAmount |
|------------|------------|
| 2023-01-01 | 100 |
| 2023-01-02 | 200 |
| 2023-01-03 | 300 |
| 2023-01-04 | 400 |
| 2023-01-05 | 500 |
SELECT
    SaleDate,
    SaleAmount,
    AVG(SaleAmount) OVER (
        ORDER BY SaleDate
        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
    ) AS MovingAvg3Days
FROM Sales;
Result:
| SaleDate | SaleAmount | MovingAvg3Days |
|------------|------------|----------------|
| 2023-01-01 | 100 | 100 |
| 2023-01-02 | 200 | 150 |
| 2023-01-03 | 300 | 200 |
| 2023-01-04 | 400 | 300 |
| 2023-01-05 | 500 | 400 |
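Explanation: Each value is the average of the current row and up to two preceding rows,
e.g. (100 + 200 + 300) / 3 = 200 for 2023-01-03. Averaging smooths out sharp changes
and shows a more stable trend.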
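The moving-average query can be checked end to end with a short script. This is a minimal sketch, assuming in-memory SQLite through Python's sqlite3 (SQLite 3.25 or newer); note that SQLite's AVG() returns floating-point values.

```python
import sqlite3


def moving_average():
    """Return (SaleDate, SaleAmount, 3-row moving average) for the sample data."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Sales (SaleDate TEXT, SaleAmount INTEGER)")
    conn.executemany(
        "INSERT INTO Sales VALUES (?, ?)",
        [
            ("2023-01-01", 100), ("2023-01-02", 200), ("2023-01-03", 300),
            ("2023-01-04", 400), ("2023-01-05", 500),
        ],
    )
    rows = conn.execute(
        """
        SELECT SaleDate,
               SaleAmount,
               AVG(SaleAmount) OVER (
                   ORDER BY SaleDate
                   ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
               ) AS MovingAvg3Days
        FROM Sales
        ORDER BY SaleDate
        """
    ).fetchall()
    conn.close()
    return rows


for row in moving_average():
    print(row)
# ('2023-01-01', 100, 100.0)
# ('2023-01-02', 200, 150.0)
# ('2023-01-03', 300, 200.0)
# ('2023-01-04', 400, 300.0)
# ('2023-01-05', 500, 400.0)
```

The first two rows average fewer than three values because the frame simply contains whatever rows exist.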
2. Growth Rates
A growth rate measures the percentage change from one period to the next.
Sample Data:
| Month | SaleAmount |
|------------|------------|
| 2023-01 | 1000 |
| 2023-02 | 1500 |
| 2023-03 | 2000 |
SELECT
Month,
SaleAmount,
(SaleAmount - LAG(SaleAmount) OVER (ORDER BY Month)) * 100.0
/ LAG(SaleAmount) OVER (ORDER BY Month) AS GrowthRate
FROM Sales;
Result:
| Month | SaleAmount | GrowthRate |
|------------|------------|------------|
| 2023-01 | 1000 | NULL |
| 2023-02 | 1500 | 50.00 |
| 2023-03 | 2000 | 33.33 |
Explanation: In February, sales grew by 50% compared to January, and in March, they grew
by 33.33% compared to February.
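The growth-rate calculation can be verified the same way. The sketch below assumes in-memory SQLite through Python's sqlite3 and adds ROUND(..., 2), which is not in the query above, so that the values match the two-decimal figures in the result table.

```python
import sqlite3


def growth_rates():
    """Return (Month, growth rate in percent rounded to 2 decimals)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Sales (Month TEXT, SaleAmount INTEGER)")
    conn.executemany(
        "INSERT INTO Sales VALUES (?, ?)",
        [("2023-01", 1000), ("2023-02", 1500), ("2023-03", 2000)],
    )
    rows = conn.execute(
        """
        SELECT Month,
               ROUND(
                   (SaleAmount - LAG(SaleAmount) OVER (ORDER BY Month)) * 100.0
                   / LAG(SaleAmount) OVER (ORDER BY Month),
                   2
               ) AS GrowthRate
        FROM Sales
        ORDER BY Month
        """
    ).fetchall()
    conn.close()
    return rows


for row in growth_rates():
    print(row)
# ('2023-01', None)
# ('2023-02', 50.0)
# ('2023-03', 33.33)
```

Multiplying by 100.0 rather than 100 forces floating-point division; with integer arithmetic many engines would truncate 33.33 to 33.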
3. Detecting Anomalies
An anomaly is an unusual or unexpected data point that differs significantly from the rest.
Use Case: Detecting days when sales are significantly higher than the moving average.
Sample Data:
| SaleDate | SaleAmount |
|------------|------------|
| 2023-01-01 | 100 |
| 2023-01-02 | 200 |
| 2023-01-03 | 3000 | -- Anomaly
| 2023-01-04 | 400 |
| 2023-01-05 | 500 |
Explanation: The sales on `2023-01-03` are much higher than the moving average, flagging
it as an anomaly.
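One way to flag such a day is to compare each sale with its own moving average. The sketch below is an assumption about how the check could be written, not the guide's original query: it reuses the 3-row window from the moving-average section and treats a sale as anomalous when it exceeds twice that average. The factor of 2 is a hypothetical threshold chosen for illustration. It runs on in-memory SQLite through Python's sqlite3.

```python
import sqlite3


def find_anomalies(factor=2.0):
    """Return rows whose SaleAmount exceeds factor times the 3-row moving average."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Sales (SaleDate TEXT, SaleAmount INTEGER)")
    conn.executemany(
        "INSERT INTO Sales VALUES (?, ?)",
        [
            ("2023-01-01", 100), ("2023-01-02", 200), ("2023-01-03", 3000),
            ("2023-01-04", 400), ("2023-01-05", 500),
        ],
    )
    rows = conn.execute(
        """
        WITH Averaged AS (
            SELECT SaleDate,
                   SaleAmount,
                   AVG(SaleAmount) OVER (
                       ORDER BY SaleDate
                       ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
                   ) AS MovingAvg
            FROM Sales
        )
        SELECT SaleDate, SaleAmount, MovingAvg
        FROM Averaged
        WHERE SaleAmount > ? * MovingAvg   -- hypothetical threshold
        ORDER BY SaleDate
        """,
        (factor,),
    ).fetchall()
    conn.close()
    return rows


print(find_anomalies())  # [('2023-01-03', 3000, 1100.0)]
```

Because the anomalous value is part of its own window, it also inflates the averages of the next two rows; a stricter check could average only the preceding rows.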
Summary
This guide, complete with sample input and output data, should help novice users clearly
understand the differences between these SQL commands and functions. Enjoy exploring
and practicing these concepts!