SQL Material
When customers place orders, each order has a well-defined structure: an order ID, customer ID, list
of products, quantities, prices, order date, and so on. Here, we also have multiple relationships to
track — for example:
Because of these structured relationships, a relational database (like MySQL, PostgreSQL, or Oracle)
is ideal. This database can use tables to store data:
Order_Product table to handle the many-to-many relationship between orders and products.
With SQL queries, we can easily retrieve data like, “Show all orders by a specific customer” or “List
products purchased within a date range.” The structure and relationships make relational databases
highly reliable and efficient for such structured, well-defined data.
Our products may have varied attributes — some might have size and color, while others might have
expiration dates or manufacturer information. Since product attributes can vary widely, it’s hard to
fit this data into a fixed table structure without lots of empty or unnecessary fields.
A non-relational database (like MongoDB or Cassandra) is flexible enough to store this kind of
loosely structured data, where records do not all share the same fields. We can store each product
as a document with its own unique fields. For example:
A T-shirt document might have fields for size, color, and material.
With this approach, each product can be represented in a way that best suits its specific data,
without rigid schema requirements. This flexibility is ideal for scenarios where data formats vary or
evolve over time.
• DDL (Data Definition Language): a type of SQL command used to define and modify data
structures. It creates, alters, and deletes database objects such as tables, views, indexes, and
users.
• DQL (Data Query Language): used to retrieve data from a database. The main DQL command is
SELECT, which allows you to query or search through tables to find specific information.
=================================================================================
This section uses a sample dataset for a hospital management system with the following tables: Patients,
Appointments, Doctors, and Departments. We’ll go through several business scenarios, posing
questions that might arise in each, and provide example SQL queries in DDL, DML, and DQL.
first_name VARCHAR(50),
last_name VARCHAR(50),
date_of_birth DATE,
gender VARCHAR(10),
phone VARCHAR(15),
address VARCHAR(100)
);
location VARCHAR(100)
);
first_name VARCHAR(50),
last_name VARCHAR(50),
specialty VARCHAR(50),
department_id INT,
);
patient_id INT,
doctor_id INT,
appointment_date DATE,
appointment_time TIME,
status VARCHAR(20),
);
Alter Table:
Add a new column to the Patients table to store email addresses for each patient
Increase the length of the phone column in the Patients table to store international phone
numbers.
Rename the location column in the Departments table to department_location for clarity
Syntax:
INSERT INTO Departments (col1, col2, col3) VALUES (value1, value2, value3);
INSERT INTO Patients (patient_id, first_name, last_name, date_of_birth, gender, phone, address)
VALUES
Update:
Update the location of the "Radiology" department to "First Floor."
UPDATE Departments
UPDATE Departments
Relocate all departments currently on the "Second Floor" to the "Third Floor."
UPDATE Departments
UPDATE Doctors
UPDATE Doctors
SET department_id = 7
Delete a specific appointment that was mistakenly scheduled (e.g., appointment_id = 1005).
Delete all past appointments (appointments before today) from the Appointments table.
WHERE patient_id = 1;
Remove all patient records where no phone number has been provided.
Remove doctors who are assigned to departments that no longer exist in the Departments table.
FROM Patients
Department Insights:
List all department names and locations.
FROM Doctors
GROUP BY department_id
LIMIT 1;
DCL : Data Control Language:
Questions:
Grant read-only access to the Patients and Appointments tables for a user named
receptionist_user
Grant a doctor access to view and update their own appointment records. Assume
the doctor’s username is doctor_user.
GRANT SELECT, UPDATE ON Appointments TO doctor_user;
Grant full access to the Doctors table to a department head user, allowing them to
manage doctor records in their department.
GRANT SELECT, INSERT, UPDATE, DELETE ON Doctors TO department_head_user;
Order of Execution
Objective: Retrieve a list of doctors with the total number of completed appointments in 2024, only
including doctors who had at least 5 completed appointments in this period. The results should be
ordered by the total number of appointments in descending order.
SELECT
doctor_id,
COUNT(appointment_id) AS total_completed_appointments
FROM
Appointments
WHERE
status = 'Completed'
AND appointment_date BETWEEN '2024-01-01' AND '2024-12-31'
GROUP BY
doctor_id
HAVING
COUNT(appointment_id) >= 5
ORDER BY
total_completed_appointments DESC;
FROM: The query starts by identifying the table from which to retrieve data (Appointments).
WHERE: The WHERE clause filters the rows based on the condition specified. Only rows where status
= 'Completed' and appointment_date is in the year 2024 are selected.
GROUP BY: After filtering, the GROUP BY clause groups the rows by doctor_id, so we can perform
aggregate functions on each group.
HAVING: The HAVING clause applies a filter to the groups created by GROUP BY. In this case, it keeps
only those groups (i.e., doctors) with a COUNT of appointment_id that is 5 or more.
SELECT: The SELECT clause is processed next. It retrieves doctor_id and the calculated
total_completed_appointments (the result of COUNT(appointment_id)).
ORDER BY: Finally, the results are sorted by total_completed_appointments in descending order.
Final Output
The result will be a list of doctors who have completed at least 5 appointments in 2024, with each
doctor’s ID and their total completed appointments, sorted from the highest to the lowest count.
This demonstrates SQL’s logical execution order, which is important for understanding how filters
and aggregations apply to data.
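The logical order above (FROM, WHERE, GROUP BY, HAVING, SELECT, ORDER BY) can be checked on a tiny table. The following is a minimal sketch, assuming Python's built-in sqlite3 module and invented sample rows; the data is hypothetical and not part of the course dataset.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Appointments ("
    "appointment_id INTEGER PRIMARY KEY, doctor_id INT, "
    "appointment_date TEXT, status TEXT)"
)
# Hypothetical rows: doctor 1 has five completed 2024 visits and one cancelled;
# doctor 2 has four completed 2024 visits plus one completed visit in 2023.
rows = [(1, f"2024-01-0{d}", "Completed") for d in range(1, 6)]
rows += [(2, f"2024-02-0{d}", "Completed") for d in range(1, 5)]
rows += [(2, "2023-12-31", "Completed"), (1, "2024-03-01", "Cancelled")]
conn.executemany(
    "INSERT INTO Appointments (doctor_id, appointment_date, status) "
    "VALUES (?, ?, ?)",
    rows,
)

query = """
SELECT doctor_id, COUNT(appointment_id) AS total_completed_appointments
FROM Appointments
WHERE status = 'Completed'
  AND appointment_date BETWEEN '2024-01-01' AND '2024-12-31'
GROUP BY doctor_id
HAVING COUNT(appointment_id) >= 5
ORDER BY total_completed_appointments DESC
"""
result = conn.execute(query).fetchall()
print(result)  # only doctor 1 reaches five completed 2024 appointments
```

Doctor 2 has five completed appointments overall, but WHERE removes the 2023 row before grouping, so HAVING sees only four and drops the group.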
3. Advanced SQL Queries
Complex Select Queries:
Scenario: Retrieve the patient name, doctor name, appointment date, and department for all
scheduled appointments, sorted by department and appointment date. If the appointment has a
note, include it; otherwise, show "No Notes Available."
SELECT
P.patient_name AS Patient,
D.doctor_name AS Doctor,
A.appointment_date AS Appointment_Date,
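Dept.department_name AS Department,
COALESCE(A.notes, 'No Notes Available') AS Notes
FROM
Appointments A
JOIN
Patients P ON A.patient_id = P.patient_id
JOIN
Doctors D ON A.doctor_id = D.doctor_id
JOIN
Departments Dept ON D.department_id = Dept.department_id
WHERE
A.status = 'Scheduled'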
ORDER BY
Dept.department_name, A.appointment_date;
Filters only "Scheduled" appointments and sorts them by department and date.
SELECT DISTINCT
Scenario: Find all unique appointment statuses recorded in the hospital.
FROM Appointments;
Explanation:
The DISTINCT keyword ensures that each unique value of status appears only once in the result.
SELECT
P.patient_name AS Patient_Name,
D.doctor_name AS Doctor_Name,
A.appointment_date AS Appointment_Date
FROM
Appointments A
JOIN
Patients P ON A.patient_id = P.patient_id
JOIN
Doctors D ON A.doctor_id = D.doctor_id;
Explanation:
AS assigns temporary names (aliases) to columns for better readability in the output.
SELECT
appointment_id AS Appointment_ID,
patient_id AS Patient_ID,
doctor_id AS Doctor_ID,
appointment_date AS Appointment_Date
FROM
Appointments
WHERE
notes IS NULL;
Scenario:
The hospital administration wants to understand key metrics for completed appointments,
including:
SELECT
COUNT(appointment_id) AS Total_Completed_Appointments,
SUM(fee) AS Total_Revenue,
AVG(fee) AS Average_Fee,
MIN(fee) AS Minimum_Fee,
MAX(fee) AS Maximum_Fee
FROM
Appointments
WHERE
status = 'Completed';
Explanation:
MIN and MAX return the smallest and largest fees respectively.
GROUP BY Clause
Definition:
Used with aggregate functions to perform calculations for each group separately.
Scenario: The hospital management wants to know how many appointments each doctor has
handled, grouped by their name.
SELECT
D.doctor_name AS Doctor_Name,
COUNT(A.appointment_id) AS Total_Appointments
FROM
Appointments A
JOIN
Doctors D ON A.doctor_id = D.doctor_id
GROUP BY
D.doctor_name;
Explanation:
● GROUP BY groups data by doctor names.
● COUNT calculates the number of appointments for each doctor.
● This is useful for understanding workload distribution among doctors.
Having Clause:
Definition:
Scenario : Management wants to identify doctors who have handled more than 10 appointments so
far.
SELECT
D.doctor_name AS Doctor_Name,
COUNT(A.appointment_id) AS Total_Appointments
FROM
Appointments A
JOIN
Doctors D ON A.doctor_id = D.doctor_id
GROUP BY
D.doctor_name
HAVING
COUNT(A.appointment_id) > 10;
Explanation:
● GROUP BY groups data by each doctor’s name.
● HAVING filters groups to include only doctors with more than 10 appointments.
SELECT
Dept.department_name AS Department_Name,
COUNT(A.appointment_id) AS Total_Appointments,
SUM(A.fee) AS Total_Revenue
FROM
Appointments A
JOIN
JOIN
GROUP BY
Dept.department_name
HAVING
To get summarized metrics, like total revenue, average consultation time, or the range of fees.
GROUP BY Clause:
To break data into categories or groups (e.g., by doctor, department, or patient demographics).
HAVING Clause:
To filter results of grouped data based on aggregate values, such as identifying departments with
high revenue or doctors with many appointments.
JOINS:
Patients Table
INSERT INTO Patients (patient_id, first_name,last_name, gender, age) VALUES
Appointments Table
Doctors Table
ALTER TABLE Doctors ADD experience_years INT;
1. INNER JOIN
Definition:
● An INNER JOIN retrieves rows that have matching values in both tables based on the
specified condition.
The hospital wants a list of patients who have scheduled appointments, including their names,
appointment dates, and fees.
Query:
SELECT
P.patient_name AS Patient_Name,
A.appointment_date AS Appointment_Date,
A.fee AS Appointment_Fee
FROM
Patients P
INNER JOIN
Appointments A ON P.patient_id = A.patient_id
WHERE
A.status = 'Scheduled';
Explanation:
● Combines Patients and Appointments tables where patient_id matches in both tables.
Example Output:
● A LEFT JOIN retrieves all rows from the left table and the matching rows from the right table.
Rows from the left table with no match will show NULL for right table columns.
● A RIGHT JOIN is the opposite: all rows from the right table and the matching rows from the
left table.
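The difference between INNER JOIN and LEFT JOIN is easiest to see when one patient has no appointment. This is a minimal sketch, assuming Python's built-in sqlite3 module; the two patients and the single appointment are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Patients (patient_id INT PRIMARY KEY, patient_name TEXT);
CREATE TABLE Appointments (
    appointment_id INT PRIMARY KEY, patient_id INT, status TEXT);
-- Hypothetical rows: Bob has no appointment.
INSERT INTO Patients VALUES (1, 'Alice'), (2, 'Bob');
INSERT INTO Appointments VALUES (101, 1, 'Scheduled');
""")

# INNER JOIN keeps only patients that have a matching appointment.
inner_rows = conn.execute("""
    SELECT P.patient_name, A.status
    FROM Patients P
    INNER JOIN Appointments A ON P.patient_id = A.patient_id
    ORDER BY P.patient_id
""").fetchall()

# LEFT JOIN keeps every patient; missing appointment columns become NULL.
left_rows = conn.execute("""
    SELECT P.patient_name, A.status
    FROM Patients P
    LEFT JOIN Appointments A ON P.patient_id = A.patient_id
    ORDER BY P.patient_id
""").fetchall()

print(inner_rows)
print(left_rows)  # Bob appears with None (NULL) for status
```

A RIGHT JOIN gives the mirrored result; swapping the table order and using LEFT JOIN is an equivalent way to write it.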
The hospital wants to list all patients, including those who haven’t booked any appointments.
SELECT
P.patient_name AS Patient_Name,
A.appointment_date AS Appointment_Date,
A.status AS Appointment_Status
FROM
Patients P
LEFT JOIN
Appointments A ON P.patient_id = A.patient_id;
Explanation:
Example Output:
The management wants to list all appointments, including those that aren’t associated with any
patient.
SELECT
A.appointment_id AS Appointment_ID,
P.patient_name AS Patient_Name,
A.appointment_date AS Appointment_Date
FROM
Patients P
RIGHT JOIN
Appointments A ON P.patient_id = A.patient_id;
Example Output:
● A FULL OUTER JOIN retrieves all rows from both tables, matching rows where possible.
Unmatched rows from either table will have NULL values for missing columns.
The hospital wants a complete list of all patients and all appointments, showing matches where
possible.
Query:
SELECT
P.patient_name AS Patient_Name,
A.appointment_date AS Appointment_Date,
A.status AS Appointment_Status
FROM
Patients P
FULL OUTER JOIN
Appointments A ON P.patient_id = A.patient_id;
Example Output:
4. Self-Joins
Definition:
● A SELF-JOIN is when a table is joined to itself. Often used to compare rows within the same
table.
Query:
SELECT
S1.staff_name AS Staff_Name,
S2.staff_name AS Manager_Name
FROM
Staff S1
LEFT JOIN
Staff S2 ON S1.manager_id = S2.staff_id;
Explanation:
Example Output:
Staff_Name Manager_Name
5. CROSS JOIN
Definition:
● A CROSS JOIN produces a Cartesian product of two tables, pairing every row from the first
table with every row from the second.
The hospital wants to see all possible combinations of doctors and departments to explore possible
rotations.
Query:
SELECT
D.doctor_name AS Doctor_Name,
Dept.department_name AS Department_Name
FROM
Doctors D
CROSS JOIN
Departments Dept;
Explanation:
● Useful for generating combinations when testing scenarios or generating all possible
options.
Example Output:
Doctor_Name Department_Name
Summary
INNER JOIN | Match rows in both tables | Only include rows with matches.
LEFT JOIN | Include all rows from the left table | Include unmatched rows from the left table.
RIGHT JOIN | Include all rows from the right table | Include unmatched rows from the right table.
SELF-JOIN | Compare rows within the same table | Explore hierarchical or relationship data.
DDL and DML Queries for these topics and new tables required for it
Patients
Appointments
INSERT INTO Appointments (appointment_id, patient_id, doctor_id, appointment_date, status, fee)
VALUES
Doctors Table
Departments Table
(9, 'Oncology'),
(12, 'Nephrology'),
(13, 'Gastroenterology');
Staff Table:
role VARCHAR(50),
);
Basic Subqueries
Definition:
A subquery is a query inside another query. Subqueries help break down complex logic into smaller,
manageable pieces.
Types:
● Single-Row Subquery: Returns only one value.
● Multiple-Row Subquery: Returns multiple values.
1. Single-Row Subquery
Scenario 1 (Simple):
Query:
SELECT department_name
FROM Departments
WHERE department_id = (
SELECT department_id
FROM Doctors
WHERE experience_years = (SELECT MAX(experience_years) FROM Doctors)
);
Explanation:
Scenario 2 (Medium):
Find the patient name who has paid the highest appointment fee.
Query:
SELECT patient_name
FROM Patients
WHERE patient_id = (
SELECT patient_id
FROM Appointments
WHERE fee = (SELECT MAX(fee) FROM Appointments)
);
Explanation:
2. Multiple-Row Subqueries
Scenario 1 (Simple):
List the names of patients who have scheduled appointments with doctors in the "Oncology"
department.
Query:
SELECT patient_name
FROM Patients
WHERE patient_id IN (
SELECT patient_id
FROM Appointments
WHERE doctor_id IN (
SELECT doctor_id
FROM Doctors
WHERE department_id = (SELECT department_id FROM Departments WHERE
department_name = 'Oncology')
)
);
Explanation:
Scenario 2 (Medium):
SELECT department_name
FROM Departments
WHERE department_id IN (
SELECT department_id
FROM Doctors
WHERE doctor_id IN (
SELECT doctor_id
FROM Appointments
WHERE status = 'Scheduled'
GROUP BY doctor_id
HAVING COUNT(*) >= 2
)
);
Explanation:
● The innermost subquery groups appointments by doctor and checks for at least two
scheduled appointments.
● The second subquery identifies departments of these doctors.
● The outer query lists department names.
Correlated Subqueries
Definition:
In a correlated subquery, the inner query refers to columns from the outer query. Conceptually, the
subquery is evaluated once for each row of the outer query, although the database optimizer may
execute it differently.
Scenario 1 (Simple):
Query:
SELECT doctor_name
FROM Doctors D
WHERE EXISTS (
SELECT 1
FROM Appointments A
WHERE A.doctor_id = D.doctor_id
GROUP BY A.doctor_id
HAVING COUNT(*) > 1
);
Explanation:
1. The subquery groups appointments by doctor ID and checks for those with more than one
appointment.
2. The EXISTS keyword filters only doctors matching this condition.
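A small run makes the per-row behaviour of EXISTS and NOT EXISTS concrete. This is a minimal sketch, assuming Python's built-in sqlite3 module; the doctor names and appointments are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Doctors (doctor_id INT PRIMARY KEY, doctor_name TEXT);
CREATE TABLE Appointments (appointment_id INT PRIMARY KEY, doctor_id INT);
-- Hypothetical rows: Dr. Rao has two appointments, Dr. Lee one, Dr. Kim none.
INSERT INTO Doctors VALUES (1, 'Dr. Rao'), (2, 'Dr. Lee'), (3, 'Dr. Kim');
INSERT INTO Appointments VALUES (101, 1), (102, 1), (103, 2);
""")

# The inner query references D.doctor_id, so it is correlated with the outer row.
busy = conn.execute("""
    SELECT doctor_name
    FROM Doctors D
    WHERE EXISTS (
        SELECT 1
        FROM Appointments A
        WHERE A.doctor_id = D.doctor_id
        GROUP BY A.doctor_id
        HAVING COUNT(*) > 1
    )
""").fetchall()

# NOT EXISTS keeps the outer rows for which the inner query returns nothing.
idle = conn.execute("""
    SELECT doctor_name
    FROM Doctors D
    WHERE NOT EXISTS (
        SELECT 1 FROM Appointments A WHERE A.doctor_id = D.doctor_id
    )
""").fetchall()

print(busy, idle)
```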
Scenario 2 (Medium):
Find patients whose appointment fee is higher than the average fee for their doctor.
Query:
SELECT patient_name
FROM Patients P
WHERE EXISTS (
SELECT 1
FROM Appointments A
WHERE A.patient_id = P.patient_id
AND A.fee > (
SELECT AVG(fee)
FROM Appointments
WHERE doctor_id = A.doctor_id
)
);
Explanation:
● The subquery calculates the average fee for each doctor.
● The outer query checks if a patient's fee exceeds their doctor's average fee.
Query:
SELECT department_name
FROM Departments D
WHERE EXISTS (
SELECT 1
FROM Doctors Doc
WHERE Doc.department_id = D.department_id
);
Explanation:
SELECT patient_name
FROM Patients P
WHERE NOT EXISTS (
SELECT 1
FROM Appointments A
WHERE A.patient_id = P.patient_id
);
Explanation:
A Common Table Expression (CTE) is a temporary result set that simplifies queries and improves
readability. It can also be recursive for hierarchical data.
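To illustrate the recursive case, here is a minimal sketch that walks a three-level reporting chain and records each person's depth. It assumes Python's built-in sqlite3 module and an invented Staff table; the `level` column is an illustrative addition. PostgreSQL, MySQL 8, and SQLite spell this WITH RECURSIVE, while SQL Server uses plain WITH.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Staff (staff_id INT PRIMARY KEY, staff_name TEXT, manager_id INT);
-- Hypothetical three-level reporting chain.
INSERT INTO Staff VALUES
    (1, 'Director', NULL),
    (2, 'Head Nurse', 1),
    (3, 'Nurse', 2);
""")

rows = conn.execute("""
    WITH RECURSIVE StaffHierarchy AS (
        -- Anchor member: the top-level manager has no manager_id.
        SELECT staff_id, staff_name, 0 AS level
        FROM Staff
        WHERE manager_id IS NULL
        UNION ALL
        -- Recursive member: attach everyone who reports to a row found so far.
        SELECT S.staff_id, S.staff_name, SH.level + 1
        FROM Staff S
        INNER JOIN StaffHierarchy SH ON S.manager_id = SH.staff_id
    )
    SELECT staff_name, level
    FROM StaffHierarchy
    ORDER BY level
""").fetchall()

print(rows)
```

The recursion stops on its own once the recursive member finds no further staff reporting to the rows produced in the previous step.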
Query:
WITH DepartmentFees AS (
SELECT D.department_name, SUM(A.fee) AS TotalFees
FROM Appointments A
JOIN Doctors Doc ON A.doctor_id = Doc.doctor_id
JOIN Departments D ON Doc.department_id = D.department_id
WHERE A.status = 'Completed'
GROUP BY D.department_name
)
SELECT department_name, TotalFees
FROM DepartmentFees;
Explanation:
Query:
WITH RECURSIVE StaffHierarchy AS ( -- MySQL and PostgreSQL require RECURSIVE; SQL Server uses plain WITH
SELECT staff_id, staff_name, role, manager_id
FROM Staff
WHERE manager_id IS NULL -- Top-level manager (Director)
UNION ALL
SELECT S.staff_id, S.staff_name, S.role, S.manager_id
FROM Staff S
INNER JOIN StaffHierarchy SH ON S.manager_id = SH.staff_id
)
SELECT *
FROM StaffHierarchy;
Explanation:
Departments
VALUES
Doctors
VALUES
Patients
VALUES
Appointments
VALUES
Definition:
● UNION combines the results of two or more queries and removes duplicates.
● UNION ALL combines the results but keeps duplicates.
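The duplicate handling can be seen with one shared name. This is a minimal sketch, assuming Python's built-in sqlite3 module and hypothetical single-column Staff and Doctors tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Staff (staff_name TEXT);
CREATE TABLE Doctors (doctor_name TEXT);
-- Hypothetical rows: 'Ben' appears in both tables.
INSERT INTO Staff VALUES ('Asha'), ('Ben');
INSERT INTO Doctors VALUES ('Ben'), ('Chen');
""")

# UNION removes the duplicate 'Ben'.
union_rows = conn.execute("""
    SELECT staff_name AS name FROM Staff
    UNION
    SELECT doctor_name FROM Doctors
    ORDER BY name
""").fetchall()

# UNION ALL keeps every row from both queries.
union_all_rows = conn.execute("""
    SELECT staff_name AS name FROM Staff
    UNION ALL
    SELECT doctor_name FROM Doctors
    ORDER BY name
""").fetchall()

print(union_rows)
print(union_all_rows)
```

Both queries must return the same number of columns with compatible types; the column name of the combined result comes from the first SELECT.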
Business Scenario 1:
Explanation:
● Combines staff names and doctor names into a single result set.
● Removes duplicate names, providing a list of unique names across both groups.
Business Scenario 2:
Explanation:
2. INTERSECT
Definition:
INTERSECT retrieves only the rows that are common between two queries.
Business Scenario:
Identify names that are common between the Staff and Doctors tables (e.g., a staff member who is
also a doctor).
Query:
Explanation:
3. EXCEPT
Definition:
EXCEPT retrieves rows from the first query that are not present in the second query.
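INTERSECT and EXCEPT can be compared side by side on the same two lists of names. This is a minimal sketch, assuming Python's built-in sqlite3 module and hypothetical single-column Staff and Doctors tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Staff (staff_name TEXT);
CREATE TABLE Doctors (doctor_name TEXT);
-- Hypothetical rows: 'Ben' appears in both tables.
INSERT INTO Staff VALUES ('Asha'), ('Ben');
INSERT INTO Doctors VALUES ('Ben'), ('Chen');
""")

# INTERSECT: names present in both result sets.
common = conn.execute("""
    SELECT staff_name FROM Staff
    INTERSECT
    SELECT doctor_name FROM Doctors
""").fetchall()

# EXCEPT: names in the first result set that are missing from the second.
staff_only = conn.execute("""
    SELECT staff_name FROM Staff
    EXCEPT
    SELECT doctor_name FROM Doctors
""").fetchall()

print(common, staff_only)
```

Unlike INTERSECT, EXCEPT is order-sensitive: swapping the two SELECT statements would return the doctors who are not staff instead.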
Business Scenario 1:
Query:
Explanation:
Business Scenario 2:
Query:
Explanation:
● Returns staff names that are not found in the list of doctors.
● Helps in understanding the distinct roles within the hospital.
Definition:
This involves integrating or comparing data from different queries using UNION, INTERSECT, or
EXCEPT.
Business Scenario:
Explanation:
Summary:
Definition:
The CASE WHEN statement applies conditional logic to query results, transforming or classifying data
based on specified criteria.
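CASE evaluates its WHEN branches top to bottom and returns the first match, which matters at boundary values and for NULLs. This is a minimal sketch, assuming Python's built-in sqlite3 module and invented patients chosen to sit on the age boundaries.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Patients (patient_id INT PRIMARY KEY, patient_name TEXT, age INT);
-- Hypothetical rows on the boundaries, plus one patient with an unknown age.
INSERT INTO Patients VALUES
    (1, 'Ana', 17), (2, 'Raj', 18), (3, 'Li', 60), (4, 'Omar', 61),
    (5, 'Zoe', NULL);
""")

rows = conn.execute("""
    SELECT patient_name,
           CASE
               WHEN age < 18 THEN 'Minor'
               WHEN age BETWEEN 18 AND 60 THEN 'Adult'
               ELSE 'Senior'
           END AS age_group
    FROM Patients
    ORDER BY patient_id
""").fetchall()

print(rows)
```

Note that a NULL age matches neither WHEN branch and therefore falls through to ELSE; adding an explicit `WHEN age IS NULL THEN 'Unknown'` branch first would avoid labelling such patients as seniors.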
Basic Scenario:
Query:
SELECT
patient_id,
patient_name,
age,
CASE
WHEN age < 18 THEN 'Minor'
WHEN age BETWEEN 18 AND 60 THEN 'Adult'
ELSE 'Senior'
END AS age_group
FROM Patients;
Explanation:
Medium Scenario:
Determine whether appointments are "On Time", "Late", or "Missed" based on appointment status.
Query:
SELECT
appointment_id,
patient_id,
doctor_id,
status,
CASE
WHEN status = 'Completed' THEN 'On Time'
WHEN status = 'Cancelled' THEN 'Missed'
ELSE 'Late'
END AS appointment_status
FROM Appointments;
Explanation:
Definition:
Basic Scenario:
Query:
SELECT
CASE
WHEN age < 18 THEN 'Minor'
WHEN age BETWEEN 18 AND 60 THEN 'Adult'
ELSE 'Senior'
END AS age_group,
COUNT(*) AS patient_count
FROM Patients
GROUP BY
CASE
WHEN age < 18 THEN 'Minor'
WHEN age BETWEEN 18 AND 60 THEN 'Adult'
ELSE 'Senior'
END;
Explanation:
Medium Scenario:
Calculate the total revenue collected for "Completed" and "Cancelled" appointments.
Query:
SELECT
CASE
WHEN status = 'Completed' THEN 'Completed'
ELSE 'Cancelled'
END AS appointment_status,
SUM(fee) AS total_revenue
FROM Appointments
WHERE status IN ('Completed', 'Cancelled')
GROUP BY
CASE
WHEN status = 'Completed' THEN 'Completed'
ELSE 'Cancelled'
END;
Explanation:
Advanced Scenario:
Count each doctor's appointments by status, so the doctor with the most appointments of each type
can be identified from the result.
Query:
SELECT
doctor_id,
CASE
WHEN status = 'Completed' THEN 'Completed'
WHEN status = 'Cancelled' THEN 'Cancelled'
ELSE 'Scheduled'
END AS appointment_status,
COUNT(*) AS total_appointments
FROM Appointments
GROUP BY
doctor_id,
CASE
WHEN status = 'Completed' THEN 'Completed'
WHEN status = 'Cancelled' THEN 'Cancelled'
ELSE 'Scheduled'
END
ORDER BY doctor_id, total_appointments DESC;
Explanation:
Key Takeaways
Topic | Syntax | Use Case | Difficulty | Key Learning
CASE WHEN with Aggregation | Multiple group conditions | Advanced grouping with aggregation logic | Advanced | Combines grouping, logic, and summarization.
Definition:
Window functions allow you to perform calculations across a "window" of rows related to the
current row without collapsing the result set. They are particularly useful for ranking, running totals,
comparisons, and advanced analytics.
1. OVER() Clause: Defines the scope or "window" of rows for the function.
2. PARTITION BY Clause: Divides the dataset into partitions (groups) for calculations.
3. ORDER BY Clause: Specifies the order of rows within each partition or window.
4. Types of Window Functions:
○ Ranking Functions: Assign a rank or position to rows (e.g., ROW_NUMBER,
RANK, DENSE_RANK).
○ Aggregate Functions as Window Functions: Perform calculations such as SUM,
AVG, or COUNT but maintain row-level data.
○ Analytic Functions: Provide advanced insights, such as LEAD, LAG,
FIRST_VALUE, LAST_VALUE, or NTILE.
5. Frame Clause: Specifies subsets of rows for each calculation using ROWS BETWEEN or
RANGE BETWEEN.
Basic Syntax:
SELECT
column1,
window_function() OVER ([PARTITION BY column2] [ORDER BY column3]) AS
alias_name
FROM table_name;
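The syntax above can be tried on three rows to see that a window function adds a column without reducing the row count. This is a minimal sketch, assuming Python's built-in sqlite3 module backed by SQLite 3.25 or newer (the first version with window functions) and invented appointment rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Appointments (
    appointment_id INT PRIMARY KEY, doctor_id INT, fee INT);
-- Hypothetical rows: doctor 301 has two appointments, doctor 302 has one.
INSERT INTO Appointments VALUES (201, 301, 100), (202, 301, 200), (203, 302, 50);
""")

rows = conn.execute("""
    SELECT doctor_id,
           appointment_id,
           fee,
           SUM(fee) OVER (PARTITION BY doctor_id) AS total_revenue_by_doctor
    FROM Appointments
    ORDER BY appointment_id
""").fetchall()

for row in rows:
    print(row)  # three input rows, three output rows
```

A GROUP BY doctor_id query would return only two rows here; the window version keeps all three and repeats each doctor's total on every one of that doctor's rows.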
Examples
Simple Query:
Find the total revenue for each doctor while keeping row-level appointment details.
Query:
SELECT
doctor_id,
appointment_id,
fee,
SUM(fee) OVER (PARTITION BY doctor_id) AS total_revenue_by_doctor
FROM Appointments;
Output:
Explanation:
Medium Query:
Query:
SELECT
doctor_id,
appointment_id,
fee,
RANK() OVER (PARTITION BY doctor_id ORDER BY fee DESC) AS rank_by_fee
FROM Appointments;
Output:
302 202 0 2
303 207 0 2
Explanation:
2. Analytic Functions
Simple Query:
SELECT
doctor_id,
appointment_id,
fee,
LAG(fee) OVER (PARTITION BY doctor_id ORDER BY appointment_date) AS
previous_fee
FROM Appointments;
Output:
Explanation:
● LAG(fee) gets the fee from the previous appointment for the same doctor.
● PARTITION BY doctor_id ORDER BY appointment_date ensures that fees are ordered
chronologically within each doctor.
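LAG is often combined with arithmetic to show the change from one row to the next. This is a minimal sketch, assuming Python's built-in sqlite3 module backed by SQLite 3.25 or newer and invented rows for a single doctor; the `fee_change` column is an illustrative addition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Appointments (
    appointment_id INT PRIMARY KEY, doctor_id INT,
    appointment_date TEXT, fee INT);
-- Hypothetical rows for one doctor, in date order.
INSERT INTO Appointments VALUES
    (201, 301, '2024-01-01', 100),
    (202, 301, '2024-01-05', 150),
    (203, 301, '2024-01-09', 120);
""")

rows = conn.execute("""
    SELECT appointment_id,
           fee,
           LAG(fee) OVER (PARTITION BY doctor_id ORDER BY appointment_date)
               AS previous_fee,
           fee - LAG(fee) OVER (PARTITION BY doctor_id ORDER BY appointment_date)
               AS fee_change
    FROM Appointments
    ORDER BY appointment_date
""").fetchall()

for row in rows:
    print(row)
```

The first appointment has no predecessor, so both previous_fee and fee_change are NULL for it; LAG accepts an optional third argument to supply a default value instead.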
Medium Query:
Find the cumulative total of appointment fees for each doctor over time.
Query:
SELECT
doctor_id,
appointment_id,
fee,
SUM(fee) OVER (PARTITION BY doctor_id ORDER BY appointment_date ROWS
BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS cumulative_fee
FROM Appointments;
Output:
Explanation:
3. Frame Clause
Simple Query:
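Calculate a moving average of fees over each doctor's three most recent appointments (a 3-row
window, which equals a 3-day average only if the doctor has exactly one appointment per day).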
Query:
SELECT
doctor_id,
appointment_id,
fee,
AVG(fee) OVER (PARTITION BY doctor_id ORDER BY appointment_date ROWS
BETWEEN 2 PRECEDING AND CURRENT ROW) AS moving_avg_fee
FROM Appointments;
Output:
Explanation:
● The window for AVG(fee) includes the current row and the two preceding rows.
● Provides a rolling average for trend analysis.
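The effect of the frame can be verified by hand on four rows. This is a minimal sketch, assuming Python's built-in sqlite3 module backed by SQLite 3.25 or newer and invented fees for one doctor.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Appointments (
    appointment_id INT PRIMARY KEY, doctor_id INT,
    appointment_date TEXT, fee INT);
-- Hypothetical rows for one doctor on four consecutive days.
INSERT INTO Appointments VALUES
    (201, 301, '2024-01-01', 100),
    (202, 301, '2024-01-02', 200),
    (203, 301, '2024-01-03', 300),
    (204, 301, '2024-01-04', 400);
""")

rows = conn.execute("""
    SELECT appointment_id,
           fee,
           AVG(fee) OVER (
               PARTITION BY doctor_id
               ORDER BY appointment_date
               ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
           ) AS moving_avg_fee
    FROM Appointments
    ORDER BY appointment_date
""").fetchall()

for row in rows:
    print(row)
```

The first two rows average over fewer than three values because the frame is cut off at the start of the partition; from the third row on, each average covers exactly three appointments.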
Key Takeaways
Feature | Usage | Difficulty | Scenario
OVER() Clause | Define window for calculations | Basic | Used in all window functions.
Analytic Functions | Compare current row with other rows | Medium | Lead/Lag, trends, and row comparisons.
Frame Clause | Define row ranges for advanced calculations | Advanced | Moving averages or running totals.
1. Ranking Functions
Example 1: ROW_NUMBER
Rank appointments for each doctor based on the appointment fee in descending order.
Query:
SELECT
doctor_id,
appointment_id,
fee,
ROW_NUMBER() OVER (PARTITION BY doctor_id ORDER BY fee DESC) AS
row_number_rank
FROM Appointments;
Explanation:
Example 2: RANK
Rank appointments for each doctor based on the appointment fee in descending order (with ties).
Query:
SELECT
doctor_id,
appointment_id,
fee,
RANK() OVER (PARTITION BY doctor_id ORDER BY fee DESC) AS rank_with_ties
FROM Appointments;
Explanation:
● Assigns a rank while leaving gaps for ties (e.g., if two rows tie at rank 1, the next rank is 3).
● Useful to compare values where ties matter.
Example 1: SUM
Query:
SELECT
doctor_id,
appointment_id,
fee,
SUM(fee) OVER (PARTITION BY doctor_id ORDER BY appointment_date) AS
cumulative_revenue
FROM Appointments;
Explanation:
● The cumulative revenue for each doctor is calculated by summing fees in chronological order
using ORDER BY appointment_date.
● PARTITION BY doctor_id ensures that the sum is calculated for each doctor
independently.
Example 2: AVG
Find the average fee collected by each doctor, shown alongside each appointment.
Query:
SELECT
doctor_id,
appointment_id,
fee,
AVG(fee) OVER (PARTITION BY doctor_id) AS avg_fee_by_doctor
FROM Appointments;
Explanation:
3. Analytic Functions
Example 1: LEAD
Get the fee for the next appointment for each doctor.
Query:
SELECT
doctor_id,
appointment_id,
fee,
LEAD(fee) OVER (PARTITION BY doctor_id ORDER BY appointment_date) AS
next_appointment_fee
FROM Appointments;
Explanation:
● LEAD(fee) retrieves the fee of the next appointment in chronological order for the same
doctor.
● Helps to compare fees across consecutive appointments.
Example 2: FIRST_VALUE
SELECT
doctor_id,
appointment_id,
fee,
FIRST_VALUE(fee) OVER (PARTITION BY doctor_id ORDER BY appointment_date) AS
first_appointment_fee
FROM Appointments;
Explanation:
● FIRST_VALUE(fee) picks the fee from the earliest appointment for each doctor, based on
appointment_date.
● Useful for determining baseline or initial data values.
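LEAD and FIRST_VALUE can be placed next to each other to compare every appointment with both the next one and the first one. This is a minimal sketch, assuming Python's built-in sqlite3 module backed by SQLite 3.25 or newer and invented rows for a single doctor.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Appointments (
    appointment_id INT PRIMARY KEY, doctor_id INT,
    appointment_date TEXT, fee INT);
-- Hypothetical rows for one doctor, in date order.
INSERT INTO Appointments VALUES
    (201, 301, '2024-01-01', 100),
    (202, 301, '2024-01-05', 150),
    (203, 301, '2024-01-09', 120);
""")

rows = conn.execute("""
    SELECT appointment_id,
           fee,
           LEAD(fee) OVER (PARTITION BY doctor_id ORDER BY appointment_date)
               AS next_appointment_fee,
           FIRST_VALUE(fee) OVER (PARTITION BY doctor_id ORDER BY appointment_date)
               AS first_appointment_fee
    FROM Appointments
    ORDER BY appointment_date
""").fetchall()

for row in rows:
    print(row)
```

The last appointment has no successor, so LEAD returns NULL there, while FIRST_VALUE repeats the earliest fee on every row of the partition.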
Ranking Functions | Ranking rows (with or without ties) | Useful for ordered comparisons or competitions.
These examples show how to use window functions to analyze data without collapsing it, offering
flexibility in handling real-world business scenarios.
Ranking Functions: Examples for ROW_NUMBER, RANK, DENSE_RANK
Scenario 1: Rank appointments for each doctor based on the appointment fee in descending order,
assigning unique ranks even for identical fees.
Query:
SELECT
doctor_id,
appointment_id,
fee,
ROW_NUMBER() OVER (PARTITION BY doctor_id ORDER BY fee DESC) AS
unique_rank
FROM Appointments;
Explanation:
● ROW_NUMBER assigns a unique rank to each row within a group defined by PARTITION
BY doctor_id.
● ORDER BY fee DESC ranks appointments by fee in descending order.
● No ties are allowed; each row gets a distinct rank, regardless of identical fees.
Output:
302 202 0 2
2. RANK: Assign Ranks with Ties
Scenario 2: Rank appointments for each doctor based on fee in descending order, allowing tied
ranks.
Query:
SELECT
doctor_id,
appointment_id,
fee,
RANK() OVER (PARTITION BY doctor_id ORDER BY fee DESC) AS rank_with_ties
FROM Appointments;
Explanation:
● RANK assigns ranks to rows within each group defined by PARTITION BY doctor_id.
● Tied rows (rows with identical fees) receive the same rank.
● The rank sequence skips numbers after ties (e.g., if two rows tie at rank 1, the next rank will
be 3).
Output:
302 202 0 2
Scenario 3: Rank appointments for each doctor based on fee in descending order, ensuring ranks are
consecutive even with ties.
Query:
SELECT
doctor_id,
appointment_id,
fee,
DENSE_RANK() OVER (PARTITION BY doctor_id ORDER BY fee DESC) AS
dense_rank_value -- renamed: DENSE_RANK is a reserved word in MySQL 8
FROM Appointments;
Explanation:
● DENSE_RANK works like RANK, but it does not skip numbers after ties.
● Tied rows receive the same rank, and the next rank is incremented consecutively.
Output:
302 202 0 2
ROW_NUMBER | Assigns a unique rank to each row. Ties are broken arbitrarily, and no gaps occur. | When a strict sequence is required.
RANK | Allows tied rows to have the same rank but skips numbers after ties. | When ties are important, but skipping ranks is acceptable.
DENSE_RANK | Allows tied rows to have the same rank, and the next rank is assigned consecutively. | When ties are important, and no gaps are desired.
Query:
SELECT
doctor_id,
appointment_id,
fee,
ROW_NUMBER() OVER (PARTITION BY doctor_id ORDER BY fee DESC) AS
row_number_rank,
RANK() OVER (PARTITION BY doctor_id ORDER BY fee DESC) AS rank_with_ties,
DENSE_RANK() OVER (PARTITION BY doctor_id ORDER BY fee DESC) AS
dense_rank_value -- renamed: DENSE_RANK is a reserved word in MySQL 8
FROM Appointments;
Output:
302 202 0 2 2 2
This table helps clearly see the differences between the ranking functions!
Definition:
The ROW_NUMBER function assigns a unique sequential number to each row within a partition
(group of rows). Even if multiple rows have the same values in the column(s) being ordered,
ROW_NUMBER will not allow ties and will assign a unique number to each row.
Use Case:
When you need a strict sequence for rows, regardless of whether they are identical in ranking
criteria.
Definition:
The RANK function assigns a rank to rows within a partition. Rows with identical values (ties) are
assigned the same rank. After a tie, the ranking will skip numbers (gaps in ranks).
Use Case:
When you want to identify ties and maintain rank gaps for analysis (e.g., competitions or fee
rankings).
Definition:
The DENSE_RANK function works like RANK, but it does not skip numbers after ties. The next
rank is always consecutive, regardless of ties.
Use Case:
When you want tied rows to share the same rank and require consecutive ranks without gaps.
Scenario: Rank appointments for each doctor by fee in descending order, with no ties allowed.
Query:
SELECT
doctor_id,
appointment_id,
fee,
ROW_NUMBER() OVER (PARTITION BY doctor_id ORDER BY fee DESC) AS
unique_rank
FROM Appointments;
Explanation:
● ROW_NUMBER(): Assigns a unique rank starting from 1 for each row in the group.
● PARTITION BY doctor_id: Each doctor's appointments are ranked independently.
● ORDER BY fee DESC: Fees are ranked from highest to lowest.
● Even if two fees are the same, they are assigned different ranks.
Query:
SELECT
doctor_id,
appointment_id,
fee,
RANK() OVER (PARTITION BY doctor_id ORDER BY fee DESC) AS rank_with_ties
FROM Appointments;
Explanation:
● RANK(): Assigns ranks starting from 1, allowing tied rows to share the same rank.
● Tied rows: If two rows have the same fee, they will share the same rank.
● Skipping ranks: If two rows tie at rank 1, the next row will have rank 3.
Query:
SELECT
doctor_id,
appointment_id,
fee,
DENSE_RANK() OVER (PARTITION BY doctor_id ORDER BY fee DESC) AS
dense_rank_value -- renamed: DENSE_RANK is a reserved word in MySQL 8
FROM Appointments;
Explanation:
● DENSE_RANK(): Similar to RANK(), but the next rank after a tie is consecutive (no
skipped ranks).
● Useful when analyzing ranks without any gaps, such as performance levels.
ROW_NUMBER | Assigns a unique rank to each row, no ties allowed. | No ties; unique for each row. | Always sequential.
RANK | Assigns rank with ties; skips ranks after ties. | Tied rows share the same rank. | Gaps exist after ties.
DENSE_RANK | Assigns rank with ties; no gaps in rank sequence. | Tied rows share the same rank. | Consecutive, no gaps.
Combined Example
Scenario: Compare all three ranking functions to analyze doctor appointment fees.
Query:
SELECT
doctor_id,
appointment_id,
fee,
ROW_NUMBER() OVER (PARTITION BY doctor_id ORDER BY fee DESC) AS
row_number_rank,
RANK() OVER (PARTITION BY doctor_id ORDER BY fee DESC) AS rank_with_ties,
DENSE_RANK() OVER (PARTITION BY doctor_id ORDER BY fee DESC) AS
dense_rank_value -- renamed: DENSE_RANK is a reserved word in MySQL 8
FROM Appointments;
Output:
302 202 0 2 2 2
This example clearly illustrates the unique behaviors of ROW_NUMBER, RANK, and
DENSE_RANK.
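The three functions only differ when fees tie, so a tie is needed to see the difference. This is a minimal sketch, assuming Python's built-in sqlite3 module backed by SQLite 3.25 or newer and three invented appointments for one doctor. Adding appointment_id as a tiebreaker for ROW_NUMBER is an assumption made here so the output is repeatable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Appointments (
    appointment_id INT PRIMARY KEY, doctor_id INT, fee INT);
-- Hypothetical rows: two appointments tie on the highest fee.
INSERT INTO Appointments VALUES (201, 301, 500), (202, 301, 500), (203, 301, 300);
""")

rows = conn.execute("""
    SELECT appointment_id,
           fee,
           ROW_NUMBER() OVER (
               PARTITION BY doctor_id ORDER BY fee DESC, appointment_id
           ) AS row_number_rank,
           RANK() OVER (PARTITION BY doctor_id ORDER BY fee DESC)
               AS rank_with_ties,
           DENSE_RANK() OVER (PARTITION BY doctor_id ORDER BY fee DESC)
               AS dense_rank_value
    FROM Appointments
    ORDER BY row_number_rank
""").fetchall()

for row in rows:
    print(row)
```

For the fee of 300, RANK reports 3 (a gap after the tie) while DENSE_RANK reports 2. Without the tiebreaker, ROW_NUMBER could number the two tied rows in either order.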
Window functions allow us to perform calculations across rows that are related to the current row,
without collapsing the data into aggregated results. This is particularly useful when you need to
maintain row-level data while adding insights.
1. SUM as a Window Function
Definition:
The SUM window function calculates a running total or group total for each row in the dataset,
maintaining the original row structure.
Scenario: You want to know the total fees collected by each doctor for their appointments while
keeping row-level data.
Query:
SELECT
doctor_id,
appointment_id,
fee,
SUM(fee) OVER (PARTITION BY doctor_id) AS total_fees_per_doctor
FROM Appointments;
Explanation:
Output:
Scenario: You want to calculate the cumulative fees collected by each doctor across their
appointments in descending fee order.
Query:
SELECT
doctor_id,
appointment_id,
fee,
SUM(fee) OVER (PARTITION BY doctor_id ORDER BY fee DESC) AS
running_total_fees
FROM Appointments;
Explanation:
● ORDER BY fee DESC: Calculates the running total in descending order of fees.
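● Each row contains the cumulative total up to that row within the partition (doctor_id). Rows with
equal fees share the same running total, because the default frame used with ORDER BY is
RANGE-based and includes all peers of the current row.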
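When fees tie, the running total depends on the frame. With only ORDER BY, the default frame is RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW, which treats tied rows as one step; an explicit ROWS frame advances one row at a time. This is a minimal sketch, assuming Python's built-in sqlite3 module backed by SQLite 3.25 or newer and invented rows; the appointment_id tiebreaker is an illustrative addition.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Appointments (
    appointment_id INT PRIMARY KEY, doctor_id INT, fee INT);
-- Hypothetical rows: two appointments tie on the highest fee.
INSERT INTO Appointments VALUES (201, 301, 500), (202, 301, 500), (203, 301, 300);
""")

rows = conn.execute("""
    SELECT appointment_id,
           fee,
           -- Default RANGE frame: tied fees are added together in one step.
           SUM(fee) OVER (PARTITION BY doctor_id ORDER BY fee DESC)
               AS range_total,
           -- Explicit ROWS frame: the total grows one row at a time.
           SUM(fee) OVER (
               PARTITION BY doctor_id
               ORDER BY fee DESC, appointment_id
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS rows_total
    FROM Appointments
    ORDER BY fee DESC, appointment_id
""").fetchall()

for row in rows:
    print(row)
```

Both columns end at the same grand total of 1300; they differ only on the tied rows, where range_total jumps straight to 1000.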
Output:
The AVG window function calculates the average of a column across rows in a group while keeping
the individual rows intact.
Scenario: You want to find the average fee collected by each doctor for their appointments.
Query:
SELECT
doctor_id,
appointment_id,
fee,
AVG(fee) OVER (PARTITION BY doctor_id) AS avg_fee_per_doctor
FROM Appointments;
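Explanation:
● PARTITION BY doctor_id: Computes the average separately for each doctor.
● Each row keeps its own fee and also shows the average fee for its doctor.
Output:
Scenario: You want to calculate the running average fee for each doctor in descending fee order.
Query: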
SELECT
doctor_id,
appointment_id,
fee,
AVG(fee) OVER (PARTITION BY doctor_id ORDER BY fee DESC) AS running_avg_fee
FROM Appointments;
Explanation:
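● ORDER BY fee DESC: Calculates the running average in descending order of fees.
● Each row shows the average of all fees up to that row within the partition (doctor_id).
● Appointments with the same fee are peers under the default frame, so they show the same running average.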
Output:
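Because AVG as a window function keeps every row, the per-doctor average can be compared directly with each row's fee. The sketch below is one way to try this, assuming Python's sqlite3 module with SQLite 3.25 or later. The rows, the diff_from_avg column, and the choice to order the running average by appointment_id are illustrative assumptions, not part of the original queries.

```python
import sqlite3

# Hypothetical rows: (appointment_id, doctor_id, fee).
sample_rows = [
    (1, 10, 300), (2, 10, 500), (3, 10, 300), (4, 10, 200),
    (5, 20, 400), (6, 20, 0),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Appointments (appointment_id INTEGER, doctor_id INTEGER, fee INTEGER)"
)
conn.executemany("INSERT INTO Appointments VALUES (?, ?, ?)", sample_rows)

avg_rows = conn.execute("""
    SELECT appointment_id,
           fee,
           ROUND(AVG(fee) OVER (PARTITION BY doctor_id), 2) AS avg_fee_per_doctor,
           ROUND(fee - AVG(fee) OVER (PARTITION BY doctor_id), 2) AS diff_from_avg,
           ROUND(AVG(fee) OVER (PARTITION BY doctor_id ORDER BY appointment_id), 2)
               AS running_avg_fee
    FROM Appointments
    ORDER BY appointment_id
""").fetchall()

# Map appointment_id -> (fee, avg_fee_per_doctor, diff_from_avg, running_avg_fee).
avg_by_appointment = {r[0]: r[1:] for r in avg_rows}
for appointment_id, values in avg_by_appointment.items():
    print(appointment_id, values)
```

A plain GROUP BY would return one row per doctor and lose the individual fees, so a comparison like diff_from_avg would need a join back to the table.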
3. COUNT as a Window Function
Definition:
The COUNT window function calculates the number of rows in a group or partition while
maintaining row-level data.
Example 1: Count Total Appointments Per Doctor
Scenario: You want to know how many appointments each doctor has.
Query:
SELECT
doctor_id,
appointment_id,
fee,
COUNT(appointment_id) OVER (PARTITION BY doctor_id) AS total_appointments
FROM Appointments;
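Explanation:
● PARTITION BY doctor_id: Counts the appointments separately for each doctor.
● COUNT(appointment_id) counts only rows where appointment_id is not NULL; every row for the same doctor shows the same total.
Output:
doctor_id | appointment_id | fee | total_appointments
302 | 202 | 0 | 2
Example 2: Running Count of Appointments Per Doctor
Scenario: Count the cumulative number of appointments for each doctor in descending fee order.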
Query:
SELECT
doctor_id,
appointment_id,
fee,
COUNT(appointment_id) OVER (PARTITION BY doctor_id ORDER BY fee DESC) AS running_count
FROM Appointments;
Explanation:
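● ORDER BY fee DESC: Calculates a running count based on descending fee order.
● Each row contains the count of appointments up to that row; appointments with the same fee are peers and show the same count.
Output:
doctor_id | appointment_id | fee | running_count
302 | 202 | 0 | 2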
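One detail worth checking is what COUNT actually counts: COUNT(*) counts every row in the window, while COUNT(column) skips rows where that column is NULL. The sketch below illustrates this, assuming Python's sqlite3 module with SQLite 3.25 or later. The rows, including the NULL fee, are made up for the demonstration.

```python
import sqlite3

# Hypothetical rows: (appointment_id, doctor_id, fee); one fee is missing (NULL).
sample_rows = [(1, 10, 300), (2, 10, None), (3, 10, 200), (4, 20, 400)]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Appointments (appointment_id INTEGER, doctor_id INTEGER, fee INTEGER)"
)
conn.executemany("INSERT INTO Appointments VALUES (?, ?, ?)", sample_rows)

count_rows = conn.execute("""
    SELECT appointment_id,
           COUNT(*)   OVER (PARTITION BY doctor_id) AS all_rows,
           COUNT(fee) OVER (PARTITION BY doctor_id) AS rows_with_fee,
           COUNT(*)   OVER (PARTITION BY doctor_id ORDER BY appointment_id) AS running_count
    FROM Appointments
    ORDER BY appointment_id
""").fetchall()

for row in count_rows:
    print(row)
# (1, 3, 2, 1)
# (2, 3, 2, 2)
# (3, 3, 2, 3)
# (4, 1, 1, 1)
```

Doctor 10 has three appointments but only two recorded fees, so all_rows and rows_with_fee differ.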
Function | Description | Example use case
SUM | Calculates a cumulative or grouped total while maintaining row-level data. | Total fees collected per doctor or running total fees.
AVG | Calculates a cumulative or grouped average while maintaining row-level data. | Average fees per doctor or running average fees.
COUNT | Calculates the count of rows while maintaining row-level data. | Total appointments per doctor or running count of appointments.
You said:
Give two examples for each window function listed below, based on the above dataset, along with code. I need a clear and easy definition of each function. Analytic functions provide advanced insights, such as LEAD, LAG, FIRST_VALUE, LAST_VALUE, or NTILE.
ChatGPT said:
Analytic functions provide advanced insights by analyzing and comparing data within partitions or
rows. These include LEAD, LAG, FIRST_VALUE, LAST_VALUE, and NTILE. They help in
scenarios like trend analysis, identifying first or last records, or dividing data into equal parts.
1. LEAD Function
Definition:
The LEAD function retrieves the value of a column from the next row in the same partition. It's used
to compare current rows with future rows.
Scenario: You want to know the fee of the next appointment for each doctor.
Query:
SELECT
doctor_id,
appointment_id,
fee,
LEAD(fee) OVER (PARTITION BY doctor_id ORDER BY appointment_id) AS next_appointment_fee
FROM Appointments;
Explanation:
● LEAD(fee): Fetches the fee of the next row within the same partition (doctor_id).
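● ORDER BY appointment_id: Determines the order in which the next row is fetched.
● The last appointment of each doctor has no following row, so LEAD returns NULL for it unless a default value is supplied as the third argument.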
Output:
Scenario: You want to identify the patient scheduled next for each doctor.
Query:
SELECT
doctor_id,
appointment_id,
patient_id,
LEAD(patient_id) OVER (PARTITION BY doctor_id ORDER BY appointment_id) AS next_patient_id
FROM Appointments;
Output:
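The behavior of LEAD at the end of a partition is easiest to see by running it. The sketch below assumes Python's sqlite3 module with SQLite 3.25 or later and uses made-up rows. The default of -1 is an arbitrary placeholder chosen for the illustration.

```python
import sqlite3

# Hypothetical rows: (appointment_id, doctor_id, fee).
sample_rows = [
    (1, 10, 300), (2, 10, 500), (3, 10, 300), (4, 10, 200),
    (5, 20, 400), (6, 20, 0),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Appointments (appointment_id INTEGER, doctor_id INTEGER, fee INTEGER)"
)
conn.executemany("INSERT INTO Appointments VALUES (?, ?, ?)", sample_rows)

lead_rows = conn.execute("""
    SELECT appointment_id,
           fee,
           LEAD(fee) OVER (PARTITION BY doctor_id ORDER BY appointment_id)
               AS next_appointment_fee,
           -- offset 1, default -1 when there is no next row
           LEAD(fee, 1, -1) OVER (PARTITION BY doctor_id ORDER BY appointment_id)
               AS next_fee_or_default
    FROM Appointments
    ORDER BY appointment_id
""").fetchall()

for row in lead_rows:
    print(row)
# (1, 300, 500, 500)
# (2, 500, 300, 300)
# (3, 300, 200, 200)
# (4, 200, None, -1)
# (5, 400, 0, 0)
# (6, 0, None, -1)
```

Appointments 4 and 6 are the last for their doctors, so the two-argument-free form yields NULL (None in Python) and the three-argument form yields the default.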
2. LAG Function
Definition:
The LAG function retrieves the value of a column from the previous row in the same partition. It is
often used to compare current rows with past rows.
Scenario: You want to know the fee of the previous appointment for each doctor.
Query:
SELECT
doctor_id,
appointment_id,
fee,
LAG(fee) OVER (PARTITION BY doctor_id ORDER BY appointment_id) AS prev_appointment_fee
FROM Appointments;
Output:
Scenario: You want to identify the patient scheduled before each current appointment for every
doctor.
Query:
SELECT
doctor_id,
appointment_id,
patient_id,
LAG(patient_id) OVER (PARTITION BY doctor_id ORDER BY appointment_id) AS prev_patient_id
FROM Appointments;
Output:
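A common use of LAG is computing the change from the previous row. The sketch below shows this for fees, assuming Python's sqlite3 module with SQLite 3.25 or later. The rows and the fee_change column are illustrative assumptions.

```python
import sqlite3

# Hypothetical rows: (appointment_id, doctor_id, fee).
sample_rows = [
    (1, 10, 300), (2, 10, 500), (3, 10, 300), (4, 10, 200),
    (5, 20, 400), (6, 20, 0),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Appointments (appointment_id INTEGER, doctor_id INTEGER, fee INTEGER)"
)
conn.executemany("INSERT INTO Appointments VALUES (?, ?, ?)", sample_rows)

lag_rows = conn.execute("""
    SELECT appointment_id,
           fee,
           LAG(fee) OVER (PARTITION BY doctor_id ORDER BY appointment_id)
               AS prev_appointment_fee,
           fee - LAG(fee) OVER (PARTITION BY doctor_id ORDER BY appointment_id)
               AS fee_change
    FROM Appointments
    ORDER BY appointment_id
""").fetchall()

for row in lag_rows:
    print(row)
# (1, 300, None, None)
# (2, 500, 300, 200)
# (3, 300, 500, -200)
# (4, 200, 300, -100)
# (5, 400, None, None)
# (6, 0, 400, -400)
```

The first appointment of each doctor has no previous row, so LAG returns NULL and the subtraction is NULL as well.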
3. FIRST_VALUE Function
Definition:
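The FIRST_VALUE function returns the value of a column from the first row of the window frame, in the order given by ORDER BY. With the default frame, that is the first row of the partition.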
Scenario: You want to know the fee of the first appointment for each doctor.
Query:
SELECT
doctor_id,
appointment_id,
fee,
FIRST_VALUE(fee) OVER (PARTITION BY doctor_id ORDER BY appointment_id) AS first_appointment_fee
FROM Appointments;
Output:
doctor_id | appointment_id | fee | first_appointment_fee
302 | 202 | 0 | 0
Scenario: You want to identify the first patient scheduled for each doctor.
Query:
SELECT
doctor_id,
appointment_id,
patient_id,
FIRST_VALUE(patient_id) OVER (PARTITION BY doctor_id ORDER BY appointment_id) AS first_patient_id
FROM Appointments;
Output:
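FIRST_VALUE works as expected with the default frame, but LAST_VALUE (listed in the introduction above) usually needs an explicit frame: by default the frame ends at the current row, so the "last" value is just the current row's value. The sketch below contrasts the two, assuming Python's sqlite3 module with SQLite 3.25 or later and made-up rows.

```python
import sqlite3

# Hypothetical rows: (appointment_id, doctor_id, fee).
sample_rows = [
    (1, 10, 300), (2, 10, 500), (3, 10, 300), (4, 10, 200),
    (5, 20, 400), (6, 20, 0),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Appointments (appointment_id INTEGER, doctor_id INTEGER, fee INTEGER)"
)
conn.executemany("INSERT INTO Appointments VALUES (?, ?, ?)", sample_rows)

first_last_rows = conn.execute("""
    SELECT appointment_id,
           fee,
           FIRST_VALUE(fee) OVER (PARTITION BY doctor_id ORDER BY appointment_id)
               AS first_appointment_fee,
           -- default frame stops at the current row
           LAST_VALUE(fee) OVER (PARTITION BY doctor_id ORDER BY appointment_id)
               AS last_fee_default_frame,
           -- explicit frame spans the whole partition
           LAST_VALUE(fee) OVER (
               PARTITION BY doctor_id
               ORDER BY appointment_id
               ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
           ) AS last_fee_full_partition
    FROM Appointments
    ORDER BY appointment_id
""").fetchall()

for row in first_last_rows:
    print(row)
# (1, 300, 300, 300, 200)
# (2, 500, 300, 500, 200)
# (3, 300, 300, 300, 200)
# (4, 200, 300, 200, 200)
# (5, 400, 400, 400, 0)
# (6, 0, 400, 0, 0)
```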
4. NTILE Function
Definition:
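The NTILE function divides the ordered rows in a partition into a specified number of groups that are as equal in size as possible and assigns a bucket number (starting at 1) to each row. When the rows do not divide evenly, the earlier buckets receive one extra row.
Scenario: You want to divide each doctor's appointments into two groups of roughly equal size based on their fees.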
Query:
SELECT
doctor_id,
appointment_id,
fee,
NTILE(2) OVER (PARTITION BY doctor_id ORDER BY fee) AS fee_group
FROM Appointments;
Output:
doctor_id | appointment_id | fee | fee_group
302 | 202 | 0 | 1
Scenario: You want to divide appointments into three groups based on their appointment_id.
Query:
SELECT
doctor_id,
appointment_id,
fee,
NTILE(3) OVER (ORDER BY appointment_id) AS appointment_group
FROM Appointments;
Output:
doctor_id | appointment_id | fee | appointment_group
302 | 202 | 0 | 3
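How NTILE handles partitions that do not divide evenly can be checked with a small run. The sketch below assumes Python's sqlite3 module with SQLite 3.25 or later. The rows are made up, and the buckets are ordered by appointment_id (rather than fee) only to keep the result deterministic.

```python
import sqlite3

# Hypothetical rows: (appointment_id, doctor_id, fee).
# Doctor 10 has four appointments, doctor 20 has two.
sample_rows = [
    (1, 10, 300), (2, 10, 500), (3, 10, 300), (4, 10, 200),
    (5, 20, 400), (6, 20, 0),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Appointments (appointment_id INTEGER, doctor_id INTEGER, fee INTEGER)"
)
conn.executemany("INSERT INTO Appointments VALUES (?, ?, ?)", sample_rows)

ntile_rows = conn.execute("""
    SELECT appointment_id,
           NTILE(2) OVER (PARTITION BY doctor_id ORDER BY appointment_id) AS two_groups,
           NTILE(3) OVER (PARTITION BY doctor_id ORDER BY appointment_id) AS three_groups
    FROM Appointments
    ORDER BY appointment_id
""").fetchall()

for row in ntile_rows:
    print(row)
# (1, 1, 1)
# (2, 1, 1)
# (3, 2, 2)
# (4, 2, 3)
# (5, 1, 1)
# (6, 2, 2)
```

With four rows and three buckets, the first bucket gets two rows and the others get one each. With two rows and three buckets, bucket 3 is simply never assigned.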
These examples highlight how analytic functions provide valuable insights by comparing and
analyzing data across rows.
You said:
Explain the frame clause (RANGE BETWEEN and ROWS BETWEEN) in SQL in the same way as above.
ChatGPT said:
Definition:
The frame clause specifies a subset of rows relative to the current row within a window. It is used in
window functions to define which rows are included in the computation. The two primary clauses
are:
1. RANGE BETWEEN: Operates on the logical value of rows (e.g., based on ORDER BY
values).
2. ROWS BETWEEN: Operates on the physical row positions within the result set.
1. RANGE BETWEEN
Definition:
Defines a range of rows based on values in the ORDER BY clause. It is used for scenarios like
cumulative sums or averages over a specific value range.
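Scenario: You want the cumulative fee for each doctor, from the first appointment up to the current one.
Query: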
SELECT
doctor_id,
appointment_id,
fee,
SUM(fee) OVER (
PARTITION BY doctor_id
ORDER BY appointment_id
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
) AS cumulative_fee
FROM Appointments;
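Explanation:
● RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW: Includes every row from the start of the partition through the current row, plus any rows with the same appointment_id as the current row.
Output:
doctor_id | appointment_id | fee | cumulative_fee
302 | 202 | 0 | 0
Scenario: You want the average fee of the current appointment and all later appointments for each doctor.
Query: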
SELECT
doctor_id,
appointment_id,
fee,
AVG(fee) OVER (
PARTITION BY doctor_id
ORDER BY appointment_id
RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
) AS avg_fee_range
FROM Appointments;
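Explanation:
● RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING: Includes the current row, any rows with the same appointment_id, and every later row in the partition.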
Output:
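RANGE and ROWS only give different results when the ORDER BY column contains duplicate values. The sketch below orders by fee (which has a tie) instead of appointment_id to make the difference visible. It assumes Python's sqlite3 module with SQLite 3.25 or later, and the rows are made up.

```python
import sqlite3

# Hypothetical rows: (appointment_id, doctor_id, fee); doctor 10 has two fees of 300.
sample_rows = [
    (1, 10, 300), (2, 10, 500), (3, 10, 300), (4, 10, 200),
    (5, 20, 400), (6, 20, 0),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Appointments (appointment_id INTEGER, doctor_id INTEGER, fee INTEGER)"
)
conn.executemany("INSERT INTO Appointments VALUES (?, ?, ?)", sample_rows)

frame_rows = conn.execute("""
    SELECT doctor_id,
           fee,
           SUM(fee) OVER (
               PARTITION BY doctor_id ORDER BY fee
               RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS range_sum,
           SUM(fee) OVER (
               PARTITION BY doctor_id ORDER BY fee
               ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
           ) AS rows_sum
    FROM Appointments
""").fetchall()

# Sort in Python; which tied appointment comes first is arbitrary,
# but the set of (fee, range_sum, rows_sum) values is not.
frame_rows_sorted = sorted(frame_rows)
for row in frame_rows_sorted:
    print(row)
# (10, 200, 200, 200)
# (10, 300, 800, 500)
# (10, 300, 800, 800)
# (10, 500, 1300, 1300)
# (20, 0, 0, 0)
# (20, 400, 400, 400)
```

Under RANGE, both 300-fee rows are peers and share the sum 800. Under ROWS, the frame ends at the physical row, so the two rows show 500 and 800.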
2. ROWS BETWEEN
Definition:
Defines a frame based on the physical number of rows around the current row, regardless of their
values.
Scenario: You want to calculate the rolling sum of fees for the current row and the previous row.
Query:
SELECT
doctor_id,
appointment_id,
fee,
SUM(fee) OVER (
PARTITION BY doctor_id
ORDER BY appointment_id
ROWS BETWEEN 1 PRECEDING AND CURRENT ROW
) AS rolling_sum_fee
FROM Appointments;
Explanation:
● ROWS BETWEEN 1 PRECEDING AND CURRENT ROW: Includes the current row
and one row preceding it.
Output:
doctor_id | appointment_id | fee | rolling_sum_fee
302 | 202 | 0 | 0
Scenario: You want to calculate the average fee for the current row and the next two rows.
Query:
SELECT
doctor_id,
appointment_id,
fee,
AVG(fee) OVER (
PARTITION BY doctor_id
ORDER BY appointment_id
ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING
) AS rolling_avg_fee
FROM Appointments;
Explanation:
● ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING: Includes the current row
and the next two rows.
Output:
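The two ROWS BETWEEN queries can be run together to see how the frame shrinks at the edges of a partition. The sketch below assumes Python's sqlite3 module with SQLite 3.25 or later and made-up rows. ROUND is added only to keep the averages readable.

```python
import sqlite3

# Hypothetical rows: (appointment_id, doctor_id, fee).
sample_rows = [
    (1, 10, 300), (2, 10, 500), (3, 10, 300), (4, 10, 200),
    (5, 20, 400), (6, 20, 0),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Appointments (appointment_id INTEGER, doctor_id INTEGER, fee INTEGER)"
)
conn.executemany("INSERT INTO Appointments VALUES (?, ?, ?)", sample_rows)

rolling_rows = conn.execute("""
    SELECT appointment_id,
           fee,
           SUM(fee) OVER (
               PARTITION BY doctor_id ORDER BY appointment_id
               ROWS BETWEEN 1 PRECEDING AND CURRENT ROW
           ) AS rolling_sum_fee,
           ROUND(AVG(fee) OVER (
               PARTITION BY doctor_id ORDER BY appointment_id
               ROWS BETWEEN CURRENT ROW AND 2 FOLLOWING
           ), 2) AS rolling_avg_fee
    FROM Appointments
    ORDER BY appointment_id
""").fetchall()

# Map appointment_id -> (fee, rolling_sum_fee, rolling_avg_fee).
rolling_by_appointment = {r[0]: r[1:] for r in rolling_rows}
for appointment_id, values in rolling_by_appointment.items():
    print(appointment_id, values)
```

For the first appointment of a doctor there is no preceding row, so the rolling sum covers one row only. Likewise, near the end of a partition the "next two rows" frame holds fewer than three rows, and the average is taken over whatever rows exist.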
Aspect | RANGE BETWEEN | ROWS BETWEEN
Use Case | Aggregates over value ranges. | Aggregates over row positions.