SQL Interview Questions
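1. Can you explain the difference between INNER JOIN and OUTER JOIN?
2. How do you find the second highest salary in a table?
3. How do you select all the employees who were hired after a specific date and who have a
salary greater than a specific amount?
4. Can you explain what a subquery is and provide an example of how it can be used?
5. How do you find the nth highest salary in a table?
6. Can you explain the difference between a clustered and non-clustered index?
7. How do you delete duplicate rows in a table?
8. Can you explain the difference between a view and a table?
9. How do you find the maximum salary for each department?
10. Can you explain what a transaction is and how it can be used in SQL?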
1. INNER JOIN and OUTER JOIN are two types of joins used in SQL. INNER JOIN returns only the
rows that have a match in both tables, while an OUTER JOIN (LEFT, RIGHT, or FULL) also returns
the unmatched rows from one or both tables, with NULL in the columns of the table that has no
match.
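The difference is easiest to see on a small data set. The following is a minimal sketch, run through Python's built-in sqlite3 module so that it is self-contained; the employees and departments tables, their columns, and the sample rows are all hypothetical.

```python
import sqlite3

# Hypothetical sample data: Bob has no department, and Legal has no employees.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE departments (department_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE employees (employee_id INTEGER PRIMARY KEY, name TEXT, department_id INTEGER);
    INSERT INTO departments VALUES (1, 'Sales'), (2, 'Legal');
    INSERT INTO employees VALUES (1, 'Ann', 1), (2, 'Bob', NULL);
""")

# INNER JOIN keeps only employees that have a matching department.
inner_rows = conn.execute("""
    SELECT e.name, d.name
    FROM employees e
    INNER JOIN departments d ON e.department_id = d.department_id
    ORDER BY e.employee_id
""").fetchall()

# LEFT OUTER JOIN keeps every employee; unmatched department columns are NULL.
left_rows = conn.execute("""
    SELECT e.name, d.name
    FROM employees e
    LEFT OUTER JOIN departments d ON e.department_id = d.department_id
    ORDER BY e.employee_id
""").fetchall()

print(inner_rows)  # [('Ann', 'Sales')]
print(left_rows)   # [('Ann', 'Sales'), ('Bob', None)]
```

Bob appears only in the outer join result, with None (SQL NULL) in place of the department name.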
2. To find the second highest salary in a table, first find the highest salary, then select the
maximum salary that is less than it. The result is the second highest salary.
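The two-step logic described above can be sketched as a MAX over a filtered table. This is one common form rather than the only one; it assumes a hypothetical employees table with name and salary columns and is run through Python's sqlite3 module with made-up rows.

```python
import sqlite3

# Hypothetical table and rows; the top salary appears twice on purpose.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (name TEXT, salary INTEGER);
    INSERT INTO employees VALUES
        ('Ann', 90000), ('Bob', 70000), ('Cy', 90000), ('Di', 50000);
""")

# The inner query finds the highest salary; the outer query takes the
# largest salary strictly below it.
second_highest = conn.execute("""
    SELECT MAX(salary)
    FROM employees
    WHERE salary < (SELECT MAX(salary) FROM employees)
""").fetchone()[0]

print(second_highest)  # 70000
```

Because the comparison is strict, duplicate top salaries do not affect the result. If every row has the same salary, the query returns NULL.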
3. To select all employees who were hired after a specific date and have a salary greater than a
specific amount, put both conditions in the WHERE clause and combine them with AND. For
example, selecting all the columns from the employees table where the hire date is after
January 1st, 2022, and the salary is greater than $50,000 returns only the rows that satisfy
both conditions.
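A minimal sketch of that filter follows, assuming hypothetical hire_date and salary columns on the employees table. Dates are stored as ISO 8601 text here because SQLite has no native DATE type; in that format, string comparison matches date order.

```python
import sqlite3

# Hypothetical schema and rows for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (name TEXT, hire_date TEXT, salary INTEGER);
    INSERT INTO employees VALUES
        ('Ann', '2021-06-15', 80000),
        ('Bob', '2022-03-01', 45000),
        ('Cy',  '2022-09-20', 65000);
""")

# Both conditions must hold; the cutoff values are passed as parameters.
recent_high_earners = conn.execute("""
    SELECT *
    FROM employees
    WHERE hire_date > ? AND salary > ?
""", ("2022-01-01", 50000)).fetchall()

print(recent_high_earners)  # [('Cy', '2022-09-20', 65000)]
```

Ann is excluded by the date condition and Bob by the salary condition, so only Cy remains.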
4. A subquery is a query nested within another query. It retrieves data that the main query
then uses. For example, a subquery can retrieve the department_id for the Sales department,
and the main query can then select all the employees who work in that department.
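The Sales example can be sketched as below. The department_id column comes from the text; the departments table, the department_name column, and the sample rows are assumptions made for illustration.

```python
import sqlite3

# Hypothetical tables and rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE departments (department_id INTEGER PRIMARY KEY, department_name TEXT);
    CREATE TABLE employees (name TEXT, department_id INTEGER);
    INSERT INTO departments VALUES (1, 'Sales'), (2, 'Engineering');
    INSERT INTO employees VALUES ('Ann', 1), ('Bob', 2), ('Cy', 1);
""")

# The subquery in parentheses runs first and yields the Sales department_id;
# the main query then filters employees by that value.
sales_employees = conn.execute("""
    SELECT name
    FROM employees
    WHERE department_id = (
        SELECT department_id
        FROM departments
        WHERE department_name = 'Sales'
    )
    ORDER BY name
""").fetchall()

print(sales_employees)  # [('Ann',), ('Cy',)]
```

The = comparison assumes the subquery returns exactly one row; if several departments could match, IN would be the safer operator.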
5. To find the nth highest salary in a table, sort the distinct salaries in descending order and
take the nth value.
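One way to write such a query is sketched below, using DISTINCT with LIMIT and OFFSET; a DENSE_RANK() window function is a common alternative. The employees table, its columns, and the helper nth_highest_salary are hypothetical names chosen for this example.

```python
import sqlite3

# Hypothetical table and rows; 90000 appears twice to show why DISTINCT matters.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (name TEXT, salary INTEGER);
    INSERT INTO employees VALUES
        ('Ann', 90000), ('Bob', 70000), ('Cy', 90000), ('Di', 50000);
""")

def nth_highest_salary(conn, n):
    """Return the nth highest distinct salary, or None if there are fewer than n."""
    row = conn.execute("""
        SELECT DISTINCT salary
        FROM employees
        ORDER BY salary DESC
        LIMIT 1 OFFSET ?
    """, (n - 1,)).fetchone()  # skip the n - 1 larger distinct salaries
    return row[0] if row else None

print(nth_highest_salary(conn, 1))  # 90000
print(nth_highest_salary(conn, 2))  # 70000
print(nth_highest_salary(conn, 4))  # None
```

Without DISTINCT, the duplicate 90000 would occupy both the first and second positions.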
6. A clustered index determines the physical order of data in a table, so a table can have only
one. A non-clustered index is a separate data structure that contains a copy of the indexed
columns together with a pointer back to each row, so a table can have several.
7. To delete duplicate rows in a table, you can use the following SQL query:
DELETE FROM table_name
WHERE id NOT IN (SELECT MIN(id) FROM table_name GROUP BY column1, column2, ...)
This query deletes all rows from the table except for the one with the lowest ID for each
unique combination of values in the specified columns.
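To see the effect of this DELETE on concrete data, here is a small sketch with a hypothetical contacts table in which name and email play the role of column1 and column2.

```python
import sqlite3

# Hypothetical table: rows 2 and 4 duplicate row 1 on (name, email).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE contacts (id INTEGER PRIMARY KEY, name TEXT, email TEXT);
    INSERT INTO contacts VALUES
        (1, 'Ann', 'ann@example.com'),
        (2, 'Ann', 'ann@example.com'),
        (3, 'Bob', 'bob@example.com'),
        (4, 'Ann', 'ann@example.com');
""")

# Keep the lowest id in each (name, email) group and delete every other row.
deleted = conn.execute("""
    DELETE FROM contacts
    WHERE id NOT IN (SELECT MIN(id) FROM contacts GROUP BY name, email)
""").rowcount

remaining = conn.execute("SELECT id, name FROM contacts ORDER BY id").fetchall()

print(deleted)    # 2
print(remaining)  # [(1, 'Ann'), (3, 'Bob')]
```

Some database systems restrict subqueries that read from the table being modified, so the statement may need an extra derived table there; SQLite accepts it as written.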
8. A view is a virtual table defined by a stored SELECT statement; its rows are produced from
the underlying tables each time it is queried. A table is a physical storage structure that
contains the data itself.
9. To find the maximum salary for each department, group the employees by department and
select the maximum salary within each group.
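The grouping described above can be sketched with GROUP BY and MAX. The employees table is assumed to have hypothetical department and salary columns, and the rows are invented.

```python
import sqlite3

# Hypothetical table and rows.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER);
    INSERT INTO employees VALUES
        ('Ann', 'Sales', 60000),
        ('Bob', 'Sales', 75000),
        ('Cy',  'Engineering', 90000),
        ('Di',  'Engineering', 85000);
""")

# One output row per department, carrying that department's largest salary.
max_by_department = conn.execute("""
    SELECT department, MAX(salary) AS max_salary
    FROM employees
    GROUP BY department
    ORDER BY department
""").fetchall()

print(max_by_department)  # [('Engineering', 90000), ('Sales', 75000)]
```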
10. A transaction is a sequence of SQL statements that are treated as a single unit of work.
Either all of the statements complete successfully and are committed, or all of them are
rolled back if an error occurs. For example, a bank transfer debits one account and credits
another; wrapping both updates in a transaction ensures that the transfer either completes in
full or is rolled back if there are insufficient funds in the account.
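The bank transfer example can be sketched as follows. This is an illustration under assumptions: the accounts table, the CHECK constraint that forbids negative balances, and the transfer helper are all hypothetical, and the transaction is managed through the connection's context manager in Python's sqlite3 module, which commits on success and rolls back on an exception.

```python
import sqlite3

# Hypothetical accounts table; the CHECK constraint rejects negative balances.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move amount from src to dst as one unit of work.

    Returns True if the transfer was committed, False if it was rolled back.
    """
    try:
        with conn:  # commits if the block succeeds, rolls back if it raises
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            # This debit fails the CHECK constraint when funds are insufficient.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False

ok_small = transfer(conn, 1, 2, 30)   # succeeds: balances become 70 and 80
ok_large = transfer(conn, 1, 2, 500)  # fails: the earlier credit is undone as well
balances = conn.execute("SELECT id, balance FROM accounts ORDER BY id").fetchall()

print(ok_small, ok_large)  # True False
print(balances)            # [(1, 70), (2, 80)]
```

The credit is deliberately issued before the debit so that the failed transfer shows the rollback undoing a statement that had already succeeded.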
1. How do you find the three most common words in a column of text?
2. Can you explain the difference between a correlated subquery and a non-correlated
subquery, and provide an example of each?
3. How do you write a query to find all the customers who have made a purchase every
month for the past six months?
4. Can you explain the difference between a left outer join and a right outer join, and
provide an example of each?
5. How do you find the top three products by revenue for each month in a given year?
6. How do you write a query to find the number of days between the first and last order
date for each customer?
7. Can you explain the difference between a primary key and a foreign key?
8. How do you write a query to find the average number of days between orders for each
customer?
9. Can you explain the difference between a scalar function and a table function, and
provide an example of each?
10. How do you find the longest consecutive streak of orders for each customer, and what is
the SQL function you would use to accomplish this?
1. To find the three most common words in a column of text, you can use the following SQL
query:
SELECT word, COUNT(*) AS count
FROM (
    SELECT regexp_split_to_table(text_column, E'\\s+') AS word
    FROM table_name
) AS words
GROUP BY word
ORDER BY count DESC
LIMIT 3
This query first splits the text column into individual words using a regular expression,
and then counts the occurrences of each word. The results are then sorted by the count
and limited to the top three.
2. A correlated subquery is a subquery that references a column from the outer query, while
a non-correlated subquery does not. Here is an example of each:
Correlated subquery:
SELECT * FROM table1 t1 WHERE t1.column1 = (SELECT MAX(column1) FROM table2 t2
WHERE t2.column2 = t1.column2)
Non-correlated subquery:
SELECT * FROM table1 WHERE column1 IN (SELECT column1 FROM table2)
3. To find all the customers who have made a purchase every month for the past six
months, you can use the following SQL query:
SELECT customer_id
FROM sales
WHERE purchase_date >= DATE_TRUNC('month', CURRENT_DATE) - INTERVAL '5 months'
GROUP BY customer_id
HAVING COUNT(DISTINCT DATE_TRUNC('month', purchase_date)) = 6
This query selects all the sales from the current calendar month and the five months before
it, groups them by customer ID, and then counts the distinct months in which each customer
made a purchase. The window starts on the first day of a month because a window that starts
mid-month, such as CURRENT_DATE - INTERVAL '6 months', can touch seven calendar months.
Customers who made a purchase every month will have a count of 6, and these are the
customers that are selected.
4. A left outer join returns all the rows from the left table and matching rows from the right
table, while a right outer join returns all the rows from the right table and matching rows
from the left table. Here is an example of each:
Left outer join:
SELECT * FROM table1 LEFT JOIN table2 ON table1.column1 = table2.column1
Right outer join:
SELECT * FROM table1 RIGHT JOIN table2 ON table1.column1 = table2.column1
5. To find the top three products by revenue for each month in a given year, you can use
the following SQL query:
SELECT month, product, revenue
FROM (
    SELECT DATE_TRUNC('month', order_date) AS month,
           product,
           SUM(price * quantity) AS revenue,
           ROW_NUMBER() OVER (
               PARTITION BY DATE_TRUNC('month', order_date)
               ORDER BY SUM(price * quantity) DESC
           ) AS revenue_rank
    FROM orders
    WHERE EXTRACT(YEAR FROM order_date) = 2022
    GROUP BY DATE_TRUNC('month', order_date), product
) AS ranked
WHERE revenue_rank <= 3
ORDER BY month, revenue DESC
This query first groups the orders by month and product and calculates the revenue for
each product. The results are then ranked by revenue within each month using the
ROW_NUMBER() function. Finally, the top three products by revenue for each month are
selected. ROW_NUMBER() breaks ties arbitrarily; use RANK() or DENSE_RANK() instead if
products with equal revenue should share a position.
6. To find the number of days between the first and last order date for each customer, you
can use the following SQL query:
SELECT customer_id, MAX(order_date) - MIN(order_date) AS days_between_orders FROM
orders GROUP BY customer_id
This query groups the orders by customer ID and calculates the difference between the
maximum and minimum order dates for each customer.
7. A primary key is a column or set of columns that uniquely identifies each row
in a table, while a foreign key is a column or set of columns that refers to a
primary key in another table. The foreign key establishes a relationship
between the two tables, allowing rows in one table to refer to rows in another
table. Here is an example:
Primary key:
CREATE TABLE orders ( order_id SERIAL PRIMARY KEY, customer_id INT,
order_date DATE, ... );
Foreign key:
CREATE TABLE customers ( customer_id SERIAL PRIMARY KEY, name TEXT, ... );
CREATE TABLE orders ( order_id SERIAL PRIMARY KEY, customer_id INT
REFERENCES customers (customer_id), order_date DATE, ... );
In this example, the orders table has a foreign key that references the
customer_id column in the customers table.
8. To find the average number of days between orders for each customer, you
can use the following SQL query:
SELECT customer_id, AVG(days_between_orders) AS avg_days_between_orders
FROM (
    SELECT customer_id,
           order_date - LAG(order_date) OVER (
               PARTITION BY customer_id ORDER BY order_date
           ) AS days_between_orders
    FROM orders
) AS temp
WHERE days_between_orders IS NOT NULL
GROUP BY customer_id
This query uses the LAG() window function to calculate the number of days
between each order and the previous order for each customer. The results are
then averaged for each customer.
9. A scalar function is a function that returns a single value, while a table function
is a function that returns a table. Here is an example of each:
Scalar function:
CREATE FUNCTION add_numbers(x INT, y INT) RETURNS INT AS $$
BEGIN
    RETURN x + y;
END;
$$ LANGUAGE plpgsql;
SELECT add_numbers(3, 4); -- returns 7
Table function:
CREATE FUNCTION get_orders(customer_id INT)
RETURNS TABLE ( order_id INT, order_date DATE, ... ) AS $$
BEGIN
    RETURN QUERY SELECT * FROM orders
                 WHERE orders.customer_id = get_orders.customer_id;
END;
$$ LANGUAGE plpgsql;
SELECT * FROM get_orders(123); -- returns all the orders for customer ID 123
In this example, the scalar function add_numbers takes two integer
parameters and returns their sum, while the table function get_orders takes a
customer ID parameter and returns all the orders for that customer as a table.
10. To find the longest consecutive streak of orders for each customer, you can
use the following SQL query:
SELECT customer_id, MAX(streak_length) AS longest_streak FROM ( SELECT
customer_id, order_date, ROW_NUMBER() OVER (PARTITION BY customer_id
ORDER BY order_date) - ROW_NUMBER() OVER (PARTITION BY customer_id,
is_gap ORDER BY order_date) AS streak_number, COUNT(*) OVER (PARTITION
BY customer_id, streak_number) AS streak_length FROM ( SELECT customer_id,
order_date, CASE WHEN order_date - LAG(order_date) OVER (PARTITION BY
customer_id ORDER BY order_date) > INTERVAL '1 day' THEN 1 ELSE