1. What is SQL, and what is its purpose in data analysis?
SQL (Structured Query Language) is the standard language for querying and manipulating data in relational databases. It lets analysts retrieve, filter, aggregate, and update data directly at the source, which underpins most data analysis work.

2. How do you retrieve all columns from a table named sales?
Example: "SELECT * FROM sales;"

3. Write a SQL query to select distinct values from a column named product_category.
Example: "SELECT DISTINCT product_category FROM sales;"

4. How do you filter records in SQL? Provide an example.
Use the WHERE clause to keep only the rows that satisfy a condition.
Example: "SELECT * FROM sales WHERE amount > 1000;"

5. Explain the GROUP BY clause and provide an example.
The GROUP BY clause groups rows that share the same values in the specified columns, so that aggregate functions such as SUM() or COUNT() are computed once per group.
Example: "SELECT product_category, SUM(amount) FROM sales GROUP BY product_category;"

6. What is the purpose of the HAVING clause? How does it differ from WHERE?
HAVING filters groups after aggregation, while WHERE filters individual rows before aggregation.
Example: "SELECT product_category, SUM(amount) FROM sales GROUP BY product_category HAVING SUM(amount) > 5000;"

7. How do you sort the results of a query?
Use the ORDER BY clause to sort results in ascending (ASC, the default) or descending (DESC) order.
Example: "SELECT * FROM sales ORDER BY amount DESC;"

8. Write a query to find the total sales amount for each product.
Example: "SELECT product_name, SUM(amount) FROM sales GROUP BY product_name;"

9. What is the difference between INNER JOIN and LEFT JOIN?
INNER JOIN returns only the rows that have matching values in both tables. LEFT JOIN returns all rows from the left table plus the matched rows from the right table; where a left row has no match, the right table's columns are filled with NULLs.
• INNER JOIN Example: "SELECT sales.product_name, products.price FROM sales INNER JOIN products ON sales.product_id = products.product_id;"
• LEFT JOIN Example: "SELECT sales.product_name, products.price FROM sales LEFT JOIN products ON sales.product_id = products.product_id;"

10. How do you find the top 5 highest sales amounts?
Example: "SELECT amount FROM sales ORDER BY amount DESC LIMIT 5;"

Medium SQL Questions for Data Analyst Interviews

11. What is a subquery, and how can you use it in the WHERE clause?
A subquery is a query nested inside another query. The outer query uses the subquery's result, for example as a comparison value in the WHERE clause.
Example: "SELECT product_name FROM sales WHERE amount > (SELECT AVG(amount) FROM sales);"

12. Explain the concept of a CROSS JOIN.
A CROSS JOIN returns the Cartesian product of two tables: each row in the first table is combined with every row in the second table.
Example: "SELECT * FROM products CROSS JOIN categories;"

13. How do you calculate the average of a column?
Use the AVG() function, which ignores NULL values.
Example: "SELECT AVG(amount) FROM sales;"

14. What is the purpose of the COUNT() function? Provide an example.
COUNT(*) returns the number of rows that match the query's conditions; COUNT(column) counts only the rows where that column is not NULL.
Example: "SELECT COUNT(*) FROM sales WHERE product_category = 'Electronics';"

15. Write a query to find the minimum and maximum sales amounts.
Example: "SELECT MIN(amount) AS MinAmount, MAX(amount) AS MaxAmount FROM sales;"

16. How do you update records in a table?
Use the UPDATE statement to modify existing records; the WHERE clause limits which rows are changed.
Example: "UPDATE sales SET amount = amount * 1.1 WHERE product_name = 'Laptop';"

17. Explain the CASE statement and provide an example.
The CASE statement adds conditional logic to a query, returning a different value depending on which condition a row meets.
Example: "SELECT product_name, CASE WHEN amount > 1000 THEN 'High' ELSE 'Low' END AS sales_category FROM sales;"

18. How do you handle NULL values in SQL queries?
Use the IS NULL and IS NOT NULL predicates to test for NULL values, and the COALESCE() function to substitute a default value.
Example: "SELECT COALESCE(discount, 0) FROM sales;"

19. What is the LIMIT clause used for? Provide an example.
The LIMIT clause restricts the number of rows returned by a query. It is supported by MySQL, PostgreSQL, and SQLite; SQL Server uses TOP instead.
Example: "SELECT * FROM sales LIMIT 10;"

20. How do you use the JOIN clause to combine data from multiple tables?
Use JOIN to combine rows from two or more tables based on related columns.
Example: "SELECT orders.order_id, customers.customer_name FROM orders JOIN customers ON orders.customer_id = customers.customer_id;"

21. What is a self-join? Provide an example.
A self-join joins a table to itself, using aliases to distinguish the two copies. It is useful for hierarchical data such as employee-manager relationships.
Example: "SELECT e1.employee_name AS Employee, e2.employee_name AS Manager FROM employees e1 LEFT JOIN employees e2 ON e1.manager_id = e2.employee_id;"

22. Explain the UNION and UNION ALL operators.
UNION combines the results of two queries and removes duplicate rows. UNION ALL combines the results and keeps duplicates.
• UNION Example: "SELECT product_name FROM sales UNION SELECT product_name FROM returns;"
• UNION ALL Example: "SELECT product_name FROM sales UNION ALL SELECT product_name FROM returns;"

23. How do you find duplicate records in a table?
Group by the columns that define a duplicate and use HAVING COUNT(*) > 1 to keep only the groups that occur more than once.
Example: "SELECT product_name, COUNT(*) FROM sales GROUP BY product_name HAVING COUNT(*) > 1;"

24. What is a window function? Provide an example.
Window functions perform calculations across a set of rows related to the current row, such as running totals or rankings, without collapsing those rows into one.
Example: "SELECT product_name, amount, RANK() OVER (ORDER BY amount DESC) AS sales_rank FROM sales;"

25. How do you calculate a running total in SQL?
Use the SUM() window function with an OVER clause that orders the rows.
Example: "SELECT order_date, amount, SUM(amount) OVER (ORDER BY order_date) AS running_total FROM orders;"

26. How do you use the EXISTS clause in SQL?
The EXISTS clause checks whether a subquery returns at least one row.
Example: "SELECT product_name FROM sales WHERE EXISTS (SELECT * FROM returns WHERE returns.product_id = sales.product_id);"

27. Explain the difference between WHERE and HAVING clauses.
WHERE filters rows before aggregation, while HAVING filters groups after aggregation.
• WHERE Example: "SELECT * FROM sales WHERE amount > 1000;"
• HAVING Example: "SELECT product_name, SUM(amount) FROM sales GROUP BY product_name HAVING SUM(amount) > 5000;"

28. How do you find the number of orders per customer?
Example: "SELECT customer_id, COUNT(order_id) FROM orders GROUP BY customer_id;"

29. What is a TEMPORARY table, and how is it used?
A TEMPORARY table exists only for the duration of a session and is dropped automatically when the session ends.
Example: "CREATE TEMPORARY TABLE temp_sales AS SELECT * FROM sales WHERE amount > 1000;"

30. How do you use the ALTER TABLE statement?
The ALTER TABLE statement modifies the structure of an existing table, for example by adding or dropping columns.
Example: "ALTER TABLE sales ADD COLUMN discount DECIMAL(10, 2);"

31. What is normalization, and why is it important?
Normalization is the process of organizing data into related tables to reduce redundancy and improve data integrity. Storing each fact in one place keeps the data consistent when it is updated.

32. How do you perform a bulk insert of data into a table?
Use an INSERT INTO ... VALUES statement with multiple rows of values, or a bulk loading utility.
• Multiple values Example: "INSERT INTO sales (product_name, amount) VALUES ('Laptop', 1200), ('Smartphone', 800);"
• Bulk load Example (MySQL): "LOAD DATA INFILE 'file_path.csv' INTO TABLE sales FIELDS TERMINATED BY ',';"

33. Explain the concept of indexing and its benefits.
An index is a data structure that allows quick lookups on one or more columns, which speeds up data retrieval. The trade-off is extra storage and slower writes, because the index must be updated whenever the data changes.

34. How do you create an index on a table?
Use the CREATE INDEX statement to create an index on one or more columns.
Example: "CREATE INDEX idx_product_name ON sales(product_name);"

Hard SQL Questions for Data Analyst Interviews

35. What are some common performance optimization techniques in SQL?
Common techniques include indexing the columns used in filters and joins, selecting only the columns you need, replacing correlated subqueries with joins where possible, and joining on indexed keys.

36. How do you use the EXPLAIN statement to analyze query performance?
The EXPLAIN statement shows how the database plans to execute a query, including the order of operations and which indexes are used. The output format varies by database.
Example: "EXPLAIN SELECT * FROM sales WHERE amount > 1000;"

37. What is the purpose of the COALESCE() function?
The COALESCE() function returns the first non-NULL value from a list of arguments.
Example: "SELECT COALESCE(discount, 0) FROM sales;"

38. How do you calculate the percentage of total sales for each product?
Divide each product's SUM() by the grand total, which a window function over the grouped result provides. Multiplying by 100.0 first avoids integer division when amount is an integer column.
Example: "SELECT product_name, SUM(amount) AS total_sales, 100.0 * SUM(amount) / SUM(SUM(amount)) OVER () AS percentage FROM sales GROUP BY product_name;"

39. What is a VIEW, and how do you create one?
A VIEW is a virtual table defined by a stored query. It simplifies complex queries and can improve security by exposing only selected columns or rows.
Example: "CREATE VIEW high_value_sales AS SELECT * FROM sales WHERE amount > 1000;"

40. How do you use the DROP TABLE statement?
The DROP TABLE statement deletes an entire table, including its structure and data, from the database.
Example: "DROP TABLE old_sales;"

41. What is the difference between DELETE and TRUNCATE?
DELETE removes the rows that match a condition and can be rolled back. TRUNCATE removes all rows at once and is typically faster; in some databases (e.g., MySQL and Oracle) it commits implicitly and cannot be rolled back, while in others (e.g., PostgreSQL and SQL Server) it can be rolled back inside a transaction.
• DELETE Example: "DELETE FROM sales WHERE amount < 500;"
• TRUNCATE Example: "TRUNCATE TABLE sales;"

42. How do you find the median value of a column in SQL?
One portable approach is to number the sorted rows with a window function and average the middle one or two.
Example: "WITH OrderedSales AS (SELECT amount, ROW_NUMBER() OVER (ORDER BY amount) AS rn, COUNT(*) OVER () AS total_count FROM sales) SELECT AVG(amount) AS median FROM OrderedSales WHERE rn IN ((total_count + 1) / 2, (total_count + 2) / 2);"
This relies on integer division, as in PostgreSQL, SQL Server, and SQLite; in MySQL, where / returns a decimal, wrap each expression in FLOOR().

43. What is the RANK() function, and how is it used?
The RANK() function assigns a rank to each row within a partition of the result set. Tied rows share a rank, and the ranks that follow skip ahead, leaving gaps.
Example: "SELECT product_name, amount, RANK() OVER (ORDER BY amount DESC) AS sales_rank FROM sales;"

44. How do you perform a JOIN with multiple tables?
You can join more than two tables by chaining JOIN operations.
Example: "SELECT orders.order_id, customers.customer_name, products.product_name FROM orders JOIN customers ON orders.customer_id = customers.customer_id JOIN products ON orders.product_id = products.product_id;"

45. What is a TEMPORARY table, and when would you use it?
A TEMPORARY table stores intermediate results for the duration of a session, which is useful for breaking up complex queries or for batch processing.
Example: "CREATE TEMPORARY TABLE temp_sales AS SELECT * FROM sales WHERE amount > 1000;"

46. How do you retrieve the last N records from a table?
Tables have no inherent order, so pick a column that defines recency (such as a date or an ID), sort by it in descending order, and apply LIMIT.
Example: "SELECT * FROM sales ORDER BY sale_date DESC LIMIT 10;"

47. Explain the DATEPART() function with an example.
DATEPART() is a SQL Server function that extracts a specific part (e.g., year, month) from a date. PostgreSQL and MySQL provide EXTRACT() for the same purpose, e.g. EXTRACT(YEAR FROM sale_date).
Example: "SELECT DATEPART(year, sale_date) AS sale_year FROM sales;"

48. How do you use JOIN to include rows with no match in one of the tables?
Use a LEFT JOIN or RIGHT JOIN to include all rows from one table even if there are no matching rows in the other table.
Example: "SELECT employees.name, departments.department_name FROM employees LEFT JOIN departments ON employees.department_id = departments.department_id;"

49. How can you calculate the difference between two dates in SQL?
Use the database's date-difference function; its name and argument order vary by system.
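Example: "SELECT DATEDIFF(day, start_date, end_date) AS date_difference FROM projects;"
This is SQL Server syntax. MySQL's DATEDIFF(end_date, start_date) takes two arguments and returns the difference in days, and PostgreSQL subtracts dates directly (end_date - start_date).

50. What is the difference between CHAR and VARCHAR data types?
CHAR is a fixed-length string type, while VARCHAR is a variable-length string type. CHAR stores every value at the declared length, padding with spaces if needed, whereas VARCHAR uses only the space each value needs, up to the declared maximum.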
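
Several of the queries above can be tried end to end without installing a database server. The sketch below is one way to do that, assuming Python's built-in sqlite3 module and a SQLite build recent enough to support window functions (3.25 or later). The sample rows are hypothetical and exist only for illustration; the table and column names follow the examples in questions 6, 9, 24, and 42.

```python
import sqlite3

# In-memory database with hypothetical sample data (not from the article).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (product_id INTEGER PRIMARY KEY, price REAL);
    CREATE TABLE sales (
        product_id INTEGER, product_name TEXT,
        product_category TEXT, amount INTEGER
    );
    INSERT INTO products VALUES (1, 1200.0), (2, 800.0);
    INSERT INTO sales VALUES
        (1, 'Laptop', 'Electronics', 1200),
        (1, 'Laptop', 'Electronics', 4500),
        (2, 'Smartphone', 'Electronics', 800),
        (3, 'Desk', 'Furniture', 300);  -- product_id 3 has no products row
""")


def run(sql):
    """Execute a query and return all rows as a list of tuples."""
    return conn.execute(sql).fetchall()


# Question 6: HAVING filters groups after aggregation.
having_rows = run("""
    SELECT product_name, SUM(amount) FROM sales
    GROUP BY product_name HAVING SUM(amount) > 5000
""")

# Question 9: INNER JOIN drops the unmatched 'Desk' sale,
# LEFT JOIN keeps it with a NULL (None in Python) price.
inner_rows = run("""
    SELECT sales.product_name, products.price FROM sales
    INNER JOIN products ON sales.product_id = products.product_id
""")
left_rows = run("""
    SELECT sales.product_name, products.price FROM sales
    LEFT JOIN products ON sales.product_id = products.product_id
""")

# Question 24: RANK() as a window function over descending amount.
rank_rows = run("""
    SELECT product_name, amount,
           RANK() OVER (ORDER BY amount DESC) AS sales_rank
    FROM sales ORDER BY sales_rank
""")

# Question 42: median via ROW_NUMBER(); relies on SQLite's integer division.
median = run("""
    WITH OrderedSales AS (
        SELECT amount,
               ROW_NUMBER() OVER (ORDER BY amount) AS rn,
               COUNT(*) OVER () AS total_count
        FROM sales
    )
    SELECT AVG(amount) FROM OrderedSales
    WHERE rn IN ((total_count + 1) / 2, (total_count + 2) / 2)
""")[0][0]

print("HAVING:", having_rows)
print("INNER JOIN rows:", len(inner_rows), "LEFT JOIN rows:", len(left_rows))
print("Ranked:", rank_rows)
print("Median:", median)
```

With four sales rows, the median query averages the two middle amounts (800 and 1200). Other databases may need small syntax changes, so treat this as a practice harness and not as a reference for any particular system.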