SQL interview1
What is SQL?
SQL (Structured Query Language) is the standard language used to manage and query relational databases.
It helps you:
• Store data
• Retrieve data (using queries)
• Update data
• Delete data
• Create and modify database structures (like tables)
It's used with databases like MySQL, SQL Server, PostgreSQL, and more.
What are the subsets of SQL (types of SQL commands)? Briefly explain each.
SQL commands are categorized into five main subsets:
1. DDL (Data Definition Language):
o Purpose: Defines and manages database structures like tables, schemas, etc.
o Commands:
▪ CREATE – Creates new tables or databases
▪ ALTER – Modifies existing tables
▪ DROP – Deletes tables or databases
▪ TRUNCATE – Removes all data from a table without deleting the structure
2. DML (Data Manipulation Language):
o Purpose: Manages data within tables.
o Commands:
▪ SELECT – Retrieves data from tables (often classified separately as DQL)
▪ INSERT – Adds new data
▪ UPDATE – Modifies existing data
▪ DELETE – Removes data
3. DCL (Data Control Language):
o Purpose: Controls access to data in the database.
o Commands:
▪ GRANT – Gives user permissions
▪ REVOKE – Removes user permissions
4. TCL (Transaction Control Language):
o Purpose: Manages transactions to ensure data integrity.
o Commands:
▪ COMMIT – Saves changes made by a transaction
▪ ROLLBACK – Undoes the uncommitted changes of the current transaction (e.g., after an error)
▪ SAVEPOINT – Sets a point to roll back to if needed
5. DQL (Data Query Language):
o Purpose: Focuses only on querying data.
o Command:
▪ SELECT – Retrieves data from the database
In simple terms:
• DDL defines the structure,
• DML works with the data,
• DCL controls access,
• TCL manages transactions, and
• DQL retrieves data.
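The categories above can be tried out in a short sketch. It uses Python's built-in sqlite3 module with an in-memory database; the employees table and its values are made up for illustration. SQLite has no user accounts, so the DCL commands (GRANT, REVOKE) are not shown.

```python
import sqlite3

# isolation_level=None lets us issue BEGIN / COMMIT / ROLLBACK ourselves.
conn = sqlite3.connect(":memory:", isolation_level=None)

# DDL: define the structure (hypothetical "employees" table).
conn.execute(
    "CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT NOT NULL, salary REAL)"
)

# DML: add and change data.
conn.execute("INSERT INTO employees (name, salary) VALUES ('Alice', 5000), ('Bob', 4000)")
conn.execute("UPDATE employees SET salary = 4500 WHERE name = 'Bob'")

# TCL: group changes into a transaction and use a savepoint.
conn.execute("BEGIN")
conn.execute("DELETE FROM employees WHERE name = 'Alice'")
conn.execute("SAVEPOINT before_raise")
conn.execute("UPDATE employees SET salary = salary * 2")
conn.execute("ROLLBACK TO before_raise")  # undo only the salary change
conn.execute("COMMIT")                    # keep the delete

# DQL: query the data.
rows = conn.execute("SELECT name, salary FROM employees ORDER BY id").fetchall()
print(rows)
```

Only Bob remains, with the salary set by the UPDATE, because the doubling was rolled back to the savepoint before the transaction was committed.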
Disadvantages of SQL:
1. Complexity with Advanced Queries:
o Writing highly complex queries (nested subqueries, multiple joins) can be challenging for
beginners.
2. Limited Control Over Database Logic:
o SQL focuses on data operations; it is less suited to complex business logic than general-purpose
programming languages.
3. Vendor Dependency:
o Some SQL features are specific to certain databases (e.g., T-SQL for SQL Server, PL/SQL for
Oracle), which affects portability.
4. Performance Issues with Large Data:
o Poorly optimized queries can slow down performance, especially with very large datasets.
5. Security Risks if Not Handled Properly:
o Vulnerable to SQL injection attacks if queries are not secured in applications.
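The injection risk can be illustrated with a minimal sketch, again using Python's sqlite3 module. The users table and the malicious input string are assumptions for the example; the point is the difference between pasting input into the SQL text and passing it through a placeholder.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])

# Hypothetical attacker-controlled input.
malicious = "nobody' OR '1'='1"

# Unsafe: the input becomes part of the SQL text, so the OR clause is executed.
unsafe_rows = conn.execute(
    f"SELECT name FROM users WHERE name = '{malicious}'"
).fetchall()

# Safer: the placeholder passes the input as a value, never as SQL.
safe_rows = conn.execute(
    "SELECT name FROM users WHERE name = ?", (malicious,)
).fetchall()

print(len(unsafe_rows), len(safe_rows))
```

The unsafe query returns every user, while the parameterized query returns nothing because no user has that literal name.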
Summary:
SQL is powerful, easy to learn, and widely used, but handling complex queries and ensuring security can
be challenging if not done correctly.
What is DBMS?
A DBMS (Database Management System) is software that enables users to create, manage, and interact
with databases. It provides a systematic way of storing, retrieving, and manipulating data. The DBMS
ensures that data is stored securely and efficiently and that multiple users can access it while
data integrity and consistency are maintained. Key functions of a DBMS:
1. Data Storage: It allows for the efficient storage of data and provides mechanisms for fast data
retrieval.
2. Data Manipulation: It provides operations such as inserting, updating, deleting, and querying data.
3. Data Security: A DBMS helps secure data through authentication, authorization, and encryption.
4. Concurrency Control: It manages concurrent access to data, ensuring that multiple users can
interact with the database at the same time without their changes conflicting.
5. Data Integrity: It enforces constraints to maintain data consistency and accuracy.
6. Backup and Recovery: A DBMS provides features for backing up data and recovering it in case of
system failures.
Examples of DBMS include MySQL, Oracle, Microsoft SQL Server, and PostgreSQL.
Tables:
• A table is a collection of data organized into rows and columns.
• Each table represents a specific entity or object (e.g., Customers, Orders, Products) in the
database.
• A table is made up of multiple fields (columns) and records (rows).
• Tables are structured in a way that each row contains a set of related data, and each column stores
a specific type of information about the entity.
Fields:
• A field (also called a column) represents a single type of data within a table. Each field has a name
and a specific data type (e.g., text, number, date).
• Fields define the structure of the data stored in a table. Every record in a table has a value for each
field.
• The fields hold the attributes or characteristics of the entity represented by the table.
For example, in a Customers table:
• Customer ID is a field that stores unique identifiers for customers (e.g., 1, 2).
• Name is a field that stores the customer's name (e.g., Alice, Bob).
• Email is a field that stores the customer's email address (e.g., alice@email.com, bob@email.com).
Each field contains a specific type of data that helps describe the entity represented by the table.
Primary Key and Foreign Key:
A Primary Key is a column (or a combination of columns) in a table that uniquely identifies each row in
the table. It must contain unique values and cannot contain NULL values.
A Foreign Key is a column (or a combination of columns) in one table that refers to the Primary Key (or
another unique key) in another table. The foreign key establishes a relationship between the two tables,
ensuring referential integrity.
• Primary Key: Uniquely identifies each record in the same table and ensures data integrity within
the table.
• Foreign Key: Links a column in one table to the primary key of another table, ensuring
relationships between tables and maintaining referential integrity.
Syntax (creating a table):
CREATE TABLE table_name (
    column1_name column1_data_type [constraint],
    column2_name column2_data_type [constraint]
);
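A concrete instance of CREATE TABLE with a primary key and a foreign key might look like the sketch below. It runs on SQLite through Python's sqlite3 module; the customers and orders tables are illustrative, and the PRAGMA line is SQLite-specific (other databases enforce foreign keys by default).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.execute("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        email       TEXT UNIQUE
    )
""")
conn.execute("""
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers (customer_id),
        amount      REAL
    )
""")

conn.execute("INSERT INTO customers VALUES (1, 'Alice', 'alice@email.com')")
conn.execute("INSERT INTO orders VALUES (100, 1, 25.0)")  # valid: customer 1 exists

# Both statements below violate a key constraint and should be rejected.
errors = []
for stmt in (
    "INSERT INTO customers VALUES (1, 'Bob', 'bob@email.com')",  # duplicate primary key
    "INSERT INTO orders VALUES (101, 99, 10.0)",                 # no customer 99
):
    try:
        conn.execute(stmt)
    except sqlite3.IntegrityError as exc:
        errors.append(str(exc))

print(errors)
```

The first rejected insert shows the primary key guaranteeing uniqueness within a table; the second shows the foreign key protecting referential integrity between tables.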
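Syntax (renaming a table):
ALTER TABLE old_table_name RENAME TO new_table_name;
Note: SQL Server does not support this form; it uses the sp_rename stored procedure instead.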
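As a quick check of the rename syntax, the following sketch renames a hypothetical clients table to customers in an in-memory SQLite database and confirms that the data is still reachable under the new name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE clients (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO clients (name) VALUES ('Alice')")

# Rename the table; rows, columns, and constraints are kept.
conn.execute("ALTER TABLE clients RENAME TO customers")

# sqlite_master is SQLite's catalog of schema objects.
tables = [r[0] for r in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")]
names = conn.execute("SELECT name FROM customers").fetchall()
print(tables, names)
```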
What is join in SQL? List its different types.
In SQL, a JOIN is used to combine rows from two or more tables based on a related column.
Types of Joins:
1. INNER JOIN:
o Returns only the rows where there is a match in both tables.
2. LEFT JOIN (or LEFT OUTER JOIN):
o Returns all rows from the left table and the matched rows from the right table. If no match is
found, NULL values are returned for columns from the right table.
3. RIGHT JOIN (or RIGHT OUTER JOIN):
o Returns all rows from the right table and the matched rows from the left table. If no match is
found, NULL values are returned for columns from the left table.
4. FULL JOIN (or FULL OUTER JOIN):
o Returns all rows from both tables. Where a row has no match in the other table, NULL values
are returned for the other table's columns.
5. CROSS JOIN:
o Returns the Cartesian product of both tables, combining each row from the first table with
every row from the second table.
6. SELF JOIN:
o A join where a table is joined with itself. It is used to compare rows within the same table.
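Several of these join types can be compared on a small data set. The sketch below uses Python's sqlite3 module with made-up employees and departments tables. RIGHT JOIN and FULL JOIN are left out only because older SQLite versions do not support them; they behave as described above in databases that do.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE employees (
        id INTEGER PRIMARY KEY, name TEXT, dept_id INTEGER, manager_id INTEGER
    );
    INSERT INTO departments VALUES (1, 'Sales'), (2, 'IT');
    INSERT INTO employees VALUES
        (1, 'Alice', 1,    NULL),
        (2, 'Bob',   1,    1),
        (3, 'Cara',  NULL, 1);
""")

# INNER JOIN: only employees that have a matching department.
inner = conn.execute("""
    SELECT e.name, d.name FROM employees e
    INNER JOIN departments d ON e.dept_id = d.id
    ORDER BY e.id
""").fetchall()

# LEFT JOIN: every employee; department is NULL (None) when there is no match.
left = conn.execute("""
    SELECT e.name, d.name FROM employees e
    LEFT JOIN departments d ON e.dept_id = d.id
    ORDER BY e.id
""").fetchall()

# CROSS JOIN: every employee paired with every department (3 x 2 rows).
cross_count = conn.execute(
    "SELECT COUNT(*) FROM employees CROSS JOIN departments"
).fetchone()[0]

# SELF JOIN: employees joined to their managers in the same table.
self_join = conn.execute("""
    SELECT e.name, m.name FROM employees e
    JOIN employees m ON e.manager_id = m.id
    ORDER BY e.id
""").fetchall()

print(inner, left, cross_count, self_join)
```

Cara has no department, so she appears in the LEFT JOIN result with a NULL department but is missing from the INNER JOIN result.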
Normalization is the process of organizing data into smaller, related tables. Its main goals are:
• Reduces Data Redundancy: By organizing data into smaller, related tables, normalization minimizes
duplicate data.
• Improves Data Integrity: It ensures that the data is consistent and adheres to rules for relationships
between tables.
Normal Forms:
Normalization is typically done in stages, called Normal Forms (NF). Each subsequent normal form
builds on the previous one.
1. First Normal Form (1NF):
o Ensures that each column contains atomic (indivisible) values and that there are no
repeating groups or arrays in a column.
2. Second Normal Form (2NF):
o Achieved when a table is in 1NF and all non-key columns are fully functionally dependent on
the primary key (i.e., eliminates partial dependency).
3. Third Normal Form (3NF):
o Achieved when a table is in 2NF and all non-key columns are non-transitively dependent on the
primary key (i.e., no transitive dependency).
4. Boyce-Codd Normal Form (BCNF):
o A stricter version of 3NF where every determinant is a candidate key.
5. Fourth Normal Form (4NF):
o Achieved when a table is in BCNF and has no non-trivial multi-valued dependencies (i.e., no table
stores two or more independent multi-valued facts about the same entity).
6. Fifth Normal Form (5NF):
o Achieved when a table is in 4NF and has no join dependencies other than those implied by its
candidate keys, meaning it cannot be decomposed further without loss of information.
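A small worked example may help with the idea behind the normal forms. The sketch below (Python's sqlite3 module, illustrative table names) starts from a flat orders table in which the customer name depends on the customer email rather than on the order itself, which is a transitive dependency of the kind 3NF removes. It assumes that an email identifies exactly one customer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Before: customer details repeat on every order row.
    CREATE TABLE orders_flat (
        order_id INTEGER PRIMARY KEY,
        customer_name TEXT, customer_email TEXT, product TEXT
    );
    INSERT INTO orders_flat VALUES
        (1, 'Alice', 'alice@email.com', 'Pen'),
        (2, 'Alice', 'alice@email.com', 'Book'),
        (3, 'Bob',   'bob@email.com',   'Lamp');

    -- After: customer facts are stored once; orders reference them by key.
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY, name TEXT, email TEXT UNIQUE
    );
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customers (customer_id),
        product TEXT
    );

    INSERT INTO customers (name, email)
        SELECT DISTINCT customer_name, customer_email FROM orders_flat;
    INSERT INTO orders (order_id, customer_id, product)
        SELECT f.order_id, c.customer_id, f.product
        FROM orders_flat f JOIN customers c ON c.email = f.customer_email;
""")

customer_count = conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]

# Joining the normalized tables reproduces the original rows without loss.
rebuilt = conn.execute("""
    SELECT o.order_id, c.name, c.email, o.product
    FROM orders o JOIN customers c ON c.customer_id = o.customer_id
    ORDER BY o.order_id
""").fetchall()
original = conn.execute("SELECT * FROM orders_flat ORDER BY order_id").fetchall()
print(customer_count, rebuilt == original)
```

Alice's details are now stored once instead of twice, and a join recovers the original view of the data.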
Normalization vs. Denormalization:
While normalization improves consistency and reduces redundancy, too much normalization
can lead to complex queries with many joins. In such cases, some denormalization
(deliberately reducing the level of normalization) may be used to optimize performance.
The TRUNCATE, DELETE, and DROP statements in SQL are used to remove data, but they differ in their
functionality and use cases.
1. TRUNCATE
• Purpose: Removes all rows from a table, but does not remove the table itself. It is faster than
DELETE because it doesn't log individual row deletions.
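• Effect: Resets auto-increment counters in some databases (e.g., MySQL and SQL Server). Whether it
can be rolled back depends on the database: MySQL and Oracle commit it implicitly, while SQL Server
and PostgreSQL allow it to be rolled back inside a transaction.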
• Use Case: When you want to delete all data from a table but keep the structure for future use.
Syntax:
TRUNCATE TABLE table_name;
2. DELETE
• Purpose: Removes rows from a table based on a condition. Unlike TRUNCATE, it can delete specific
rows and is logged for each row.
• Effect: Data can be deleted selectively, and the operation can be rolled back if wrapped in a
transaction.
• Use Case: When you want to delete specific rows based on a condition or when you need to
preserve the table structure with constraints.
Syntax:
DELETE FROM table_name WHERE condition;
3. DROP
• Purpose: Completely removes a table, including its structure, data, and associated indexes,
constraints, and triggers.
• Effect: The table is permanently deleted from the database. In many databases this cannot be
rolled back, so recovery requires a backup.
• Use Case: When you want to completely remove a table and all its data from the database.
Syntax:
DROP TABLE table_name;
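The behaviour of these statements can be checked with a short sketch on SQLite via Python's sqlite3 module; the logs table is made up for the example. SQLite has no TRUNCATE statement, so a DELETE without a WHERE clause stands in for it here.

```python
import sqlite3

# isolation_level=None gives manual control over transactions.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE logs (id INTEGER PRIMARY KEY, level TEXT)")
conn.execute("INSERT INTO logs (level) VALUES ('info'), ('error'), ('info')")

# DELETE with a condition, then rolled back inside a transaction.
conn.execute("BEGIN")
conn.execute("DELETE FROM logs WHERE level = 'info'")
during = conn.execute("SELECT COUNT(*) FROM logs").fetchone()[0]
conn.execute("ROLLBACK")
after_rollback = conn.execute("SELECT COUNT(*) FROM logs").fetchone()[0]

# No TRUNCATE in SQLite: DELETE without WHERE empties the table but keeps it.
conn.execute("DELETE FROM logs")
after_delete_all = conn.execute("SELECT COUNT(*) FROM logs").fetchone()[0]

# DROP removes the table itself, so later queries against it fail.
conn.execute("DROP TABLE logs")
try:
    conn.execute("SELECT COUNT(*) FROM logs")
    table_exists = True
except sqlite3.OperationalError:
    table_exists = False

print(during, after_rollback, after_delete_all, table_exists)
```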
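Key Differences:
• DELETE removes selected rows (or all rows if no condition is given), logs each row, and can be rolled
back inside a transaction.
• TRUNCATE removes all rows at once without logging individual rows and keeps the table structure;
rollback support depends on the database.
• DROP removes the table itself along with its data, indexes, constraints, and triggers.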
SQL Operators:
• Arithmetic: +, -, *, /, %
• Existence: EXISTS
Aggregate Functions
Aggregate functions perform a calculation on a set of values and return a single result. They are often
used with the GROUP BY clause, in which case they return one result per group of rows.
Scalar Functions
Scalar functions perform operations on individual values and return a single result for each input value.
They are applied to single values or columns in the query.
Key Differences:
• Aggregate Functions: Operate on a set of rows, returning a single result (e.g., COUNT, SUM, AVG).
• Scalar Functions: Operate on a single value and return a single result for each row (e.g., UPPER,
LEN, ROUND). Note that LEN is the SQL Server name; many other databases call it LENGTH.
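The difference is easy to see side by side. This sketch (Python's sqlite3 module, illustrative sales table) runs one aggregate query and one scalar query over the same rows; it uses LENGTH because that is the SQLite name for the string-length function.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (id INTEGER PRIMARY KEY, region TEXT, amount REAL);
    INSERT INTO sales (region, amount) VALUES
        ('north', 10.25), ('north', 20.75), ('south', 5.0);
""")

# Aggregate functions: one result row per group.
per_region = conn.execute("""
    SELECT region, COUNT(*), SUM(amount)
    FROM sales
    GROUP BY region
    ORDER BY region
""").fetchall()

# Scalar functions: one result per input row.
per_row = conn.execute("""
    SELECT UPPER(region), LENGTH(region), ROUND(amount)
    FROM sales
    ORDER BY id
""").fetchall()

print(per_region)
print(per_row)
```

Three input rows collapse to two rows in the aggregate query, while the scalar query still returns three.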
A window function in SQL performs calculations across a set of rows related to the current row within a
query's result set, without collapsing the rows into a single output. This allows you to calculate aggregates,
rankings, or other values over a specific window of data while still keeping the individual rows in the result
set.
Key Characteristics:
• Retains Row Detail: Unlike aggregate functions, which group rows into a single result, window
functions return a value for each row.
• Operates Over a Window: The "window" is a set of rows defined by a specified range or partition in
the result set, based on an ordered or grouped context.
• Does Not Alter Row Count: The number of rows in the result set remains the same, but additional
calculated columns are added based on the window.
Common Window Functions:
1. ROW_NUMBER(): Assigns a unique sequential number to rows within a partition of the result set,
starting at 1.
2. RANK(): Assigns a rank to each row within a partition of the result set, with gaps between ranks for
ties.
3. DENSE_RANK(): Similar to RANK(), but does not leave gaps in the ranking when there are ties.
4. NTILE(n): Divides the rows of a partition into n roughly equal buckets and assigns a number to each
row indicating the bucket it belongs to.
5. LEAD(): Provides access to the value of a following row (by default, the next row) relative to the
current row.
6. LAG(): Provides access to the value of a preceding row (by default, the previous row).
7. SUM(), AVG(), MIN(), MAX(): These aggregate functions can also be used as window functions to
perform calculations over a range of rows.
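Several of these functions can be compared in one query. The sketch below uses Python's sqlite3 module and assumes the bundled SQLite is version 3.25 or newer, which is when window functions were added; the scores table is made up. ROW_NUMBER and LAG order ties by player name so that the output is deterministic, while RANK and DENSE_RANK order by points only so that ties are visible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE scores (player TEXT, team TEXT, points INTEGER);
    INSERT INTO scores VALUES
        ('Ann', 'red', 90), ('Ben', 'red', 90), ('Cy', 'red', 70),
        ('Dee', 'blue', 80), ('Eli', 'blue', 60);
""")

ranked = conn.execute("""
    SELECT player,
           ROW_NUMBER() OVER (PARTITION BY team ORDER BY points DESC, player) AS row_num,
           RANK()       OVER (PARTITION BY team ORDER BY points DESC)         AS rnk,
           DENSE_RANK() OVER (PARTITION BY team ORDER BY points DESC)         AS dense_rnk,
           LAG(points)  OVER (PARTITION BY team ORDER BY points DESC, player) AS prev_points,
           SUM(points)  OVER (PARTITION BY team)                              AS team_total
    FROM scores
    ORDER BY team, row_num
""").fetchall()

for row in ranked:
    print(row)
```

Every input row is still present in the result. In the red team, Ann and Ben tie on points, so both get rank 1; Cy then gets RANK 3 but DENSE_RANK 2.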
In SQL, indexes are used to speed up the retrieval of rows from a table. There are two main types of
indexes: clustered and non-clustered. Both help improve query performance, but they
function differently.
1. Clustered Index:
• Definition: A clustered index determines the physical order of data rows in a table. The table’s data
is actually stored in the order of the clustered index. There can only be one clustered index per
table, as the data rows themselves can only be sorted one way.
• Structure: The data rows of the table are stored in the index itself. So, the clustered index is
essentially the table’s data organized in a specific order.
• Performance: Since the data is sorted according to the index, retrieving data using the clustered
index is very fast for range queries (e.g., between two values) and exact matches.
2. Non-Clustered Index:
• Definition: A non-clustered index is a separate structure from the data table. It contains pointers to
the data rows, and the data itself is not stored in the same order as the index. There can be
multiple non-clustered indexes on a table.
• Structure: A non-clustered index contains a sorted list of the indexed column(s) and a reference
(pointer) to the corresponding data rows in the table. The actual table data is stored independently
of the index.
• Performance: Non-clustered indexes are useful for lookups and specific column searches but can be
slower for range queries, because the table rows are not stored in index order and each matching
index entry may require an extra lookup into the table.
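The effect of a secondary index can be observed in a query plan. This is a rough sketch on SQLite via Python's sqlite3 module, and SQLite does not use the terms clustered and non-clustered: an ordinary SQLite table is stored in order of its INTEGER PRIMARY KEY, which is comparable to a clustered index, and an index created with CREATE INDEX is a separate structure, comparable to a non-clustered index. The table and index names are illustrative, and the exact wording of the plan varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT, email TEXT)"
)
conn.executemany(
    "INSERT INTO customers (name, email) VALUES (?, ?)",
    [(f"user{i}", f"user{i}@email.com") for i in range(1000)],
)

query = "SELECT name FROM customers WHERE email = 'user500@email.com'"


def plan(sql):
    # The last column of each EXPLAIN QUERY PLAN row is a readable description.
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))


plan_before = plan(query)  # expected to describe a full table scan
conn.execute("CREATE INDEX idx_customers_email ON customers (email)")
plan_after = plan(query)   # expected to mention the new index

print(plan_before)
print(plan_after)
```

Before the index exists the database has to scan the whole table; afterwards the plan refers to idx_customers_email, showing that the lookup goes through the separate index structure.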
Key Points: