Database Concepts and Interview Questions
SQL JOIN
A JOIN clause is used to combine rows from two or more tables, based on a related column
between them.
Let's look at a selection from the "Orders" table:
OrderID CustomerID OrderDate
10308 2 1996-09-18
10309 37 1996-09-19
10310 77 1996-09-20
Notice that the "CustomerID" column in the "Orders" table refers to the "CustomerID" in the
"Customers" table. The relationship between the two tables above is the "CustomerID" column.
Then, we can create the following SQL statement (that contains an INNER JOIN), that selects
records that have matching values in both tables:
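Example
SELECT Orders.OrderID, Customers.CustomerName, Orders.OrderDate
FROM Orders
INNER JOIN Customers ON Orders.CustomerID=Customers.CustomerID;
Or, using the older comma-style join:
SELECT Orders.OrderID, Customers.CustomerName, Orders.OrderDate
FROM Orders, Customers
WHERE Orders.CustomerID=Customers.CustomerID;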
M Muzammal Murtaza - (PUCIT Fall 16)
● (INNER) JOIN: Returns records that have matching values in both tables
SELECT Orders.OrderID, Customers.CustomerName, Orders.OrderDate
FROM Orders, Customers
WHERE Orders.CustomerID=Customers.CustomerID;
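● LEFT (OUTER) JOIN: Returns all records from the left table, and the matched records
from the right table. The example uses Oracle's legacy (+) operator, which marks the
optional side; the ANSI form is FROM Orders LEFT JOIN Customers ON
Orders.CustomerID=Customers.CustomerID.
SELECT Orders.OrderID, Customers.CustomerName, Orders.OrderDate
FROM Orders, Customers
WHERE Orders.CustomerID=Customers.CustomerID(+);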
● RIGHT (OUTER) JOIN: Returns all records from the right table, and the matched
records from the left table
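● FULL (OUTER) JOIN: Returns all records from both tables, pairing rows where a match
exists and filling the missing side with NULLs where it does not
SELECT Orders.OrderID, Customers.CustomerName, Orders.OrderDate
FROM Orders
FULL OUTER JOIN Customers ON Orders.CustomerID=Customers.CustomerID;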
● SELF JOIN: A join in which a table is joined with itself (also called a unary
relationship).
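The difference between an INNER and a LEFT join can be checked with a small sketch using Python's built-in sqlite3 module. The Orders rows are the ones listed earlier; the Customers rows and their names are made up for illustration (the notes do not show them), and customer 77 is deliberately left out so that one order has no match.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, CustomerName TEXT);
    CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, CustomerID INTEGER, OrderDate TEXT);
    -- Customer names are hypothetical; customer 77 is intentionally missing.
    INSERT INTO Customers VALUES (2, 'Customer A'), (37, 'Customer B');
    INSERT INTO Orders VALUES
        (10308, 2, '1996-09-18'),
        (10309, 37, '1996-09-19'),
        (10310, 77, '1996-09-20');
""")

# INNER JOIN: only orders that have a matching customer.
inner_rows = conn.execute("""
    SELECT Orders.OrderID, Customers.CustomerName
    FROM Orders
    INNER JOIN Customers ON Orders.CustomerID = Customers.CustomerID
    ORDER BY Orders.OrderID
""").fetchall()

# LEFT JOIN: every order; CustomerName is NULL (None) where nothing matches.
left_rows = conn.execute("""
    SELECT Orders.OrderID, Customers.CustomerName
    FROM Orders
    LEFT JOIN Customers ON Orders.CustomerID = Customers.CustomerID
    ORDER BY Orders.OrderID
""").fetchall()

print(inner_rows)
print(left_rows)
```

Order 10310 disappears from the inner join and comes back with an empty customer name in the left join.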
Normalization in database
Database Normalization is a technique of organizing the data in the database.
Normalization is a systematic approach of decomposing tables to eliminate data
redundancy (repetition) and undesirable characteristics like insertion, update, and
deletion anomalies. It is a multi-step process that puts data into tabular form, removing
duplicated data from the relation tables.
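Normalization Rule
Normalization rules are divided into the following normal forms:
1. First Normal Form (1NF)
2. Second Normal Form (2NF)
3. Third Normal Form (3NF)
4. BCNF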
What is partial dependency? Before answering that, let's first understand what a
dependency in a table is.
What is Dependency?
In the Student table, student_id is the primary key and is unique for every row, so we
can use student_id to fetch any row of data from this table.
Even when two students have the same name, we can fetch the correct record as long as
we know the student_id.
Hence we can say a Primary Key for a table is the column or group of
columns (composite key) which can uniquely identify each record in the table.
I can ask for the branch name of a student with student_id 10, and I can get it. Similarly,
if I ask for the name of a student with student_id 10 or 11, I will get it. So all I need is
student_id and every other column depends on it, or can be fetched using it.
Now that we know what dependency is, we are in a better state to understand what
partial dependency is.
For a simple table like Student, a single column like student_id can uniquely identify all
the records in a table.
But this is not true all the time. So now let's extend our example to see if more than 1
column together can act as a primary key.
Let's create another table for Subject, which will have subject_id and subject_name fields
and subject_id will be the primary key.
subject_id subject_name
1 Java
2 C++
3 Php
Now we have a Student table with student information and another table Subject for
storing subject information.
Let's create another table Score, to store the marks obtained by students in the
respective subjects. We will also be saving the name of the teacher who teaches that
subject along with marks.
score_id student_id subject_id marks teacher
1 10 1 70 Java Teacher
2 10 2 75 C++ Teacher
3 11 1 80 Java Teacher
In the Score table we save the student_id to know which student the marks belong to,
and the subject_id to know which subject the marks are for.
Together, student_id + subject_id forms a Candidate Key for
this table, which can be the Primary key.
Confused? How can this combination be a primary key?
See, if I ask you to get me marks of a student with student_id 10, can you get it from this
table? No, because you don't know for which subject. And if I give you subject_id, you
would not know for which student. Hence we need student_id + subject_id to uniquely
identify any row.
But where is Partial Dependency?
Now if you look at the Score table, we have a column named teacher which depends
only on the subject: for Java it's Java Teacher, for C++ it's C++ Teacher, and so
on.
As we just discussed, the primary key for this table is a composition of two
columns, student_id and subject_id, but the teacher's name depends only on the
subject, hence on subject_id, and has nothing to do with student_id.
This is Partial Dependency, where an attribute in a table depends on only a part of the
primary key and not on the whole key.
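A dependency such as subject_id → teacher can be tested against sample rows. The sketch below uses the three Score rows from these notes; the helper name holds is made up for illustration. Note that sample data can only disprove a dependency: passing the check means the rows are consistent with it, not that the schema guarantees it.

```python
def holds(rows, determinant, dependent):
    """Return True if the rows never contradict determinant -> dependent."""
    seen = {}
    for row in rows:
        key = tuple(row[c] for c in determinant)
        value = tuple(row[c] for c in dependent)
        # The same determinant value must always map to the same dependent value.
        if seen.setdefault(key, value) != value:
            return False
    return True


score = [
    {"student_id": 10, "subject_id": 1, "marks": 70, "teacher": "Java Teacher"},
    {"student_id": 10, "subject_id": 2, "marks": 75, "teacher": "C++ Teacher"},
    {"student_id": 11, "subject_id": 1, "marks": 80, "teacher": "Java Teacher"},
]

# marks needs the whole key (student_id + subject_id) ...
whole_key_gives_marks = holds(score, ["student_id", "subject_id"], ["marks"])
subject_gives_marks = holds(score, ["subject_id"], ["marks"])
student_gives_marks = holds(score, ["student_id"], ["marks"])
# ... but teacher is determined by subject_id alone: a partial dependency.
subject_gives_teacher = holds(score, ["subject_id"], ["teacher"])

print(whole_key_gives_marks, subject_gives_marks, student_gives_marks, subject_gives_teacher)
```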
There can be many different solutions for this, but our objective is to remove the
teacher's name from the Score table.
The simplest solution is to remove the teacher column from the Score table and add it
to the Subject table. Hence, the Subject table will become:
subject_id subject_name teacher
1 Java Java Teacher
2 C++ C++ Teacher
3 Php Php Teacher
And our Score table is now in the second normal form, with no partial dependency.
score_id student_id subject_id marks
1 10 1 70
2 10 2 75
3 11 1 80
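The decomposition loses no information: joining Score back to Subject recovers the teacher for every mark, and a teacher change now touches a single row. A minimal sqlite3 sketch, assuming the column names score_id, student_id, subject_id, marks, subject_name and teacher, and the replacement teacher name 'New Java Teacher' (which is made up):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Subject (
        subject_id   INTEGER PRIMARY KEY,
        subject_name TEXT,
        teacher      TEXT          -- moved here from Score
    );
    CREATE TABLE Score (
        score_id   INTEGER PRIMARY KEY,
        student_id INTEGER,
        subject_id INTEGER REFERENCES Subject(subject_id),
        marks      INTEGER
    );
    INSERT INTO Subject VALUES (1, 'Java', 'Java Teacher'), (2, 'C++', 'C++ Teacher');
    INSERT INTO Score VALUES (1, 10, 1, 70), (2, 10, 2, 75), (3, 11, 1, 80);
""")

query = """
    SELECT Score.student_id, Score.marks, Subject.teacher
    FROM Score
    JOIN Subject ON Score.subject_id = Subject.subject_id
    ORDER BY Score.score_id
"""
joined = conn.execute(query).fetchall()

# One UPDATE on Subject changes the teacher seen by every Java score row.
conn.execute("UPDATE Subject SET teacher = 'New Java Teacher' WHERE subject_id = 1")
joined_after_update = conn.execute(query).fetchall()

print(joined)
print(joined_after_update)
```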
So let's use the same example, where we have 3 tables, Student, Subject and Score.
Score Table
score_id student_id subject_id marks
1 10 1 70
2 10 2 75
3 11 1 80
In the Score table, we need to store some more information, which is the exam name
and total marks, so let's add 2 more columns to the Score table.
With exam_name and total_marks added to our Score table, it saves more data now.
Primary key for our Score table is a composite key, which means it's made up of two
attributes or columns → student_id + subject_id.
Our new column exam_name depends on both student and subject. For example, a
mechanical engineering student will have a Workshop exam but a computer science
student won't. And for some subjects you have Practical exams and for some you don't.
So we can say that exam_name is dependent on both student_id and subject_id.
And what about our second new column total_marks? Does it depend on our Score
table's primary key?
Well, the column total_marks depends on exam_name, because the total score changes
with the type of exam. For example, practicals carry fewer marks than theory
exams.
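But exam_name is just another column in the Score table. It is not the primary key or even
a part of it, and yet total_marks depends on it. This is a transitive dependency: a
non-prime attribute depends on another non-prime attribute rather than on the primary key.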
Again the solution is very simple. Take out the columns exam_name and total_marks from
Score table and put them in an Exam table and use the exam_id wherever required.
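Score Table: In 3rd Normal Form
The Score table now stores an exam_id in place of exam_name and total_marks.
Exam Table
exam_id exam_name total_marks
1 Workshop 200
2 Mains 70
3 Practicals 30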
Boyce-Codd Normal Form (BCNF) is a stricter version of the Third Normal Form. It
deals with a certain type of anomaly that is not handled by 3NF. A 3NF table which does
not have multiple overlapping candidate keys is said to be in BCNF.
For a table to satisfy the Boyce-Codd Normal Form, it should satisfy the following two
conditions:
1. It should be in the Third Normal Form.
2. For any dependency A → B, A should be a super key.
The second point sounds a bit tricky, right? In simple words, it means that for a
dependency A → B, A cannot be a non-prime attribute if B is a prime attribute.
Below we have a college enrolment table with columns student_id, subject and professor.
student_id subject professor
103 C# P.Cash
As you can see, we have also added some sample data to the table.
● One student can enrol for multiple subjects. For example, the student with
student_id 101 has opted for Java and C++.
● And, there can be multiple professors teaching one subject like we have for Java.
Well, in the table above student_id, subject together form the primary key, because using
student_id and subject, we can find all the columns of the table.
One more important point to note here is that one professor teaches only one subject, but
one subject may have more than one professor.
Hence, there is a dependency between subject and professor here, where subject
depends on the professor's name.
This table satisfies the 1st Normal form because all the values are atomic, column
names are unique and all the values stored in a particular column are of the same
domain.
This table also satisfies the 2nd Normal Form as there is no Partial Dependency.
And, there is no Transitive Dependency, hence the table also satisfies the 3rd Normal
Form.
In the table above, student_id and subject form the primary key, which means the subject
column is a prime attribute.
But there is one more dependency, professor → subject. While subject is a prime
attribute, professor is a non-prime attribute and not a super key, which is not
allowed by BCNF.
To make this relation (table) satisfy BCNF, we will decompose it into two tables:
a Student table and a Professor table.
Student Table
student_id p_id
101 1
101 2
and so on...
Professor Table
p_id professor subject
1 P.Java Java
2 P.Cpp C++
and so on...
And now, this relation satisfies Boyce-Codd Normal Form.
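Why the original enrolment table fails BCNF can be checked mechanically: compute the attribute closure of each dependency's left side and see whether it covers the whole table. The sketch below is one common textbook approach, not something prescribed by these notes; the function names closure and bcnf_violations are made up.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under the functional dependencies fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """Non-trivial dependencies whose left side is not a super key."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and closure(lhs, fds) != set(attributes)
    ]


enrolment = {"student_id", "subject", "professor"}
fds = [
    # The primary key determines the professor.
    (frozenset({"student_id", "subject"}), frozenset({"professor"})),
    # Each professor teaches only one subject.
    (frozenset({"professor"}), frozenset({"subject"})),
]

violations = bcnf_violations(enrolment, fds)
print(violations)
```

Only professor → subject is reported, because the closure of {professor} is {professor, subject}, which does not include student_id.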
Distinct vs unique
The SELECT DISTINCT statement is used to return only distinct (different) values.
Inside a table, a column often contains many duplicate values; and sometimes you only
want to list the different (distinct) values.
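Unique is a keyword used in the CREATE TABLE statement to denote that a field will
contain unique data, usually used for natural keys, foreign keys etc.
For example (the opening of this statement is missing from the source; the table name
and the Emp_SSN column are assumed):
CREATE TABLE Employee (
Emp_SSN numeric NOT NULL UNIQUE,
Emp_FName varchar(16),
Emp_LName varchar(16) )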
Distinct is used in the Select statement to notify the query that you only want the unique
items returned when a field holds data that may not be unique.
You may have many employees with the same last name, but you only want each different
last name.
Obviously if the field you are querying holds unique data, then the Distinct keyword
becomes superfluous.
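The contrast can be seen side by side in a small sqlite3 sketch: UNIQUE rejects a duplicate when data is written, while DISTINCT removes duplicates when data is read. The Emp_SSN column and all the sample values are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Employee (
        Emp_SSN   TEXT NOT NULL UNIQUE,   -- hypothetical column carrying the constraint
        Emp_FName VARCHAR(16),
        Emp_LName VARCHAR(16)
    )
""")
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?)",
    [("111", "Ann", "Smith"), ("222", "Bob", "Smith"), ("333", "Cy", "Jones")],
)

# DISTINCT: two employees share the last name Smith, but it is listed once.
last_names = [
    row[0]
    for row in conn.execute("SELECT DISTINCT Emp_LName FROM Employee ORDER BY Emp_LName")
]

# UNIQUE: a second row with an existing Emp_SSN is refused.
try:
    conn.execute("INSERT INTO Employee VALUES ('111', 'Dee', 'Brown')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(last_names, duplicate_rejected)
```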
These are some important terminologies that are used in terms of relation.
Attribute: Attributes are the properties that define a relation, e.g. ROLL_NO, NAME, etc.
Tuple: Each row in the relation is known as a tuple. The above relation contains 4 tuples.
Degree: The number of attributes in the relation is known as degree of the relation. The
STUDENT relation defined above has degree 5.
Cardinality: The number of tuples in a relation is known as cardinality. The STUDENT relation
defined above has cardinality 4.
Column: Column represents the set of values for a particular attribute. The column ROLL_NO
is extracted from relation STUDENT.
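These terms map directly onto a small data structure. In the sketch below only ROLL_NO and NAME come from the notes; the other three attribute names and every row value are placeholders, chosen so that the relation has degree 5 and cardinality 4 as stated.

```python
# Attribute names: ROLL_NO and NAME are from the text, the rest are assumed.
attributes = ("ROLL_NO", "NAME", "ADDRESS", "PHONE", "AGE")

# Each tuple is one row of the relation (placeholder values).
student = [
    (1, "Student A", "City A", "0000000001", 18),
    (2, "Student B", "City B", "0000000002", 19),
    (3, "Student C", "City C", "0000000003", 20),
    (4, "Student D", "City D", "0000000004", 18),
]

degree = len(attributes)    # number of attributes
cardinality = len(student)  # number of tuples

# A column is the set of values of one attribute across all tuples.
roll_no_column = [row[attributes.index("ROLL_NO")] for row in student]

print(degree, cardinality, roll_no_column)
```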
A DBMS key is an attribute or set of attributes which helps you to identify a row (tuple) in a
relation (table). Keys also allow you to find the relation between two tables. They help you
uniquely identify a row in a table by a combination of one or more columns in that table.
Keys in Database
Super key
Candidate key
Primary key
Alternate key
Composite primary key
Unique key
Foreign key
Compound key
Surrogate key
Alternate key example:
In this table, StudID, Roll No, and Email all qualify to become the primary key. But
since StudID is the primary key, Roll No and Email become alternate keys.
Example: In the given table Stud ID, Roll No, and email are candidate keys
which help us to uniquely identify the student record in the table.
Example:
DeptCode DeptName
001 Science
002 English
005 Computer
In this table, by adding DeptCode as a foreign key in the Teacher table, we can
create a relationship between the two tables.
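A foreign key does more than link the tables: the database refuses rows that point at a department that does not exist. A minimal sqlite3 sketch using the department rows above; the Teacher table's columns and rows are hypothetical, since the notes do not show them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE Department (DeptCode TEXT PRIMARY KEY, DeptName TEXT);
    CREATE TABLE Teacher (
        TeacherID   INTEGER PRIMARY KEY,                  -- hypothetical column
        TeacherName TEXT,                                 -- hypothetical column
        DeptCode    TEXT REFERENCES Department(DeptCode)  -- the foreign key
    );
    INSERT INTO Department VALUES ('001', 'Science'), ('002', 'English'), ('005', 'Computer');
    INSERT INTO Teacher VALUES (1, 'Teacher A', '002');
""")

# The foreign key lets us look up the department of each teacher.
teacher_departments = conn.execute("""
    SELECT Teacher.TeacherName, Department.DeptName
    FROM Teacher
    JOIN Department ON Teacher.DeptCode = Department.DeptCode
""").fetchall()

# A teacher pointing at a non-existent department is rejected.
try:
    conn.execute("INSERT INTO Teacher VALUES (2, 'Teacher B', '999')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(teacher_departments, orphan_rejected)
```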
Example:
In this example, neither OrderNo nor ProductID alone can be a primary key, as neither
uniquely identifies a record. However, a compound key of OrderNo and ProductID
could be used, as together they uniquely identify each record.
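A key made of two columns can be declared as a table-level PRIMARY KEY. In this sqlite3 sketch the table name OrderItems, the Quantity column, and the sample rows are assumptions; only OrderNo and ProductID come from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE OrderItems (              -- hypothetical table name
        OrderNo   INTEGER NOT NULL,
        ProductID INTEGER NOT NULL,
        Quantity  INTEGER,                 -- hypothetical column
        PRIMARY KEY (OrderNo, ProductID)   -- the two columns together form the key
    )
""")

# OrderNo 1 and ProductID 100 each repeat, but every pair is different.
conn.executemany(
    "INSERT INTO OrderItems VALUES (?, ?, ?)",
    [(1, 100, 2), (1, 200, 1), (2, 100, 5)],
)

# Repeating an existing (OrderNo, ProductID) pair violates the key.
try:
    conn.execute("INSERT INTO OrderItems VALUES (1, 100, 9)")
    pair_rejected = False
except sqlite3.IntegrityError:
    pair_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM OrderItems").fetchone()[0]
print(pair_rejected, row_count)
```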
The difference between a compound key and a composite key is that any part of
the compound key can be a foreign key, but the parts of a composite key may or may
not be foreign keys.
The example above shows the shift timings of different employees. In this
example, a surrogate key is needed to uniquely identify each employee.
Both a Primary Key and a Unique Key are used to uniquely identify a row in a table. In SQL
Server, a Primary Key creates a clustered index on the column by default, whereas a Unique Key
creates a nonclustered index. A Primary Key doesn't allow NULL values, whereas a Unique Key
does: SQL Server permits a single NULL, while many other databases permit several.
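NULL handling in unique columns depends on the database, so it is worth testing on the engine in use. The sketch below uses SQLite (an assumption, chosen because it ships with Python), which accepts several NULLs in a UNIQUE column; SQL Server would reject the second one. The Student table and its values are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Student (
        StudID INTEGER PRIMARY KEY,
        Email  TEXT UNIQUE
    )
""")
conn.execute("INSERT INTO Student VALUES (1, 'a@example.com')")
conn.execute("INSERT INTO Student VALUES (2, NULL)")
conn.execute("INSERT INTO Student VALUES (3, NULL)")  # SQLite accepts a second NULL

null_emails = conn.execute(
    "SELECT COUNT(*) FROM Student WHERE Email IS NULL"
).fetchone()[0]


def rejected(sql):
    """Return True if the statement violates a key constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


duplicate_email_rejected = rejected("INSERT INTO Student VALUES (4, 'a@example.com')")
duplicate_id_rejected = rejected("INSERT INTO Student VALUES (1, 'b@example.com')")

print(null_emails, duplicate_email_rejected, duplicate_id_rejected)
```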
SQL commands
SQL commands are mainly categorized into five categories as:
1. DDL – Data Definition Language
2. DQL – Data Query Language
3. DML – Data Manipulation Language
4. DCL – Data Control Language
5. TCL – Transaction Control Language.
● The DROP command removes a table from the database. All the table's rows, indexes, and
privileges will also be removed. In databases where DDL commits implicitly (for example
Oracle and MySQL), the operation cannot be rolled back.
● DROP and TRUNCATE are DDL commands, whereas DELETE is a DML command.
● DELETE operations can be rolled back (undone) before commit, while DROP and TRUNCATE
operations cannot be rolled back in those databases.
● TRUNCATE reinitializes the identity by changing the data definition, which is why it is DDL,
whereas DELETE only removes records from the table and makes no change to its
definition, which is why it is DML.
● The TRUNCATE TABLE statement effectively drops and re-creates the table, so that any
auto-increment value is reset to its start value, which is generally 1.
● DELETE lets you filter which rows are deleted with an optional WHERE clause, whereas
TRUNCATE TABLE doesn't support a WHERE clause; it just removes all the rows.
● TRUNCATE TABLE is faster and uses fewer system resources than DELETE, because DELETE scans
the table to generate a count of the affected rows, then deletes the rows one by one and
records an entry in the database log for each deleted row, while TRUNCATE TABLE just deletes all
the rows without providing any additional information.
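The DML side of this comparison can be tried in SQLite through Python's sqlite3 module. SQLite has no TRUNCATE TABLE statement, so the sketch only shows DELETE: it filters with WHERE, reports the affected row count, and can be rolled back before commit. The Books table and its rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Books (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO Books (name) VALUES (?)", [("A",), ("B",), ("C",)])
conn.commit()


def count_books():
    return conn.execute("SELECT COUNT(*) FROM Books").fetchone()[0]


# DELETE removes only the rows matched by the WHERE clause ...
cursor = conn.execute("DELETE FROM Books WHERE name = 'A'")
deleted = cursor.rowcount       # ... and reports how many rows were affected
after_delete = count_books()

# DELETE is DML, so the uncommitted change can be undone.
conn.rollback()
after_rollback = count_books()

print(deleted, after_delete, after_rollback)
```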
Transaction
Transactions group a set of tasks into a single execution unit. Each transaction begins with a
specific task and ends when all the tasks in the group successfully complete. If any of the tasks
fail, the transaction fails. Therefore, a transaction has only two results: success or failure.
Incomplete steps result in the failure of the transaction. A database transaction, by definition,
must be atomic, consistent, isolated and durable. These are popularly known as ACID
properties.
What are ACID properties?
Atomicity − This property states that a transaction must be treated as an atomic unit,
that is, either all of its operations are executed or none. There must be no state in a
database where a transaction is left partially completed. States should be defined either
before the execution of the transaction or after the execution/abortion/failure of the
transaction.
Consistency − The database must remain in a consistent state after any transaction.
No transaction should have any adverse effect on the data residing in the database. If
the database was in a consistent state before the execution of a transaction, it must
remain consistent after the execution of the transaction as well.
Isolation − In a database system where more than one transaction is being executed
simultaneously and in parallel, the property of isolation states that each transaction
will be carried out and executed as if it were the only transaction in the system. No
transaction will affect the existence of any other transaction.
Durability − The database should be durable enough to hold all its latest updates even
if the system fails or restarts. If a transaction updates a chunk of data in a database and
commits, then the database will hold the modified data. If a transaction commits but the
system fails before the data could be written on to the disk, then that data will be
updated once the system springs back into action.
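-- Fragment: assumes an earlier BEGIN TRANSACTION AddBook and that @BookCount
-- holds the number of books with the new book's name.
IF @BookCount > 1
BEGIN
ROLLBACK TRANSACTION AddBook
PRINT 'A book with the same name already exists'
END
ELSE
BEGIN
COMMIT TRANSACTION AddBook
PRINT 'New book added successfully'
END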
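The same commit-or-rollback decision can be run end to end with Python's sqlite3 module. This is a sketch of the idea rather than a translation of the T-SQL above: the Books table, the add_book function, and the book names are all assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Books (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO Books (name) VALUES ('Book15')")
conn.commit()


def add_book(name):
    """Insert a book, undoing the insert if the name already exists."""
    # sqlite3 opens a transaction implicitly before the INSERT.
    conn.execute("INSERT INTO Books (name) VALUES (?)", (name,))
    book_count = conn.execute(
        "SELECT COUNT(*) FROM Books WHERE name = ?", (name,)
    ).fetchone()[0]
    if book_count > 1:
        conn.rollback()   # the insert is discarded, leaving the table unchanged
        return "A book with the same name already exists"
    conn.commit()         # the insert becomes permanent
    return "New book added successfully"


first = add_book("Book15")    # duplicate name: rolled back
second = add_book("Book16")   # new name: committed
total = conn.execute("SELECT COUNT(*) FROM Books").fetchone()[0]

print(first, second, total)
```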
SQL Views
Views in SQL are a kind of virtual table. A view also has rows and columns as they are in a real
table in the database. We can create a view by selecting fields from one or more tables present
in the database. A View can either have all the rows of a table or specific rows based on certain
conditions.
Views account for logical data independence as the growth and restructuring of base tables are
not reflected in views.
Advantages of Views:
● As there is no physical location where the data in an ordinary (non-materialized) view
is stored, it generates output without using extra storage.
● Data access can be restricted: a view exposes only selected rows and columns, and many
views do not allow commands like insertion, update, and deletion.
Disadvantages of Views:
● The view becomes invalid if we drop a table related to that view.
● Much storage space is occupied when a materialized view is created for large tables.
To see the data in the View, we can query the view in the same manner as we query a table.
SELECT * FROM DetailsView;
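A view stores a query, not rows, so it always reflects the current contents of its base table. The sqlite3 sketch below assumes a StudentDetails base table with made-up columns and rows; only the name DetailsView comes from the notes. In SQLite, views are read-only, which also illustrates the restriction on insertion.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE StudentDetails (S_ID INTEGER PRIMARY KEY, NAME TEXT, ADDRESS TEXT);
    INSERT INTO StudentDetails VALUES
        (1, 'Student A', 'City A'),
        (2, 'Student B', 'City B'),
        (5, 'Student C', 'City C');
    -- The view exposes two columns and only the rows with S_ID below 5.
    CREATE VIEW DetailsView AS
        SELECT NAME, ADDRESS FROM StudentDetails WHERE S_ID < 5;
""")

before = conn.execute("SELECT * FROM DetailsView ORDER BY NAME").fetchall()

# A new base-table row that satisfies the condition appears in the view at once.
conn.execute("INSERT INTO StudentDetails VALUES (3, 'Student D', 'City D')")
after = conn.execute("SELECT * FROM DetailsView ORDER BY NAME").fetchall()

# SQLite does not allow writing through a view.
try:
    conn.execute("INSERT INTO DetailsView VALUES ('X', 'Y')")
    view_is_writable = True
except sqlite3.Error:
    view_is_writable = False

print(before, after, view_is_writable)
```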
5. [for each row]: This specifies a row-level trigger, i.e., the trigger will be executed
for each row being affected.
6. [trigger_body]: This provides the operation to be performed when the trigger is fired.
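A row-level trigger can be tried out in SQLite through Python's sqlite3 module. In this sketch the tables Books and AuditLog and the trigger name log_book_insert are made up; the trigger body runs once for each inserted row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Books (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE AuditLog (book_id INTEGER, action TEXT);

    CREATE TRIGGER log_book_insert
    AFTER INSERT ON Books
    FOR EACH ROW                  -- row-level: fires once per affected row
    BEGIN
        -- trigger body: NEW refers to the row that was just inserted
        INSERT INTO AuditLog VALUES (NEW.id, 'insert');
    END;
""")

conn.executemany("INSERT INTO Books (name) VALUES (?)", [("Book A",), ("Book B",)])

log = conn.execute("SELECT book_id, action FROM AuditLog ORDER BY book_id").fetchall()
print(log)
```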
https://www.geeksforgeeks.org/sql-difference-between-functions-and-stored-procedures-in-pl-sql/
Indexes
Indexes are special lookup tables that the database search engine can use to speed
up data retrieval. Simply put, an index is a pointer to data in a table. An index in a
database is very similar to an index in the back of a book.
For example, if you want to reference all pages in a book that discuss a certain topic,
you first refer to the index, which lists all the topics alphabetically, and you are then
referred to one or more specific page numbers.
An index helps to speed up SELECT queries and WHERE clauses, but it slows down
data input, with the UPDATE and the INSERT statements. Indexes can be created or
dropped with no effect on the data.
Creating an index involves the CREATE INDEX statement, which allows you to name
the index, to specify the table and which column or columns to index, and to indicate
whether the index is in an ascending or descending order.
Indexes can also be unique, like the UNIQUE constraint, in that the index prevents
duplicate entries in the column or combination of columns on which there is an index.
The CREATE INDEX Command
The basic syntax of a CREATE INDEX is as follows.
CREATE INDEX index_name ON table_name (column_name);
Single-Column Indexes
A single-column index is created based on only one table column. The basic syntax is
as follows.
CREATE INDEX index_name
ON table_name (column_name);
Unique Indexes
Unique indexes are used not only for performance, but also for data integrity. A unique
index does not allow any duplicate values to be inserted into the table. The basic
syntax is as follows.
CREATE UNIQUE INDEX index_name
ON table_name (column_name);
Composite Indexes
A composite index is an index on two or more columns of a table. Its basic syntax is as
follows.
CREATE INDEX index_name
ON table_name (column1, column2);
Implicit Indexes
Implicit indexes are indexes that are automatically created by the database server
when an object is created. Indexes are automatically created for primary key
constraints and unique constraints.
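The index types above can be exercised in SQLite through Python's sqlite3 module. The Employee table, its columns, and the index names are assumptions for illustration. EXPLAIN QUERY PLAN is SQLite-specific and its wording varies between versions, so the sketch only checks that the index name appears in the plan.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (
        id INTEGER PRIMARY KEY, email TEXT, last_name TEXT, first_name TEXT
    );
    CREATE UNIQUE INDEX idx_employee_email ON Employee (email);          -- unique index
    CREATE INDEX idx_employee_name ON Employee (last_name, first_name);  -- composite index
    INSERT INTO Employee VALUES (1, 'a@example.com', 'Smith', 'Ann');
""")

# Ask SQLite how it would run a lookup on the leading column of the composite index.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT id FROM Employee WHERE last_name = 'Smith'"
).fetchall()
uses_name_index = any("idx_employee_name" in row[-1] for row in plan)

# The unique index also enforces data integrity.
try:
    conn.execute("INSERT INTO Employee VALUES (2, 'a@example.com', 'Jones', 'Bob')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Dropping an index has no effect on the data itself.
conn.execute("DROP INDEX idx_employee_name")
rows_after_drop = conn.execute("SELECT COUNT(*) FROM Employee").fetchone()[0]

print(uses_name_index, duplicate_rejected, rows_after_drop)
```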
Advantages of indexes
Disadvantages of indexes
https://www.sqlshack.com/what-is-the-difference-between-clustered-and-non-clustered-indexes-in-sql-server/