DBMS Tutorial – Database system notes-combined

Uploaded by

francepv24


DBMS Tutorial – Database Management System notes
LAST UPDATED: OCTOBER 1, 2022 BY CHAITANYA SINGH | FILED UNDER: DBMS

DBMS stands for Database Management System. The term breaks down as DBMS = Database +
Management System. A database is a collection of data, and a management system is a set of programs
to store and retrieve that data. Based on this, we can define a DBMS as a collection of
inter-related data together with a set of programs to store and access that data in an easy and
effective manner. Here are the DBMS notes to help you learn database systems in a systematic
manner. Happy Learning!!

DBMS Tutorial Index

DBMS Introduction:

Introduction to DBMS
Types of DBMS
DBMS Applications
Advantages of DBMS over file processing system
DBMS vs RDBMS
DBMS Architecture
Three level DBMS Architecture
View of Data
Data Abstraction
Instances and Schemas
DBMS languages

Data Models:

Data Models in DBMS

E-R Model in DBMS
ER Design issues
ER To table conversion
Recursive relationship in ER diagram
DBMS Generalization
DBMS Specialization
DBMS Aggregation
Relational Model in DBMS
Hierarchical data Model in DBMS
Network Model in DBMS

Relational Database:

RDBMS Concepts
Relational Algebra
Relational Calculus
View vs table
Keys in DBMS
Primary key
Super key
Candidate key
Alternate key
Composite key
Foreign key
Constraints in DBMS
Domain constraints
Mapping constraints
Cardinality in DBMS
Functional dependencies in DBMS
Trivial functional dependency
non-trivial functional dependency
Multivalued dependency
Transitive dependency
Normalization in dbms – This covers all the normal forms: First Normal Form(1NF), Second
Normal Form(2NF), Third Normal Form(3NF), Boyce–Codd Normal Form(BCNF)
Denormalization in DBMS
Denormalization vs Normalization
Decomposition in DBMS

Transaction Management:

Transaction Management in DBMS

ACID Properties
Transaction States
DBMS Schedules
Serializability
DBMS Conflict Serializability
DBMS View Serializability
Recoverability of Schedule
Failure classification
Log based recovery
DBMS checkpoint
Deadlock
Starvation in DBMS

Concurrency Control:

Concurrency Control
Lock based protocol
Timestamp based protocol
Validation based protocol

File Organization:

File Organization in DBMS

Sequential File Organization
Heap File Organization
Hash File Organization
DBMS ISAM
B+ File Organization
Cluster File Organization
Data replication in DBMS
Indexing in DBMS

SQL Introduction:

SQL Introduction
Characteristics of SQL
Advantages of SQL
SQL commands
SQL operators
SQL CREATE TABLE statement
SQL DROP TABLE statement
SQL SELECT statement
SQL INSERT statement

What is a Database?
A database is a collection of interrelated data, stored in such a way that a user can read, insert,
update and delete the data efficiently.

Database systems are primarily developed for large amounts of data. When dealing with a huge amount
of data, two things require optimization: storage of data and retrieval of data.

Storage: According to the principles of database systems, the data is stored in such a way that it
occupies far less space, because redundant (duplicate) data is removed before storage.

Fast Retrieval of data: Along with storing the data in an optimized and systematic manner, it is also
important that we retrieve the data quickly when needed. Database systems are designed to retrieve
the data as quickly as possible.

Database Management System (DBMS)

A DBMS is software that manages data for efficient storage and fast retrieval. MySQL, IBM Db2,
Oracle, PostgreSQL etc. are all DBMS software.
DBMS is used in various applications such as telecom, banking, sales, airlines, education, online
shopping etc.

A DBMS also secures the data from unauthorised access as well as from corrupt data insertions. It
allows multiple users to access data simultaneously while maintaining data consistency and data
integrity.

A DBMS allows the following operations to the authorized users of the database:

Data Modification: A DBMS allows users to insert, update and delete the data in the tables. These
tables contain rows and columns, where a row represents a record of data while a column represents
an attribute of the records.

Data Retrieval: A DBMS allows users to fetch data from the database.

Characteristics of DBMS
Stores the data in such a way that the relationships between data are maintained in the database.
Allows fast retrieval.
It can handle multiple users accessing the database at the same time.
It maintains data integrity by following the ACID properties.
It provides data security by managing user access.
DBMS allows automatic backup of the database to handle accidental corruption or deletion of data.
It allows scaling of the database as per the need.
It allows data operations to be rolled back or redone in case of a failure.

Advantages of DBMS
Handles data redundancy: The major disadvantage of a file based system of storing data is data
redundancy, where the same data is stored in multiple files. A DBMS reduces data redundancy to
manage the storage space efficiently.
Data sharing: A DBMS allows data to be shared efficiently between multiple users of the same
organization.
Data Maintenance: A DBMS supports regular data checks and automatic backups.
Performance: Provides better performance for operations such as reading, inserting, updating and
deleting data.
Backup: It maintains a backup of the database so that in case of a failure, the database can be
recovered to the previous state using the backup.
Multiple users: It allows multiple users to access the data at the same time.

Disadvantages of DBMS
Hardware and Software Cost: Although a DBMS has several advantages over the file system approach to
data management, all this comes at a cost. A DBMS needs dedicated hardware and software to manage
the database.
Needs large storage: A DBMS is usually used in large organisations that need to store large
amounts of data.
Complexity: A database management system is complex and not easy to implement.
Requires learning: In order to manage a database, users need to learn the concepts of DBMS, which
requires additional time and resources that an organization has to bear.


About the Author

I have 15 years of experience in the IT industry, working with renowned multinational corporations.
Additionally, I have dedicated over a decade to teaching, allowing me to refine my skills in delivering
information in a simple and easily understandable manner.

– Chaitanya

Comments

b.ankammarao says
SEPTEMBER 3, 2015 AT 10:00 AM

sir please provide the First Normal Form(1NF)

Second Normal Form(2NF)
Third Normal Form(3NF)
Boyce–Codd Normal Form(BCNF)
Transaction Management in DBMS

Chaitanya Singh says

SEPTEMBER 3, 2017 AT 5:52 AM

I have already covered all the normal forms in the “Normalization in DBMS” topic and
the transaction management link has been added above.

Umar ibrahim says

JANUARY 22, 2017 AT 12:35 PM

Very good provides all information about DBMS

Copyright © 2012 – 2024 BeginnersBook . Privacy Policy . Sitemap

Introduction to DBMS
LAST UPDATED: JULY 24, 2022 BY CHAITANYA SINGH | FILED UNDER: DBMS

What is DBMS?
A DBMS is software that is used to manage data. Some of the popular DBMS software products are
MySQL, IBM Db2, Oracle, PostgreSQL etc.
A DBMS provides an interface to the user so that operations on the database can be performed
through that interface.
A DBMS secures the data, which is a main advantage of a DBMS over a file system.
A DBMS also secures the data from unauthorised access as well as from corrupt data insertions. It
allows multiple users to access data simultaneously while maintaining data consistency and
data integrity.

A DBMS allows the following operations to the authorized users of the database:

Data Definition: Creating a table, creating a table schema, removing a table definition etc. come
under data definition. It is basically the layout of the tables and their relations with the other
tables in the database. This allows the data to be structured in such a way that data which is
related to or dependent on other data in the real world is represented the same way in the database.

Data Retrieval: A DBMS allows users to fetch data from the database. Searching and retrieval of data
is fast in a DBMS. With suitable indexes, the size of the database has far less impact on this
operation, whereas in a file system the size of the data can hugely impact the efficiency of a search.

User administration: A DBMS also allows user management, such as organizing users in different
groups with different access levels, granting users access to certain tables in the database, and
revoking access from certain users. This allows the admin of the database to efficiently manage
access to the database and prevent unauthorised access.

What is the need of DBMS?

Database systems are primarily developed for large amounts of data. When dealing with a huge amount
of data, two things require optimization: storage of data and retrieval of data.

Storage: According to the principles of database systems, the data is stored in such a way that it
occupies far less space, because redundant (duplicate) data is removed before storage. Let’s
take a simple example to understand this:
In a banking system, suppose a customer has two accounts: a saving account and a salary account.
Let’s say the bank stores saving account data in one place (these places are called tables; we will
learn about them later) and salary account data in another place. If customer information such as
the customer name, address etc. is stored in both places, this is a waste of storage
(redundancy, or duplication of data). To organize the data in a better way, the information
should be stored in one place and both accounts should be linked to that information.
This is what we achieve with a DBMS.
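
The banking example can be sketched with Python's built-in sqlite3 module. This is only an illustration under assumed names: the customer and account tables, their columns and the sample values are hypothetical and do not come from any real banking schema. The customer details live in one row, and both accounts point to that row by id.

```python
import sqlite3

# Hypothetical schema: customer details are stored once and each account
# refers to the customer by id instead of repeating the name and address.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL,
        address     TEXT NOT NULL
    );
    CREATE TABLE account (
        account_id   INTEGER PRIMARY KEY,
        customer_id  INTEGER NOT NULL REFERENCES customer(customer_id),
        account_type TEXT NOT NULL
    );
    INSERT INTO customer VALUES (1, 'Steve', 'Noida');
    INSERT INTO account VALUES (5001, 1, 'saving');
    INSERT INTO account VALUES (5002, 1, 'salary');
""")

# The address is changed in exactly one row...
conn.execute("UPDATE customer SET address = 'Pune' WHERE customer_id = 1")

# ...and both accounts see the new value through the join.
rows = conn.execute("""
    SELECT a.account_type, c.name, c.address
    FROM account AS a
    JOIN customer AS c ON c.customer_id = a.customer_id
    ORDER BY a.account_id
""").fetchall()
print(rows)
```

Because the address is stored once, there is no second copy that could be left out of date.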

Purpose of Database Systems

The main purpose of database systems is to manage data. Consider a university that keeps the
data of students, teachers, courses, books etc. To manage this data we need to store it
somewhere we can add new data, delete unused data, update outdated data and retrieve data. To
perform these operations we need a database management system that stores the data in such a way
that all of these operations can be performed efficiently.

Database systems are much better than traditional file processing systems, which we have discussed
in a separate article: DBMS vs File System.


Suranjith Nishalaka Ranasinghe says

FEBRUARY 19, 2017 AT 5:11 AM

Actually, this tutorial is so great! I have looked through so many sites, but this is definitely
the best one. thanks too much..

Punyabrata Rath says

MARCH 29, 2017 AT 7:11 AM
Everything has been explained in such a simple way that it becomes convenient for
everyone to read and understand…

jeff says
APRIL 29, 2017 AT 8:41 PM

Excellent presentation so easy to understand


Types of DBMS (Database Management System)

LAST UPDATED: AUGUST 16, 2022 BY CHAITANYA SINGH | FILED UNDER: DBMS

A DBMS is software that manages data for efficient storage and fast retrieval from the
database. MySQL, IBM Db2, Oracle, PostgreSQL etc. are all DBMS software. In
this guide, you will learn the various types of DBMS (Database Management System).

Types of DBMS
There are 4 types of DBMS:
1. Relational Database Management System (RDBMS)
2. Object Oriented Database Management System.
3. Hierarchical Database management system.
4. Network Database management system.

1. Relational Database Management System

In an RDBMS, data is stored in tables, in the form of rows and columns, where a row represents a
record of the data and a column represents an attribute of the record.

For example, a student table stores the records of various students. A row of this table represents
the record of a single student and the columns represent the attributes of the record such as
student id, name, age, address etc.
Student table

ID   Name       Age  Address
---  ---------  ---  -------
101  Ajeet      28   Delhi
102  Chaitanya  32   Noida
103  Hari       31   Pune
104  Rahul      30   Agra
105  Steve      35   Noida

We use SQL to manage, organize and perform various operations on RDBMS.

Examples of RDBMS: MySQL, Oracle, DB2 etc.
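
As a small sketch of how SQL is used on such a table, the Student table above can be created and queried with Python's built-in sqlite3 module. SQLite is used here only because it ships with Python; the table name student and the lowercase column names are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One column per attribute of a student record.
conn.execute(
    "CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT, age INTEGER, address TEXT)"
)

# One row per student, using the rows of the Student table above.
students = [
    (101, "Ajeet", 28, "Delhi"),
    (102, "Chaitanya", 32, "Noida"),
    (103, "Hari", 31, "Pune"),
    (104, "Rahul", 30, "Agra"),
    (105, "Steve", 35, "Noida"),
]
conn.executemany("INSERT INTO student VALUES (?, ?, ?, ?)", students)

# Fetch the id and name of every student whose address is Noida.
noida = conn.execute(
    "SELECT id, name FROM student WHERE address = ? ORDER BY id", ("Noida",)
).fetchall()
print(noida)  # [(102, 'Chaitanya'), (105, 'Steve')]
```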

2. Object Oriented Database Management System

Data is stored as objects, attributes and methods. It typically stores and manages objects directly on
the database server’s disk. There are no tables, no rows, no columns, no foreign keys. There are only
objects.

Elements of Object Oriented Database:

Object: It is a combination of data and its behaviour (commonly referred to as methods).

For example: A house is an object. An object has two characteristics: state and behaviour.

In this example, where “House” is the object, the state of “House” is its address, color, area etc.
and its behaviour is open main door, close main door etc.

[Diagram: object oriented database]
To read more about object oriented programming, refer to this guide.
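
The “House” example can be written as a small Python class to show state and behaviour living together in one object. This is a plain in-memory object, not an object oriented database; the field names, the area unit and the sample address are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class House:
    # State: the data the object holds (hypothetical field names).
    address: str
    color: str
    area_sq_m: float
    main_door_open: bool = False

    # Behaviour: the methods that act on that state.
    def open_main_door(self) -> None:
        self.main_door_open = True

    def close_main_door(self) -> None:
        self.main_door_open = False


house = House(address="12 Example Street", color="white", area_sq_m=120.0)
house.open_main_door()
print(house)
```

An object oriented DBMS stores objects like this one directly, rather than splitting them into rows and columns.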

3. Hierarchical Database Management System
In a hierarchical database management system, data is stored in the form of one-to-many
relationships. You can visualize it as a tree in which a root node is attached to several descendant
nodes. Each node has exactly one parent, and nodes without children are called leaves.

Examples of hierarchical database systems are IMS by IBM and the Windows registry by Microsoft.

For example: To store the data of an organization, the root node is the organization itself. The
immediate child nodes are Employees, Managers and Directors. These child nodes can have further
child nodes; for example, Employees can have the child nodes Engineers, Housekeeping staff, System
admin etc.

[Diagram: hierarchical database tree for an organization]
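
The organization example can be sketched as a nested Python dictionary, where each node maps to its children and a node with no children is a leaf. This is only a model of the tree shape, not how a hierarchical DBMS stores data internally; the helper name paths is made up for the example.

```python
# Each node maps to its children; an empty dict marks a leaf.
organization = {
    "Organization": {
        "Employees": {
            "Engineers": {},
            "Housekeeping staff": {},
            "System admin": {},
        },
        "Managers": {},
        "Directors": {},
    }
}


def paths(tree, prefix=()):
    """Yield the path from the root to every node, parents before children."""
    for name, children in tree.items():
        path = prefix + (name,)
        yield path
        yield from paths(children, path)


all_paths = list(paths(organization))
for path in all_paths:
    print(" > ".join(path))
```

Every node appears on exactly one path from the root, which reflects the rule that a node in a hierarchy has a single parent.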

4. Network Database Management System
A network database management system (network DBMS) is based on the network data model, which
allows each record to be related to multiple primary (parent) records and multiple secondary
(child) records.

A network database builds on the traditional hierarchical database, except that it allows each
object to have multiple parents instead of a single parent. This means data in a network database
can have many-to-many relationships in addition to one-to-one and one-to-many relationships.


Database Applications – DBMS

LAST UPDATED: JULY 24, 2022 BY CHAITANYA SINGH | FILED UNDER: DBMS

In this guide, you will learn the various DBMS applications. These applications help you understand
the use of DBMS in various fields.

DBMS applications
Applications where we use Database Management Systems are:

Telecom: A database keeps track of the information regarding calls made, network
usage, customer details etc. Without database systems it is hard to maintain such a huge
amount of data that keeps updating every millisecond.
Industry: Whether it is a manufacturing unit, warehouse or distribution centre, each one needs a
database to keep records of what comes in and what goes out. For example, a distribution centre
should keep track of the product units that are supplied to the centre as well as the products that
are delivered out from the centre each day; this is where DBMS comes into the picture.
Banking System: For storing customer info, tracking day to day credit and debit transactions,
generating bank statements etc. All this work is done with the help of database
management systems. Also, a banking system needs data security as the data is sensitive; a DBMS
provides access controls to support this.
Sales: To store customer information, production information and invoice details. Using a DBMS,
you can track, manage and generate historical data to analyse sales.
Airlines: To travel by air, we make early reservations; this reservation information along
with the flight schedule is stored in a database. This is where real-time update of data is
necessary, as a flight seat reserved for one passenger should not be allocated to another passenger.
A DBMS handles this well because the data updates are fast and applied in real time.
Education sector: Database systems are frequently used in schools and colleges to store and
retrieve the data regarding student details, staff details, course details, exam details, payroll data,
attendance details, fees details etc. There is a large amount of inter-related data that needs to be
stored and retrieved in an efficient manner.
Online shopping: You must be aware of online shopping websites such as Amazon, Flipkart
etc. These sites store the product information, your addresses and preferences, and credit details,
and provide you the relevant list of products based on your query. All this involves a database
management system. Along with managing the vast catalogue of items, there is a need to
secure the user’s private information such as bank and card details. All this is taken care of by
database management systems.
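
The airline case above (one seat must never be given to two passengers) can be sketched with a uniqueness constraint. This is a minimal illustration using Python's built-in sqlite3 module; the reservation table, the flight number and the reserve helper are hypothetical and are not taken from any real reservation system.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE reservation (
        flight_no TEXT NOT NULL,
        seat      TEXT NOT NULL,
        passenger TEXT NOT NULL,
        UNIQUE (flight_no, seat)   -- a seat on a flight can be booked only once
    )
""")


def reserve(flight_no, seat, passenger):
    """Return True if the seat was reserved, False if it was already taken."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO reservation VALUES (?, ?, ?)",
                (flight_no, seat, passenger),
            )
        return True
    except sqlite3.IntegrityError:
        return False


first = reserve("XY101", "12A", "Ajeet")
second = reserve("XY101", "12A", "Rahul")
print(first, second)  # True False
```

The database itself rejects the second booking, so the rule holds no matter which application tries to insert the row.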

I have mentioned only a few applications. This list could go on, as almost every field where
data needs to be managed uses a DBMS nowadays. The traditional file system is used only
where the data size is very small.


Advantages and Disadvantages of DBMS: DBMS vs File System
LAST UPDATED: JULY 24, 2022 BY CHAITANYA SINGH | FILED UNDER: DBMS

In this guide, you will learn the advantages and disadvantages of DBMS. We will first discuss the
drawbacks of a file processing system and then how database management systems improve on it.

Drawbacks of File system

Data redundancy: Data redundancy refers to the duplication of data. Let’s say we are managing
the data of a college where a student is enrolled for two courses; the same student details in
such a case will be stored twice, which takes more storage than needed. Data redundancy often
leads to higher storage costs and poor access time.
Data inconsistency: Data redundancy leads to data inconsistency. Take the same example
as above: a student is enrolled for two courses and we have the student’s address
stored twice. Now let’s say the student requests a change of address; if the address is changed in
one place and not in all the records, this leads to data inconsistency.
Data Isolation: Because data is scattered across various files, and the files may be in different
formats, writing new application programs to retrieve the appropriate data is difficult.
Dependency on application programs: Changing the files would require changes in the application
programs.
Atomicity issues: Atomicity of a transaction refers to “all or nothing”, which means either all the
operations in a transaction execute or none do.
For example: Let’s say Steve transfers $100 to Negan’s account. This transaction consists of
multiple operations, such as debiting $100 from Steve’s account and crediting $100 to Negan’s
account. Like any other device, a computer system can fail. Let’s say it fails after the first
operation; in that case Steve’s account would have been debited by $100 but the amount would not
have been credited to Negan’s account. In such a case the operation should be rolled back to
maintain the atomicity of the transaction. It is difficult to achieve atomicity in file processing
systems.

Data Security: Data should be secured from unauthorised access. For example, a student in a
college should not be able to see the payroll details of the teachers. Such security
constraints are difficult to apply in file processing systems.
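
For contrast, here is a minimal sketch of the same transfer done through a DBMS transaction, using Python's built-in sqlite3 module. The account table, the opening balances and the fail_after_debit flag (which simulates the crash after the first operation) are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (owner TEXT PRIMARY KEY, balance INTEGER NOT NULL)")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("Steve", 500), ("Negan", 200)])
conn.commit()


def transfer(sender, receiver, amount, fail_after_debit=False):
    """Debit and credit inside one transaction; any error undoes both."""
    with conn:  # commit if the block succeeds, roll back if it raises
        conn.execute(
            "UPDATE account SET balance = balance - ? WHERE owner = ?", (amount, sender)
        )
        if fail_after_debit:
            raise RuntimeError("simulated crash after the debit")
        conn.execute(
            "UPDATE account SET balance = balance + ? WHERE owner = ?", (amount, receiver)
        )


def balances():
    return dict(conn.execute("SELECT owner, balance FROM account"))


try:
    transfer("Steve", "Negan", 100, fail_after_debit=True)
except RuntimeError:
    pass
after_failure = balances()  # the debit was rolled back

transfer("Steve", "Negan", 100)
after_success = balances()  # both operations were applied
print(after_failure, after_success)
```

After the simulated failure both balances are unchanged, which is the “all or nothing” behaviour described above.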

Advantage of DBMS over file system

There are several advantages of a database management system over a file system. A few of them are
as follows:

No redundant data: Redundancy is removed by data normalization. Avoiding data duplication saves
storage and improves access time.
Data Consistency and Integrity: As we discussed earlier, the root cause of data inconsistency is
data redundancy. Since data normalization takes care of data redundancy, data inconsistency
is taken care of as part of it.
Data Security: It is easier to apply access constraints in database systems so that only
authorized users are able to access the data. Each user has a different set of access rights, which
helps protect the data from issues such as identity theft, data leaks and misuse of data.
Privacy: Limited access means privacy of data. A DBMS can grant and revoke access to the
database at the user level, which controls who can access which data. It also lets users define
constraints on the database, which control what kind of data can be entered into a table.
Easy access to data: Database systems manage data in such a way that the data is easily
accessible with fast response times. Even if the database is huge, a DBMS can still
provide fast access to and updating of data.
Easy recovery: Since database systems keep a backup of the data, it is easier to do a full
recovery of data in case of a failure. This is useful for almost all
organizations, as the data maintained over time should not be lost during a system crash or
failure.
Flexible: Database systems are more flexible than file processing systems. DBMS systems are
scalable; the database size can be increased and decreased based on the amount of storage
required. A DBMS also allows new tables to be added and existing tables to be removed without
disturbing the consistency of data.
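
The point about constraints controlling what can be entered into a table can be sketched as follows, using Python's built-in sqlite3 module. The student table and the particular rules (name required, age positive, id unique) are assumptions chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE student (
        id   INTEGER PRIMARY KEY,               -- ids must be unique
        name TEXT NOT NULL,                     -- a name is required
        age  INTEGER NOT NULL CHECK (age > 0)   -- age must be positive
    )
""")
conn.execute("INSERT INTO student VALUES (101, 'Ajeet', 28)")

# Each of these rows breaks one rule: missing name, negative age, duplicate id.
bad_rows = [(102, None, 32), (103, "Hari", -5), (101, "Rahul", 30)]

rejected = []
for row in bad_rows:
    try:
        conn.execute("INSERT INTO student VALUES (?, ?, ?)", row)
    except sqlite3.IntegrityError:
        rejected.append(row[0])

stored = conn.execute("SELECT id, name, age FROM student").fetchall()
print(rejected, stored)
```

All three bad rows are refused by the database, so only the valid record remains in the table.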

Disadvantages of DBMS
Cost: The implementation cost of a DBMS is high compared to a file system.
Complexity: Database systems are complex to understand.
Performance: Database systems are generic, which makes them suitable for various applications.
However, this generality can reduce their performance for some applications.


sia biswas says

MARCH 25, 2016 AT 3:44 AM
I want to understand “DIFFICULTY IN ACCESSING DATA” with example…..

Vivek says
JANUARY 5, 2017 AT 9:23 PM

if you want to access any type of data in a file system you have to go through every single
file to find out where the data is.
But in a Database Management System you can search using a query such as: select * from
table_name where column_name = 'enter a value'

DEVENDRA KUMAR says

MARCH 10, 2021 AT 8:50 PM

All the topics of DBMS tutorial has been described in the simplest way and easy to
understand. Thanks to the writer.


DBMS vs RDBMS: Difference between DBMS and RDBMS
LAST UPDATED: JULY 3, 2022 BY CHAITANYA SINGH | FILED UNDER: DBMS

In this guide, you will learn the difference between DBMS (Database Management System) and
RDBMS (Relational Database Management System).

What is a DBMS (Database Management System)?

A database management system is software that maintains data on a system. It
allows the user to perform various operations on the data such as read, write, update etc. A DBMS
typically maintains the data on the system in the form of files.

What is a RDBMS (Relational Database Management System)?

An RDBMS stores the data in the form of tables. These tables are connected to each other, which
helps in identifying the relations between the data stored in different tables. It stores the data
efficiently, and operations on the data stored in an RDBMS are faster compared to a traditional
file based data management system.

Difference between DBMS vs RDBMS

DBMS                                          RDBMS
--------------------------------------------  --------------------------------------------------
Data is stored in files.                      Data is stored in tables.

DBMS doesn’t support normalization.           RDBMS supports normalization of tables, which
                                              reduces data redundancy and protects the
                                              database from multiple anomalies.

DBMS doesn’t have proper security of the      RDBMS allows permissions to be set on tables,
database.                                     which prevents unauthorised access. It also
                                              allows constraints to be set that control
                                              which data can be entered into the table.

In DBMS, data is stored in files, so the      In RDBMS, data is stored in tables and tables can
data stored in different files is isolated    have a relationship with other tables. This helps
and there is no relation between the data     in identifying the relationship between data
stored in different files.                    stored in different tables.

DBMS doesn’t support distributed databases.   RDBMS supports distributed databases.

Data redundancy is an issue in DBMS.          RDBMS removes data redundancy using normalization.

DBMS is suitable for small organizations      RDBMS is suitable for large organisations where
where the data size is small and there is     the size of the data is huge.
no need to scale the data in future.

It supports a single user.                    It supports multiple users.

Software and hardware requirements are low.   Software and hardware requirements are high since
                                              the size of the data is big.

DBMS examples are: XML, MS Access etc.        RDBMS examples are: IBM Db2, Oracle, MySQL etc.


Types of DBMS Architecture

There are three types of DBMS architecture:

1. Single tier architecture

2. Two tier architecture
3. Three tier architecture

1. Single tier architecture

In this type of architecture, the database is directly available on the client machine, so a request
made by the client doesn’t require a network connection to perform an action on the database.
For example, let’s say you want to fetch the records of employees from the database and the
database is available on your computer system. The request to fetch employee details is made by
your computer and the records are fetched from the database by your computer as well. This type of
system is generally referred to as a local database system.
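
A local database of this kind can be sketched with SQLite, which keeps the whole database in a single file on the same machine as the program. The file name company.db, the employee table and the sample rows are assumptions made for the example; a temporary directory is used so the sketch does not leave files behind in your working directory.

```python
import os
import sqlite3
import tempfile

# The whole database is one file on the local disk, so no network is involved.
db_path = os.path.join(tempfile.mkdtemp(), "company.db")  # hypothetical file name

conn = sqlite3.connect(db_path)
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO employee VALUES (?, ?)", [(1, "Ajeet"), (2, "Hari")])
conn.commit()
conn.close()

# A later request reopens the same local file and reads the records back.
conn = sqlite3.connect(db_path)
employees = conn.execute("SELECT id, name FROM employee ORDER BY id").fetchall()
conn.close()
print(employees)  # [(1, 'Ajeet'), (2, 'Hari')]
```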

2. Two tier architecture

In two-tier architecture, the database system is present on the server machine and the DBMS
application is present on the client machine. These two machines are connected with each other
through a reliable network.
Whenever the client machine makes a request to access the database on the server using a query
language like SQL, the server performs the request on the database and returns the result to the
client. Application connection interfaces such as JDBC and ODBC are used for the interaction
between client and server.

3. Three tier architecture

In three-tier architecture, another layer is present between the client machine and the server
machine. In this architecture, the client application doesn’t communicate directly with the database
system present on the server machine. Instead, the client application communicates with a server
application, and the server application communicates with the database system on the server.
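
The separation of the three tiers can be sketched in a single Python process. In a real deployment the tiers usually run on different machines and talk over a network; here they are plain functions so the example stays runnable, and the names handle_request and client_lookup as well as the employee table are made up for illustration.

```python
import sqlite3

# Tier 3: the database, reachable only by the server application.
_db = sqlite3.connect(":memory:")
_db.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, salary INTEGER)")
_db.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)", [(1, "Ajeet", 5000), (2, "Hari", 6000)]
)


# Tier 2: the server application. It owns the SQL and decides what to expose.
def handle_request(action, **params):
    if action == "get_employee_name":
        row = _db.execute(
            "SELECT name FROM employee WHERE id = ?", (params["employee_id"],)
        ).fetchone()
        if row is None:
            return {"status": "not_found"}
        return {"status": "ok", "name": row[0]}
    return {"status": "unsupported"}


# Tier 1: the client. It sends requests and never touches the database or SQL.
def client_lookup(employee_id):
    return handle_request("get_employee_name", employee_id=employee_id)


found = client_lookup(2)
missing = client_lookup(99)
print(found, missing)
```

Note that the client can ask for a name but has no way to read the salary column, because the server application never offers that operation.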
Comments

sam says
APRIL 30, 2019 AT 9:40 AM

one of the most underrated website with the best explanation, no one in the world is as
best as u are

Why not try to build a platform where others can compete with each other on the basis of
their coding skills

OMONDI BENARD OPALA says

AUGUST 7, 2019 AT 9:09 AM

The tutorial is just fine and i appreciate very much for such a help.i was blank in DB but
having read through your notes am convinced that the DBMS is very simple and not
complicated as I thought before.Much appreciation for you guys.This is very great.

Sunanda says
DECEMBER 23, 2020 AT 4:55 PM

very helpful! .we can understand the concepts very clearly just by reading those simple and
effective explanations. Thank you so much:)

Mingso says
JANUARY 23, 2021 AT 4:37 AM

Wow! Great explanation ❤️


DBMS – Three Level Architecture

LAST UPDATED: NOVEMBER 13, 2018 BY CHAITANYA SINGH | FILED UNDER: DBMS

In the previous tutorial we have seen the DBMS architecture – one-tier, two-tier and three-tier. In this
guide, we will discuss the three level DBMS architecture in detail.

DBMS Three Level Architecture Diagram

This architecture has three levels:
1. External level
2. Conceptual level
3. Internal level

1. External level
It is also called the view level. This level is called “view” because several users can view their
desired data from this level, which is internally fetched from the database with the help of the
conceptual and internal level mappings.
The user doesn’t need to know database schema details such as data structures, table definitions
etc. The user is only concerned with the data that is returned to the view level after it has been
fetched from the database (present at the internal level).

External level is the “top level” of the Three Level DBMS Architecture.

2. Conceptual level
It is also called the logical level. The whole design of the database, such as the relationships
among data, the schema of the data etc., is described at this level.

Database constraints and security are also implemented at this level of the architecture. This
level is maintained by the DBA (database administrator).

3. Internal level
This level is also known as the physical level. It describes how the data is actually stored on the
storage devices. This level is also responsible for allocating space to the data. This is the
lowest level of the architecture.
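
One way to picture the external and conceptual levels is with a SQL view, sketched here with Python's built-in sqlite3 module. The employee table, the employee_directory view and the sample rows are assumptions made for the example. The internal level is not shown in the code because it is handled by SQLite's own storage engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Conceptual level: the full logical table definition.
    CREATE TABLE employee (
        id     INTEGER PRIMARY KEY,
        name   TEXT NOT NULL,
        salary INTEGER NOT NULL
    );
    INSERT INTO employee VALUES (1, 'Ajeet', 5000);
    INSERT INTO employee VALUES (2, 'Hari', 6000);

    -- External level: a view that exposes only what one group of users needs.
    CREATE VIEW employee_directory AS
        SELECT id, name FROM employee;
""")

cursor = conn.execute("SELECT * FROM employee_directory ORDER BY id")
columns = [description[0] for description in cursor.description]
directory = cursor.fetchall()
print(columns, directory)
```

Users who query the view see names and ids only; the salary column and the way rows are laid out on disk stay hidden from them.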


View of Data in DBMS

LAST UPDATED: JULY 24, 2022 BY CHAITANYA SINGH | FILED UNDER: DBMS

In this guide, you will learn view of data in DBMS.

View of data in DBMS

Abstraction is one of the main features of database systems. Hiding irrelevant details from the user
and providing an abstract view of data helps in easy and efficient user-database interaction. In the
previous tutorial, we discussed the three levels of DBMS architecture. The top level of that
architecture is the “view level”. The view level provides the “view of data” to the users and hides
irrelevant details such as data relationships, database schema, constraints, security etc.

To fully understand the view of data, you must have a basic knowledge of data abstraction and of
instance and schema. Refer to these two tutorials to learn them in detail.

1. Data abstraction: Database systems are made up of complex data structures. To ease user interaction with the database, the developers hide irrelevant internal details from users. This process of hiding irrelevant details from the user is called data abstraction.
2. Instance and schema: The design of a database is called the schema. There are three types of schema: physical schema, logical schema and view schema. The data stored in the database at a particular moment of time is called an instance of the database. The database schema defines the attributes of the tables that belong to a particular database; the values of these attributes at a moment of time form the instance of that database.


Data Abstraction in DBMS

LAST UPDATED: JULY 24, 2022 BY CHAITANYA SINGH | FILED UNDER: DBMS

Database systems are made up of complex data structures. To ease user interaction with the database, the developers hide irrelevant internal details from users. This process of hiding irrelevant details from the user is called data abstraction. The term “irrelevant” is used here with respect to the user; it doesn’t mean that the hidden data is not relevant to the database as a whole. It just means that the user is not concerned with that data.

For example: When you are booking a train ticket, you are not concerned with how the data is processed at the back end when you click “book ticket”, or with what processes run while you make an online payment. You are just concerned with the message that pops up when your ticket is successfully booked. This doesn’t mean that the process happening at the back end is not relevant; it just means that you as a user are not concerned with what is happening in the database.

Three levels of abstraction

Physical level: This is the lowest level of data abstraction. It describes how data is actually stored in the database. The details of the complex data structures are found at this level.

Logical level: This is the middle level of the 3-level data abstraction architecture. It describes what data is stored in the database.

View level: This is the highest level of data abstraction. It describes the user’s interaction with the database system.

Example: Let’s say we are storing customer information in a customer table. At the physical level these records can be described as blocks of storage (bytes, gigabytes, terabytes etc.) on the storage device. These details are often hidden from the programmers.

At the logical level these records can be described as fields and attributes along with their data types, and the relationships among them can be logically implemented. Programmers generally work at this level because they are aware of such things about database systems.

At the view level, users just interact with the system through a GUI and enter details on the screen. They are not aware of how the data is stored or what data is stored; such details are hidden from them.
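The three levels can be illustrated with a small sketch. It uses Python's built-in sqlite3 module purely as a convenient way to run SQL; the table name, column names and sample row are assumptions made up for this illustration, and a real DBMS exposes its physical details in its own way.

```python
import sqlite3

# Hypothetical customer table; names and values are illustrative only.
conn = sqlite3.connect(":memory:")

# Logical level: what data is stored (fields and their types).
conn.execute(
    "CREATE TABLE customer ("
    " cust_id INTEGER PRIMARY KEY,"
    " cust_name TEXT NOT NULL,"
    " cust_phone TEXT,"
    " credit_limit INTEGER)"
)
conn.execute("INSERT INTO customer VALUES (1, 'Asha', '555-0100', 5000)")

# View level: a user-facing view that exposes only the columns this user needs.
conn.execute(
    "CREATE VIEW customer_contact AS SELECT cust_name, cust_phone FROM customer"
)
view_rows = conn.execute("SELECT * FROM customer_contact").fetchall()

# Physical level: storage details, here the size in bytes of one storage page.
page_size = conn.execute("PRAGMA page_size").fetchone()[0]

print(view_rows)
print(page_size)
```

Someone querying customer_contact never sees credit_limit and never needs to know the page size, which is the kind of hiding that data abstraction refers to.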


Instance and schema in DBMS

LAST UPDATED: JULY 24, 2022 BY CHAITANYA SINGH | FILED UNDER: DBMS

In this guide, you will learn about instance and schema in DBMS.

DBMS Schema
Definition of schema: The design of a database is called the schema. For example, an employee table exists in the database with the following attributes:

EMP_NAME    EMP_ID    EMP_ADDRESS    EMP_CONTACT
--------    ------    -----------    -----------

This is the schema of the employee table. A schema defines the attributes of the tables in the database. There are three types of schema: physical schema, logical schema and view schema.

A schema represents the logical view of the database. It helps you understand what data needs to go where.
A schema can be represented by a diagram.
A schema helps database users understand the relationships between data. This helps in efficiently performing operations on the database such as insert, update, delete and search.
In the following diagram, we have a schema that shows the relationship between three tables: Course, Student and Section. The diagram only shows the design of the database; it doesn’t show the data present in those tables. A schema is only a structural view (design) of a database.

The design of a database at the physical level is called the physical schema. It describes how the data is stored in blocks of storage.

The design of a database at the logical level is called the logical schema. Programmers and database administrators work at this level. Here data is described in terms of the types of records stored in data structures, while internal details such as the implementation of those data structures are hidden (they are available at the physical level).

The design of a database at the view level is called the view schema. It generally describes end-user interaction with database systems.

To learn more about these schemas, refer to the 3-level data abstraction architecture.

DBMS Instance
Definition of instance: The data stored in the database at a particular moment of time is called an instance of the database. The database schema defines the attributes of the tables that belong to a particular database. The values of these attributes at a moment of time form the instance of that database.

For example, we have seen the schema of the table “employee” above. Let’s see the table with data now. At this moment the table contains two rows (records). This is the current instance of the table “employee” because this is the data stored in the table at this particular moment of time.

EMP_NAME     EMP_ID    EMP_ADDRESS    EMP_CONTACT
---------    ------    -----------    -----------
Chaitanya    101       Noida          95********
Ajeet        102       Delhi          99********

Let’s take another example. Say we have a single table student in the database. Today the table has 100 records, so today the instance of the database has 100 records. If we add another 100 records to this table by tomorrow, the instance of the database tomorrow will have 200 records in the table. In short, the data stored in the database at a particular moment is called the instance, and it changes over time as we add, delete or update data in the database.
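The difference between schema and instance can be sketched with Python's built-in sqlite3 module. This is only an illustration: the lowercase column names and the instance() helper are assumptions, and the contact numbers are left empty because they are masked in the table above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Schema: the design of the table (its attributes). It stays the same
# until we explicitly alter it.
conn.execute(
    "CREATE TABLE employee ("
    " emp_name TEXT, emp_id INTEGER PRIMARY KEY,"
    " emp_address TEXT, emp_contact TEXT)"
)
schema = [row[1] for row in conn.execute("PRAGMA table_info(employee)")]


def instance(connection):
    """Return the rows stored in employee at this moment (the current instance)."""
    query = "SELECT emp_name, emp_id FROM employee ORDER BY emp_id"
    return connection.execute(query).fetchall()


instance_before = instance(conn)  # no rows yet
conn.execute("INSERT INTO employee VALUES ('Chaitanya', 101, 'Noida', NULL)")
conn.execute("INSERT INTO employee VALUES ('Ajeet', 102, 'Delhi', NULL)")
instance_after = instance(conn)  # two rows now

# The schema is unchanged even though the instance has changed.
schema_after = [row[1] for row in conn.execute("PRAGMA table_info(employee)")]
print(schema == schema_after, len(instance_before), len(instance_after))
```

Inserting rows changes what instance() returns, while the list of attributes reported for the table stays the same.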

DBMS languages
LAST UPDATED: NOVEMBER 14, 2018 BY CHAITANYA SINGH | FILED UNDER: DBMS

Database languages are used to read, update and store data in a database. There are several such
languages that can be used for this purpose; one of them is SQL (Structured Query Language).

Types of DBMS languages:

Data Definition Language (DDL)
DDL is used for specifying the database schema. It is used for creating tables, schemas, indexes, constraints etc. in a database. Let’s see the operations that we can perform on a database using DDL:

To create database objects such as databases and tables – CREATE
To alter the structure of the database – ALTER
To drop objects from the database, such as tables, or the database itself – DROP
To remove all the rows from a table while keeping its structure – TRUNCATE
To rename database objects – RENAME
To add comments to the data dictionary – COMMENT

All of these commands act on the structure of the database or on whole objects rather than on individual records, which is why they come under Data Definition Language.
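As a rough sketch of DDL in action, the snippet below runs CREATE, ALTER, a rename and DROP through Python's built-in sqlite3 module. The table and column names are hypothetical. SQLite has no TRUNCATE or COMMENT statement and expresses renaming as a form of ALTER TABLE, so exact syntax differs between database systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def table_names(connection):
    """List the user tables recorded in SQLite's catalog."""
    rows = connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
    return [r[0] for r in rows]


def column_names(connection, table):
    """List the column names of one table."""
    return [r[1] for r in connection.execute(f"PRAGMA table_info({table})")]


# CREATE: define a new table (hypothetical name and columns).
conn.execute("CREATE TABLE student (stu_id INTEGER PRIMARY KEY, stu_name TEXT)")

# ALTER: change the structure by adding a column.
conn.execute("ALTER TABLE student ADD COLUMN stu_age INTEGER")
columns_after_alter = column_names(conn, "student")

# RENAME: in SQLite this is written as ALTER TABLE ... RENAME TO.
conn.execute("ALTER TABLE student RENAME TO learner")
tables_after_rename = table_names(conn)

# DROP: remove the table together with its data.
conn.execute("DROP TABLE learner")
tables_after_drop = table_names(conn)

print(columns_after_alter, tables_after_rename, tables_after_drop)
```

None of these statements touches individual records; each one changes which objects exist or how they are structured.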

Data Manipulation Language (DML)

DML is used for accessing and manipulating data in a database. The following operations on a database come under DML:

To read records from table(s) – SELECT
To insert record(s) into table(s) – INSERT
To update the data in table(s) – UPDATE
To delete records from a table (all of them, or only those matching a condition) – DELETE
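The four DML operations can be tried out with a minimal sketch, again using Python's built-in sqlite3 module. The student table and its sample rows are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE student (stu_id INTEGER PRIMARY KEY, stu_name TEXT, stu_age INTEGER)"
)

# INSERT: add records (sample values are made up).
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [(1, "Asha", 20), (2, "Ravi", 22), (3, "Meena", 21)],
)

# UPDATE: change existing data.
conn.execute("UPDATE student SET stu_age = 23 WHERE stu_id = 2")

# DELETE: remove only the rows that match the condition.
# Without a WHERE clause, DELETE would remove every row in the table.
conn.execute("DELETE FROM student WHERE stu_id = 3")

# SELECT: read the records back.
remaining = conn.execute(
    "SELECT stu_id, stu_name, stu_age FROM student ORDER BY stu_id"
).fetchall()
print(remaining)
```

Unlike the DDL commands, each statement here works on the records inside the table and leaves the table definition alone.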

Data Control language (DCL)

DCL is used for granting and revoking user access on a database:

To grant access to a user – GRANT

To revoke access from a user – REVOKE

In practice, the data definition language, data manipulation language and data control language are not separate languages; rather, they are parts of a single database language such as SQL.

Transaction Control Language (TCL)

The changes made to the database using DML commands are either made permanent or rolled back using TCL.
To persist the changes made by DML commands in the database – COMMIT
To roll back the changes made to the database – ROLLBACK
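A minimal sketch of COMMIT and ROLLBACK, assuming Python's built-in sqlite3 module with its default transaction handling, where the module opens a transaction before a DML statement and commit() or rollback() ends it. The account table and the amounts are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO account VALUES (1, 100)")
conn.commit()  # COMMIT: the inserted row is now permanent


def balance(connection):
    """Read the current balance of account 1."""
    return connection.execute("SELECT balance FROM account WHERE id = 1").fetchone()[0]


# A change that we decide to undo.
conn.execute("UPDATE account SET balance = balance - 30 WHERE id = 1")
conn.rollback()  # ROLLBACK: the update is discarded
after_rollback = balance(conn)

# The same change, this time made permanent.
conn.execute("UPDATE account SET balance = balance - 30 WHERE id = 1")
conn.commit()
after_commit = balance(conn)

print(after_rollback, after_commit)
```

The first update leaves no trace because it was rolled back before being committed; the second one persists.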


Comments

habibur says
DECEMBER 8, 2016 AT 12:13 PM

Why SQL is taken for instance to categories DDL?

Is the SQL a sub category of DDL?
I’m not clear.

Sakhawat says
MARCH 17, 2018 AT 6:42 PM

DDL and DML and Query Languages are the mode of database language. On the other
hand SQL is the example of Database language, not a subcategory of DDL.


Data models in DBMS

LAST UPDATED: NOVEMBER 14, 2018 BY CHAITANYA SINGH | FILED UNDER: DBMS

A data model is the logical structure of a database. It describes the design of the database to reflect entities, attributes, relationships among data, constraints etc.

Types of Data Models

There are several types of data models in DBMS. We cover them in detail in separate articles. In this guide, we will just see a basic overview of the types of models.

Object based logical Models – Describe data at the conceptual and view levels.

1. E-R Model
2. Object oriented Model

Record based logical Models – Like object based models, they also describe data at the conceptual and view levels. These models specify the logical structure of the database with records, fields and attributes.

1. Relational Model
2. Hierarchical Model
3. Network Model – The network model is the same as the hierarchical model except that it has a graph-like structure rather than a tree-based structure. Unlike the hierarchical model, this model allows each record to have more than one parent record.

Physical Data Models – These models describe data at the lowest level of abstraction.


Entity Relationship Diagram – ER Diagram in DBMS

LAST UPDATED: JULY 25, 2022 BY CHAITANYA SINGH | FILED UNDER: DBMS

An Entity–relationship model (ER model) describes the structure of a database with the help of a diagram, which is known as an Entity Relationship Diagram (ER Diagram). An ER model is a design or blueprint of a database that can later be implemented as a database. The main components of the E-R model are the entity set and the relationship set.

What is an Entity Relationship Diagram (ER Diagram)?

An ER diagram shows the relationships among entity sets. An entity set is a group of similar entities, and these entities can have attributes. In terms of DBMS, an entity set typically corresponds to a table, an entity to a row of that table and an attribute to a column, so by showing the relationships among tables and their attributes, an ER diagram shows the complete logical structure of a database. Let’s have a look at a simple ER diagram to understand this concept.

A simple ER Diagram:
In the following diagram we have two entities, Student and College, and their relationship. The relationship between Student and College is many to one, as a college can have many students but a student cannot study in multiple colleges at the same time. The Student entity has attributes such as Stu_Id, Stu_Name & Stu_Addr, and the College entity has attributes such as Col_ID & Col_Name.

Here are the geometric shapes and their meaning in an E-R diagram. We will discuss these terms in detail in the next section (Components of an ER Diagram) of this guide, so don’t worry too much about them now; just go through them once.

Rectangle: Represents Entity sets.
Ellipses: Attributes
Diamonds: Relationship Set
Lines: They link attributes to Entity Sets and Entity sets to Relationship Set
Double Ellipses: Multivalued Attributes
Dashed Ellipses: Derived Attributes
Double Rectangles: Weak Entity Sets
Double Lines: Total participation of an entity in a relationship set

Components of an ER Diagram

As shown in the above diagram, an ER diagram has three main components:

1. Entity
2. Attribute
3. Relationship

1. Entity
An entity is an object or component of data. An entity is represented as a rectangle in an ER diagram.
For example, in the following ER diagram we have two entities, Student and College, and these two entities have a many to one relationship as many students study in a single college. We will read more about relationships later; for now, focus on entities.

Weak Entity:
An entity that cannot be uniquely identified by its own attributes and relies on its relationship with another entity is called a weak entity. A weak entity is represented by a double rectangle. For example, a bank account cannot be uniquely identified without knowing the bank to which the account belongs, so bank account is a weak entity.

2. Attribute
An attribute describes a property of an entity. An attribute is represented as an oval in an ER diagram.
There are four types of attributes:

1. Key attribute
2. Composite attribute
3. Multivalued attribute
4. Derived attribute

1. Key attribute:

A key attribute can uniquely identify an entity in an entity set. For example, a student roll number can uniquely identify a student in a set of students. A key attribute is represented by an oval like other attributes, but the text of the key attribute is underlined.

2. Composite attribute:
An attribute that is a combination of other attributes is known as a composite attribute. For example, in the student entity, the student address is a composite attribute as an address is composed of other attributes such as pin code, state and country.

3. Multivalued attribute:
An attribute that can hold multiple values is known as a multivalued attribute. It is represented by double ovals in an ER diagram. For example, a person can have more than one phone number, so the phone number attribute is multivalued.

4. Derived attribute:
A derived attribute is one whose value is dynamic and derived from another attribute. It is represented by a dashed oval in an ER diagram. For example, a person’s age is a derived attribute as it changes over time and can be derived from another attribute (date of birth).

E-R diagram with multivalued and derived attributes:

3. Relationship
A relationship is represented by a diamond shape in an ER diagram; it shows the relationship among entities. There are four types of relationships:
1. One to One
2. One to Many
3. Many to One
4. Many to Many

1. One to One Relationship

When a single instance of an entity is associated with a single instance of another entity, it is called a one to one relationship. For example, a person has only one passport and a passport is given to one person.

2. One to Many Relationship

When a single instance of an entity is associated with more than one instance of another entity, it is called a one to many relationship. For example, a customer can place many orders but an order cannot be placed by many customers.

3. Many to One Relationship

When more than one instance of an entity is associated with a single instance of another entity, it is called a many to one relationship. For example, many students can study in a single college but a student cannot study in many colleges at the same time.

4. Many to Many Relationship

When more than one instance of an entity is associated with more than one instance of another entity, it is called a many to many relationship. For example, a student can be assigned to many projects and a project can be assigned to many students.
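One common way to realise these cardinalities in tables is sketched below with Python's built-in sqlite3 module. It is an illustration under assumptions, not the only possible mapping: all table names, column names and rows are hypothetical, and for brevity the passport holder is modelled as a student rather than a separate person table. One to many is the same foreign key as many to one, read from the other side.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
CREATE TABLE college (col_id INTEGER PRIMARY KEY, col_name TEXT);

-- Many to one: each student row points to at most one college.
CREATE TABLE student (
    stu_id   INTEGER PRIMARY KEY,
    stu_name TEXT,
    col_id   INTEGER REFERENCES college(col_id)
);

-- One to one: UNIQUE on the foreign key allows one passport per holder.
CREATE TABLE passport (
    passport_no TEXT PRIMARY KEY,
    stu_id      INTEGER UNIQUE REFERENCES student(stu_id)
);

-- Many to many: a separate table pairs students with projects.
CREATE TABLE project (proj_id INTEGER PRIMARY KEY, proj_name TEXT);
CREATE TABLE student_project (
    stu_id  INTEGER REFERENCES student(stu_id),
    proj_id INTEGER REFERENCES project(proj_id),
    PRIMARY KEY (stu_id, proj_id)
);

INSERT INTO college VALUES (1, 'City College');
INSERT INTO student VALUES (10, 'Asha', 1), (11, 'Ravi', 1);
INSERT INTO passport VALUES ('P-001', 10);
INSERT INTO project VALUES (100, 'Library app'), (101, 'Robot');
INSERT INTO student_project VALUES (10, 100), (10, 101), (11, 100);
""")

students_in_college = conn.execute(
    "SELECT COUNT(*) FROM student WHERE col_id = 1"
).fetchone()[0]
projects_of_asha = conn.execute(
    "SELECT COUNT(*) FROM student_project WHERE stu_id = 10"
).fetchone()[0]

# A second passport for the same holder violates the one to one rule.
try:
    conn.execute("INSERT INTO passport VALUES ('P-002', 10)")
    second_passport_allowed = True
except sqlite3.IntegrityError:
    second_passport_allowed = False

print(students_in_college, projects_of_asha, second_passport_allowed)
```

The many to many case needs its own table because neither side can hold a single foreign key that covers all of its partners.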

Total Participation of an Entity set

Total participation of an entity set means that each entity in the entity set must take part in at least one relationship in the relationship set. It is also called mandatory participation. For example, in the following diagram each college must have at least one associated student. Total participation is represented using a double line between the entity set and the relationship set.

Partial participation of an Entity Set

Partial participation of an entity set means that each entity in the entity set may or may not participate in a relationship instance in that relationship set. It is also called optional participation.

Partial participation is represented using a single line between the entity set and the relationship set.

Example: Consider an IT company with many employees. Take the relationship between employee and the role of software engineer. Every software engineer is an employee, but not every employee is a software engineer, as there are employees in other roles as well, such as housekeeping staff, managers and the CEO. So we can say that the participation of the employee entity set in the software engineer relationship is partial.


DBMS – ER Design Issues

LAST UPDATED: JULY 25, 2021 BY CHAITANYA SINGH | FILED UNDER: DBMS

We have already covered ER diagrams in our previous article, DBMS ER Model Concept. In this post, we will discuss the various issues that can arise while designing an ER diagram.

Here are some of the issues that can occur during the ER diagram design process:

1. Choosing Entity Set vs Attributes

Here we will discuss how choosing an entity set versus an attribute can change the semantics of the whole ER design. Let’s take an example: say we have an entity set Student with attributes such as student-name and student-id. Alternatively, student-id itself could be modelled as an entity with attributes such as student-class and student-section.

Comparing the two cases: in the first, a student can have only one student id; in the second, where student id is an entity, the design implies that a student can have more than one student id.

2. Choosing Entity Set vs. Relationship Sets

It is hard to decide whether an object is best represented by an entity set or a relationship set. To choose between the two, the designer needs to consider whether the object would need to take part in a new relationship if a requirement arises in future; if so, it is better to choose an entity set rather than a relationship set.

Let’s take an example to understand it better: A person takes a loan from a bank. Here we have two entities, person and bank, and their relationship is loan. This is fine until there is a need to disburse a joint loan; in that case a new relationship needs to be created to define the relationship between the two individuals who have taken the joint loan. In this scenario, it is better to choose loan as an entity set rather than a relationship set.

3. Choosing Binary vs n-ary Relationship Sets

In most cases, the relationships described in ER diagrams are binary. An n-ary relationship is one that involves more than two entity sets; if only two entity sets are involved, the relationship is termed a binary relationship.

n-ary relationships can make an ER design complex; however, the good news is that we can convert and represent any n-ary relationship using multiple binary relationships.

This may sound confusing, so let’s take an example to understand how we can convert an n-ary relationship to multiple binary relationships. Say we have to describe a relationship between four family members: father, mother, son and daughter. This can easily be represented as multiple binary relationships: the father-mother relationship as “spouse”, the son-daughter relationship as “siblings” and the relationship of the father and mother with each of their children as “child”.

4. Placing Relationship Attributes

The cardinality ratio of a relationship helps us determine where to place relationship attributes. For one to one or one to many relationship sets, it is recommended to associate the attributes with one of the participating entity sets rather than with the relationship set.
For a many-to-many relationship set, a value may not be determined by either participating entity on its own but only by the combination of the participating entities. In such a case it is better to associate the attribute with the many-to-many relationship set itself.


DBMS – ER Diagram to Table Conversion

LAST UPDATED: JULY 25, 2021 BY CHAITANYA SINGH | FILED UNDER: DBMS

We have learned about ER diagrams and ER design issues in previous articles. In this post, we will cover how to convert an ER diagram into database tables.

First we will convert simple ER diagrams to tables. In the end, we will take a complex ER diagram and convert it into a set of tables.

1. Strong Entity set with Simple attributes

The strong entity set becomes the table and the attributes of the entity set become the table attributes. The key attribute of the entity set becomes the primary key of the table.

Let’s take an example: Here we have an entity set Employee with the attributes Name, Age, Emp_Id and Salary. When we convert this ER diagram to a table, the entity set becomes the table, so we have a table named “Employee” as shown in the following diagram. The attributes of the entity set become the attributes of the table.

2. Strong Entity Set With Composite Attributes

Now we will see how to convert a strong entity set with composite attributes to a table. The conversion is fairly simple in this case as well. The entity set becomes the table and the simple attributes of the composite attributes become the attributes of the table, while the composite attribute itself is ignored during conversion.

Let’s take an example. We have a composite attribute Name, and this composite attribute has two simple attributes, First_N and Last_N. While converting this ER diagram to a table, we do not use the composite attribute itself in the table; instead we use its simple attributes as the table’s attributes.

3. Strong Entity Set With Multi Valued Attributes

An entity set with multi-valued attributes requires two tables in the relational model.

We will understand this conversion with the help of a diagram. Let’s take the same example that we saw above, with a new multi-valued attribute Dept added. An employee can work in multiple departments, so the Dept attribute is marked as multi-valued. Whenever we have a multi-valued attribute, more than one table is needed to represent the ER diagram. As you can see, we have created two tables to represent this ER diagram.
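The two-table result can be sketched as follows, using Python's built-in sqlite3 module. The attribute names Emp_Id, First_N, Last_N, Age, Salary and Dept come from the examples above; the second table's name, the lowercase spelling and the sample row are assumptions, since the original diagram is not reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Table 1: the entity set with its single-valued attributes.
CREATE TABLE employee (
    emp_id  INTEGER PRIMARY KEY,
    first_n TEXT,
    last_n  TEXT,
    age     INTEGER,
    salary  INTEGER
);

-- Table 2: one row per value of the multi-valued attribute Dept.
CREATE TABLE employee_dept (
    emp_id INTEGER REFERENCES employee(emp_id),
    dept   TEXT,
    PRIMARY KEY (emp_id, dept)
);

INSERT INTO employee VALUES (1, 'Asha', 'Rao', 30, 50000);
INSERT INTO employee_dept VALUES (1, 'Sales'), (1, 'Support');
""")

depts = [
    row[0]
    for row in conn.execute(
        "SELECT dept FROM employee_dept WHERE emp_id = 1 ORDER BY dept"
    )
]
print(depts)
```

Keeping the departments in their own table lets one employee have any number of them without repeating the rest of the employee's data.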

4. Relationship Set to Table conversion

While converting a relationship set to a table, the primary key attributes of the two entity sets become the table attributes, and if the relationship set has any attributes, those also become attributes of the table.

In the following example, we have two entity sets, Employee and Department. These entity sets are associated with each other through the Works relationship set. To convert the relationship set Works to a table, we take the primary key attributes of each entity set (Emp_Id and Dept_Id) together with all the attributes of the relationship set and form a table.
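A minimal sketch of this conversion with Python's built-in sqlite3 module is shown below. Emp_Id, Dept_Id and the Works relationship come from the text; the remaining columns, in particular the relationship attribute since, and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee   (emp_id  INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE department (dept_id INTEGER PRIMARY KEY, dept_name TEXT);

-- The relationship set becomes its own table, built from the primary
-- keys of both entity sets plus the relationship's own attributes.
CREATE TABLE works (
    emp_id  INTEGER REFERENCES employee(emp_id),
    dept_id INTEGER REFERENCES department(dept_id),
    since   TEXT,  -- hypothetical attribute of the relationship
    PRIMARY KEY (emp_id, dept_id)
);

INSERT INTO employee VALUES (1, 'Asha'), (2, 'Ravi');
INSERT INTO department VALUES (10, 'Sales'), (20, 'Support');
INSERT INTO works VALUES (1, 10, '2020'), (2, 10, '2021'), (2, 20, '2022');
""")

works_rows = conn.execute("""
    SELECT e.name, d.dept_name, w.since
    FROM works w
    JOIN employee e ON e.emp_id = w.emp_id
    JOIN department d ON d.dept_id = w.dept_id
    ORDER BY e.emp_id, d.dept_id
""").fetchall()
print(works_rows)
```

Each row of works records one pairing of an employee with a department, and the join turns the key values back into readable names.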


DBMS – Recursive Relationship in ER Diagrams

LAST UPDATED: AUGUST 21, 2022 BY CHAITANYA SINGH | FILED UNDER: DBMS

A relationship between two entities is called a recursive relationship if the two entities are of the same type. For example, a relationship between a manager and an engineer is a recursive relationship because both the manager and the engineer are employees of the company. Similarly, a “marries” relationship between two persons is a recursive relationship, as a person marries another person; in this example the entity person has a relationship with itself. In this guide, you will learn how to represent a recursive relationship in an ER diagram.

Recursive relationship in ER diagram

A recursive relationship can be represented in an ER diagram as shown below. As you can see, this relationship involves the Employee entity twice. You can also call it a relationship of an entity with itself.
Here an employee supervises another employee. This is a one to many relationship, as one employee can supervise many employees.
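In a table, a recursive one to many relationship like this is often stored as a foreign key that points back to the same table. The sketch below uses Python's built-in sqlite3 module; the column names and the particular supervisor assignments are assumptions made for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (
    emp_id        INTEGER PRIMARY KEY,
    name          TEXT,
    -- Points back to another row of the same table.
    -- NULL means nobody supervises this employee.
    supervisor_id INTEGER REFERENCES employee(emp_id)
);

INSERT INTO employee VALUES
    (1, 'Chaitanya', NULL),
    (2, 'Rahul', 1),
    (3, 'Jim', 1);
""")

# Join the table with itself: s is the supervisor, e is the supervised employee.
pairs = conn.execute("""
    SELECT s.name, e.name
    FROM employee e
    JOIN employee s ON e.supervisor_id = s.emp_id
    ORDER BY e.emp_id
""").fetchall()

unsupervised = [
    row[0]
    for row in conn.execute("SELECT name FROM employee WHERE supervisor_id IS NULL")
]
print(pairs, unsupervised)
```

The self-join uses two aliases of the same table, which mirrors how the Employee entity appears twice in the relationship.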

Recursive relationship ER diagram with role names: In this ER diagram, we depict the supervises relationship with role names. This clearly shows that a supervisor employee has a one to many relationship with the supervised employees.

Employees hierarchy:
Here we are displaying an employee hierarchy of 8 employees. This diagram shows that, in this supervisor relationship, participation is partial (optional), as there are some employees who are not supervised by anyone, such as “Chaitanya”, and some employees who do not supervise anyone, such as Rahul, Jim, Steve, Carl & Ron.
More examples of Recursive relationships
Some other example ER diagrams of recursive relationships:

The ER diagram before generalization looks like this:

These two entities, Student and Teacher, have two common attributes: Name and Address. We can make a generalized entity with these common attributes. Let’s have a look at the ER model after generalization.

The ER diagram after generalization:

We have created a new generalized entity Person, and this entity has the common attributes of both entities. As you can see in the following ER diagram, after the generalization process the entities Student and Teacher have only the specialized attributes Grade and Salary respectively, and their common attributes (Name & Address) are now associated with the new entity Person, which is in a relationship with both entities (Student & Teacher).

Note:
1. Generalization uses a bottom-up approach where two or more lower level entities combine to form a new higher level entity.
2. The new generalized entity can in turn combine with other lower level entities to create an even higher level generalized entity.
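One possible way to carry this generalization into tables is sketched below with Python's built-in sqlite3 module. The attributes Name, Address, Grade and Salary come from the example; the person_id key and the sample rows are assumptions, and other table mappings of a generalization are equally valid.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Higher level entity: holds the attributes common to students and teachers.
CREATE TABLE person (
    person_id INTEGER PRIMARY KEY,  -- hypothetical key, not in the diagram
    name      TEXT,
    address   TEXT
);

-- Lower level entities keep only their specialized attributes.
CREATE TABLE student (
    person_id INTEGER PRIMARY KEY REFERENCES person(person_id),
    grade     TEXT
);
CREATE TABLE teacher (
    person_id INTEGER PRIMARY KEY REFERENCES person(person_id),
    salary    INTEGER
);

INSERT INTO person VALUES (1, 'Asha', 'Agra'), (2, 'Ravi', 'Delhi');
INSERT INTO student VALUES (1, 'A');
INSERT INTO teacher VALUES (2, 40000);
""")

student_rows = conn.execute("""
    SELECT p.name, p.address, s.grade
    FROM person p JOIN student s ON s.person_id = p.person_id
""").fetchall()
teacher_rows = conn.execute("""
    SELECT p.name, p.address, t.salary
    FROM person p JOIN teacher t ON t.person_id = p.person_id
""").fetchall()
print(student_rows, teacher_rows)
```

The common attributes are stored once in person, and a join brings them back together with the specialized attribute of each lower level entity.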


DBMS Aggregation
LAST UPDATED: NOVEMBER 16, 2018 BY CHAITANYA SINGH | FILED UNDER: DBMS

Aggregation is a process in which a single entity alone does not make sense in a relationship, so the relationship between two entities is treated as one entity. This may sound confusing, but the example below should make it clear.

Aggregation Example
In the real world, a manager not only manages the employees working under them but also has to manage the project. In such a scenario, if the entity “Manager” has a “manages” relationship with either the “Employee” or the “Project” entity alone, it will not make sense, because the manager has to manage both. In these cases the relationship of two entities acts as one entity. In our example, the relationship “Works-On” between “Employee” & “Project” acts as one entity that has a relationship “Manages” with the entity “Manager”.


Relational model in DBMS

LAST UPDATED: NOVEMBER 17, 2018 BY CHAITANYA SINGH | FILED UNDER: DBMS

In the relational model, data and relationships are represented by a collection of inter-related tables. Each table is a group of columns and rows, where a column represents an attribute of an entity and a row represents a record.

Sample relational model: Student table with 3 columns and four records.

Table: Student

Stu_Id  Stu_Name  Stu_Age
------  --------  -------
111     Ashish    23
123     Saurav    22
169     Lester    24
234     Lou       26

Table: Course

Stu_Id  Course_Id  Course_Name
------  ---------  -----------------
111     C01        Science
111     C02        DBMS
169     C22        Java
169     C39        Computer Networks

Here Stu_Id, Stu_Name & Stu_Age are attributes of table Student and Stu_Id, Course_Id &
Course_Name are attributes of table Course. The rows with values are the records (commonly known
as tuples).
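The two sample tables can be tried out with a small sketch. It uses Python's built-in sqlite3 module with an in-memory database, which is only one possible choice; the join query is an illustration added here and is not part of the original tutorial.

```python
import sqlite3

# In-memory database; table and column names follow the sample tables above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student (Stu_Id INTEGER, Stu_Name TEXT, Stu_Age INTEGER)")
conn.execute("CREATE TABLE Course (Stu_Id INTEGER, Course_Id TEXT, Course_Name TEXT)")

conn.executemany(
    "INSERT INTO Student VALUES (?, ?, ?)",
    [(111, "Ashish", 23), (123, "Saurav", 22), (169, "Lester", 24), (234, "Lou", 26)],
)
conn.executemany(
    "INSERT INTO Course VALUES (?, ?, ?)",
    [(111, "C01", "Science"), (111, "C02", "DBMS"),
     (169, "C22", "Java"), (169, "C39", "Computer Networks")],
)

# The shared Stu_Id column is what relates the two tables.
rows = conn.execute(
    "SELECT s.Stu_Name, c.Course_Name "
    "FROM Student s JOIN Course c ON s.Stu_Id = c.Stu_Id "
    "ORDER BY c.Course_Id"
).fetchall()
print(rows)
```

Students 123 and 234 have no rows in Course, so they do not appear in the joined result.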


Hierarchical model in DBMS

LAST UPDATED: NOVEMBER 17, 2018 BY CHAITANYA SINGH | FILED UNDER: DBMS

In the hierarchical model, data is organized into a tree-like structure in which each record has
exactly one parent record (except the root) and can have many children. The main drawback of this
model is that it can represent only one-to-many relationships between nodes.

Note: Hierarchical models are rarely used now.

Sample Hierarchical Model Diagram:

Let's say we have a few students and a few courses. A course can be assigned to a single student
only, however a student can take any number of courses, so this relationship is one to many.

Example of hierarchical data represented as relational tables: The above hierarchical model can be
represented as relational tables like this:

Student Table:

Stu_Id  Stu_Name   Stu_Age
------  ---------  -------
123     Steve      29
367     Chaitanya  27
234     Ajeet      28

Course Table:

Course_Id  Course_Name  Stu_Id
---------  -----------  ------
C01        Cobol        123
C21        Java         367
C22        Perl         367
C33        JQuery       234
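To see the parent-child shape hidden in these two tables, the rows can be regrouped into a tree. This is a minimal sketch in plain Python; the variable names are chosen for illustration only.

```python
from collections import defaultdict

# Parent records (students) and child records (courses) from the tables above.
students = {123: "Steve", 367: "Chaitanya", 234: "Ajeet"}
courses = [
    ("C01", "Cobol", 123),
    ("C21", "Java", 367),
    ("C22", "Perl", 367),
    ("C33", "JQuery", 234),
]

# Each course row names exactly one parent through Stu_Id (one-to-many).
children = defaultdict(list)
for course_id, course_name, stu_id in courses:
    children[stu_id].append(course_name)

tree = {students[stu_id]: names for stu_id, names in children.items()}
print(tree)
```

Every course appears under exactly one student, while a student such as Chaitanya can have several courses.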


Student_Id  Student_Name  Student_Addr      Student_Age
----------  ------------  ----------------  -----------
101         Chaitanya     Dayal Bagh, Agra  27
102         Ajeet         Delhi             26
103         Rahul         Gurgaon           24
104         Shubham       Chennai           25

2. Record or Tuple
Each row of a table is known as a record, also called a tuple. For example, the following row is a
record taken from the above table.

102 Ajeet Delhi 26

3. Field or Column name or Attribute

The above table “STUDENT” has four fields (or attributes): Student_Id, Student_Name, Student_Addr &
Student_Age.

4. Domain
A domain is the set of permitted values for an attribute in a table. For example, a month-of-year
domain accepts January, February, … December as values, and a date domain accepts all valid dates.
We specify the domain of an attribute while creating the table.
An attribute cannot accept values that are outside its domain. For example, in the above table
“STUDENT” the Student_Id field has an integer domain, so it cannot hold non-integer values such as
“First” or 10.11.
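The domain rule can be checked with a short sketch. It assumes SQLite through Python's sqlite3 module. Because SQLite is flexible about column types, the integer domain is stated explicitly with a CHECK clause here; other database systems usually enforce it through the declared data type alone. The helper try_insert is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The CHECK clause spells out the integer domain of Student_Id for SQLite.
conn.execute(
    "CREATE TABLE STUDENT ("
    " Student_Id INTEGER CHECK (typeof(Student_Id) = 'integer'),"
    " Student_Name TEXT)"
)


def try_insert(student_id):
    """Return True if the value is accepted for Student_Id, False if rejected."""
    try:
        conn.execute("INSERT INTO STUDENT VALUES (?, ?)", (student_id, "Ajeet"))
        return True
    except sqlite3.IntegrityError:
        return False


# 102 lies inside the integer domain; "First" and 10.11 lie outside it.
accepted = [try_insert(102), try_insert("First"), try_insert(10.11)]
print(accepted)
```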

5. Instance and Schema

I have already covered instance and schema in a separate guide.

6. Keys
This is our next topic. I have covered the keys in detail in separate tutorials.


DBMS Relational Algebra

LAST UPDATED: FEBRUARY 20, 2019 BY CHAITANYA SINGH | FILED UNDER: DBMS

In this tutorial, we will discuss Relational Algebra. In the previous tutorial, we had a brief discussion on
the basics of relational algebra and calculus where we learned the need to use these theoretical
mathematical systems.

What is Relational Algebra in DBMS?

Relational algebra is a procedural query language that works on the relational model. The purpose of
a query language is to retrieve data from a database or to perform operations such as insert, update
and delete on the data. Saying that relational algebra is procedural means that a query states both
what data is to be retrieved and how to retrieve it, as a sequence of operations.

On the other hand, relational calculus is a non-procedural query language: it states what data is to
be retrieved but not how to retrieve it. We will discuss relational calculus in a separate tutorial.

Types of operations in relational algebra

We have divided these operations into two categories:
1. Basic Operations
2. Derived Operations

Basic/Fundamental Operations:
1. Select (σ)
2. Project (∏)
3. Union (∪)
4. Set Difference (-)
5. Cartesian product (X)
6. Rename (ρ)

Derived Operations:
1. Natural Join (⋈)
2. Left, Right, Full outer join (⟕, ⟖, ⟗)
3. Intersection (∩)
4. Division (÷)

Let's discuss these operations one by one with the help of examples.

Select Operator (σ)

Select Operator is denoted by sigma (σ). It is used to find the tuples (rows) in a relation (table)
that satisfy a given condition.

If you know a little SQL, you can think of it as the WHERE clause, which serves the same purpose.

Syntax of Select Operator (σ)

σ Condition/Predicate (Relation/Table name)

Select Operator (σ) Example
Table: CUSTOMER
---------------

Customer_Id Customer_Name Customer_City

----------- ------------- -------------
C10100 Steve Agra
C10111 Raghu Agra
C10115 Chaitanya Noida
C10117 Ajeet Delhi
C10118 Carl Delhi

Query:

σ Customer_City="Agra" (CUSTOMER)

Output:

Customer_Id Customer_Name Customer_City

----------- ------------- -------------
C10100 Steve Agra
C10111 Raghu Agra
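The selection above can be mimicked in a few lines of Python. This is only a sketch: the relation is modelled as a list of dicts and select is a hypothetical helper, not a library function.

```python
# Rows of the CUSTOMER relation from the example above, one dict per tuple.
CUSTOMER = [
    {"Customer_Id": "C10100", "Customer_Name": "Steve", "Customer_City": "Agra"},
    {"Customer_Id": "C10111", "Customer_Name": "Raghu", "Customer_City": "Agra"},
    {"Customer_Id": "C10115", "Customer_Name": "Chaitanya", "Customer_City": "Noida"},
    {"Customer_Id": "C10117", "Customer_Name": "Ajeet", "Customer_City": "Delhi"},
    {"Customer_Id": "C10118", "Customer_Name": "Carl", "Customer_City": "Delhi"},
]


def select(relation, predicate):
    """Hypothetical helper: keep only the tuples for which predicate(row) is true."""
    return [row for row in relation if predicate(row)]


# σ Customer_City="Agra" (CUSTOMER)
agra_customers = select(CUSTOMER, lambda row: row["Customer_City"] == "Agra")
for row in agra_customers:
    print(row["Customer_Id"], row["Customer_Name"], row["Customer_City"])
```

Select never changes the columns of a relation; it only reduces the number of rows.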

Project Operator (∏)

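Project operator is denoted by the ∏ symbol. It is used to select the desired columns (attributes)
from a table (relation). Duplicate rows in the result are removed, because a relation is a set of
tuples.

Project operator in relational algebra is similar to the column list of the SELECT statement in SQL
(more precisely, SELECT DISTINCT).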

Syntax of Project Operator (∏)

∏ column_name1, column_name2, ...., column_nameN(table_name)

Project Operator (∏) Example

In this example, we have a table CUSTOMER with three columns, we want to fetch only two columns
of the table, which we can do with the help of Project Operator ∏.

Table: CUSTOMER

Customer_Id Customer_Name Customer_City

----------- ------------- -------------
C10100 Steve Agra
C10111 Raghu Agra
C10115 Chaitanya Noida
C10117 Ajeet Delhi
C10118 Carl Delhi

Query:

∏ Customer_Name, Customer_City (CUSTOMER)

Output:

Customer_Name Customer_City
------------- -------------
Steve Agra
Raghu Agra
Chaitanya Noida
Ajeet Delhi
Carl Delhi
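A projection can be sketched in Python as follows. The helper project is hypothetical. It also removes duplicate rows, which only becomes visible when a single non-unique column such as Customer_City is projected.

```python
COLUMNS = ("Customer_Id", "Customer_Name", "Customer_City")
CUSTOMER = [
    ("C10100", "Steve", "Agra"),
    ("C10111", "Raghu", "Agra"),
    ("C10115", "Chaitanya", "Noida"),
    ("C10117", "Ajeet", "Delhi"),
    ("C10118", "Carl", "Delhi"),
]


def project(relation, columns, wanted):
    """Hypothetical helper: keep only the wanted columns and drop duplicate rows."""
    positions = [columns.index(name) for name in wanted]
    projected = (tuple(row[i] for i in positions) for row in relation)
    # dict.fromkeys removes duplicates while keeping the first-seen order.
    return list(dict.fromkeys(projected))


# ∏ Customer_Name, Customer_City (CUSTOMER)
names_and_cities = project(CUSTOMER, COLUMNS, ["Customer_Name", "Customer_City"])

# ∏ Customer_City (CUSTOMER): Agra and Delhi each occur twice in the table.
cities = project(CUSTOMER, COLUMNS, ["Customer_City"])
print(names_and_cities)
print(cities)
```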

Union Operator (∪)

Union operator is denoted by the ∪ symbol. It is used to combine all the rows (tuples) from two
tables (relations) into one result.

Let's look at the union operator a bit more. Say we have two relations R1 and R2 that have the same
columns (the same number of attributes with compatible domains), and we want all the tuples (rows)
from both relations. We can then apply the union operator to them.

Note: Rows (tuples) that are present in both tables appear only once in the union. In short, there
are no duplicates after the union operation.

Syntax of Union Operator (∪)

table_name1 ∪ table_name2

Union Operator (∪) Example

Table 1: COURSE

Course_Id Student_Name Student_Id

--------- ------------ ----------
C101 Aditya S901
C104 Aditya S901
C106 Steve S911
C109 Paul S921
C115 Lucy S931

Table 2: STUDENT

Student_Id Student_Name Student_Age

---------- ------------ -----------
S901 Aditya 19
S911 Steve 18
S921 Paul 19
S931 Lucy 17
S941 Carl 16
S951 Rick 18

Query:

∏ Student_Name (COURSE) ∪ ∏ Student_Name (STUDENT)

Output:

Student_Name
------------
Aditya
Carl
Paul
Lucy
Rick
Steve

Note: As you can see, there are no duplicate names in the output, even though some names appear in
both tables and the name Aditya appears twice in the COURSE table itself.

Intersection Operator (∩)

Intersection operator is denoted by the ∩ symbol. It is used to select the rows (tuples) that are
common to two tables (relations).

Let's say we have two relations R1 and R2 that have the same columns, and we want all the tuples
(rows) that are present in both relations. In that case we can apply the intersection operation on
the two relations: R1 ∩ R2.

Note: Only those rows that are present in both the tables will appear in the result set.

Syntax of Intersection Operator (∩)

table_name1 ∩ table_name2

Intersection Operator (∩) Example

Let's take the same example that we have taken above.

Table 1: COURSE

Course_Id Student_Name Student_Id

--------- ------------ ----------
C101 Aditya S901
C104 Aditya S901
C106 Steve S911
C109 Paul S921
C115 Lucy S931

Table 2: STUDENT

Student_Id Student_Name Student_Age

---------- ------------ -----------
S901 Aditya 19
S911 Steve 18
S921 Paul 19
S931 Lucy 17
S941 Carl 16
S951 Rick 18

Query:

∏ Student_Name (COURSE) ∩ ∏ Student_Name (STUDENT)

Output:

Student_Name
------------
Aditya
Steve
Paul
Lucy

Set Difference (-)

Set Difference is denoted by the – symbol. Let's say we have two relations R1 and R2 with the same
columns, and we want all the tuples (rows) that are present in R1 but not in R2. This can be done
using set difference: R1 – R2.

Syntax of Set Difference (-)

table_name1 - table_name2

Set Difference (-) Example

Let's take the same tables COURSE and STUDENT that we have seen above.

Query:
Let's write a query to select those student names that are present in the STUDENT table but not in
the COURSE table.

∏ Student_Name (STUDENT) - ∏ Student_Name (COURSE)

Output:

Student_Name
------------
Carl
Rick
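The union, intersection and set difference queries on COURSE and STUDENT map directly onto Python's built-in set type. The sketch below is an illustration under that modelling choice; a set drops duplicates in the same way a relation does.

```python
COURSE = [
    ("C101", "Aditya", "S901"),
    ("C104", "Aditya", "S901"),
    ("C106", "Steve", "S911"),
    ("C109", "Paul", "S921"),
    ("C115", "Lucy", "S931"),
]
STUDENT = [
    ("S901", "Aditya", 19),
    ("S911", "Steve", 18),
    ("S921", "Paul", 19),
    ("S931", "Lucy", 17),
    ("S941", "Carl", 16),
    ("S951", "Rick", 18),
]

# ∏ Student_Name (COURSE) and ∏ Student_Name (STUDENT) as sets of names.
course_names = {name for _, name, _ in COURSE}
student_names = {name for _, name, _ in STUDENT}

union = course_names | student_names          # ∪
intersection = course_names & student_names   # ∩
difference = student_names - course_names     # STUDENT names minus COURSE names

print(sorted(union))
print(sorted(intersection))
print(sorted(difference))
```

Set difference is not symmetric: course_names - student_names is empty here, because every name in COURSE also occurs in STUDENT.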

Cartesian product (X)

Cartesian Product is denoted by the X symbol. Let's say we have two relations R1 and R2. The
Cartesian product of these relations (R1 X R2) combines each tuple of R1 with each tuple of R2. This
may sound confusing, but the example below should make it clear.

Syntax of Cartesian product (X)

R1 X R2

Cartesian product (X) Example

Table 1: R
Col_A Col_B
----- ------
AA 100
BB 200
CC 300

Table 2: S

Col_X Col_Y
----- -----
XX 99
YY 11
ZZ 101

Query:
Let's find the Cartesian product of tables R and S.

R X S

Output:

Col_A Col_B Col_X Col_Y

----- ------ ------ ------
AA 100 XX 99
AA 100 YY 11
AA 100 ZZ 101
BB 200 XX 99
BB 200 YY 11
BB 200 ZZ 101
CC 300 XX 99
CC 300 YY 11
CC 300 ZZ 101

Note: The number of rows in the output is always the product of the number of rows in the two
tables. In our example table 1 has 3 rows and table 2 has 3 rows, so the output has 3×3 = 9 rows.
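The row count rule can be verified with itertools.product from Python's standard library. This is a small sketch with the tables R and S modelled as lists of tuples.

```python
from itertools import product

R = [("AA", 100), ("BB", 200), ("CC", 300)]
S = [("XX", 99), ("YY", 11), ("ZZ", 101)]

# R X S: pair every tuple of R with every tuple of S and concatenate them.
cartesian = [r + s for r, s in product(R, S)]

for row in cartesian:
    print(*row)
print(len(cartesian) == len(R) * len(S))
```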

Rename (ρ)

The Rename (ρ) operation can be used to rename a relation or an attribute of a relation.

Rename (ρ) Syntax:

ρ(new_relation_name, old_relation_name)

Rename (ρ) Example

Let's say we have a table CUSTOMER. We fetch the customer names and rename the resulting relation to
CUST_NAMES.

Table: CUSTOMER

Customer_Id Customer_Name Customer_City

----------- ------------- -------------
C10100 Steve Agra
C10111 Raghu Agra
C10115 Chaitanya Noida
C10117 Ajeet Delhi
C10118 Carl Delhi

Query:

ρ(CUST_NAMES, ∏ Customer_Name (CUSTOMER))

Output (relation CUST_NAMES):

Customer_Name
-------------
Steve
Raghu
Chaitanya
Ajeet
Carl


DBMS Relational Calculus

LAST UPDATED: FEBRUARY 20, 2019 BY CHAITANYA SINGH | FILED UNDER: DBMS

In the previous tutorial, we discussed Relational Algebra which is a procedural query language. In this
tutorial, we will discuss Relational Calculus, which is a non-procedural query language.

What is Relational Calculus?

Relational calculus is a non-procedural query language that tells the system what data is to be
retrieved but not how to retrieve it.

Types of Relational Calculus

1. Tuple Relational Calculus (TRC)
Tuple relational calculus is used for selecting those tuples that satisfy the given condition.

Table: Student

First_Name Last_Name Age

---------- --------- ----
Ajeet Singh 30
Chaitanya Singh 31
Rajeev Bhatia 27
Carl Pratap 28

Let's write relational calculus queries.

Query to display the last name of those students whose age is greater than 30

{ t.Last_Name | Student(t) AND t.Age > 30 }

In the above query you can see two parts separated by | symbol. The second part is where we define
the condition and in the first part we specify the fields which we want to display for the selected
tuples.

The result of the above query would be:

Last_Name
---------
Singh

Query to display all the details of students where Last name is ‘Singh’

{ t | Student(t) AND t.Last_Name = 'Singh' }

Output:

First_Name Last_Name Age

---------- --------- ----
Ajeet Singh 30
Chaitanya Singh 31
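Python comprehensions read much like tuple relational calculus: the part before "for" plays the role of the fields to display and the "if" part is the condition. The sketch below uses a namedtuple so that t.Last_Name and t.Age can be written as in the queries above; the names Row and STUDENT are chosen for illustration.

```python
from collections import namedtuple

Row = namedtuple("Row", ["First_Name", "Last_Name", "Age"])
STUDENT = [
    Row("Ajeet", "Singh", 30),
    Row("Chaitanya", "Singh", 31),
    Row("Rajeev", "Bhatia", 27),
    Row("Carl", "Pratap", 28),
]

# { t.Last_Name | Student(t) AND t.Age > 30 }
last_names = {t.Last_Name for t in STUDENT if t.Age > 30}

# { t | Student(t) AND t.Last_Name = 'Singh' }
singhs = [t for t in STUDENT if t.Last_Name == "Singh"]

print(last_names)
print(singhs)
```

Ajeet Singh is 30, which is not greater than 30, so only Chaitanya's row contributes to the first result.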

2. Domain Relational Calculus (DRC)

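In domain relational calculus, variables range over the values of individual attributes (domains)
rather than over whole tuples, and the records are filtered using conditions on those values.
Again we take the same table to understand how DRC works.

Table: Student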

First_Name Last_Name Age

---------- --------- ----
Ajeet Singh 30
Chaitanya Singh 31
Rajeev Bhatia 27
Carl Pratap 28

Query to find the first name and age of students where student age is greater than 27

{< First_Name, Age > | < First_Name, Last_Name, Age > ∈ Student ∧ Age > 27}

Note:
The symbols used for logical operators are: ∧ for AND, ∨ for OR and ¬ for NOT.

Output:

First_Name Age
---------- ----
Ajeet 30
Chaitanya 31
Carl 28
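In a Python sketch of the domain relational calculus query, each variable stands for one attribute value instead of a whole tuple. The variable names first_name, last_name and age are illustrative.

```python
STUDENT = [
    ("Ajeet", "Singh", 30),
    ("Chaitanya", "Singh", 31),
    ("Rajeev", "Bhatia", 27),
    ("Carl", "Pratap", 28),
]

# {< First_Name, Age > | < First_Name, Last_Name, Age > ∈ Student ∧ Age > 27}
# One variable per attribute; last_name is bound but not part of the result.
result = {
    (first_name, age)
    for (first_name, last_name, age) in STUDENT
    if age > 27
}
print(sorted(result))
```

Rajeev is exactly 27, so his row does not satisfy Age > 27.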


Difference Between View and Table with examples

LAST UPDATED: AUGUST 27, 2022 BY CHAITANYA SINGH | FILED UNDER: DBMS

In this article, we will discuss the difference between a view and a table. Both of these terms are
commonly used in relational databases.

What is a view?
A view is the result of a stored SQL query. The result looks like a table, but this table is not
physically stored in the database; the data displayed by the view is fetched from the underlying
tables each time the view is queried. This is why a view is often referred to as a virtual table.

Syntax for creating a view:

CREATE VIEW view_name AS SELECT column_list FROM table_name;

Example:

CREATE VIEW [Senior Employees] AS
SELECT Emp_Name, Emp_Age
FROM Employees
WHERE Emp_Age >= 60;

Here we are creating a view (virtual table) with the name "Senior Employees". This virtual table
contains the name and age of those employees who are 60 years or older. The data is fetched from the
table Employees. (The square brackets around a name that contains a space are SQL Server style
quoting; standard SQL uses double quotes.)

Employees table:

Emp_Id Emp_Name Emp_Age Emp_City Emp_Dept

101 Tom 55 Noida Sales
102 Ron 60 Delhi Sales
103 Ajeet 61 Gurgaon Retail
104 Carl 59 Noida HR
105 Daniel 65 Agra Manager

Senior Employees view:

Emp_Name Emp_Age
Ron 60
Ajeet 61
Daniel 65
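The sketch below recreates this view with Python's sqlite3 module, which is an assumption made for the sake of a runnable example. It shows that the view stores no data of its own: after a change in the Employees table, the same view returns a different result. The update to Carl's age is made up for the demonstration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employees ("
    " Emp_Id INTEGER, Emp_Name TEXT, Emp_Age INTEGER, Emp_City TEXT, Emp_Dept TEXT)"
)
conn.executemany(
    "INSERT INTO Employees VALUES (?, ?, ?, ?, ?)",
    [
        (101, "Tom", 55, "Noida", "Sales"),
        (102, "Ron", 60, "Delhi", "Sales"),
        (103, "Ajeet", 61, "Gurgaon", "Retail"),
        (104, "Carl", 59, "Noida", "HR"),
        (105, "Daniel", 65, "Agra", "Manager"),
    ],
)

# Same view as above, with the name quoted in standard SQL style.
conn.execute(
    'CREATE VIEW "Senior Employees" AS '
    "SELECT Emp_Name, Emp_Age FROM Employees WHERE Emp_Age >= 60"
)

query = 'SELECT Emp_Name, Emp_Age FROM "Senior Employees" ORDER BY Emp_Name'
before = conn.execute(query).fetchall()

# Hypothetical change in the underlying table: Carl turns 60.
conn.execute("UPDATE Employees SET Emp_Age = 60 WHERE Emp_Name = 'Carl'")
after = conn.execute(query).fetchall()

print(before)
print(after)
```

The view was created once and never touched again, yet Carl shows up in it after the update.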

What is a table?
A table contains data in the form of rows and columns. For example, if a student table contains
records of 100 students and the details of each student consist of student name, id, age and
address, then the student table has 100 rows and 4 columns.

The columns are the attributes of the records, such as student name, id, age and address. Each
student record is stored in a row, so there are 100 rows for 100 students.

Syntax for creating a table:

CREATE TABLE table_name (
column_name1 datatype,
column_name2 datatype,
column_name3 datatype,
....
);

View vs Table

1. VIEW: A view is generated from one or more tables. Its data depends on the data present in the
   underlying tables.
   TABLE: Table data is inserted through queries. It does not depend on anything else for its data;
   the data is inserted by the user.

2. VIEW: Only the view definition (the query) is stored in the database. The result is not stored;
   it is produced from the current data of the underlying tables each time the view is queried.
   TABLE: Tables are permanently stored in the database until they are deleted using SQL queries.

3. VIEW: In general you cannot freely insert, delete or update data through a view. Many database
   systems allow it only for simple views, for example a view over a single table without
   aggregation.
   TABLE: You can insert, delete or update the data of tables.

4. VIEW: A view can easily be redefined with a different query using the CREATE OR REPLACE VIEW
   command, where the DBMS supports it.
   TABLE: A table cannot be recreated in place. You need to delete the existing table to create a
   new table with the same name.

5. VIEW: A view can contain data from multiple tables.
   TABLE: A table maintains relationships with other tables using foreign keys.


Types of keys in DBMS

Note: Each key below is covered in a separate tutorial with examples.

Primary Key – A primary key is a column or set of columns in a table that uniquely identifies tuples
(rows) in that table.

Super Key – A super key is a set of one or more columns (attributes) that uniquely identifies rows
in a table.

Candidate Key – A super key with no redundant attribute is known as a candidate key.

Alternate Key – Out of all candidate keys, only one is selected as the primary key; the remaining
keys are known as alternate or secondary keys.

Composite Key – A key that consists of more than one attribute to uniquely identify rows (also known
as records or tuples) in a table is called a composite key.

Foreign Key – Foreign keys are columns of a table that point to the primary key of another table.
They act as a cross-reference between tables.




Primary key in DBMS

LAST UPDATED: SEPTEMBER 18, 2022 BY CHAITANYA SINGH | FILED UNDER: DBMS

In this guide, you will learn about the primary key in DBMS with the help of examples. We will
discuss what a primary key is and how it differs from other keys in DBMS such as the foreign key and
the unique key.

What is a Primary Key

A primary key is a minimal set of attributes (columns) in a table that uniquely identifies tuples (rows)
of that table.

For example, say you want to store student data in a table “student”. The attributes of this table
are: student_id, student_name, student_age, student_address. The primary key is a set of one or more
of these attributes that uniquely identifies a record in the table. In this case, since student_id
is different for each student, it can be considered a primary key.

Characteristics of a primary key

Primary key has the following characteristics:

1. Minimal

The primary key should contain the minimal number of attributes. In the example above, student_id
alone can uniquely identify a record. A combination of two attributes such as {student_id,
student_name} can also uniquely identify a record, but since we should choose the minimal set of
attributes, student_id is chosen as the primary key instead of {student_id, student_name}.

2. Unique

The value of the primary key should be unique for each row of the table. The column(s) that make up
the key cannot contain duplicate values, because a non-unique value would not let us uniquely
identify a record. If two students had the same student_id, updating the record of one student based
on the primary key could mistakenly update the record of the other student.

3. Non Null

The attribute(s) marked as the primary key are not allowed to have null values.

4. Not dependent on Time

The primary key value should not change over time. It should remain as it is until explicitly
updated by the user.

5. Easily accessible

The primary key of a record should be accessible to all the users who perform operations on the
database.

6. Can have more than one attribute

A primary key can be a set of more than one attribute (column). For example, {Stu_Id, Stu_Name}
collectively can identify a tuple in the above table, but we do not choose it as the primary key
because Stu_Id alone is enough to uniquely identify rows and we always go for the minimal set. That
said, we should choose more than one column as the primary key only when no single column can
uniquely identify the tuples in the table.

Syntax for Creating Primary key constraint:

While creating a table you can define the primary key like this:

CREATE TABLE table_name

(
column_name1 datatype [ NULL | NOT NULL ],
column_name2 datatype [ NULL | NOT NULL ],
...

CONSTRAINT constraint_name PRIMARY KEY (column_nameX, column_nameY..)

);

For example: Here we are making stu_id the primary key while creating the table STUDENTS.

CREATE TABLE STUDENTS
(
stu_id int NOT NULL,
first_name VARCHAR(30) NOT NULL,
last_name VARCHAR(25) NOT NULL,
dob DATE,
CONSTRAINT student_pk PRIMARY KEY (stu_id)
);

Properties of a Primary Key

It does not allow duplicates.
A table can have only one primary key.
A primary key is denoted by underlining the attribute name (column name).
It uniquely identifies each record of the table.
It does not allow null values to be inserted into the primary key column.
A primary key can consist of more than one column; such a primary key is known as a composite
primary key.
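The "no duplicates" and "no nulls" properties can be observed with a short sketch. It assumes SQLite through Python's sqlite3 module and the STUDENTS definition from this page; the inserted rows and the helper try_insert are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE STUDENTS ("
    " stu_id int NOT NULL,"
    " first_name VARCHAR(30) NOT NULL,"
    " last_name VARCHAR(25) NOT NULL,"
    " dob DATE,"
    " CONSTRAINT student_pk PRIMARY KEY (stu_id))"
)


def try_insert(row):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO STUDENTS VALUES (?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False


outcomes = [
    try_insert((1, "Ajeet", "Singh", "1990-01-01")),  # first row with stu_id 1
    try_insert((1, "Carl", "Pratap", "1992-05-17")),  # duplicate stu_id
    try_insert((None, "Rajeev", "Bhatia", None)),     # NULL stu_id
]
print(outcomes)
```

Only the first row is stored; the database rejects the other two before they reach the table.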

What Are the Benefits of a Primary Key?

The following are the advantages of a primary key:

It uniquely identifies each row of a table. This is useful for any operation on the data, such
as update, delete or search.
It allows faster access to records, because the DBMS typically builds an index on the primary key.

Primary Key Example in DBMS

Let’s take an example to understand the concept of primary key. In the following table, there are three
attributes: Stu_ID, Stu_Name & Stu_Age. Out of these three attributes, one attribute or a set of more
than one attributes can be a primary key.

Attribute Stu_Name alone cannot be a primary key, as more than one student can have the same name.
Attribute Stu_Age alone cannot be a primary key, as more than one student can have the same age.
Attribute Stu_Id alone is a primary key, as each student has a unique id that identifies the
student record in the table.

Note: In some cases a single attribute cannot uniquely identify a record in a table. In that case we
try to find a set of attributes that can uniquely identify a row. We will see an example of this
after the current one.

Table Name: STUDENTS

Another example: composite key with more than one attribute

Consider the table ORDER, which keeps the daily record of the purchases made by customers. This
table has three attributes: Customer_ID, Product_ID & Order_Quantity.

Customer_ID alone cannot be a primary key, as a single customer can place more than one order,
which gives more than one row with the same Customer_ID value. In the following example,
customer id 1011 has placed two orders, with product ids 9023 and 9111.
Product_ID alone cannot be a primary key, as more than one customer can place an order for the
same product, which gives more than one row with the same product id. In the following table,
customer ids 1011 & 1122 placed an order for the same product (product id 9023).
Order_Quantity alone cannot be a primary key, as more than one customer can place an order for
the same quantity.
Since none of the attributes alone can be a primary key, let’s try to find a set of attributes
that plays that role. The set {Customer_ID, Product_ID} together can identify the rows uniquely
in the table, so this set is the primary key for this table.

Table Name: ORDER

Note: While choosing a set of attributes for a primary key, we always choose the minimal set that has
minimum number of attributes. For example, if there are two sets that can identify row in table, the set
that has minimum number of attributes should be chosen as primary key.

How to define primary key in DBMS?

In the above example, we already had a table with data and we were trying to understand the purpose
and meaning of primary key. However you should know that generally we define the primary key during
table creation. We can define the primary key later as well but that rarely happens in the real world
scenario.
Let’s say we want to create the table that we have discussed above with the customer id and product
id set working as primary key. We can do that in SQL like this:

-- ORDER is a reserved word in SQL, so the table name is quoted here.
CREATE TABLE "ORDER"
(
Customer_ID int NOT NULL,
Product_ID int NOT NULL,
Order_Quantity int NOT NULL,
PRIMARY KEY (Customer_ID, Product_ID)
);
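The effect of the composite primary key can be tried out as follows. This sketch assumes SQLite through Python's sqlite3 module. The customer and product ids come from the discussion above, while the quantities and the helper try_insert are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# ORDER is a reserved word in SQL, so the table name is written in double quotes.
conn.execute(
    'CREATE TABLE "ORDER" ('
    " Customer_ID int NOT NULL,"
    " Product_ID int NOT NULL,"
    " Order_Quantity int NOT NULL,"
    " PRIMARY KEY (Customer_ID, Product_ID))"
)


def try_insert(row):
    """Return True if the row is accepted, False if the key constraint rejects it."""
    try:
        conn.execute('INSERT INTO "ORDER" VALUES (?, ?, ?)', row)
        return True
    except sqlite3.IntegrityError:
        return False


outcomes = [
    try_insert((1011, 9023, 2)),  # first order of customer 1011
    try_insert((1011, 9111, 1)),  # same customer, different product
    try_insert((1122, 9023, 5)),  # same product, different customer
    try_insert((1011, 9023, 7)),  # same (customer, product) pair again
]
print(outcomes)
```

Repeating a customer id or a product id on its own is fine; only a repeated pair violates the key.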

Suppose we didn’t define the primary key while creating table then we can define it later like this:

-- ORDER is a reserved word in SQL, so the table name is quoted here.
ALTER TABLE "ORDER"
ADD CONSTRAINT PK_Order PRIMARY KEY (Customer_ID, Product_ID);

Another way:
When only one attribute makes up the primary key, as in the first example of the STUDENTS table, we
can also define the key like this:

Create table STUDENTS

(
Stu_Id int primary key,
Stu_Name varchar(255) not null,
Stu_Age int not null
)


Super key in DBMS

LAST UPDATED: DECEMBER 11, 2018 BY CHAITANYA SINGH | FILED UNDER: DBMS

Definition of Super Key in DBMS: A super key is a set of one or more attributes (columns) that can
uniquely identify a row in a table. DBMS beginners often confuse the super key with the candidate
key, so we will also discuss the candidate key and its relation to the super key in this article.

How is a candidate key different from a super key?

The answer is simple: candidate keys are selected from the set of super keys. The only thing we take
care of while selecting a candidate key is that it should not have any redundant attribute. That is
why candidate keys are also called minimal super keys.

Let’s take an example to understand this:

Table: Employee

Emp_SSN Emp_Number Emp_Name

--------- ---------- --------
123456789 226 Steve
999999321 227 Ajeet
888997212 228 Chaitanya
777778888 229 Robert

Super keys: The above table has the following super keys. Each of these sets can uniquely identify a
row of the Employee table.

{Emp_SSN}
{Emp_Number}
{Emp_SSN, Emp_Number}
{Emp_SSN, Emp_Name}
{Emp_SSN, Emp_Number, Emp_Name}
{Emp_Number, Emp_Name}

Candidate Keys: As I mentioned in the beginning, a candidate key is a minimal super key with no
redundant attributes. The following two super keys are chosen from the above sets, as they contain
no redundant attributes.

{Emp_SSN}
{Emp_Number}

Only these two sets are candidate keys, as all the other sets contain redundant attributes that are
not necessary for unique identification.
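The selection of super keys and candidate keys can be reproduced with a short sketch. One assumption matters here: uniqueness is taken from the design of the table (Emp_SSN and Emp_Number are each unique, Emp_Name is not) and is not derived from the four sample rows, where the names happen to be distinct. All variable names are illustrative.

```python
from itertools import combinations

attributes = ["Emp_SSN", "Emp_Number", "Emp_Name"]
# Assumption taken from the text: each of these attributes is unique on its own.
unique_attributes = [{"Emp_SSN"}, {"Emp_Number"}]


def all_subsets(attrs):
    """Yield every non-empty subset of attrs as a frozenset."""
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            yield frozenset(combo)


# A set of attributes is a super key if it contains a unique attribute.
super_keys = [s for s in all_subsets(attributes)
              if any(u <= s for u in unique_attributes)]

# A candidate key is a super key with no smaller super key inside it.
candidate_keys = [k for k in super_keys
                  if not any(other < k for other in super_keys)]

for key in super_keys:
    print(sorted(key))
print([sorted(key) for key in candidate_keys])
```

{Emp_Name} on its own never appears in super_keys, which matches the list above.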

Super key vs Candidate Key

I have been getting a lot of comments about the confusion between super key and candidate key, so
let me give a clear explanation.

1. First you have to understand that all candidate keys are super keys. This is because the
candidate keys are chosen out of the super keys.
2. How do we choose candidate keys from the set of super keys? We look for those keys from which we
cannot remove any attribute. In the above example, we have not chosen {Emp_SSN, Emp_Name} as a
candidate key because {Emp_SSN} alone can identify a unique row in the table and Emp_Name is
redundant.

Primary key:
A primary key is selected from the set of candidate keys. This is done by the database admin or
database designer. We can say that either {Emp_SSN} or {Emp_Number} can be chosen as the primary key
for the table Employee.


Why Emp_Name is not candidate or super key ?

Lee says
MAY 14, 2017 AT 11:50 AM

Emp_Name cannot be both because names are not unique e.g. there could be
hundreds of Jack inside the database.

anuj says
FEBRUARY 26, 2017 AT 7:18 PM

sir why don’t Emp_Number is a super key

Gabriele says
JULY 16, 2017 AT 6:06 PM

Emp_Number is a super key, in fact with just Emp_Number we can lead up to

Emp_Name or Emp_SSN;

Emp_Name is NOT a super key because we can have 2 “Steve” or 3 or 4 in that table..
Name are not unique, we cannot say the same for SSN codes or Emp_Number. Bye
Gabriele

Chaitanya Singh says

APRIL 6, 2018 AT 4:43 AM
It is a super key as mentioned in the article.

Imran khan says

MARCH 30, 2018 AT 4:55 PM

What is the difference between super key and candidate key?

Chaitanya Singh says

APRIL 6, 2018 AT 4:50 AM

I have added more details on this in the guide, Please refer the added section above.

lavanya says
MAY 4, 2018 AT 7:55 AM
I have doubt that
{Emp_SSN, Emp_Number} pair also Candidate Keys?.because both are not a redundant
attributes

Chaitanya Singh says

MAY 6, 2018 AT 2:23 PM

No {Emp_SSN, Emp_Number} pair is not a candidate key because {Emp_SSN} alone is

sufficient to identify a unique row in table. The same applies for {Emp_Number}. So
based on this we can say that {Emp_SSN} and {Emp_Number} both are candidate keys
but {Emp_SSN, Emp_Number} is not a candidate key.

kirti says
SEPTEMBER 17, 2018 AT 1:29 PM

is candidate key also primary key?

Chaitanya Singh says

DECEMBER 11, 2018 AT 5:02 AM

There can be number of candidate keys present in a table, however there is only one
primary key. A primary key is always chosen from a set of candidate keys. The
decision of choosing primary key from a set of candidate keys is made by database
admin.

Sai kumar says

SEPTEMBER 17, 2018 AT 9:31 PM

why not the emp_name is the part of candidate key .though it is also having unique set of
values???

Chaitanya Singh says

DECEMBER 11, 2018 AT 4:58 AM

Because more than one employees can have same name.

keerthi says
OCTOBER 17, 2018 AT 1:25 PM

candidate keys are always come with pair so, why{Emp_SSN, Emp_Number} is not a
candidate key

Chaitanya Singh says

DECEMBER 11, 2018 AT 5:00 AM

No, candidate keys are not necessarily to be a pair of attributes.


Candidate Key in DBMS

LAST UPDATED: DECEMBER 11, 2018 BY CHAITANYA SINGH | FILED UNDER: DBMS

Definition of Candidate Key in DBMS: A super key with no redundant attribute is known as a candidate
key. Candidate keys are selected from the set of super keys. The only thing we take care of while
selecting a candidate key is that it should not have any redundant attributes. That is why candidate
keys are also called minimal super keys.

Candidate Key Example

Let's take the example of a table “Employee”. This table has three attributes: Emp_Id, Emp_Number &
Emp_Name. Here Emp_Id & Emp_Number have unique values, while Emp_Name can have duplicate values
because more than one employee can have the same name.

Emp_Id Emp_Number Emp_Name

------ ---------- --------
E01 2264 Steve
E22 2278 Ajeet
E23 2288 Chaitanya
E45 2290 Robert

How many super keys can the above table have?

1. {Emp_Id}
2. {Emp_Number}
3. {Emp_Id, Emp_Number}
4. {Emp_Id, Emp_Name}
5. {Emp_Id, Emp_Number, Emp_Name}
6. {Emp_Number, Emp_Name}

Let's select the candidate keys from the above set of super keys.

1. {Emp_Id} – No redundant attributes
2. {Emp_Number} – No redundant attributes
3. {Emp_Id, Emp_Number} – Redundant attribute. Either of these attributes alone is a minimal super
key, as both columns have unique values.
4. {Emp_Id, Emp_Name} – Redundant attribute Emp_Name.
5. {Emp_Id, Emp_Number, Emp_Name} – Redundant attributes. Emp_Id or Emp_Number alone is sufficient
to uniquely identify a row of the Employee table.
6. {Emp_Number, Emp_Name} – Redundant attribute Emp_Name.

The candidate keys we have selected are:

{Emp_Id}
{Emp_Number}

Note: A primary key is selected from the set of candidate keys. That means we can have either Emp_Id
or Emp_Number as the primary key. The decision is made by the DBA (Database administrator).


Comments
kamal pratap says
OCTOBER 19, 2017 AT 2:26 PM

Sir, Can a Candidate key contain NULL values ? If yes, then how many?

Dibbyendu says
MAY 16, 2018 AT 3:17 PM

A primary key is being selected from the group of candidate keys. That means we can
either have Emp_Id or Emp_Number as primary key. Now, a primary key can’t have a
null value [we learnt it in Primary key ]. Hence candidate key must not contain a null
value…

Amit Sharma says

SEPTEMBER 28, 2018 AT 4:49 AM

No Candidate key does not contain NULL values as a Primary key is selected by the
group of Candidate key and as we know that Primary Key has unique constraint and
NOT NULL.
Thanks!


Copyright © 2012 – 2024 BeginnersBook . Privacy Policy . Sitemap


Alternate key in DBMS

LAST UPDATED: DECEMBER 11, 2018 BY CHAITANYA SINGH | FILED UNDER: DBMS

As we saw in the candidate key guide, a table can have multiple candidate keys. Among
these candidate keys, only one gets selected as the primary key; the remaining keys are known as
alternate or secondary keys.

Alternate Key Example

Let’s take an example to understand the alternate key concept. Here we have a table Employee with
three attributes: Emp_Id, Emp_Number & Emp_Name.

Table: Employee

Emp_Id   Emp_Number   Emp_Name
------   ----------   ---------
E01      2264         Steve
E22      2278         Ajeet
E23      2288         Chaitanya
E45      2290         Robert

There are two candidate keys in the above table:

{Emp_Id}
{Emp_Number}

The DBA (database administrator) can choose either of the above keys as the primary key. Let’s say
Emp_Id is chosen as the primary key.

Since we have selected Emp_Id as the primary key, the remaining key Emp_Number is called the
alternate or secondary key.
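
In SQL, an alternate key is usually declared with a UNIQUE constraint next to the primary key. Here is a minimal sketch using Python's built-in sqlite3 module. The column types and the extra employee "Maria" are assumptions made for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE Employee (
        Emp_Id     TEXT PRIMARY KEY,          -- chosen primary key
        Emp_Number INTEGER NOT NULL UNIQUE,   -- alternate key
        Emp_Name   TEXT NOT NULL
    )
    """
)
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?)",
    [("E01", 2264, "Steve"), ("E22", 2278, "Ajeet")],
)

def try_insert(row):
    """Return True if the row is accepted, False if a key constraint rejects it."""
    try:
        conn.execute("INSERT INTO Employee VALUES (?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

# Hypothetical row that reuses Emp_Number 2264, which already belongs to E01.
duplicate_number = try_insert(("E99", 2264, "Maria"))
fresh_row = try_insert(("E23", 2288, "Chaitanya"))
```

The alternate key is enforced in the same way as the primary key: the row that reuses an existing Emp_Number is rejected.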


Composite key in DBMS

LAST UPDATED: DECEMBER 11, 2018 BY CHAITANYA SINGH | FILED UNDER: DBMS

Definition of composite key: A key that has more than one attribute is known as a composite key. It is
also known as a compound key.

Note: Any key such as a super key, primary key, candidate key etc. can be called a composite key if it has
more than one attribute.

Composite key Example

Let’s consider a table Sales. This table has four columns (attributes): cust_Id, order_Id, product_code
& product_count.

Table – Sales

cust_Id   order_Id   product_code   product_count
-------   --------   ------------   -------------
C01       O001       P007           23
C02       O123       P007           19
C02       O123       P230           82
C01       O001       P890           42

None of these columns alone can play the role of a key in this table.

Column cust_Id alone cannot be a key because the same customer can place multiple orders, so the
same customer can have multiple entries.

Column order_Id alone cannot be a key because the same order can contain multiple
products, so the same order_Id can be present multiple times.

Column product_code alone cannot be a key because more than one customer can place an order for the
same product.

Column product_count alone cannot be a key because two orders can be placed for the same
product count.

Based on this, it is safe to assume that the key should have more than one attribute:
Key in above table: {cust_Id, product_code}

This is a composite key as it is made up of more than one attribute.
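
A composite key is declared by listing all of its columns in one PRIMARY KEY clause. Here is a small sketch with Python's sqlite3 module. The column types and the extra order O555 are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE Sales (
        cust_Id       TEXT NOT NULL,
        order_Id      TEXT NOT NULL,
        product_code  TEXT NOT NULL,
        product_count INTEGER NOT NULL,
        PRIMARY KEY (cust_Id, product_code)   -- composite key from the text
    )
    """
)
conn.executemany(
    "INSERT INTO Sales VALUES (?, ?, ?, ?)",
    [
        ("C01", "O001", "P007", 23),
        ("C02", "O123", "P007", 19),
        ("C02", "O123", "P230", 82),
        ("C01", "O001", "P890", 42),
    ],
)

# Hypothetical new order: customer C01 buys product P007 again.
try:
    conn.execute("INSERT INTO Sales VALUES ('C01', 'O555', 'P007', 5)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM Sales").fetchone()[0]
```

Each column may repeat on its own, but the same (cust_Id, product_code) pair cannot appear twice, so the last insert is refused.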


i think it should be {cust_id, product_code}

haritha says
OCTOBER 17, 2016 AT 11:13 AM

what is the difference between composite key and candidate key?

Lee says
MAY 14, 2017 AT 8:22 PM

The difference is that candidate key does not allow redundant attributes only unique
attributes like ID and Item Code etc.

Chaitanya Singh says

DECEMBER 11, 2018 AT 12:36 PM

A composite key must have more than one attributes while a candidate key can
contain a single attribute.


Foreign key in DBMS

LAST UPDATED: DECEMBER 11, 2018 BY CHAITANYA SINGH | FILED UNDER: DBMS

Definition: Foreign keys are the columns of a table that point to the primary key of another table.
They act as a cross-reference between tables.

For example:
In the example below, the Stu_Id column in the Course_enrollment table is a foreign key, as it points to
the primary key of the Student table.

Course_enrollment table:

Course_Id   Stu_Id
---------   ------
C01         101
C02         102
C03         101
C05         102
C06         103
C07         102

Student table:

Stu_Id   Stu_Name    Stu_Age
------   ---------   -------
101      Chaitanya   22
102      Arya        26
103      Bran        25
104      Jon         21

Note: Practically, a foreign key does not have to reference the column tagged as primary key in the other
table. If it points to a unique column (not necessarily a primary key) of another table, it is still a
foreign key. So a more accurate definition would be: foreign keys are the columns of a table that point to
a candidate key of another table.
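
To see the cross-reference being enforced, here is a hedged sketch with Python's sqlite3 module. The column types and the enrollment rows C08 and C09 are assumptions for the demo. Note that SQLite checks foreign keys only after `PRAGMA foreign_keys = ON`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript(
    """
    CREATE TABLE Student (
        Stu_Id   INTEGER PRIMARY KEY,
        Stu_Name TEXT NOT NULL,
        Stu_Age  INTEGER
    );
    CREATE TABLE Course_enrollment (
        Course_Id TEXT NOT NULL,
        Stu_Id    INTEGER NOT NULL REFERENCES Student (Stu_Id)
    );
    INSERT INTO Student VALUES (101, 'Chaitanya', 22), (102, 'Arya', 26),
                               (103, 'Bran', 25), (104, 'Jon', 21);
    INSERT INTO Course_enrollment VALUES ('C01', 101), ('C02', 102);
    """
)

def try_enroll(course_id, stu_id):
    """Return True if the enrollment row is accepted by the database."""
    try:
        conn.execute("INSERT INTO Course_enrollment VALUES (?, ?)", (course_id, stu_id))
        return True
    except sqlite3.IntegrityError:
        return False

valid_accepted = try_enroll("C08", 104)    # student 104 exists
orphan_accepted = try_enroll("C09", 999)   # hypothetical id with no Student row
```

An enrollment that points to a student who does not exist is refused, which is exactly the integrity a foreign key is meant to protect.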


Constraints in DBMS
LAST UPDATED: NOVEMBER 17, 2018 BY CHAITANYA SINGH | FILED UNDER: DBMS

Constraints enforce limits on the data, or the type of data, that can be inserted into, updated in or
deleted from a table. The whole purpose of constraints is to maintain data integrity during an
insert, update or delete on a table. In this tutorial we will learn several types of constraints that can be
created in an RDBMS.

Types of constraints
NOT NULL
UNIQUE
DEFAULT
CHECK
Key Constraints – PRIMARY KEY, FOREIGN KEY
Domain constraints
Mapping constraints

NOT NULL:

The NOT NULL constraint makes sure that a column does not hold a NULL value. When we don’t provide a
value for a particular column while inserting a record into a table, it takes the NULL value by default. By
specifying the NOT NULL constraint, we can be sure that a particular column cannot have NULL values.
Example:

CREATE TABLE STUDENT(
ROLL_NO INT NOT NULL,
STU_NAME VARCHAR (35) NOT NULL,
STU_AGE INT NOT NULL,
STU_ADDRESS VARCHAR (235),
PRIMARY KEY (ROLL_NO)
);

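UNIQUE:

The UNIQUE constraint makes sure that a column holds no duplicate values. In the example below, the
STU_NAME and STU_ADDRESS columns are marked UNIQUE.

CREATE TABLE STUDENT(
ROLL_NO INT NOT NULL,
STU_NAME VARCHAR (35) NOT NULL UNIQUE,
STU_AGE INT NOT NULL,
STU_ADDRESS VARCHAR (35) UNIQUE,
PRIMARY KEY (ROLL_NO)
);

DEFAULT and CHECK:

The DEFAULT constraint supplies a value for a column when an insert does not provide one, and the CHECK
constraint restricts the values a column accepts. In the example below, EXAM_FEE defaults to 10000
when no fee is given.

CREATE TABLE STUDENT(
ROLL_NO INT NOT NULL CHECK(ROLL_NO > 1000),
STU_NAME VARCHAR (35) NOT NULL,
STU_AGE INT NOT NULL,
EXAM_FEE INT DEFAULT 10000,
STU_ADDRESS VARCHAR (35),
PRIMARY KEY (ROLL_NO)
);

In the above example we have also set a check constraint on the ROLL_NO column of the STUDENT table. Now
the ROLL_NO field must have a value greater than 1000.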
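
The effect of these constraints is easiest to see by trying a few inserts. The sketch below runs the last STUDENT definition in SQLite through Python's sqlite3 module. The student rows are made-up test values, and other database systems may word their errors differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE STUDENT (
        ROLL_NO     INT NOT NULL CHECK (ROLL_NO > 1000),
        STU_NAME    VARCHAR(35) NOT NULL,
        STU_AGE     INT NOT NULL,
        EXAM_FEE    INT DEFAULT 10000,
        STU_ADDRESS VARCHAR(35),
        PRIMARY KEY (ROLL_NO)
    )
    """
)

def accepted(sql):
    """Return True if the statement runs, False if a constraint rejects it."""
    try:
        conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False

# Valid row; EXAM_FEE is left out so the DEFAULT applies.
ok = accepted("INSERT INTO STUDENT (ROLL_NO, STU_NAME, STU_AGE) VALUES (1001, 'Ajeet', 20)")
# Violates CHECK (ROLL_NO > 1000).
low_roll = accepted("INSERT INTO STUDENT (ROLL_NO, STU_NAME, STU_AGE) VALUES (999, 'Steve', 21)")
# Violates NOT NULL on STU_NAME.
no_name = accepted("INSERT INTO STUDENT (ROLL_NO, STU_AGE) VALUES (1002, 22)")

fee = conn.execute("SELECT EXAM_FEE FROM STUDENT WHERE ROLL_NO = 1001").fetchone()[0]
```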

Key constraints:
PRIMARY KEY:

A primary key uniquely identifies each record in a table. It must have unique values and cannot contain
nulls. In the example below the ROLL_NO field is marked as the primary key, which means the ROLL_NO
field cannot have duplicate or null values.

CREATE TABLE STUDENT(
ROLL_NO INT NOT NULL,
STU_NAME VARCHAR (35) NOT NULL UNIQUE,
STU_AGE INT NOT NULL,
STU_ADDRESS VARCHAR (35) UNIQUE,
PRIMARY KEY (ROLL_NO)
);

FOREIGN KEY:

Foreign keys are the columns of a table that point to the primary key of another table. They act as a
cross-reference between tables.
Read more about it in the Foreign key in DBMS guide.

Domain constraints:
Each table has a certain set of columns, and each column allows only one type of data, based on its data
type. The column does not accept values of any other data type.
Domain constraints are user-defined data types, and we can define them like this:

Domain Constraint = data type + Constraints (NOT NULL / UNIQUE / PRIMARY KEY / FOREIGN KEY /
CHECK / DEFAULT)

Mapping constraints:
Read about mapping constraints in the Mapping constraints in DBMS guide.


Domain constraints in DBMS

LAST UPDATED: NOVEMBER 19, 2018 BY CHAITANYA SINGH | FILED UNDER: DBMS

A table in DBMS is a set of rows and columns that contain data. Columns in a table have unique names and
are often referred to as attributes in DBMS. A domain is the set of values permitted for an attribute in a
table. For example, a domain of month-of-year can accept January, February … December as possible
values; a domain of integers can accept whole numbers that are negative, positive or zero.

Definition: Domain constraints are user-defined data types, and we can define them like this:
Domain Constraint = data type + Constraints (NOT NULL / UNIQUE / PRIMARY KEY / FOREIGN KEY /
CHECK / DEFAULT)

Example:
For example, to create a table “student_info” with a “stu_id” field having a value greater than 100, I
can create a domain and table like this:

create domain id_value int
constraint id_test
check(value > 100);

create table student_info (
stu_id id_value PRIMARY KEY,
stu_name varchar(30),
stu_age int
);

Another example:
To create a table “bank_account” with an “account_type” field having the value either “Checking” or
“Saving”:

create domain account_type char(12)
constraint acc_type_test
check(value in ('Checking', 'Saving'));

create table bank_account (
account_nbr int PRIMARY KEY,
account_holder_name varchar(30),
account_type account_type
);
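
Not every database supports CREATE DOMAIN (SQLite, for example, does not). A column-level CHECK constraint gives the same restriction for a single column. The sketch below is an assumption-laden stand-in for the domain above, run through Python's sqlite3 module; the account rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE bank_account (
        account_nbr         INT PRIMARY KEY,
        account_holder_name VARCHAR(30),
        -- Same rule as the acc_type_test domain constraint, written inline.
        account_type        CHAR(12)
            CONSTRAINT acc_type_test CHECK (account_type IN ('Checking', 'Saving'))
    )
    """
)

def try_open(nbr, holder, acc_type):
    """Return True if the account row passes the domain rule."""
    try:
        conn.execute("INSERT INTO bank_account VALUES (?, ?, ?)", (nbr, holder, acc_type))
        return True
    except sqlite3.IntegrityError:
        return False

checking_ok = try_open(1, "Arya", "Checking")
current_ok = try_open(2, "Bran", "Current")   # value outside the permitted set
```

The trade-off is reuse: a domain is defined once and shared by many columns, while an inline CHECK has to be repeated on each column that needs it.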


Mapping constraints in DBMS

LAST UPDATED: APRIL 29, 2015 BY CHAITANYA SINGH | FILED UNDER: DBMS

Mapping constraints can be explained in terms of mapping cardinality:

Mapping Cardinality:
One to One: An entity of entity-set A can be associated with at most one entity of entity-set B and an
entity in entity-set B can be associated with at most one entity of entity-set A.

One to Many: An entity of entity-set A can be associated with any number of entities of entity-set B
and an entity in entity-set B can be associated with at most one entity of entity-set A.

Many to One: An entity of entity-set A can be associated with at most one entity of entity-set B and an
entity in entity-set B can be associated with any number of entities of entity-set A.

Many to Many: An entity of entity-set A can be associated with any number of entities of entity-set B
and an entity in entity-set B can be associated with any number of entities of entity-set A.

We can put these constraints in place while creating tables in a database.

Example:
CREATE TABLE Customer (
customer_id int PRIMARY KEY NOT NULL,
first_name varchar(20),
last_name varchar(20)
);

-- ORDER is a reserved word in SQL, so the table is named Orders.
CREATE TABLE Orders (
order_id int PRIMARY KEY NOT NULL,
customer_id int,
order_details varchar(50),
constraint fk_Customers foreign key (customer_id)
references Customer (customer_id)
);

Assuming that a customer places more than one order, the above tables represent a one to many relationship.
Similarly, we can achieve other mapping constraints based on the requirements.
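
As a rough sketch of how another mapping can be obtained, the code below keeps the one to many link between customers and orders and adds a one to one link by placing UNIQUE on the foreign key column. The orders table is named Orders here because ORDER is a reserved word in SQL. The Loyalty_Card table and all sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE Customer (
        customer_id INT PRIMARY KEY NOT NULL,
        first_name  VARCHAR(20),
        last_name   VARCHAR(20)
    );
    -- One to many: customer_id may repeat across orders.
    CREATE TABLE Orders (
        order_id      INT PRIMARY KEY NOT NULL,
        customer_id   INT REFERENCES Customer (customer_id),
        order_details VARCHAR(50)
    );
    -- One to one (hypothetical table): UNIQUE stops customer_id from repeating.
    CREATE TABLE Loyalty_Card (
        card_id     INT PRIMARY KEY NOT NULL,
        customer_id INT NOT NULL UNIQUE REFERENCES Customer (customer_id)
    );
    INSERT INTO Customer VALUES (1, 'Steve', 'Rogers');
    INSERT INTO Orders VALUES (10, 1, 'books'), (11, 1, 'pens');
    INSERT INTO Loyalty_Card VALUES (500, 1);
    """
)

orders_for_customer = conn.execute(
    "SELECT COUNT(*) FROM Orders WHERE customer_id = 1"
).fetchone()[0]

try:
    conn.execute("INSERT INTO Loyalty_Card VALUES (501, 1)")  # second card, same customer
    second_card_rejected = False
except sqlite3.IntegrityError:
    second_card_rejected = True
```

A many to many mapping is usually built the same way with a third table that holds one foreign key to each side.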


Comments

mak says
JANUARY 6, 2016 AT 6:12 AM

You provided a very good material on DBMS. I really appreciate that.

Thank you,


Cardinality in DBMS
LAST UPDATED: NOVEMBER 17, 2018 BY CHAITANYA SINGH | FILED UNDER: DBMS

In DBMS you may hear the term cardinality in two different places, and it has a different meaning in
each.

In Context of Data Models:

In terms of data models, cardinality refers to the relationship between two tables. A relationship can be
of four types, as we have already seen in the Entity relationship guide:

One to One – A single row of the first table associates with a single row of the second table. For example,
the relationship between the person and passport tables is one to one because a person can have only one
passport and a passport can be assigned to only one person.

One to Many – A single row of the first table associates with more than one row of the second table. For
example, the relationship between the customer and order tables is one to many because a customer can
place many orders but an order can be placed by only a single customer.

Many to One – Many rows of the first table associate with a single row of the second table. For example,
the relationship between student and university is many to one because a university can have many
students but a student can study in only a single university at a time.

Many to Many – Many rows of the first table associate with many rows of the second table. For example,
the relationship between the student and course tables is many to many because a student can take many
courses at a time and a course can be assigned to many students.

In Context of Query Optimization:

In terms of queries, cardinality refers to the uniqueness of the values in a column of a table. A column
with all unique values has high cardinality, and a column with all duplicate values has low
cardinality. These cardinality figures help in query optimization.
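
One simple way to put a number on this idea is the ratio of distinct values to total values. The function below is our own illustration, not how any particular optimizer computes its statistics, and the two sample columns are made up.

```python
def cardinality_ratio(values):
    """Distinct values divided by total values; 1.0 means every value is unique."""
    values = list(values)
    if not values:
        return 0.0
    return len(set(values)) / len(values)

# Hypothetical column samples, used only for illustration.
emp_ids = [101, 102, 103, 104]        # all unique: high cardinality
states = ["TN", "TN", "TN", "TN"]     # all duplicates: low cardinality

high = cardinality_ratio(emp_ids)
low = cardinality_ratio(states)
```

An index on a high cardinality column narrows a search to very few rows, which is why such figures are useful to a query optimizer.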


Functional dependency in DBMS

LAST UPDATED: DECEMBER 14, 2018 BY CHAITANYA SINGH | FILED UNDER: DBMS

The attributes of a table are said to be dependent on each other when an attribute of a table uniquely
identifies another attribute of the same table.

For example: Suppose we have a student table with attributes Stu_Id, Stu_Name, Stu_Age. Here the Stu_Id
attribute uniquely identifies the Stu_Name attribute of the student table, because if we know the student id
we can tell the student name associated with it. This is known as functional dependency and can be
written as Stu_Id->Stu_Name; in words, Stu_Name is functionally dependent on Stu_Id.

Formally:
If column A of a table uniquely identifies column B of the same table, then it can be represented as A->B
(attribute B is functionally dependent on attribute A).
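
A functional dependency can be tested against sample rows: A->B is violated as soon as two rows agree on A but differ on B. The helper below is a sketch under that definition. The student rows are hypothetical, with two students deliberately sharing a name.

```python
def holds(rows, lhs, rhs):
    """Check whether lhs -> rhs holds in rows (a list of dicts)."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it later.
        if seen.setdefault(key, value) != value:
            return False
    return True

# Hypothetical student rows; two students share the name "Arya".
students = [
    {"Stu_Id": 101, "Stu_Name": "Chaitanya", "Stu_Age": 22},
    {"Stu_Id": 102, "Stu_Name": "Arya", "Stu_Age": 26},
    {"Stu_Id": 103, "Stu_Name": "Arya", "Stu_Age": 25},
]

id_to_name = holds(students, ["Stu_Id"], ["Stu_Name"])
name_to_id = holds(students, ["Stu_Name"], ["Stu_Id"])
```

Sample data can only refute a dependency. Stu_Id->Stu_Name passing here is consistent with the rule but does not prove it for all future rows.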

Types of Functional Dependencies

Trivial functional dependency
Non-trivial functional dependency
Multivalued dependency
Transitive dependency


Comments

Prasad says
SEPTEMBER 1, 2015 AT 3:00 AM

Good work. Thanks


Trivial functional dependency in DBMS with example

LAST UPDATED: APRIL 24, 2015 BY CHAITANYA SINGH | FILED UNDER: DBMS

The dependency of an attribute on a set of attributes is known as a trivial functional dependency if the
set of attributes includes that attribute.

Symbolically: A->B is a trivial functional dependency if B is a subset of A.

The following dependencies are also trivial: A->A & B->B

For example: Consider a table with two columns Student_Id and Student_Name.

{Student_Id, Student_Name} -> Student_Id is a trivial functional dependency as Student_Id is a subset
of {Student_Id, Student_Name}. That makes sense because if we know the values of Student_Id and
Student_Name then the value of Student_Id can be uniquely determined.

Also, Student_Id -> Student_Id & Student_Name -> Student_Name are trivial dependencies too.


Non trivial functional dependency in DBMS

LAST UPDATED: APRIL 24, 2015 BY CHAITANYA SINGH | FILED UNDER: DBMS

If a functional dependency X->Y holds true where Y is not a subset of X, then this dependency is called
a non-trivial functional dependency.

For example:
An employee table with three attributes: emp_id, emp_name, emp_address.
The following functional dependencies are non-trivial:
emp_id -> emp_name (emp_name is not a subset of emp_id)
emp_id -> emp_address (emp_address is not a subset of emp_id)

On the other hand, the following dependency is trivial:

{emp_id, emp_name} -> emp_name [emp_name is a subset of {emp_id, emp_name}]
Refer: trivial functional dependency.

Completely non trivial FD:

If an FD X->Y holds true where the intersection of X and Y is empty, then this dependency is said to be a
completely non-trivial functional dependency.
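
Since all three cases are defined with subsets and intersections, they map directly onto Python sets. The function name and labels below are our own, used only to make the definitions concrete.

```python
def classify(lhs, rhs):
    """Classify the dependency lhs -> rhs by comparing the two attribute sets."""
    lhs, rhs = set(lhs), set(rhs)
    if rhs <= lhs:                 # Y is a subset of X
        return "trivial"
    if lhs & rhs:                  # X and Y overlap, but Y is not inside X
        return "non-trivial"
    return "completely non-trivial"   # X and Y share no attribute

trivial = classify({"emp_id", "emp_name"}, {"emp_name"})
complete = classify({"emp_id"}, {"emp_name"})
partial = classify({"emp_id", "emp_name"}, {"emp_name", "emp_address"})
```

Under these definitions a completely non-trivial dependency is also non-trivial; the function simply reports the most specific label.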


Multivalued dependency in DBMS

LAST UPDATED: APRIL 24, 2015 BY CHAITANYA SINGH | FILED UNDER: DBMS

Multivalued dependency occurs when there is more than one independent multivalued attribute in a
table.

For example: Consider a bike manufacturing company, which produces two colors (Black and Red) of
each model every year.

bike_model   manuf_year   color
----------   ----------   -----
M1001        2007         Black
M1001        2007         Red
M2012        2008         Black
M2012        2008         Red
M2222        2009         Black
M2222        2009         Red

Here the columns manuf_year and color are independent of each other and dependent on bike_model. In
this case these two columns are said to be multivalued dependent on bike_model. These
dependencies can be represented like this:

bike_model ->> manuf_year

bike_model ->> color
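
One way to check a multivalued dependency X ->> Y on sample rows is to confirm that, for each value of X, every Y value appears together with every value of the remaining attributes. The sketch below follows that textbook definition. The function name and the small counterexample are assumptions for illustration.

```python
from itertools import product

def mvd_holds(rows, x, y):
    """Check X ->> Y on rows (list of dicts); Z is every remaining attribute."""
    z = [a for a in rows[0] if a not in x and a not in y]
    groups = {}
    for row in rows:
        key = tuple(row[a] for a in x)
        y_val = tuple(row[a] for a in y)
        z_val = tuple(row[a] for a in z)
        groups.setdefault(key, set()).add((y_val, z_val))
    for pairs in groups.values():
        y_vals = {p[0] for p in pairs}
        z_vals = {p[1] for p in pairs}
        # Within one X value, every Y value must appear with every Z value.
        if pairs != set(product(y_vals, z_vals)):
            return False
    return True

bikes = [
    {"bike_model": "M1001", "manuf_year": 2007, "color": "Black"},
    {"bike_model": "M1001", "manuf_year": 2007, "color": "Red"},
    {"bike_model": "M2012", "manuf_year": 2008, "color": "Black"},
    {"bike_model": "M2012", "manuf_year": 2008, "color": "Red"},
]

# Hypothetical counterexample: color and year are tied together for M1001.
tied = [
    {"bike_model": "M1001", "manuf_year": 2007, "color": "Black"},
    {"bike_model": "M1001", "manuf_year": 2008, "color": "Red"},
]

color_mvd = mvd_holds(bikes, ["bike_model"], ["color"])
tied_mvd = mvd_holds(tied, ["bike_model"], ["color"])
```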


Transitive dependency in DBMS

LAST UPDATED: APRIL 24, 2015 BY CHAITANYA SINGH | FILED UNDER: DBMS

A functional dependency is said to be transitive if it is indirectly formed by two functional
dependencies. For example:

X -> Z is a transitive dependency if the following three conditions hold true:

X->Y
Y does not ->X
Y->Z

Note: A transitive dependency can only occur in a relation of three or more attributes. This
dependency helps us normalize the database in 3NF (3rd Normal Form).

Example: Let’s take an example to understand it better:

Book                 Author                Author_age
----                 ------                ----------
Game of Thrones      George R. R. Martin   66
Harry Potter         J. K. Rowling         49
Dying of the Light   George R. R. Martin   66

{Book} -> {Author} (if we know the book, we know the author name)

{Author} does not -> {Book}

{Author} -> {Author_age}

Therefore, as per the rule of transitive dependency, {Book} -> {Author_age} should hold. That makes
sense because if we know the book name we can know the author’s age.
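
The same conclusion can be reached mechanically with an attribute closure: start from {Book} and keep adding whatever the known dependencies allow. This is a minimal sketch; the function name and the way the dependencies are written as pairs of sets are our own choices.

```python
def closure(attrs, fds):
    """Attribute closure: everything derivable from attrs using the given FDs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already know the whole left side, we also know the right side.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

# The two direct dependencies from the example above.
FDS = [
    ({"Book"}, {"Author"}),
    ({"Author"}, {"Author_age"}),
]

book_closure = closure({"Book"}, FDS)
author_closure = closure({"Author"}, FDS)
```

Author_age lands in the closure of {Book} only through Author, which is what makes the dependency transitive. Book never appears in the closure of {Author}, matching the condition that Author does not determine Book.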


Comments

Jennifer says
DECEMBER 7, 2016 AT 2:17 AM

{Book} ->{Author} (if we know the book, we knows (KNOW*) the author name)


Normalization in DBMS: 1NF, 2NF, 3NF and BCNF in Database
LAST UPDATED: MAY 5, 2022 BY CHAITANYA SINGH | FILED UNDER: DBMS

Normalization is a process of organizing the data in a database to avoid data redundancy, insertion
anomaly, update anomaly & deletion anomaly. Let’s discuss anomalies first, then we will discuss
normal forms with examples.

Anomalies in DBMS
There are three types of anomalies that occur when the database is not normalized: insertion,
update and deletion anomalies. Let’s take an example to understand this.

Example: A manufacturing company stores employee details in a table Employee that has four
attributes: Emp_Id for the employee’s id, Emp_Name for the employee’s name, Emp_Address for
the employee’s address and Emp_Dept for the department in which the employee
works. At some point in time the table looks like this:

Emp_Id   Emp_Name   Emp_Address   Emp_Dept
------   --------   -----------   --------
101      Rick       Delhi         D001
101      Rick       Delhi         D002
123      Maggie     Agra          D890
166      Glenn      Chennai       D900
166      Glenn      Chennai       D004

This table is not normalized. We will see the problems that we face when a table in a database is not
normalized.

Update anomaly: In the above table we have two rows for employee Rick as he belongs to two
departments of the company. If we want to update the address of Rick, we have to update it in
both rows. If the new address gets updated in one row but not in the other, then as per the
database Rick would have two different addresses, which is not correct and leaves the data
inconsistent.

Insert anomaly: Suppose a new employee joins the company who is under training and currently not
assigned to any department. We would not be able to insert the data into the table if the Emp_Dept
field doesn’t allow nulls.

Delete anomaly: Let’s say that in future the company closes the department D890. Deleting the rows
that have Emp_Dept as D890 would also delete the information of employee Maggie, since she is
assigned only to this department.
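
The update and delete anomalies can be reproduced on the sample table in a few statements. This sketch uses Python's sqlite3 module; the column types and the new address "Mumbai" are assumptions for the demo.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Employee (Emp_Id INT, Emp_Name TEXT, Emp_Address TEXT, Emp_Dept TEXT)"
)
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?, ?)",
    [
        (101, "Rick", "Delhi", "D001"),
        (101, "Rick", "Delhi", "D002"),
        (123, "Maggie", "Agra", "D890"),
        (166, "Glenn", "Chennai", "D900"),
        (166, "Glenn", "Chennai", "D004"),
    ],
)

# Update anomaly: the new address (a made-up value) reaches only one of Rick's rows.
conn.execute(
    "UPDATE Employee SET Emp_Address = 'Mumbai' WHERE Emp_Id = 101 AND Emp_Dept = 'D001'"
)
rick_addresses = conn.execute(
    "SELECT COUNT(DISTINCT Emp_Address) FROM Employee WHERE Emp_Id = 101"
).fetchone()[0]

# Delete anomaly: closing department D890 also removes Maggie entirely.
conn.execute("DELETE FROM Employee WHERE Emp_Dept = 'D890'")
maggie_rows = conn.execute(
    "SELECT COUNT(*) FROM Employee WHERE Emp_Id = 123"
).fetchone()[0]
```

After the partial update the database holds two different addresses for Rick, and after the delete no trace of Maggie remains.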

To overcome these anomalies we need to normalize the data. In the next section we will discuss
normalization.

Normalization
Here are the most commonly used normal forms:

First normal form (1NF)
Second normal form (2NF)
Third normal form (3NF)
Boyce & Codd normal form (BCNF)

First normal form (1NF)

A relation is said to be in 1NF (first normal form) if it doesn’t contain any multi-valued attribute. In
other words, a relation is in 1NF if each attribute contains only atomic (single) values.

As per the rule of first normal form, an attribute (column) of a table cannot hold multiple values. It
should hold only atomic values.

Example: Let’s say a company wants to store the names and contact details of its employees. It
creates a table in the database that looks like this:

Emp_Id   Emp_Name   Emp_Address   Emp_Mobile
------   --------   -----------   ----------------------
101      Herschel   New Delhi     8912312390
102      Jon        Kanpur        8812121212, 9900012222
103      Ron        Chennai       7778881212
104      Lester     Bangalore     9990000123, 8123450987

Two employees (Jon & Lester) have two mobile numbers each, which causes the Emp_Mobile field to hold
multiple values for these two employees.

This table is not in 1NF as the rule says “each attribute of a table must have atomic (single) values”;
the Emp_Mobile values for employees Jon & Lester violate that rule.

To make the table comply with 1NF we need to create a separate row for each mobile number, so
that none of the attributes contains multiple values.

Emp_Id   Emp_Name   Emp_Address   Emp_Mobile
------   --------   -----------   ----------
101      Herschel   New Delhi     8912312390
102      Jon        Kanpur        8812121212
102      Jon        Kanpur        9900012222
103      Ron        Chennai       7778881212
104      Lester     Bangalore     9990000123
104      Lester     Bangalore     8123450987
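
The conversion from the first table to the second is mechanical: emit one row per mobile number. Here is a small sketch, assuming the multiple numbers are stored as a comma-separated string (the storage format and the function name are our assumptions).

```python
def to_first_normal_form(rows, column):
    """Emit one row per value of a comma-separated column."""
    flat = []
    for row in rows:
        for value in row[column].split(","):
            # Copy the row and keep a single, trimmed value in the column.
            flat.append({**row, column: value.strip()})
    return flat

employees = [
    {"Emp_Id": 101, "Emp_Name": "Herschel", "Emp_Address": "New Delhi",
     "Emp_Mobile": "8912312390"},
    {"Emp_Id": 102, "Emp_Name": "Jon", "Emp_Address": "Kanpur",
     "Emp_Mobile": "8812121212, 9900012222"},
    {"Emp_Id": 103, "Emp_Name": "Ron", "Emp_Address": "Chennai",
     "Emp_Mobile": "7778881212"},
    {"Emp_Id": 104, "Emp_Name": "Lester", "Emp_Address": "Bangalore",
     "Emp_Mobile": "9990000123, 8123450987"},
]

flat = to_first_normal_form(employees, "Emp_Mobile")
```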

To learn more about 1NF, refer to this article: 1NF

Second normal form (2NF)

A table is said to be in 2NF if both the following conditions hold:

Table is in 1NF (First normal form)

No non-prime attribute is dependent on a proper subset of any candidate key of the table.

An attribute that is not part of any candidate key is known as a non-prime attribute.

Example: Let’s say a school wants to store the data of teachers and the subjects they teach. They
create a table Teacher that looks like the one below. Since a teacher can teach more than one subject,
the table can have multiple rows for the same teacher.

Teacher_Id   Subject     Teacher_Age
----------   ---------   -----------
111          Maths       38
111          Physics     38
222          Biology     38
333          Physics     40
333          Chemistry   40

Candidate Keys: {Teacher_Id, Subject}

Non-prime attribute: Teacher_Age

This table is in 1NF because each attribute has atomic values. However, it is not in 2NF because the
non-prime attribute Teacher_Age is dependent on Teacher_Id alone, which is a proper subset of the
candidate key. This violates the rule for 2NF, which says “no non-prime attribute is dependent on a
proper subset of any candidate key of the table”.

To make the table comply with 2NF we can split it into two tables like this:

Teacher_Details table:

Teacher_Id   Teacher_Age
----------   -----------
111          38
222          38
333          40

Teacher_Subject table:

Teacher_Id   Subject
----------   ---------
111          Maths
111          Physics
222          Biology
333          Physics
333          Chemistry

Now the tables are in Second normal form (2NF). To learn more about 2NF, refer to this guide: 2NF
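
A quick way to convince ourselves that nothing is lost by this split is to project the original rows onto the two smaller tables and join them back. The sketch below does this with plain Python sets; the variable names are our own.

```python
# Rows of the original Teacher table: (Teacher_Id, Subject, Teacher_Age).
teacher = [
    (111, "Maths", 38),
    (111, "Physics", 38),
    (222, "Biology", 38),
    (333, "Physics", 40),
    (333, "Chemistry", 40),
]

# Project onto the two smaller schemas; sets drop the repeated (id, age) pairs.
teacher_details = {(tid, age) for tid, _subject, age in teacher}
teacher_subject = {(tid, subject) for tid, subject, _age in teacher}

# Natural join on Teacher_Id rebuilds the original rows.
rejoined = {
    (tid, subject, age)
    for tid, subject in teacher_subject
    for other_tid, age in teacher_details
    if tid == other_tid
}
```

Each teacher's age is now stored once, and the join still gives back exactly the rows we started with.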

Third Normal form (3NF)

A table design is said to be in 3NF if both the following conditions hold:

Table must be in 2NF

No non-prime attribute is transitively dependent on any super key.

An attribute that is not part of any candidate key is known as a non-prime attribute.

In other words, 3NF can be explained like this: A table is in 3NF if it is in 2NF and for each non-trivial
functional dependency X->Y at least one of the following conditions holds:

X is a super key of the table

Y is a prime attribute of the table

An attribute that is a part of one of the candidate keys is known as a prime attribute.

Example: Let’s say a company wants to store the complete address of each employee. They create a
table named Employee_Details that looks like this:

Emp_Id   Emp_Name   Emp_Zip   Emp_State   Emp_City   Emp_District
------   --------   -------   ---------   --------   ------------
1001     John       282005    UP          Agra       Dayal Bagh
1002     Ajeet      222008    TN          Chennai    M-City
1006     Lora       282007    TN          Chennai    Urrapakkam
1101     Lilly      292008    UK          Pauri      Bhagwan
1201     Steve      222999    MP          Gwalior    Ratan

Super keys: {Emp_Id}, {Emp_Id, Emp_Name}, {Emp_Id, Emp_Name, Emp_Zip} … and so on

Candidate Keys: {Emp_Id}
Non-prime attributes: all attributes except Emp_Id are non-prime, as they are not part of any candidate
key.

Here, Emp_State, Emp_City & Emp_District are dependent on Emp_Zip. Further, Emp_Zip is dependent on
Emp_Id, which makes the non-prime attributes (Emp_State, Emp_City & Emp_District) transitively
dependent on the super key (Emp_Id). This violates the rule of 3NF.

To make this table comply with 3NF we have to split it into two tables to remove the
transitive dependency:

Employee table:

Emp_Id   Emp_Name   Emp_Zip
------   --------   -------
1001     John       282005
1002     Ajeet      222008
1006     Lora       282007
1101     Lilly      292008
1201     Steve      222999

Employee_Zip table:

Emp_Zip   Emp_State   Emp_City   Emp_District
-------   ---------   --------   ------------
282005    UP          Agra       Dayal Bagh
222008    TN          Chennai    M-City
282007    TN          Chennai    Urrapakkam
292008    UK          Pauri      Bhagwan
222999    MP          Gwalior    Ratan
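
After the split, the full address is recovered with a join on Emp_Zip. The sketch below uses Python's sqlite3 module with two of the zip rows. The column types and the third employee "Maria", who shares a zip code with John, are assumptions added to show that a zip is stored only once.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Employee_Zip (
        Emp_Zip      TEXT PRIMARY KEY,
        Emp_State    TEXT,
        Emp_City     TEXT,
        Emp_District TEXT
    );
    CREATE TABLE Employee (
        Emp_Id   INTEGER PRIMARY KEY,
        Emp_Name TEXT,
        Emp_Zip  TEXT REFERENCES Employee_Zip (Emp_Zip)
    );
    INSERT INTO Employee_Zip VALUES
        ('282005', 'UP', 'Agra', 'Dayal Bagh'),
        ('222008', 'TN', 'Chennai', 'M-City');
    -- Employee 1300 is hypothetical: a second person living in zip 282005.
    INSERT INTO Employee VALUES
        (1001, 'John', '282005'),
        (1002, 'Ajeet', '222008'),
        (1300, 'Maria', '282005');
    """
)

rows = conn.execute(
    """
    SELECT e.Emp_Name, z.Emp_City
    FROM Employee AS e
    JOIN Employee_Zip AS z ON z.Emp_Zip = e.Emp_Zip
    ORDER BY e.Emp_Id
    """
).fetchall()

zip_rows = conn.execute("SELECT COUNT(*) FROM Employee_Zip").fetchone()[0]
```

Two employees share the zip 282005, yet the city for that zip lives in a single row, so a change to it needs to be made in one place only.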

Boyce Codd normal form (BCNF)

It is an advanced version of 3NF, which is why it is also referred to as 3.5NF. BCNF is stricter than 3NF. A
table complies with BCNF if it is in 3NF and, for every non-trivial functional dependency X->Y, X is a
super key of the table.

Example: Suppose there is a company wherein employees work in more than one department. They
store the data like this:

Emp_Id   Emp_Nationality   Emp_Dept                       Dept_Type   Dept_No_Of_Emp
------   ---------------   ----------------------------   ---------   --------------
1001     Austrian          Production and planning        D001        200
1001     Austrian          stores                         D001        250
1002     American          design and technical support   D134        100
1002     American          Purchasing department          D134        600

Functional dependencies in the table above:

Emp_Id -> Emp_Nationality
Emp_Dept -> {Dept_Type, Dept_No_Of_Emp}

Candidate key: {Emp_Id, Emp_Dept}

The table is not in BCNF because neither Emp_Id nor Emp_Dept alone is a key, yet each appears on the
left side of a functional dependency.

To make the table comply with BCNF we can break it into three tables like this:

Emp_Nationality table:

Emp_Id   Emp_Nationality
------   ---------------
1001     Austrian
1002     American

Emp_Dept table:

Emp_Dept                       Dept_Type   Dept_No_Of_Emp
----------------------------   ---------   --------------
Production and planning        D001        200
stores                         D001        250
design and technical support   D134        100
Purchasing department          D134        600

Emp_Dept_Mapping table:

Emp_Id   Emp_Dept
------   ----------------------------
1001     Production and planning
1001     stores
1002     design and technical support
1002     Purchasing department

Functional dependencies:
Emp_Id -> Emp_Nationality
Emp_Dept -> {Dept_Type, Dept_No_Of_Emp}

Candidate keys:
For first table: Emp_Id
For second table: Emp_Dept
For third table: {Emp_Id, Emp_Dept}

These tables are now in BCNF, as in both functional dependencies the left side is a key.
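
The BCNF test can be written down directly: list every non-trivial dependency whose left side is not a super key. The sketch below relies on the fact that a set of attributes is a super key exactly when it contains some candidate key. The function names are our own.

```python
# Dependencies and candidate key of the original (unsplit) table.
FDS = [
    ({"Emp_Id"}, {"Emp_Nationality"}),
    ({"Emp_Dept"}, {"Dept_Type", "Dept_No_Of_Emp"}),
]
CANDIDATE_KEYS = [{"Emp_Id", "Emp_Dept"}]

def is_super_key(attrs, candidate_keys):
    """A set is a super key exactly when it contains some candidate key."""
    return any(key <= set(attrs) for key in candidate_keys)

def bcnf_violations(fds, candidate_keys):
    """Non-trivial FDs whose left side is not a super key."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs and not is_super_key(lhs, candidate_keys)
    ]

original = bcnf_violations(FDS, CANDIDATE_KEYS)

# After the split, each dependency lives in a table keyed by its own left side.
emp_nationality = bcnf_violations(FDS[:1], [{"Emp_Id"}])
emp_dept = bcnf_violations(FDS[1:], [{"Emp_Dept"}])
```

Both dependencies violate BCNF in the original table, and neither does once each one sits in a table whose key is its left side.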


Robert Luse says

DECEMBER 7, 2015 AT 3:04 AM

If two employees have the same zip, they will share the row in the zip table. There
does not need to be two rows in the zip table and indeed, there should not be two rows
in the zip table.

harshal davane says

APRIL 5, 2017 AT 6:51 PM

WRONG IF WE CREATE NEW ZIP TABLE THEN WE CAN SEARCH THERE ZIP BYE
NAME ALSO ..

amit says
APRIL 22, 2017 AT 8:13 PM
name is not a prime attribute because multiple students can have same name
and each student may have a different zip
sagar -441124
sagar -345632

MUDASSIR AHMED says

DECEMBER 11, 2015 AT 6:47 AM

In employee table there will be 2 employees with same zip code but in employee_zip
table there will be 1 record related to that zip code.The tables are related by zip
code.So only 1 record will be fetched from employee_zip table. Hope you get the
answer.

Gulfam says
DECEMBER 14, 2015 AT 10:41 AM

Hey Mahak, there is only one record for every ZIP.

ZIP in itself the complete address.

Steve says
DECEMBER 16, 2015 AT 10:30 PM

Mahak, That is the point they are trying to make is that many employees could be
related to 1 Zip record. There would only be 1 entry in the Zip table per zip, since that’s
the key. That is the point of 3NF, is to denormalize the duplicate data in the Employee
table. Good luck!

DeepeshChaudhari says
OCTOBER 30, 2016 AT 6:52 AM

I think there is no issue related to emp_zip……

becouse if any two employee have same emp_zip then it it means that both employee
live’s in same area and so then in employee_zip table there is one row of that zip……..
and data will be fetched from single row………………….

Robert Luse says
DECEMBER 7, 2015 AT 3:08 AM

As part of Normalization, there will be only one row for the the zip, not two. If two
employees have the same zip, they will both use the information for that zip in the zip
table.

Omenesa says
APRIL 28, 2017 AT 1:04 PM

We should imagine a case scenario where two employees have the same zip code but
different emp districts or emp city, which record will be fetched in such a scenario.

MUDASSIR AHMED says

DECEMBER 11, 2015 AT 6:40 AM

In BCNF “dept_no_of_emp” is also candidate key.

Kalpesh says
MARCH 9, 2016 AT 6:40 AM

Hi there,

I have read whole article of Normalization and I must say, it a best explanation with
examples.
Examples are very useful for better understating the concept. I am really very thankful to
you for the blog.
Thank you.

Pushpa says
MAY 5, 2016 AT 7:48 AM

Hi Chaitanya,

The concept of normalization with example explained is very helpful. It helped me to

understand it clearly.
Thanks for sharing.

Best Wishes,
Pushpa

aman says
MAY 27, 2016 AT 7:18 PM

This topic was not understandable from book .after reading this I finally got it. Thank u.

Sid says
JUNE 26, 2016 AT 5:42 PM

How is teacher_I’d, subject be the candidate key? Subject is redundant and only teacher I’d
shld be sufficient.

Harsh Rohila says

JULY 14, 2016 AT 3:43 PM

Consider teacher_id 111, it is having two different subjects maths and physics. So only
teacher_id cannot determine the complete row. Therefore subject is also required.

Jaswinder says
NOVEMBER 12, 2016 AT 6:52 AM

teacher_id alone cannot be the candidate key because there will be many entries for a
particular teacher, as a teacher can teach multiple subjects. To fulfill the criteria for
becoming a candidate key, the values should be unique.

Richard Kidd says

NOVEMBER 28, 2016 AT 7:51 PM

A candidate key should be able to UNIQUELY IDENTIFY a row in a table. In the case of
the teacher table, there are two rows in the table that can be identified with the
teacher_id 111. If we are given teacher_id 111, we cannot discern if we need the
record for subject ‘Maths’ or the record for subject ‘physics’. Therefore, teacher_id is
not sufficient to uniquely identify a row. Likewise, as there are two rows with the
teacher_id 111 and the teacher_age 38, these are also insufficient. The only minimal
combination of attributes that uniquely identify a given row is {teacher_id, subject}.

Tharun Kumar Sunku says
JULY 26, 2016 AT 5:48 AM

Superb explanation, Thank you for this valuable information

sandeep says
AUGUST 30, 2016 AT 11:49 AM

hi chaitanya,
you explained in a single table to partition into different tables so it is easy to understand
but my doubt is to how to partition those tables so please provide some information about
how to partition a table
And also one thing before using those keys it is better to briefly explain about the keys so it
is easy to understand

Seunfunmi says
OCTOBER 9, 2016 AT 12:22 PM
Very useful information. Thank you for this article. I read the textbook but did not
understand. Now I understand 1NF and 2NF. I’m still not fully clear with the 3NF and the
BCNF though. Pls anyone with more detailed information?

deepesh chaudhari says

OCTOBER 29, 2016 AT 7:14 PM

best notes of dbms forever……love it

Keynan says
NOVEMBER 24, 2016 AT 7:16 AM

Hi
Very good explanation.
I have one question: doesn’t the example you gave on BCNF (before the BCNF solution)
also break the second rule, because non-prime attributes depend on only a subset of the
candidate key? For example, dept_type and dept_no_of_emp depend only on a
subset of the candidate key, which is emp_dept.
Thanks

amit says
APRIL 22, 2017 AT 8:29 PM

In first table they are dependent, that is the violation of the 3NF. That’s why we
decomposed the table and in second table Emp_dept is super key or candidate key not
a subset of candidate key
just like foreign key concept

Anugya says
DECEMBER 31, 2016 AT 10:19 AM

thnku for the making me understand the concept of normalization.

Rajiv Rai says

JANUARY 17, 2017 AT 5:23 AM

Isn’t the attribute emp_zip also a candidate key(3NF example)? If yes then wouldnt it
violate the 3NF rule in the next table?

PuddiMan says
FEBRUARY 8, 2017 AT 12:47 PM

I don’t understand the example in BCNF. There are 2 primary keys, emp_id and emp_dept.
This violates 2NF rules, emp_nationality can be determined by only emp_id. So in the first
place, it is not in 2nf, why proceed to bcnf process?

Someone care to explain/correct me please(if i’m wrong)

Ninja says
FEBRUARY 12, 2017 AT 9:08 AM

Thanks a lot … 2morrow is my exam and this post really helped me.. Thanks a lot….

Yegon francis says

APRIL 24, 2017 AT 9:15 AM
Is it allowed to use two primary keys in a relationship table?

rahul says
JULY 12, 2017 AT 10:12 AM

you should more explain ,candidate key ,and super key.

it is very difficult to find

Copyright © 2012 – 2024 BeginnersBook . Privacy Policy . Sitemap

Denormalization in DBMS
LAST UPDATED: AUGUST 25, 2022 BY CHAITANYA SINGH | FILED UNDER: DBMS

Denormalization is the process of adding redundant data to normalized tables in order to avoid
costly join operations. This improves the performance of read operations because there is no need to
join multiple tables. However, it requires extra storage space for the redundant data, and it can cause
data inconsistencies in the database if the redundant data is updated frequently.

Note:
1. Denormalization is not the reverse of normalization in DBMS.
2. Denormalization is not suitable for every scenario (this is discussed in detail in this article after the
following example).

Denormalization Example
There are two tables, Department and Employee. The Department table contains the department id
(attribute name: Dept_Id), department name (attribute name: Dept_Name) and employee id (attribute
name: Emp_Id). The Employee table contains fields such as employee id, name and age.

Department Table

Dept_Id Dept_Name Emp_Id

D01 Sales E101
D02 Marketing E102
D03 Retail E102
D04 IT E103
D05 HR E104

Employee Table

Emp_Id Emp_Name Emp_Age

E101 Ram 29
E102 Shyam 28
E103 Veer 30
E104 Mohan 27

Now, every time we need to access the department information along with employee details
such as the employee name, we need to join these two tables. One way of avoiding this join
operation is to denormalize the Department table like this:

Department table:

Dept_Id Dept_Name Emp_Id Emp_Name

D01 Sales E101 Ram
D02 Marketing E102 Shyam
D03 Retail E102 Shyam
D04 IT E103 Veer
D05 HR E104 Mohan

Employee Table:

Emp_Id Emp_Name Emp_Age

E101 Ram 29
E102 Shyam 28
E103 Veer 30
E104 Mohan 27

After this denormalization, whenever we need to get the department data along with the employee
name, we do not need to join these tables, as the employee name is already present in the
Department table. This way, we avoid the join operation, but we have to store the extra data in the
database.
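
The trade-off in this example can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not a recommended schema: the table name Department_Denorm is a hypothetical name chosen here so that the normalized and denormalized versions can live side by side in one in-memory database. Both queries return the same rows, but only the first one needs a join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Employee (Emp_Id TEXT PRIMARY KEY, Emp_Name TEXT, Emp_Age INTEGER);
    CREATE TABLE Department (Dept_Id TEXT PRIMARY KEY, Dept_Name TEXT, Emp_Id TEXT);
    -- Hypothetical denormalized copy: Emp_Name is stored redundantly.
    CREATE TABLE Department_Denorm (Dept_Id TEXT PRIMARY KEY, Dept_Name TEXT,
                                    Emp_Id TEXT, Emp_Name TEXT);
""")

employees = [("E101", "Ram", 29), ("E102", "Shyam", 28),
             ("E103", "Veer", 30), ("E104", "Mohan", 27)]
departments = [("D01", "Sales", "E101"), ("D02", "Marketing", "E102"),
               ("D03", "Retail", "E102"), ("D04", "IT", "E103"),
               ("D05", "HR", "E104")]

conn.executemany("INSERT INTO Employee VALUES (?, ?, ?)", employees)
conn.executemany("INSERT INTO Department VALUES (?, ?, ?)", departments)

# Build the denormalized rows by copying each employee name next to its id.
names = {emp_id: name for emp_id, name, _age in employees}
conn.executemany(
    "INSERT INTO Department_Denorm VALUES (?, ?, ?, ?)",
    [(dept_id, dept_name, emp_id, names[emp_id])
     for dept_id, dept_name, emp_id in departments],
)

# Normalized design: a join is needed to get the employee name.
joined = conn.execute("""
    SELECT d.Dept_Id, d.Dept_Name, d.Emp_Id, e.Emp_Name
    FROM Department d JOIN Employee e ON e.Emp_Id = d.Emp_Id
    ORDER BY d.Dept_Id
""").fetchall()

# Denormalized design: a single-table read returns the same information.
denormalized = conn.execute("""
    SELECT Dept_Id, Dept_Name, Emp_Id, Emp_Name
    FROM Department_Denorm
    ORDER BY Dept_Id
""").fetchall()
```

The name "Shyam" is stored twice in Department_Denorm (for D02 and D03), which is exactly the redundancy that the join-free read is paid for with.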

When should you use Denormalization?

As discussed in the beginning, denormalization can cause data inconsistencies in database so we
must be very careful when using this process. Let’s point out the cases where you can use
denormalization safely:

1. When the redundant data doesn’t need to be updated frequently, or isn’t updated at all. In our
example above, the redundant data is the employee name, and a name doesn’t change frequently, so it is
a good case where denormalization can be used safely.

2. When there is a need to join multiple tables frequently in order to get meaningful data. In this case,
denormalization can significantly boost the performance of read operations at the cost of extra
storage space in the database.

Advantages of Denormalization
1. Read operations are faster as table joins are not required for most of the queries.
2. Queries are easier to write, as they involve fewer tables.

Disadvantages of Denormalization
1. Requires more storage as redundant data needs to be written in the tables.
2. Data write operations are slower due to redundant data.
3. Data inconsistencies can arise due to redundant data.
4. It requires extra effort to update the database. This is because when redundant data is present, it is
important to update the data in all the places, otherwise data inconsistencies may arise.
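
The last disadvantage can be illustrated with a short sqlite3 sketch. It assumes the denormalized Department table from the example above; the helper names make_db, stale_rows and rename_employee, and the new name 'Shyam Kumar', are hypothetical and exist only for this illustration. Updating one copy of the name leaves stale copies behind, while updating every copy inside one transaction keeps the tables consistent.

```python
import sqlite3

def make_db():
    """Create an in-memory database with a denormalized Department table."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Employee (Emp_Id TEXT PRIMARY KEY, Emp_Name TEXT, Emp_Age INTEGER);
        CREATE TABLE Department (Dept_Id TEXT PRIMARY KEY, Dept_Name TEXT,
                                 Emp_Id TEXT, Emp_Name TEXT);
        INSERT INTO Employee VALUES ('E102', 'Shyam', 28);
        INSERT INTO Department VALUES ('D02', 'Marketing', 'E102', 'Shyam');
        INSERT INTO Department VALUES ('D03', 'Retail', 'E102', 'Shyam');
    """)
    return conn

def stale_rows(conn):
    """Count Department rows whose redundant Emp_Name disagrees with Employee."""
    return conn.execute("""
        SELECT COUNT(*)
        FROM Department d JOIN Employee e ON e.Emp_Id = d.Emp_Id
        WHERE d.Emp_Name <> e.Emp_Name
    """).fetchone()[0]

def rename_employee(conn, emp_id, new_name):
    """Update every copy of the name in a single transaction."""
    with conn:  # commits on success, rolls back if an exception is raised
        conn.execute("UPDATE Employee SET Emp_Name = ? WHERE Emp_Id = ?",
                     (new_name, emp_id))
        conn.execute("UPDATE Department SET Emp_Name = ? WHERE Emp_Id = ?",
                     (new_name, emp_id))

# Updating only the Employee table leaves two stale copies in Department.
careless = make_db()
careless.execute("UPDATE Employee SET Emp_Name = 'Shyam Kumar' WHERE Emp_Id = 'E102'")
stale_after_careless_update = stale_rows(careless)

# Updating all copies together leaves no stale rows.
careful = make_db()
rename_employee(careful, "E102", "Shyam Kumar")
stale_after_careful_update = stale_rows(careful)
```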

Difference between Denormalization and Normalization
LAST UPDATED: AUGUST 26, 2022 BY CHAITANYA SINGH | FILED UNDER: DBMS

In this guide, you will learn the difference between Denormalization and Normalization.

What is Denormalization
Denormalization is a process of adding redundant data to tables in order to get faster response time
for read operations. However this better performance comes with a cost of storing redundant data
that occupies additional storage in the database.

Denormalization is covered in detail with examples here.

What is Normalization
Normalization is a process of breaking a table into multiple tables in such a way that
redundant data is reduced. This reduces the risk of data inconsistencies and helps maintain data
integrity.

Normalization is covered in detail with examples here.

Denormalization vs Normalization

DENORMALIZATION: It provides faster data access as costly, time-intensive join operations are not required.
NORMALIZATION: Data access (or read) is slower as join operations are required when accessing data
from multiple tables.

DENORMALIZATION: SQL queries are easy to write as they involve fewer tables.
NORMALIZATION: SQL queries are complex as they usually involve multiple tables.

DENORMALIZATION: Redundant data is present.
NORMALIZATION: No redundant data exists.

DENORMALIZATION: Data inconsistencies can arise as the same data is available in more than one table
due to data redundancy.
NORMALIZATION: No data inconsistencies as normalization removes data redundancy.

DENORMALIZATION: Data write operations are slower due to redundant data.
NORMALIZATION: Data write operations are faster.

DENORMALIZATION: Requires more storage.
NORMALIZATION: Requires less storage.


Decomposition in DBMS – Lossless and Lossy with examples
LAST UPDATED: AUGUST 22, 2022 BY CHAITANYA SINGH | FILED UNDER: DBMS

Decomposition is a process of dividing a relation into multiple relations to remove redundancy while
maintaining the original data. In this guide, you will learn decomposition in DBMS with the help of
examples.

Types of decomposition:
1. Lossless decomposition
2. Lossy decomposition

1. Lossless decomposition
A lossless decomposition of a relation ensures that:

a) No information is lost during decomposition. This is why the term lossless is used in this
decomposition as no information is lost.

b) If a relation R is divided into two relations R1 and R2 using lossless decomposition then the natural
join of R1 and R2 would return the original relation R.

Rules of Lossless decomposition: For these rules, we are assuming that a relation R is divided into
two relations R1 and R2.

1. The union of the attributes of R1 and R2 should give all the attributes of the original relation R.

R1 U R2 = R

2. The intersection of R1 and R2 should not be empty, that is, R1 and R2 must have at least one
attribute in common.

R1 ∩ R2 ≠ ∅

3. The intersection of R1 and R2 is either a super key of R1 or R2, or both the relations R1 and R2.

R1 ∩ R2 = super key of R1 or R2 or both

Let’s say a relation R (A, B, C), where A is the primary key, is divided into two relations R1 (A, B) and
R2 (C, A).

Let’s check whether this decomposition is loss-less decomposition or not:

Rule 1:
R1 U R2 = (A, B) U (C, A) = (A, B, C)
The union of R1 and R2 gives the attributes of the original relation, so the first rule of lossless
decomposition holds here.

Rule 2:
R1 ∩ R2 = (A, B) ∩ (C, A) = (A)
The result is not empty, so the second rule also holds here.

Rule 3:
R1 ∩ R2 = (A, B) ∩ (C, A) = (A)
The result is a super key of both relations, so the third rule also holds here.

Rule 4: Dependency preserving

The dependencies that exist in the original relation should still exist after decomposition. This is
checked in addition to the three lossless rules above.
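
The three lossless rules can be checked mechanically. The sketch below is one possible way to do it, under stated assumptions: relations are modelled as sets of attribute names, functional dependencies as (left side, right side) pairs, and the super key test of rule 3 is done with the standard attribute closure computation. The function names closure and is_lossless are hypothetical, and the dependencies A -> B and A -> C are our reading of "A is the primary key" in the example above.

```python
def closure(attrs, fds):
    """Return the attribute closure of `attrs` under the dependencies `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already have the whole left side, we also get the right side.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def is_lossless(r, r1, r2, fds):
    """Apply the three rules to a decomposition of r into r1 and r2."""
    r, r1, r2 = set(r), set(r1), set(r2)
    common = r1 & r2
    rule1 = (r1 | r2) == r                      # union gives all attributes of R
    rule2 = bool(common)                        # intersection is not empty
    reachable = closure(common, fds)
    rule3 = r1 <= reachable or r2 <= reachable  # common part is a super key of R1 or R2
    return rule1 and rule2 and rule3

# R(A, B, C) with A as the primary key, so A determines B and C.
fds = [({"A"}, {"B"}), ({"A"}, {"C"})]

good = is_lossless({"A", "B", "C"}, {"A", "B"}, {"C", "A"}, fds)
# Splitting on B instead: B alone determines nothing, so rule 3 fails.
bad = is_lossless({"A", "B", "C"}, {"A", "B"}, {"B", "C"}, fds)
```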

Example of LossLess decomposition

StudentCourse Table:

Student_Id Student_Name Course_Id Course_Detail

---------- ------------- --------- -------------
S101 Chaitanya C01 Maths
S102 Ajeet C01 Maths
S103 Rahul C02 Science
S104 Steve C02 Science
S105 John C03 English
S101 Chaitanya C03 English
S102 Ajeet C02 Science

The primary key of given relation is {Student_Id, Course_Id}

This table has redundant data as the Course_Id and Course_Detail are common for several students.
Let’s decompose this relation into two relations.

Student Table:
The primary key of this table is {Student_Id, Course_Id}
Student_Id Student_Name Course_Id
---------- ------------ ---------
S101 Chaitanya C01
S102 Ajeet C01
S103 Rahul C02
S104 Steve C02
S105 John C03
S101 Chaitanya C03
S102 Ajeet C02

Course Table:
The primary key of this table is {Course_Id}

Course_Id Course_Detail
--------- -------------
C01 Maths
C02 Science
C03 English

Let’s check the three rules of lossless decomposition to see whether this decomposition is
lossless or not.

Rule 1:

{Student} U {Course}

Union Result:

Student_Id Student_Name Course_Id Course_Detail

The union results in the original relation StudentCourse so we can say that the first rule holds true.

Rule 2 & 3:

R1 ∩ R2

Result:

Course_Id
C01
C02
C03

The result is not empty, so rule 2 holds true.

The result is a super key of the second relation R2, so the third rule also holds here.

Rule 4: Dependencies in original relation:

Student_Id -> {Student_Name}

Course_Id -> {Course_Detail}

These dependencies are still present in the decomposed relations. Thus we can say that this
decomposition is dependency preserving.

Since all three rules apply here, the decomposition of relation StudentCourse into Student and
Course is a lossless decomposition.
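
The same conclusion can be checked directly on the data. The following sketch models each table as a Python set of tuples (an assumption made for brevity; it is not how a DBMS stores relations), projects StudentCourse onto the two decomposed tables, and then natural-joins them on Course_Id.

```python
# StudentCourse rows: (Student_Id, Student_Name, Course_Id, Course_Detail)
student_course = {
    ("S101", "Chaitanya", "C01", "Maths"),
    ("S102", "Ajeet", "C01", "Maths"),
    ("S103", "Rahul", "C02", "Science"),
    ("S104", "Steve", "C02", "Science"),
    ("S105", "John", "C03", "English"),
    ("S101", "Chaitanya", "C03", "English"),
    ("S102", "Ajeet", "C02", "Science"),
}

# Projection onto Student(Student_Id, Student_Name, Course_Id).
student = {(sid, name, cid) for sid, name, cid, _detail in student_course}

# Projection onto Course(Course_Id, Course_Detail); duplicates collapse in a set.
course = {(cid, detail) for _sid, _name, cid, detail in student_course}

# Natural join of Student and Course on the common attribute Course_Id.
rejoined = {
    (sid, name, cid, detail)
    for sid, name, cid in student
    for course_id, detail in course
    if course_id == cid
}

is_lossless = rejoined == student_course
```

Because the join gives back exactly the seven original rows, with nothing missing and nothing extra, the decomposition is lossless.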

2. Lossy Decomposition
As the name suggests, in lossy decomposition, information is lost during decomposition. In a lossy
decomposition, one or more of the three rules that we discussed above fail.

Let’s take the same example that we discussed above.

StudentCourse Table:

Student_Id Student_Name Course_Id Course_Detail

S101 Chaitanya C01 Maths
S102 Ajeet C01 Maths
S103 Rahul C02 Science
S104 Steve C02 Science
S105 John C03 English
S101 Chaitanya C03 English
S102 Ajeet C02 Science

Now if we divide this relation like this:

Student Table:
The primary key of this table is {Student_Id}

Student_Id Student_Name
S101 Chaitanya
S102 Ajeet
S103 Rahul
S104 Steve
S105 John

Course Table:
The primary key of this table is {Course_Id}

Course_Id Course_Detail
C01 Maths
C02 Science
C03 English

This is a lossy decomposition, as the intersection of the Student and Course relations is empty, so the
second and third rules of lossless decomposition fail here.

In this decomposition, the relationship between Student and Course is lost. There is no way to form the
original relation from these two relations, as the information about who is attending which course is
lost during decomposition.
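
A short sketch makes the loss visible. As before, tables are modelled as Python sets of tuples, which is an assumption made for illustration. With no common attribute, the natural join of Student and Course degenerates into a Cartesian product, which pairs every student with every course and so produces rows that were never in StudentCourse.

```python
# StudentCourse rows: (Student_Id, Student_Name, Course_Id, Course_Detail)
student_course = {
    ("S101", "Chaitanya", "C01", "Maths"),
    ("S102", "Ajeet", "C01", "Maths"),
    ("S103", "Rahul", "C02", "Science"),
    ("S104", "Steve", "C02", "Science"),
    ("S105", "John", "C03", "English"),
    ("S101", "Chaitanya", "C03", "English"),
    ("S102", "Ajeet", "C02", "Science"),
}

# Lossy split: Student keeps no Course_Id, so the two tables share no attribute.
student = {(sid, name) for sid, name, _cid, _detail in student_course}
course = {(cid, detail) for _sid, _name, cid, detail in student_course}

# With nothing to join on, every student is paired with every course.
rejoined = {
    (sid, name, cid, detail)
    for sid, name in student
    for cid, detail in course
}

# Rows that the join invents but that were never in the original relation.
spurious = rejoined - student_course
```

Here the join returns 15 rows instead of 7. The original rows are all present, but they can no longer be told apart from the 8 invented ones, which is why the information is considered lost.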


Transaction Management in DBMS

LAST UPDATED: DECEMBER 11, 2018 BY CHAITANYA SINGH | FILED UNDER: DBMS

A transaction is a set of logically related operations. For example, when you transfer money from
your bank account to your friend’s account, the set of operations would be like this:

Simple Transaction Example

1. Read your account balance
2. Deduct the amount from your balance
3. Write the remaining balance to your account
4. Read your friend’s account balance
5. Add the amount to his account balance
6. Write the new updated balance to his account

This whole set of operations can be called a transaction. Although the above example shows only read,
write and update operations, a transaction can have operations like read, write,
insert, update and delete.

In DBMS, we write the above six-step transaction like this:

Let’s say your account is A and your friend’s account is B, and you are transferring 10000 from A to B.
The steps of the transaction are:
1. R(A);
2. A = A - 10000;
3. W(A);
4. R(B);
5. B = B + 10000;
6. W(B);

In the above transaction R refers to the Read operation and W refers to the write operation.

Transaction failure in between the operations

Now that we understand what a transaction is, we should understand the problems associated
with it.

The main problem that can happen during a transaction is that the transaction can fail before finishing
all the operations in the set. This can happen due to a power failure, a system crash etc. This is a
serious problem that can leave the database in an inconsistent state. Assume that the transaction fails
after the third operation (see the example above): the amount would be deducted from your account but
your friend would not receive it.

To solve this problem, we have the following two operations:

Commit: If all the operations in a transaction are completed successfully then commit those changes
to the database permanently.

Rollback: If any of the operations fails then roll back all the changes done by the previous operations.
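
Commit and rollback can be tried out with Python's built-in sqlite3 module. The sketch below is a minimal illustration under assumptions: the account table, the transfer and balances helpers, and the starting balances of 50000 and 20000 are all hypothetical. The second transfer names an account Z that does not exist, so it fails after the debit has already been applied, and the rollback restores A's balance.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; commit on success, roll back on failure."""
    try:
        row = conn.execute("SELECT balance FROM account WHERE id = ?",
                           (src,)).fetchone()                      # R(A)
        if row is None or row[0] < amount:
            raise ValueError("source account missing or balance too low")
        conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?",
                     (amount, src))                                # A = A - amount; W(A)
        credited = conn.execute(
            "UPDATE account SET balance = balance + ? WHERE id = ?",
            (amount, dst))                                         # B = B + amount; W(B)
        if credited.rowcount != 1:
            raise LookupError("destination account not found")
        conn.commit()    # all steps succeeded: make the changes permanent
        return True
    except Exception:
        conn.rollback()  # undo the debit and anything else done so far
        return False

def balances(conn):
    """Return the current balances as a dictionary."""
    return dict(conn.execute("SELECT id, balance FROM account"))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO account VALUES (?, ?)",
                 [("A", 50000), ("B", 20000)])
conn.commit()

first_ok = transfer(conn, "A", "B", 10000)   # succeeds and is committed
failed_ok = transfer(conn, "A", "Z", 10000)  # fails after the debit, rolled back
final = balances(conn)
```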

Even though these operations help us avoid several issues that may arise during a transaction,
they are not sufficient when two transactions are running concurrently. To handle those problems
we need to understand database ACID properties.


ACID properties in DBMS

LAST UPDATED: AUGUST 19, 2022 BY CHAITANYA SINGH | FILED UNDER: DBMS

To ensure the integrity and consistency of data during a transaction (a transaction is a unit of
program that updates various data items, read more about it here), the database system maintains
four properties. These properties are widely known as ACID properties.

Atomicity
This property ensures that either all the operations of a transaction are reflected in the database or
none of them are. The logic here is simple: a transaction is a single unit, so it can’t execute partially.
Either it executes completely or it doesn’t execute at all.

Let’s take an example of banking system to understand this:

Suppose Account A has a balance of 400$ & B has 700$. Account A is transferring 100$ to Account B.
This is a transaction that has two operations:
a) Debiting 100$ from A’s balance
b) Crediting 100$ to B’s balance.

Let’s say the first operation succeeded while the second failed. In this case A’s balance would be
300$ while B would have 700$ instead of 800$. This is unacceptable in a banking system. Either
the transaction should fail without executing any of the operations or it should process both
operations. The Atomicity property ensures that.

Two key operations are involved in a transaction to maintain the atomicity of the
transaction.

Abort: If there is a failure in the transaction, abort the execution and rollback the changes made by the
transaction.

Commit: If transaction executes successfully, commit the changes to the database.

Consistency
The database must be in a consistent state before and after the execution of the transaction. This
ensures that there are no errors in the database at any point of time. The application programmer is
responsible for maintaining the consistency of the database.

Example:
A is transferring 1000 dollars to B. A’s initial balance is 2000 and B’s initial balance is 5000.

Before the transaction:
Total of A+B = 2000 + 5000 = 7000$

After the transaction:
Total of A+B = 1000 + 6000 = 7000$

The data is consistent before and after the execution of the transaction, so this example maintains
the consistency property of the database.

Isolation
A transaction shouldn’t interfere with the execution of another transaction. To preserve the
consistency of the database, each transaction should execute in isolation (that is, as if no
other transaction were running concurrently with it).

For example, account A has a balance of 400$ and it is transferring 100$ each to accounts B and C.
So we have two transactions here. Let’s say these transactions run concurrently and both
transactions read the 400$ balance. In that case the final balance of A would be 300$ instead of 200$.
This is wrong.

If the transactions were to run in isolation, then the second transaction would have read the correct
balance of 300$ (before debiting 100$) once the first transaction had completed successfully.
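
This lost update can be reproduced with a small simulation. The sketch below is a toy model, not a real DBMS: the run function and the step names "read", "debit" and "write" are hypothetical. Each transaction reads A's balance into its own local copy, debits 100 from that copy, and writes the copy back.

```python
def run(schedule, balance):
    """Replay (transaction, step) pairs against a shared balance of account A."""
    local = {}  # each transaction's private copy of the balance
    for txn, step in schedule:
        if step == "read":
            local[txn] = balance      # read the shared balance
        elif step == "debit":
            local[txn] -= 100         # debit 100 from the private copy
        elif step == "write":
            balance = local[txn]      # write the private copy back
    return balance

# Both transactions read 400 before either one writes.
interleaved = [
    ("T1", "read"), ("T2", "read"),
    ("T1", "debit"), ("T2", "debit"),
    ("T1", "write"), ("T2", "write"),
]

# T2 starts only after T1 has written its result.
isolated = [
    ("T1", "read"), ("T1", "debit"), ("T1", "write"),
    ("T2", "read"), ("T2", "debit"), ("T2", "write"),
]

wrong_balance = run(interleaved, 400)
right_balance = run(isolated, 400)
```

In the interleaved run, T2 overwrites T1's result with a value computed from the stale 400$ balance, so one of the two debits is lost.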

Durability
Once a transaction completes successfully, the changes it has made into the database should be
permanent even if there is a system failure. The recovery-management component of database
systems ensures the durability of transaction.

ACID properties are the backbone of a database management system. These properties ensure that
even though there are multiple transactions reading and writing the data in the database, the data is
always correct and consistent. Without ACID properties there is no point in managing the data, as it
can’t be trusted or used in a transaction.


DBMS Transaction States

LAST UPDATED: DECEMBER 14, 2018 BY CHAITANYA SINGH | FILED UNDER: DBMS

In this guide, we will discuss the states of a transaction in DBMS. A transaction in DBMS can be in one
of the following states.

DBMS Transaction States Diagram

Lets discuss these states one by one.

Active State
As we have discussed in the DBMS transaction introduction, a transaction is a sequence of
operations. If a transaction is in execution then it is said to be in the active state. It doesn’t matter
which step is in execution: as long as the transaction is executing, it remains in the active state.

Failed State
If a transaction is executing and a failure occurs, either a hardware failure or a software failure, then
the transaction goes into the failed state from the active state.

Partially Committed State
As we can see in the above diagram, a transaction goes into the “partially committed” state from the
active state once all of its read and write operations have been executed.

A transaction contains a number of read and write operations. Once the whole transaction is
successfully executed, the transaction goes into the partially committed state, where all the read
and write operations have been performed on the main memory (local memory) instead of the actual database.

We have this state because a transaction can fail during execution. If we were making the changes
in the actual database instead of local memory, the database could be left in an
inconsistent state in case of a failure. This state lets us discard the changes held in local memory
if a failure occurs, before they are written to the database.

Committed State
If a transaction completes the execution successfully then all the changes made in the local memory
during partially committed state are permanently stored in the database. You can also see in the
above diagram that a transaction goes from partially committed state to committed state when
everything is successful.

Aborted State
As we have seen above, if a transaction fails during execution then the transaction goes into the failed
state. The changes made in the local memory (or buffer) are rolled back to the previous consistent
state and the transaction goes into the aborted state from the failed state. Refer to the diagram to see
the interaction between the failed and aborted states.
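
The states and the moves between them can be written down as a small state machine. This is a sketch under assumptions: the names State, TRANSITIONS and advance are hypothetical, and the move from partially committed to failed is taken from the usual textbook version of the diagram, since the text above does not spell it out.

```python
from enum import Enum

class State(Enum):
    ACTIVE = "active"
    PARTIALLY_COMMITTED = "partially committed"
    COMMITTED = "committed"
    FAILED = "failed"
    ABORTED = "aborted"

# Allowed moves between states. Committed and aborted are final states.
TRANSITIONS = {
    State.ACTIVE: {State.PARTIALLY_COMMITTED, State.FAILED},
    # Assumption: a failure can still occur before the changes reach the database.
    State.PARTIALLY_COMMITTED: {State.COMMITTED, State.FAILED},
    State.FAILED: {State.ABORTED},
    State.COMMITTED: set(),
    State.ABORTED: set(),
}

def advance(current, target):
    """Return `target` if the move is allowed, otherwise raise ValueError."""
    if target not in TRANSITIONS[current]:
        raise ValueError(f"cannot go from {current.value} to {target.value}")
    return target

# A successful transaction: active -> partially committed -> committed.
state = State.ACTIVE
state = advance(state, State.PARTIALLY_COMMITTED)
state = advance(state, State.COMMITTED)

# A failed transaction: active -> failed -> aborted.
other = State.ACTIVE
other = advance(other, State.FAILED)
other = advance(other, State.ABORTED)
```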

Comments

Harsha Chowdary says

APRIL 21, 2020 AT 10:18 AM

I have a doubt in partially commited state you told that this state help us to rollback the
changes made to the database in case of a failure during Execution,but actually the
changes should be made to the data stored in main memory right,so that in case if any
failure occurs ,it rollbacks(acquires) the previous value from the database


DBMS Schedules and the Types of Schedules

LAST UPDATED: DECEMBER 14, 2018 BY CHAITANYA SINGH | FILED UNDER: DBMS

We know that transactions are sets of instructions and these instructions perform operations on the
database. When multiple transactions are running concurrently, there needs to be a sequence in
which the operations are performed, because only one operation can be performed on the
database at a time. This sequence of operations is known as a Schedule.

Let’s take an example to understand what a schedule is in DBMS.

DBMS Schedule example

The following sequence of operations is a schedule. Here we have two transactions T1 & T2 which are
running concurrently.

This schedule determines the exact order of operations that are going to be performed on database.
In this example, all the instructions of transaction T1 are executed before the instructions of
transaction T2, however this is not always necessary and we can have various types of schedules
which we will discuss in this article.

T1      T2
----    ----
R(X)
W(X)
R(Y)
        R(Y)
        R(X)
        W(Y)

Types of Schedules in DBMS

We have various types of schedules in DBMS. Lets discuss them one by one.

Serial Schedule
In a Serial schedule, a transaction is executed completely before the execution of another
transaction starts. In other words, in a serial schedule, a transaction does not start execution
until the currently running transaction has finished execution. This type of execution is also
known as non-interleaved execution. The example we have seen above is a serial schedule.

Lets take another example.

Serial Schedule example

Here R refers to the read operation and W refers to the write operation. In this example, the transaction
T2 does not start execution until the transaction T1 is finished.

T1      T2
----    ----
R(A)
R(B)
W(A)
commit
        R(B)
        R(A)
        W(B)
        commit
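
Whether a schedule is serial can be checked with a few lines of code. In this sketch a schedule is assumed to be a list of (transaction, operation) pairs in execution order; the function name is_serial and that representation are hypothetical choices for illustration. The idea is that once a transaction has handed over to another one, it must never appear again.

```python
def is_serial(schedule):
    """Return True if no transaction resumes after another one has started."""
    finished = set()   # transactions that have already handed over
    current = None     # transaction that is running right now
    for txn, _operation in schedule:
        if txn != current:
            if txn in finished:
                return False          # txn comes back after being interrupted
            if current is not None:
                finished.add(current)
            current = txn
    return True

# The serial schedule from the example: T1 runs completely, then T2.
serial = [
    ("T1", "R(A)"), ("T1", "R(B)"), ("T1", "W(A)"), ("T1", "commit"),
    ("T2", "R(B)"), ("T2", "R(A)"), ("T2", "W(B)"), ("T2", "commit"),
]

# A made-up interleaving of two transactions, for contrast.
interleaved = [
    ("T1", "R(A)"), ("T2", "R(B)"), ("T1", "W(A)"), ("T1", "commit"),
    ("T2", "W(B)"), ("T2", "commit"),
]

serial_result = is_serial(serial)
interleaved_result = is_serial(interleaved)
```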

Strict Schedule
In Strict schedule, if the write operation of a transaction precedes a conflicting operation (Read or
Write operation) of another transaction then the commit or abort operation of such transaction should
also precede the conflicting operation of other transaction.

Lets take an example.

Strict Schedule example

Lets say we have two transactions Ta and Tb. The write operation of transaction Ta precedes the read
or write operation of transaction Tb, so the commit or abort operation of transaction Ta should also
precede the read or write of Tb.

Ta      Tb
-----   -----
R(X)
        R(X)
W(X)
commit
        W(X)
        R(X)
        commit

Here the write operation W(X) of Ta precedes the conflicting operation (Read or Write operation) of Tb,
so the conflicting operation of Tb had to wait for the commit operation of Ta.

Cascadeless Schedule
In a Cascadeless Schedule, if a transaction is going to perform a read operation on a value, it has to
wait until the transaction that wrote that value commits.

Cascadeless Schedule example

For example, lets say we have two transactions Ta and Tb. Tb is going to read the value X after the
W(X) of Ta then Tb has to wait for the commit operation of transaction Ta before it reads the X.

Ta      Tb
-----   -----
R(X)
W(X)
        W(X)
commit
        R(X)
        W(X)
        commit

Recoverable Schedule
In a Recoverable schedule, if a transaction reads a value which has been updated by some other
transaction, then this transaction can commit only after the commit of the other transaction that
updated the value.

Recoverable Schedule example

Here Tb is performing read operation on X after the Ta has made changes in X using W(X) so Tb can
only commit after the commit operation of Ta.

Ta      Tb
-----   -----
R(X)
W(X)
        R(X)
        W(X)
        R(X)
commit
        commit
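
The three definitions above can be turned into one checker. The sketch below makes several assumptions: a schedule is a list of (transaction, action, item) tuples where the action is "R", "W" or "C" (commit), aborts are left out for brevity, and the function name classify is hypothetical. The split of operations between Ta and Tb in the three test schedules is our reading of the examples above, based on their prose descriptions.

```python
def classify(schedule):
    """Report whether a schedule is recoverable, cascadeless and strict."""
    last_writer = {}    # item -> transaction that wrote it most recently
    committed = set()
    reads_from = set()  # (reader, writer) pairs
    recoverable = cascadeless = strict = True
    for txn, action, item in schedule:
        if action == "C":
            # Recoverable: everyone this transaction read from has committed.
            if any(w not in committed for r, w in reads_from if r == txn):
                recoverable = False
            committed.add(txn)
            continue
        writer = last_writer.get(item)
        # The item holds a value written by another, uncommitted transaction.
        dirty = writer is not None and writer != txn and writer not in committed
        if action == "R":
            if writer is not None and writer != txn:
                reads_from.add((txn, writer))
            if dirty:
                cascadeless = False   # read of an uncommitted value
                strict = False
        elif action == "W":
            if dirty:
                strict = False        # overwrite of an uncommitted value
            last_writer[item] = txn
    return {"recoverable": recoverable, "cascadeless": cascadeless, "strict": strict}

strict_example = [
    ("Ta", "R", "X"), ("Tb", "R", "X"), ("Ta", "W", "X"), ("Ta", "C", None),
    ("Tb", "W", "X"), ("Tb", "R", "X"), ("Tb", "C", None),
]
cascadeless_example = [
    ("Ta", "R", "X"), ("Ta", "W", "X"), ("Tb", "W", "X"), ("Ta", "C", None),
    ("Tb", "R", "X"), ("Tb", "W", "X"), ("Tb", "C", None),
]
recoverable_example = [
    ("Ta", "R", "X"), ("Ta", "W", "X"), ("Tb", "R", "X"), ("Tb", "W", "X"),
    ("Tb", "R", "X"), ("Ta", "C", None), ("Tb", "C", None),
]
# Same as the recoverable example, but Tb commits before Ta.
unrecoverable_example = recoverable_example[:5] + [("Tb", "C", None), ("Ta", "C", None)]
```

Under this reading, every strict schedule is also cascadeless and every cascadeless schedule is also recoverable, which matches the results for the three examples.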


DBMS Serializability
LAST UPDATED: DECEMBER 20, 2018 BY CHAITANYA SINGH | FILED UNDER: DBMS

When multiple transactions are running concurrently, there is a possibility that the database may
be left in an inconsistent state. Serializability is a concept that helps us check which schedules are
serializable. A serializable schedule is one that always leaves the database in a consistent state.

What is a serializable schedule?

A serializable schedule always leaves the database in a consistent state. A serial schedule is always a
serializable schedule because in a serial schedule, a transaction only starts when the other transaction
has finished execution. However, a non-serial schedule needs to be checked for Serializability.

A non-serial schedule of n transactions is said to be a serializable schedule if it is equivalent
to a serial schedule of those n transactions. A serial schedule doesn’t allow concurrency: only one
transaction executes at a time and the next one starts when the running transaction has finished.

Types of Serializability
There are two types of Serializability.

1. Conflict Serializability
2. View Serializability

DBMS Conflict Serializability

LAST UPDATED: DECEMBER 20, 2018 BY CHAITANYA SINGH | FILED UNDER: DBMS

In the DBMS Schedules guide, we learned that there are two types of schedules – Serial & Non-Serial.
A Serial schedule doesn’t support concurrent execution of transactions while a non-serial schedule
supports concurrency. We also learned in the Serializability tutorial that a non-serial schedule may
leave the database in an inconsistent state, so we need to check these non-serial schedules for
Serializability.

Conflict Serializability is one of the types of Serializability. It can be used to check whether a
non-serial schedule is conflict serializable or not.

What is Conflict Serializability?

A schedule is called conflict serializable if we can convert it into a serial schedule after swapping its
non-conflicting operations.

Conflicting operations
Two operations are said to be in conflict, if they satisfy all the following three conditions:

1. Both the operations should belong to different transactions.

2. Both the operations are working on the same data item.

3. At least one of the operations is a write operation.

Lets see some examples to understand this:

Example 1: Operation W(X) of transaction T1 and operation R(X) of transaction T2 are conflicting
operations, because they satisfy all the three conditions mentioned above. They belong to different
transactions, they are working on the same data item X, and one of the operations is a write operation.

Example 2: Similarly Operations W(X) of T1 and W(X) of T2 are conflicting operations.

Example 3: Operations W(X) of T1 and W(Y) of T2 are non-conflicting operations because both the
write operations are not working on same data item so these operations don’t satisfy the second
condition.

Example 4: Similarly R(X) of T1 and R(X) of T2 are non-conflicting operations because neither of them
is a write operation.

Example 5: Similarly W(X) of T1 and R(X) of T1 are non-conflicting operations because both the
operations belong to same transaction T1.
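
The three conditions translate directly into a small predicate. In this sketch, the Op tuple and the function name in_conflict are hypothetical names chosen for illustration; the five checks at the end mirror Examples 1 to 5 above.

```python
from typing import NamedTuple

class Op(NamedTuple):
    txn: str     # transaction name, e.g. "T1"
    action: str  # "R" for read, "W" for write
    item: str    # data item, e.g. "X"

def in_conflict(a, b):
    """Return True if the two operations satisfy all three conflict conditions."""
    different_transactions = a.txn != b.txn
    same_item = a.item == b.item
    at_least_one_write = "W" in (a.action, b.action)
    return different_transactions and same_item and at_least_one_write

example1 = in_conflict(Op("T1", "W", "X"), Op("T2", "R", "X"))  # conflicting
example2 = in_conflict(Op("T1", "W", "X"), Op("T2", "W", "X"))  # conflicting
example3 = in_conflict(Op("T1", "W", "X"), Op("T2", "W", "Y"))  # different items
example4 = in_conflict(Op("T1", "R", "X"), Op("T2", "R", "X"))  # no write
example5 = in_conflict(Op("T1", "W", "X"), Op("T1", "R", "X"))  # same transaction
```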

Conflict Equivalent Schedules

Two schedules are said to be conflict Equivalent if one schedule can be converted into other schedule
after swapping non-conflicting operations.

Conflict Serializable check

Let’s check whether a schedule is conflict serializable or not. If a schedule is conflict equivalent to
a serial schedule then it is called a Conflict Serializable schedule. Let’s take a few examples of schedules.

Example of Conflict Serializability
Let’s consider this schedule:

T1      T2
-----   ------
R(A)
R(B)
        R(A)
        R(B)
        W(B)
W(A)

To convert this schedule into a serial schedule we would have to swap the R(A) operation of
transaction T2 with the W(A) operation of transaction T1. However, we cannot swap these two
operations because they are conflicting operations, thus we can say that this given schedule is not
Conflict Serializable.

Lets take another example:

T1      T2
-----   ------
R(A)
        R(A)
        R(B)
        W(B)
R(B)
W(A)

Lets swap non-conflicting operations:

After swapping R(A) of T1 and R(A) of T2 we get:

T1      T2
-----   ------
        R(A)
R(A)
        R(B)
        W(B)
R(B)
W(A)

After swapping R(A) of T1 and R(B) of T2 we get:

T1      T2
-----   ------
        R(A)
        R(B)
R(A)
        W(B)
R(B)
W(A)

After swapping R(A) of T1 and W(B) of T2 we get:

T1      T2
-----   ------
        R(A)
        R(B)
        W(B)
R(A)
R(B)
W(A)

We finally got a serial schedule after swapping all the non-conflicting operations so we can say that
the given schedule is Conflict Serializable.
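
Swapping operations by hand becomes tedious for longer schedules. A common alternative, which is not the method used in this article, is the precedence graph test: draw an edge Ti -> Tj whenever an operation of Ti conflicts with a later operation of Tj, and the schedule is conflict serializable exactly when the graph has no cycle. The sketch below applies that test to the two schedules above; the function names and the (transaction, action, item) representation are hypothetical choices.

```python
def precedence_edges(schedule):
    """Return the set of (earlier_txn, later_txn) pairs from conflicting operations."""
    edges = set()
    for i, (t1, a1, x1) in enumerate(schedule):
        for t2, a2, x2 in schedule[i + 1:]:
            if t1 != t2 and x1 == x2 and "W" in (a1, a2):
                edges.add((t1, t2))
    return edges

def is_conflict_serializable(schedule):
    """Return True if the precedence graph of the schedule has no cycle."""
    edges = precedence_edges(schedule)
    remaining = {txn for txn, _action, _item in schedule}
    while remaining:
        # Transactions with no incoming edge from a remaining transaction.
        free = {n for n in remaining
                if not any(dst == n and src in remaining for src, dst in edges)}
        if not free:
            return False   # every remaining transaction waits on another: a cycle
        remaining -= free
    return True

# First schedule: T1 reads B before T2 writes it, and T2 reads A before T1 writes it.
first = [
    ("T1", "R", "A"), ("T1", "R", "B"),
    ("T2", "R", "A"), ("T2", "R", "B"), ("T2", "W", "B"),
    ("T1", "W", "A"),
]

# Second schedule: every conflict points from T2 to T1.
second = [
    ("T1", "R", "A"),
    ("T2", "R", "A"), ("T2", "R", "B"), ("T2", "W", "B"),
    ("T1", "R", "B"), ("T1", "W", "A"),
]

first_result = is_conflict_serializable(first)
second_result = is_conflict_serializable(second)
```

For the second schedule the only edge is T2 -> T1, which agrees with the serial order (T2 first, then T1) reached by swapping above.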


DBMS View Serializability

LAST UPDATED: JULY 4, 2022 BY CHAITANYA SINGH | FILED UNDER: DBMS

In the last tutorial, we learned Conflict Serializability. In this article, we will discuss another type of
serializability which is known as View Serializability.

What is View Serializability?

View Serializability is a process to find out whether a given schedule is view serializable or not.

To check whether a given schedule is view serializable, we need to check whether the given schedule
is View Equivalent to its serial schedule. Let’s take an example to understand what that means.

Given Schedule:

T1          T2
-----       ------
R(X)
W(X)
            R(X)
            W(X)
R(Y)
W(Y)
            R(Y)
            W(Y)

Serial schedule of the above given schedule:

As we know, in a serial schedule a transaction starts only when the currently running transaction
has finished. So the serial schedule of the above given schedule would look like this:

T1          T2
-----       ------
R(X)
W(X)
R(Y)
W(Y)
            R(X)
            W(X)
            R(Y)
            W(Y)

If we can prove that the given schedule is view equivalent to its serial schedule, then the given
schedule is called view serializable.

Why do we need View Serializability?

We know that a serial schedule never leaves the database in an inconsistent state because no
transactions execute concurrently. However, a non-serial schedule can leave the database in an
inconsistent state because multiple transactions are running concurrently. By checking that a
given non-serial schedule is view serializable, we make sure that it is a consistent schedule.

You may be wondering: instead of checking whether a non-serial schedule is serializable, can't we
just use serial schedules all the time? The answer is no, because concurrent execution of transactions
makes fuller use of the system resources and is considerably faster than serial execution.

View Equivalent
Let's learn how to check whether two schedules are view equivalent.

Two schedules S1 and S2 are said to be view equivalent if they satisfy all of the following conditions:

1. Initial Read: The initial read of each data item must match in both schedules. For example, if
transaction T1 performs the first read of data item X (before any write on X) in schedule S1, then in
schedule S2, T1 should also perform the first read of X.

Read vs Initial Read: You may be confused by the term initial read. Here initial read means the first
read operation on a data item. For example, a data item X can be read multiple times in a schedule, but
only the first read operation on X is called the initial read. This will become clearer once we get to
the example later in this article.

2. Final Write: The final write operation on each data item must match in both schedules. For
example, if data item X is last written by transaction T1 in schedule S1, then in S2 the last write
operation on X should also be performed by transaction T1.

3. Update Read: If in schedule S1 the transaction T1 reads a data item updated by T2, then in
schedule S2, T1 should also read the value after the write operation of T2 on the same data item. For
example, if in schedule S1, T1 performs a read operation on X after the write operation on X by T2,
then in S2, T1 should read X after T2 performs its write on X.

View Serializable
If a schedule is view equivalent to its serial schedule, then the given schedule is said to be view
serializable. Let's take an example.

View Serializable Example

Take the given schedule from the start of this article as S1 and its serial schedule as S2. Let's
check the three conditions of view serializability:

Initial Read
In schedule S1, transaction T1 first reads the data item X. In S2 also transaction T1 first reads the data
item X.

Let's check for Y. In schedule S1, transaction T1 first reads the data item Y. In S2 also, the first
read operation on Y is performed by T1.

We checked for both data items X & Y and the initial read condition is satisfied in S1 & S2.

Final Write
In schedule S1, the final write operation on X is done by transaction T2. In S2 also transaction T2
performs the final write on X.

Let's check for Y. In schedule S1, the final write operation on Y is done by transaction T2. In
schedule S2, the final write on Y is also done by T2.

We checked for both data items X & Y and the final write condition is satisfied in S1 & S2.

Update Read
In S1, transaction T2 reads the value of X, written by T1. In S2, the same transaction T2 reads the X
after it is written by T1.

In S1, transaction T2 reads the value of Y, written by T1. In S2, the same transaction T2 reads the value
of Y after it is updated by T1.

The update read condition is also satisfied for both the schedules.

Result: All three conditions that check whether two schedules are view equivalent are satisfied in
this example, which means S1 and S2 are view equivalent. Also, since schedule S2 is the serial
schedule of S1, we can say that schedule S1 is a view serializable schedule.
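
The three checks above can be sketched in a few lines. This is a simplified illustration, not a full view-serializability test: it records, for each read, which transaction's write it sees (None stands for the initial value), and which transaction writes each item last, and then compares those facts between the two schedules. The tuple encoding and the function names are assumptions made for this example, and S1 is taken to be the interleaved schedule from the start of the article.

```python
def view_profile(schedule):
    """Summarise a schedule by the facts the three conditions compare.
    Operations are (transaction, "R" or "W", item) tuples."""
    last_writer = {}      # item -> transaction that wrote it most recently
    reads_from = set()    # (reader, item, writer or None for the initial value)
    for txn, kind, item in schedule:
        if kind == "R":
            reads_from.add((txn, item, last_writer.get(item)))
        else:
            last_writer[item] = txn
    # After the loop, last_writer holds the final writer of every item.
    return reads_from, last_writer

def view_equivalent(s1, s2):
    return view_profile(s1) == view_profile(s2)

# S1: the interleaved schedule; S2: T1 followed by T2.
s1 = [("T1", "R", "X"), ("T1", "W", "X"), ("T2", "R", "X"), ("T2", "W", "X"),
      ("T1", "R", "Y"), ("T1", "W", "Y"), ("T2", "R", "Y"), ("T2", "W", "Y")]
s2 = [("T1", "R", "X"), ("T1", "W", "X"), ("T1", "R", "Y"), ("T1", "W", "Y"),
      ("T2", "R", "X"), ("T2", "W", "X"), ("T2", "R", "Y"), ("T2", "W", "Y")]

print(view_equivalent(s1, s2))  # True
```

Comparing S1 with the other serial order (T2 followed by T1) returns False, because there T2, not T1, performs the initial reads.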

Recoverability of Schedule in DBMS

LAST UPDATED: JULY 4, 2022 BY CHAITANYA SINGH | FILED UNDER: DBMS

In this guide, you will learn a very important concept in DBMS: recoverability of schedules. Sometimes
a few transactions in a schedule fail because of a software or hardware issue. In that case, it
becomes important to roll back the failed transactions along with any successful transactions that
have used values updated by the failed transactions.

What is an Irrecoverable Schedule?

A schedule is called irrecoverable if it cannot be rolled back because some transactions have already
used COMMIT to make their changes permanent in the database, and those transactions used values
produced by transactions that later failed.

For example, consider a schedule that contains two transactions T1 and T2.
Transaction T1 reads X, changes the value of X, and then writes the updated value of X.

This updated value of X is read by transaction T2, which then changes X again, writes the
value of X, and uses a COMMIT statement to make the changes permanent.

After the changes are made permanent by transaction T2, transaction T1 fails and has to be
rolled back. The problem is that T2 has already committed, so it cannot be rolled back even though
it used a value written by T1. This is why the schedule is irrecoverable: it cannot be successfully
rolled back after the failure of one of its transactions.

T1              T2
----            ----
Read(X)
X = X + 20
Write(X)
                Read(X)
                X = X + 100
                Write(X)
                Commit

Failed!
Rollback

What is a Recoverable Schedule?

A schedule that can be successfully rolled back in case of any failure is known as a recoverable
schedule.

For example: Let's take the same example that we saw above, with one modification. Here we
have moved the commit statement of transaction T2 after the commit statement of transaction T1.

T1              T2
----            ----
Read(X)
X = X + 20
Write(X)
                Read(X)
                X = X + 100
                Write(X)
Commit
                Commit

Now let’s consider some cases of failure to understand whether this schedule can be successfully
rolled back.

Case 1: T1 fails just before its commit statement. In this case both transactions can be rolled
back, as neither of them has executed a COMMIT statement before the failure point in the schedule.

T1              T2
----            ----
Read(X)
X = X + 20
Write(X)
                Read(X)
                X = X + 100
                Write(X)
Failed!
Commit
                Commit

Case 2: Let's say T2 fails after the commit statement in T1. This is also recoverable, as T2 can be
rolled back, and T1 did not read the value of X after Write(X) in T2. There is no bad read operation
here, so there is no need to roll back T1 in this case.

T1              T2
----            ----
Read(X)
X = X + 20
Write(X)
                Read(X)
                X = X + 100
                Write(X)
Commit
                Failed!
                Commit

You can also try placing the failure point elsewhere in this schedule, other than in the two cases
above; you will find that the schedule is still recoverable.
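
The rule behind both examples is that a transaction which read a value written by another transaction must not commit before that other transaction commits. A minimal sketch of that check follows; the tuple encoding, the "C" marker for commit and the function name are assumptions made for this illustration, and failures and rollbacks are not modelled.

```python
def is_recoverable(schedule):
    """schedule: list of (transaction, action, item), where action is
    "R", "W" or "C" (commit) and item is None for a commit.
    Returns False if some transaction commits before a transaction
    whose written value it has read."""
    last_writer = {}     # item -> transaction that wrote it most recently
    depends_on = {}      # reader -> set of transactions it read from
    committed = set()
    for txn, action, item in schedule:
        if action == "W":
            last_writer[item] = txn
        elif action == "R":
            writer = last_writer.get(item)
            if writer is not None and writer != txn:
                depends_on.setdefault(txn, set()).add(writer)
        elif action == "C":
            if not depends_on.get(txn, set()) <= committed:
                return False     # committed too early
            committed.add(txn)
    return True

body = [("T1", "R", "X"), ("T1", "W", "X"), ("T2", "R", "X"), ("T2", "W", "X")]
irrecoverable = body + [("T2", "C", None)]                  # T2 commits first
recoverable = body + [("T1", "C", None), ("T2", "C", None)]

print(is_recoverable(irrecoverable))  # False
print(is_recoverable(recoverable))    # True
```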


Failure Classification in DBMS

LAST UPDATED: JULY 4, 2022 BY CHAITANYA SINGH | FILED UNDER: DBMS

In a DBMS there are several transactions running in a specified schedule. However, these
transactions sometimes fail for various reasons. In the previous tutorial, we learned how to identify
a recoverable schedule. In this guide, we will discuss the types of failures that can occur in a DBMS.

Failures in DBMS are classified as follows:

1. Transaction failure
2. Underlying system crash
3. Hard-disk failure
1. Transaction Failure
A transaction is a set of statements. If a transaction fails, it means there is a statement in the
transaction that could not be executed. This can happen for various reasons, such as:

Logical Error: If the logic used in the statement itself is wrong, the statement can fail.

System Error: The transaction is executing, but a fault in the system makes it fail
abruptly. For example, a deadlock condition can result in a system error.
2. Underlying System Crash
The system on which the transactions are running can crash and that can result in failure of currently
running transactions.

System can crash due to various reasons such as:

Power supply disruptions

Software issues such as Operating system issues
Hardware issues

3. Hard-disk Failure
A hard-disk failure can also cause transaction failure. When transactions are reading data from and
writing data to the disk, a failure in the underlying disk can cause the currently running
transactions to fail, because they can no longer read or write data on the disk. This can result in
loss of data as well.

There can be several reasons for a disk failure, such as formation of bad sectors, corruption of
the disk, viruses, or insufficient resources available on the disk.


Log-Based Recovery in DBMS

LAST UPDATED: JULY 4, 2022 BY CHAITANYA SINGH | FILED UNDER: DBMS

In the previous chapters, you learned how to identify a recoverable schedule and what kinds of failures
can occur in a DBMS. In this chapter, you will learn how to recover a failed transaction using log-based
recovery. When a transaction fails, it is important to roll it back so that the changes made by the
failed transaction are not stored in the database. This is essential for maintaining the integrity of
the database.

What is log-based recovery in DBMS?

1. As the name suggests, a log is a sequence of records, kept in stable storage, that notes
all the changes made by transactions in sequential order. This log is
used to recover the transaction in case of failure.
2. Any operation performed by a transaction on the database is recorded in the log.
3. It is important to write the log record before the actual operation is performed on the database.
This makes sure that if an operation fails, it has already been recorded in the log.

How are the logs maintained?

Let’s take an example to understand the log-based recovery in DBMS:
A transaction T1 is modifying the Department of an employee, for this operation, the following log is
maintained:

Log entry to mark the start of the transaction:

<T1, Start>

Just before the transaction modifies the department of the employee from “Sales” to “Marketing”, the
following log record is written. It holds the transaction, the data item, the old value and the new
value:

<T1, Department, 'Sales', 'Marketing' >

Log entry to mark the successful end of the transaction:

<T1, Commit>

Logs for different database modification approaches

There are two database modification approaches used by transactions. Here we will learn how the
logs are maintained for each approach:

1. Deferred Database Modification

In this approach, the transaction does not apply its changes to the database until it has completed
successfully.

In this approach, all the log records are created and stored first, and the database is updated
from them only once the transaction has completed.

2. Immediate Database Modification

In this approach, the transaction changes the database immediately after it performs each
operation.

In this approach, each log record is written just before the transaction performs the corresponding
operation on the database.

Recovery using Log Records

In case of a transaction failure, the log is consulted to recover the transaction and either roll
back or redo all the changes made by the transaction.

If the log contains the entries <Tn, Start> and <Tn, Commit>, or <Tn, Start> and <Tn, Abort>,
then the transaction Tn needs to be redone based on the log entries recorded for each of its
operations.
If the log contains the entry <Tn, Start> but does not contain an entry for <Tn, Commit> or <Tn,
Abort>, then the transaction needs to be rolled back.
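
A minimal sketch of these two rules is shown below, under several assumptions: log records are modelled as Python tuples, each update record carries the old and the new value as in the <T1, Department, 'Sales', 'Marketing'> record above, abort records are left out for brevity, and the Salary item and transaction T2 are hypothetical additions for the example.

```python
def recover(log, database):
    """log: tuples in the order they were written, each one of
         (txn, "start"), (txn, "commit") or (txn, item, old, new).
    Redoes committed transactions and undoes transactions that
    started but never committed. Returns the recovered database."""
    started = {rec[0] for rec in log if rec[1:] == ("start",)}
    committed = {rec[0] for rec in log if rec[1:] == ("commit",)}
    incomplete = started - committed

    db = dict(database)
    # Redo: replay the new values of committed transactions, oldest first.
    for rec in log:
        if len(rec) == 4 and rec[0] in committed:
            _, item, _old, new = rec
            db[item] = new
    # Undo: restore the old values of incomplete transactions, newest first.
    for rec in reversed(log):
        if len(rec) == 4 and rec[0] in incomplete:
            _, item, old, _new = rec
            db[item] = old
    return db

log = [
    ("T1", "start"),
    ("T1", "Department", "Sales", "Marketing"),
    ("T1", "commit"),
    ("T2", "start"),
    ("T2", "Salary", 5000, 6000),    # T2 never commits
]
# State found on disk after the crash: T1's change is missing, T2's is present.
print(recover(log, {"Department": "Sales", "Salary": 6000}))
# {'Department': 'Marketing', 'Salary': 5000}
```

Because every update record is written before the database is changed, the old and new values needed for both passes are always available in the log.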


Recovery using Checkpoint

Let’s understand how to recover a failed transaction using checkpoint.

The recovery system reads the log file in reverse (from end to start).
The recovery system maintains two lists: a redo-list and an undo-list. One or both
of these lists are used to recover a failed transaction.
If the recovery system finds log entries <Tn, Start> and <Tn, Commit>, or just <Tn,
Commit>, it puts the transaction in the redo-list. This is because a commit entry means
that the transaction's changes were made permanent, so its operations must be redone to make
sure they are reflected in the database.
If the recovery system finds a log entry <Tn, Start> but no <Tn, Commit> or <Tn,
Abort> entry, it puts the transaction in the undo-list. This is because the transaction never made
its changes permanent, as no commit entry was found, so the transaction can be
rolled back by putting it in the undo-list.

Example:
Consider a schedule with three transactions T1, T2 and T3. Since the log
entries are removed once a checkpoint is reached, the entry <T1, Start> is not in the log: it was
written before the checkpoint, and the log is cleared at the checkpoint. The entries that remain in
the log are <T1, Commit>, <T2, Start>, <T2, Commit> and <T3, Start>. The entry <T3, Commit> is not in
the log because the transaction failed before reaching it.

So based on the rules that we have seen above, T1 and T2 are put in the redo-list, as <T1, Commit> and
<T2, Commit> are present in the log file. Transaction T3 is put in the undo-list, as <T3, Start> is
found but there is no entry for <T3, Commit> or <T3, Abort>.
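
The classification in this example can be sketched as a reverse scan over the log entries that follow the checkpoint. The tuple encoding and the function name are assumptions made for this illustration; the text above does not say where an aborted transaction goes, so this sketch places it in neither list.

```python
def build_lists(log_after_checkpoint):
    """Scan the log from end to start and sort transactions into a
    redo-list (commit found) and an undo-list (start found, but no
    commit or abort). Entries are (transaction, event) tuples."""
    redo, undo, finished = [], [], set()
    for txn, event in reversed(log_after_checkpoint):
        if event == "commit":
            redo.append(txn)
            finished.add(txn)
        elif event == "abort":
            finished.add(txn)      # assumption: neither redone nor undone here
        elif event == "start" and txn not in finished:
            undo.append(txn)
    return redo, undo

# The entries that remain in the log in the example above.
log = [("T1", "commit"), ("T2", "start"), ("T2", "commit"), ("T3", "start")]
redo, undo = build_lists(log)
print(redo)  # ['T2', 'T1']
print(undo)  # ['T3']
```

Reading backwards means a commit entry is always seen before the matching start entry, so a single pass is enough to decide each transaction's list.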

Checkpoint Implementation Considerations

Frequency: Checkpoints should be taken frequently enough to keep recovery
smooth, but be careful: if they are taken too frequently, they can cause
significant performance overhead.
System Load: Checkpoints should be scheduled so that they occur when the
system load is not high. This minimizes the performance impact.


Deadlock in DBMS
LAST UPDATED: JULY 4, 2022 BY CHAITANYA SINGH | FILED UNDER: DBMS

A deadlock is a condition in which two or more tasks are waiting for each other in order to finish,
but none of the tasks is willing to give up the resources that another task needs. In this situation
no task ever finishes, and all of them remain in the waiting state forever.

Coffman conditions
Coffman stated four conditions for a deadlock to occur. A deadlock may occur if all of the following
conditions hold true.
Mutual exclusion condition: There must be at least one resource that cannot be used by more
than one process at a time.
Hold and wait condition: A process that is holding a resource can request additional
resources that are being held by other processes in the system.
No preemption condition: A resource cannot be forcibly taken from a process. Only the process
itself can release a resource that it holds.
Circular wait condition: One process is waiting for a resource that is
held by a second process, the second process is waiting for a third process, and so on, and the last
process is waiting for the first process, forming a circular chain of waiting.

Deadlock Handling
Ignore the deadlock (Ostrich algorithm)
Did that make you laugh? You may be wondering how ignoring a deadlock can count as deadlock
handling. But the Windows you are using on your PC uses this approach to
deadlock handling, and that is the reason it sometimes hangs and you have to reboot it to get it
working. Not only Windows but UNIX also uses this approach.

The question is why? Why do they ignore a deadlock instead of dealing with it, and why is this
called the Ostrich algorithm?

Well! Let me answer the second question first. This is known as the Ostrich algorithm because in this
approach we ignore the deadlock and pretend that it will never occur, just like the proverbial ostrich
behavior “to stick one’s head in the sand and pretend there is no problem.”

Let’s discuss why we ignore it: when deadlocks are believed to be very rare and the cost of deadlock
handling is high, ignoring them is a better solution than handling them. Take the
operating system example: if the time required to handle the deadlock is higher than the time required
to reboot Windows, then rebooting is the preferred choice, considering that deadlocks are
very rare in Windows.

Deadlock detection
The resource scheduler keeps track of the resources allocated to and requested by processes.
Thus, if there is a deadlock, it is known to the resource scheduler. This is how a deadlock is detected.
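
One common way to turn that bookkeeping into a detection step is a wait-for graph: an edge P1 -> P2 means P1 is waiting for a resource that P2 holds, and a cycle in the graph is a deadlock. The sketch below is an illustration under that assumption; the dictionary layout and the function name are not taken from any particular system.

```python
def find_deadlock(waits_for):
    """waits_for maps each process to the set of processes it waits on.
    Returns one cycle as a list of processes, or None if there is none."""
    visited = set()

    def visit(node, path):
        if node in path:
            return path[path.index(node):]     # found a circular wait
        if node in visited:
            return None                        # already explored, no cycle
        visited.add(node)
        for nxt in sorted(waits_for.get(node, ())):
            cycle = visit(nxt, path + [node])
            if cycle:
                return cycle
        return None

    for start in sorted(waits_for):
        cycle = visit(start, [])
        if cycle:
            return cycle
    return None

print(find_deadlock({"P1": {"P2"}, "P2": {"P3"}, "P3": {"P1"}}))  # ['P1', 'P2', 'P3']
print(find_deadlock({"P1": {"P2"}, "P2": {"P3"}, "P3": set()}))   # None
```

The returned cycle is exactly the circular chain of waiting described in the Coffman conditions, and its members are the candidates for termination or preemption.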

Once a deadlock is detected, it can be corrected by the following methods:

Terminating processes involved in deadlock: Terminating all the processes involved in the deadlock,
or terminating processes one by one until the deadlock is resolved, are possible solutions, but
neither approach is good. Terminating all processes is costly, and the partial work done by the
processes is lost. Terminating one by one takes a lot of time, because each time a process is
terminated the system needs to check whether the deadlock is resolved. Thus, the best approach is
to consider process age and priority when choosing which processes to terminate.
Resource Preemption: Another approach is to preempt resources and allocate
them to other processes until the deadlock is resolved.

Deadlock prevention
We have learned that a deadlock can occur only if all four Coffman conditions hold true, so
preventing one or more of them prevents the deadlock.

Removing mutual exclusion: All resources must be sharable, meaning that more than one
process can hold a resource at a time. This approach is practically impossible.
Removing hold and wait condition: This can be removed if the process acquires all the resources
it needs before starting out. Another way to remove it is to enforce a rule that a process may
request resources only when it holds none.
Preemption of resources: Preemption of resources from a process can result in a rollback, and
thus this needs to be avoided in order to maintain the consistency and stability of the system.
Avoid circular wait condition: This can be avoided if the resources are arranged in a hierarchy
and a process may acquire resources only in increasing order of precedence.
Another way of doing this is to enforce a one-resource-per-process rule: a process can request a
resource only after it releases the resource it currently holds. This also avoids the circular wait.
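
The hierarchy rule for avoiding circular wait can be sketched as follows. The class name, the rank table and the resource names are hypothetical, and the sketch only checks the ordering rule for a single process; it does not implement real locking.

```python
class OrderedLocker:
    """Tracks the resources one process holds and rejects any request
    that does not follow increasing rank (the hierarchy rule)."""

    def __init__(self, rank):
        self.rank = rank      # resource name -> position in the hierarchy
        self.held = []        # resources held, always in increasing rank

    def acquire(self, resource):
        if self.held and self.rank[resource] <= self.rank[self.held[-1]]:
            raise RuntimeError(
                f"{resource} requested out of order while holding {self.held[-1]}")
        self.held.append(resource)

    def release_all(self):
        self.held.clear()

rank = {"disk": 1, "printer": 2}

p = OrderedLocker(rank)
p.acquire("disk")
p.acquire("printer")          # allowed: rank 1, then rank 2

q = OrderedLocker(rank)
q.acquire("printer")
try:
    q.acquire("disk")         # rejected: rank 1 requested after rank 2
except RuntimeError as err:
    print(err)
```

If every process follows the same ranking, no process can hold a higher-ranked resource while waiting for a lower-ranked one, so a circular chain of waiting cannot form.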

Deadlock Avoidance
A deadlock can be avoided if resources are allocated in a way that prevents the deadlock
from occurring. There are two algorithms for deadlock avoidance:

Wait/Die
Wound/Wait

The table below shows how each algorithm resolves a resource request. Both algorithms
take process age into consideration when deciding how to allocate a resource so that a deadlock
is avoided.

Situation                                                Wait/Die               Wound/Wait
-------------------------------------------------------  ---------------------  ----------------------
Older process needs a resource held by younger process   Older process waits    Younger process dies
Younger process needs a resource held by older process   Younger process dies   Younger process waits

One of the well-known deadlock avoidance algorithms is the Banker’s algorithm.
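
The table above translates directly into two small decision functions. In this sketch, age is represented by a timestamp, where a smaller timestamp means an older process; the function names and return strings are assumptions made for the illustration.

```python
def wait_die(requester_ts, holder_ts):
    """Wait/Die: an older requester waits, a younger requester dies."""
    return "requester waits" if requester_ts < holder_ts else "requester dies"

def wound_wait(requester_ts, holder_ts):
    """Wound/Wait: an older requester wounds (kills) the younger holder,
    a younger requester waits."""
    return "holder dies" if requester_ts < holder_ts else "requester waits"

# Older process (timestamp 1) needs a resource held by a younger one (timestamp 2).
print(wait_die(1, 2))    # requester waits
print(wound_wait(1, 2))  # holder dies

# Younger process (timestamp 2) needs a resource held by an older one (timestamp 1).
print(wait_die(2, 1))    # requester dies
print(wound_wait(2, 1))  # requester waits
```

In both schemes it is always the younger process that is killed, never the older one.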


Starvation in DBMS
LAST UPDATED: AUGUST 19, 2022 BY CHAITANYA SINGH | FILED UNDER: DBMS

Starvation is a situation in which one transaction keeps waiting for another transaction to release a
lock. This is sometimes loosely called livelock. As we already learned in transaction management, a
transaction acquires a lock before performing a write operation on a data item. If the data item is
already locked by another transaction, the transaction waits for the lock to be released. In a
starvation situation, a transaction waits for another transaction for an indefinite period of time.

Why does starvation occur?

1. The transactions do not have priorities set. Generally the older transaction is given higher
priority, so that a transaction that has been waiting longer gets the lock sooner than a
transaction that has been waiting for a short time. If priorities are not set, a transaction can keep
on waiting while other transactions continuously acquire the lock on the data item.

2. Resource leak: A transaction does not release the lock after it has acquired the lock on a
particular data item.

3. Denial of service attack: A Denial-of-Service (DoS) attack is an attack that is meant to shut down a
machine or network, making it inaccessible to its users. A DoS attack keeps the data item engaged so
that transactions are not able to acquire locks on it.
Starvation Example
Let’s say there are three transactions T1, T2 and T3 waiting to acquire a lock on a data item ‘X’. The
system grants the lock to transaction T1, and the other two transactions T2 and T3 wait for the lock
to be released.

Once transaction T1 releases the lock, the lock is granted to transaction T3, so transaction T2 is
still waiting for the lock to be released.

While transaction T3 is performing an operation on ‘X’, a new transaction T4 enters the system
and waits for the lock. The system grants the lock to T4. In this way new transactions keep entering
the system and acquiring the lock on ‘X’ while the older transaction T2 keeps on waiting.

How to solve the starvation problem in DBMS?

1. Increase priority: One way of fixing the starvation issue is to grant higher priority to the older
transaction. This way the transaction that requested the lock first has higher priority than a
transaction that requested the lock later.

The drawback of this solution is that a faulty transaction can keep acquiring the lock and failing, so
it never completes and stays in the system with a higher priority than other transactions, and thus
keeps on getting the lock on a particular data item.

2. By changing the victim selection algorithm: In the above solution, we saw a drawback where a
victim transaction keeps on getting the lock. By lowering the priority of a victim transaction, we can
fix the drawback of the above solution.

3. FCFS (First come first serve): In this approach, the transaction that entered the system first
gets the lock first. This way no transaction keeps on waiting indefinitely.
4. Wait-die Scheme: If a transaction requests a lock on a data item that is held by another
transaction, the system compares their timestamps. If the requesting transaction is older, it is
allowed to wait for the data item; if it is younger, it is killed (it dies).

5. Wound-wait Scheme: In this scheme, if an older transaction requests a lock that is held by a
younger transaction, the system kills the younger transaction and grants the lock to the older
transaction.

The killed younger transaction is restarted after a specific delay but with the same timestamp. This
makes sure that after some time, when this transaction is old enough, it can acquire the lock on the
data item.

Both schemes can be represented in a tabular format like this:

SITUATION                                                WAIT-DIE               WOUND-WAIT
-------------------------------------------------------  ---------------------  ----------------------
Older process needs a resource held by younger process   Older process waits    Younger process dies
Younger process needs a resource held by older process   Younger process dies   Younger process waits
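
The FCFS approach listed above can be sketched with a simple queue. The class below is a hypothetical illustration for a single data item, not an actual DBMS lock manager. It replays the starvation example, but hands the lock out in arrival order so that T2 is served before T3 and T4.

```python
from collections import deque

class FcfsLock:
    """A lock on one data item that is granted in arrival order."""

    def __init__(self):
        self.holder = None
        self.waiting = deque()

    def request(self, txn):
        if self.holder is None:
            self.holder = txn            # lock is free: grant immediately
        else:
            self.waiting.append(txn)     # otherwise join the end of the queue

    def release(self):
        # Hand the lock to the transaction that has waited longest.
        self.holder = self.waiting.popleft() if self.waiting else None
        return self.holder

lock = FcfsLock()
for txn in ("T1", "T2", "T3"):
    lock.request(txn)

print(lock.release())   # T2 (not T3)
lock.request("T4")      # T4 arrives later and queues behind T3
print(lock.release())   # T3
print(lock.release())   # T4
```

Because a newly arriving transaction can only join the end of the queue, it can never overtake one that is already waiting.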


Concurrency Control in DBMS

LAST UPDATED: JULY 5, 2022 BY CHAITANYA SINGH | FILED UNDER: DBMS

When more than one transaction is running simultaneously, conflicts can occur that leave the
database in an inconsistent state. To handle these conflicts we need concurrency
control in DBMS, which allows transactions to run simultaneously but manages them in such a way
that the integrity of the data remains intact.

Concurrent Execution in DBMS

In a multi-user environment, multiple users are allowed to access data simultaneously. These
users can send requests at the same time, which means the DBMS is serving multiple requests
from different users simultaneously. This is called concurrent execution in DBMS.
In concurrent execution, several operations are performed on the data items in the database
concurrently. There can be multiple read requests, write requests or a combination of both. Serving
a read request while another user is changing the database can cause several issues,
which we discuss in the next section of this article.
The goal of concurrent execution is to allow multiple transactions to execute simultaneously
in such a way that no executing transaction affects any other transaction.

Issues with the concurrent execution

Let’s take an example to understand the issues that can arise when transactions are
executing concurrently.

Conflict Example
You and your brother have a joint bank account, from which you both can withdraw money. Now let’s
say you both go to different branches of the same bank at the same time and each try to withdraw 5000
INR, while your joint account has a balance of only 6000 INR. If we don’t have concurrency control in
place, you can both get 5000 INR at the same time, but once both transactions finish, the account
balance would be -4000 INR, which is not possible and leaves the database in an inconsistent state.
We need something that controls the transactions in such a way that they can run
concurrently while the consistency of the data is maintained, so that such issues are avoided.

Problems with the concurrent execution

There are two main operations in the database: read and write. If we manage these two
operations on a data item in such a way that the database always ends up in a consistent state,
then we can say that this is a perfect concurrency control scenario.

Problem 1: W-W Conflict – Lost Update Problem

This conflict occurs when two transactions perform write operations on the same data item in the
database in such a way that the database ends up in an inconsistent state. This problem is also
known as the Write-Write conflict, W-W conflict or lost update problem.

Let’s take an example to understand this problem:

In this example, at time t1, the transaction T1 reads the value of data item A. Let’s say the value
of A is 1000.
At time t2, transaction T1 adds 100 to the data item A, and the value of A becomes 1100. This
value is yet to be updated in the database, as no write operation has been performed yet.
At time t3, transaction T2 reads the value of data item A. The value of A in the database is still
1000, so transaction T2 reads the value of A as 1000.
At time t4, transaction T2 adds 200 to data item A. In transaction T2 the value of A becomes
1200, but this value is yet to be updated in the database.
At time t5, transaction T1 writes the value of A to the database. Since the value of A is 1100
according to transaction T1, it updates the value of A in the database to 1100.
At time t7, transaction T2 writes the value of A to the database. Since the value of A is 1200 in T2,
it updates the value of A to 1200 in the database.

Lost Update: The initial value of A was 1000. If T1 adds 100 and T2 adds 200, the value of A in the
database should be 1300 after both transactions have executed. However, as you can
see, the value of A is 1200 in this case. This is because the update made by transaction T1 is lost.
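
The arithmetic of this example can be replayed step by step. The sketch below uses a plain dictionary as the database and local variables as each transaction's private copy of A; the names are assumptions made for the illustration.

```python
def interleaved():
    database = {"A": 1000}
    t1_local = database["A"]      # t1: T1 reads A = 1000
    t1_local += 100               # t2: T1 computes 1100 (not yet written)
    t2_local = database["A"]      # t3: T2 reads A, still 1000
    t2_local += 200               # t4: T2 computes 1200 (not yet written)
    database["A"] = t1_local      # t5: T1 writes 1100
    database["A"] = t2_local      # T2 writes 1200 and overwrites T1's update
    return database["A"]

def serial():
    database = {"A": 1000}
    database["A"] = database["A"] + 100   # T1 runs to completion
    database["A"] = database["A"] + 200   # then T2 runs
    return database["A"]

print(interleaved())  # 1200
print(serial())       # 1300
```

The difference of 100 between the two results is exactly the update by T1 that was lost.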

Problem 2: W-R Conflict – Dirty Read Problem

This conflict occurs when a transaction changes a data item in the database and then
fails after making the change. However, before the failed transaction is rolled back,
another transaction reads the value updated by the failed transaction. This is called a dirty read
in DBMS, or a W-R (Write-Read) conflict.

Let’s take an example to understand this conflict:

At time t1, transaction T1 reads the value of A. Let’s say the value of A is 1000, so T1 reads A as 1000.
At time t2, T1 deducts 500 from A, and the value of A becomes 500.
At time t3, T1 makes the change in the database by writing the value of A. The database value of A
is updated from 1000 to 500.
At time t4, another transaction T2 reads the value of A, which is now 500 in the database.
At time t5, T2 adds 100, and the value of A is now 600 in T2, but in the database it is still 500.
At time t6, transaction T1 fails and T2 writes the value of A to the database. The value of A is
updated to 600 in the database.
At time t7, transaction T1 is rolled back because it failed in the previous step.
Dirty Read: Since T1 failed and was rolled back, the changes made by T1 should be reverted. T2 should
have read the original value of A, which is 1000, and at the end of T2 the value of A in the database
should be 1100. However, as we have seen above, the value of A in the database is 600. This is because
T2 read a value updated by a failed transaction. This is called a dirty read, and it leaves the
database in an inconsistent state.
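
The same steps can be replayed in a few lines to see where the value 600 comes from. This is only an arithmetic sketch with hypothetical names; the rollback itself is not modelled, only the value T2 ends up writing and the value that would have been correct.

```python
def dirty_read_schedule():
    database = {"A": 1000}
    original = database["A"]         # value of A before T1 touched it

    t1_local = database["A"] - 500   # t1-t2: T1 reads 1000 and deducts 500
    database["A"] = t1_local         # t3: T1 writes 500 (not committed)
    t2_local = database["A"] + 100   # t4-t5: T2 reads the dirty 500, adds 100
    database["A"] = t2_local         # t6: T1 fails; T2 writes 600

    expected = original + 100        # what T2 would write without T1's update
    return database["A"], expected

actual, expected = dirty_read_schedule()
print(actual)    # 600
print(expected)  # 1100
```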

Problem 3: R-W Conflict – Non-Repeatable Read Problem

This conflict occurs when a transaction reads different values for the same data item. This is also
known as inconsistent retrieval or the non-repeatable read problem.

Let’s take an example to understand this:

At time t1, the transaction T1 reads the value of A as 1000.
At time t6, the same transaction T1 reads a different value of A, 500.
This is because between time t1 and t6, another transaction changed the
database and updated the value of A from 1000 to 500.
This is an issue because the transaction reads two different values for the same data item, which is
why it is called the non-repeatable read problem. T1 ends up working with an inconsistent view of the
data.

Concurrency Control
Concurrency control is the technique that ensures that the above three conflicts don’t occur in the
database. There are certain rules for avoiding problems in concurrently running transactions, and these
rules are defined as the concurrency control protocols.

Concurrency control protocols

Concurrency control protocols ensure that the database remains in a consistent state after the
execution of transactions. There are three concurrency control protocols:

1. Lock Based Concurrency Control Protocol

2. Time Stamp Concurrency Control Protocol
3. Validation Based Concurrency Control Protocol


Lock based Protocol in DBMS

LAST UPDATED: JULY 5, 2022 BY CHAITANYA SINGH | FILED UNDER: DBMS

A lock is a mechanism that ensures that the integrity of data is maintained. It does that by
locking the data while a transaction is running: no transaction can read or write the data until it
acquires the appropriate lock. There are two types of lock that can be placed while accessing the
data so that a concurrent transaction cannot alter the data while we are processing it.

1. Shared Lock(S)
2. Exclusive Lock(X)

1. Shared Lock(S): A shared lock is placed when we are reading the data. Multiple shared locks can be
placed on the data at the same time, but while a shared lock is in place no exclusive lock can be
placed.

To understand the lock mechanism let’s take an example of conflict:

You and your brother have a joint bank account, from which you both can withdraw money. Now let’s
say you both go to different branches of the same bank at the same time and try to withdraw 5000
INR, your joint account has only 6000 balance.

Now if we don’t have concurrency control in place, you both can get 5000 INR at the same time, but
once both the transactions finish the account balance would be -4000 (6000 - 5000 - 5000), which is not
possible and leaves the database in an inconsistent state.

We need something that controls the transactions in such a way that they can run concurrently
while the consistency of data is maintained, to avoid such issues.

Solution of the above problem using Shared lock:

For example, when two transactions are reading Steve’s account balance, let them both read by placing
shared locks. If at the same time another transaction wants to update Steve’s account balance
by placing an exclusive lock, do not allow it until the reading is finished.

2. Exclusive Lock(X): An exclusive lock is placed when we want to read and write the data. This lock
allows both the read and write operations. Once this lock is placed on the data, no other lock (shared or
exclusive) can be placed on the data until the exclusive lock is released.

For example, when a transaction wants to update Steve’s account balance, let it do so by placing an X
lock on it. If a second transaction wants to read the data (S lock), don’t allow it; if another
transaction wants to write the data (X lock), don’t allow that either.

So based on this we can create a table like this:

Lock Compatibility Matrix

-------------------------
|     |   S    |   X    |
|-----------------------|
|  S  | True   | False  |
|-----------------------|
|  X  | False  | False  |
-------------------------

How to read this matrix?

There are two rows. The first row says that when an S lock is held on a data item, another S lock can be
acquired, so it is marked True, but no X lock can be acquired, so it is marked False.
In the second row, when an X lock is held neither an S nor an X lock can be acquired, so both are marked False.
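The matrix can be turned into a small lookup, as a minimal sketch. The `LockTable` class, its method names and the item name "balance" are assumptions made for illustration; a real lock manager would also queue waiting transactions instead of simply refusing the request.

```python
# Lock compatibility matrix from the table above:
# COMPATIBLE[held][requested] is True when the requested lock can be granted.
COMPATIBLE = {
    "S": {"S": True, "X": False},
    "X": {"S": False, "X": False},
}


class LockTable:
    """Hypothetical lock table: tracks the lock mode each transaction holds per item."""

    def __init__(self):
        self.held = {}  # item -> {transaction: mode}

    def can_grant(self, item, txn, mode):
        # A request is granted only if it is compatible with every lock
        # that OTHER transactions currently hold on the item.
        others = {t: m for t, m in self.held.get(item, {}).items() if t != txn}
        return all(COMPATIBLE[m][mode] for m in others.values())

    def acquire(self, item, txn, mode):
        if not self.can_grant(item, txn, mode):
            return False  # the caller has to wait and retry
        self.held.setdefault(item, {})[txn] = mode
        return True

    def release(self, item, txn):
        self.held.get(item, {}).pop(txn, None)


table = LockTable()
r1 = table.acquire("balance", "T1", "S")  # first reader
r2 = table.acquire("balance", "T2", "S")  # second reader, S is compatible with S
r3 = table.acquire("balance", "T3", "X")  # writer is refused while readers hold S
table.release("balance", "T1")
table.release("balance", "T2")
r4 = table.acquire("balance", "T3", "X")  # granted once the readers are done
r5 = table.acquire("balance", "T1", "S")  # refused while T3 holds X
```

Here the two readers share the item, the writer has to wait for them, and once the writer holds the X lock nobody else gets in.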

Types of Lock Protocols

1. Simplistic lock protocol
This protocol is the simplest form of locking the data while a transaction is running. As per the simplistic lock
protocol, a transaction needs to acquire the lock on the data before performing any insert, update or
delete operation. The transaction releases the lock as soon as it is done performing the operation.
This prevents other transactions from reading the data while it is being updated.

2. Pre-claiming lock protocol

As the name suggests, this protocol checks the transaction to see which locks it requires
before it begins.
Before the transaction begins, it places the request to acquire all the locks on data items.
If all the locks are granted, the transaction begins execution and releases all the locks once it
has finished executing.
If not all the locks are granted, the transaction waits until the required locks are granted.

3. Two Phase locking protocol (2PL)
In two phase locking protocol the locking and unlocking of data items is done in two phases.

Growing Phase: In this phase, the locks are acquired on the data items but none of the acquired locks
can be released in this phase.

Shrinking Phase: The existing locks can be released in this phase but no new locks can be acquired in
this phase.

Note: The point at which the transaction acquires its final lock and the growing phase ends is called the
lock point.

2PL Example: Let’s take an example to understand how the two phase locking protocol works. In the
following example there are two transactions, T1 and T2, running concurrently.

Transaction T1: In this example, growing phase of T1 is from Step 1 to Step 5. Shrinking phase is
from Step 7 to Step 9. Lock point is at step 5.

Transaction T2: Growing phase of T2 is from Step 2 to Step 10. Shrinking phase is from Step 11 to
Step 13. Lock point is at step 10.

           T1            T2
           ----          ----
Step 1     lock-S(A)
Step 2     ..            lock-S(A)
Step 3     lock-S(B)
Step 4     ...           lock-S(B)
Step 5     lock-X(C)
Step 6     ..
Step 7     Unlock(A)
Step 8     Unlock(B)
Step 9     Unlock(C)
Step 10                  lock-S(C)
Step 11                  Unlock(A)
Step 12                  Unlock(B)
Step 13                  Unlock(C)
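The two-phase rule itself (no new lock after the first unlock) can be sketched in a few lines. This is only an illustration: the class `TwoPhaseTransaction` is a hypothetical name, lock modes and conflicts with other transactions are left out, and the item "D" is made up to trigger the violation.

```python
class TwoPhaseTransaction:
    """Hypothetical transaction that enforces the growing/shrinking rule of 2PL."""

    def __init__(self, name):
        self.name = name
        self.locks = set()
        self.shrinking = False  # becomes True at the first unlock

    def lock(self, item):
        # Growing phase only: once any lock has been released,
        # no new lock may be acquired.
        if self.shrinking:
            raise RuntimeError(f"{self.name}: cannot lock {item} in shrinking phase")
        self.locks.add(item)

    def unlock(self, item):
        # The first unlock ends the growing phase.
        self.shrinking = True
        self.locks.discard(item)


t1 = TwoPhaseTransaction("T1")
for item in ("A", "B", "C"):
    t1.lock(item)      # growing phase; the lock on C is the lock point
t1.unlock("A")         # shrinking phase starts here

try:
    t1.lock("D")       # not allowed under 2PL
    violation = False
except RuntimeError:
    violation = True
```

T1 in the example above follows the same pattern: three locks in a row, then only unlocks.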

4. Strict Two Phase Locking Protocol (Strict – 2PL)

It is similar to 2PL except that it doesn’t have a gradual shrinking phase. This protocol releases all
the locks only after the transaction has completed successfully and has used the commit statement to
make the changes permanent in the database.

It doesn’t release locks after performing an operation on data items. It releases all the locks at the
same time once the transaction commits successfully.


Timestamp based Ordering Protocol in DBMS

LAST UPDATED: JULY 5, 2022 BY CHAITANYA SINGH | FILED UNDER: DBMS

In the previous chapter, you learned lock based protocol in DBMS to maintain the integrity of
database. In this chapter, you will learn Timestamp based ordering protocol.

What is Timestamp Ordering Protocol?

Timestamp ordering protocol maintains the order of transactions based on their timestamps.
A timestamp is a unique identifier that is created by the DBMS when a transaction enters
the system. This timestamp can be based on the system clock or a logical counter
maintained in the system.
Timestamps help identify the older transactions (transactions that are waiting in line to be
executed) and give them higher priority compared to the newer transactions. This makes sure
that none of the transactions stays pending for a long period of time.
This protocol also maintains the timestamps of the last read and last write on a data item.
For example, let’s say an old transaction T1 has timestamp TS(T1) and a new transaction T2 enters
the system and is assigned timestamp TS(T2). Here TS(T1) < TS(T2), so T1 is given
higher priority than T2. This is how timestamp based protocol maintains the serializability order.

How a Timestamp ordering protocol works?

Let’s see how a timestamp ordering protocol works in a DBMS system. Let’s say there is data item A in
the database.

W_TS(A) is the largest timestamp of a transaction that executed the operation write(A) successfully.
R_TS(A) is the largest timestamp of a transaction that executed the operation read(A) successfully.

1. Whenever a Transaction Tn issues a Write(A) operation, this protocol checks the following
conditions:
If R_TS(A) > TS(Tn) or if W_TS(A) > TS(Tn), then abort and roll back the transaction Tn and
reject the Write(A) operation.
If R_TS(A) <= TS(Tn) and W_TS(A) <= TS(Tn), then execute the Write(A) operation of Tn and set
W_TS(A) to TS(Tn).

2. Whenever a Transaction Tn issues a Read(A) operation, this protocol checks the following
conditions:
If W_TS(A) > TS(Tn), then abort and roll back Tn and reject the Read(A) operation.
If W_TS(A) <= TS(Tn), then execute the Read(A) operation of Tn and set R_TS(A) to the larger of
R_TS(A) and TS(Tn).
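The read and write checks can be sketched as two small functions. The names `DataItem`, `Rollback`, `read` and `write` are made up for this illustration, and timestamps are plain integers. A write is accepted only when neither rejection condition holds, and R_TS is kept as the largest reading timestamp, in line with its definition above.

```python
class Rollback(Exception):
    """Raised when an operation is rejected and its transaction must roll back."""


class DataItem:
    def __init__(self, value):
        self.value = value
        self.r_ts = 0  # R_TS: largest timestamp that read the item successfully
        self.w_ts = 0  # W_TS: largest timestamp that wrote the item successfully


def read(item, ts):
    # A younger transaction has already written the item: reject the read.
    if item.w_ts > ts:
        raise Rollback(f"Read rejected for TS={ts}")
    item.r_ts = max(item.r_ts, ts)
    return item.value


def write(item, ts, value):
    # A younger transaction has already read or written the item: reject the write.
    if item.r_ts > ts or item.w_ts > ts:
        raise Rollback(f"Write rejected for TS={ts}")
    item.value = value
    item.w_ts = ts


a = DataItem(100)
read(a, ts=2)                    # T2 reads A, so R_TS(A) becomes 2
try:
    write(a, ts=1, value=50)     # older T1 tries to write after the younger read
    outcome = "written"
except Rollback:
    outcome = "rolled back"
write(a, ts=3, value=70)         # T3 is younger than every reader and writer so far
```

T1 is rolled back because R_TS(A) = 2 is larger than TS(T1) = 1, while the write by T3 goes through and sets W_TS(A) to 3.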

Advantages of Timestamp based protocol

Schedules managed using timestamp based protocol are serializable, just like schedules managed using
the two phase locking protocols.
Transactions never wait for one another under this protocol: a conflicting operation is rejected and
its transaction is rolled back instead. This makes the protocol free from deadlock.


Validation Based Protocol in DBMS

LAST UPDATED: JULY 5, 2022 BY CHAITANYA SINGH | FILED UNDER: DBMS

Validation based protocol lets transactions run without locking and works on the
assumption that conflicts between concurrently running transactions are rare, so interference is
checked for only at the end of a transaction. This is why it is also called Optimistic Concurrency
Control Technique.

In this protocol, a transaction doesn’t make any changes to the database directly. Instead it performs
all the changes on local copies of the data items that are maintained in the transaction itself. At
the end of the transaction, a validation is performed on the transaction. If it doesn’t violate any
serializability rule, the transaction commits the changes to the database; otherwise it is rolled back and
restarted.

Three phases of Validation based Protocol

1. Read phase: In this phase, a transaction reads the values of data items from the database and stores
their values in temporary local variables. The transaction then starts executing but it doesn’t
update the data items in the database; instead it performs all the operations on the temporary local
variables.
2. Validation phase: In this phase, a validation check is done on the temporary variables to see if the
transaction violates the rules of serializability.
3. Write phase: This is the final phase of validation based protocol. In this phase, if the validation of
the transaction is successful then the values of the temporary local variables are written to the
database and the transaction is committed. If the validation failed in the second phase then the
updates are discarded and the transaction is rolled back to be restarted later.

Let’s look at the timestamps of each phase of a transaction:

Start(Tn): It represents the timestamp when the transaction Tn starts the execution.

Validation(Tn): It represents the timestamp when the transaction Tn finishes the read phase and
starts the validation phase.

Finish(Tn): It represents the timestamp when the transaction Tn finishes all the write operations.

This protocol uses Validation(Tn) as the timestamp of the transaction Tn because this is the
phase of the transaction where all the checks happen. So it is safe to say that TS(Tn) = Validation(Tn).

If there are two transactions T1 & T2 managed by validation based protocol and if Finish(T1) <
Start(T2) then the validation will be successful, as serializability is maintained because T1 finished
its execution before the transaction T2 started the read phase.
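The condition Finish(T1) < Start(T2) can be sketched as follows. The `Txn` record and the function `validates_against` are hypothetical names, and the timestamps are plain integers chosen for the example. Only the condition stated above is checked; how overlapping transactions are validated is not covered by this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Txn:
    name: str
    start: int                        # Start(Tn)
    validation: Optional[int] = None  # Validation(Tn), used as TS(Tn)
    finish: Optional[int] = None      # Finish(Tn), None while still writing


def validates_against(older: Txn, newer: Txn) -> bool:
    """True when `older` finished all its writes before `newer` started reading."""
    return older.finish is not None and older.finish < newer.start


t1 = Txn("T1", start=1, validation=3, finish=4)
t2 = Txn("T2", start=5, validation=7)
t3 = Txn("T3", start=2, validation=8)

ok_t2 = validates_against(t1, t2)  # Finish(T1) = 4 < Start(T2) = 5
ok_t3 = validates_against(t1, t3)  # Finish(T1) = 4 is not < Start(T3) = 2
```

T2 passes this check because it started after T1 had finished. T3 overlapped with T1, so this simple condition alone cannot approve it.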


File Organization in DBMS

LAST UPDATED: JULY 5, 2022 BY CHAITANYA SINGH | FILED UNDER: DBMS

In this article, you will learn what file organization is and what the benefits of doing it are. We already
know that data is stored in a database; when we refer to this data in terms of RDBMS we call it a collection
of inter-related tables. However in layman’s terms you can say that the data is stored in physical
memory in the form of files.

File organization is a way of organizing the data so that it is easier to insert, delete,
modify and retrieve data from the files.

Purpose of File Organization

1. File organization makes it easier & faster to perform operations (such as read, write, update,
delete etc.) on data stored in files.
2. Removes data redundancy. File organization makes sure that the redundant and duplicate data
gets removed. This alone saves the database from insert, update and delete operation errors, which
usually happen when duplicate data is present in the database.
3. It saves storage cost. By organizing the data, the redundant data gets removed, which lowers the
storage space required to store the data.
4. Improves accuracy. When redundant data gets removed and the data is stored in an efficient
manner, the chances of data becoming wrong or corrupted go down.

Types of File Organization

There are various ways to organize the data. Every file organization method is different from the
others, so each method has its own advantages and disadvantages. It is up to
the developer which method they choose in order to organize the data. Usually this decision is made
based on what kind of data is present in the database.

Types of file organization:

1. Sequential File Organization

2. Heap File Organization
3. Hash File Organization
4. B+ Tree File Organization
5. Clustered File Organization
6. Indexed sequential access method (ISAM)

Sequential File Organization in DBMS

LAST UPDATED: JUNE 28, 2022 BY CHAITANYA SINGH | FILED UNDER: DBMS

In this article, you will learn sequential file organization in DBMS. This is one of the easiest methods of
file organization. In this method, records are stored in the file in a sequential manner, one after another.

There are two ways to do sequential file organization:

1. Pile File Method

2. Sorted File Method

1. Pile File Method

In Pile File method one record is inserted after another record and the new record is always inserted
at the end of the file.

If any record needs to be deleted, it gets searched in the memory block and once it is deleted a new
record can be written on the freed memory block.

The following diagram shows a file that is organized using the Pile File method. As you can see, the
records are not sorted and are inserted on a first come, first served basis. If you want to organize the data in
such a way that it gets sorted after insertion then use the sorted file method, which is discussed in
the next section.

Inserting a new record in file using Pile File method

Here we are demonstrating the insertion of a new record R3 in an already present file using the Pile File
method. Since this method of sequential organization just adds the new record at the end of the file, the
new record R3 gets added at the end of the file, as shown in the following diagram.

2. Sorted file Method

In sorted file method, a new record is inserted at the end of the file and then all the records are sorted
to adjust the position of the newly added record.

You can see in the following diagram that records appear in sorted order when the file is organized
using sorted file method.

When a record is updated, once the update is complete, the whole file gets sorted again to change
the position of the updated record in the file.

The sorting can be either ascending or descending, in this diagram the records are sorted in
ascending order.

Inserting a new record in file using Sorted File Method

In the following diagram, a new record R3 is added to an existing file. Although the record is added at
the end, its position gets changed after insertion. The whole file gets sorted after addition of the new
record, and the new record R3 is placed just after record R1 as the file is sorted in ascending order
using the sorted file method of sequential file organization.
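The two insertion methods can be compared in a short sketch. The function names and the sample files are made up for illustration; records are represented by their keys only, and the file is a plain Python list.

```python
def pile_insert(records, new_record):
    """Pile File method: the new record always goes to the end of the file."""
    records.append(new_record)
    return records


def sorted_insert(records, new_record):
    """Sorted File method: append at the end, then sort the whole file again."""
    records.append(new_record)
    records.sort()  # ascending order
    return records


pile_file = pile_insert(["R1", "R5", "R2"], "R3")
sorted_file = sorted_insert(["R1", "R4", "R6"], "R3")
```

In the pile file R3 stays at the end, while in the sorted file it moves to the position just after R1.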

Advantages of Sequential File Organization

1. It is a simple method to adopt. The implementation is simple compared to other file organization
methods.
2. It is fast and efficient when we are dealing with huge amounts of data.
3. This method of file organization is mostly used for generating various reports and performing
statistical operations on data.
4. Data can be stored on cheap storage devices.

Disadvantages of Sequential File Organization

1. Sorting the file takes extra time and requires additional storage for the sorting operation.
2. Searching for a record is a time consuming process in sequential file organization as the records are
searched in sequential order.


Heap File Organization in DBMS

LAST UPDATED: JUNE 28, 2022 BY CHAITANYA SINGH | FILED UNDER: DBMS

Heap File Organization method is a simple yet powerful file organization method. In this method, the
records are added to data blocks in memory, in no particular order.

The following diagram demonstrates the Heap file organization. As you can see, records have been
assigned to data blocks in memory in no particular order.

Since the records are not sorted and not stored in consecutive data blocks in memory, searching for a
record is a time consuming process in this method. Update and delete operations also give poor
performance as the record needs to be searched for first, which is already a
time consuming operation. However if the file size is small, these operations give one of the best
performances compared to other methods, so this method is widely used for small files.

This method requires memory optimization and cleanup as it doesn’t free up the allocated
data block after a record is deleted.

Insertion of a record using Heap File Organization method

The following diagram demonstrates the addition of a new record to the file using the heap file
organization method. As you can see, a free data block which has not been assigned to any record
previously has been assigned to the newly added record R2. The insertion of a new record is pretty
simple in this method as there is no need to perform any sorting; any free data block is assigned to
the new record.
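A minimal sketch of heap file insertion and search is given below. The `HeapFile` class, the number of blocks and the "take the first free block" policy are assumptions for illustration; any free block would do.

```python
class HeapFile:
    """Hypothetical heap file: a fixed set of data blocks, filled in no particular order."""

    def __init__(self, num_blocks):
        self.blocks = [None] * num_blocks  # None marks a free data block

    def insert(self, record):
        # No sorting and no check against existing records:
        # the record goes into any free block (here, the first one found).
        for address, slot in enumerate(self.blocks):
            if slot is None:
                self.blocks[address] = record
                return address
        raise RuntimeError("no free data block")

    def search(self, record_id):
        # Records are unordered, so every block may have to be inspected.
        for address, slot in enumerate(self.blocks):
            if slot is not None and slot["id"] == record_id:
                return address
        return None


heap = HeapFile(6)
heap.blocks[0] = {"id": "R1"}          # existing records scattered over the blocks
heap.blocks[3] = {"id": "R4"}
address = heap.insert({"id": "R2"})    # goes to the first free block
found = heap.search("R4")
missing = heap.search("R9")
```

Insertion touches only one free block, while the search may scan the whole file, which is why the method suits small files.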

Advantages of Heap File Organization Method

1. This is a popular method when a huge number of records needs to be added to the database.
Since the records are assigned to free data blocks in memory, there is no need to perform any
special check for existing records when a new record needs to be added. This makes it easier to
insert multiple records all at once without worrying about disturbing the file organization.
2. When the records are few and the file size is small, it is faster to search and retrieve the data from
the database using heap file organization compared to sequential file organization.

Disadvantages of Heap File Organization method

1. This method is inefficient if the file size is big, as the search, retrieve and update operations
consume more time compared to sequential file organization.
2. This method doesn’t use the memory space efficiently, thus it requires memory cleanup and
optimization to free the unused data blocks in memory.


Hash File Organization in DBMS

LAST UPDATED: JUNE 29, 2022 BY CHAITANYA SINGH | FILED UNDER: DBMS

In this method, a hash function is used to compute the address of a data block in memory to store the
record. The hash function is applied on certain columns of the records, known as hash columns, to
compute the block address. These columns/fields can either be key or non-key attributes.

The following diagram demonstrates the hash file organization. As shown here, the records are stored
in the database in no particular order and the data blocks are not consecutive. These memory addresses
are computed by applying the hash function on certain attributes of these records.

Fetching a record is faster in this method as the record can be accessed using the hash key column. There is no
need to search through the entire file to fetch a record.

Inserting a record using Hash file Organization method

In the following diagram, you can see that a new record R5 needs to be added to the file. The same
hash function that generated the address for existing records in the file will be used again to compute
the address (find the data block in memory) for this new record by applying the hash function on the
certain columns of this record.
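The address computation can be sketched as below. The number of blocks, the column names and the use of `zlib.crc32` as the hash function are assumptions chosen for the illustration; the point is only that insert and fetch apply the same function to the same hash columns.

```python
import zlib

NUM_BLOCKS = 8  # assumed number of data blocks


def block_address(record, hash_columns):
    """Apply the hash function to the hash columns to get a block address."""
    key = "|".join(str(record[column]) for column in hash_columns)
    return zlib.crc32(key.encode("utf-8")) % NUM_BLOCKS


class HashFile:
    def __init__(self, hash_columns):
        self.hash_columns = hash_columns
        self.blocks = [[] for _ in range(NUM_BLOCKS)]

    def insert(self, record):
        address = block_address(record, self.hash_columns)
        self.blocks[address].append(record)
        return address

    def fetch(self, **key_values):
        # Only one block is inspected; the rest of the file is never scanned.
        address = block_address(key_values, self.hash_columns)
        return [r for r in self.blocks[address]
                if all(r[c] == v for c, v in key_values.items())]


employees = HashFile(hash_columns=["emp_id"])
employees.insert({"emp_id": 101, "name": "Steve"})
addr = employees.insert({"emp_id": 105, "name": "Maria"})
found = employees.fetch(emp_id=105)
```

The final comparison inside `fetch` matters because two different keys can hash to the same block.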

Advantages of Hash File Organization

1. This method doesn’t require explicit sorting, as each record is placed directly at the address
computed from its hash key.
2. Reading and fetching a record is faster compared to other methods as the hash key is used to
quickly read and retrieve the data from the database.
3. Records are not dependent on each other and are not stored in consecutive memory locations, which
prevents the database from read, write, update, delete anomalies.

Disadvantages of Hash File Organization

1. Can cause accidental deletion of data, if columns are not selected properly for hash function. For
example, while deleting an Employee "Steve" using Employee_Name as hash column can cause
accidental deletion of other employee records if the other employee name is also "Steve". This
can be avoided by selecting the attributes properly, for example in this case combining age,
department or SSN with the employee_name for hash key can be more accurate in finding the
distinct record.
2. Memory is not efficiently used in hash file organization as records are not stored in consecutive
memory locations.
3. If there is more than one hash column, searching for a record using a single attribute will not give
accurate results.


Indexed sequential access method (ISAM) in DBMS

LAST UPDATED: JUNE 30, 2022 BY CHAITANYA SINGH | FILED UNDER: DBMS

Indexed sequential access method, also known as ISAM, is an upgrade to the conventional
sequential file organization method. You can say that it is an advanced version of the sequential file
organization method. In this method, the primary key of the record is stored with an address, and this address
is mapped to the address of a data block in memory. This address field works as an index of the file.

In this method, reading and fetching a record is done using the index of the file. The index field contains
the address of a data record in memory, which can be quickly used to read and fetch the record from
memory.

Advantages of ISAM

1. Searching for a record is faster in ISAM file organization compared to other file organization
methods, as the primary key can be used to identify the record and, since the primary key is stored with the
address of the record, the data can be read and fetched from memory directly.
2. This method is more flexible compared to other methods as it allows an index
field (address field) to be generated for any column of the record. This makes searching easier and more efficient as
searches can be done using multiple column fields.
3. This allows range retrieval of the records. Since the address field is stored with the primary key of
the record, we can retrieve the records based on a certain range of primary key values.
4. This method allows partial searches as well. For example, an employee name starting with “St” can
be used to search for all the employees whose name starts with the letters “St”. This will return all
the records where the employee name begins with the letters “St”.
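The index lookup and the range retrieval can be sketched with a sorted list of primary keys. The `IsamFile` class and its method names are hypothetical, block addresses are simple integers, and the employee ids are sample values.

```python
import bisect


class IsamFile:
    """Hypothetical ISAM file: a sorted index of primary keys pointing at data blocks."""

    def __init__(self):
        self.keys = []       # primary keys, kept in sorted order
        self.addresses = []  # addresses[i] is the block address for keys[i]
        self.blocks = {}     # block address -> record

    def insert(self, key, record):
        address = len(self.blocks)  # next free block (an assumption of this sketch)
        self.blocks[address] = record
        # Keep the index in primary key order.
        pos = bisect.bisect_left(self.keys, key)
        self.keys.insert(pos, key)
        self.addresses.insert(pos, address)

    def fetch(self, key):
        # Look the key up in the index, then jump straight to the block.
        pos = bisect.bisect_left(self.keys, key)
        if pos < len(self.keys) and self.keys[pos] == key:
            return self.blocks[self.addresses[pos]]
        return None

    def fetch_range(self, low, high):
        # Range retrieval: all keys between low and high, both inclusive.
        lo = bisect.bisect_left(self.keys, low)
        hi = bisect.bisect_right(self.keys, high)
        return [self.blocks[a] for a in self.addresses[lo:hi]]


employees = IsamFile()
for emp_id in (1004, 1001, 1003, 1002):
    employees.insert(emp_id, {"id": emp_id})

one = employees.fetch(1003)
in_range = employees.fetch_range(1002, 1003)
```

Because the index is ordered by primary key, a range query reads one contiguous slice of the index even though the records themselves sit in arbitrary blocks.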

Disadvantages of ISAM

1. Requires additional space in memory to store the index field.
2. After adding a record to the file, the file needs to be re-organized to maintain the sequence based
on the primary key column.
3. Requires memory cleanup because when a record is deleted, the space used by the record
needs to be released in order to be used by another record.
4. Performance suffers if records are deleted frequently, as every deletion needs a
memory cleanup and optimization.


B+ Tree File Organization in DBMS

LAST UPDATED: JUNE 30, 2022 BY CHAITANYA SINGH | FILED UNDER: DBMS

Similar to ISAM file organization, B+ file organization also works with key & index value of the
records. It stores the records in a tree like structure, that is why it is also known as B+ Tree file
organization. In B+ file organization, the leaf nodes store the records and intermediate nodes only
contain the pointer to the leaf nodes, these intermediate nodes do not store any record.

Root node and intermediate nodes contain key field and index field. The key field is a primary key of
record which can be used to distinctly identify a record, the index field contains the pointer (address)
to the leaf node where the actual record is stored.

B+ Tree Representation:

Let’s say we are storing the records of employees of an organization. These employee records contain
fields such as Employee_id, Employee_name, Employee_address etc. If we consider Employee_id as
primary key and the values of Employee_id range from 1001 to 1009 then the B+ tree representation
can be as follows.

The important point to note here is that the records are only stored at the leaf nodes; the other
nodes contain only the key and index value (pointer to leaf node).
Leaf node 1001 means that it stores the complete record of the employee whose employee id is
“1001”. Similarly node 1002 stores the record of the employee with employee id “1002” and so on.
The main advantage of B+ file organization is that searching for a record is faster. This is because
all the leaf nodes (where the actual records are stored) are at the same distance from the root node
and can be accessed faster.
Since intermediate nodes do not contain the records and only contain the pointers to the leaf
nodes, the height of the B+ tree is shorter, which makes traversal easier and faster.
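A search over such a tree can be sketched as follows. The tree is built by hand for the ids 1001 to 1009 with three leaves under one root; the class names, the grouping of three records per leaf and the separator keys 1004 and 1007 are assumptions of this sketch, and insertion with node splitting is left out.

```python
import bisect


class Leaf:
    """Leaf node: the only place where complete records are stored."""

    def __init__(self, records):
        self.records = records  # {Employee_id: record}


class Internal:
    """Root or intermediate node: keys and pointers only, no records."""

    def __init__(self, keys, children):
        self.keys = keys          # separator keys in ascending order
        self.children = children  # len(children) == len(keys) + 1


def search(node, key):
    """Follow pointers from the root to a leaf; return (record, nodes descended)."""
    hops = 0
    while isinstance(node, Internal):
        # Keys below keys[i] live under children[i]; equal or larger keys go right.
        node = node.children[bisect.bisect_right(node.keys, key)]
        hops += 1
    return node.records.get(key), hops


leaves = [
    Leaf({k: {"Employee_id": k} for k in range(1001, 1004)}),
    Leaf({k: {"Employee_id": k} for k in range(1004, 1007)}),
    Leaf({k: {"Employee_id": k} for k in range(1007, 1010)}),
]
root = Internal(keys=[1004, 1007], children=leaves)

record, hops = search(root, 1005)
depths = {search(root, k)[1] for k in range(1001, 1010)}
```

Every lookup descends the same number of levels, which reflects the point that all leaf nodes are at the same distance from the root.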

Advantages of B+ Tree File Organization

1. Searching is faster: As we discussed earlier, since all the leaf nodes are at the same distance
from the root node, searching for a record is faster in B+ tree file organization.
2. Flexible: Adding new records and removing old records can be easily done in a B+ tree as the B+
tree is flexible in terms of size; it can grow and shrink based on the records that need to be
stored. It has no restriction on the number of records that can be stored.
3. Allows range retrieval: For example, if there is a requirement to fetch all
the records from a certain range, then it can be easily done using this file organization method.
4. Allows partial searches: Similar to ISAM, this also allows partial searches. For example, we can
search for all the employees whose id starts with “10”.
5. Better performance: This file organization method gives better performance than other file
organization methods for insert, update, delete operations.
6. Re-organization of records is not necessary to maintain performance.

Disadvantages of B+ Tree file Organization

1. Extra insertion and deletion cause space overhead.
2. This method is not suitable for static tables as it is not efficient for static tables compared to
other file organization methods.


Cluster File Organization in DBMS

LAST UPDATED: JULY 3, 2022 BY CHAITANYA SINGH | FILED UNDER: DBMS

Cluster file organization is different from the other file organization methods. Other file organization
methods mainly focus on organizing the records in a single file (table). Cluster file organization is
used, when we frequently need combined data from multiple tables.

While other file organization methods organize tables separately and combine the result based on the
query, cluster file organization stores the combined data of two or more frequently joined tables in
the same file, known as a cluster. This helps in accessing the data faster.

Types of Cluster File Organization

There are two types of cluster file organizations:

1. Index based cluster file organization

2. Hash based cluster file organization

Index based cluster file organization: The example that we have shown in the above diagram is an
index based cluster file organization. In this type, the cluster is formed based on the cluster key and
this cluster key works as an index of the cluster.

Since the EMP_DEP field is common to both the tables, it becomes the cluster key when these two tables
are joined to form the cluster. Whenever we need to find the combined record of employees and
department based on EMP_DEP, this cluster can be used to quickly retrieve the data.

Hash based cluster file organization: This is same as index based cluster file organization except that
in this type, the hash function is applied on the cluster key to generate the hash value and that value is
used in the cluster instead of the index.

Note: The main difference between these two types is that in index based cluster, records are stored
with cluster key while in hash based cluster, the records are stored with the hash value of the cluster
key.
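The difference between the two types can be sketched with dictionaries. Only EMP_DEP comes from the text; the other column names, the sample rows, the bucket count of 16 and the use of `zlib.crc32` as the hash function are assumptions made for the illustration.

```python
import zlib
from collections import defaultdict

employees = [
    {"EMP_ID": 1, "EMP_NAME": "Steve", "EMP_DEP": 10},
    {"EMP_ID": 2, "EMP_NAME": "Maria", "EMP_DEP": 20},
    {"EMP_ID": 3, "EMP_NAME": "Ajeet", "EMP_DEP": 10},
]
departments = [
    {"EMP_DEP": 10, "DEP_NAME": "Sales"},
    {"EMP_DEP": 20, "DEP_NAME": "HR"},
]


def hash_key(value):
    """Assumed hash function applied on the cluster key."""
    return zlib.crc32(str(value).encode("utf-8")) % 16


def build_cluster(key_fn):
    """Store the joined employee + department rows together, grouped by key_fn(EMP_DEP)."""
    dep_by_key = {d["EMP_DEP"]: d for d in departments}
    cluster = defaultdict(list)
    for emp in employees:
        combined = {**emp, **dep_by_key[emp["EMP_DEP"]]}
        cluster[key_fn(emp["EMP_DEP"])].append(combined)
    return cluster


index_cluster = build_cluster(lambda dep: dep)  # stored with the cluster key itself
hash_cluster = build_cluster(hash_key)          # stored with the hash of the cluster key

sales_by_index = index_cluster[10]
# Different keys may share a hash value, so the key is compared again after the lookup.
sales_by_hash = [r for r in hash_cluster[hash_key(10)] if r["EMP_DEP"] == 10]
```

Both lookups return the same combined rows without running a join at query time; only the value used to locate them differs.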

Advantages of cluster file organization

1. This method is popularly used when multiple tables need to be joined frequently based on the
same condition.
2. When a table in a database is joined with multiple tables of the same database, the cluster file
organization method will be more efficient compared to other file organization methods.

Disadvantages of cluster file organization

1. Not suitable for large databases: This method is not suitable if the size of the database is huge
as the performance of various operations on the data will be poor.
2. Not flexible with joining condition: This method is not suitable if the join condition of the tables
keeps changing, as it may take additional time to traverse the joined tables again for the new
condition.
3. Isolated tables: If tables are not that related and there is rarely any join query on the tables then
using this file organization is not recommended. This is because maintaining the cluster for such
tables is wasted effort when it is not used frequently.

Data Replication in DBMS

LAST UPDATED: AUGUST 20, 2022 BY CHAITANYA SINGH | FILED UNDER: DBMS

Data replication is the process of making multiple copies of a database available on servers. This is
done to achieve a distributed database, to minimize the load on any one database and to provide better
performance to the users.

In data replication, the various users can access data from different sites available on the distributed
system, while the data remains the same on all sites. Replication is done in such a way that the
data is always the same on all sites and is synchronized whenever there is a change.
This distributed database approach provides better performance and availability. It also helps to
recover data in case of a server failure.
There can be full replication where the entire database is available on all servers or there can be
partial replication where the frequently used chunks of data are available on all servers.
The transactional replication works on a concept of publisher and subscriber.
Publisher: The primary database that publishes data to all the secondary databases, called
subscribers.
Subscriber: These are the secondary databases, which are copies of the primary
database. The subscribers receive updates from the publisher as and when there is a change
in the publisher database.

Types of Data Replication

There are three types of data replication approaches in DBMS:
1. Transactional replication
2. Snapshot replication
3. Merge replication

1. Transactional replication
This approach is used to replicate the changes between multiple copies of databases. Any change
such as data update, primary key change, stored procedure change is replicated among all copies of
the database.

The changes occur in the subscriber in the same order in which they occurred in the publisher
database.

Subscriber databases can be used as read-only databases. The consistency between publisher and
subscriber is guaranteed as the publisher pushes all the changes to subscribers consistently and in
the same order.
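The publisher and subscriber roles can be sketched with two small classes. The class and method names, the in-memory dictionaries and the sequence numbers are assumptions for illustration; a real system would ship the changes over a network and handle failures.

```python
class Subscriber:
    """Hypothetical read-only copy that applies changes in the order received."""

    def __init__(self):
        self.data = {}
        self.applied = []  # sequence numbers of the changes applied so far

    def apply(self, seq, key, value):
        self.data[key] = value
        self.applied.append(seq)


class Publisher:
    """Hypothetical primary database that pushes every change to all subscribers."""

    def __init__(self):
        self.data = {}
        self.subscribers = []
        self.seq = 0

    def subscribe(self, subscriber):
        # Start the subscriber from a copy of the current data.
        subscriber.data = dict(self.data)
        self.subscribers.append(subscriber)

    def update(self, key, value):
        # Apply the change locally, then push it to every subscriber
        # with an increasing sequence number so the order is preserved.
        self.seq += 1
        self.data[key] = value
        for subscriber in self.subscribers:
            subscriber.apply(self.seq, key, value)


publisher = Publisher()
replica_a, replica_b = Subscriber(), Subscriber()
publisher.subscribe(replica_a)
publisher.subscribe(replica_b)

publisher.update("balance", 6000)
publisher.update("balance", 1000)
publisher.update("owner", "Steve")
```

Since each change reaches the subscribers in the order it was made on the publisher, all copies end up with the same data.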

2. Snapshot replication
In this approach, a snapshot of the publisher database is taken at a specific moment of time and that
snapshot is shared with all the subscribers.

Snapshot replication is slower than transactional replication, as the changes are not pushed in real time;
rather they are pushed after a specific interval.

This approach is mostly used when:

Data does not change frequently.
Initial replication of data between Publisher and Subscriber is being done.
A big change has happened in the publisher database (source database).

Role of snapshot agent in Snapshot replication: The snapshot agent is responsible for taking the
snapshot from the publisher and making it available to the subscribers:

It establishes a connection between publisher and subscriber.
It acquires a lock on the publisher tables when there is a change happening on the tables.
It copies the data from the publisher and writes it to the snapshot folder.
Once the changes are done, it releases the lock on the publisher tables.

3. Merge replication
Similar to transactional replication, it also starts with a snapshot of the publisher database. The
further changes made to the publisher database are made available to the subscribers using triggers.
When these triggers fire, the subscriber gets connected to the publisher and replicates all the
changes that have happened to the publisher since the last time it synchronized with the publisher.

Merge replication allows publishers and subscribers to make changes in the database and these
changes are replicated to the other publishers and subscribers.

Replication Schemes

1. Full replication: In full replication, the entire database is available at every site of the distributed
database. This approach provides full availability and performance. In this approach, even if there is a
system failure, the database availability doesn’t get affected, thus this replication scheme is robust
and durable.

Advantages of full replication:

1. High availability
2. Best performance
3. Full recovery in case of failure.
4. Better load balance on every site of distributed database.

Disadvantages of full replication:

1. Requires high storage capacity.
2. Data redundancy as the data that is not frequently accessed is also replicated at every site.
3. Updates are slow as every change has to be made live on every site of the distributed database.
4. Maintaining data consistency at every site requires complex measures.

2. Partial replication: In partial replication, only the data that is frequently accessed is replicated on
every site of the distributed database.
Advantages of partial replication:
1. Requires less storage capacity than full replication.
2. Provides good performance as the frequently used data is available at all sites.
3. Updates are faster as only important and frequently used data is replicated at all sites.
4. Maintaining data consistency is somewhat easier than full replication as the replicated data size is
small.

Disadvantages of partial replication:

1. Doesn’t provide high availability for non-frequently used data.
2. No full recovery in case of failure of source database.
3. Poor load balance as the data that is not present can be accessed from source server only.
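
The difference between the two schemes can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: source stands for the source database, hot_keys for the frequently accessed items, and make_site and read are hypothetical helpers.

```python
source = {"a": 1, "b": 2, "c": 3, "d": 4}  # hypothetical source database
hot_keys = {"a", "b"}                       # assumed frequently accessed items


def make_site(full):
    """Build a site holding either the whole database or only the hot items."""
    if full:
        return dict(source)
    return {key: source[key] for key in hot_keys}


def read(site, key):
    """Return (value, where it was served from)."""
    if key in site:
        return site[key], "local"
    return source[key], "source"  # not replicated here: go to the source


full_site = make_site(full=True)
partial_site = make_site(full=False)
```

With full replication every read is served locally; with partial replication a read of a rarely used item such as "d" has to go back to the source, which is why load balance and availability suffer for that data.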


Indexing in DBMS – Types of Indexes in Database

LAST UPDATED: AUGUST 24, 2022 BY CHAITANYA SINGH | FILED UNDER: DBMS

A database index is a data structure that helps in improving the speed of data access. However, it
comes at the cost of additional write operations and storage space to store the database index. The
database index helps quickly locate the data in a database without having to search every row of the
database. The process of creating an index for a database is known as indexing. In this guide, you will
learn various types of indexes in DBMS (Database Management System) with examples.

Real life example of Indexing

1. You must have read a book. The first few pages of a book contain its index, which tells you
which topic is covered at which page number. This helps you quickly locate the topic in the book using
the index. Without the index, you would have to scan the entire book to look for the topic, which would
take a long time.

2. In a library, the books are arranged on the shelves in alphabetical order. If you are looking for a
book starting with the letter ‘A’ then you go to the shelf ‘A’. Here the shelf named with the letter ‘A’ is
the index. If the books were not arranged in alphabetical order on the shelves, it would take a very
long time to search for a book.

Index structure in Database

The most common index data structure contains two fields.

1. The first field is the search key. This is the column that a user can use to access the record quickly.
For example, if a user is searching for a student in the database, the user can use student id as a
search key to quickly locate the student record.
2. The second field contains the address of the student record in the database. Remember that indexing
doesn’t replicate the whole database; rather, it creates an index that refers to the actual data in the
database. This field is a reference to the data. If a user is searching for a student with student id “S01”
then S01 is the search key and the second field of the index contains the address where the
student data such as student name, age and address is stored.
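
The two-field structure can be sketched in Python as follows. This is a minimal illustration, not a real storage engine: the records list, the index dictionary and the find_student helper are hypothetical, and a list position stands in for the storage address of a record.

```python
# Hypothetical "data file": the list position stands in for the record's address.
records = [
    {"student_id": "S03", "name": "Carol", "age": 21},
    {"student_id": "S01", "name": "Alice", "age": 20},
    {"student_id": "S02", "name": "Bob", "age": 22},
]

# The index holds only (search key -> address) pairs, not the records themselves.
index = {rec["student_id"]: addr for addr, rec in enumerate(records)}


def find_student(student_id):
    """Look up the address in the index, then fetch the record stored there."""
    addr = index.get(student_id)
    return None if addr is None else records[addr]


student = find_student("S01")
```

Here the lookup for "S01" goes through the index to address 1 and reads only that record, instead of scanning every row.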

Types of Indexes
1. Dense Index
2. Sparse Index
3. Clustered index
4. Non-clustered index or secondary index
5. Multilevel index
6. Reverse key index
1. Dense Index
In a dense index, there is an index entry for every record in the database. For example, if a table
student contains 100 records then in a dense index the number of index entries would be 100, one for
each record in the table.

If more than one record has the same search key then the dense index points to the first record in the
database that has the search key.

This index is called dense because every record in the database has a
corresponding entry in the index file, so the index file is very dense.
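
A dense index can be sketched in Python like this. The students list and the build_dense_index helper are hypothetical, and a list position again stands in for a record address; the sketch follows the rule above that a repeated search key points to the first matching record.

```python
# Hypothetical student file; the list position stands in for a record address.
students = [
    ("S01", "Alice"),
    ("S02", "Bob"),
    ("S02", "Bobby"),  # second record with the same search key
    ("S03", "Carol"),
]


def build_dense_index(records):
    """One entry per search key value, pointing to the first record that has it."""
    index = {}
    for addr, (key, _name) in enumerate(records):
        index.setdefault(key, addr)  # keep only the first address per key
    return index


dense_index = build_dense_index(students)
print(dense_index)  # {'S01': 0, 'S02': 1, 'S03': 3}
```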

Advantages of dense indexes:

1. Searching for a record is faster compared to other indexes.
2. It doesn’t require the database to be sorted in any order to generate a dense index.

Disadvantages of dense indexes:

1. Requires more space as the index file is huge because it contains entries for all records.
2. More write operations are needed to generate the index file.
3. It requires more maintenance, as any change in any record requires a corresponding change in the
index file.

2. Sparse Index
In this index-based system, index entries are maintained in the index file for only a few data items.
Unlike the dense index system, where every record has an index entry in the index file, in this system
index entries are limited to one per block of data items, as shown in the following diagram.
In sparse indexing, the database needs to be sorted on the search key.

For example, let’s say we are creating a sparse index file for a student database that contains records
for 100 students.

Student records are divided into blocks where every block contains two records. If the index file
contains entries for alternate records then we need to maintain only 50 index entries, whereas in the
dense index system we had to have 100 entries in the index file.
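
The 100-student example can be sketched in Python as follows. The names BLOCK_SIZE, blocks, sparse_keys and lookup are hypothetical, and the student ids 1 to 100 are made-up sample data. The sketch keeps one index entry per block and assumes the file is sorted on the student id.

```python
from bisect import bisect_right

BLOCK_SIZE = 2  # two records per block, as in the example above

# Hypothetical sorted student file with ids 1..100.
records = [(sid, f"student-{sid}") for sid in range(1, 101)]
blocks = [records[i:i + BLOCK_SIZE] for i in range(0, len(records), BLOCK_SIZE)]

# Sparse index: one entry per block, holding the first key stored in that block.
sparse_keys = [block[0][0] for block in blocks]


def lookup(student_id):
    """Find the last block whose first key is <= student_id, then scan that block."""
    pos = bisect_right(sparse_keys, student_id) - 1
    if pos < 0:
        return None  # smaller than every key in the file
    for sid, name in blocks[pos]:
        if sid == student_id:
            return name
    return None


print(len(sparse_keys))  # 50
print(lookup(42))        # student-42
```

The binary search runs over the 50 index entries only; the final scan touches a single block of two records.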

Advantages of sparse indexing:

1. It requires less storage space for managing the index file as it stores index entries for a few
records instead of all records. A smaller index file is also quicker to read.
2. Since only a limited number of entries need to be maintained in the index file, it requires fewer
write operations for generating a sparse index file.
3. It requires less maintenance compared to dense indexes.

Disadvantages of sparse indexing:

1. Searching is a little slower than with dense indexes, as not all records have corresponding index
entries and it requires a binary search to locate the record being searched for.
2. A sparse index requires the file to be sorted.

Difference between Dense and Sparse indexes

DESCRIPTION | DENSE | SPARSE
1. Performance | Search is faster as an index entry is present for every data item. | Write operations to generate the index are faster as entries need to be generated for only a few records.
2. Prerequisite | No prerequisites. | It requires the database to be sorted.
3. Storage | More storage space is required. | Less storage space is required.
4. Maintenance | Requires more time as every insert, update and delete operation in the database requires maintenance in the index file. | Requires less maintenance as the number of index entries is smaller compared to the dense index system.

3. Clustered Index
As the name suggests, in a clustered index, records of a similar type are grouped together to
form a cluster, and an index is created for this cluster, which is maintained in the clustered index file.

For example:
Let’s say students are assigned to multiple courses and we are creating an index on the course_id
field. In this case, all the students that are assigned to a particular course_id form a cluster and the
index for that particular course_id points to this cluster, as shown in the following diagram.

This helps in quickly locating a record in a particular cluster, as the size of the cluster is limited and
smaller than the actual database, so searching for a record is faster.

One type of clustered indexing is primary indexing: in this type of clustered indexing, data is
sorted based on the search key. In this type of indexing, searching is even faster as the records are
sorted.
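
The course_id example can be sketched in Python as follows. The enrolment data and the names cluster_index and students_in are hypothetical. The sketch orders the file by course_id so that each cluster is stored together, and keeps one index entry per course_id pointing to the start of its cluster.

```python
# Hypothetical enrolment records: (student_id, course_id).
enrolments = [
    ("S01", "C102"), ("S02", "C101"), ("S03", "C102"),
    ("S04", "C101"), ("S05", "C103"),
]

# Order the file by course_id so records of one course are stored together.
data_file = sorted(enrolments, key=lambda rec: rec[1])

# Clustered index: course_id -> address of the first record of its cluster.
cluster_index = {}
for addr, (_student_id, course_id) in enumerate(data_file):
    cluster_index.setdefault(course_id, addr)


def students_in(course_id):
    """Jump to the start of the cluster and read until the course_id changes."""
    start = cluster_index.get(course_id)
    if start is None:
        return []
    result = []
    for student_id, cid in data_file[start:]:
        if cid != course_id:
            break
        result.append(student_id)
    return result


print(students_in("C102"))  # ['S01', 'S03']
```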
4. Non-clustered or secondary indexing
In non-clustered indexing, the index is kept separately from the data and its entries point to where
the records are stored. In the scheme described here, the indexing is done on multiple levels. This
indexing is also known as secondary indexing.

For example, let’s say we have records of 300 students in the database. Instead of creating index
entries for all 300 records at the root level, we create entries for the 1st, 101st and 201st student
records. This root index is maintained in primary memory such as RAM. Here we have divided the
complete index file into three groups.
The second level of the index is stored on the hard disk. The root index, which is stored in RAM, refers
to this second-level file, and that file then points to the actual data blocks, as shown below:
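
A rough Python sketch of the 300-student example follows. The names data, second_level, first_level and lookup are hypothetical, and ordinary Python lists stand in for what the text places in RAM and on disk.

```python
from bisect import bisect_left, bisect_right

# Hypothetical data file: 300 student records with ids 1..300.
data = {sid: f"student-{sid}" for sid in range(1, 301)}

GROUP = 100  # each root entry covers 100 students

# Second level (on disk in the text): one sorted list of ids per group.
second_level = [list(range(start, start + GROUP)) for start in range(1, 301, GROUP)]

# First level (in RAM in the text): the first id of each group.
first_level = [group[0] for group in second_level]


def lookup(student_id):
    """Pick the group from the first level, then search the second level."""
    g = bisect_right(first_level, student_id) - 1
    if g < 0:
        return None
    group = second_level[g]
    i = bisect_left(group, student_id)
    if i < len(group) and group[i] == student_id:
        return data[student_id]
    return None


print(first_level)  # [1, 101, 201]
print(lookup(150))  # student-150
```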

5. Multilevel index
In the multilevel index strategy, the indexes are stored at multiple levels, as shown in the following
diagram. This strategy is especially used when there is a large number of data items and the
index file is therefore huge.

An index file with a large number of entries defeats the purpose of faster access and better
performance, as accessing a large index file itself gives poor performance.

To solve this issue, in this strategy the index is divided into multiple levels such as outer index blocks,
inner index blocks and data blocks. The outer index blocks point to the inner index blocks and the inner
index blocks point to the data blocks. By managing indexes this way, we don’t need to access the whole
index file, as only those outer, inner and data blocks that match the search key need to be
accessed.
The disadvantage of the multilevel index strategy is that it requires additional storage space to maintain
this multilevel hierarchy of indexes.
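
The outer, inner and data block idea can be sketched in Python as follows. This is an illustration under assumptions: FANOUT (the number of entries per block), build_levels and search are hypothetical, and the sketch keeps adding index levels until the outermost one fits in a single block.

```python
from bisect import bisect_right

FANOUT = 4  # hypothetical number of entries per block


def build_levels(keys):
    """Return the levels from outermost index down to the sorted data keys.

    Each index level holds the first key of every block of the level below it.
    """
    levels = [sorted(keys)]
    while len(levels[0]) > FANOUT:
        levels.insert(0, levels[0][::FANOUT])
    return levels


def search(levels, key):
    """Walk from the outer level down, reading one block per level."""
    start = 0        # position of the block to read in the current level
    blocks_read = 0
    for level in levels:
        block = level[start:start + FANOUT]
        blocks_read += 1
        pos = bisect_right(block, key) - 1
        if pos < 0:
            return False, blocks_read
        idx = start + pos     # matching entry in this level
        start = idx * FANOUT  # the block it points to in the level below
    return level[idx] == key, blocks_read


levels = build_levels(range(0, 200, 2))  # 100 even keys
found, blocks_read = search(levels, 42)
```

For 100 keys and four entries per block, this builds four levels, and a search reads one block at each level instead of the whole index.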

6. Reverse Key Index

In reverse key index strategy, the search key value is reversed before it is written in the index file. For
example a search key value 34568 becomes 86543 in the reverse key index file.

This strategy is used when the search key field in the index data structure represents sequence
numbers where each key value is greater than the prior key value.

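Reverse key indexes use a B-tree as the data structure. A B-tree stores similar values in a single block;
for example, the values 86543 and 86544 are stored in a single block, which makes them easier to access.
Because reversing turns consecutive sequence numbers such as 34568 and 34569 into 86543 and 96543,
new entries are spread across different blocks instead of all landing in the same one.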
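
The key reversal itself can be sketched in Python. The reverse_key helper and its width parameter are hypothetical; the key is padded to a fixed width before reversing so that values ending in zero do not lose digits, and the result is kept as a string for the same reason. Real database systems may reverse the stored bytes instead.

```python
def reverse_key(key, width=5):
    """Reverse the digits of a key after padding it to a fixed width."""
    return str(key).zfill(width)[::-1]


sequence = [34568, 34569, 34570, 34571]  # consecutive sequence numbers
reversed_keys = [reverse_key(k) for k in sequence]

print(reverse_key(34568))  # 86543
print(reversed_keys)       # ['86543', '96543', '07543', '17543']
```

The four consecutive keys end up far apart in sorted order, which illustrates how a reverse key index spreads sequential inserts across the index.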
