DBMS Tutorial – Database system notes-combined
DBMS stands for Database Management System. The name breaks down as DBMS = Database +
Management System: a database is a collection of data, and a management system is a set of programs
to store and retrieve that data. Putting the two together, a DBMS is a collection of
inter-related data and a set of programs to store and access that data in an easy and effective manner.
Here are the DBMS notes to help you learn database systems in a systematic manner. Happy
Learning!!
Introduction to DBMS
Types of DBMS
DBMS Applications
Advantages of DBMS over file processing system
DBMS vs RDBMS
DBMS Architecture
Three level DBMS Architecture
View of Data
Data Abstraction
Instances and Schemas
DBMS languages
Data Models:
Relational Database:
RDBMS Concepts
Relational Algebra
Relational Calculus
View vs table
Keys in DBMS
Primary key
Super key
Candidate key
Alternate key
Composite key
Foreign key
Constraints in DBMS
Domain constraints
Mapping constraints
Cardinality in DBMS
Functional dependencies in DBMS
Trivial functional dependency
Non-trivial functional dependency
Multivalued dependency
Transitive dependency
Normalization in DBMS – This covers all the normal forms: First Normal Form (1NF), Second
Normal Form (2NF), Third Normal Form (3NF), Boyce–Codd Normal Form (BCNF)
Denormalization in DBMS
Denormalization vs Normalization
Decomposition in DBMS
Transaction Management:
Concurrency Control:
Concurrency Control
Lock based protocol
Timestamp based protocol
Validation based protocol
File Organization:
SQL Introduction:
SQL Introduction
Characteristics of SQL
Advantages of SQL
SQL commands
SQL operators
SQL CREATE TABLE statement
SQL DROP TABLE statement
SQL SELECT statement
SQL INSERT statement
What is a Database?
A database is a collection of interrelated data, stored in such a way that a user can read, insert,
update and delete the data efficiently.
Database systems are developed mainly for large amounts of data. When dealing with a huge amount
of data, two things require optimization: storage of data and retrieval of data.
Storage: According to the principles of database systems, data is stored in such a way that it
takes up much less space, because redundant (duplicate) data is removed before storage.
Fast Retrieval of data: Along with storing the data in an optimized and systematic manner, it is also
important that we retrieve the data quickly when needed. Database systems ensure that the data is
retrieved as quickly as possible.
DBMS also secures the data from unauthorised access as well as corrupt data insertions. It allows
multiple users to access data simultaneously while maintaining the data consistency and data
integrity.
Data Definition: Creating tables, defining table schemas, removing table definitions etc. come under
data definition. It is essentially the layout of the tables and their relationships with the other tables in the
database.
Data Modification: DBMS allows users to insert, update and delete data in the tables. These
tables contain rows and columns, where a row represents a record of data and a column represents
an attribute of the records.
Data Retrieval: DBMS allows users to fetch data from the database.
User administration: DBMS also allows user management, such as organizing users into different
groups with different access levels, granting users access to certain tables in the database, and revoking
access from certain users.
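The first three functions above can be tried out with a short script. This is only a minimal sketch using Python's built-in sqlite3 module; the student table, its columns and the sample rows are hypothetical and chosen purely for illustration. User administration is left out because SQLite does not support GRANT and REVOKE; server-based systems such as MySQL or PostgreSQL do.

```python
import sqlite3

# In-memory database, so the sketch leaves nothing behind on disk.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Data definition: create a (hypothetical) table and its schema.
cur.execute(
    "CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT NOT NULL, age INTEGER)"
)

# Data modification: insert, update and delete records.
cur.executemany(
    "INSERT INTO student (id, name, age) VALUES (?, ?, ?)",
    [(1, "Asha", 20), (2, "Ravi", 22), (3, "Meera", 21)],
)
cur.execute("UPDATE student SET age = age + 1 WHERE id = ?", (2,))
cur.execute("DELETE FROM student WHERE id = ?", (3,))
conn.commit()

# Data retrieval: fetch the remaining records.
rows = cur.execute("SELECT id, name, age FROM student ORDER BY id").fetchall()
print(rows)
```

After the script runs, only the first two students remain, and the second one has the updated age.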
Characteristics of DBMS
Stores the data in such a way that the relationships between data are maintained in the
database.
Allows fast retrieval.
It can handle multiple users accessing the database at the same time.
It maintains data integrity by following the ACID properties (atomicity, consistency, isolation, durability).
It provides data security by managing user access.
DBMS allows automatic backup of the database to handle accidental corruption or deletion of data.
It allows scaling of the database as per the need.
It allows data rollback and redo in case of a data operation failure.
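The rollback behaviour mentioned above can be illustrated with a small money transfer. This is a sketch, assuming Python's built-in sqlite3 module; the account table, the transfer helper and the amounts are hypothetical. The CHECK constraint rejects a negative balance, and the `with conn:` block rolls back the whole transaction when that happens, so a failed transfer leaves no partial change behind.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    "id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from account `src` to `dst` as one atomic unit."""
    try:
        # The connection context manager commits on success and
        # rolls back if an exception is raised inside the block.
        with conn:
            conn.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint failed; the credit above was rolled back too.
        return False


ok = transfer(conn, 1, 2, 30)       # enough funds, commits
failed = transfer(conn, 1, 2, 500)  # would overdraw, rolls back
balances = conn.execute("SELECT id, balance FROM account ORDER BY id").fetchall()
print(ok, failed, balances)
```

The second transfer credits the destination first and only then fails on the debit, yet the final balances show neither change, which is the atomicity part of ACID.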
Advantages of DBMS
Handles data redundancy: The major disadvantage of a file-based system of storing data
is data redundancy: the same data is stored in multiple files. DBMS handles data redundancy to
manage the storage space efficiently.
Data sharing: DBMS allows data sharing so that data can be shared between multiple users of
the same organization efficiently.
Data Maintenance: DBMS performs regular data checks and automatic backup.
Performance: Provides better performance for operations such as read, insert, update and
deletion of data.
Backup: It maintains backup of the database so that in case of a failure, database can be
recovered to the previous state using the backup.
Multiple users: It allows multiple users to access the data at the same time.
Disadvantages of DBMS
Hardware and Software Cost: Although DBMS has several advantages over the file system of data
management, all this comes at a cost. DBMS needs dedicated hardware and
software to manage the database.
Needs large storage: DBMS is usually used in large organisations that need to store large amounts of
data on their devices.
Complexity: A database management system is complex and not easy to implement.
Requires learning: In order to manage a database, users need to learn the concepts of DBMS,
which requires additional time and resources that an organization has to bear.
I have 15 years of experience in the IT industry, working with renowned multinational corporations.
Additionally, I have dedicated over a decade to teaching, allowing me to refine my skills in delivering
information in a simple and easily understandable manner.
– Chaitanya
Copyright © 2012 – 2024 BeginnersBook . Privacy Policy . Sitemap
Home Java C C++ DBMS Computer Network Python More…
Introduction to DBMS
LAST UPDATED: JULY 24, 2022 BY CHAITANYA SINGH | FILED UNDER: DBMS
What is DBMS?
DBMS is software that is used to manage data. Some popular DBMS products are:
MySQL, IBM Db2, Oracle, PostgreSQL etc.
DBMS provides an interface to the user so that operations on the database can be performed
using the interface.
DBMS secures the data; that is the main advantage of DBMS over a file system.
DBMS also secures the data from unauthorised access as well as corrupt data insertions. It
allows multiple users to access data simultaneously while maintaining the data consistency and
data integrity.
Data Modification: DBMS allows users to insert, update and delete data in the tables. These
tables contain rows and columns, where a row represents a record of data and a column represents
an attribute of the records. You can also update several records in bulk with a single statement.
Data Retrieval: DBMS allows users to fetch data from the database. Searching and retrieval of data is
fast in DBMS. With suitable indexes, the size of the database has far less impact on this operation than
in a file system, where the size of the data can hugely impact search efficiency.
User administration: DBMS also allows user management, such as organizing users into different
groups with different access levels, granting users access to certain tables in the database, and revoking
access from certain users. This allows the admin of the database to efficiently manage
access to the database and prevent unauthorised access.
Storage: According to the principles of database systems, data is stored in such a way that it
takes up much less space, because redundant (duplicate) data is removed before storage. Let’s
take a simple example to understand this:
In a banking system, suppose a customer has two accounts: a saving account and
a salary account. Let’s say the bank stores saving account data in one place (these places are called
tables; we will learn about them later) and salary account data in another place. If the customer
information, such as customer name and address, is stored in both places, this is a waste
of storage (redundancy, or duplication of data). To organize the data better, the information
should be stored in one place, and both accounts should be linked to that information.
This is what we achieve in DBMS.
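One possible way to picture the banking example is sketched here with Python's built-in sqlite3 module. The table names, column names and sample values are assumptions made for illustration, not a prescribed design. The customer details are stored once, and both accounts point to them through a customer_id column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked

# Customer details live in exactly one place.
conn.execute(
    "CREATE TABLE customer ("
    "customer_id INTEGER PRIMARY KEY, name TEXT NOT NULL, address TEXT NOT NULL)"
)
# Each account stores only a link back to its customer.
conn.execute(
    "CREATE TABLE account ("
    "account_id INTEGER PRIMARY KEY, "
    "customer_id INTEGER NOT NULL REFERENCES customer(customer_id), "
    "account_type TEXT NOT NULL)"
)

conn.execute("INSERT INTO customer VALUES (1, 'Asha Rao', '12 Lake Road')")
conn.executemany(
    "INSERT INTO account VALUES (?, ?, ?)",
    [(101, 1, "saving"), (102, 1, "salary")],
)

# A change of address is a single update, visible through both accounts.
conn.execute("UPDATE customer SET address = '7 Hill Street' WHERE customer_id = 1")

rows = conn.execute(
    "SELECT a.account_type, c.name, c.address "
    "FROM account a JOIN customer c ON c.customer_id = a.customer_id "
    "ORDER BY a.account_id"
).fetchall()
print(rows)
```

Because the address is stored once, there is no second copy that could be forgotten and left out of date.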
Fast Retrieval of data: Along with storing the data in an optimized and systematic manner, it is also
important that we retrieve the data quickly when needed. Database systems ensure that the data is
retrieved as quickly as possible.
Database systems are much better than traditional file processing systems which we have discussed
in the separate article: DBMS vs File System.
DBMS is software that manages data for efficient storage and fast retrieval of data from a
database. MySQL, IBM Db2, Oracle, PostgreSQL etc. are all DBMS software. In
this guide, you will learn the various types of DBMS (Database Management System).
Types of DBMS
There are 4 types of DBMS:
1. Relational Database Management System (RDBMS)
2. Object Oriented Database Management System
3. Hierarchical Database Management System
4. Network Database Management System
For example, a student table stores the records of various students, a row of this table represents the
record of a single student and the column represents the attributes of the record such as student id,
name, age, address etc.
Student table
For example: A house is an object. An object has two characteristics: state and behaviour.
In this example, with “House” as the object, the state of “House” is its address, color, area etc. and
its behaviour is open main door, close main door etc.
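The “House” object can be written as a small Python class to make the split between state and behaviour concrete. This is only an illustrative sketch; the attribute names, the main_door_open flag and the sample values are assumptions that go beyond the text above.

```python
class House:
    """A house modelled as an object with state and behaviour."""

    def __init__(self, address, color, area_sq_ft):
        # State: the data the object holds.
        self.address = address
        self.color = color
        self.area_sq_ft = area_sq_ft
        self.main_door_open = False  # hypothetical flag tracking the door

    # Behaviour: the operations the object supports.
    def open_main_door(self):
        self.main_door_open = True

    def close_main_door(self):
        self.main_door_open = False


house = House("12 Lake Road", "white", 1200)
house.open_main_door()
print(house.address, house.main_door_open)
```

An object oriented database stores objects like this one directly, state and behaviour together, instead of flattening them into rows.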
An object oriented database can be represented by the following diagram. To read more about object
oriented programming, refer to this guide.
3. Hierarchical Database Management System
In a hierarchical database management system, data is stored in the form of one-to-many relationships. You
can visualize it as a tree where a root node is attached to several descendant nodes, the lowest of which are called leaves.
Examples of hierarchical database systems are: IMS by IBM, Windows registry by Microsoft.
For example: To store the data of an organization, the root node is the organization itself. The immediate
child nodes are: Employees, Managers, Directors. These child nodes can have further child nodes;
for example, Employees can have child nodes such as Engineers, Housekeeping staff, System admin etc.
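The organization example can be sketched as a nested Python dictionary, where every node has exactly one parent. This is an illustration of the tree shape only, not how a hierarchical DBMS stores data internally; the paths helper is a hypothetical name.

```python
# Each key is a node; its value holds that node's children.
organization = {
    "Organization": {
        "Employees": {
            "Engineers": {},
            "Housekeeping staff": {},
            "System admin": {},
        },
        "Managers": {},
        "Directors": {},
    }
}


def paths(tree, prefix=()):
    """Yield the path from the root to every node, parent before child."""
    for name, children in tree.items():
        path = prefix + (name,)
        yield path
        yield from paths(children, path)


all_paths = list(paths(organization))
for path in all_paths:
    print(" > ".join(path))
```

Every node appears on exactly one path from the root, which is what the one-to-many restriction means in practice.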
A network database is based on a traditional hierarchical database, except that it allows each object to
have multiple parents instead of a single parent. This means data in a network database can have
many-to-many relationships as well as one-to-one and one-to-many relationships.
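The difference from a tree shows up when parent-child links are stored as pairs. In this sketch the department, project and employee names are hypothetical, and the two dictionaries are just a convenient way to show the graph shape, not a storage format used by any network DBMS.

```python
from collections import defaultdict

# (parent, child) links; "Asha" has two parents, which a tree would not allow.
links = [
    ("Sales department", "Asha"),
    ("Project Apollo", "Asha"),
    ("Sales department", "Ravi"),
]

parents = defaultdict(list)
children = defaultdict(list)
for parent, child in links:
    parents[child].append(parent)
    children[parent].append(child)

print(parents["Asha"])
print(children["Sales department"])
```

Here one record belongs to both a department and a project, while the department also owns several records.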
In this guide, you will learn the various DBMS applications. These applications help you understand
the use of DBMS in various fields.
DBMS applications
Applications where we use Database Management Systems are:
Telecom: A database keeps track of the information regarding calls made, network
usage, customer details etc. Without database systems it is hard to maintain that huge
amount of data, which keeps updating every millisecond.
Industry: Whether it is a manufacturing unit, warehouse or distribution centre, each one needs a
database to keep records of ins and outs. For example, a distribution centre should keep
track of the product units supplied to the centre as well as the products delivered
out from the distribution centre each day; this is where DBMS comes into the picture.
Banking System: For storing customer info, tracking day-to-day credit and debit transactions,
generating bank statements etc. All this work is done with the help of database
management systems. Also, a banking system needs data security because the data is sensitive; this
is efficiently taken care of by DBMS.
Sales: To store customer information, production information and invoice details. Using DBMS,
you can track, manage and generate historical data to analyse the sales data.
Airlines: To travel through airlines, we make early reservations; this reservation information, along
with the flight schedule, is stored in a database. This is where real-time update of data is necessary,
as a flight seat reserved for one passenger should not be allocated to another passenger. This is
easily handled by DBMS, as the data updates are fast and happen in real time.
Education sector: Database systems are frequently used in schools and colleges to store and
retrieve the data regarding student details, staff details, course details, exam details, payroll data,
attendance details, fees details etc. There is a large amount of inter-related data that needs to be
stored and retrieved in an efficient manner.
Online shopping: You must be aware of online shopping websites such as Amazon, Flipkart
etc. These sites store the product information, your addresses and preferences, and credit details,
and provide you with a relevant list of products based on your query. All this involves a database
management system. Along with managing the vast catalogue of items, there is a need to
secure the user's private information such as bank and card details. All this is taken care of by
database management systems.
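The airline requirement above, that a seat reserved for one passenger must not be given to another, can be sketched with a key constraint. This is a minimal illustration using Python's built-in sqlite3 module; the reservation table, the reserve helper, and the flight and passenger values are all hypothetical. A real reservation system involves much more than this.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The composite primary key allows at most one row per (flight, seat).
conn.execute(
    "CREATE TABLE reservation ("
    "flight_no TEXT NOT NULL, "
    "seat_no TEXT NOT NULL, "
    "passenger TEXT NOT NULL, "
    "PRIMARY KEY (flight_no, seat_no))"
)


def reserve(conn, flight_no, seat_no, passenger):
    """Try to book a seat; return False if it is already taken."""
    try:
        with conn:  # commit on success, roll back on error
            conn.execute(
                "INSERT INTO reservation VALUES (?, ?, ?)",
                (flight_no, seat_no, passenger),
            )
        return True
    except sqlite3.IntegrityError:
        return False


first = reserve(conn, "XY101", "12A", "Asha")
second = reserve(conn, "XY101", "12A", "Ravi")  # same seat, rejected
holder = conn.execute(
    "SELECT passenger FROM reservation WHERE flight_no = ? AND seat_no = ?",
    ("XY101", "12A"),
).fetchone()[0]
print(first, second, holder)
```

The database itself refuses the second booking, so the rule holds no matter which application or user sends the request.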
I have mentioned only a few applications. The list could go on, as almost every field where
data needs to be managed uses DBMS nowadays. The traditional file system is used only
where the data size is very small.
In this guide, you will learn the advantages and disadvantages of DBMS. We will first discuss what a file
processing system is and how database management systems are better than file processing systems.
Data Security: Data should be secured from unauthorised access. For example, a student in a
college should not be able to see the payroll details of the teachers. Such security
constraints are difficult to apply in file processing systems.
Disadvantages of DBMS
Cost: DBMS implementation cost is high compared to the file system.
Complexity: Database systems are complex to understand.
Performance: Database systems are generic, making them suitable for various applications.
However, this generality affects their performance for some applications.
Comments
Vivek says
JANUARY 5, 2017 AT 9:23 PM
If you want to access any data in a file system, you have to go through every single
file to find out where the data is.
But in a Database Management System you can search using a query such as: SELECT * FROM
table_name WHERE column_name = 'enter a value';
In this guide, you will learn the difference between DBMS (Database Management System) and
RDBMS (Relational Database Management System).
DBMS: In DBMS, data is stored in files, so the data stored in different files is isolated and there is
no relation between the data stored in different files.
RDBMS: In RDBMS, data is stored in tables and tables can have a relationship with other tables. This
helps in identifying the relationship between data stored in different tables.
DBMS: Software and hardware requirements are low.
RDBMS: Software and hardware requirements are high since the size of the data is big.
DBMS: DBMS examples are: XML, MS Access etc.
RDBMS: RDBMS examples are: IBM Db2, Oracle, MySQL etc.
DBMS Architecture
LAST UPDATED: JULY 3, 2022 BY CHAITANYA SINGH | FILED UNDER: DBMS
In the previous tutorials, we learned the basics of DBMS. In this guide, we will look at DBMS architecture.
Understanding database management system architecture helps us understand the components of a database
system and the relations among them.
The architecture of a DBMS depends on the computer system on which it runs. For example, in a client-
server DBMS architecture, the database system on the server machine can serve several requests made by
client machines. We will understand this communication with the help of diagrams.
In two-tier architecture, the database system is present on the server machine and the DBMS
application is present on the client machine. These two machines are connected with each other
through a reliable network as shown in the above diagram.
Whenever the client machine makes a request to access the database present on the server using a query
language like SQL, the server performs the request on the database and returns the result to the
client. Application connection interfaces such as JDBC and ODBC are used for the interaction between
server and client.
In three-tier architecture, another layer is present between the client machine and server machine. In
this architecture, the client application doesn’t communicate directly with the database systems
present at the server machine, rather the client application communicates with server application and
the server application internally communicates with the database system present at the server.
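The three tiers can be imitated inside a single Python script to show who talks to whom. This is only a sketch under simplifying assumptions: all three tiers run in one process, sqlite3 stands in for the database server, and the function names get_student_name and client_lookup are hypothetical. In a real deployment the tiers would sit on different machines and communicate over a network.

```python
import sqlite3

# Database tier: the only place where data is stored.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
db.executemany("INSERT INTO student VALUES (?, ?)", [(1, "Asha"), (2, "Ravi")])
db.commit()


# Application (server) tier: the only code that runs queries.
def get_student_name(student_id):
    row = db.execute(
        "SELECT name FROM student WHERE id = ?", (student_id,)
    ).fetchone()
    return row[0] if row else None


# Client tier: talks to the application tier, never to the database.
def client_lookup(student_id):
    name = get_student_name(student_id)
    if name is None:
        return f"Student {student_id} not found"
    return f"Student {student_id}: {name}"


print(client_lookup(1))
print(client_lookup(9))
```

Because the client never sees a query or a connection, the application tier can validate requests and the database can change without touching client code.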
In the previous tutorial we have seen the DBMS architecture – one-tier, two-tier and three-tier. In this
guide, we will discuss the three level DBMS architecture in detail.
1. External level
It is also called the view level. This level is called “view” because several users can view their
desired data from this level, which is internally fetched from the database with the help of conceptual and
internal level mapping.
The user doesn’t need to know database schema details such as data structures, table definitions
etc. The user is only concerned with the data that is returned to the view level after it has been
fetched from the database (present at the internal level).
External level is the “top level” of the Three Level DBMS Architecture.
2. Conceptual level
It is also called the logical level. The whole design of the database, such as the relationships among data,
the schema of data etc., is described at this level.
Database constraints and security are also implemented at this level of the architecture. This level is
maintained by the DBA (database administrator).
3. Internal level
This level is also known as physical level. This level describes how the data is actually stored in the
storage devices. This level is also responsible for allocating space to the data. This is the lowest level
of the architecture.
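A SQL view gives a rough feel for the external and conceptual levels. In this sketch, written with Python's built-in sqlite3 module, the employee table stands for the conceptual level and the employee_directory view for one external view that hides the salary column. The table, view and column names are hypothetical, and the internal level (how SQLite lays out its pages on disk) stays out of sight entirely.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual level: the full logical design of the table.
conn.execute(
    "CREATE TABLE employee ("
    "emp_id INTEGER PRIMARY KEY, name TEXT NOT NULL, salary INTEGER NOT NULL)"
)
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [(1, "Asha", 50000), (2, "Ravi", 42000)],
)

# External level: a view exposing only what this group of users needs.
conn.execute("CREATE VIEW employee_directory AS SELECT emp_id, name FROM employee")

cur = conn.execute("SELECT * FROM employee_directory ORDER BY emp_id")
columns = [d[0] for d in cur.description]
rows = cur.fetchall()
print(columns, rows)
```

Users of the view get the names they need without ever seeing that a salary column exists.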
To fully understand the view of data, you must have a basic knowledge of data abstraction and
instance & schema. Refer to these two tutorials to learn them in detail.
1. Data abstraction: Database systems are made up of complex data structures. To ease user
interaction with the database, the developers hide irrelevant internal details from users. This process
of hiding irrelevant details from the user is called data abstraction.
2. Instance and schema: Design of a database is called the schema. Schema is of three types:
Physical schema, logical schema and view schema. The data stored in database at a particular
moment of time is called instance of database. Database schema defines the variable
declarations in tables that belong to a particular database; the value of these variables at a
moment of time is called the instance of that database.
Database systems are made up of complex data structures. To ease user interaction with the
database, the developers hide irrelevant internal details from users. This process of hiding irrelevant
details from the user is called data abstraction. The term “irrelevant” is used here with respect to the user; it
doesn’t mean that the hidden data is not relevant to the database as a whole. It just means that
the user is not concerned about that data.
For example: When you are booking a train ticket, you are not concerned with how data is processed at the
back end when you click “book ticket”, or what processes run when you make an online
payment. You are just concerned about the message that pops up when your ticket is successfully
booked. This doesn’t mean that the process happening at the back end is not relevant; it just means
that you as a user are not concerned with what is happening in the database.
Logical level: This is the middle level of 3-level data abstraction architecture. It describes what data is
stored in database.
View level: Highest level of data abstraction. This level describes the user interaction with database
system.
Example: Let’s say we are storing customer information in a customer table. At physical level these
records can be described as blocks of storage (bytes, gigabytes, terabytes etc.) in memory. These
details are often hidden from the programmers.
At the logical level these records can be described as fields and attributes along with their data types,
and the relationships among them can be logically implemented. Programmers generally work at
this level because they are aware of such things about database systems.
At the view level, users just interact with the system with the help of a GUI and enter details on the screen.
They are not aware of how the data is stored or what data is stored; such details are hidden from
them.
In this guide, you will learn about instance and schema in DBMS.
DBMS Schema
Definition of schema: Design of a database is called the schema. For example: An employee table in
database exists with the following attributes:
This is the schema of the employee table. Schema defines the attributes of tables in the database.
Schema is of three types: Physical schema, logical schema and view schema.
Schema represents the logical view of the database. It helps you understand what data needs to
go where.
Schema can be represented by a diagram as shown below.
Schema helps the database users to understand the relationship between data. This helps in
efficiently performing operations on database such as insert, update, delete, search etc.
In the following diagram, we have a schema that shows the relationship between three tables: Course,
Student and Section. The diagram only shows the design of the database, it doesn’t show the data
present in those tables. Schema is only a structural view(design) of a database as shown in the
diagram below.
The design of a database at the physical level is called the physical schema; how the data is stored in blocks of
storage is described at this level.
Design of a database at the logical level is called the logical schema. Programmers and database
administrators work at this level. At this level, data can be described as certain types of data records
stored in data structures; however, internal details such as the implementation of the data structures are
hidden at this level (available at the physical level).
Design of a database at the view level is called the view schema. This generally describes end user interaction
with database systems.
To learn more about these schemas, refer to the 3 level data abstraction architecture.
DBMS Instance
Definition of instance: The data stored in database at a particular moment of time is called instance
of database. Database schema defines the attributes in tables that belong to a particular database.
The value of these attributes at a moment of time is called the instance of that database.
For example, we have seen the schema of the table “employee” above. Let’s see the table with the data
now. At this moment the table contains two rows (records). This is the current instance of the
table “employee” because this is the data that is stored in this table at this particular moment of time.
Let’s take another example: Let’s say we have a single table student in the database, today the table
has 100 records, so today the instance of the database has 100 records. We are going to add another
100 records in this table by tomorrow so the instance of database tomorrow will have 200 records in
table. In short, at a particular moment the data stored in database is called the instance, this changes
over time as and when we add, delete or update data in the database.
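The difference between schema and instance can be observed directly. This sketch uses Python's built-in sqlite3 module; the student table, the schema and instance helpers, and the sample rows are hypothetical, and three rows stand in for the hundreds mentioned above. PRAGMA table_info is SQLite's own way of listing a table's columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")


def schema(conn):
    """Column names and declared types of the student table."""
    return [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(student)")]


def instance(conn):
    """The rows stored in the student table right now."""
    return conn.execute("SELECT id, name FROM student ORDER BY id").fetchall()


schema_before = schema(conn)

conn.executemany("INSERT INTO student VALUES (?, ?)", [(1, "Asha"), (2, "Ravi")])
instance_today = instance(conn)

conn.execute("INSERT INTO student VALUES (3, 'Meera')")
instance_tomorrow = instance(conn)

schema_after = schema(conn)
print(schema_before == schema_after, len(instance_today), len(instance_tomorrow))
```

The instance grows as rows are added, while the schema reported before and after the inserts is identical.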
DBMS languages
LAST UPDATED: NOVEMBER 14, 2018 BY CHAITANYA SINGH | FILED UNDER: DBMS
Database languages are used to read, update and store data in a database. There are several such
languages that can be used for this purpose; one of them is SQL (Structured Query Language).
All of these commands either define or update the database schema, which is why they come under
Data Definition Language (DDL).
In practice, data definition language, data manipulation language and data control language are not
separate languages; rather, they are parts of a single database language such as SQL.
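To see that these are parts of one language, here is a minimal sketch that sends both kinds of statements through the same SQL connection. It uses Python's built-in sqlite3 module; the employee table and its columns are assumptions made only for this illustration. SQLite has no GRANT/REVOKE, so data control statements are not shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# DDL: defines the schema (hypothetical table and columns).
conn.execute("CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, emp_name TEXT)")

# DML: stores, updates and reads data under that schema.
conn.execute("INSERT INTO employee (emp_id, emp_name) VALUES (1, 'Ajeet')")
conn.execute("UPDATE employee SET emp_name = 'Ajeet S' WHERE emp_id = 1")
rows = conn.execute("SELECT emp_id, emp_name FROM employee").fetchall()
print(rows)

# DDL again: alters the schema, then drops the table.
conn.execute("ALTER TABLE employee ADD COLUMN emp_age INTEGER")
columns = [row[1] for row in conn.execute("PRAGMA table_info(employee)")]
print(columns)
conn.execute("DROP TABLE employee")
```

Both kinds of statement go through the same execute call, which is the practical meaning of "parts of a single database language".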
A data model is the logical structure of a database. It describes the design of the database: its entities,
attributes, the relationships among data, constraints etc.
Object based logical Models – Describe data at the conceptual and view levels.
1. E-R Model
2. Object oriented Model
Record based logical Models – Like Object based model, they also describe data at the conceptual
and view levels. These models specify logical structure of database with records, fields and attributes.
1. Relational Model
2. Hierarchical Model
3. Network Model – The network model is the same as the hierarchical model except that it has a graph-like
structure rather than a tree-based structure. Unlike the hierarchical model, this model allows each
record to have more than one parent record.
Physical Data Models – These models describe data at the lowest level of abstraction.
An Entity–relationship model (ER model) describes the structure of a database with the help of a
diagram, which is known as Entity Relationship Diagram (ER Diagram). An ER model is a design or
blueprint of a database that can later be implemented as a database. The main components of E-R
model are: entity set and relationship set.
A simple ER Diagram:
In the following diagram we have two entities, Student and College, and their relationship. The
relationship between Student and College is many to one: a college can have many students,
but a student cannot study in multiple colleges at the same time. The Student entity has attributes
such as Stu_Id, Stu_Name & Stu_Addr and the College entity has attributes such as Col_ID & Col_Name.
Here are the geometric shapes and their meaning in an E-R Diagram. We will discuss these terms in
detail in the next section (Components of an ER Diagram) of this guide, so don’t worry too much about
these terms now, just go through them once.
1. Entity
An entity is an object or component of data. An entity is represented as a rectangle in an ER diagram.
For example: In the following ER diagram we have two entities, Student and College, and these two
entities have a many to one relationship, as many students study in a single college. We will read more
about relationships later; for now focus on entities.
Weak Entity:
An entity that cannot be uniquely identified by its own attributes and relies on its relationship with
another entity is called a weak entity. A weak entity is represented by a double rectangle. For example –
a bank account cannot be uniquely identified without knowing the bank to which the account belongs,
so bank account is a weak entity.
2. Attribute
An attribute describes a property of an entity. An attribute is represented as an oval in an ER diagram.
There are four types of attributes:
1. Key attribute
2. Composite attribute
3. Multivalued attribute
4. Derived attribute
1. Key attribute:
A key attribute can uniquely identify an entity in an entity set. For example, a student roll number can
uniquely identify a student in a set of students. A key attribute is represented by an oval like other
attributes; however, the text of a key attribute is underlined.
2. Composite attribute:
An attribute that is a combination of other attributes is known as a composite attribute. For example, in
the student entity, the student address is a composite attribute, as an address is composed of other
attributes such as pin code, state and country.
3. Multivalued attribute:
An attribute that can hold multiple values is known as a multivalued attribute. It is represented with
double ovals in an ER Diagram. For example – a person can have more than one phone number, so
the phone number attribute is multivalued.
4. Derived attribute:
A derived attribute is one whose value is dynamic and derived from another attribute. It is represented
by a dashed oval in an ER Diagram. For example – a person's age is a derived attribute, as it changes over
time and can be derived from another attribute (date of birth).
E-R diagram with multivalued and derived attributes:
3. Relationship
A relationship is represented by a diamond shape in an ER diagram; it shows the relationship among
entities. There are four types of relationships:
1. One to One
2. One to Many
3. Many to One
4. Many to Many
1. One to One Relationship
When a single instance of an entity is associated with a single instance of another entity, it is
called a one to one relationship. For example, a person has only one passport and a passport is given to
one person.
2. One to Many Relationship
When a single instance of an entity is associated with more than one instance of another entity, it
is called a one to many relationship. For example – a customer can place many orders but an order
cannot be placed by many customers.
3. Many to One Relationship
When more than one instance of an entity is associated with a single instance of another entity, it
is called a many to one relationship. For example – many students can study in a single college but a
student cannot study in many colleges at the same time.
Partial participation is represented using a single line between the entity set and the relationship set.
Example: Consider an IT company with many employees. Let’s take the relationship between
employee and the role software engineer. Every software engineer is an employee, but not every
employee is a software engineer, as there are employees in other roles as well, such as housekeeping,
managers, CEO etc. So we can say that the participation of the employee entity set in the software
engineer relationship is partial.
We have already covered the ER diagram in our previous article, DBMS ER Model Concept. In this post, we
will discuss the various issues that can arise while designing an ER diagram.
Here are some of the issues that can occur during the ER diagram design process:
Now if we compare the two cases we discussed above, in the first case we can say that the student
can have only one student id, however in the second case when we chose student id as an entity it
implied that a student can have more than one student id.
Let’s take an example to understand it better: A person takes a loan from a bank, here we have two
entities person and bank and their relationship is loan. This is fine until there is a need to disburse a
joint loan, in such case a new relationship needs to be created to define the relationship between the
two individuals who have taken joint loan. In this scenario, it is better to choose loan as an entity set
rather than a relationship set.
N-ary relationships can make an ER design complex; however, the good news is that we can convert
and represent any n-ary relationship using multiple binary relationships.
This may sound confusing, so let’s take an example to understand how we can convert an n-ary
relationship to multiple binary relationships. Let’s say we have to describe a relationship between
four family members: father, mother, son and daughter. This can easily be represented in the form of
multiple binary relationships: the father-mother relationship as “spouse”, the son-daughter relationship as
“siblings”, and the relationship of the father and mother with their children as “child”.
We have learned ER Diagram and ER design issues in previous articles. In this post, we will cover how
to convert ER diagram into database tables.
First we will convert simple ER diagrams to tables. In the end, we will take a complex ER diagram and
then we will convert it into set of tables.
Let’s take an example: Here we have an entity set Employee with the attributes Name, Age, Emp_Id
and Salary. When we convert this ER diagram to a table, the entity set becomes the table, so we have a table
named “Employee” as shown in the following diagram. The attributes of the entity set become the
attributes of the table.
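The conversion above can be sketched as a table definition. This is a minimal example using Python's built-in sqlite3 module; the column types and the choice of Emp_Id as the key are assumptions, since the ER diagram only names the attributes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The entity set Employee becomes a table; each attribute becomes a column.
# Column types and the primary key choice are assumptions for this sketch.
conn.execute(
    """
    CREATE TABLE Employee (
        Emp_Id INTEGER PRIMARY KEY,
        Name   TEXT,
        Age    INTEGER,
        Salary REAL
    )
    """
)

employee_columns = [row[1] for row in conn.execute("PRAGMA table_info(Employee)")]
print(employee_columns)
```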
2. Strong Entity Set With Composite Attributes
Now we will see how to convert a strong entity set with composite attributes to a table. The
conversion is fairly simple in this case as well. The entity set becomes the table and the simple attributes
of each composite attribute become attributes of the table, while the composite attribute itself
is ignored during conversion.
Let’s take an example. As you can see we have a composite attribute Name and this composite
attribute has two simple attributes First_N and Last_N. While converting this ER to table we have not
used the composite attribute itself in the table instead we have used the simple attributes of this
composite attribute as table’s attributes.
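The rule can be written as a small function. This is only a sketch in plain Python: the helper name table_columns and the dictionary layout are made up for illustration, and the attributes other than Name, First_N and Last_N are assumed to carry over from the earlier Employee example.

```python
def table_columns(attributes):
    """Return the table's column names for an entity set.

    `attributes` maps each attribute name to the list of its simple parts;
    an empty list means the attribute is already simple.
    """
    columns = []
    for name, parts in attributes.items():
        if parts:
            # Composite attribute: keep only its simple component attributes.
            columns.extend(parts)
        else:
            # Simple attribute: becomes a column as it is.
            columns.append(name)
    return columns


employee_attributes = {
    "Emp_Id": [],
    "Name": ["First_N", "Last_N"],  # composite attribute from the example
    "Age": [],
    "Salary": [],
}

columns = table_columns(employee_attributes)
print(columns)
```

Note that Name itself does not appear among the columns; only First_N and Last_N do.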
We will understand this conversion with the help of a diagram. Let’s take the same example that we
have seen above; here we have added a new multi-valued attribute, Dept. An employee can work in
multiple departments, so the Dept attribute is marked as multi-valued. Whenever we have a
multi-valued attribute, more than one table is needed to represent the ER diagram. As you can
see, we have created two tables to represent this ER.
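A minimal sketch of the two tables, using Python's built-in sqlite3 module. The second table's name (Employee_Dept), the column types and the sample rows are assumptions made for illustration; the idea is that each value of the multi-valued attribute gets its own row, linked back to the employee by Emp_Id.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Table 1: the entity set with its single-valued attributes.
    CREATE TABLE Employee (
        Emp_Id INTEGER PRIMARY KEY,
        Name   TEXT,
        Age    INTEGER,
        Salary REAL
    );

    -- Table 2: one row per (employee, department) pair.
    CREATE TABLE Employee_Dept (
        Emp_Id INTEGER REFERENCES Employee(Emp_Id),
        Dept   TEXT,
        PRIMARY KEY (Emp_Id, Dept)
    );

    -- Hypothetical sample data.
    INSERT INTO Employee VALUES (1, 'Ajeet', 30, 50000);
    INSERT INTO Employee_Dept VALUES (1, 'Sales'), (1, 'Support');
    """
)

depts = [
    row[0]
    for row in conn.execute(
        "SELECT Dept FROM Employee_Dept WHERE Emp_Id = 1 ORDER BY Dept"
    )
]
print(depts)
```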
4. Relationship Set to Table conversion
While converting a relationship set to a table, the primary key attributes of the two entity sets become
the table attributes, and if the relationship set has any attributes of its own, they also become attributes of the
table.
In the following example, we have two entity sets, Employee and Department. These entity sets are
associated with each other through the Works relationship set. To convert the relationship set Works to a
table, we take the primary key attributes of each entity set, Emp_Id and Dept_Id, together with all the
attributes of the relationship set, and form a table.
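Here is a minimal sketch of the Works table using Python's built-in sqlite3 module. The Since column stands in for "an attribute of the relationship set" and is hypothetical, as are the Name and Dept_Name columns and the sample rows.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript(
    """
    CREATE TABLE Employee   (Emp_Id  INTEGER PRIMARY KEY, Name TEXT);
    CREATE TABLE Department (Dept_Id INTEGER PRIMARY KEY, Dept_Name TEXT);

    -- The relationship set becomes its own table: the two primary keys
    -- plus the relationship's own attribute (hypothetical: Since).
    CREATE TABLE Works (
        Emp_Id  INTEGER NOT NULL REFERENCES Employee(Emp_Id),
        Dept_Id INTEGER NOT NULL REFERENCES Department(Dept_Id),
        Since   TEXT,
        PRIMARY KEY (Emp_Id, Dept_Id)
    );

    INSERT INTO Employee   VALUES (1, 'Ajeet');
    INSERT INTO Department VALUES (10, 'Sales');
    INSERT INTO Works      VALUES (1, 10, '2024-01-01');
    """
)

works_rows = conn.execute("SELECT Emp_Id, Dept_Id, Since FROM Works").fetchall()
print(works_rows)

# A Works row must point at an existing employee and department.
try:
    conn.execute("INSERT INTO Works VALUES (99, 10, '2024-01-01')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
print(orphan_rejected)
```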
A relationship between two entities is called a recursive relationship if the two entities are of the same
type. For example: a relationship between a manager and an engineer is a recursive relationship
because both the manager and the engineer are employees of the company. Similarly, a relationship
“marries” between two persons is a recursive relationship, as a person marries another person; in this
example the entity person has a relationship with itself. In this guide, you will learn how to represent
a recursive relationship in an ER diagram.
Recursive relation ER diagram with a role name: In this ER diagram, we are depicting the supervises
relationship with the role names. This clearly shows that a supervisor employee has a one to many
relationship with the supervised employee.
Employees hierarchy:
Here we are displaying an employee hierarchy of 8 employees. This diagram shows that, in this
supervisor relationship, participation is optional: some employees are not
supervised by anyone, such as “Chaitanya”, and some employees do not supervise
anyone, such as Rahul, Jim, Steve, Carl & Ron.
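In a table, a recursive relationship like supervises is commonly stored as a column that refers back to the same table. The sketch below uses Python's built-in sqlite3 module. Only four of the eight employees are included, and the reporting lines (everyone reporting directly to Chaitanya) are an assumption, since the hierarchy diagram is not reproduced here; the column name Supervisor_Id is also hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Employee (
        Emp_Id        INTEGER PRIMARY KEY,
        Name          TEXT NOT NULL,
        -- Refers to another row of the same table; NULL means "not supervised".
        Supervisor_Id INTEGER REFERENCES Employee(Emp_Id)
    );

    -- Assumed reporting lines, for illustration only.
    INSERT INTO Employee VALUES
        (1, 'Chaitanya', NULL),
        (2, 'Rahul', 1),
        (3, 'Jim', 1),
        (4, 'Steve', 1);
    """
)

# Self-join: the same table plays both roles, supervisor and supervised.
pairs = conn.execute(
    """
    SELECT sup.Name, emp.Name
    FROM Employee AS emp
    JOIN Employee AS sup ON emp.Supervisor_Id = sup.Emp_Id
    ORDER BY emp.Emp_Id
    """
).fetchall()

unsupervised = [
    row[0] for row in conn.execute("SELECT Name FROM Employee WHERE Supervisor_Id IS NULL")
]
non_supervisors = [
    row[0]
    for row in conn.execute(
        """
        SELECT Name FROM Employee
        WHERE Emp_Id NOT IN (
            SELECT Supervisor_Id FROM Employee WHERE Supervisor_Id IS NOT NULL
        )
        ORDER BY Name
        """
    )
]
print(pairs, unsupervised, non_supervisors)
```

The nullable Supervisor_Id column is what makes participation optional on the supervised side.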
More examples of Recursive relationships
Some other example ER diagrams of recursive relationships:
DBMS Generalization
LAST UPDATED: NOVEMBER 16, 2018 BY CHAITANYA SINGH | FILED UNDER: DBMS
Generalization is a process in which the common attributes of more than one entity form a new
entity. This newly formed entity is called the generalized entity.
Generalization Example
Let’s say we have two entities, Student and Teacher.
Attributes of Entity Student are: Name, Address & Grade
Attributes of Entity Teacher are: Name, Address & Salary
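The common attributes can be worked out with a set intersection. This is only a sketch in plain Python, using the attribute lists given above; it shows which attributes would move to the generalized entity and which would stay with each original entity.

```python
student = {"Name", "Address", "Grade"}
teacher = {"Name", "Address", "Salary"}

# Attributes shared by both entities form the generalized entity.
common = student & teacher

# What remains specific to each original entity.
student_only = student - common
teacher_only = teacher - common

print(sorted(common))        # shared attributes
print(sorted(student_only))  # stays with Student
print(sorted(teacher_only))  # stays with Teacher
```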
DBMS Specialization
LAST UPDATED: NOVEMBER 16, 2018 BY CHAITANYA SINGH | FILED UNDER: DBMS
Specialization is a process in which an entity is divided into sub-entities. You can think of it as the
reverse of generalization: in generalization, two entities combine to form a new
higher level entity. Specialization is a top-down process.
The idea behind specialization is to find the subsets of entities that have a few distinguishing attributes.
For example – consider an entity Employee, which can be further classified into the sub-entities Technician,
Engineer & Accountant because these sub-entities have some distinguishing attributes.
Specialization Example
In the above diagram, we have a higher level entity “Employee” which we have divided
into the sub-entities “Technician”, “Engineer” & “Accountant”. All of these are employees of a company;
however, their roles are completely different and they have a few different attributes. Just for the example, I
have shown that a Technician handles service requests, an Engineer works on a project and an Accountant
handles the credit & debit details. All three employee types have a few attributes in common, such
as name & salary, which we have left associated with the parent entity “Employee” as shown in the
above diagram.
DBMS Aggregation
LAST UPDATED: NOVEMBER 16, 2018 BY CHAITANYA SINGH | FILED UNDER: DBMS
Aggregation is a process in which a single entity alone does not make sense in a relationship, so
the relationship of two entities acts as one entity. I know it sounds confusing, but don’t worry: the
example we will take will clear all the doubts.
Aggregation Example
In the real world, we know that a manager not only manages the employees working under them but also
has to manage the project. In such a scenario, if the entity “Manager” has a “manages” relationship
with either the “Employee” or the “Project” entity alone, it will not make sense, because the manager has to
manage both. In these cases the relationship of two entities acts as one entity. In our example, the
relationship “Works-On” between “Employee” & “Project” acts as one entity that has a relationship
“Manages” with the entity “Manager”.
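One common way to carry this idea into tables is to let Manages refer to a row of Works_On as a whole, rather than to Employee or Project separately. The sketch below uses Python's built-in sqlite3 module; all column names and sample values are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript(
    """
    CREATE TABLE Employee (Emp_Id     INTEGER PRIMARY KEY);
    CREATE TABLE Project  (Project_Id INTEGER PRIMARY KEY);
    CREATE TABLE Manager  (Manager_Id INTEGER PRIMARY KEY);

    -- The relationship Works-On between Employee and Project.
    CREATE TABLE Works_On (
        Emp_Id     INTEGER NOT NULL REFERENCES Employee(Emp_Id),
        Project_Id INTEGER NOT NULL REFERENCES Project(Project_Id),
        PRIMARY KEY (Emp_Id, Project_Id)
    );

    -- Manages points at a Works_On pair, treating that relationship as one entity.
    CREATE TABLE Manages (
        Manager_Id INTEGER NOT NULL REFERENCES Manager(Manager_Id),
        Emp_Id     INTEGER NOT NULL,
        Project_Id INTEGER NOT NULL,
        PRIMARY KEY (Manager_Id, Emp_Id, Project_Id),
        FOREIGN KEY (Emp_Id, Project_Id) REFERENCES Works_On(Emp_Id, Project_Id)
    );

    -- Hypothetical sample data.
    INSERT INTO Employee VALUES (1), (2);
    INSERT INTO Project  VALUES (10);
    INSERT INTO Manager  VALUES (500);
    INSERT INTO Works_On VALUES (1, 10);
    INSERT INTO Manages  VALUES (500, 1, 10);
    """
)

# Employee 2 and project 10 both exist, but employee 2 does not work on
# project 10, so there is no Works_On pair for the manager to manage.
try:
    conn.execute("INSERT INTO Manages VALUES (500, 2, 10)")
    unpaired_rejected = False
except sqlite3.IntegrityError:
    unpaired_rejected = True
print(unpaired_rejected)
```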
In the relational model, the data and relationships are represented by a collection of inter-related tables.
Each table is a group of columns and rows, where a column represents an attribute of an entity and a row
represents a record.
Sample relational model: Student table with 3 columns and four records.
Table: Student
Stu_Id Stu_Name Stu_Age
------ -------- -------
111    Ashish   23
123    Saurav   22
169    Lester   24
234    Lou      26
Table: Course
Here Stu_Id, Stu_Name & Stu_Age are attributes of table Student and Stu_Id, Course_Id &
Course_Name are attributes of table Course. The rows with values are the records (commonly known
as tuples).
In the hierarchical model, data is organized into a tree-like structure in which each record has one parent
record and can have many children. The main drawback of this model is that it can have only one to many
relationships between nodes.
123 Steve 29
367 Chaitanya 27
234 Ajeet 28
Course Table:
RDBMS Concepts
LAST UPDATED: OCTOBER 16, 2022 BY CHAITANYA SINGH | FILED UNDER: DBMS
RDBMS stands for relational database management system. A relational model can be represented as
a table of rows and columns. A relational database has the following major components:
1. Table
2. Record or Tuple
3. Field or Column name or Attribute
4. Domain
5. Instance
6. Schema
7. Keys
1. Table
A table is a collection of data represented in rows and columns. Each table has a name in database.
For example, the following table “STUDENT” stores the information of students in database.
Table: STUDENT
2. Record or Tuple
Each row of a table is known as record. It is also known as tuple. For example, the following row is a
record that we have taken from the above table.
4. Domain
A domain is the set of permitted values for an attribute in a table. For example, a domain of month-of-year
can accept January, February, … December as values, and a domain of dates can accept all valid
dates. We specify the domain of an attribute while creating a table.
An attribute cannot accept values that are outside its domain. For example, in the above table
“STUDENT”, the Student_Id field has an integer domain, so that field cannot accept values that are not
integers; for example, Student_Id cannot have values like “First”, 10.11 etc.
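Here is a minimal sketch of a domain being enforced, using Python's built-in sqlite3 module. Two assumptions are worth stating loudly: the Birth_Month column is hypothetical, added only to show the month-of-year domain, and SQLite by default is lenient about column types, so the integer domain is spelled out with a CHECK constraint here. Many other database systems reject such values from the column type alone.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE STUDENT (
        -- Integer domain, enforced explicitly because SQLite typing is lenient.
        Student_Id  INTEGER CHECK (typeof(Student_Id) = 'integer'),
        -- Month-of-year domain (hypothetical column).
        Birth_Month TEXT CHECK (Birth_Month IN (
            'January', 'February', 'March', 'April', 'May', 'June', 'July',
            'August', 'September', 'October', 'November', 'December'))
    )
    """
)


def try_insert(student_id, month):
    """Return True if the row is accepted, False if a domain check rejects it."""
    try:
        conn.execute("INSERT INTO STUDENT VALUES (?, ?)", (student_id, month))
        return True
    except sqlite3.IntegrityError:
        return False


print(try_insert(101, "January"))      # inside both domains
print(try_insert("First", "January"))  # not an integer
print(try_insert(10.11, "January"))    # not an integer
print(try_insert(102, "Smarch"))       # not a month
```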
6. Keys
This is our next topic. I have covered the keys in detail in separate tutorials. You can refer to the keys
index here.
In this tutorial, we will discuss Relational Algebra. In the previous tutorial, we had a brief discussion on
the basics of relational algebra and calculus where we learned the need to use these theoretical
mathematical systems.
On the other hand relational calculus is a non-procedural query language, which means it tells what
data to be retrieved but doesn’t tell how to retrieve it. We will discuss relational calculus in a separate
tutorial.
Derived Operations:
1. Natural Join (⋈)
2. Left, Right, Full outer join (⟕, ⟖, ⟗)
3. Intersection (∩)
4. Division (÷)
Let’s discuss these operations one by one with the help of examples.
If you understand a little bit of SQL, you can think of it as the WHERE clause in SQL, which is used for
the same purpose.
σ Condition/Predicate(Relation/Table name)
Select Operator (σ) Example
Table: CUSTOMER
---------------
Query:
σ Customer_City="Agra" (CUSTOMER)
Output:
Table: CUSTOMER
Query:
Output:
Customer_Name Customer_City
------------- -------------
Steve Agra
Raghu Agra
Chaitanya Noida
Ajeet Delhi
Carl Delhi
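To make the two operators concrete, here is a small sketch in plain Python. The helper names select and project are made up for this illustration, and the CUSTOMER relation contains only the two columns visible in the output above, so any other columns of the original table are not modelled.

```python
# The CUSTOMER relation, limited to the two columns shown in the output above.
CUSTOMER = [
    {"Customer_Name": "Steve", "Customer_City": "Agra"},
    {"Customer_Name": "Raghu", "Customer_City": "Agra"},
    {"Customer_Name": "Chaitanya", "Customer_City": "Noida"},
    {"Customer_Name": "Ajeet", "Customer_City": "Delhi"},
    {"Customer_Name": "Carl", "Customer_City": "Delhi"},
]


def select(relation, predicate):
    """Select (sigma): keep the rows that satisfy the predicate."""
    return [row for row in relation if predicate(row)]


def project(relation, attributes):
    """Project (pi): keep the given columns and drop duplicate rows."""
    result = []
    for row in relation:
        projected = {name: row[name] for name in attributes}
        if projected not in result:
            result.append(projected)
    return result


# Equivalent of: sigma Customer_City="Agra" (CUSTOMER)
agra_customers = select(CUSTOMER, lambda row: row["Customer_City"] == "Agra")
print([row["Customer_Name"] for row in agra_customers])

# Projecting on Customer_City alone shows duplicates being removed.
cities = project(CUSTOMER, ["Customer_City"])
print([row["Customer_City"] for row in cities])
```

Select filters rows and keeps all columns; project filters columns and, because a relation is a set, removes any rows that become identical.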
Let’s discuss the union operator a bit more. Let’s say we have two relations R1 and R2 that both have the same
columns, and we want to select all the tuples (rows) from these relations; then we can apply the union
operator on these relations.
Note: The rows (tuples) that are present in both tables will appear only once in the union set. In
short, there are no duplicates after the union operation.
table_name1 ∪ table_name2
Query:
Output:
Student_Name
------------
Aditya
Carl
Paul
Lucy
Rick
Steve
Note: As you can see, there are no duplicate names in the output even though we had a few
names common to both tables, and the COURSE table itself contained duplicate names.
Let’s say we have two relations R1 and R2 that both have the same columns, and we want to select all those
tuples (rows) that are present in both relations; in that case we can apply the intersection
operation on these two relations: R1 ∩ R2.
Note: Only those rows that are present in both the tables will appear in the result set.
table_name1 ∩ table_name2
Table 2: STUDENT
Query:
Output:
Student_Name
------------
Aditya
Steve
Paul
Lucy
table_name1 - table_name2
Query:
Let’s write a query to select those student names that are present in the STUDENT table but not present in the
COURSE table.
Output:
Student_Name
------------
Carl
Rick
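The three set operations above map directly onto Python's built-in set type. In this sketch, the two sets of names are not copied from the original tables; they are worked out from the three outputs shown above (the only name sets consistent with all of them), so treat them as a reconstruction.

```python
# Student_Name values of each table, reconstructed from the outputs above.
STUDENT = {"Aditya", "Carl", "Paul", "Lucy", "Rick", "Steve"}
COURSE = {"Aditya", "Steve", "Paul", "Lucy"}

union = STUDENT | COURSE          # names in either table, no duplicates
intersection = STUDENT & COURSE   # names present in both tables
difference = STUDENT - COURSE     # names in STUDENT but not in COURSE

print(sorted(union))
print(sorted(intersection))
print(sorted(difference))
```

Because sets cannot hold duplicates, the "no duplicates in the result" rule of relational algebra comes for free here.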
R1 X R2
Table 2: S
Col_X Col_Y
----- -----
XX 99
YY 11
ZZ 101
Query:
Let’s find the Cartesian product of tables R and S.
R X S
Output:
Note: The number of rows in the output will always be the product of the number of rows in each
table. In our example table 1 has 3 rows and table 2 has 3 rows, so the output has 3×3 = 9 rows.
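The row count can be checked with a short sketch using itertools.product from Python's standard library. Table S is taken from above; the rows of table R are hypothetical placeholders, chosen only so that R also has 3 rows.

```python
from itertools import product

# Rows of R are hypothetical placeholders; rows of S are from the table above.
R = [("AA", 100), ("BB", 200), ("CC", 300)]
S = [("XX", 99), ("YY", 11), ("ZZ", 101)]

# Cartesian product: every row of R paired with every row of S.
cross = [r_row + s_row for r_row, s_row in product(R, S)]

for row in cross:
    print(row)
print(len(cross))  # number of rows in R times number of rows in S
```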
Rename (ρ)
Rename (ρ) operation can be used to rename a relation or an attribute of a relation.
Rename (ρ) Syntax:
ρ(new_relation_name, old_relation_name)
Table: CUSTOMER
Query:
ρ(CUST_NAMES, ∏(Customer_Name)(CUSTOMER))
Output:
CUST_NAMES
----------
Steve
Raghu
Chaitanya
Ajeet
Carl
In the previous tutorial, we discussed Relational Algebra which is a procedural query language. In this
tutorial, we will discuss Relational Calculus, which is a non-procedural query language.
Query to display the last name of those students where age is greater than 30
Last_Name
---------
Singh
Query to display all the details of students where Last name is ‘Singh’
Output:
Query to find the first name and age of students where student age is greater than 27
Note:
The symbols used for logical operators are: ∧ for AND, ∨ for OR and ¬ for NOT.
Output:
First_Name Age
---------- ----
Ajeet 30
Chaitanya 31
Carl 28
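A relational calculus query states a condition and leaves the "how" to the system, which is close in spirit to a Python comprehension. The sketch below is only an analogy. The first names and ages of Ajeet, Chaitanya and Carl, and the last name Singh for the student older than 30, follow from the outputs above; the other last names and the fourth row are hypothetical.

```python
from collections import namedtuple

Student = namedtuple("Student", ["First_Name", "Last_Name", "Age"])

STUDENT = [
    Student("Ajeet", "Verma", 30),      # last name is hypothetical
    Student("Chaitanya", "Singh", 31),
    Student("Carl", "Pratap", 28),      # last name is hypothetical
    Student("Rajeev", "Bhatia", 27),    # hypothetical row that fails the condition
]

# Roughly: { t.First_Name, t.Age | Student(t) ∧ t.Age > 27 }
older_than_27 = [(t.First_Name, t.Age) for t in STUDENT if t.Age > 27]
print(older_than_27)

# Roughly: { t.Last_Name | Student(t) ∧ t.Age > 30 }
last_names_over_30 = {t.Last_Name for t in STUDENT if t.Age > 30}
print(last_names_over_30)
```

In both cases we only describe which tuples we want; nothing in the expression says in what order the table is scanned.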
In this article, we will discuss the difference between view and table. Both of these terms are
commonly used in relational database.
What is a view?
A view is the result of a SQL query. The result looks like a table; however, this table is not physically
present in the database. Rather, the data displayed as a view is fetched from the tables in the database.
This is why a view is often referred to as a virtual table.
Example:
Employees table:
Emp_Name Emp_Age
Ron 60
Ajeet 61
Daniel 65
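The "virtual table" behaviour can be seen in a minimal sketch using Python's built-in sqlite3 module. The three rows come from the table above; the Emp_Id column, the view name Senior_Employees and its age condition are assumptions made for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Employees (
        Emp_Id   INTEGER PRIMARY KEY,  -- hypothetical column
        Emp_Name TEXT,
        Emp_Age  INTEGER
    );
    INSERT INTO Employees VALUES (1, 'Ron', 60), (2, 'Ajeet', 61), (3, 'Daniel', 65);

    -- The view stores only the query, not a copy of the rows.
    CREATE VIEW Senior_Employees AS
        SELECT Emp_Name, Emp_Age FROM Employees WHERE Emp_Age > 60;
    """
)

query = "SELECT Emp_Name FROM Senior_Employees ORDER BY Emp_Name"
before = [row[0] for row in conn.execute(query)]

# Change the underlying table; the view is never touched directly.
conn.execute("UPDATE Employees SET Emp_Age = 62 WHERE Emp_Name = 'Ron'")
after = [row[0] for row in conn.execute(query)]

print(before)
print(after)
```

The view's result changes as soon as the table changes, because the data is fetched from the table each time the view is queried.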
What is a table?
A table contains the data in form of rows and columns. For example, if a student table contains
records of 100 students and details of each student consists of student name, id, age and address,
then the student table should have 100 rows and 4 columns.
The columns are the attributes of the records, such as student name, id, age and address, and each
student record is stored in a row, so there are 100 rows for 100 students.
View vs Table
VIEW TABLE
Keys in DBMS
LAST UPDATED: NOVEMBER 19, 2018 BY CHAITANYA SINGH | FILED UNDER: DBMS
Keys play an important role in a relational database; they are used to uniquely identify rows in a table. They
also establish relationships among tables.
Note: Guys I have been getting comments that there are no examples of keys here. If you click on
the hyperlink provided below in green colour, you would see the complete separate tutorial of each
key with examples.
Primary Key – A primary key is a column or set of columns in a table that uniquely identifies tuples (rows)
in that table.
Super Key – A super key is a set of one or more columns (attributes) that uniquely identifies rows in a
table.
Candidate Key – A super key with no redundant attribute is known as a candidate key.
Alternate Key – Out of all candidate keys, only one gets selected as the primary key; the remaining keys are
known as alternate or secondary keys.
Composite Key – A key that consists of more than one attribute to uniquely identify rows (also known
as records & tuples) in a table is called a composite key.
Foreign Key – Foreign keys are the columns of a table that point to the primary key of another table.
They act as a cross-reference between tables.
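The difference between a super key and a candidate key can be sketched with a few lines of plain Python. The table, its columns and the helper names are all hypothetical. Note also that a key is really a rule about every possible state of a table, so a check against one set of sample rows, as done here, can only illustrate the definitions.

```python
from itertools import combinations

# Hypothetical sample rows.
rows = [
    {"Stu_Id": 1, "Stu_Name": "Ajeet", "Stu_Email": "ajeet@example.com"},
    {"Stu_Id": 2, "Stu_Name": "Steve", "Stu_Email": "steve1@example.com"},
    {"Stu_Id": 3, "Stu_Name": "Steve", "Stu_Email": "steve2@example.com"},
]
attributes = ["Stu_Id", "Stu_Name", "Stu_Email"]


def is_super_key(rows, attrs):
    """True if no two rows share the same values for the given attributes."""
    seen = set()
    for row in rows:
        values = tuple(row[name] for name in attrs)
        if values in seen:
            return False
        seen.add(values)
    return True


def candidate_keys(rows, attributes):
    """Super keys with no redundant attribute, smallest sets first."""
    keys = []
    for size in range(1, len(attributes) + 1):
        for attrs in combinations(attributes, size):
            contains_smaller_key = any(set(key) <= set(attrs) for key in keys)
            if is_super_key(rows, attrs) and not contains_smaller_key:
                keys.append(attrs)
    return keys


print(is_super_key(rows, ("Stu_Id", "Stu_Name")))  # super key, but Stu_Name is redundant
print(is_super_key(rows, ("Stu_Name",)))           # two students are named Steve
print(candidate_keys(rows, attributes))
```

If Stu_Id is chosen as the primary key from the candidate keys found, the remaining one (Stu_Email) is the alternate key.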
In this guide, you will learn about the primary key in DBMS with the help of examples. We will discuss
what a primary key is and how it differs from other keys in DBMS, such as the foreign key and unique key.
For example, say you want to store student data in a table “student”. The attributes of this table are:
student_id, student_name, student_age, student_address. The primary key is a set of one or more of
these attributes that uniquely identifies a record in the table. In this case, since student_id is different for
each student, it can be considered a primary key.
1. Minimal
The primary key should contain the minimal number of attributes. In the example above,
student_id alone can uniquely identify a record; a combination of two attributes such as {student_id,
student_name} can also uniquely identify a record. However, since we should choose the minimal set of
attributes, student_id is chosen as the primary key instead of {student_id, student_name}.
2. Unique
The value of the primary key should be unique for each row of the table. The column(s) that make up the
key cannot contain duplicate values, because a non-unique value would not help us uniquely
identify a record. If two students had the same student_id, then updating the record of one student based on
the primary key could mistakenly update the record of the other student.
3. Non Null
The attribute(s) marked as primary key are not allowed to have null values.
4. Stable
The primary key value should not change over time. It should remain as it is until explicitly updated by
the user.
5. Easily accessible
The primary key of the record should be accessible to all the users who are performing any operations
on the database.
A primary key can be a set of more than one attribute (column). For example, {Stu_Id, Stu_Name} collectively can
identify a tuple in the STUDENTS table, but we do not choose it as the primary key because Stu_Id alone is
enough to uniquely identify rows in the table and we always go for the minimal set. That said, we
should choose more than one column as the primary key only when there is no single column that can
uniquely identify a tuple in the table.
Syntax for Creating Primary key constraint:
While creating table you can define primary key like this:
For example: Here we are making stu_id primary key while creating the table STUDENTS.
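The statement itself is not shown here, so the following is only a minimal sketch of what such a definition could look like. It runs the SQL through Python's built-in sqlite3 module against an in-memory database; the column types and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database

# Table-level PRIMARY KEY clause; column types are assumed for illustration.
conn.execute("""
    CREATE TABLE STUDENTS (
        stu_id   INTEGER NOT NULL,
        stu_name TEXT    NOT NULL,
        stu_age  INTEGER,
        PRIMARY KEY (stu_id)
    )
""")

conn.execute("INSERT INTO STUDENTS VALUES (101, 'Steve', 23)")

# A second row with the same stu_id violates the primary key.
try:
    conn.execute("INSERT INTO STUDENTS VALUES (101, 'John', 24)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(duplicate_rejected)
```

The second insert fails because stu_id 101 is already present, which is the uniqueness property described earlier.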
A primary key uniquely identifies each row of a table. This is useful for any operation on the
data, such as update, delete and search.
It allows faster access to records because it uses the concept of indexing in DBMS.
Attribute Stu_Name alone cannot be a primary key as more than one student can have the same
name.
Attribute Stu_Age alone cannot be a primary key as more than one student can have the same age.
Attribute Stu_Id alone is a primary key as each student has a unique id that can identify the
student record in the table.
Note: In some cases a single attribute cannot uniquely identify a record in a table. In that case we try
to find a set of attributes that can uniquely identify a row in the table. We will see an example of this after
this one.
Table Name: STUDENTS
Another example: composite key with more than one attribute
Consider the table ORDER, which keeps the daily record of the purchases made by customers.
This table has three attributes: Customer_ID, Product_ID & Order_Quantity.
Customer_ID alone cannot be a primary key as a single customer can place more than one order,
so more than one row can have the same Customer_ID value. For example,
customer id 1011 has placed two orders, with product ids 9023 and 9111.
Product_ID alone cannot be a primary key as more than one customer can place an order for the
same product, so more than one row can have the same product id. For example, customer ids
1011 & 1122 placed an order for the same product (product id 9023).
Order_Quantity alone cannot be a primary key as more than one customer can place an
order for the same quantity.
Since none of the attributes alone can serve as a primary key, let’s try to find a set of
attributes that plays that role. The set {Customer_ID, Product_ID} together can identify the
rows uniquely in the table, so this set is the primary key for this table.
Note: While choosing a set of attributes for a primary key, we always choose the minimal set, the one with the
fewest attributes. For example, if there are two sets that can identify a row in the table, the set
that has fewer attributes should be chosen as the primary key.
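As a rough illustration of the composite key described above, the sketch below declares {Customer_ID, Product_ID} as the primary key using Python's built-in sqlite3 module. The customer and product ids come from the text; the column types and order quantities are made up for the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# ORDER is a reserved word in SQL, so the table name is quoted.
conn.execute("""
    CREATE TABLE "ORDER" (
        Customer_ID    INTEGER NOT NULL,
        Product_ID     INTEGER NOT NULL,
        Order_Quantity INTEGER,
        PRIMARY KEY (Customer_ID, Product_ID)
    )
""")

# Customer 1011 orders two products; customer 1122 also orders product 9023.
# The quantities are hypothetical.
rows = [(1011, 9023, 10), (1011, 9111, 5), (1122, 9023, 15)]
conn.executemany('INSERT INTO "ORDER" VALUES (?, ?, ?)', rows)

# Repeating an existing (Customer_ID, Product_ID) pair is rejected.
try:
    conn.execute('INSERT INTO "ORDER" VALUES (1011, 9023, 1)')
    pair_rejected = False
except sqlite3.IntegrityError:
    pair_rejected = True

row_count = conn.execute('SELECT COUNT(*) FROM "ORDER"').fetchone()[0]
print(pair_rejected, row_count)
```

Each column on its own repeats across rows, yet no pair of values repeats, which is why the pair works as a key here.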
Suppose we didn’t define the primary key while creating the table; then we can define it later like this:
Another way:
When we have only one attribute as the primary key, as in the first example of the STUDENTS table, we
can define the key like this as well:
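A minimal sketch of the single-column form, again using Python's built-in sqlite3 module with assumed column types. SQLite cannot add a primary key to an existing table through ALTER TABLE, so only the column-level definition is shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Column-level form: PRIMARY KEY is written next to the single key column.
conn.execute("""
    CREATE TABLE STUDENTS (
        stu_id   INTEGER PRIMARY KEY,
        stu_name TEXT NOT NULL,
        stu_age  INTEGER
    )
""")

# PRAGMA table_info returns one row per column; index 1 is the column name
# and index 5 is non-zero for columns that belong to the primary key.
key_columns = [
    row[1]
    for row in conn.execute("PRAGMA table_info(STUDENTS)")
    if row[5] > 0
]
print(key_columns)
```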
Definition of Super Key in DBMS: A super key is a set of one or more attributes (columns), which can
uniquely identify a row in a table. Often DBMS beginners get confused between super key and
candidate key, so we will also discuss candidate key and its relation with super key in this article.
{Emp_SSN}
{Emp_Number}
{Emp_SSN, Emp_Number}
{Emp_SSN, Emp_Name}
{Emp_SSN, Emp_Number, Emp_Name}
{Emp_Number, Emp_Name}
Candidate Keys: As mentioned in the beginning, a candidate key is a minimal super key with no
redundant attributes. The following two super keys are chosen from the above sets as there are
no redundant attributes in them.
{Emp_SSN}
{Emp_Number}
Only these two sets are candidate keys, as all the other sets have redundant attributes that are not
necessary for unique identification.
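The selection above can be sketched in plain Python. The rows below are hypothetical sample data in which Emp_SSN and Emp_Number are unique and Emp_Name repeats. Note that checking sample data only shows which attribute sets could be keys; whether something really is a key is a design decision about every row the table may ever hold.

```python
from itertools import combinations

attributes = ("Emp_SSN", "Emp_Number", "Emp_Name")

# Hypothetical rows: SSN and number are unique, one name repeats.
rows = [
    {"Emp_SSN": "123456789", "Emp_Number": 226, "Emp_Name": "Steve"},
    {"Emp_SSN": "999999321", "Emp_Number": 227, "Emp_Name": "Ajeet"},
    {"Emp_SSN": "888997212", "Emp_Number": 228, "Emp_Name": "Chaitanya"},
    {"Emp_SSN": "777778888", "Emp_Number": 229, "Emp_Name": "Steve"},
]

def is_unique(columns):
    """True if no two rows agree on every column in `columns`."""
    seen = {tuple(row[c] for c in columns) for row in rows}
    return len(seen) == len(rows)

# Every attribute subset that identifies rows uniquely is a super key.
super_keys = [
    set(combo)
    for size in range(1, len(attributes) + 1)
    for combo in combinations(attributes, size)
    if is_unique(combo)
]

# A candidate key is a super key with no smaller super key inside it.
candidate_keys = [
    key for key in super_keys
    if not any(other < key for other in super_keys)
]

print(len(super_keys), candidate_keys)
```

With this data the six super keys listed earlier are found, and only {Emp_SSN} and {Emp_Number} survive the minimality test.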
Primary key:
A Primary key is selected from a set of candidate keys. This is done by database admin or database
designer. We can say that either {Emp_SSN} or {Emp_Number} can be chosen as a primary key for the
table Employee.
Comments
Lee says
MAY 14, 2017 AT 11:50 AM
Emp_Name cannot be both because names are not unique e.g. there could be
hundreds of Jack inside the database.
Gabriele says
JULY 16, 2017 AT 6:06 PM
Emp_Name is NOT a super key because we can have 2 “Steve” or 3 or 4 in that table..
Name are not unique, we cannot say the same for SSN codes or Emp_Number. Bye
Gabriele
I have added more details on this in the guide, Please refer the added section above.
lavanya says
MAY 4, 2018 AT 7:55 AM
I have doubt that
{Emp_SSN, Emp_Number} pair also Candidate Keys?.because both are not a redundant
attributes
There can be number of candidate keys present in a table, however there is only one
primary key. A primary key is always chosen from a set of candidate keys. The
decision of choosing primary key from a set of candidate keys is made by database
admin.
why not the emp_name is the part of candidate key .though it is also having unique set of
values???
keerthi says
OCTOBER 17, 2018 AT 1:25 PM
candidate keys are always come with pair so, why{Emp_SSN, Emp_Number} is not a
candidate key
Definition of Candidate Key in DBMS: A super key with no redundant attribute is known as a candidate
key. Candidate keys are selected from the set of super keys; the only thing we take care of while
selecting a candidate key is that it should not have any redundant attributes. That’s the
reason candidate keys are also termed minimal super keys.
Let’s select the candidate keys from the above set of super keys.
Note: A primary key is selected from the set of candidate keys. That means we can have either
Emp_Id or Emp_Number as the primary key. The decision is made by the DBA (Database administrator).
Comments
kamal pratap says
OCTOBER 19, 2017 AT 2:26 PM
Sir, Can a Candidate key contain NULL values ? If yes, then how many?
Dibbyendu says
MAY 16, 2018 AT 3:17 PM
A primary key is being selected from the group of candidate keys. That means we can
either have Emp_Id or Emp_Number as primary key. Now, a primary key can’t have a
null value [we learnt it in Primary key ]. Hence candidate key must not contain a null
value…
No Candidate key does not contain NULL values as a Primary key is selected by the
group of Candidate key and as we know that Primary Key has unique constraint and
NOT NULL.
Thanks!
As we saw in the candidate key guide, a table can have multiple candidate keys. Among
these candidate keys, only one key is selected as the primary key; the remaining keys are known as
alternative or secondary keys.
Table: Employee
The DBA (Database administrator) can choose any of the above keys as the primary key. Let’s say Emp_Id is
chosen as the primary key.
Since we have selected Emp_Id as the primary key, the remaining key Emp_Number would be called the
alternative or secondary key.
Definition of Composite key: A key that has more than one attribute is known as a composite key. It is
also known as a compound key.
Note: Any key such as a super key, primary key or candidate key can be called a composite key if it has
more than one attribute.
Table – Sales
Column cust_Id alone cannot become a key as the same customer can place multiple orders, thus the
same customer can have multiple entries.
Column order_Id alone cannot be a primary key as the same order can contain multiple
products, thus the same order_Id can be present multiple times.
Column product_code alone cannot be a primary key as more than one customer can place an order for the
same product.
Column product_count alone cannot be a primary key because two orders can be placed for the same
product count.
Based on this, it is safe to assume that the key should have more than one attribute:
Key in above table: {cust_id, product_code}
Comments
Anonymous says
APRIL 6, 2016 AT 8:56 PM
“Key in above table: {cust_id, order_id}”. But this cannot be because 1st and 4th rows have
the same key then and so does not uniquely identify the rows.. right?
Himanshu says
JUNE 4, 2016 AT 12:06 PM
Agreed..
Any one of the both. cust_id or order_id should be different here i think, to make
{cust_id, order_id} key.
Lee says
MAY 14, 2017 AT 8:22 PM
The difference is that candidate key does not allow redundant attributes only unique
attributes like ID and Item Code etc.
A composite key must have more than one attributes while a candidate key can
contain a single attribute.
Definition: Foreign keys are the columns of a table that point to the primary key of another table.
They act as a cross-reference between tables.
For example:
In the example below, the Stu_Id column in the Course_enrollment table is a foreign key as it points to
the primary key of the Student table.
Course_enrollment table:
Course_Id Stu_Id
C01 101
C02 102
C03 101
C05 102
C06 103
C07 102
Student table:
Stu_Id Stu_Name Stu_Age
101 Chaitanya 22
102 Arya 26
103 Bran 25
104 Jon 21
Note: In practice, a foreign key does not have to reference the primary key of another table. If it points
to a unique column (not necessarily a primary key) of another table, it is still a foreign key.
So, a more accurate definition would be: Foreign keys are the columns of a table that point to
a candidate key of another table.
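The cross-reference can be tried out with a small sketch using Python's built-in sqlite3 module. The table and column names follow the example above; the column types and the choice of Course_Id as the primary key of Course_enrollment are assumptions. SQLite enforces foreign keys only after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite

conn.execute("""
    CREATE TABLE Student (
        Stu_Id   INTEGER PRIMARY KEY,
        Stu_Name TEXT,
        Stu_Age  INTEGER
    )
""")
conn.execute("""
    CREATE TABLE Course_enrollment (
        Course_Id TEXT PRIMARY KEY,
        Stu_Id    INTEGER REFERENCES Student (Stu_Id)
    )
""")

conn.executemany(
    "INSERT INTO Student VALUES (?, ?, ?)",
    [(101, "Chaitanya", 22), (102, "Arya", 26), (103, "Bran", 25), (104, "Jon", 21)],
)
conn.execute("INSERT INTO Course_enrollment VALUES ('C01', 101)")

# Stu_Id 999 does not exist in Student, so this enrollment is rejected.
try:
    conn.execute("INSERT INTO Course_enrollment VALUES ('C99', 999)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(orphan_rejected)
```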
Constraints in DBMS
LAST UPDATED: NOVEMBER 17, 2018 BY CHAITANYA SINGH | FILED UNDER: DBMS
Constraints enforce limits to the data or type of data that can be inserted/updated/deleted from a
table. The whole purpose of constraints is to maintain the data integrity during an
update/delete/insert into a table. In this tutorial we will learn several types of constraints that can be
created in RDBMS.
Types of constraints
NOT NULL
UNIQUE
DEFAULT
CHECK
Key Constraints – PRIMARY KEY, FOREIGN KEY
Domain constraints
Mapping constraints
NOT NULL:
The NOT NULL constraint makes sure that a column does not hold a NULL value. When we don’t provide a
value for a particular column while inserting a record into a table, it takes the NULL value by default. By
specifying the NOT NULL constraint, we can be sure that a particular column cannot have NULL values.
Example:
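A minimal sketch of NOT NULL in action, run through Python's built-in sqlite3 module. The STUDENT columns mirror those used elsewhere in this tutorial; the sample values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE STUDENT (
        ROLL_NO  INTEGER NOT NULL,
        STU_NAME VARCHAR(35) NOT NULL,
        STU_AGE  INTEGER NOT NULL
    )
""")

# STU_NAME is omitted, so it would take NULL by default and the insert fails.
try:
    conn.execute("INSERT INTO STUDENT (ROLL_NO, STU_AGE) VALUES (1, 20)")
    null_rejected = False
except sqlite3.IntegrityError:
    null_rejected = True

print(null_rejected)
```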
UNIQUE:
UNIQUE Constraint enforces a column or set of columns to have unique values. If a column has a
unique constraint, it means that particular column cannot have duplicate values in a table.
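As a rough sketch (Python's built-in sqlite3 module, hypothetical sample rows), a UNIQUE column rejects a second row carrying a value that is already present:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE STUDENT (
        ROLL_NO  INTEGER NOT NULL UNIQUE,
        STU_NAME VARCHAR(35) NOT NULL
    )
""")
conn.execute("INSERT INTO STUDENT VALUES (1, 'Ajeet')")

# ROLL_NO 1 already exists, so the UNIQUE constraint blocks this row.
try:
    conn.execute("INSERT INTO STUDENT VALUES (1, 'Steve')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(duplicate_rejected)
```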
DEFAULT:
The DEFAULT constraint provides a default value to a column when there is no value provided while
inserting a record into a table.
CREATE TABLE STUDENT(
ROLL_NO INT NOT NULL,
STU_NAME VARCHAR (35) NOT NULL,
STU_AGE INT NOT NULL,
EXAM_FEE INT DEFAULT 10000,
STU_ADDRESS VARCHAR (35) ,
PRIMARY KEY (ROLL_NO)
);
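To see the default being applied, the sketch below runs the same table definition through Python's built-in sqlite3 module and inserts a row without an EXAM_FEE value. The inserted student is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE STUDENT (
        ROLL_NO     INT NOT NULL,
        STU_NAME    VARCHAR(35) NOT NULL,
        STU_AGE     INT NOT NULL,
        EXAM_FEE    INT DEFAULT 10000,
        STU_ADDRESS VARCHAR(35),
        PRIMARY KEY (ROLL_NO)
    )
""")

# EXAM_FEE is not listed, so the DEFAULT value is stored for it.
conn.execute(
    "INSERT INTO STUDENT (ROLL_NO, STU_NAME, STU_AGE) VALUES (1, 'Ajeet', 20)"
)
exam_fee = conn.execute(
    "SELECT EXAM_FEE FROM STUDENT WHERE ROLL_NO = 1"
).fetchone()[0]
print(exam_fee)
```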
CHECK:
This constraint is used for specifying range of values for a particular column of a table. When this
constraint is being set on a column, it ensures that the specified column must have the value falling in
the specified range.
In the above example we have set the check constraint on ROLL_NO column of STUDENT table. Now,
the ROLL_NO field must have the value greater than 1000.
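A minimal sketch of such a CHECK constraint, using Python's built-in sqlite3 module. The rule ROLL_NO > 1000 comes from the text; the remaining columns and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE STUDENT (
        ROLL_NO  INTEGER NOT NULL CHECK (ROLL_NO > 1000),
        STU_NAME VARCHAR(35) NOT NULL
    )
""")

conn.execute("INSERT INTO STUDENT VALUES (1001, 'Ajeet')")  # satisfies the check

# 500 is not greater than 1000, so the row is rejected.
try:
    conn.execute("INSERT INTO STUDENT VALUES (500, 'Steve')")
    out_of_range_rejected = False
except sqlite3.IntegrityError:
    out_of_range_rejected = True

print(out_of_range_rejected)
```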
Key constraints:
PRIMARY KEY:
Primary key uniquely identifies each record in a table. It must have unique values and cannot contain
nulls. In the below example the ROLL_NO field is marked as primary key, that means the ROLL_NO
field cannot have duplicate and null values.
FOREIGN KEY:
Foreign keys are the columns of a table that point to the primary key of another table. They act as a
cross-reference between tables.
Read more about it here.
Domain constraints:
Each table has a certain set of columns, and each column allows only one type of data, based on its data
type. The column does not accept values of any other data type.
Domain constraints are user-defined data types, and we can define them like this:
Domain Constraint = data type + Constraints (NOT NULL / UNIQUE / PRIMARY KEY / FOREIGN KEY /
CHECK / DEFAULT)
Mapping constraints:
Read about Mapping constraint here.
A table in DBMS is a set of rows and columns that contain data. Columns in a table have unique names and are
often referred to as attributes in DBMS. A domain is the set of values permitted for an attribute in a
table. For example, a domain of month-of-year can accept January, February, …, December as possible
values; a domain of integers can accept whole numbers that are negative, positive or zero.
Definition: Domain constraints are user defined data type and we can define them like this:
Domain Constraint = data type + Constraints (NOT NULL / UNIQUE / PRIMARY KEY / FOREIGN KEY /
CHECK / DEFAULT)
Example:
For example I want to create a table “student_info” with “stu_id” field having value greater than 100, I
can create a domain and table like this:
Another example:
I want to create a table “bank_account” with “account_type” field having value either “checking” or
“saving”:
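The domain definition itself is not shown here. SQLite, which the sketch below uses through Python's built-in sqlite3 module, has no CREATE DOMAIN statement, so the same restriction is expressed as a column-level CHECK instead. This is a substitute technique rather than a real domain, and the account_nbr column and sample values are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# The CHECK clause plays the role of the domain: data type plus constraint.
conn.execute("""
    CREATE TABLE bank_account (
        account_nbr  INTEGER PRIMARY KEY,
        account_type VARCHAR(12) NOT NULL
            CHECK (account_type IN ('checking', 'saving'))
    )
""")

conn.execute("INSERT INTO bank_account VALUES (1, 'checking')")

# 'loan' is outside the permitted set of values.
try:
    conn.execute("INSERT INTO bank_account VALUES (2, 'loan')")
    invalid_type_rejected = False
except sqlite3.IntegrityError:
    invalid_type_rejected = True

print(invalid_type_rejected)
```

In a DBMS that supports domains, the data type and the CHECK would be declared once under a name and reused by every column of that domain.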
Mapping Cardinality:
One to One: An entity of entity-set A can be associated with at most one entity of entity-set B and an
entity in entity-set B can be associated with at most one entity of entity-set A.
One to Many: An entity of entity-set A can be associated with any number of entities of entity-set B
and an entity in entity-set B can be associated with at most one entity of entity-set A.
Many to One: An entity of entity-set A can be associated with at most one entity of entity-set B and an
entity in entity-set B can be associated with any number of entities of entity-set A.
Many to Many: An entity of entity-set A can be associated with any number of entities of entity-set B
and an entity in entity-set B can be associated with any number of entities of entity-set A.
Example:
CREATE TABLE Customer (
customer_id int PRIMARY KEY NOT NULL,
first_name varchar(20),
last_name varchar(20)
);
Assuming that a customer orders more than once, the relationship between the customer table and the order table is one to many.
Similarly, we can achieve other mapping constraints based on the requirements.
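The one-to-many case can be sketched with Python's built-in sqlite3 module. The Customer table is the one defined above; the Orders table, its columns and the sample rows are hypothetical, since only the customer side is shown in this text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.execute("""
    CREATE TABLE Customer (
        customer_id int PRIMARY KEY NOT NULL,
        first_name varchar(20),
        last_name varchar(20)
    )
""")

# Hypothetical "many" side: each order row points at exactly one customer.
conn.execute("""
    CREATE TABLE Orders (
        order_id int PRIMARY KEY NOT NULL,
        customer_id int NOT NULL REFERENCES Customer (customer_id),
        order_details varchar(50)
    )
""")

conn.execute("INSERT INTO Customer VALUES (1, 'Steve', 'Rogers')")
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?, ?)",
    [(10, 1, "first order"), (11, 1, "second order")],
)

orders_for_customer = conn.execute(
    "SELECT COUNT(*) FROM Orders WHERE customer_id = 1"
).fetchone()[0]
print(orders_for_customer)
```

Adding a UNIQUE constraint on Orders.customer_id would limit each customer to a single order row, which turns the same structure into a one-to-one mapping.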
Cardinality in DBMS
LAST UPDATED: NOVEMBER 17, 2018 BY CHAITANYA SINGH | FILED UNDER: DBMS
In DBMS you may hear the term cardinality in two different places, and it has two different meanings as
well.
One to One – A single row of the first table associates with a single row of the second table. For example, the
relationship between the person and passport tables is one to one because a person can have only one
passport and a passport can be assigned to only one person.
One to Many – A single row of the first table associates with more than one row of the second table. For
example, the relationship between the customer and order tables is one to many because a customer can
place many orders but an order can be placed by a single customer alone.
Many to One – Many rows of the first table associate with a single row of the second table. For example, the
relationship between student and university is many to one because a university can have many
students but a student can study in only a single university at a time.
Many to Many – Many rows of the first table associate with many rows of the second table. For example, the
relationship between the student and course tables is many to many because a student can take many
courses at a time and a course can be assigned to many students.
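A many-to-many relationship is commonly stored with a third, linking table. The sketch below shows that approach using Python's built-in sqlite3 module; the enrollment table and every name and row in it are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (student_id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE TABLE course (course_id INTEGER PRIMARY KEY, title TEXT)")

# Linking table: one row per (student, course) pair.
conn.execute("""
    CREATE TABLE enrollment (
        student_id INTEGER REFERENCES student (student_id),
        course_id  INTEGER REFERENCES course (course_id),
        PRIMARY KEY (student_id, course_id)
    )
""")

conn.executemany("INSERT INTO student VALUES (?, ?)", [(1, "Arya"), (2, "Bran")])
conn.executemany("INSERT INTO course VALUES (?, ?)", [(10, "Databases"), (20, "Networks")])
conn.executemany("INSERT INTO enrollment VALUES (?, ?)", [(1, 10), (1, 20), (2, 10)])

# Student 1 takes two courses; course 10 has two students.
courses_of_student_1 = conn.execute(
    "SELECT COUNT(*) FROM enrollment WHERE student_id = 1"
).fetchone()[0]
students_in_course_10 = conn.execute(
    "SELECT COUNT(*) FROM enrollment WHERE course_id = 10"
).fetchone()[0]
print(courses_of_student_1, students_in_course_10)
```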
The attributes of a table are said to be dependent on each other when an attribute of a table uniquely
identifies another attribute of the same table.
For example: Suppose we have a student table with attributes Stu_Id, Stu_Name and Stu_Age. Here the Stu_Id
attribute uniquely identifies the Stu_Name attribute of the student table, because if we know the student id
we can tell the student name associated with it. This is known as a functional dependency and can be
written as Stu_Id->Stu_Name; in words, Stu_Name is functionally dependent on Stu_Id.
Formally:
If column A of a table uniquely identifies column B of the same table, then it can be represented as A->B
(attribute B is functionally dependent on attribute A).
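One way to make the definition concrete is to test it against sample rows: A->B holds in the data if no two rows agree on A while differing on B. The helper name `holds` and the rows below are hypothetical, and a check like this can only refute a dependency for given data, not prove it for the schema.

```python
def holds(rows, lhs, rhs):
    """Return True if the functional dependency lhs -> rhs holds in `rows`."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True

# Hypothetical student rows; two students share the name Steve.
students = [
    {"Stu_Id": 101, "Stu_Name": "Steve", "Stu_Age": 23},
    {"Stu_Id": 102, "Stu_Name": "John", "Stu_Age": 24},
    {"Stu_Id": 103, "Stu_Name": "Steve", "Stu_Age": 25},
]

id_determines_name = holds(students, ["Stu_Id"], ["Stu_Name"])
name_determines_id = holds(students, ["Stu_Name"], ["Stu_Id"])
print(id_determines_name, name_determines_id)
```

Stu_Id->Stu_Name holds here, while Stu_Name->Stu_Id fails because the name Steve maps to two different ids.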
The dependency of an attribute on a set of attributes is known as a trivial functional dependency if the
set of attributes includes that attribute.
For example: Consider a table with two columns Student_Id and Student_Name.
{Student_Id, Student_Name} -> Student_Id is a trivial functional dependency, as Student_Id is a subset of
{Student_Id, Student_Name}.
Also, Student_Id -> Student_Id & Student_Name -> Student_Name are trivial dependencies too.
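Since the definition is just a subset test, it can be written as a one-line check. The function name `is_trivial` is an assumption made for this sketch.

```python
def is_trivial(lhs, rhs):
    """X -> Y is trivial exactly when Y is a subset of X."""
    return set(rhs) <= set(lhs)

# The dependencies from the example above.
pair_to_id = is_trivial({"Student_Id", "Student_Name"}, {"Student_Id"})
id_to_id = is_trivial({"Student_Id"}, {"Student_Id"})

# For contrast: Student_Name is not contained in {Student_Id}.
id_to_name = is_trivial({"Student_Id"}, {"Student_Name"})

print(pair_to_id, id_to_id, id_to_name)
```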
If a functional dependency X->Y holds true where Y is not a subset of X, then this dependency is called a
non-trivial functional dependency.
For example:
An employee table with three attributes: emp_id, emp_name, emp_address.
The following functional dependencies are non-trivial:
emp_id -> emp_name (emp_name is not a subset of emp_id)
emp_id -> emp_address (emp_address is not a subset of emp_id)
Multivalued dependency occurs when there is more than one independent multivalued attribute in a
table.
For example: Consider a bike manufacturing company, which produces two colors (black and white) of
each model every year.
Here the columns manuf_year and color are independent of each other and dependent on bike_model. In
this case these two columns are said to be multivalued dependent on bike_model. These
dependencies can be represented like this:
bike_model ->> manuf_year
bike_model ->> color
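A multivalued dependency X ->> Y can be checked on sample rows with the usual "swap" test: whenever two rows agree on X, the row that combines the Y value of the first with the remaining attributes of the second must also be present. The function name `mvd_holds` and the bike rows below are hypothetical.

```python
def project(row, columns):
    """Values of `row` for the given columns, as a tuple."""
    return tuple(row[c] for c in columns)

def mvd_holds(rows, x, y):
    """Check the multivalued dependency x ->> y on a list of dict rows."""
    z = [a for a in rows[0] if a not in x and a not in y]  # remaining attributes
    present = {(project(r, x), project(r, y), project(r, z)) for r in rows}
    for t1 in rows:
        for t2 in rows:
            if project(t1, x) == project(t2, x):
                # The row mixing t1's y-part with t2's z-part must exist.
                if (project(t1, x), project(t1, y), project(t2, z)) not in present:
                    return False
    return True

# Every year is paired with every color for the model.
bikes = [
    {"bike_model": "M1001", "manuf_year": 2007, "color": "Black"},
    {"bike_model": "M1001", "manuf_year": 2007, "color": "White"},
    {"bike_model": "M1001", "manuf_year": 2008, "color": "Black"},
    {"bike_model": "M1001", "manuf_year": 2008, "color": "White"},
]

full_table_ok = mvd_holds(bikes, ["bike_model"], ["manuf_year"])
missing_row_ok = mvd_holds(bikes[:-1], ["bike_model"], ["manuf_year"])
print(full_table_ok, missing_row_ok)
```

Dropping one year and color combination breaks the dependency, because year and color are then no longer independent for that model.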
X -> Z is a transitive dependency if the following three functional dependencies hold true:
X->Y
Y does not ->X
Y->Z
Note: A transitive dependency can only occur in a relation of three or more attributes. This
dependency helps us normalize the database to 3NF (3rd Normal Form).
{Book} ->{Author} (if we know the book, we know the author name)
{Author} does not ->{Book}
{Author} -> {Author_age}
Therefore, as per the rule of transitive dependency, {Book} -> {Author_age} should hold. That makes
sense because if we know the book name we can know the author’s age.
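The three conditions can be checked mechanically on sample data. In the sketch below the helper `determines` and all book titles, author names and ages are hypothetical placeholders.

```python
def determines(rows, a, b):
    """True if column a functionally determines column b in `rows`."""
    mapping = {}
    return all(mapping.setdefault(r[a], r[b]) == r[b] for r in rows)

# Placeholder data: Author A wrote two of the three books.
books = [
    {"Book": "Title 1", "Author": "Author A", "Author_age": 66},
    {"Book": "Title 2", "Author": "Author B", "Author_age": 49},
    {"Book": "Title 3", "Author": "Author A", "Author_age": 66},
]

# X -> Y, Y does not -> X, and Y -> Z, with X = Book, Y = Author, Z = Author_age.
is_transitive = (
    determines(books, "Book", "Author")
    and not determines(books, "Author", "Book")
    and determines(books, "Author", "Author_age")
)

book_determines_age = determines(books, "Book", "Author_age")
print(is_transitive, book_determines_age)
```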
Comments
Jennifer says
DECEMBER 7, 2016 AT 2:17 AM
{Book} ->{Author} (if we know the book, we knows (KNOW*) the author name)
Normalization is the process of organizing the data in a database to avoid data redundancy and the insertion,
update and deletion anomalies. Let’s discuss the anomalies first, then we will discuss
normal forms with examples.
Anomalies in DBMS
There are three types of anomalies that occur when the database is not normalized. These are:
Insertion, update and deletion anomaly. Let’s take an example to understand this.
Example: A manufacturing company stores the employee details in a table Employee that has four
attributes: Emp_Id for storing employee’s id, Emp_Name for storing employee’s name, Emp_Address for
storing employee’s address and Emp_Dept for storing the department details in which the employee
works. At some point of time the table looks like this:
This table is not normalized. We will see the problems that we face when a table in database is not
normalized.
Update anomaly: In the above table we have two rows for employee Rick as he belongs to two
departments of the company. If we want to update the address of Rick then we have to update the
same in two rows or the data will become inconsistent. If somehow, the correct address gets updated
in one department but not in other then as per the database, Rick would be having two different
addresses, which is not correct and would lead to inconsistent data.
Insert anomaly: Suppose a new employee joins the company, who is under training and currently not
assigned to any department then we would not be able to insert the data into the table if Emp_Dept
field doesn’t allow null.
Delete anomaly: Let’s say in future, company closes the department D890 then deleting the rows that
are having Emp_Dept as D890 would also delete the information of employee Maggie since she is
assigned only to this department.
To overcome these anomalies we need to normalize the data. In the next section we will discuss
about normalization.
Normalization
Here are the most commonly used normal forms:
As per the rule of first normal form, an attribute (column) of a table cannot hold multiple values. It
should hold only atomic values.
Example: Let’s say a company wants to store the names and contact details of its employees. It
creates a table in the database that looks like this:
Emp_Id Emp_Name Emp_Address Emp_Mobile
102 Jon Kanpur 8812121212, 9900012222
103 Ron Chennai 7778881212
104 Lester Bangalore 9990000123, 8123450987
Two employees (Jon & Lester) have two mobile numbers each, which causes the Emp_Mobile field to hold
multiple values for these two employees.
This table is not in 1NF: the rule says “each attribute of a table must have atomic (single) values”, and
the Emp_Mobile values for employees Jon & Lester violate that rule.
To make the table comply with 1NF we need to create a separate row for each mobile number, in
such a way that none of the attributes contains multiple values.
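The row-splitting step can be sketched in a few lines of Python. The three employees are the rows visible above; holding the mobile numbers in a list is just a convenient way to model the multi-valued field for this illustration.

```python
# Non-1NF shape: the last field holds several mobile numbers.
employees = [
    (102, "Jon", "Kanpur", ["8812121212", "9900012222"]),
    (103, "Ron", "Chennai", ["7778881212"]),
    (104, "Lester", "Bangalore", ["9990000123", "8123450987"]),
]

# 1NF shape: one row per (employee, mobile number), every value atomic.
rows_1nf = [
    (emp_id, name, address, mobile)
    for emp_id, name, address, mobiles in employees
    for mobile in mobiles
]

for row in rows_1nf:
    print(row)
```

Jon and Lester now appear in two rows each, one per mobile number, and Ron in one.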
An attribute that is not part of any candidate key is known as non-prime attribute.
Example: Let’s say a school wants to store the data of teachers and the subjects they teach. They
create a table Teacher that looks like this. Since a teacher can teach more than one subject, the table
can have multiple rows for the same teacher.
Teacher_Id Subject Teacher_Age
111 Maths 38
111 Physics 38
222 Biology 38
333 Physics 40
333 Chemistry 40
This table is in 1NF because each attribute has atomic values. However, it is not in 2NF because the
non-prime attribute Teacher_Age is dependent on Teacher_Id alone, which is a proper subset of the candidate
key {Teacher_Id, Subject}. This violates the rule for 2NF, which says “no non-prime attribute is dependent on a proper
subset of any candidate key of the table”.
To make the table comply with 2NF we can decompose it into two tables like this:
Teacher_Details table:
Teacher_Id Teacher_Age
111 38
222 38
333 40
Teacher_Subject table:
Teacher_Id Subject
111 Maths
111 Physics
222 Biology
333 Physics
333 Chemistry
Now the tables are in Second normal form (2NF).
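The decomposition can be reproduced in plain Python, and joining the two resulting tables on Teacher_Id gives back the original rows, so no information is lost. The variable names are chosen for this sketch; the data is the Teacher table from the example.

```python
# Original Teacher table: (Teacher_Id, Subject, Teacher_Age).
teacher = [
    (111, "Maths", 38),
    (111, "Physics", 38),
    (222, "Biology", 38),
    (333, "Physics", 40),
    (333, "Chemistry", 40),
]

# Teacher_Details: the age is stored once per teacher.
teacher_details = sorted({(tid, age) for tid, _, age in teacher})

# Teacher_Subject: one row per subject taught.
teacher_subject = [(tid, subject) for tid, subject, _ in teacher]

# Joining the two tables on Teacher_Id rebuilds the original table.
ages = dict(teacher_details)
rejoined = [(tid, subject, ages[tid]) for tid, subject in teacher_subject]

print(teacher_details)
print(rejoined == teacher)
```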
An attribute that is not part of any candidate key is known as non-prime attribute.
In other words 3NF can be explained like this: A table is in 3NF if it is in 2NF and for each functional
dependency X-> Y at least one of the following conditions holds:
X is a super key of the table
Y is a prime attribute of the table
An attribute that is a part of one of the candidate keys is known as a prime attribute.
Example: Let’s say a company wants to store the complete address of each employee, they create a
table named Employee_Details that looks like this:
Here, Emp_State, Emp_City & Emp_District are dependent on Emp_Zip. Further, Emp_Zip is dependent on
Emp_Id, which makes the non-prime attributes (Emp_State, Emp_City & Emp_District) transitively
dependent on the super key (Emp_Id). This violates the rule of 3NF.
To make this table comply with 3NF we have to decompose it into two tables to remove the
transitive dependency:
Employee Table:
Emp_Id Emp_Name Emp_Zip
Employee_Zip table:
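The two-table layout can be tried with Python's built-in sqlite3 module. The table and column names follow the text; the column types and all rows are hypothetical placeholders. The sketch also shows that two employees with the same zip share a single Employee_Zip row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.execute("""
    CREATE TABLE Employee_Zip (
        Emp_Zip      TEXT PRIMARY KEY,
        Emp_State    TEXT,
        Emp_City     TEXT,
        Emp_District TEXT
    )
""")
conn.execute("""
    CREATE TABLE Employee (
        Emp_Id   INTEGER PRIMARY KEY,
        Emp_Name TEXT,
        Emp_Zip  TEXT REFERENCES Employee_Zip (Emp_Zip)
    )
""")

# Placeholder data: both employees live in the same zip area.
conn.execute("INSERT INTO Employee_Zip VALUES ('11111', 'State A', 'City A', 'District A')")
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?)",
    [(1001, "John", "11111"), (1002, "Ajeet", "11111")],
)

zip_rows = conn.execute("SELECT COUNT(*) FROM Employee_Zip").fetchone()[0]
joined = conn.execute("""
    SELECT e.Emp_Name, z.Emp_City
    FROM Employee e JOIN Employee_Zip z ON e.Emp_Zip = z.Emp_Zip
    ORDER BY e.Emp_Id
""").fetchall()
print(zip_rows, joined)
```

Because Emp_Zip is the key of Employee_Zip, the state, city and district for a zip are stored once and cannot disagree between employees.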
Example: Suppose there is a company wherein employees work in more than one department. They
store the data like this:
The table is not in BCNF as neither Emp_Id nor Emp_Dept alone is a key.
To make the table comply with BCNF we can break it into three tables like this:
Emp_Nationality table:
Emp_Id Emp_Nationality
1001 Austrian
1002 American
Emp_Dept table:
Emp_Dept_Mapping table:
Emp_Id Emp_Dept
1001 stores
1002 design and technical support
Functional dependencies:
Emp_Id -> Emp_Nationality
Emp_Dept -> {Dept_Type, Dept_No_Of_Emp}
Candidate keys:
For first table: Emp_Id
For second table: Emp_Dept
For third table: {Emp_Id, Emp_Dept}
These tables are now in BCNF, as in both functional dependencies the left side is a key.
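The BCNF condition can be checked with an attribute-closure computation. The sketch below is simplified: for each table it only looks at the listed functional dependencies whose attributes all lie inside that table, which is enough for this example but is not a general projection of dependencies. The function names are assumptions; the attributes and dependencies are the ones listed above.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under the functional dependencies `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def bcnf_violations(relation, fds):
    """Non-trivial FDs inside `relation` whose left side is not a super key."""
    relation = set(relation)
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if set(lhs) | set(rhs) <= relation      # FD lies inside this table
        and not set(rhs) <= set(lhs)            # non-trivial
        and not relation <= closure(lhs, fds)   # left side is not a super key
    ]

fds = [
    (("Emp_Id",), ("Emp_Nationality",)),
    (("Emp_Dept",), ("Dept_Type", "Dept_No_Of_Emp")),
]

original = ("Emp_Id", "Emp_Nationality", "Emp_Dept", "Dept_Type", "Dept_No_Of_Emp")
decomposed = [
    ("Emp_Id", "Emp_Nationality"),
    ("Emp_Dept", "Dept_Type", "Dept_No_Of_Emp"),
    ("Emp_Id", "Emp_Dept"),
]

violations_before = bcnf_violations(original, fds)
violations_after = [bcnf_violations(table, fds) for table in decomposed]
print(len(violations_before), violations_after)
```

Both dependencies violate BCNF in the original table, and none of the three decomposed tables has a violation.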
Comments
Mahak says
NOVEMBER 17, 2015 AT 8:14 AM
Are you sure, that the example you given for Third Normal form (3NF) is correct. I have
doubt, In the employee table and employee_zip table you relate ZIP in both tables but what
If two employes having the same zip which record will be fetched from the employee_zip
table ??
If two employees have the same zip, they will share the row in the zip table. There
does not need to be two rows in the zip table and indeed, there should not be two rows
in the zip table.
WRONG IF WE CREATE NEW ZIP TABLE THEN WE CAN SEARCH THERE ZIP BYE
NAME ALSO ..
amit says
APRIL 22, 2017 AT 8:13 PM
name is not a prime attribute because multiple students can have same name
and each student may have a different zip
sagar -441124
sagar -345632
In employee table there will be 2 employees with same zip code but in employee_zip
table there will be 1 record related to that zip code.The tables are related by zip
code.So only 1 record will be fetched from employee_zip table. Hope you get the
answer.
Steve says
DECEMBER 16, 2015 AT 10:30 PM
Mahak, That is the point they are trying to make is that many employees could be
related to 1 Zip record. There would only be 1 entry in the Zip table per zip, since that’s
the key. That is the point of 3NF, is to denormalize the duplicate data in the Employee
table. Good luck!
Robert Luse says
DECEMBER 7, 2015 AT 3:08 AM
As part of Normalization, there will be only one row for the the zip, not two. If two
employees have the same zip, they will both use the information for that zip in the zip
table.
Omenesa says
APRIL 28, 2017 AT 1:04 PM
We should imagine a case scenario where two employees have the same zip code but
different emp districts or emp city, which record will be fetched in such a scenario.
Kalpesh says
MARCH 9, 2016 AT 6:40 AM
Hi there,
I have read whole article of Normalization and I must say, it a best explanation with
examples.
Examples are very useful for better understating the concept. I am really very thankful to
you for the blog.
Thank you.
Reply
Pushpa says
MAY 5, 2016 AT 7:48 AM
Hi Chaitanya,
Best Wishes,
Pushpa
Reply
aman says
MAY 27, 2016 AT 7:18 PM
This topic was not understandable from book .after reading this I finally got it. Thank u.
Reply
Sid says
JUNE 26, 2016 AT 5:42 PM
How is teacher_I’d, subject be the candidate key? Subject is redundant and only teacher I’d
shld be sufficient.
Reply
Consider teacher_id 111, it is having two different subjects maths and physics. So only
teacher_id cannot determine the complete row. Therefore subject is also required.
Reply
Jaswinder says
NOVEMBER 12, 2016 AT 6:52 AM
Teacher I’d alone cannot be the Candidate key because there will be many entries for a
particular teacher as teacher can teach multiple subjects .And to fulfill criteria of
becoming candidate key there should be unique values.
Reply
A candidate key should be able to UNIQUELY IDENTIFY a row in a table. In the case of
the teacher table, their are two rows in the table that can be identified with the
teacher_id 111. If we are given teacher_id 111, we cannot discern if we need the
record for subject ‘Maths’ or the record for subject ‘physics’. Therefore, teacher_id is
not sufficient to uniquely identify a row. Likewise, as there are two rows with the
teacher_id 111 and the teacher_age 38, these are also insufficient. The only minimal
combination of attributes that uniquely identify a given row is {teacher_id, subject}.
Reply
Tharun Kumar Sunku says
JULY 26, 2016 AT 5:48 AM
Reply
sandeep says
AUGUST 30, 2016 AT 11:49 AM
hi chaitanya,
you explained in a single table to partition into different tables so it is easy to understand
but my doubt is to how to partition those tables so please provide some information about
how to partition a table
And also one thing before using those keys it is better to briefly explain about the keys so it
is easy to understand
Reply
Seunfunmi says
OCTOBER 9, 2016 AT 12:22 PM
Very useful information. Thank you for this article. I read the textbook but did not
understand. Now I understand 1NF and 2NF. I’m still not fully clear with the 3NF and the
BCNF though. Pls anyone with more detailed information?
Reply
Reply
Keynan says
NOVEMBER 24, 2016 AT 7:16 AM
Hi
Very good explanation.
I have one question: dosen’t the example you gave on the BCNF(before the BCNF solution)
also break the second rule? because non prime attributes depends on only subset of the
candidate key? for an example: the dept_type and dept_no_of_emp are only depended on a
subset of the candidate key which is emp_dept
Thanks
Reply
amit says
APRIL 22, 2017 AT 8:29 PM
In first table they are dependent, that is the violation of the 3NF. That’s why we
decomposed the table and in second table Emp_dept is super key or candidate key not
a subset of candidate key
just like foreign key concept
Reply
Anugya says
DECEMBER 31, 2016 AT 10:19 AM
Reply
Isn’t the attribute emp_zip also a candidate key(3NF example)? If yes then wouldnt it
violate the 3NF rule in the next table?
Reply
PuddiMan says
FEBRUARY 8, 2017 AT 12:47 PM
I don’t understand the example in BCNF. There are 2 primary keys, emp_id and emp_dept.
This violates 2NF rules, emp_nationality can be determined by only emp_id. So in the first
place, it is not in 2nf, why proceed to bcnf process?
Reply
Ninja says
FEBRUARY 12, 2017 AT 9:08 AM
Thanks a lot … 2morrow is my exam and this post really helped me.. Thanks a lot….
Reply
Reply
rahul says
JULY 12, 2017 AT 10:12 AM
Reply
Copyright © 2012 – 2024 BeginnersBook . Privacy Policy . Sitemap
Denormalization in DBMS
LAST UPDATED: AUGUST 25, 2022 BY CHAITANYA SINGH | FILED UNDER: DBMS
Note:
1. Denormalization is not a reverse of normalization in DBMS.
2. Denormalization is not suitable for every scenario (this is discussed in detail in this article after the
following example).
Denormalization Example
There are two tables, Department and Employee. The Department table contains the department id
(attribute name: Dept_Id), the department name (attribute name: Dept_Name) and the employee id
(attribute name: Emp_Id). The Employee table contains fields such as employee id, name and age.
Department Table
Employee Table
Now, every time we need to access the department information along with employee details
such as the employee name, we need to join these two tables. One way of avoiding this join
operation is to denormalize the Department table like this:
Department table:
Employee Table:
After this denormalization, whenever we need to get the department data along with the employee
name, we do not need to join these tables as the Employee details are already present in the
Department table. This way, we avoided the join operations but we had to store the extra data in the
database. Along with that
1. When the redundant data doesn’t require to be updated frequently or doesn’t update at all. In our
example above, the redundant data is employee name and name doesn’t change frequently, thus it is
an ideal case where the denormalization can be safely used.
2. When there is a need to join multiple tables frequently in order to get meaningful data. In this case,
denormalization can significantly boost the performance of read operations at the cost of extra
storage space in the database.
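The trade-off can be sketched with Python's sqlite3 module. The table layout follows the example above, but the rows, the lowercase names and the extra table department_denorm are assumptions made for illustration. The sketch shows that the denormalized table answers the same question without a join, and that the redundant copy goes stale if only one place is updated.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, emp_name TEXT, emp_age INTEGER);
CREATE TABLE department (dept_id TEXT, dept_name TEXT, emp_id INTEGER);
-- Denormalized variant: emp_name is stored redundantly next to emp_id.
CREATE TABLE department_denorm (dept_id TEXT, dept_name TEXT, emp_id INTEGER, emp_name TEXT);
""")
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)",
                 [(1, "Asha", 31), (2, "Ravi", 28)])
conn.executemany("INSERT INTO department VALUES (?, ?, ?)",
                 [("D1", "Sales", 1), ("D2", "Support", 2)])
conn.executemany("INSERT INTO department_denorm VALUES (?, ?, ?, ?)",
                 [("D1", "Sales", 1, "Asha"), ("D2", "Support", 2, "Ravi")])

# Normalized read: needs a join to get the employee name.
with_join = conn.execute("""
    SELECT d.dept_id, d.dept_name, e.emp_name
    FROM department AS d JOIN employee AS e ON e.emp_id = d.emp_id
    ORDER BY d.dept_id
""").fetchall()

# Denormalized read: the same answer from a single table.
without_join = conn.execute(
    "SELECT dept_id, dept_name, emp_name FROM department_denorm ORDER BY dept_id"
).fetchall()

# The cost: updating the name in one table leaves the redundant copy stale.
conn.execute("UPDATE employee SET emp_name = 'Asha K' WHERE emp_id = 1")
stale = conn.execute(
    "SELECT emp_name FROM department_denorm WHERE emp_id = 1"
).fetchone()[0]
print(with_join == without_join, stale)
```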
Advantages of Denormalization
1. Read Operations are faster as table joins are not required for most of the queries.
2. Queries are easier to write because fewer tables are involved.
Disadvantages of Denormalization
1. Requires more storage as redundant data needs to be written in the tables.
2. Data write operations are slower due to redundant data.
3. Data inconsistencies may arise due to redundant data.
4. It requires extra effort to update the database. This is because when redundant data is present, it is
important to update the data in all the places, otherwise data inconsistencies may arise.
In this guide, you will learn the difference between Denormalization and Normalization.
What is Denormalization
Denormalization is a process of adding redundant data to tables in order to get faster response time
for read operations. However this better performance comes with a cost of storing redundant data
that occupies additional storage in the database.
What is Normalization
Normalization is a process of breaking a table into multiple tables in such a way that the
redundant data is reduced. This removes data inconsistencies and helps in maintaining DBMS ACID
properties.
DENORMALIZATION: SQL queries are easy to write as they involve fewer tables.
NORMALIZATION: SQL queries are complex as they usually involve multiple tables.
Decomposition is a process of dividing a relation into multiple relations to remove redundancy while
maintaining the original data. In this guide, you will learn decomposition in DBMS with the help of
examples.
Types of decomposition:
1. Lossless decomposition
2. Lossy decomposition
1. Lossless decomposition
A lossless decomposition of a relation ensures that:
a) No information is lost during decomposition. This is why the term lossless is used in this
decomposition as no information is lost.
b) If a relation R is divided into two relations R1 and R2 using lossless decomposition then the natural
join of R1 and R2 would return the original relation R.
Rules of Lossless decomposition: For these rules, we are assuming that a relation R is divided into
two relations R1 and R2.
1. The union of the attributes of R1 and R2 must be equal to the attributes of R.
R1 U R2 = R
2. The intersection of R1 and R2 must not be empty, that is, the two relations must have at least one
attribute in common.
R1 ∩ R2 ≠ ∅
3. The intersection of R1 and R2 must be a super key of R1, of R2, or of both.
Let’s say a relation R (A, B, C), where A is the primary key, is divided into two relations R1 (A, B) and
R2 (C, A).
Rule 2:
R1 ∩ R2 = (A, B) ∩ (C, A) = (A)
Result is not null so the second rule also applies here.
Rule 3:
R1 ∩ R2 = (A, B) ∩ (C, A) = (A)
Result is a super key of both the relations thus third rule also applies here.
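The three rules can be sketched as a small Python check. The relation R (A, B, C) with primary key A is from the example, so the dependency A -> {B, C} is used; the function names closure and is_lossless are hypothetical.

```python
def closure(attrs, fds):
    """Return every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_lossless(r, r1, r2, fds):
    """Apply the three rules to a decomposition of r into r1 and r2."""
    if r1 | r2 != r:                 # Rule 1: no attribute may be dropped
        return False
    common = r1 & r2
    if not common:                   # Rule 2: intersection must not be empty
        return False
    determined = closure(common, fds)
    # Rule 3: the common attributes must be a super key of r1 or of r2
    return r1 <= determined or r2 <= determined

R = {"A", "B", "C"}
FDS = [({"A"}, {"B", "C"})]          # A is the primary key of R

print(is_lossless(R, {"A", "B"}, {"C", "A"}, FDS))
print(is_lossless(R, {"A", "B"}, {"B", "C"}, FDS))
```

The second call fails rule 3: the common attribute B does not determine the rest of either relation.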
This table has redundant data as the Course_Id and Course_Detail are common for several students.
Let’s decompose this relation into two relations.
Student Table:
The primary key of this table is {Student_Id, Course_Id}
Student_Id Student_Name Course_Id
---------- ------------ ---------
S101 Chaitanya C01
S102 Ajeet C01
S103 Rahul C02
S104 Steve C02
S105 John C03
S101 Chaitanya C03
S102 Ajeet C02
Course Table:
The primary key of this table is {Course_Id}
Course_Id Course_Detail
--------- -------------
C01 Maths
C02 Science
C03 English
Let’s check all the three rules of lossless decomposition to check whether this decomposition is
lossless or not.
Rule 1:
{Student} U {Course}
Union Result:
The union results in the original relation StudentCourse so we can say that the first rule holds true.
Rule 2 & 3:
R1 ∩ R2
Result:
Course_Id
C01
C02
C03
The result is a super key of the second relation R2 so the third rule also applies here.
Since all three rules apply here, the decomposition of relation StudentCourse into Student and
Course is a lossless decomposition.
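The same result can be checked by actually joining the two tables. The following Python sketch uses the rows of the Student and Course tables listed above; natural_join is a hypothetical helper that assumes both inputs are non-empty lists of dictionaries.

```python
def natural_join(left, right):
    """Join two lists of row dictionaries on all column names they share."""
    common = set(left[0]) & set(right[0])
    joined = []
    for l_row in left:
        for r_row in right:
            if all(l_row[col] == r_row[col] for col in common):
                joined.append({**l_row, **r_row})
    return joined

students = [
    {"Student_Id": sid, "Student_Name": name, "Course_Id": cid}
    for sid, name, cid in [
        ("S101", "Chaitanya", "C01"), ("S102", "Ajeet", "C01"),
        ("S103", "Rahul", "C02"), ("S104", "Steve", "C02"),
        ("S105", "John", "C03"), ("S101", "Chaitanya", "C03"),
        ("S102", "Ajeet", "C02"),
    ]
]
courses = [
    {"Course_Id": cid, "Course_Detail": detail}
    for cid, detail in [("C01", "Maths"), ("C02", "Science"), ("C03", "English")]
]

student_course = natural_join(students, courses)
for row in student_course:
    print(row)
```

Each enrollment row finds exactly one matching course, so the join yields one row per enrollment and no spurious rows.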
2. Lossy Decomposition
As the name suggests, in lossy decomposition, the information is lost during decomposition. The
three rules that we discussed above would not apply in lossy decomposition. In lossy decomposition,
one or more rules will fail.
Student_Id Student_Name
S101 Chaitanya
S102 Ajeet
S103 Rahul
S104 Steve
S105 John
Course Table:
The primary key of this table is {Course_Id}
Course_Id Course_Detail
C01 Maths
C02 Science
C03 English
This is a lossy decomposition as the intersection of Student and Course relation will return null so the
second and third rule of lossless decomposition will fail here.
In this decomposition, the relation of Student and Course is lost, there is no way to form the original
relation from these two relations as the information that suggests who is attending which course is
lost during decomposition.
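A small Python sketch makes the loss visible. It uses the rows of the two tables above; because the tables share no attribute, a natural join degenerates into a Cartesian product, so every student is paired with every course.

```python
students = [
    ("S101", "Chaitanya"), ("S102", "Ajeet"), ("S103", "Rahul"),
    ("S104", "Steve"), ("S105", "John"),
]
courses = [("C01", "Maths"), ("C02", "Science"), ("C03", "English")]

# No shared column: the join condition is empty, so every pair matches.
rejoined = [student + course for student in students for course in courses]

print(len(rejoined))
```

The rejoined relation contains every possible student and course pair, so the real enrollments cannot be told apart from the spurious ones.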
A transaction is a set of logically related operations. For example, you are transferring money from
your bank account to your friend’s account, the set of operations would be like this:
This whole set of operations can be called a transaction. Although only read, write and
update operations are shown in the above example, a transaction can have operations like read, write,
insert, update and delete.
In the above transaction R refers to the Read operation and W refers to the write operation.
The main problem that can happen during a transaction is that the transaction can fail before finishing
all the operations in the set. This can happen due to power failure, system crash etc. This is a
serious problem that can leave the database in an inconsistent state. Assume that the transaction fails after
the third operation (see the example above): then the amount would be deducted from your account but
your friend would not receive it.
Commit: If all the operations in a transaction are completed successfully then commit those changes
to the database permanently.
Rollback: If any of the operations fails, then roll back all the changes done by the previous operations.
Even though these operations help us avoid several issues that may arise during a transaction,
they are not sufficient when two transactions are running concurrently. To handle those problems
we need to understand database ACID properties.
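Commit and rollback can be sketched with Python's built-in sqlite3 module. The account table, the starting balances and the helper names open_bank, transfer and balances are all assumptions made for illustration; the failure is simulated by rejecting a transfer that would overdraw the source account.

```python
import sqlite3

def open_bank():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE account (name TEXT PRIMARY KEY, balance INTEGER)")
    conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 500), ("B", 200)])
    conn.commit()
    return conn

def transfer(conn, src, dst, amount):
    """Move amount from src to dst; either both updates happen or neither."""
    try:
        conn.execute("UPDATE account SET balance = balance - ? WHERE name = ?",
                     (amount, src))
        (balance,) = conn.execute("SELECT balance FROM account WHERE name = ?",
                                  (src,)).fetchone()
        if balance < 0:
            raise ValueError("insufficient funds")   # simulated mid-transaction failure
        conn.execute("UPDATE account SET balance = balance + ? WHERE name = ?",
                     (amount, dst))
        conn.commit()        # Commit: make both changes permanent
        return True
    except Exception:
        conn.rollback()      # Rollback: undo the debit that was already applied
        return False

def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM account"))

bank = open_bank()
print(transfer(bank, "A", "B", 100), balances(bank))
print(transfer(bank, "A", "B", 1000), balances(bank))
```

In the second call the debit has already been executed when the failure is raised, and the rollback restores the balance as if the transfer had never started.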
To ensure the integrity and consistency of data during a transaction (a transaction is a unit of
program that updates various data items), the database system maintains
four properties. These properties are widely known as ACID properties.
Atomicity
This property ensures that either all the operations of a transaction reflect in database or none. The
logic here is simple, transaction is a single unit, it can’t execute partially. Either it executes completely
or it doesn’t, there shouldn’t be a partial execution.
Let’s say first operation passed successfully while second failed, in this case A’s balance would be
300$ while B would be having 700$ instead of 800$. This is unacceptable in a banking system. Either
the transaction should fail without executing any of the operation or it should process both the
operations. The Atomicity property ensures that.
Two key operations are involved in a transaction to maintain the atomicity of the
transaction.
Abort: If there is a failure in the transaction, abort the execution and rollback the changes made by the
transaction.
Consistency
Database must be in consistent state before and after the execution of the transaction. This ensures
that there are no errors in the database at any point of time. Application programmer is responsible for
maintaining the consistency of the database.
Example:
A transferring 1000 dollars to B. A’s initial balance is 2000 and B’s initial balance is 5000.
The data is consistent before and after the execution of the transaction, so this example maintains
the consistency property of the database.
Isolation
A transaction shouldn’t interfere with the execution of another transaction. To preserve the
consistency of the database, each transaction should execute as if it were running in isolation (that
means that even when transactions run concurrently, the result should be the same as if they had run
one after another).
For example account A is having a balance of 400$ and it is transferring 100$ to account B & C both.
So we have two transactions here. Let’s say these transactions run concurrently and both the
transactions read 400$ balance, in that case the final balance of A would be 300$ instead of 200$.
This is wrong.
If the transaction were to run in isolation then the second transaction would have read the correct
balance 300$ (before debiting 100$) once the first transaction went successful.
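The arithmetic of this example can be replayed in a few lines of Python. This is only a simulation of the two orderings with plain variables, not real concurrent code; the function names are hypothetical.

```python
def interleaved(balance, debit=100):
    """Both transactions read the balance before either one writes it."""
    t1_seen = balance              # T1 reads 400
    t2_seen = balance              # T2 also reads 400
    balance = t1_seen - debit      # T1 writes 300
    balance = t2_seen - debit      # T2 overwrites with 300, T1's debit is lost
    return balance

def isolated(balance, debit=100):
    """The second transaction starts only after the first has finished."""
    balance = balance - debit      # T1 reads 400 and writes 300
    balance = balance - debit      # T2 reads 300 and writes 200
    return balance

print(interleaved(400))
print(isolated(400))
```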
Durability
Once a transaction completes successfully, the changes it has made into the database should be
permanent even if there is a system failure. The recovery-management component of database
systems ensures the durability of transaction.
ACID properties are the backbone of a database management system. These properties ensure that
even though there are multiple transactions reading and writing the data in the database, the data is
always correct and consistent. Without ACID properties there is no point in managing the data as it
can’t be trusted or used in a transaction.
Comments
rajneesh says
OCTOBER 11, 2015 AT 12:53 PM
this is my first ever comment on any site..
realy this article help me so lot.
i was searching from 2 hours n all sites giving me stupid examples.
this is awsm thnxxx owner
Reply
In this guide, we will discuss the states of a transaction in DBMS. A transaction in DBMS can be in one
of the following states.
Active State
As we have discussed in the DBMS transaction introduction, a transaction is a sequence of
operations. If a transaction is in execution then it is said to be in active state. It doesn’t matter which
step is in execution: as long as the transaction is executing, it remains in active state.
Failed State
If a transaction is executing and a failure occurs, either a hardware failure or a software failure then
the transaction goes into failed state from the active state.
Partially Committed State
As we can see in the above diagram that a transaction goes into “partially committed” state from the
active state when there are read and write operations present in the transaction.
A transaction contains number of read and write operations. Once the whole transaction is
successfully executed, the transaction goes into partially committed state where we have all the read
and write operations performed on the main memory (local memory) instead of the actual database.
The reason why we have this state is that a transaction can fail during execution. If we were
making the changes in the actual database instead of local memory, the database could be left in an
inconsistent state in case of a failure. Because the changes are held in local memory until the commit,
they can be discarded in case of a failure without affecting the database.
Committed State
If a transaction completes the execution successfully then all the changes made in the local memory
during partially committed state are permanently stored in the database. You can also see in the
above diagram that a transaction goes from partially committed state to committed state when
everything is successful.
Aborted State
As we have seen above, if a transaction fails during execution then the transaction goes into a failed
state. The changes made into the local memory (or buffer) are rolled back to the previous consistent
state and the transaction goes into aborted state from the failed state. Refer the diagram to see the
interaction between failed and aborted state.
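The states described in this guide can be sketched as a small transition table in Python. Only the transitions stated in the text are included; note as an aside that standard textbook diagrams also allow a move from partially committed to failed. The names TRANSITIONS and advance are hypothetical.

```python
# Allowed moves between transaction states, as described in the text above.
TRANSITIONS = {
    "active": {"partially committed", "failed"},
    "partially committed": {"committed"},
    "failed": {"aborted"},
    "committed": set(),     # terminal state
    "aborted": set(),       # terminal state
}

def advance(state, new_state):
    """Return new_state if the move is allowed, otherwise raise ValueError."""
    if new_state not in TRANSITIONS[state]:
        raise ValueError(f"illegal transition: {state} -> {new_state}")
    return new_state

# A successful transaction.
state = "active"
state = advance(state, "partially committed")
state = advance(state, "committed")
print(state)

# A transaction that fails during execution.
state = advance("active", "failed")
state = advance(state, "aborted")
print(state)
```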
Comments
I have a doubt in partially commited state you told that this state help us to rollback the
changes made to the database in case of a failure during Execution,but actually the
changes should be made to the data stored in main memory right,so that in case if any
failure occurs ,it rollbacks(acquires) the previous value from the database
Reply
We know that transactions are set of instructions and these instructions perform operations on
database. When multiple transactions are running concurrently then there needs to be a sequence in
which the operations are performed because at a time only one operation can be performed on the
database. This sequence of operations is known as Schedule.
This schedule determines the exact order of operations that are going to be performed on database.
In this example, all the instructions of transaction T1 are executed before the instructions of
transaction T2, however this is not always necessary and we can have various types of schedules
which we will discuss in this article.
T1          T2
----        ----
R(X)
W(X)
R(Y)
            R(Y)
            R(X)
            W(Y)
Serial Schedule
In a Serial schedule, a transaction is executed completely before starting the execution of another
transaction. In other words, in a serial schedule, a transaction does not start execution
until the currently running transaction has finished execution. This type of execution of transactions is also
known as non-interleaved execution. The example we have seen above is a serial schedule.
T1          T2
----        ----
R(A)
R(B)
W(A)
commit
            R(B)
            R(A)
            W(B)
            commit
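Whether a schedule is serial can be checked mechanically. In the Python sketch below a schedule is written as a list of (transaction, operation, item) tuples; this representation and the name is_serial are assumptions for illustration. The first schedule follows the serial example above, read with all of T1's operations before T2's; the second is a made-up interleaved schedule.

```python
def is_serial(schedule):
    """True if no transaction resumes after another transaction has started."""
    finished = set()
    current = None
    for txn, _operation, _item in schedule:
        if txn != current:
            if txn in finished:
                return False          # txn was interrupted and is now resuming
            if current is not None:
                finished.add(current)
            current = txn
    return True

serial = [
    ("T1", "R", "A"), ("T1", "R", "B"), ("T1", "W", "A"), ("T1", "commit", None),
    ("T2", "R", "B"), ("T2", "R", "A"), ("T2", "W", "B"), ("T2", "commit", None),
]
interleaved = [
    ("T1", "R", "A"), ("T2", "R", "B"), ("T1", "W", "A"), ("T1", "commit", None),
    ("T2", "W", "B"), ("T2", "commit", None),
]

print(is_serial(serial), is_serial(interleaved))
```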
Strict Schedule
In Strict schedule, if the write operation of a transaction precedes a conflicting operation (Read or
Write operation) of another transaction then the commit or abort operation of such transaction should
also precede the conflicting operation of other transaction.
Ta          Tb
-----       -----
R(X)
            R(X)
W(X)
commit
            W(X)
            R(X)
            commit
Here the write operation W(X) of Ta precedes the conflicting operation (Read or Write operation) of Tb,
so the conflicting operation of Tb had to wait for the commit operation of Ta.
Cascadeless Schedule
In a Cascadeless Schedule, if a transaction is going to perform a read operation on a value, it has to wait
until the transaction that is performing a write on that value commits.
Ta Tb
----- -----
R(X)
W(X)
W(X)
commit
R(X)
W(X)
commit
Recoverable Schedule
In Recoverable schedule, if a transaction is reading a value which has been updated by some other
transaction then this transaction can commit only after the commit of other transaction which is
updating value.
Ta Tb
----- -----
R(X)
W(X)
R(X)
W(X)
R(X)
commit
commit
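The three definitions above (strict, cascadeless, recoverable) can be compared with a short Python sketch. Schedules are lists of (transaction, action, item) tuples where "C" marks a commit; this representation, the function name classify and the sample schedules are assumptions for illustration. The sketch ignores aborts and only tracks the most recent writer of each item.

```python
def classify(schedule):
    """Report whether a schedule is recoverable, cascadeless and strict."""
    last_writer = {}      # item -> transaction that wrote it most recently
    committed = set()
    reads_from = {}       # reader -> set of transactions whose writes it read
    recoverable = cascadeless = strict = True
    for txn, action, item in schedule:
        if action == "C":
            # Recoverable: everyone this transaction read from has committed.
            if any(w not in committed for w in reads_from.get(txn, ())):
                recoverable = False
            committed.add(txn)
            continue
        writer = last_writer.get(item)
        other = writer is not None and writer != txn
        dirty = other and writer not in committed
        if action == "R":
            if other:
                reads_from.setdefault(txn, set()).add(writer)
            if dirty:
                cascadeless = strict = False   # read of an uncommitted write
        elif action == "W":
            if dirty:
                strict = False                 # overwrite of an uncommitted write
            last_writer[item] = txn
    return {"recoverable": recoverable, "cascadeless": cascadeless, "strict": strict}

# T2 reads T1's uncommitted write and commits first.
dirty_read_commit_first = [
    ("T1", "R", "X"), ("T1", "W", "X"),
    ("T2", "R", "X"), ("T2", "W", "X"), ("T2", "C", None),
]
# Same operations, but T2 commits only after T1 has committed.
dirty_read_commit_last = [
    ("T1", "R", "X"), ("T1", "W", "X"),
    ("T2", "R", "X"), ("T2", "W", "X"), ("T1", "C", None), ("T2", "C", None),
]
# Tb touches X again only after Ta has committed its write.
waits_for_commit = [
    ("Ta", "R", "X"), ("Tb", "R", "X"), ("Ta", "W", "X"), ("Ta", "C", None),
    ("Tb", "W", "X"), ("Tb", "R", "X"), ("Tb", "C", None),
]

for s in (dirty_read_commit_first, dirty_read_commit_last, waits_for_commit):
    print(classify(s))
```

Every strict schedule is cascadeless and every cascadeless schedule is recoverable, which is why the flags can only switch off in that order.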
DBMS Serializability
LAST UPDATED: DECEMBER 20, 2018 BY CHAITANYA SINGH | FILED UNDER: DBMS
When multiple transactions run concurrently, there is a possibility that the database may
be left in an inconsistent state. Serializability is the concept that helps us identify which non-serial
schedules are safe: a schedule is serializable if it is equivalent to some serial schedule of the same
transactions, so it leaves the database in a consistent state.
Types of Serializability
There are two types of Serializability.
1. Conflict Serializability
2. View Serializability
In the DBMS Schedules guide, we learned that there are two types of schedules – Serial & Non-Serial.
A Serial schedule doesn’t support concurrent execution of transactions while a non-serial schedule
supports concurrency. We also learned in Serializability tutorial that a non-serial schedule may leave
the database in inconsistent state so we need to check these non-serial schedules for the
Serializability.
Conflict Serializability is one of the types of Serializability, which can be used to check whether a
non-serial schedule is conflict serializable or not.
Conflicting operations
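Two operations are said to be in conflict if they satisfy all of the following three conditions:
1. Both the operations belong to different transactions.
2. Both the operations work on the same data item.
3. At least one of the operations is a write operation.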
Example 3: Operations W(X) of T1 and W(Y) of T2 are non-conflicting operations because both the
write operations are not working on same data item so these operations don’t satisfy the second
condition.
Example 4: Similarly R(X) of T1 and R(X) of T2 are non-conflicting operations because none of them is
write operation.
Example 5: Similarly W(X) of T1 and R(X) of T1 are non-conflicting operations because both the
operations belong to same transaction T1.
T1 T2
----- ------
R(A)
R(B)
R(A)
R(B)
W(B)
W(A)
To convert this schedule into a serial schedule we would have to swap the R(A) operation of
transaction T2 with the W(A) operation of transaction T1. However, we cannot swap these two
operations because they are conflicting operations, thus we can say that this given schedule is not
Conflict Serializable.
T1 T2
----- ------
R(A)
R(A)
R(B)
W(B)
R(B)
W(A)
T1 T2
----- ------
R(A)
R(A)
R(B)
W(B)
R(B)
W(A)
T1 T2
----- ------
R(A)
R(B)
R(A)
W(B)
R(B)
W(A)
T1 T2
----- ------
R(A)
R(B)
W(B)
R(A)
R(B)
W(A)
We finally got a serial schedule after swapping all the non-conflicting operations so we can say that
the given schedule is Conflict Serializable.
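Instead of swapping operations by hand, conflict serializability is commonly tested with a precedence graph: draw an edge Ti -> Tj whenever an operation of Ti conflicts with a later operation of Tj, and check the graph for cycles. This is a different technique from the swapping shown above, sketched here in Python. The two schedules are one reading of the examples in this guide (the column layout of the tables is assumed), and the function names are hypothetical.

```python
def in_conflict(a, b):
    """Different transactions, same data item, at least one write."""
    (txn_a, op_a, item_a), (txn_b, op_b, item_b) = a, b
    return txn_a != txn_b and item_a == item_b and "W" in (op_a, op_b)

def precedence_graph(schedule):
    edges = {txn: set() for txn, _op, _item in schedule}
    for i, earlier in enumerate(schedule):
        for later in schedule[i + 1:]:
            if in_conflict(earlier, later):
                edges[earlier[0]].add(later[0])
    return edges

def is_conflict_serializable(schedule):
    """True if the precedence graph has no cycle (checked by topological sort)."""
    edges = precedence_graph(schedule)
    indegree = {txn: 0 for txn in edges}
    for targets in edges.values():
        for target in targets:
            indegree[target] += 1
    ready = [txn for txn, degree in indegree.items() if degree == 0]
    ordered = 0
    while ready:
        txn = ready.pop()
        ordered += 1
        for target in edges[txn]:
            indegree[target] -= 1
            if indegree[target] == 0:
                ready.append(target)
    return ordered == len(edges)

first_example = [
    ("T1", "R", "A"), ("T1", "R", "B"), ("T2", "R", "A"),
    ("T2", "R", "B"), ("T2", "W", "B"), ("T1", "W", "A"),
]
second_example = [
    ("T1", "R", "A"), ("T2", "R", "A"), ("T2", "R", "B"),
    ("T2", "W", "B"), ("T1", "R", "B"), ("T1", "W", "A"),
]

print(is_conflict_serializable(first_example))
print(is_conflict_serializable(second_example))
```

In the first schedule T1 must come before T2 (because of B) and T2 before T1 (because of A), which is a cycle; in the second schedule every conflict points from T2 to T1, so the equivalent serial order is T2 followed by T1.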
In the last tutorial, we learned Conflict Serializability. In this article, we will discuss another type of
serializability which is known as View Serializability.
To check whether a given schedule is view serializable, we need to check whether the given schedule
is View Equivalent to its serial schedule. Lets take an example to understand what I mean by that.
Given Schedule:
T1          T2
-----       ------
R(X)
W(X)
            R(X)
            W(X)
R(Y)
W(Y)
            R(Y)
            W(Y)
Serial Schedule of the above given schedule:
T1          T2
-----       ------
R(X)
W(X)
R(Y)
W(Y)
            R(X)
            W(X)
            R(Y)
            W(Y)
If we can prove that the given schedule is View Equivalent to its serial schedule then the given
schedule is called view Serializable.
You may be wondering: instead of checking whether a non-serial schedule is serializable or not, can’t we
have serial schedules all the time? The answer is no, because concurrent execution of transactions
makes better use of the system resources and usually gives much better throughput than serial schedules.
View Equivalent
Lets learn how to check whether the two schedules are view equivalent.
Two schedules S1 and S2 are said to be view equivalent if they satisfy all the following conditions:
1. Initial Read: Initial read of each data item in transactions must match in both schedules. For
example, if transaction T1 reads a data item X before transaction T2 in schedule S1 then in schedule
S2, T1 should read X before T2.
Read vs Initial Read: You may be confused by the term initial read. Here initial read means the first
read operation on a data item, for example, a data item X can be read multiple times in a schedule but
the first read operation on X is called the initial read. This will be more clear once we will get to the
example in the next section of this same article.
2. Final Write: Final write operations on each data item must match in both the schedules. For
example, a data item X is last written by Transaction T1 in schedule S1 then in S2, the last write
operation on X should be performed by the transaction T1.
3. Update Read: If in schedule S1, the transaction T1 is reading a data item updated by T2 then in
schedule S2, T1 should read the value after the write operation of T2 on same data item. For example,
In schedule S1, T1 performs a read operation on X after the write operation on X by T2 then in S2, T1
should read the X after T2 performs write on X.
View Serializable
If a schedule is view equivalent to its serial schedule then the given schedule is said to be View
Serializable. Lets take an example.
Initial Read
In schedule S1, transaction T1 first reads the data item X. In S2 also transaction T1 first reads the data
item X.
Lets check for Y. In schedule S1, transaction T1 first reads the data item Y. In S2 also the first read
operation on Y is performed by T1.
We checked for both data items X & Y and the initial read condition is satisfied in S1 & S2.
Final Write
In schedule S1, the final write operation on X is done by transaction T2. In S2 also transaction T2
performs the final write on X.
Lets check for Y. In schedule S1, the final write operation on Y is done by transaction T2. In schedule
S2, final write on Y is done by T2.
We checked for both data items X & Y and the final write condition is satisfied in S1 & S2.
Update Read
In S1, transaction T2 reads the value of X, written by T1. In S2, the same transaction T2 reads the X
after it is written by T1.
In S1, transaction T2 reads the value of Y, written by T1. In S2, the same transaction T2 reads the value
of Y after it is updated by T1.
The update read condition is also satisfied for both the schedules.
Result: All three conditions that check whether two schedules are view equivalent are satisfied in
this example, which means S1 and S2 are view equivalent. Since S2 is a serial schedule of the same
transactions as S1, we can say that the schedule S1 is a view serializable schedule.
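The three checks above can also be expressed in code. The following is a minimal Python sketch, not part of the original tutorial: the tuple layout (transaction, operation, item), the function names and the sample schedules are assumptions made for illustration. For every read it records which transaction wrote the value being read (None stands for the initial value), which covers the initial read and update read conditions, and it records the last writer of every item for the final write condition.

```python
from collections import Counter

def view_profile(schedule):
    """Summarise a schedule for a view-equivalence check.

    schedule is a list of (txn, op, item) tuples with op "R" or "W"
    (a hypothetical layout chosen for this sketch).
    """
    last_writer = {}   # item -> transaction that wrote it most recently
    reads = Counter()  # (reader, item, writer) -> how often that read occurs
    for txn, op, item in schedule:
        if op == "R":
            # writer is None when the read sees the initial value
            reads[(txn, item, last_writer.get(item))] += 1
        elif op == "W":
            last_writer[item] = txn
    return reads, last_writer

def view_equivalent(s1, s2):
    """True if both schedules have the same reads-from pairs and final writers."""
    return view_profile(s1) == view_profile(s2)

# S1 interleaves T1 and T2; S2 runs T1 completely before T2 (sample data).
S1 = [("T1", "R", "X"), ("T1", "W", "X"), ("T2", "R", "X"), ("T2", "W", "X"),
      ("T1", "R", "Y"), ("T1", "W", "Y"), ("T2", "R", "Y"), ("T2", "W", "Y")]
S2 = [("T1", "R", "X"), ("T1", "W", "X"), ("T1", "R", "Y"), ("T1", "W", "Y"),
      ("T2", "R", "X"), ("T2", "W", "X"), ("T2", "R", "Y"), ("T2", "W", "Y")]

print(view_equivalent(S1, S2))
```

A schedule is view serializable if this check succeeds against at least one serial ordering of its transactions.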
I have 15 years of experience in the IT industry, working with renowned multinational corporations.
Additionally, I have dedicated over a decade to teaching, allowing me to refine my skills in delivering
information in a simple and easily understandable manner.
– Chaitanya
Copyright © 2012 – 2024 BeginnersBook . Privacy Policy . Sitemap
In this guide, you will learn a very important concept in DBMS: recoverability of a schedule. Sometimes
a few transactions in a schedule fail because of a software or hardware issue. In that case, it
becomes important to roll back these failed transactions along with any successful transactions that
have used a value updated by the failed transactions.
For example: In this example, we have a schedule that contains two transactions, T1 and T2.
Transaction T1 reads X, changes the value of X and then writes the updated value of X.
This updated value of X is read by transaction T2, which then changes X, writes the new value of X
and uses the COMMIT statement to make the changes permanent.
After the changes are made permanent by transaction T2, transaction T1 fails and has to be
rolled back. The problem is that T2 has already committed, so it cannot be rolled back. This schedule
is therefore irrecoverable: it cannot be rolled back successfully after the failure of one of its
transactions.
T1              T2
----            ----
Read(X)
X = X + 20
Write(X)
                Read(X)
                X = X + 100
                Write(X)
                Commit
Failed!
Rollback
For example: Let’s take the same example that we have seen above with some modifications. Here we
have moved the commit statement in transaction T2 after the commit statement in transaction T1.
T1              T2
----            ----
Read(X)
X = X + 20
Write(X)
                Read(X)
                X = X + 100
                Write(X)
Commit
                Commit
Now let's consider some failure cases to understand whether this schedule can be successfully
rolled back.
Case 1: T1 fails just before its commit statement. In this case both transactions can be
rolled back, as neither transaction has used the COMMIT statement before the failure point in the
schedule.
T1              T2
----            ----
Read(X)
X = X + 20
Write(X)
                Read(X)
                X = X + 100
                Write(X)
Failed!
Commit
                Commit
Case 2: Let's say T2 fails after the commit statement in T1. This is also recoverable, as T2 can be
rolled back, and T1 did not read the value of X written by T2, so there is no bad read operation here
and no need to roll back T1 in this case.
T1              T2
----            ----
Read(X)
X = X + 20
Write(X)
                Read(X)
                X = X + 100
                Write(X)
Commit
                Failed!
                Commit
You can also try placing the failure point elsewhere in this schedule, other than in the two cases
above. You will find that the schedule is still recoverable.
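The rule illustrated by the two schedules above can be checked mechanically: a transaction that has read a value written by another transaction must not commit before that transaction commits. The following is a minimal Python sketch under stated assumptions: the (transaction, operation, item) tuple layout and the function name are hypothetical, "C" stands for commit and "A" for a failure followed by rollback, and the sketch does not restore earlier values after a rollback.

```python
def is_recoverable(schedule):
    """Return True if no transaction commits before a writer it read from.

    schedule is a list of (txn, op, item) tuples, op in {"R", "W", "C", "A"}
    (a hypothetical layout chosen for this sketch).
    """
    last_writer = {}  # item -> transaction that wrote it most recently
    reads_from = {}   # reader -> set of uncommitted writers it read from
    committed = set()
    for txn, op, item in schedule:
        if op == "W":
            last_writer[item] = txn
        elif op == "R":
            writer = last_writer.get(item)
            if writer is not None and writer != txn and writer not in committed:
                reads_from.setdefault(txn, set()).add(writer)
        elif op == "C":
            # Every writer this transaction depends on must already be committed.
            if any(w not in committed for w in reads_from.get(txn, ())):
                return False
            committed.add(txn)
    return True

# First schedule above: T2 commits before T1, then T1 fails.
irrecoverable = [("T1", "R", "X"), ("T1", "W", "X"),
                 ("T2", "R", "X"), ("T2", "W", "X"),
                 ("T2", "C", None), ("T1", "A", None)]
# Second schedule above: T2 commits only after T1 has committed.
recoverable = [("T1", "R", "X"), ("T1", "W", "X"),
               ("T2", "R", "X"), ("T2", "W", "X"),
               ("T1", "C", None), ("T2", "C", None)]

print(is_recoverable(irrecoverable), is_recoverable(recoverable))
```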
In a DBMS, several transactions run in a specified schedule. However, these transactions sometimes
fail for various reasons. In the previous tutorial, we learned how to identify a recoverable
schedule. In this guide, we will discuss the types of failures that can occur in DBMS.
1. Transaction failure
2. Underlying System crash
3. Hard-disk failure
1. Transaction Failure
A transaction is a set of statements. If a transaction fails, it means there is a statement in the
transaction that could not be executed. This can happen for various reasons, such as:
Logical Error: If the logic used in the statement itself is wrong, the statement can fail.
System Error: The transaction is executing, but a fault in the system makes it fail
abruptly. For example, a deadlock involving the transaction can result in a system error.
2. Underlying System Crash
The system on which the transactions are running can crash and that can result in failure of currently
running transactions.
3. Hard-disk Failure
A hard-disk failure can also cause transaction failure. When transactions are reading and writing
data on the disk, a failure in the underlying disk can cause the currently running transactions to
fail, because they are unable to read and write data while the disk is not working properly. This
can result in loss of data as well.
There can be several reasons for a disk failure, such as formation of bad sectors on the disk,
corruption of the disk, viruses, or not enough resources available on the disk.
In the previous chapters, you learned how to identify a recoverable schedule and what kinds of failures
can occur in DBMS. In this chapter, you will learn how to recover a failed transaction using log-based
recovery in DBMS. When a transaction fails, it is important to roll back the transaction so that the
changes made by the failed transaction are not stored in the database. This is important to maintain
the integrity of the database.
<T1, Start>
Just before the transaction modifies the department of the employee from “Sales” To “Marketing”, the
following log is maintained:
<T1, Commit>
In this approach, all the logs are created at once and stored in the database.
In this approach, logs are recorded just before the transaction is going to perform an operation in
database.
If the log contains the entry <Tn, Start> and <Tn, Commit> or <Tn, Start> and <Tn, Abort>
then the transaction Tn needs to be redone based on the log entries for each operation recorded
in the log.
If the log contains the entry <Tn, Start> but doesn’t contain an entry for <Tn, Commit> or <Tn,
Abort> then the transaction needs to be rolled back.
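The redo and undo rules above can be sketched in a few lines of Python. This is a simplified illustration rather than the tutorial's own method: the record shapes, including the "Update" record that carries the old and new value, are hypothetical, the database is modelled as a plain dictionary, and Abort records are left out to keep the sketch short.

```python
def recover(db, log):
    """Apply redo and undo to db (a dict) using a simplified log.

    Record shapes (hypothetical, chosen for this sketch):
      (txn, "Start"), (txn, "Commit"),
      (txn, "Update", key, old_value, new_value)
    """
    committed = {rec[0] for rec in log if rec[1] == "Commit"}

    # Redo: replay the updates of committed transactions in log order.
    for rec in log:
        if rec[1] == "Update" and rec[0] in committed:
            _, _, key, _old, new = rec
            db[key] = new

    # Undo: restore the old values of unfinished transactions, newest first.
    for rec in reversed(log):
        if rec[1] == "Update" and rec[0] not in committed:
            _, _, key, old, _new = rec
            db[key] = old
    return db

# T1 committed its change; T2 started and changed a value but never committed.
log = [("T1", "Start"),
       ("T1", "Update", "department", "Sales", "Marketing"),
       ("T1", "Commit"),
       ("T2", "Start"),
       ("T2", "Update", "salary", 500, 900)]
state = recover({"department": "Sales", "salary": 900}, log)
print(state)
```

After recovery, the committed change of T1 is present and the uncommitted change of T2 has been rolled back.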
Checkpoint in DBMS
LAST UPDATED: MAY 31, 2024 BY CHAITANYA SINGH | FILED UNDER: DBMS
In the previous chapter, you learned how to recover a transaction using log based recovery method in
DBMS. In this guide, you will learn how to use checkpoint in database and how to recover a failed
transaction using checkpoint.
What is a checkpoint?
A checkpoint is like a bookmark in the transaction that helps us roll back a transaction to a certain
point.
Checkpoints are really useful when a transaction performs several operations. If such a transaction
fails at any point in time, instead of undoing the whole transaction, we can roll back to a certain
checkpoint.
We can have more than one checkpoint in a transaction. These checkpoints can be given any
name, so it is easier to identify a particular point in the transaction and roll back to that
point in case of failure.
You can say that by using checkpoints, you divide the transaction into smaller parts. Once a
checkpoint is reached, the changes up to that point are made permanent in the database and the
log entries are removed. This is because that part of the transaction has completed successfully,
so there is no need to roll it back or redo it, and thus no need to maintain those logs.
A checkpoint represents a point up to which all transactions are completed and the database is in
a consistent state.
The recovery system reads the log file in reverse (from end to start).
Recovery system maintains two files: one is redo-list file and second is undo-list file. One or both
of these files are used to recover a failed transaction.
If the recovery system finds log entries <Tn, Start> and <Tn, Commit>, or just <Tn,
Commit>, it puts the transaction in the redo-list. This is because a commit entry means the
changes of that transaction were made permanent, so its operations have to be redone
after the failure.
If the recovery system finds a log entry <Tn, Start> but no entry <Tn, Commit> or <Tn,
Abort>, it puts the transaction in the undo-list. This is because the transaction did not make its
changes permanent in the database, as no commit entry was found. In this case the transaction can be
rolled back by putting it in the undo-list.
Example:
In the following diagram you can see a schedule with three transactions T1, T2 and T3. Since the log
entries are removed once a checkpoint is reached, the entry <T1, Start> is not in the log: it was
written before the checkpoint and the log was cleared at the checkpoint. The entries that are in the
log are <T1, Commit>, <T2, Start>, <T2, Commit> and <T3, Start>. The entry <T3, Commit> is not in the
log because the transaction failed before that point.
So, based on the rules that we have seen above, T1 and T2 are put in the redo-list, as <T1, Commit> and
<T2, Commit> are present in the log file. Transaction T3 is put in the undo-list, as <T3, Start> is
found but there is no entry for <T3, Commit> or <T3, Abort>.
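The classification in this example can be reproduced with a short Python sketch. The (transaction, record) tuple layout and the function name are assumptions made for illustration; the log is scanned in reverse, as described above, and a transaction with an Abort entry is placed in neither list because the rules above do not cover that case.

```python
def build_recovery_lists(log):
    """Scan the log written after the last checkpoint from end to start.

    log is a list of (txn, record) tuples with record "Start", "Commit"
    or "Abort" (a hypothetical layout chosen for this sketch).
    """
    redo, undo = [], []
    finished = set()
    for txn, record in reversed(log):
        if record == "Commit":
            finished.add(txn)
            redo.append(txn)   # committed: its changes must be redone
        elif record == "Abort":
            finished.add(txn)
        elif record == "Start" and txn not in finished:
            undo.append(txn)   # started but never committed or aborted
    return redo, undo

# Log entries that remain after the checkpoint in the example above.
log = [("T1", "Commit"), ("T2", "Start"), ("T2", "Commit"), ("T3", "Start")]
redo_list, undo_list = build_recovery_lists(log)
print(redo_list, undo_list)
```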
Deadlock in DBMS
LAST UPDATED: JULY 4, 2022 BY CHAITANYA SINGH | FILED UNDER: DBMS
A deadlock is a condition in which two or more tasks wait for each other in order to finish,
but none of the tasks is willing to give up the resources that the other tasks need. In this
situation no task ever finishes; all of them stay in a waiting state forever.
Coffman conditions
Coffman stated four conditions for a deadlock to occur. A deadlock may occur if all of the following
conditions hold true.
Mutual exclusion condition: There must be at least one resource that cannot be used by more
than one process at a time.
Hold and wait condition: A process that is holding a resource can request additional
resources that are being held by other processes in the system.
No preemption condition: A resource cannot be forcibly taken from a process. Only the process
itself can release a resource that it is holding.
Circular wait condition: A condition where one process is waiting for a resource that is being
held by a second process, the second process is waiting for a third process, and so on, and the last
process is waiting for the first process, thus making a circular chain of waiting.
Deadlock Handling
Ignore the deadlock (Ostrich algorithm)
Did that make you laugh? You may be wondering how ignoring a deadlock can count as deadlock
handling. However, the Windows system you are using on your PC uses this approach to
deadlock handling, and that is the reason it sometimes hangs and you have to reboot it to get it
working. Not only Windows but UNIX also uses this approach.
The question is why. Why do they ignore a deadlock instead of dealing with it, and why is this
called the Ostrich algorithm?
Let me answer the second question first. This is known as the Ostrich algorithm because in this
approach we ignore the deadlock and pretend that it will never occur, just like the proverbial
ostrich behavior "to stick one's head in the sand and pretend there is no problem."
Let's discuss why we ignore it: when deadlocks are believed to be very rare and the cost of deadlock
handling is high, ignoring them is a better solution than handling them. Take the operating system
example: if the time required to handle a deadlock is higher than the time required to
reboot Windows, then rebooting is the preferred choice, considering that deadlocks are
very rare in Windows.
Deadlock detection
The resource scheduler keeps track of the resources allocated to and requested by processes.
Thus, if there is a deadlock, it is known to the resource scheduler. This is how a deadlock is detected.
Terminating processes involved in deadlock: Terminating all the processes involved in the deadlock,
or terminating processes one by one until the deadlock is resolved, are possible solutions, but
neither approach is good. Terminating all processes is costly, and the partial work done by the
processes is lost. Terminating one by one takes a lot of time, because each time a process is
terminated, the system needs to check whether the deadlock is resolved or not. Thus, the best
approach is to consider process age and priority when terminating processes during a deadlock.
Resource Preemption: Another approach is to preempt resources and allocate
them to other processes until the deadlock is resolved.
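One common way to implement the detection step is a wait-for graph: each process points to the processes whose resources it is waiting for, and a cycle in that graph means a deadlock. The text above does not name a specific algorithm, so the following Python sketch is an assumption about how the resource scheduler could do it; the function name and the dictionary layout are hypothetical.

```python
def has_deadlock(waits_for):
    """Return True if the wait-for graph contains a cycle.

    waits_for maps each process to the set of processes it is waiting on
    (a hypothetical layout chosen for this sketch).
    """
    WHITE, GREY, BLACK = 0, 1, 2   # unvisited, on current path, fully explored
    color = {}

    def visit(process):
        color[process] = GREY
        for other in waits_for.get(process, ()):
            state = color.get(other, WHITE)
            if state == GREY:          # reached a process already on this path
                return True
            if state == WHITE and visit(other):
                return True
        color[process] = BLACK
        return False

    return any(color.get(p, WHITE) == WHITE and visit(p) for p in list(waits_for))

# P1 waits for P2, P2 waits for P3, P3 waits for P1: a circular wait.
print(has_deadlock({"P1": {"P2"}, "P2": {"P3"}, "P3": {"P1"}}))
```

Once a cycle is found, one of the recovery options described above (termination or resource preemption) can be applied to a process on that cycle.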
Deadlock prevention
We have learnt that a deadlock can occur only if all four Coffman conditions hold true, so preventing
one or more of them prevents the deadlock.
Removing mutual exclusion: All resources must be sharable, which means that more than one
process can hold a resource at the same time. This approach is practically impossible.
Removing hold and wait condition: This can be removed if the process acquires all the resources
it needs before starting out. Another way to remove it is to enforce a rule that a process may
request resources only when it holds none.
Preemption of resources: Preemption of resources from a process can result in rollback, and
thus this needs to be avoided in order to maintain the consistency and stability of the system.
Avoid circular wait condition: This can be avoided if the resources are maintained in a hierarchy
and a process can acquire resources only in increasing order of precedence. This avoids circular wait.
Another way of doing this is to enforce a one-resource-per-process rule: a process can request a
resource only once it releases the resource it currently holds. This also avoids circular wait.
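The "increasing order of precedence" rule for avoiding circular wait can be illustrated with ordinary locks. This is a minimal sketch, assuming three resources named A, B and C with made-up ranks; the helper names are hypothetical. Because every caller acquires locks in the same global order, no two callers can each hold a lock the other needs next.

```python
import threading

# Hypothetical resources with a fixed global rank (lower rank is locked first).
RANK = {"A": 1, "B": 2, "C": 3}
LOCKS = {name: threading.Lock() for name in RANK}

def acquire_in_order(names):
    """Acquire the requested locks in increasing rank and return that order."""
    ordered = sorted(names, key=RANK.__getitem__)
    for name in ordered:
        LOCKS[name].acquire()
    return ordered

def release_all(names):
    """Release the given locks in decreasing rank."""
    for name in sorted(names, key=RANK.__getitem__, reverse=True):
        LOCKS[name].release()

# A caller asking for C and A still locks A before C.
held = acquire_in_order(["C", "A"])
print(held)
release_all(held)
```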
Deadlock Avoidance
A deadlock can be avoided if resources are allocated in such a way that a deadlock cannot
occur. There are two algorithms for deadlock avoidance.
Wait/Die
Wound/Wait
Here is the table representation of resource allocation for each algorithm. Both of these algorithms
take process age into consideration when determining the best possible way of allocating resources
for deadlock avoidance.
Situation                                   Wait/Die               Wound/Wait
------------------------------------------  ---------------------  ---------------------
Younger process needs a resource held by    Younger process dies   Younger process waits
older process
One of the best-known deadlock avoidance algorithms is the Banker's algorithm.
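The two schemes differ only in what happens to the requesting process, which a small Python sketch makes explicit. The function names and the returned strings are made up for illustration, and the sketch assumes that a smaller timestamp means an older process. The case of an older process requesting a resource from a younger one is not in the table above; the sketch follows the usual definition of the schemes, where the older process waits under Wait/Die and the younger holder is rolled back under Wound/Wait.

```python
def wait_die(requester_ts, holder_ts):
    """Wait/Die: an older requester waits, a younger requester dies."""
    return "requester waits" if requester_ts < holder_ts else "requester dies"

def wound_wait(requester_ts, holder_ts):
    """Wound/Wait: an older requester wounds the holder, a younger one waits."""
    return "holder dies" if requester_ts < holder_ts else "requester waits"

# Younger process (timestamp 20) needs a resource held by an older one (10).
print(wait_die(20, 10), "|", wound_wait(20, 10))
```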
Starvation in DBMS
LAST UPDATED: AUGUST 19, 2022 BY CHAITANYA SINGH | FILED UNDER: DBMS
Starvation is a situation in which one transaction keeps waiting for another transaction to release a
lock. This is also called livelock. As we already learned in transaction management, a
transaction acquires a lock before performing a write operation on a data item. If the data item is
already locked by another transaction, the transaction waits for the lock to be released. In a
starvation situation, a transaction waits for another transaction for an indefinite period of time.
2. Resource leak: A transaction does not release the lock after it has acquired the lock on a
particular data item.
3. Denial of service attack: A Denial-of-Service (DoS) attack is an attack that is meant to shut down a
machine or network, making it inaccessible to its users. A DoS attack keeps the data item engaged so
that transactions are not able to acquire the locks.
Starvation Example
Let's say there are three transactions T1, T2 and T3 waiting to acquire a lock on a data item 'X'. The
system grants the lock to transaction T1, and the other two transactions T2 and T3 wait for the lock
to be released.
Once transaction T1 releases the lock, the lock is granted to transaction T3, so transaction T2 is
still waiting for the lock to be released.
While transaction T3 is performing an operation on 'X', a new transaction T4 enters the system
and waits for the lock. The system then grants the lock to T4. In this way, new transactions keep
entering the system and acquiring the lock on 'X' while the older transaction T2 keeps waiting.
The drawback of this solution is that a faulty transaction can keep acquiring the lock and failing, so
it never completes and stays in the system with a higher priority than other transactions, and thus
keeps getting the lock on a particular data item.
2. By changing the victim selection algorithm: In the above solution, we saw a drawback where a
victim transaction keeps getting the lock. By lowering the priority of a victim transaction, we can fix
the drawback of the above solution.
3. FCFS (First come first serve): In this approach, the transaction that entered the system first
gets the lock first. This way no transaction keeps waiting indefinitely.
4. Wait-die Scheme: If a transaction requests a lock on a data item that is held by another
transaction, the system compares their timestamps and allows an older requesting transaction to wait
for the data item, while a younger requesting transaction is rolled back (dies).
5. Wound-wait Scheme: In this scheme, if an older transaction requests a lock that is held by a
younger transaction, the system kills the younger transaction and grants the lock to the older
transaction.
The killed younger transaction is restarted after a specific delay but with the same timestamp. This
makes sure that after some time, when this transaction is old enough, it can acquire the lock on the
particular data item.
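Situation                                   Wait/Die               Wound/Wait
------------------------------------------  ---------------------  ---------------------
Younger process needs a resource held by    Younger process dies   Younger process waits
older process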
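The first come first serve approach from the list above can be sketched as a lock with a waiting queue. This is an illustrative Python sketch, not a real lock manager: the class name and its methods are hypothetical, and a single lock on one data item is modelled. Because waiting transactions are served strictly in arrival order, the older transaction T2 from the starvation example can no longer be overtaken by T3 or T4.

```python
from collections import deque

class FifoLock:
    """A lock on one data item that is granted in arrival order."""

    def __init__(self):
        self.holder = None
        self.waiting = deque()   # transactions waiting, oldest request first

    def request(self, txn):
        """Grant the lock if it is free and nobody is queued, else enqueue."""
        if self.holder is None and not self.waiting:
            self.holder = txn
            return True
        self.waiting.append(txn)
        return False

    def release(self):
        """Pass the lock to the longest-waiting transaction, if any."""
        self.holder = self.waiting.popleft() if self.waiting else None
        return self.holder

lock_x = FifoLock()
lock_x.request("T1")         # T1 gets the lock on X
lock_x.request("T2")         # T2 waits
lock_x.request("T3")         # T3 waits behind T2
first = lock_x.release()     # T1 releases the lock
lock_x.request("T4")         # T4 arrives later and waits behind T3
print(first)
```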
When more than one transaction runs at the same time, conflicts can occur that leave the database
in an inconsistent state. To handle these conflicts we need concurrency
control in DBMS, which allows transactions to run simultaneously but handles them in such a way
that the integrity of data remains intact.
Conflict Example
You and your brother have a joint bank account, from which you both can withdraw money. Now let's
say you both go to different branches of the same bank at the same time and try to withdraw 5000
INR, while your joint account has a balance of only 6000 INR. If we don't have concurrency control
in place, you can both get 5000 INR at the same time, but once both transactions finish the account
balance would be -4000 INR, which is not possible and leaves the database in an inconsistent state.
We need something that controls the transactions in such a way that they can run
concurrently while the consistency of data is maintained, to avoid such issues.
Lost Update: The initial value of A was 1000. If T1 adds 100 and T2 adds 200, the value of A in the
database should be 1300 after both of these transactions have executed. However, as you can
see, the value of A is 1200 in this case. This is because the update made by transaction T1 is lost.
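The lost update can be reproduced with plain variables. The following Python sketch only imitates the interleaving described above (both transactions read A before either writes it); the function names are made up for illustration and no real database is involved.

```python
def interleaved_updates(a):
    """Both transactions read A before either one writes it back."""
    t1_local = a          # T1 reads A
    t2_local = a          # T2 reads A (same old value)
    a = t1_local + 100    # T1 writes its result
    a = t2_local + 200    # T2 overwrites it: T1's update is lost
    return a

def serial_updates(a):
    """T1 runs to completion before T2 starts."""
    a = a + 100           # T1 reads and writes A
    a = a + 200           # T2 reads the updated value and writes A
    return a

print(interleaved_updates(1000), serial_updates(1000))
```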
Concurrency Control
Concurrency control is the technique that ensures that the above three conflicts don't occur in the
database. There are certain rules for avoiding problems in concurrently running transactions, and
these rules are defined as the concurrency control protocols.
A lock is a mechanism that ensures that the integrity of data is maintained. It does that by
locking the data while a transaction is running: no transaction can read or write the data until it
acquires the appropriate lock. There are two types of lock that can be placed while accessing the
data, so that concurrent transactions cannot alter the data while we are processing it.
1. Shared Lock(S)
2. Exclusive Lock(X)
1. Shared Lock(S): A shared lock is placed when we are reading the data. Multiple shared locks can be
placed on the data, but while a shared lock is in place no exclusive lock can be placed.
For example, when two transactions are reading Steve's account balance, let them both read by placing
shared locks, but if at the same time another transaction wants to update Steve's account balance
by placing an exclusive lock, do not allow it until the reading is finished.
2. Exclusive Lock(X): An exclusive lock is placed when we want to read and write the data. This lock
allows both read and write operations. Once this lock is placed on the data, no other lock (shared or
exclusive) can be placed on the data until the exclusive lock is released.
For example, when a transaction wants to update Steve's account balance, let it do so by placing an X
lock on it, but if a second transaction wants to read the data (S lock), don't allow it, and if another
transaction wants to write the data (X lock), don't allow that either.
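The rules for the two lock types boil down to a small compatibility table: only a shared lock requested on data that holds nothing but shared locks can be granted. A minimal Python sketch follows; the table name and the helper function are hypothetical.

```python
# (held lock, requested lock) -> can both exist on the same data item?
COMPATIBLE = {
    ("S", "S"): True,    # many readers may share the data
    ("S", "X"): False,   # a writer must wait for readers
    ("X", "S"): False,   # readers must wait for the writer
    ("X", "X"): False,   # only one writer at a time
}

def can_grant(requested, held_locks):
    """True if the requested lock is compatible with every lock already held."""
    return all(COMPATIBLE[(held, requested)] for held in held_locks)

# Two readers already hold S locks on Steve's account balance.
print(can_grant("S", ["S", "S"]), can_grant("X", ["S", "S"]))
```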
Growing Phase: In this phase, the locks are acquired on the data items but none of the acquired locks
can be released in this phase.
Shrinking Phase: The existing locks can be released in this phase but no new locks can be acquired in
this phase.
Note: The point at which the transaction acquires its final lock and the growing phase ends is called
the lock point.
2PL Example: Let's take an example to understand how the two phase locking protocol works. In the
following example there are two transactions, T1 and T2, running concurrently.
Transaction T1: In this example, growing phase of T1 is from Step 1 to Step 5. Shrinking phase is
from Step 7 to Step 9. Lock point is at step 5.
Transaction T2: Growing phase of T2 is from Step 2 to Step 10. Shrinking phase is from Step 11 to
Step 13. Lock point is at step 10.
           T1            T2
           ----          ----
Step 1     lock-S(A)
Step 2     ..            lock-S(A)
Step 3     lock-S(B)
Step 4     ...           lock-S(B)
Step 5     lock-X(C)
Step 6     ..
Step 7     Unlock(A)
Step 8     Unlock(B)
Step 9     Unlock(C)
Step 10                  lock-S(C)
Step 11                  Unlock(A)
Step 12                  Unlock(B)
Step 13                  Unlock(C)
It doesn't release locks after performing an operation on data items. It releases all the locks at the
same time, once the transaction commits successfully.
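The two-phase rule from the 2PL example can be checked with a short function: once a transaction has released any lock, it must not acquire another one, and the step of its last acquisition is the lock point. This is a minimal Python sketch; the (step, action) tuple layout and the function name are assumptions made for illustration.

```python
def lock_point(operations):
    """Return the step of the last lock acquisition of one transaction.

    operations is a list of (step, action) tuples with action "lock" or
    "unlock" (a hypothetical layout chosen for this sketch). A ValueError
    is raised if a lock is acquired after the shrinking phase has begun.
    """
    shrinking = False
    point = None
    for step, action in operations:
        if action == "lock":
            if shrinking:
                raise ValueError(f"step {step}: lock acquired after an unlock")
            point = step
        elif action == "unlock":
            shrinking = True
    return point

# Lock and unlock steps of T1 and T2 taken from the example table.
T1 = [(1, "lock"), (3, "lock"), (5, "lock"), (7, "unlock"), (8, "unlock"), (9, "unlock")]
T2 = [(2, "lock"), (4, "lock"), (10, "lock"), (11, "unlock"), (12, "unlock"), (13, "unlock")]
print(lock_point(T1), lock_point(T2))
```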
In the previous chapter, you learned lock based protocol in DBMS to maintain the integrity of
database. In this chapter, you will learn Timestamp based ordering protocol.
W_TS(A) is the largest timestamp of a transaction that executed the operation write(A) successfully.
R_TS(A) is the largest timestamp of a transaction that executed the operation read(A) successfully.
1. Whenever a transaction Tn issues a Write(A) operation, this protocol checks the following
conditions:
If R_TS(A) > TS(Tn) or W_TS(A) > TS(Tn), then abort and roll back the transaction Tn and
reject the Write(A) operation.
If R_TS(A) <= TS(Tn) and W_TS(A) <= TS(Tn), then execute the Write(A) operation of Tn and set
W_TS(A) to TS(Tn).
2. Whenever a transaction Tn issues a Read(A) operation, this protocol checks the following
conditions:
If W_TS(A) > TS(Tn), then abort and roll back Tn and reject the Read(A) operation.
If W_TS(A) <= TS(Tn), then execute the Read(A) operation of Tn and set R_TS(A) to the larger of
R_TS(A) and TS(Tn).
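The read and write checks above translate almost line by line into code. The following Python sketch is an illustration under stated assumptions: the class and function names are hypothetical, timestamps are plain integers, and a rejected operation simply returns False instead of rolling the transaction back.

```python
class DataItem:
    """A data item with its read and write timestamps."""

    def __init__(self):
        self.r_ts = 0   # R_TS: largest timestamp that read the item
        self.w_ts = 0   # W_TS: largest timestamp that wrote the item

def write(item, ts):
    """Try Write(A) for a transaction with timestamp ts."""
    if item.r_ts > ts or item.w_ts > ts:
        return False            # rejected: the transaction must be rolled back
    item.w_ts = ts
    return True

def read(item, ts):
    """Try Read(A) for a transaction with timestamp ts."""
    if item.w_ts > ts:
        return False            # rejected: a younger transaction already wrote A
    item.r_ts = max(item.r_ts, ts)
    return True

a = DataItem()
results = [read(a, 5), write(a, 3), write(a, 7), read(a, 6)]
print(results)
```

In this run the write with timestamp 3 is rejected because a transaction with timestamp 5 has already read the item, and the read with timestamp 6 is rejected because timestamp 7 has already written it.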
Validation based protocol lets transactions run without locking and works on the
assumption that concurrently running transactions rarely interfere with each other. This is why it
is also called the Optimistic Concurrency Control Technique.
In this protocol, a transaction doesn't make any changes to the database directly. Instead, it performs
all the changes on local copies of the data items that are maintained in the transaction itself. At
the end of the transaction, a validation is performed on the transaction. If it doesn't violate any
serializability rule, the transaction commits the changes to the database; otherwise it is rolled back
and restarted.
Start(Tn): It represents the timestamp when the transaction Tn starts the execution.
Validation(Tn): It represents the timestamp when the transaction Tn finishes the read phase and
starts the validation phase.
Finish(Tn): It represents the timestamp when the transaction Tn finishes all the write operations.
This protocol uses Validation(Tn) as the timestamp of the transaction Tn because this is the
phase of the transaction where all the checks happen. So it is safe to say that TS(Tn) = Validation(Tn).
If there are two transactions T1 & T2 managed by the validation based protocol and Finish(T1) <
Start(T2), then the validation will be successful: serializability is maintained because T1 finished
its execution before the transaction T2 started its read phase.
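The validation test can be sketched as a function over the three timestamps. The first condition below is the one stated above. The second condition is not part of this text; it is the additional case usually given in textbook descriptions of the protocol (T1 finishes writing before T2 validates and T1 wrote nothing that T2 read) and is included here as an assumption. The class and function names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Txn:
    start: int
    validation: int
    finish: int
    read_set: set = field(default_factory=set)
    write_set: set = field(default_factory=set)

def validates_after(t1, t2):
    """True if t2 passes validation against an earlier-validated t1."""
    # Condition from the text: T1 finished before T2 started.
    if t1.finish < t2.start:
        return True
    # Textbook extension (assumption): T1 finished before T2 validated
    # and T1 wrote nothing that T2 has read.
    if t2.start < t1.finish < t2.validation and not (t1.write_set & t2.read_set):
        return True
    return False

t1 = Txn(start=1, validation=3, finish=5, write_set={"A"})
t2 = Txn(start=6, validation=8, finish=9, read_set={"A"})
print(validates_after(t1, t2))
```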
In this article, you will learn what file organization is and what its benefits are. We already
know that data is stored in a database. When we refer to this data in terms of RDBMS, we call it a
collection of inter-related tables. However, in layman's terms you can say that the data is stored in
physical memory in the form of files.
File organization is a way of organizing the data so that it is easier to insert, delete,
modify and retrieve data from the files.
In this article, you will learn sequential file organization in DBMS. This is one of the easiest
methods of file organization. In this method, files (records) are stored in a sequential manner, one
after another.
If a record needs to be deleted, it is searched for in the memory blocks, and once it is deleted a new
record can be written to the freed memory block.
The following diagram shows a file that is organized using the Pile File method. As you can see, the
records are not sorted and are inserted on a first come first serve basis. If you want to organize the
data in such a way that it is sorted after insertion, use the sorted file method, which is discussed
in the next section.
Inserting a new record in file using Pile File method
Here we are demonstrating the insertion of a new record R3 in a already present file using Pile File
method. Since this method of sequential organization just adds the new record at the end of file, the
new record R3 gets added at the end of the file, as shown in the following diagram.
You can see in the following diagram that records appear in sorted order when the file is organized
using sorted file method.
When a record is updated, the whole file is sorted again once the update is complete, to change
the position of the updated record in the file.
The sorting can be either ascending or descending, in this diagram the records are sorted in
ascending order.
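The difference between the two sequential methods can be shown with Python lists standing in for the file. This is a minimal sketch with made-up record names and helper functions: the pile file method appends the new record at the end, while the sorted file method places it at its sorted position (ascending order here).

```python
import bisect

def pile_insert(file, record):
    """Pile file method: add the record at the end of the file."""
    file.append(record)

def sorted_insert(file, record):
    """Sorted file method: keep the file in ascending order after insertion."""
    bisect.insort(file, record)

pile_file = ["R1", "R4", "R2"]        # records in arrival order
sorted_file = ["R1", "R2", "R4"]      # records kept in ascending order

pile_insert(pile_file, "R3")
sorted_insert(sorted_file, "R3")
print(pile_file, sorted_file)
```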
The heap file organization method is a simple yet powerful file organization method. In this method, the
records are added to data blocks in memory in no particular order.
The following diagram demonstrates heap file organization. As you can see, records have been
assigned to data blocks in memory in no particular order.
Since the records are not sorted and not stored in consecutive data blocks in memory, searching for a
record is a time-consuming process in this method. Update and delete operations also perform
poorly, because the record must first be found, which is already a
time-consuming operation. However, if the file is small, these operations perform as well as or better
than in other methods, so this method is widely used for small files.
This method requires periodic memory optimization and cleanup, because the space allocated in a
data block is not freed when a record is deleted.
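The behaviour above can be sketched as follows. This is a toy model under stated assumptions: a block is a Python list with a fixed number of slots (`BLOCK_CAPACITY`), a deleted slot is marked with `None`, and new records go to the first block that still has an unused slot. All function names are hypothetical.

```python
BLOCK_CAPACITY = 2  # assumed number of record slots per data block

def heap_insert(blocks, record):
    """Put the record in the first block with an unused slot, else add a block."""
    for block in blocks:
        if len(block) < BLOCK_CAPACITY:
            block.append(record)
            return
    blocks.append([record])

def heap_search(blocks, record_id):
    """Linear scan: any block may hold the record."""
    for block_no, block in enumerate(blocks):
        for record in block:
            if record is not None and record["id"] == record_id:
                return block_no, record
    return None

def heap_delete(blocks, record_id):
    """Mark the slot as deleted; the space is not reclaimed here."""
    for block in blocks:
        for slot, record in enumerate(block):
            if record is not None and record["id"] == record_id:
                block[slot] = None
                return True
    return False

def heap_compact(blocks):
    """Cleanup pass: drop deleted slots and repack the live records."""
    live = [r for block in blocks for r in block if r is not None]
    return [live[i:i + BLOCK_CAPACITY] for i in range(0, len(live), BLOCK_CAPACITY)]
```

Because `heap_delete` only marks a slot, the file keeps growing until a pass such as `heap_compact` is run. That is the cleanup cost described above.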
In this method, a hash function is used to compute the address of the data block in memory where a
record is stored. The hash function is applied to certain columns of the record, known as hash columns, to
compute the block address. These columns (fields) can be either key or non-key attributes.
The following diagram demonstrates hash file organization. As shown here, the records are stored
in the database in no particular order and the data blocks are not consecutive. The memory addresses
are computed by applying the hash function to certain attributes of the records.
Fetching a record is faster in this method because the record can be accessed directly using the hash key
column. There is no need to search through the entire file to fetch a record.
Inserting a record using Hash file Organization method
In the following diagram, you can see that a new record R5 needs to be added to the file. The same
hash function that generated the addresses of the existing records in the file is used again to compute
the address (that is, to find the data block in memory) for this new record, by applying the hash function to the
hash columns of this record.
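As a rough sketch of the idea, the example below uses a toy hash function (the key modulo the number of blocks) to pick a data block on insert, and the same function again on fetch. `NUM_BLOCKS`, the `id` hash column and the function names are all assumptions for illustration; a real DBMS uses its own hash functions and handles block overflow.

```python
NUM_BLOCKS = 4  # assumed number of data blocks

def block_address(hash_column_value):
    """Toy hash function: map an integer key to a block number."""
    return hash_column_value % NUM_BLOCKS

def hash_insert(blocks, record, hash_column="id"):
    """Store the record in the block chosen by hashing its hash column."""
    blocks[block_address(record[hash_column])].append(record)

def hash_fetch(blocks, key, hash_column="id"):
    """Recompute the address and look only inside that one block."""
    for record in blocks[block_address(key)]:
        if record[hash_column] == key:
            return record
    return None

blocks = [[] for _ in range(NUM_BLOCKS)]
for rec_id, name in [(1, "R1"), (6, "R2"), (10, "R3"), (7, "R4")]:
    hash_insert(blocks, {"id": rec_id, "name": name})
hash_insert(blocks, {"id": 5, "name": "R5"})  # new record: 5 % 4 selects block 1
print(hash_fetch(blocks, 5))
```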
Advantages of Hash File Organization
1. This method doesn’t require explicit sorting, as the location of each record in memory is
determined by the hash value of its hash key.
2. Reading and fetching a record is faster compared to other methods, as the hash key is used to
quickly locate and retrieve the data from the database.
3. Records do not depend on each other and are not stored in consecutive memory locations, so
reading, writing, updating or deleting one record does not disturb the other records.
The indexed sequential access method, also known as ISAM, is an upgrade to the conventional
sequential file organization method. You can say that it is an advanced version of the sequential file
organization method. In this method, the primary key of each record is stored along with an address, and this address
is mapped to the address of a data block in memory. This address field works as an index of the file.
In this method, reading and fetching a record is done using the index of the file. The index field contains
the address of a data record in memory, which can be used to quickly read and fetch the record from
memory.
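A minimal sketch of this lookup, assuming the index is a list of (primary key, address) pairs kept sorted by primary key and the data blocks are a dictionary keyed by made-up addresses such as "A1". The employee data and function names are hypothetical.

```python
import bisect

# Data blocks keyed by address; the records are stored in no particular key order.
data_blocks = {
    "A1": {"emp_id": 1003, "name": "Stella"},
    "A2": {"emp_id": 1001, "name": "Ajay"},
    "A3": {"emp_id": 1002, "name": "Steve"},
    "A4": {"emp_id": 1004, "name": "Rahul"},
}

# Index: (primary key, address) pairs sorted by primary key.
index = sorted((rec["emp_id"], addr) for addr, rec in data_blocks.items())
keys = [key for key, _ in index]

def isam_fetch(key):
    """Binary-search the index, then follow the address to the record."""
    i = bisect.bisect_left(keys, key)
    if i < len(keys) and keys[i] == key:
        return data_blocks[index[i][1]]
    return None

def isam_range(low, high):
    """Return the records whose primary key lies in [low, high]."""
    lo = bisect.bisect_left(keys, low)
    hi = bisect.bisect_right(keys, high)
    return [data_blocks[addr] for _, addr in index[lo:hi]]

print(isam_fetch(1002))
print(isam_range(1002, 1003))
```

Because the index is sorted on the key, a range of keys maps to one contiguous slice of index entries.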
Advantages of ISAM
1. Searching for a record is faster in ISAM file organization compared to other file organization
methods: the primary key identifies the record, and since the primary key is stored with the
address of the record, the data can be read and fetched from memory directly.
2. This method is more flexible compared to other methods, as it allows the index
field (address field) to be generated for any column of the record. This makes searching easier and more efficient, as
searches can be done using multiple column fields.
3. This method allows range retrieval of records. Since the address field is stored with the primary key of
the record, we can retrieve the records whose primary key falls within a certain range.
4. This method allows partial searches as well. For example, “St” can
be used to search for all the employees whose name starts with the letters “St”. This returns all
the records where the employee name begins with the letters “St”.
Disadvantages of ISAM
1. Requires additional space in memory to store the index field.
2. After a record is added to the file, the file needs to be reorganized to maintain the sequence based
on the primary key column.
3. Requires memory cleanup, because when a record is deleted, the space used by that record
needs to be released before it can be used by another record.
4. Performance suffers if records are deleted frequently, as every deletion needs a
memory cleanup and optimization.
Similar to ISAM file organization, B+ file organization also works with the key and index value of the
records. It stores the records in a tree-like structure, which is why it is also known as B+ tree file
organization. In B+ file organization, the leaf nodes store the records, while the intermediate nodes only
contain pointers to the leaf nodes and do not store any records.
The root node and intermediate nodes contain a key field and an index field. The key field is the primary key of a
record, which can be used to uniquely identify the record. The index field contains the pointer (address)
to the leaf node where the actual record is stored.
B+ Tree Representation:
Let’s say we are storing the records of employees of an organization. These employee records contain
fields such as Employee_id, Employee_name, Employee_address etc. If we consider Employee_id as
primary key and the values of Employee_id ranges from 1001 to 1009 then the B+ tree representation
can be as follows.
The important point to note here is that the records are only stored at the leaf nodes; the other
nodes contain the key and index value (pointer to leaf node).
Leaf node 1001 means that it stores the complete record of the employee whose employee id is
“1001”. Similarly, node 1002 stores the record of the employee with employee id “1002”, and so on.
The main advantage of B+ file organization is that searching for a record is faster. This is because
all the leaf nodes (where the actual records are stored) are at the same distance from the root node
and can be accessed quickly.
Since intermediate nodes do not contain records and only contain pointers to the leaf
nodes, the height of the B+ tree is smaller, which makes traversal easier and faster.
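The search path can be sketched with a small hand-built tree for Employee_id values 1001 to 1009. This is a simplified illustration: the `Leaf` and `Internal` classes are hypothetical, each leaf here holds three records (an assumption made to keep the tree small), and insertion with node splitting is left out.

```python
import bisect

class Leaf:
    def __init__(self, records):
        # records maps Employee_id to the full record; only leaves hold records.
        self.records = records

class Internal:
    def __init__(self, keys, children):
        # keys[i] is the smallest key reachable through children[i + 1].
        self.keys = keys
        self.children = children

def bplus_search(node, key):
    """Follow pointers down to a leaf; return (record, levels descended)."""
    depth = 0
    while isinstance(node, Internal):
        node = node.children[bisect.bisect_right(node.keys, key)]
        depth += 1
    return node.records.get(key), depth

leaves = [
    Leaf({k: f"record of employee {k}" for k in range(1001, 1004)}),
    Leaf({k: f"record of employee {k}" for k in range(1004, 1007)}),
    Leaf({k: f"record of employee {k}" for k in range(1007, 1010)}),
]
root = Internal(keys=[1004, 1007], children=leaves)

print(bplus_search(root, 1005))
```

Every lookup in this tree descends the same number of levels, which reflects the point that all leaves are at the same distance from the root.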
Cluster file organization is different from the other file organization methods. Other file organization
methods mainly focus on organizing the records in a single file (table). Cluster file organization is
used when we frequently need combined data from multiple tables.
While other file organization methods organize tables separately and combine the results based on the
query, cluster file organization stores the combined data of two or more frequently joined tables in
the same file, known as a cluster. This helps in accessing the data faster.
Types of Cluster File Organization
There are two types of cluster file organizations:
Index based cluster file organization: The example that we have shown in the above diagram is an
index based cluster file organization. In this type, the cluster is formed based on the cluster key and
this cluster key works as an index of the cluster.
Since the EMP_DEP field is common to both tables, it becomes the cluster key when these two tables
are joined to form the cluster. Whenever we need to find the combined record of employees and
department based on EMP_DEP, this cluster can be used to quickly retrieve the data.
Hash based cluster file organization: This is the same as index based cluster file organization, except that
in this type the hash function is applied to the cluster key to generate a hash value, and that value is
used in the cluster instead of the index.
Note: The main difference between these two types is that in an index based cluster, records are stored
with the cluster key, while in a hash based cluster, the records are stored with the hash value of the cluster
key.
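The two variants can be sketched with one helper that differs only in how the storage slot is derived from the cluster key. Everything here is illustrative: the employee and department rows, the `toy_hash` function and the helper names are assumptions, and every employee is assumed to reference an existing department.

```python
employees = [
    {"emp_id": 1, "emp_name": "Ajay", "EMP_DEP": "D1"},
    {"emp_id": 2, "emp_name": "Steve", "EMP_DEP": "D2"},
    {"emp_id": 3, "emp_name": "Rahul", "EMP_DEP": "D1"},
]
departments = [
    {"EMP_DEP": "D1", "dep_name": "Sales"},
    {"EMP_DEP": "D2", "dep_name": "Support"},
]

def toy_hash(cluster_key):
    """Toy hash function for the hash based variant."""
    return sum(ord(ch) for ch in cluster_key) % 8

def build_cluster(employees, departments, slot_of):
    """Store each department together with its employees, grouped by slot."""
    cluster = {}
    for dep in departments:
        slot = cluster.setdefault(slot_of(dep["EMP_DEP"]), {})
        slot[dep["EMP_DEP"]] = {"department": dep, "employees": []}
    for emp in employees:
        cluster[slot_of(emp["EMP_DEP"])][emp["EMP_DEP"]]["employees"].append(emp)
    return cluster

def cluster_lookup(cluster, slot_of, emp_dep):
    """Fetch the combined department and employee data for one cluster key."""
    return cluster[slot_of(emp_dep)][emp_dep]

index_cluster = build_cluster(employees, departments, slot_of=lambda key: key)
hash_cluster = build_cluster(employees, departments, slot_of=toy_hash)

print(cluster_lookup(index_cluster, lambda key: key, "D1"))
print(cluster_lookup(hash_cluster, toy_hash, "D1"))
```

Both lookups return the same combined data. Only the slot differs: the cluster key itself in the index based variant, its hash value in the hash based one.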
Data replication is the process of making multiple copies of a database available on different servers. This is
done to achieve a distributed database. It minimizes the load on any one database and provides better
performance to the users.
In data replication, users can access data from different sites of the distributed
system, yet the data remains the same at all sites. Replication is done in such a way that the
data is always the same at all sites and is synchronized whenever there is a change.
This distributed database approach provides better performance and availability. It also helps to
recover data in case of a server failure.
There can be full replication, where the entire database is available on all servers, or
partial replication, where only the frequently used chunks of data are available on all servers.
Transactional replication works on the concept of publisher and subscriber.
Publisher: The primary database, which publishes data to all the secondary databases, called
subscribers.
Subscriber: The secondary databases, which are copies of the primary
database. Subscribers receive updates from the publisher whenever there is a change
in the publisher database.
1. Transactional replication
This approach is used to replicate the changes between multiple copies of databases. Any change
such as data update, primary key change, stored procedure change is replicated among all copies of
the database.
The changes occur in the subscriber in the same order in which they occurred in the publisher
database.
Subscriber databases can be used as read-only databases. The consistency between publisher and
subscriber is guaranteed, as the publisher pushes all the changes to the subscribers consistently and in
the same order.
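The publisher and subscriber roles can be sketched as two small classes. This is a single-process toy model, not a real replication protocol: the class names, the change log, and the rule that a new subscriber first replays all earlier changes are assumptions made for this example.

```python
class Subscriber:
    def __init__(self):
        self.data = {}
        self.log = []  # changes in the order they were applied

    def apply(self, change):
        key, value = change
        self.data[key] = value
        self.log.append(change)

class Publisher:
    def __init__(self):
        self.data = {}
        self.log = []
        self.subscribers = []

    def subscribe(self, subscriber):
        # Assumption: a new subscriber first replays every change made so far.
        for change in self.log:
            subscriber.apply(change)
        self.subscribers.append(subscriber)

    def write(self, key, value):
        """Apply the change locally, then push it to every subscriber in order."""
        change = (key, value)
        self.data[key] = value
        self.log.append(change)
        for subscriber in self.subscribers:
            subscriber.apply(change)

publisher = Publisher()
first = Subscriber()
publisher.subscribe(first)
publisher.write("S01", "Ann")
publisher.write("S01", "Anna")
late = Subscriber()
publisher.subscribe(late)
publisher.write("S02", "Bob")
print(first.data == publisher.data, late.data == publisher.data)
```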
2. Snapshot replication
In this approach, the snapshot of publisher database is taken at a specific moment of time and that
snapshot is shared with all the subscribers.
Snapshot replication is slower than transactional replication, as the changes are not pushed in real time;
rather, they are pushed after a specific interval.
This approach is mostly used when:
Role of the snapshot agent in snapshot replication: The snapshot agent is responsible for taking the
snapshot from the publisher and making it available to the subscribers.
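A minimal sketch of the snapshot agent's job, under the assumption that the publisher's data is a dictionary and each subscriber is a plain dictionary that gets overwritten with a copy of the latest snapshot. The class and method names are hypothetical.

```python
import copy

class SnapshotPublisher:
    def __init__(self):
        self.data = {}
        self.subscribers = []  # each subscriber is modelled as a plain dict

    def add_subscriber(self):
        subscriber = {}
        self.subscribers.append(subscriber)
        return subscriber

    def run_snapshot_agent(self):
        """Take a snapshot at this moment and hand a copy to every subscriber."""
        snapshot = copy.deepcopy(self.data)
        for subscriber in self.subscribers:
            subscriber.clear()
            subscriber.update(copy.deepcopy(snapshot))

publisher = SnapshotPublisher()
replica = publisher.add_subscriber()
publisher.data["S01"] = "Ann"
publisher.run_snapshot_agent()
publisher.data["S02"] = "Bob"   # not visible to the replica yet
stale = dict(replica)
publisher.run_snapshot_agent()  # the next interval brings the replica up to date
print(stale, replica)
```

Between two runs of the agent, the subscriber keeps serving the older snapshot.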
3. Merge replication
Merge replication allows both publishers and subscribers to make changes to the database, and these
changes are replicated to the other publishers and subscribers.
Replication Schemes
1. Full replication: In full replication, the entire database is available at every site of the distributed
database. This approach provides full availability and performance. In this approach, even if there is a
system failure, the database availability doesn’t get affected, thus this replication scheme is robust
and durable.
Advantages of full replication:
1. High availability
2. Best performance
3. Full recovery in case of failure.
4. Better load balance on every site of distributed database.
2. Partial replication: In partial replication, only the data that is frequently accessed is replicated on
every site of distributed database.
Advantages of partial replication:
1. Requires less storage capacity than full replication.
2. Provides good performance as the frequently used data is available at all sites.
3. Updates are faster as only important and frequently used data is replicated at all sites.
4. Maintaining data consistency is somewhat easier than full replication as the replicated data size is
small.
A database index is a data structure that improves the speed of data access. However, it
comes at the cost of additional write operations and the storage space needed to store the index. The
database index helps to quickly locate data in the database without having to search every row of the
database. The process of creating an index for a database is known as indexing. In this guide, you will
learn the various types of indexes in DBMS (Database Management System) with examples.
2. In a library, the books are arranged on the shelves in alphabetical order. If you are looking for a
book starting with the letter ‘A’, you go to shelf ‘A’. Here, naming the shelf with the letter ‘A’ is
the index. If the books were not arranged in alphabetical order on the shelves, it would take a very
long time to search for a book.
1. The first field is the search key. This is the column that a user can use to access the record quickly. For
example, if a user is searching for a student in the database, the user can use the student id as a search key
to quickly locate the student record.
2. The second field contains the address of the student record in the database. Remember that indexing
doesn’t replicate the whole database; rather, it creates an index that refers to the actual data in the
database. This field is a reference to the data. If a user is searching for a student with student id “S01”,
then S01 is the search key and the second field of the index contains the address where the
student data, such as student name, age and address, is stored.
Types of Indexes
1. Dense Index
2. Sparse Index
3. Clustered index
4. Non-clustered index or secondary index
5. Multilevel index
6. Reverse Index
1. Dense Index
In a dense index, there is an index entry for every record in the database. For example, if a student table
contains 100 records, then a dense index has 100 index entries, one for each
record in the table.
If more than one record has the same search key, then the dense index points to the first record in the
database that has that search key.
The name dense comes from the fact that every record in the database has a
corresponding entry in the index file, so the index file is very dense.
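A minimal sketch of a dense index, assuming the data file is a list of (search key, name) tuples sorted by search key and that a record's list position stands in for its address. The student rows and function names are hypothetical. When a key is repeated, the entry points to the first matching record, as described above.

```python
# Data file sorted on the search key; the list position acts as the address.
data_file = [("S01", "Ann"), ("S02", "Bob"), ("S02", "Bea"), ("S03", "Cal")]

def build_dense_index(data_file):
    """One entry per search key value, pointing at its first record."""
    index = {}
    for address, (key, _) in enumerate(data_file):
        index.setdefault(key, address)  # keep only the first occurrence
    return index

def dense_lookup(index, data_file, key):
    """Jump to the first matching record, then read the following duplicates."""
    address = index.get(key)
    matches = []
    while address is not None and address < len(data_file) and data_file[address][0] == key:
        matches.append(data_file[address])
        address += 1
    return matches

dense_index = build_dense_index(data_file)
print(dense_index)
print(dense_lookup(dense_index, data_file, "S02"))
```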
2. Sparse Index
In a sparse index, index entries are maintained in the index file for only some of the data items. Unlike
a dense index, where every record has an index entry in the index file, here the index entries are
limited to one per block of data items, as shown in the following diagram.
For sparse indexing, the data file needs to be sorted on the search key.
For example, let’s say we are creating a sparse index file for a student database that contains records
for 100 students.
Student records are divided into blocks where every block contains two records. If the index file contains
an entry for every alternate record (one per block), then we need to maintain only 50 index entries, whereas a dense
index would need 100 entries in the index file.
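The 100-student example can be sketched as follows, assuming integer student ids from 1 to 100 and two records per block. The function names and the record layout are hypothetical. A lookup finds the last index entry whose key is not greater than the search key and then scans only that block.

```python
import bisect

BLOCK_SIZE = 2  # two student records per block, as in the example above

def build_sparse_index(sorted_records):
    """Split the sorted file into blocks and index the first key of each block."""
    blocks = [sorted_records[i:i + BLOCK_SIZE]
              for i in range(0, len(sorted_records), BLOCK_SIZE)]
    index = [(block[0][0], block_no) for block_no, block in enumerate(blocks)]
    return blocks, index

def sparse_lookup(blocks, index, key):
    """Find the last index entry with key <= search key, then scan that block."""
    index_keys = [k for k, _ in index]
    position = bisect.bisect_right(index_keys, key) - 1
    if position < 0:
        return None
    for record in blocks[index[position][1]]:
        if record[0] == key:
            return record
    return None

students = [(student_id, f"Student {student_id}") for student_id in range(1, 101)]
blocks, sparse_index = build_sparse_index(students)
print(len(sparse_index))                  # number of index entries
print(sparse_lookup(blocks, sparse_index, 38))
```

This lookup relies on the file being sorted on the search key. Without that, the block chosen from the index could not be trusted to contain the record.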
3. Clustered Index
As the name suggests, in a clustered index, records of a similar type are grouped together to
form a cluster, and an index is created for this cluster, which is maintained in the clustered index file.
For example:
Let’s say students are assigned to multiple courses and we are creating indexes on the course_id field. In
this case, all the students that are assigned to a particular course_id form a cluster, and the index for
that particular course_id points to this cluster, as shown in the following diagram.
This helps in quickly locating a record in a particular cluster, as the size of the cluster is limited and
smaller than the actual database, so searching for a record is faster.
One type of clustered indexing is primary indexing. In this type of clustered indexing, the data is
sorted based on the search key. Searching is even faster in this type of indexing, as the records are
sorted.
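A small sketch of the course_id example, assuming the data file is sorted on course_id so that each cluster is a contiguous run of records. The index maps each course_id to the start position and size of its cluster. The student rows and function names are made up for illustration.

```python
def build_clustered_index(students):
    """Sort on course_id and record where each course's cluster starts and its size."""
    data_file = sorted(students, key=lambda s: s["course_id"])
    index = {}
    for position, student in enumerate(data_file):
        start, count = index.get(student["course_id"], (position, 0))
        index[student["course_id"]] = (start, count + 1)
    return data_file, index

def students_in_course(data_file, index, course_id):
    """Read only the cluster that the index entry points to."""
    start, count = index.get(course_id, (0, 0))
    return data_file[start:start + count]

students = [
    {"name": "Ann", "course_id": "C2"},
    {"name": "Bob", "course_id": "C1"},
    {"name": "Cal", "course_id": "C2"},
    {"name": "Dee", "course_id": "C3"},
]
data_file, clustered_index = build_clustered_index(students)
print(clustered_index)
print(students_in_course(data_file, clustered_index, "C2"))
```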
4. Non-clustered or secondary indexing
In non-clustered indexing, the indexing is done on multiple levels. This indexing is also known as
secondary indexing.
For example, let’s say we have records of 300 students in the database. Instead of creating index entries for
all 300 records at the root level, we create entries for the 1st, 101st and 201st
student records. This index is maintained in primary memory such as RAM. Here we have divided the
complete index file into three groups.
The second level of the index is stored on the hard disk. The first-level index, which is stored in RAM, refers to
this second-level file, and the second-level file then points to the actual data block, as shown below:
5. Multilevel index
In multilevel index strategy, the indexes are stored at multiple levels as shown in the following
diagram. This strategy is especially used when there is a large amount of data items, thus the size of
the index file is huge.
An index file with a large number of records defeats the purpose of faster access and better
performance, as accessing a large index file itself gives poor performance.
To solve this issue, in this strategy the index is divided into multiple levels, such as outer index blocks,
inner index blocks and data blocks. The outer index blocks point to the inner index blocks, and the inner
index blocks point to the data blocks. By managing indexes this way, we don’t need to access the whole
index file, as only those outer, inner and data blocks that match the
search key need to be accessed.
The disadvantage of the multilevel index strategy is that it requires additional storage space to maintain
this multilevel hierarchy of indexes.
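The outer, inner and data block levels can be sketched as nested lists. This is an illustration under stated assumptions: keys are sorted integers, every block holds `fanout` entries, and the function names are hypothetical. A lookup touches one outer entry, one inner index block and one data block.

```python
import bisect

def chunk(items, size):
    return [items[i:i + size] for i in range(0, len(items), size)]

def build_levels(sorted_keys, fanout):
    """Build data blocks, inner index blocks over them, and one outer index."""
    data_blocks = chunk(sorted_keys, fanout)
    inner_entries = [(block[0], n) for n, block in enumerate(data_blocks)]
    inner_blocks = chunk(inner_entries, fanout)
    outer = [(block[0][0], n) for n, block in enumerate(inner_blocks)]
    return outer, inner_blocks, data_blocks

def floor_entry(entries, key):
    """Pointer of the last entry whose key is <= the search key, or None."""
    position = bisect.bisect_right([k for k, _ in entries], key) - 1
    return entries[position][1] if position >= 0 else None

def multilevel_lookup(outer, inner_blocks, data_blocks, key):
    """Return (inner block number, data block number) holding the key, or None."""
    inner_no = floor_entry(outer, key)
    if inner_no is None:
        return None
    data_no = floor_entry(inner_blocks[inner_no], key)
    if data_no is None or key not in data_blocks[data_no]:
        return None
    return inner_no, data_no

outer, inner_blocks, data_blocks = build_levels(list(range(1, 28)), fanout=3)
print(outer)
print(multilevel_lookup(outer, inner_blocks, data_blocks, 14))
```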
6. Reverse Index
The reverse key index strategy is used when the search key field in the index data structure represents sequence
numbers, where each key value is greater than the prior key value.
Reverse key indexes use a B-tree as their data structure. A B-tree stores similar values in a single block;
for example, the values 86543 and 86544 are stored in a single block, which makes them easier to access.
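The text above stops before showing what the reversal does, so the following is a sketch based on the general idea of a reverse key index rather than on this guide: the digits of each key are reversed before the key is stored, so that consecutive sequence numbers no longer share a leading prefix. The `leaf_block` helper is a toy stand-in for B-tree block assignment (keys that share their first four characters share a block), and both function names are hypothetical.

```python
def reverse_key(key):
    """Reverse the digits of the key; kept as a string to preserve leading zeros."""
    return str(key)[::-1]

def leaf_block(stored_key, prefix_length=4):
    """Toy stand-in for a B-tree leaf: keys with the same leading prefix share a block."""
    return stored_key[:prefix_length]

for key in (86543, 86544):
    plain = str(key)
    reversed_form = reverse_key(key)
    print(key, "plain block:", leaf_block(plain),
          "reversed block:", leaf_block(reversed_form))
```

With plain keys, both values fall in the same toy block. After reversal they fall in different blocks, which is how consecutive sequence numbers get spread across the index.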