Introduction of DBMS
Introduction of DBMS
Introduction of DBMS
Purpose of Database
The intent of a database is that a collection of data should serve as many applications as possible. Therefore, a
database is often thought of as a repository of information needed to run certain functions in a corporation or
organization. It would permit only the retrieval of data but also the continuous modification of data needed for the
control of operations. It may be possible to search the database to obtain answers to questions or information for
planning purposes.
In a typical file-processing system, permanent records are stored in different files. Many different application
programs are written to extract the records and add the records to the appropriate files. However, this scheme has
several major limitations and disadvantages, such as data redundancy (duplication of data), data inconsistency,
maladaptive data, non-standard data, insecure data, incorrect data, etc. A database management system is an
answer to all these problems as it provides centralized control of the data.
Database Abstraction
A major purpose of a database is to provide the user with only as much information as is required of them. This
means that the system does not disclose all the details of the data, rather it hides some details of how the data is
stored and maintained. The complexity of databases is hidden from them which, if necessary, are ordered through
multiple levels of abstraction to facilitate their interaction with the system. The different levels of the database are
implemented through three layers:
1. Internal Level(Physical Level): The lowest level of abstraction, the internal level, is closest to physical
storage. It describes how the data is stored concretely on the storage medium.
2. Conceptual Level: This level of abstraction describes what data is concretely stored in the database. It also
describes the relationships that exist between the data. At this level, databases are described logically in
terms of simple data structures. Users at this level are not concerned with how these logical data structures
will be implemented at the physical level.
3. External Level(View Level): It is the level closest to users and is related to the way the data is viewed by
individual users.
UNIT 1: Database System Concepts & Architecture
Since a database can be viewed through three levels of abstraction, any change at one level can affect plans at other
levels. As databases continue to grow, there may be frequent changes to it at times. This should not lead to redesign
and re-implementation of the database. In such a context the concept of data independence proves beneficial.
Concept of Database
To store and manage data efficiently in the database let us understand some key terms:
1. Database Schema: It is a design of the database. Or we can say that it is a skeleton of the database that is used to
represent the structure, types of data will be stored in the rows and columns, constraints, relationships between the
tables.
2. Data Constraints: In a database, sometimes we put some restrictions on the table that what type of data can be
stored in one or more columns of the table, it can be done by using constraints. Constraints are defined while we are
creating a table.
3. Data dictionary or Metadata: Metadata is known as the data about the data. Or we can say that the database
schema along with different types of constraints on the data is stored by DBMS in the dictionary is known as
metadata.
4. Database instance: In a database, a database instance is used to define the complete database environment and
its components. Or we can say that it is a set of memory structures and background processes that are used to
access the database files.
5. Query: In a database, a query is used to access data from the database. So users have to write queries to retrieve
or manipulate data from the database.
6. Data manipulation: In a database, we can easily manipulate data using the three main operations that is Insertion,
Deletion, and updation.
7. Data Engine: It is an underlying component that is used to create and manage various database queries.
UNIT 1: Database System Concepts & Architecture
Advantages of Database
Let us consider some of the benefits provided by a database system and see how a database system overcomes the
above-mentioned problems:-
Therefore, for systems with better performance and efficiency, database systems are preferred.
Disadvantages of Database
With the complex tasks to be performed by the database system, some things may come up which can be termed as
the disadvantages of using the database system. These are:-
database-related concept that is often overlooked or noDatabase system is an excellent computer-based record-
keeping system. A collection of data, commonly called a database, contains information about a particular
enterprise. It maintains any information that may necessary to the decision-making process involved in the
management of that organization. It can also be defined as a collection of interrelated data stored together to serve
multiple applications, the data is stored so that it is independent of programs that use the data. A generic and
controlled approach is used to add new data and modify and retrieve existing data within the database. The data is
structured so as to provide the basis for future application development.
Purpose of Database
The intent of a database is that a collection of data should serve as many applications as possible. Therefore, a
database is often thought of as a repository of information needed to run certain functions in a corporation or
organization. It would permit only the retrieval of data but also the continuous modification of data needed for the
control of operations. It may be possible to search the database to obtain answers to questions or information for
planning purposes.
In a typical file-processing system, permanent records are stored in different files. Many different application
programs are written to extract the records and add the records to the appropriate files. But this scheme has several
major limitations and disadvantages, such as data redundancy (duplication of data), data inconsistency, maladaptive
data, non-standard data, insecure data, incorrect data, etc. A database management system is an answer to all these
problems as it provides centralized control of the data.
UNIT 1: Database System Concepts & Architecture
Concept of Database
To store and manage data efficiently in the database let us understand some key terms:
1. Database Schema: It is a design of the database. Or we can say that it is a skeleton of the database that is used to
represent the structure, types of data will be stored in the rows and columns, constraints, relationships between the
tables.
2. Data Constraints: In a database, sometimes we put some restrictions on the table that what type of data can be
stored in one or more columns of the table, it can be done by using constraints. Constraints are defined while we are
creating a table.
3. Data dictionary or Metadata: Metadata is known as the data about the data. Or we can say that the database
schema along with different types of constraints on the data is stored by DBMS in the dictionary is known as
metadata.
4. Database instance: In a database, a database instance is used to define the complete database environment and
its components. Or we can say that it is a set of memory structures and background processes that are used to
access the database files.
5. Query: In a database, a query is used to access data from the database. So users have to write queries to retrieve
or manipulate data from the database.
6. Data manipulation: In a database, we can easily manipulate data using the three main operations that is Insertion,
Deletion, and updation.
7. Data Engine: It is an underlying component that is used to create and manage various database queries.
Advantages of Database
Let us consider some of the benefits provided by a database system and see how a database system overcomes the
above-mentioned problems:-
Therefore, for systems with better performance and efficiency, database systems are preferred.
Disadvantages of Database
With the complex tasks to be performed by the database system, some things may come up which can be termed as
the disadvantages of using the database system. These are:-
This is the process of organizing a database to minimize redundancy and dependency by breaking down complex
tables into smaller, more manageable ones. It’s important to understand normalization because it helps you create
efficient and scalable databases, reduces data inconsistency and duplication, and makes it easier to update and
maintain the database over time.
This information is often skipped over in introductory material because it can be technical and complex, but it is
crucial for understanding how to properly design and maintain a database.
This is the lowest level of data abstraction. It tells us how the data is actually stored in memory. Access methods like
sequential or random access and file organization methods like B+ trees and hashing are used for the same.
Usability, size of memory, and the number of times the records are factors that we need to know while designing the
database.
Suppose we need to store the details of an employee. Blocks of storage and the amount of memory used for these
purposes are kept hidden from the user.
This level comprises the information that is actually stored in the database in the form of tables. It also stores the
relationship among the data entities in relatively simple structures. At this level, the information available to the
user at the view level is unknown.
We can store the various attributes of an employee and relationships, e.g. with the manager can also be stored.
This is the highest level of abstraction. Only a part of the actual database is viewed by the users. This level exists to
ease the accessibility of the database by an individual user. Users view data in the form of rows and columns. Tables
and relations are used to store data. Multiple views of the same database may exist. Users can just view the data
and interact with the database, storage and implementation details are hidden from them.
Logical level – it will contain the fields and the attributes of data.
Data Abstraction
The main purpose of data abstraction is to achieve data independence in order to save the time and cost required
when the database is modified or altered.
Data Independence
Data Independence is mainly defined as a property of DBMS that helps you to change the database schema at one
level of a system without requiring to change the schema at the next level. it helps to keep the data separated from
all program that makes use of it.
We have namely two levels of data independence arising from these levels of abstraction:
Data Independence
UNIT 1: Database System Concepts & Architecture
Physical Level Data Independence
It refers to the characteristic of being able to modify the physical schema without any alterations to the conceptual
or logical schema, done for optimization purposes, e.g., the Conceptual structure of the database would not be
affected by any change in storage size of the database system server. Changing from sequential to random access
files is one such example. These alterations or modifications to the physical structure may include:
It refers characteristic of being able to modify the logical schema without affecting the external schema or
application program. The user view of the data would not be affected by any changes to the conceptual view of the
data. These changes may include insertion or deletion of attributes, altering table structures entities or relationships
to the logical schema, etc.
o A DBMS has appropriate languages and interfaces to express database queries and updates.
o Database languages can be used to read, store and update the data in the database.
o DDL stands for Data Definition Language. It is used to define database structure or pattern.
o Using the DDL statements, you can create the skeleton of the database.
o Data definition language is used to store the information of metadata like the number of tables and
schemas, their names, indexes, columns in each table, constraints, etc.
UNIT 1: Database System Concepts & Architecture
Here are some tasks that come under DDL:
These commands are used to update the database schema that's why they come under Data definition language.
DML stands for Data Manipulation Language. It is used for accessing and manipulating data in a database. It handles
user requests.
o DCL stands for Data Control Language. It is used to retrieve the stored or saved data.
(But in Oracle database, the execution of data control language does not have the feature of rolling back.)
There are the following operations which have the authorization of Revoke:
TCL is used to run the changes made by the DML statement. TCL can be grouped into a logical transaction.
UNIT 1: Database System Concepts & Architecture
Here are some tasks that come under TCL:
o Rollback: It is used to restore the database to original since the last Commit.
Components of DBMS
DBMS stands for DataBase Management System. DBMS is a type of software by which we can save and retrieve the
user's data with the security process. DBMS can manipulate the database with the help of a group of programs. The
DBMS can accept the request from the operating system to supply the data. The DBMS also can accept the request
to retrieve a large amount of data through the user and third-party software.
DBMS also give permission to the user to use the data according to their needs. The word "DBMS" contains
information regarding the database program and the users. It also provides an interface between the user and the
software. In this topic, we are going to discuss the various types of DBMS.
Components of DBMS
There are many components available in the DBMS. Each component has a significant task in the DBMS. A database
environment is a collection of components that regulates the use of data, management, and a group of data. These
components consist of people, the technique of Handel the database, data, hardware, software, etc. there are
several components available for the DBMS. We are going to explain five main topics of the database below.
1. Hardware
o Here the hardware means the physical part of the DBMS. Here the hardware includes output devices like a
printer, monitor, etc., and storage devices like a hard disk.
UNIT 1: Database System Concepts & Architecture
o In DBMS, information hardware is the most important visible part. The equipment which is used for the
visibility of the data is the printer, computer, scanner, etc. This equipment is used to capture the data and
present the output to the user.
o With the help of hardware, the DBMS can access and update the database.
o The server can store a large amount of data, which can be shared with the help of the user's own system.
o The database can be run in any system that ranges from microcomputers to mainframe computers. And this
database also provides an interface between the real worlds to the database.
o When we try to run any database software like MySQL, we can type any commands with the help of our
keyboards, and RAM, ROM, and processor are part of our computer system.
2. Software
o Software is defined as the collection of programs that are used to instruct the computer about its work. The
software consists of a set of procedures, programs, and routines associated with the computer system's
operation and performance. Also, we can say that computer software is a set of instructions that is used to
instruct the computer hardware for the operation of the computers.
o The software includes so many software like network software and operating software. The database
software is used to access the database, and the database application performs the task.
o This software has the ability to understand the database accessing language and then convert these
languages to real database commands and then execute the database.
o This is the main component as the total database operation works on a software or application. We can also
be called as database software the wrapper of the whole physical database, which provides an easy interface
for the user to store, update and delete the data from the database.
o Some examples of DBMS software include MySQL, Oracle, SQL Server, dBase, FileMaker, Clipper, Foxpro,
Microsoft Access, etc.
3. Data
o The term data means the collection of any raw fact stored in the database. Here the data are any type of raw
material from which meaningful information is generated.
o The database can store any form of data, such as structural data, non-structural data, and logical data.
o The structured data are highly specific in the database and have a structured format. But in the case of non-
structural data, it is a collection of different types of data, and these data are stored in their native format.
o We also call the database the structure of the DBMS. With the help of the database, we can create and
construct the DBMS. After the creation of the database, we can create, access, and update that database.
o The main reason behind discovering the database is to create and manage the data within the database.
o Data is the most important part of the DBMS. Here the database contains the actual data and metadata.
Here metadata means data about data.
o For example, when the user stores the data in a database, some data, such as the size of the data, the name
of the data, and some data related to the user, are stored within the database. These data are called
metadata.
UNIT 1: Database System Concepts & Architecture
4. Procedures
o The procedure is a type of general instruction or guidelines for the use of DBMS. This instruction includes
how to set up the database, how to install the database, how to log in and log out of the database, how to
manage the database, how to take a backup of the database, and how to generate the report of the
database.
o In DBMS, with the help of procedure, we can validate the data, control the access and reduce the traffic
between the server and the clients. The DBMS can offer better performance to extensive or complex
business logic when the user follows all the procedures correctly.
o The main purpose of the procedure is to guide the user during the management and operation of the
database.
o The procedure of the databases is so similar to the function of the database. The major difference between
the database procedure and database function is that the database function acts the same as the SQL
statement. In contrast, the database procedure is invoked using the CALL statement of the DBMS.
o Database procedures can be created in two ways in enterprise architecture. These two ways are as below.
2. <Datatype>,...)
3. IS
5. BEGIN
6. Execution section
7. EXCEPTION
8. Exception section
9. END
o Database Access Language is a simple language that allows users to write commands to perform the desired
operations on the data that is stored in the database.
o Database Access Language is a language used to write commands to access, upsert, and delete data stored in
a database.
o Users can write commands or query the database using Database Access Language before submitting them
to the database for execution.
o Through utilizing the language, users can create new databases and tables, insert data and delete data.
o Examples of database languages are SQL (structured query language), My Access, Oracle, etc. A database
language is comprised of two languages.
UNIT 1: Database System Concepts & Architecture
1. Data Definition Language(DDL):It is used to construct a database. DDL implements database schema at the
physical, logical, and external levels.
The following commands serve as the base for all DDL commands:
o ALTER<object>
o COMMENT
o CREATE<object>
o DESCRIBE<object>
o DROP<object>
o SHOW<object>
o USE<object>
2. Data Manipulation Language(DML): It is used to access a database. The DML provides the statements to retrieve,
modify, insert and delete the data from the database.
The following commands serve as the base for all DML commands:
o INSERT
o UPDATE
o DELETE
o LOCK
o CALL
o EXPLAIN PLAN
6. People
o The people who control and manage the databases and perform different types of operations on the
database in the DBMS.
o Database administrator-database administrator is the one who manages the complete database
management system. DBA takes care of the security of the DBMS, its availability, managing the license keys,
managing user accounts and access, etc.
o Software developer- theThis user group is involved in developing and designing the parts of DBMS. They can
handle massive quantities of data, modify and edit databases, design and develop new databases, and
troubleshoot database issues.
o End user - These days, all modern web or mobile applications store user data. How do you think they do it?
Yes, applications are programmed in such a way that they collect user data and store the data on a DBMS
system running on their server. End users are the ones who store, retrieve, update and delete data.
i. Native Users
v. Application Users
Administrative DBA –
Their job is to maintain the server and keep it functional. They are concerned with data backups, security,
troubleshooting, replication, migration, etc.
Cloud DBA –
Nowadays companies are preferring to save their workpiece on cloud storage. As it reduces the chance of
data loss and provides an extra layer of data security and integrity.
Development DBA –
They build and develop queries, stores procedure, etc. that meets firm or organization needs. They are par
at programming.
Application DBA –
They particularly manage all requirements of application components that interact with the database and
accomplish activities such as application installation and coordination, application upgrades, database
cloning, data load process management, etc.
Architect –
They are held responsible for designing schemas like building tables. They work to build a structure that
meets organizational needs. The design is further used by developers and development DBAs to design and
implement real applications.
OLAP DBA –
They design and build multi-dimensional cubes for determination support or OLAP systems.
Data Modeler –
In general, a data modeler is in charge of a portion of a data architect’s duties. A data modeler is typically
not regarded as a DBA, but this is not a hard and fast rule.
Task-Oriented DBA –
To concentrate on a specific DBA task, large businesses may hire highly specialised DBAs. They are quite
uncommon outside of big corporations. Recovery and backup DBA, whose responsibility it is to guarantee
that the databases of businesses can be recovered, is an example of a task-oriented DBA. However, this
specialism is not present in the majority of firms. These task-oriented DBAs will make sure that highly
qualified professionals are working on crucial DBA tasks when it is possible.
UNIT 1: Database System Concepts & Architecture
Database Analyst –
This position doesn’t actually have a set definition. Junior DBAs may occasionally be referred to as database
analysts. A database analyst occasionally performs functions that are comparable to those of a database
architect. The term “Data Administrator” is also used to describe database analysts and data analysts.
Additionally, some businesses occasionally refer to database administrators as data analysts.
Database Administrator manages and controls three levels of database internal level, conceptual level, and
external level of Database management system architecture and in discussion with the comprehensive user
community, gives a definition of the world view of the database. It then provides an external view of
different users and applications.
Database Administrator ensures held responsible to maintain integrity and security of database restricting
from unauthorized users. It grants permission to users of the database and contains a profile of each and
every user in the database.
Database Administrators are also held accountable that the database is protected and secured and that any
chance of data loss keeps at a minimum.
Database Administrator is solely responsible for reducing the risk of data loss as it backup the data at regular
intervals.
Decides hardware –
They decide on economical hardware, based on cost, performance, and efficiency of hardware, and best
suits the organization. It is hardware that is an interface between end users and the database.
Database Accessibility –
Database Administrator is solely responsible for giving permission to access data available in the database. It
also makes sure who has the right to change the content.
Database design –
DBA is held responsible and accountable for logical, physical design, external model design, and integrity and
security control.
Database implementation –
DBA implements DBMS and checks database loading at the time of its implementation.
DBA decides the user access level and security checks for accessing, modifying or manipulating data.
DBA is responsible for specifying various techniques for monitoring the database performance.
The operation management deals with the data problems which arises on day to day basis, and the
responsibilities include are:
1. The various programming and soft skills are required to DBA are as follows,
2. In addition, this aspect of database administration includes maintenance of data security, which involves
maintaining security authorization tables, conducting periodic security audits, investigating all known security
breaches.
3. To carry out all these functions, it is crucial that the DBA has all the accurate information about the company’s
data readily on hand. For this purpose he maintains a data dictionary.
4. The data dictionary contains definitions of all data items and structures, the various schemes, the relevant
authorization and validation checks and the different mapping definitions.
5. It should also have information about the source and destination of a data item and the flow of a data item as it is
used by a system. This type of information is a great help to the DBA in maintaining centralized control of data.
A Data Dictionary can be defined as a collection of information on all data elements or contents of databases such as
data types, and text descriptions of the system. It makes it easier for users and analysts to use data as well as
understand and have common knowledge about inputs, outputs, components of a database, and intermediate
calculations.
There is less information and details provided by data models. So, a data dictionary is essential and needed to have
proper knowledge and usage of contents. Data Dictionary provides all information about names that are used in
system models. Data Dictionary also provides information about entities, relationships, and attributes that are
present in the system model. As a part of the structured analysis and design tool, the implementation of a data
dictionary is done.
UNIT 1: Database System Concepts & Architecture
The following data name type of Information is used to store in a data dictionary:
Name Description
Name generally includes the primary name of all composite data or control items
Name
available, and the name of the external entity or data store.
Where or How it’s A data dictionary generally gives information about where and how data or control items
used? are used which may include an input/ output to process.
Integrated Data Dictionary can be seen as a catalog that can be maintained by the relational database. In previous
databases, there is not any functionality of Integrated Data Dictionary, so they use Stand Alone Data Dictionary.
Active Data Dictionary: Active Database Dictionary is a type of database that is updated automatically in
case any changes are to be done to the database. These are self-updating.
Passive Data Dictionary: Passive Databases are the databases that have to be maintained or updated
manually in case of any changes have been made to the system.
Stand Alone Data Dictionary is a type of flexible data dictionary as Database Administrator has ease of managing
data. It does not require data that is computer-based. It has no fixed format. But some elements are common in this
kind of database.
Data Elements: It has the elements like name, datatype, validation rules, etc.
Tables: These contain all the necessary information that is required for the table, how many rows in the
table, how many columns in the table, etc.
Programs: These are used for accessing the database, and can include SQL Queries, Reports, etc.
Relationship between Data Elements: This stores the relationship among the different databases, like
cardinality, connectivity, etc.
UNIT 1: Database System Concepts & Architecture
How to Create a Data Dictionary?
Data Dictionary can be formed with the help of the following relational database like MySQL, SQL Server, etc. Several
notations are present everywhere which enhances the quality of the Data Dictionary. We will also look into that.
Database Administrators can have the following templates like SQL Server, that help in making Data Dictionary.
Composition = Is composed of
Selection [ |] Represents OR
Example: Let us understand this by taking an example of the reservation system. In the reservation system,
“passenger” is a data item whose information is available on the data dictionary as follows:
Keys Values
Name Passenger
Aliases None
The main challenge that occurs in front of us is that a data dictionary is somehow difficult and it may lead to
take much time in case when data is not prepared previously.
Data Preparation is a bit lengthy and hectic process for a large scale of data.
When you don’t do Data Preparation, Data Dictionary is not that much efficient.
Data Dictionary can be an insecure process as if you give access to so many persons, it might be a challenge
to the security of the Data Dictionary.
Conclusion
Data dictionaries enhance the maintainability, security, accessibility, and quality of data.
Data dictionaries can be installed as a stand-alone program or integrated into the database management
system.
An informational database about the data within a database is called a data dictionary.
They include thorough explanations of all database elements, such as constraints, tables, columns, and
indexes.
UNIT 2: Data Modeling Technique
Data Models in DBMS
A Data Model in Database Management System (DBMS) is the concept of tools that are developed to summarize the
description of the database. Data Models provide us with a transparent picture of data which helps us in creating an
actual database. It shows us from the design of the data to its proper implementation of data.
The conceptual data model describes the database at a very high level and is useful to understand the needs or
requirements of the database. It is this model, that is used in the requirement-gathering process i.e. before the
Database Designers start making a particular database. One such popular model is the entity/relationship model (ER
model). The E/R model specializes in entities, relationships, and even attributes that are used by database designers.
In terms of this concept, a discussion can be made even with non-computer science(non-technical) users and
stakeholders, and their requirements can be understood.
Entity-Relationship Model( ER Model): It is a high-level data model which is used to define the data and the
relationships between them. It is basically a conceptual design of any database which is easy to design the view of
data.
Components of ER Model:
1. Entity: An entity is referred to as a real-world object. It can be a name, place, object, class, etc. These are
represented by a rectangle in an ER Diagram.
2. Attributes: An attribute can be defined as the description of the entity. These are represented by Eclipse in
an ER Diagram. It can be Age, Roll Number, or Marks for a Student.
3. Relationship: Relationships are used to define relations among different entities. Diamonds and Rhombus
are used to show Relationships.
UNIT 2: Data Modeling Technique
Characteristics of a conceptual data model
This type of Data Models are designed and developed for a business audience.
The conceptual model is developed independently of hardware specifications like data storage capacity,
location or software specifications like DBMS vendor and technology. The focus is to represent data as a user
will see it in the “real world.”
Conceptual data models known as Domain models create a common vocabulary for all stakeholders by establishing
basic concepts and scope
This type of data model is used to represent only the logical part of the database and does not represent the physical
structure of the database. The representational data model allows us to focus primarily, on the design part of the
database. A popular representational model is a Relational model. The relational Model consists of Relational
Algebra and Relational Calculus. In the Relational Model, we basically use tables to represent our data and the
relationships between them. It is a theoretical concept whose practical implementation is done in Physical Data
Model.
The advantage of using a Representational data model is to provide a foundation to form the base for the Physical
model
The physical Data Model is used to practically implement Relational Data Model. Ultimately, all data in a database is
stored physically on a secondary storage device such as discs and tapes. This is stored in the form of files, records,
and certain other data structures. It has all the information on the format in which the files are present and the
structure of the databases, the presence of external data structures, and their relation to each other. Here, we
basically save tables in memory so they can be accessed efficiently. In order to come up with a good physical model,
we have to work on the relational model in a better way. Structured Query Language (SQL) is used to practically
implement Relational Algebra.
This Data Model describes HOW the system will be implemented using a specific DBMS system. This model is
typically created by DBA and developers. The purpose is actual implementation of the database.
The physical data model describes data need for a single project or application though it maybe integrated
with other physical data models based on project scope.
Data Model contains relationships between tables that which addresses cardinality and nullability of the
relationships.
Developed for a specific version of a DBMS, location, data storage or technology to be used in the project.
Columns should have exact datatypes, lengths assigned and default values.
Primary and Foreign keys, views, indexes, access profiles, and authorizations, etc. are defined
1. Hierarchical Model
The hierarchical Model is one of the oldest models in the data model which was developed by IBM, in the 1950s. In a
hierarchical model, data are viewed as a collection of tables, or we can say segments that form a hierarchical
relation. In this, the data is organized into a tree-like structure where each record consists of one parent record and
UNIT 2: Data Modeling Technique
many children. Even if the segments are connected as a chain-like structure by logical associations, then the instant
structure can be a fan structure with multiple branches. We call the illogical associations as directional associations.
2. Network Model
The Network Model was formalized by the Database Task group in the 1960s. This model is the generalization of the
hierarchical model. This model can consist of multiple parent segments and these segments are grouped as levels
but there exists a logical association between the segments belonging to any level. Mostly, there exists a many-to-
many logical association between any of the two segments.
In the Object-Oriented Data Model, data and their relationships are contained in a single structure which is referred
to as an object in this data model. In this, real-world problems are represented as objects with different attributes.
All objects have multiple relationships between them. Basically, it is a combination of Object Oriented programming
and a Relational Database Model.
The float data model basically consists of a two-dimensional array of data models that do not contain any duplicate
elements in the array. This data model has one drawback it cannot store a large amount of data that is the tables can
not be of large size.
The Context data model is simply a data model which consists of more than one data model. For example, the
Context data model consists of ER Model, Object-Oriented Data Model, etc. This model allows users to do more than
one thing which each individual data model can do.
Semi-Structured data models deal with the data in a flexible way. Some entities may have extra attributes and some
entities may have some missing attributes. Basically, you can represent data here in a flexible way.
2. It helps us in finding the missing data and also in minimizing Data Redundancy.
4. The data model should be detailed enough to be used for building the physical database.
5. The information in the data model can be used for defining the relationship between tables, primary and
foreign keys, and stored procedures.
1. In the case of a vast database, sometimes it becomes difficult to understand the data model.
2. You must have the proper knowledge of SQL to use physical models.
3. Even smaller change made in structure require modification in the entire application.
5. To develop Data model one should know physical data stored characteristics.
UNIT 2: Data Modeling Technique
Conclusion
Data modeling is the process of developing data model for the data to be stored in a Database.
Data Models ensure consistency in naming conventions, default values, semantics, security while ensuring
quality of the data.
Data Model structure helps to define the relational tables, primary and foreign keys and stored procedures.
The main aim of conceptual model is to establish the entities, their attributes, and their relationships.
Logical data model defines the structure of the data elements and set the relationships between them.
A Physical Data Model describes the database specific implementation of the data model.
The main goal of a designing data model is to make certain that data objects offered by the functional team
are represented accurately.
The biggest drawback is that even smaller change made in structure require modification in the entire
application.
Reading this Data Modeling tutorial, you will learn from the basic concepts such as What is Data Model?
Introduction to different types of Data Model, advantages, disadvantages, and data model example.
If there are multiple nodes appear at the top level, then these can be called as root segments.
In the above figure, Electronics is the root node which has two children i.e. Televisions and Portable Electronics.
These two has further children for which they act as parent. For example: Television has children as Tube, LCD and
Plasma, for these three Television act as parent. It follows one to many relationship.
In the above figure, Project is the root node which has two children i.e. Project 1 and Project 2. Project 1 has 3
children and Project 2 has 2 children. Total there are 5 children i.e Department A, Department B and Department C,
they are network related children as we said that this model can have more than one parent. So, for the Department
B and Department C have two parents i.e. Project 1 and Project 2.
At the intersection of row and column there will be only one value for the tuple.
S.
No. Hierarchical Data Model Network Data Model Relational Data Model
To organize records, it uses It organizes records in the form of It organizes records in the form
2.
tree structure. directed graphs. of tables.
In addition to 1:1 and 1:n it also In addition to 1:1 and 1:n it also
It implements 1:1 and 1:n
3. implements many to many implements many to many
relations.
relationships. relationships.
UNIT 2: Data Modeling Technique
S.
No. Hierarchical Data Model Network Data Model Relational Data Model
This model lacks data There is partial data independence This model provides data
8.
independence. in this model. independence.
No such facility for querying No such facility for querying SQL-based declarative querying
9.
database is supported. database is supported. is supported.
&XML and XAML use this VAX-DBMS, DMS-1100 of UNIVAC It is mostly used in real world
13.
model. and SUPRADBMS’s use this model. applications. Oracle, SQL.
Enhanced ER Model
Prerequisite – Introduction of ER Model
Today the complexity of the data is increasing so it becomes more and more difficult to use the traditional ER model
for database modeling. To reduce this complexity of modeling we have to make improvements or enhancements to
the existing ER model to make it able to handle the complex application in a better way.
Enhanced entity-relationship diagrams are advanced database diagrams very similar to regular ER diagrams which
represent the requirements and complexities of complex databases.
It is a diagrammatic technique for displaying the Sub Class and Super Class; Specialization and Generalization; Union
or Category; Aggregation etc.
Generalization and Specialization: These are very common relationships found in real entities. However, this kind of
relationship was added later as an enhanced extension to the classical ER model. Specialized classes are often
called subclass while a generalized class is called a superclass, probably inspired by object-oriented programming. A
sub-class is best understood by “IS-A analysis”. The following statements hopefully make some sense to your mind
“Technician IS-A Employee”, and “Laptop IS-A Computer”.
An entity is a specialized type/class of another entity. For example, a Technician is a special Employee in a university
system Faculty is a special class of Employees. We call this phenomenon generalization/specialization. In the
example here Employee is a generalized entity class while the Technician and Faculty are specialized classes of
Employee.
Example:
This example instance of “sub-class” relationships. Here we have four sets of employees: Secretary, Technician, and
Engineer. The employee is a super-class of the rest three sets of individual sub-class is a subset of Employee set.
UNIT 2: Data Modeling Technique
An entity belonging to a sub-class is related to some super-class entity. For instance emp, no 1001 is a
secretary, and his typing speed is 68. Emp no 1009 is an engineer (sub-class) and her trade is “Electrical”, so
forth.
Sub-class entity “inherits” all attributes of super-class; for example, employee 1001 will have attributes eno,
name, salary, and typing speed.
UNIT 2: Data Modeling Technique
Enhanced ER model of above example
1. Total or Partial – A sub-classing relationship is total if every super-class entity is to be associated with some
sub-class entity, otherwise partial. Sub-class “job type based employee category” is partial sub-classing – not
necessary every employee is one of (secretary, engineer, and technician), i.e. union of these three types is a
proper subset of all employees. Whereas other sub-classing “Salaried Employee AND Hourly Employee” is
total; the union of entities from sub-classes is equal to the total employee set, i.e. every employee
necessarily has to be one of them.
2. Overlapped or Disjoint – If an entity from a super-set can be related (can occur) in multiple sub-class sets,
then it is overlapped sub-classing, otherwise disjoint. Both the examples: job-type based and salaries/hourly
employee sub-classing are disjoint.
Note – These constraints are independent of each other: can be “overlapped and total or partial” or “disjoint and
total or partial”. Also, sub-classing has transitive properties.
Set of Library Members is UNION of Faculty, Student, and Staff. A union relationship indicates either type;
for example, a library member is either Faculty or Staff or Student.
Below are two examples that show how UNION can be depicted in ERD – Vehicle Owner is UNION of
PERSON and Company, and RTO Registered Vehicle is UNION of Car and Truck.
You might see some confusion in Sub-class and UNION; consider an example in above figure Vehicle is super-class of
CAR and Truck; this is very much the correct example of the subclass as well but here use it differently we are saying
RTO Registered vehicle is UNION of Car and Vehicle, they do not inherit any attribute of Vehicle, attributes of car and
truck are altogether independent set, where is in sub-classing situation car and truck would be inheriting the
attribute of vehicle class.
An Enhanced Entity-Relationship (EER) model is an extension of the original Entity-Relationship (ER) model that
includes additional concepts and features to support more complex data modeling requirements. The EER model
includes all the elements of the ER model and adds new constructs, such as subtypes and supertypes, generalization
and specialization, and inheritance.
Subtypes and Supertypes: The EER model allows for the creation of subtypes and supertypes. A supertype is
a generalization of one or more subtypes, while a subtype is a specialization of a supertype. For example, a
vehicle could be a supertype, while car, truck, and motorcycle could be subtypes.
Generalization and Specialization: Generalization is the process of identifying common attributes and
relationships between entities and creating a supertype based on these common features. Specialization is
UNIT 2: Data Modeling Technique
the process of identifying unique attributes and relationships between entities and creating subtypes based
on these unique features.
Inheritance: Inheritance is a mechanism that allows subtypes to inherit attributes and relationships from
their supertype. This means that any attribute or relationship defined for a supertype is automatically
inherited by all its subtypes.
Constraints: The EER model allows for the specification of constraints that must be satisfied by entities and
relationships. Examples of constraints include cardinality constraints, which specify the number of
relationships that can exist between entities, and participation constraints, which specify whether an entity
is required to participate in a relationship.
Overall, the EER model provides a powerful and flexible way to model complex data relationships, making it
a popular choice for database design. An Enhanced Entity-Relationship (EER) model is an extension of the
traditional Entity-Relationship (ER) model that includes additional features to represent complex
relationships between entities more accurately. Some of the main features of the EER model are:
Subclasses and Superclasses: EER model allows for the creation of a hierarchical structure of entities where
a superclass can have one or more subclasses. Each subclass inherits attributes and relationships from its
superclass, and it can also have its unique attributes and relationships.
Specialization and Generalization: EER model uses the concepts of specialization and generalization to
create a hierarchy of entities. Specialization is the process of defining subclasses from a superclass, while
generalization is the process of defining a superclass from two or more subclasses.
Attribute Inheritance: EER model allows attributes to be inherited from a superclass to its subclasses. This
means that attributes defined in the superclass are automatically inherited by all its subclasses.
Union Types: EER model allows for the creation of a union type, which is a combination of two or more
entity types. The union type can have attributes and relationships that are common to all the entity types
that make up the union.
Aggregation: EER model allows for the creation of an aggregate entity that represents a group of entities as
a single entity. The aggregate entity has its unique attributes and relationships.
Multi-valued Attributes: EER model allows an attribute to have multiple values for a single entity instance.
For example, an entity representing a person may have multiple phone numbers.
Relationships with Attributes: EER model allows relationships between entities to have attributes. These
attributes can describe the nature of the relationship or provide additional information about the
relationship.
Generalization
Specialization
Aggregation
Generalization
Generalization is the process of extracting common properties from a set of entities and creating a generalized entity
from it. It is a bottom-up approach in which two or more entities can be generalized to a higher-level entity if they
UNIT 2: Data Modeling Technique
have some attributes in common. For Example, STUDENT and FACULTY can be generalized to a higher-level entity
called PERSON as shown in Figure 1. In this case, common attributes like P_NAME, and P_ADD become part of a
higher entity (PERSON), and specialized attributes like S_FEE become part of a specialized entity (STUDENT).
Generalization
Specialization
In specialization, an entity is divided into sub-entities based on its characteristics. It is a top-down approach where
the higher-level entity is specialized into two or more lower-level entities. For Example, an EMPLOYEE entity in an
Employee management system can be specialized into DEVELOPER, TESTER, etc. as shown in Figure 2. In this case,
common attributes like E_NAME, E_SAL, etc. become part of a higher entity (EMPLOYEE), and specialized attributes
like TES_TYPE become part of a specialized entity (TESTER).
Specialization
Attribute inheritance: allows lower level entities to inherit the attributes of higher level entities and vice
versa.
in diagram: Car entity is an inheritance of Vehicle entity ,So Car can acquire attributes
of Vehicle example:car can acquire Model attribute of Vehicle.
Participation inheritance: In participation inheritance, relationships involving higher level entity set also
inherited by lower level entity and vice versa.
in diagram: Vehicle entity has an relationship with Cycle entity ,So Cycle entity can acquire attributes of
lower level entities i.e Car and Bus since it is inheritance of Vehicle
Aggregation
An ER diagram is not capable of representing the relationship between an entity and a relationship which may be
required in some scenarios. In those cases, a relationship with its corresponding entities is aggregated into a higher-
level entity. Aggregation is an abstraction through which we can represent relationships as higher-level entity sets.
For Example, an Employee working on a project may require some machinery. So, REQUIRE relationship is needed
between the relationship WORKS_FOR and entity MACHINERY. Using aggregation, WORKS_FOR relationship with its
entities EMPLOYEE and PROJECT is aggregated into a single entity and relationship REQUIRE is created between the
aggregated entity and MACHINERY.
UNIT 2: Data Modeling Technique
Aggregation
The relational model represents how data is stored in Relational Databases. A relational database consists of a
collection of tables, each of which is assigned a unique name. Consider a relation STUDENT with attributes ROLL_NO,
NAME, ADDRESS, PHONE, and AGE shown in the table.
Table Student
4 SURESH DELHI 18
Important Terminologies
Attribute: Attributes are the properties that define an entity. e.g.; ROLL_NO, NAME, ADDRESS
Relation Schema: A relation schema defines the structure of the relation and represents the name of the
relation with its attributes. e.g.; STUDENT (ROLL_NO, NAME, ADDRESS, PHONE, and AGE) is the relation
schema for STUDENT. If a schema has more than 1 relation, it is called Relational Schema.
Tuple: Each row in the relation is known as a tuple. The above relation contains 4 tuples, one of which is
shown as:
Relation Instance: The set of tuples of a relation at a particular instance of time is called a relation instance.
Table 1 shows the relation instance of STUDENT at a particular time. It can change whenever there is an
insertion, deletion, or update in the database.
Degree: The number of attributes in the relation is known as the degree of the relation.
The STUDENT relation defined above has degree 5.
Cardinality: The number of tuples in a relation is known as cardinality. The STUDENT relation defined above
has cardinality 4.
UNIT 3: Relational Data Model in Depth
Column: The column represents the set of values for a particular attribute. The column ROLL_NO is
extracted from the relation STUDENT.
ROLL_NO
NULL Values: The value which is not known or unavailable is called a NULL value. It is represented by blank
space. e.g.; PHONE of STUDENT having ROLL_NO 4 is NULL.
Relation Key: These are basically the keys that are used to identify the rows uniquely or also help in
identifying tables. These are of the following types.
Primary Key
Candidate Key
Super Key
Foreign Key
Alternate Key
Composite Key
While designing the Relational Model, we define some conditions which must hold for data present in the database
are called Constraints. These constraints are checked before performing any operation (insertion, deletion, and
updation ) in the database. If there is a violation of any of the constraints, the operation will fail.
Domain Constraints
These are attribute-level constraints. An attribute can only take values that lie inside the domain range. e.g.; If a
constraint AGE>0 is applied to STUDENT relation, inserting a negative value of AGE will result in failure.
Key Integrity
Every relation in the database should have at least one set of attributes that defines a tuple uniquely. Those set of
attributes is called keys. e.g.; ROLL_NO in STUDENT is key. No two students can have the same roll number. So a key
has two properties:
Referential Integrity
UNIT 3: Relational Data Model in Depth
When one attribute of a relation can only take values from another attribute of the same relation or any other
relation, it is called referential integrity. Let us suppose we have 2 relations
Table Student
4 SURESH DELHI 18 IT
Table Branch
BRANCH_CODE BRANCH_NAME
CS COMPUTER SCIENCE
IT INFORMATION TECHNOLOGY
CV CIVIL ENGINEERING
BRANCH_CODE of STUDENT can only take the values which are present in BRANCH_CODE of BRANCH which is called
referential integrity constraint. The relation which is referencing another relation is called REFERENCING RELATION
(STUDENT in this case) and the relation to which other relations refer is called REFERENCED RELATION (BRANCH in
this case).
An anomaly is an irregularity or something which deviates from the expected or normal state. When designing
databases, we identify three types of anomalies: Insert, Update, and Delete.
We can’t insert a row in REFERENCING RELATION if referencing attribute’s value is not present in the referenced
attribute value. e.g.; Insertion of a student with BRANCH_CODE ‘ME’ in STUDENT relation will result in an error
because ‘ME’ is not present in BRANCH_CODE of BRANCH.
UNIT 3: Relational Data Model in Depth
Deletion/ Updation Anomaly in Referenced Relation:
We can’t delete or update a row from REFERENCED RELATION if the value of REFERENCED ATTRIBUTE is used in the
value of REFERENCING ATTRIBUTE. e.g; if we try to delete a tuple from BRANCH having BRANCH_CODE ‘CS’, it will
result in an error because ‘CS’ is referenced by BRANCH_CODE of STUDENT, but if we try to delete the row from
BRANCH with BRANCH_CODE CV, it will be deleted as the value is not been used by referencing relation. It can be
handled by the following method:
On Delete Cascade
It will delete the tuples from REFERENCING RELATION if the value used by REFERENCING ATTRIBUTE is deleted from
REFERENCED RELATION. e.g.; For, if we delete a row from BRANCH with BRANCH_CODE ‘CS’, the rows in STUDENT
relation with BRANCH_CODE CS (ROLL_NO 1 and 2 in this case) will be deleted.
On Update Cascade
It will update the REFERENCING ATTRIBUTE in REFERENCING RELATION if the attribute value used by REFERENCING
ATTRIBUTE is updated in REFERENCED RELATION. e.g;, if we update a row from BRANCH with BRANCH_CODE ‘CS’ to
‘CSE’, the rows in STUDENT relation with BRANCH_CODE CS (ROLL_NO 1 and 2 in this case) will be updated with
BRANCH_CODE ‘CSE’.
Super Keys
Any set of attributes that allows us to identify unique rows (tuples) in a given relationship is known as super keys.
Out of these super keys, we can always choose a proper subset among these that can be used as a primary key. Such
keys are known as Candidate keys. If there is a combination of two or more attributes that are being used as the
primary key then we call it a Composite key.
Edgar F Codd proposed the relational database model where he stated rules. Now these are known as Codd’s Rules.
For any database to be the perfect one, it has to follow the rules.
Simple model: Relational Model is simple and easy to use in comparison to other languages.
Flexible: Relational Model is more flexible than any other relational model present.
Secure: Relational Model is more secure than any other relational model.
Data Integrity: The integrity of the data is maintained in the relational model.
Operations can be Applied Easily: It is better to perform operations in the relational model.
Because of the complex structure, the response time for queries is high.
Data is stored in tables having relationships between them called the Relational model.
UNIT 3: Relational Data Model in Depth
The relational model supports the operations like Data definition, Data manipulation, and Transaction
management.
Each column has a distinct name and they are representing attributes.
Types of Keys in Relational Model (Candidate, Super, Primary, Alternate and Foreign)
Keys are one of the basic requirements of a relational database model. It is widely used to identify the tuples(rows)
uniquely in the table. We also use keys to set up relations amongst various columns and tables of a relational
database.
Candidate Key
Primary Key
Super Key
Alternate Key
Foreign Key
Composite Key
Candidate Key
The minimal set of attributes that can uniquely identify a tuple is known as a candidate key. For Example, STUD_NO
in STUDENT relation.
A table can have multiple candidate keys but only one primary key.
The value of the Candidate Key is unique and may be null for a tuple.
Example:
The candidate key can be simple (having only one attribute) or composite as well.
Example:
Table STUDENT_COURSE
1 001 C001
2 056 C005
Note: In SQL Server a unique constraint that has a nullable column, allows the value ‘null‘ in that column only once.
That’s why the STUD_PHONE attribute is a candidate here, but can not be a ‘null’ value in the primary key attribute.
Primary Key
There can be more than one candidate key in relation out of which one can be chosen as the primary key. For
Example, STUD_NO, as well as STUD_PHONE, are candidate keys for relation STUDENT but STUD_NO can be chosen
as the primary key (only one out of many candidate keys).
It is a unique key.
It cannot be NULL.
Primary keys are not necessarily to be a single column; more than one column can also be a primary key for
a table.
Example:
Super Key
The set of attributes that can uniquely identify a tuple is known as Super Key. For Example, STUD_NO, (STUD_NO,
STUD_NAME), etc. A super key is a group of single or multiple keys that identifies rows in a table. It supports NULL
values.
Adding zero or more attributes to the candidate key generates the super key.
Example:
The candidate key other than the primary key is called an alternate key.
All the keys which are not primary keys are called alternate keys.
It is a secondary key.
Example:
Foreign Key
If an attribute can only take the values which are present as values of some other attribute, it will be a foreign key to
the attribute to which it refers. The relation which is being referenced is called referenced relation and the
corresponding attribute is called referenced attribute the relation which refers to the referenced relation is called
referencing relation and the corresponding attribute is called referencing attribute. The referenced attribute of the
referenced relation should be the primary key to it.
UNIT 3: Relational Data Model in Depth
It is a key it acts as a primary key in one table and it acts as
secondary key in another table.
For example, DNO is a primary key in the DEPT table and a non-key in EMP
Example:
Table STUDENT_COURSE
1 005 C001
2 056 C005
It may be worth noting that, unlike the Primary Key of any given relation, Foreign Key can be NULL as well as may
contain duplicate tuples i.e. it need not follow uniqueness constraint. For Example, STUD_NO in the
STUDENT_COURSE relation is not unique. It has been repeated for the first and third tuples. However, the STUD_NO
in STUDENT relation is a primary key and it needs to be always unique, and it cannot be null.
Sometimes, a table might not have a single column/attribute that uniquely identifies all the records of a table. To
uniquely identify rows of a table, a combination of two or more columns/attributes can be used. It still can give
duplicate values in rare cases. So, we need to find the optimal set of attributes that can uniquely identify rows in a
table.
Different combinations of attributes may give different accuracy in terms of identifying the rows uniquely.
Example:
It uniquely identifies a record in the It refers to the field in a table which is the primary key of
relational database table. another table.
It is a combination of UNIQUE and It can contain duplicate values and a table in a relational
Not Null constraints. database.
It does not allow NULL values. It can also contain NULL values.
Conclusion
In conclusion, the relational model makes use of a number of keys: Candidate keys allow for distinct identification,
the Primary key serves as the chosen identifier, Alternate keys offer other choices, and Foreign keys create vital
linkages that guarantee data integrity between tables. The creation of strong and effective relational databases
requires the thoughtful application of these keys.
In models like Entity-Relationship models, we did not have such features. Database Constraints can be categorized
into 3 main categories:
1. Constraints that are applied in the data model are called Implicit Constraints.
2. Constraints that are directly applied in the schemas of the data model, by specifying them in the DDL(Data
Definition Language). These are called Schema-Based Constraints or Explicit Constraints.
3. Constraints that cannot be directly applied in the schemas of the data model. We call these Application-
based or Semantic Constraints.
These are the restrictions or sets of rules imposed on the database contents. It validates the quality of the database.
It validates the various operations like data insertion, updation, and other processes that have to be performed
without affecting the integrity of the data. It protects us against threats/damages to the database. Mainly
Constraints on the relational database are of 4 types
Domain constraints
1. Domain Constraints
Every domain must contain atomic values(smallest indivisible units) which means composite and multi-
valued attributes are not allowed.
We perform a datatype check here, which means when we assign a data type to a column we limit the
values that it can contain. Eg. If we assign the datatype of attribute age as int, we can’t give it values other
than int datatype.
UNIT 3: Relational Data Model in Depth
Example:
123456789
01 Bikash Dutta
234456678
Explanation: In the above relation, Name is a composite attribute and Phone is a multi-values attribute, so it is
violating domain constraint.
These are called uniqueness constraints since it ensures that every tuple in the relation should be unique.
A relation can have multiple keys or candidate keys(minimal superkey), out of which we choose one of the
keys as the primary key, we don’t have any restriction on choosing the primary key out of candidate keys,
but it is suggested to go with the candidate key with less number of attributes.
Null values are not allowed in the primary key, hence Not Null constraint is also part of the key constraint.
Example:
01 Bikash 6000000009
02 Paul 9000090009
01 Tuhin 9234567892
Explanation: In the above table, EID is the primary key, and the first and the last tuple have the same value in EID ie
01, so it is violating the key constraint.
Entity Integrity constraints say that no primary key can take a NULL value, since using the primary key we
identify each tuple uniquely in a relation.
Example:
01 Bikash 9000900099
02 Paul 600000009
UNIT 3: Relational Data Model in Depth
EID Name Phone
Explanation: In the above relation, EID is made the primary key, and the primary key can’t take NULL values but in
the third tuple, the primary key is null, so it is violating Entity Integrity constraints.
The Referential integrity constraint is specified between two relations or tables and used to maintain the
consistency among the tuples in two relations.
This constraint is enforced through a foreign key, when an attribute in the foreign key of relation R1 has the
same domain(s) as the primary key of relation R2, then the foreign key of R1 is said to reference or refer to
the primary key of relation R2.
The values of the foreign key in a tuple of relation R1 can either take the values of the primary key for some
tuple in relation R2, or can take NULL values, but can’t be empty.
Example:
01 Divine 12
02 Dino 22
04 Vivian 14
DNO Place
12 Jaipur
13 Mumbai
14 Delhi
Explanation: In the above tables, the DNO of Table 1 is the foreign key, and DNO in Table 2 is the primary key. DNO =
22 in the foreign key of Table 1 is not allowed because DNO = 22 is not defined in the primary key of table 2.
Therefore, Referential integrity constraints are violated here.
Data Integrity: The relational database model enforces data integrity through various constraints such as
primary keys, foreign keys, and unique constraints. This ensures that the data in the database is accurate,
consistent, and valid.
Flexibility: The relational database model is highly flexible and can handle a wide range of data types and
structures. It also allows for easy modification and updating of the data without affecting other parts of the
database.
Scalability: The relational database model can scale to handle large amounts of data by adding more tables,
indexes, or partitions to the database. This allows for better performance and faster query response times.
Security: The relational database model provides robust security features to protect the data in the
database. These include user authentication, authorization, and encryption of sensitive data.
Data consistency: The relational database model ensures that the data in the database is consistent across
all tables. This means that if a change is made to one table, the corresponding changes will be made to all
related tables.
Query Optimization: The relational database model provides a query optimizer that can analyze and
optimize SQL queries to improve their performance. This allows for faster query response times and better
scalability.
Few database relations have certain limits which can’t be expanded further.
Complexity: The relational model can be complex and difficult to understand, particularly for users who are
not familiar with SQL and database design principles. This can make it challenging to set up and maintain a
relational database.
Performance: The relational model can suffer from performance issues when dealing with large data sets or
complex queries. In particular, joins between tables can be slow, and indexing strategies can be difficult to
optimize.
Scalability: While the relational model is generally scalable, it can become difficult to manage as the
database grows in size. Adding new tables or indexes can be time-consuming, and managing relationships
between tables can become complex.
Cost: Relational databases can be expensive to license and maintain, particularly for large-scale
deployments. Additionally, relational databases often require dedicated hardware and specialized software
to run, which can add to the cost.
Limited flexibility: The relational model is designed to work with tables that have predefined structures and
relationships. This can make it difficult to work with data that does not fit neatly into a table-based format,
such as unstructured or semi-structured data.
Data redundancy: In some cases, the relational model can lead to data redundancy, where the same data is
stored in multiple tables. This can lead to inefficiencies and can make it difficult to ensure data consistency
across the database.
UNIT 3: Relational Data Model in Depth
Difference between Data Security and Data Integrity
1. Data Security :
Data security refers to the prevention of data from unauthorized users. It is only allowed to access the data to the
authorized users. In database, the DBA or head of department can access all the data. Some users are only allowed
to retrieve data, whereas others are allowed to retrieve as well as to modify the data.
2. Data Integrity :
Data integrity is defined as the data contained in the database is both correct and consistent. For this purpose, the
data stored in the database must satisfy certain types of procedures (rules). The data in a database must be correct
and consistent. DBMS provides different ways to implement such types of constraints (rules). It can be implemented
by rules i.e., Primary Key, Secondary Key, Foreign key. This improves data integrity in a database.
2. Its motive is the protection of data. Its motive is the validity of data.
It relates to the physical form of data against accidental or It relates to the logical protection (correct, complete
6. intentional loss or misuse and destruction. and consistence) of data.
7. It avoids unauthorized access of data. It avoids human error when data is entered.
Relational Algebra is a procedural query language. Relational algebra mainly provides a theoretical foundation for
relational databases and SQL. The main purpose of using Relational Algebra is to define operators that transform one
or more input relations into an output relation. Given that these operators accept relations as input and produce
relations as output, they can be combined and used to express potentially complex queries that transform
potentially many input relations (whose data are stored in the database) into a single output relation (the query
results). As it is pure mathematics, there is no use of English Keywords in Relational Algebra and operators are
represented using symbols.
Fundamental Operators
1. Selection(σ)
2. Projection(π)
3. Union(U)
4. Set Difference(-)
5. Set Intersection(∩)
6. Rename(ρ)
7. Cartesian Product(X)
Example:
A B C
1 2 4
2 2 3
3 2 3
4 3 4
For the above relation, σ(c>3)R will select the tuples which have c more than 3.
UNIT 3: Relational Data Model in Depth
A B C
1 2 4
4 3 4
Note: The selection operator only selects the required tuples but does not display them. For display, the data
projection operator is used.
B C
2 4
2 3
3 4
3. Union(U): Union operation in relational algebra is the same as union operation in set theory.
Example:
FRENCH
Student_Name Roll_Number
Ram 01
Mohan 02
Vivek 13
Geeta 17
UNIT 3: Relational Data Model in Depth
GERMAN
Student_Name Roll_Number
Vivek 13
Geeta 17
Shyam 21
Rohan 25
Consider the following table of Students having different optional subjects in their course.
π(Student_Name)FRENCH U π(Student_Name)GERMAN
Student_Name
Ram
Mohan
Vivek
Geeta
Shyam
Rohan
Note: The only constraint in the union of two relations is that both relations must have the same set of Attributes.
4. Set Difference(-): Set Difference in relational algebra is the same set difference operation as in set theory.
Example: From the above table of FRENCH and GERMAN, Set Difference is used as follows
π(Student_Name)FRENCH - π(Student_Name)GERMAN
UNIT 3: Relational Data Model in Depth
Student_Name
Ram
Mohan
Note: The only constraint in the Set Difference between two relations is that both relations must have the same set
of Attributes.
5. Set Intersection(∩): Set Intersection in relational algebra is the same set intersection operation in set theory.
Example: From the above table of FRENCH and GERMAN, the Set Intersection is used as follows
π(Student_Name)FRENCH ∩ π(Student_Name)GERMAN
Student_Name
Vivek
Geeta
Note: The only constraint in the Set Difference between two relations is that both relations must have the same set
of Attributes.
7. Cross Product(X): Cross-product between two relations. Let’s say A and B, so the cross product between A X B will
result in all the attributes of A followed by each attribute of B. Each record of A will pair with every record of B.
Example:
Ram 14 M
Sona 15 F
Kim 20 M
B
UNIT 3: Relational Data Model in Depth
ID Course
1 DS
2 DBMS
AXB
Ram 14 M 1 DS
Ram 14 M 2 DBMS
Sona 15 F 1 DS
Sona 15 F 2 DBMS
Kim 20 M 1 DS
Kim 20 M 2 DBMS
Note: If A has ‘n’ tuples and B has ‘m’ tuples then A X B will have ‘ n*m ‘ tuples.
Derived Operators
These are some of the derived operators, which are derived from the fundamental operators.
1. Natural Join(⋈)
2. Conditional Join
1. Natural Join(⋈): Natural join is a binary operator. Natural join between two or more relations will result in a set of
all combinations of tuples where they have an equal common attribute.
Example:
UNIT 3: Relational Data Model in Depth
EMP
Name ID Dept_Name
A 120 IT
B 125 HR
C 110 Sales
D 111 IT
DEPT
Dept_Name Manager
Sales Y
Production Z
IT A
EMP.Dept_Name = DEPT.Dept_Name
EMP ⋈ DEPT
A 120 IT A
C 110 Sales Y
D 111 IT A
3. Conditional Join: Conditional join works similarly to natural join. In natural join, by default condition is equal
between common attributes while in conditional join we can specify any condition such as greater than, less
than, or not equal.
Example:
UNIT 3: Relational Data Model in Depth
R
ID Sex Marks
1 F 45
2 F 55
3 F 60
ID Sex Marks
10 M 20
11 M 22
12 M 59
1 F 45 10 M 20
1 F 45 11 M 22
2 F 55 10 M 20
2 F 55 11 M 22
3 F 60 10 M 20
3 F 60 11 M 22
3 F 60 12 M 59
UNIT 3: Relational Data Model in Depth
Relational Calculus
As Relational Algebra is a procedural query language, Relational Calculus is a non-procedural query language. It
basically deals with the end results. It always tells me what to do but never tells me how to do it.
In this SQL tutorial, you’ll learn all the basic to advanced concepts like SQL queries, SQL join, SQL injection, SQL
insert, create tables in SQL. SQL is easy to learn, there are no higher prerequisites to learning SQL.
What is a Database?
Data is the new fuel of this world but only data is unorganized information, so to organize that data we make a
database. A database is the organized collection of structured data which is usually controlled by a database
management system (DBMS). Databases help us with easily storing, accessing, and manipulating data held on a
computer.
A Table is a combination of rows and columns. For creating a table we have to define the structure of a table by
adding names to columns and providing data type and size of data to be stored in columns.
UNIT 4: Structured Query Language (should be taught preferably in Laboratory)
Syntax:
columnN datatype(size)
);
Let us create a table to store data of Customers, so the table name is Customer, Columns are Name, Country, age,
phone, and so on.
CustomerName VARCHAR(50),
LastName VARCHAR(50),
Country VARCHAR(50),
Age int(2),
Phone int(10)
);
Output:
To add data to the table, we use INSERT INTO, the syntax is as shown below:
Syntax:
Example Query
Output:
We can also use CREATE TABLE to create a copy of an existing table. In the new table, it gets the exact column
definition all columns or specific columns can be selected.
If an existing table was used to create a new table, by default the new table would be populated with the existing
values ??from the old table.
UNIT 4: Structured Query Language (should be taught preferably in Laboratory)
Syntax:
FROM existing_table_name
WHERE ….;
Query:
FROM customer;
Output:
We can use * instead of column name to copy whole table to another table.
Note : In this query another table will be created with all the columns instead of just two columns.
Syntax :
SELECT *
FROM existing_table_name
WHERE ….;
Query :
Output :
UNIT 4: Structured Query Language (should be taught preferably in Laboratory)
You can also use LIMIT to insert specific number of records from old table. In below example it will populate
new table with first three records.
Syntax :
Query :
Output :
1. Only Values
The first method is to specify only the value of data to be inserted without the column names.
In the second method we will specify both the columns which we want to fill and their corresponding values as
shown below:
Query:
Output:
Method 2 (Inserting values in only specified columns) – SQL INSERT INTO Statement
If we want to insert values in the specified columns then we use the following query:
Query:
Output:
Notice that the columns for which the values are not provided are filled by null. Which are the default values for
those columns?
We can use the SELECT statement with INSERT INTO statement to copy rows from one table and insert them into
another table. The use of this statement is similar to that of the INSERT INTO statement. The difference is that the
SELECT statement is used here to select data from a different table. The different ways of using INSERT INTO SELECT
statement are shown below:
We can copy all the data of a table and insert it into a different table.
Syntax:
We have used the SELECT statement to copy the data from one table and the INSERT INTO statement to insert from
a different table.
We can copy only those columns of a table that we want to insert into a different table.
Syntax:
We have used the SELECT statement to copy the data of the selected columns only from the second table and the
INSERT INTO statement to insert in the first table.
We can copy specific rows from a table to insert into another table by using the WHERE clause with the SELECT
statement. We have to provide appropriate conditions in the WHERE clause to select specific rows.
Output:
This query will insert all the data of the table LateralStudent in the table Student. The table Student will now look like
this,
If we want to insert values in the specified columns then we use the following query:
Output:
This query will insert the data in the columns ROLL_NO, NAME, and Age of the table LateralStudent in the table
Student and the remaining columns in the Student table will be filled by null which is the default value of the
remaining columns. The table Student will now look like this,
Output:
This query will select only the first row from table LateralStudent to insert into the table Student. The table Student
will now look like this,
Syntax:
(Value1, Value2,Value3,…..),
(Value1, Value2,Value3,…..),
……………………….. ;
Where,
For each new row inserted, you need To provide Multiple lists of values where each list is separated by “,”.
Every list of values corresponds to values to be inserted in each new row of the table. Values in the next list
tell values to be inserted in the next Row of the table.
Example:
Query:
Output:
In a very simple way, we can say that SQL commands(UPDATE and DELETE) are used to change the data that is
already in the database. The SQL DELETE command uses a WHERE clause.
Syntax
WHERE condition;
Note: In the above query the SET statement is used to set new values to the particular column and
the WHERE clause is used to select the rows for which the columns are needed to be updated. If we have not used
the WHERE clause then the columns in all the rows will be updated. So the WHERE clause is used to choose the
particular rows.
Query:
Output:
Update the column NAME and set the value to ‘Nitin’ in the rows where the Age is 22.
Update the columns NAME to ‘Satyam’ and Country to ‘USA’ where CustomerID is 1.
Output:
Note: For updating multiple columns we have used comma(,) to separate the names and values of two columns.
If we omit the WHERE clause from the update query then all of the rows will get updated.
Output:
Syntax:
WHERE some_condition;
Parameter Explanation
Note: We can delete single as well as multiple records depending on the condition we provide in the WHERE clause.
If we omit the WHERE clause then all of the records will be deleted and the table will be empty.
Query:
Assume we have created a table named GFG_Employee which contains the personal details of the Employee
including their id, name, email and department etc. as shown below −
Output
UNIT 4: Structured Query Language (should be taught preferably in Laboratory)
GFG_Employees
You can delete the records named Rithvik by using the below query:
Query
Output
output
Delete the rows from the table GFG_Employees where the department is “Development”. This will delete 2
rows(the first row and the seventh row).
Query
Output
UNIT 4: Structured Query Language (should be taught preferably in Laboratory)
output
To remove all the entries from the table, you can use the following query:
Query
Output
All of the records in the table will be deleted, there are no records left to display. The table GFG_EMPLOyees will
become empty.
output
SQL | Constraints
Constraints are the rules that we can apply on the type of data in a table. That is, we can specify the limit on the type
of data that can be stored in a particular column in a table using constraints.
NOT NULL: This constraint tells that we cannot store a null value in a column. That is, if a column is specified
as NOT NULL then we will not be able to store null in this particular column any more.
UNIQUE: This constraint when specified with a column, tells that all the values in the column must be
unique. That is, the values in any row of a column must not be repeated.
PRIMARY KEY: A primary key is a field which can uniquely identify each row in a table. And this constraint is
used to specify a field in a table as primary key.
FOREIGN KEY: A Foreign key is a field which can uniquely identify each row in a another table. And this
constraint is used to specify a field as Foreign key.
CHECK: This constraint helps to validate the values of a column to meet a particular condition. That is, it
helps to ensure that the value stored in a column meets a specific condition.
DEFAULT: This constraint specifies a default value for the column when no value is specified by the user.
UNIT 4: Structured Query Language (should be taught preferably in Laboratory)
How to specify constraints?
We can specify constraints at the time of creating the table using CREATE TABLE statement. We can also specify the
constraints after creating a table using ALTER TABLE statement.
Syntax:
Below is the syntax to create constraints using CREATE TABLE statement at the time of creating the table.
....
);
constraint_name: Name of the constraint. for example- NOT NULL, UNIQUE, PRIMARY KEY etc.
1. NOT NULL –
If we specify a field in a table to be NOT NULL. Then the field will never accept null value. That is, you will be not
allowed to insert a new row in the table without specifying any value to this field.
For example, the below query creates a table Student with the fields ID and NAME as NOT NULL. That is, we are
bound to specify values for these two fields every time we wish to insert a new row.
ADDRESS varchar(20)
);
2. UNIQUE –
This constraint helps to uniquely identify each row in the table. i.e. for a particular column, all the rows should have
unique values. We can have more than one UNIQUE columns in a table.
For example, the below query creates a table Student where the field ID is specified as UNIQUE. i.e, no two students
can have the same ID. Unique constraint in detail.
(
UNIT 4: Structured Query Language (should be taught preferably in Laboratory)
ID int(6) NOT NULL UNIQUE,
NAME varchar(10),
ADDRESS varchar(20)
);
3. PRIMARY KEY –
Primary Key is a field which uniquely identifies each row in the table. If a field in a table as primary key, then the field
will not be able to contain NULL values as well as all the rows should have unique values for this field. So, in other
words we can say that this is combination of NOT NULL and UNIQUE constraints.
A table can have only one field as primary key. Below query will create a table named Student and specifies the field
ID as primary key.
NAME varchar(10),
ADDRESS varchar(20),
PRIMARY KEY(ID)
);
4. FOREIGN KEY –
Foreign Key is a field in a table which uniquely identifies each row of a another table. That is, this field points to
primary key of another table. This usually creates a kind of link between the tables.
Consider the two tables as shown below:
Orders
1 2253 3
2 3325 3
3 4521 2
4 8532 1
Customers
1 RAMESH DELHI
2 SURESH NOIDA
3 DHARMESH GURGAON
UNIT 4: Structured Query Language (should be taught preferably in Laboratory)
As we can see clearly that the field C_ID in Orders table is the primary key in Customers table, i.e. it uniquely
identifies each row in the Customers table. Therefore, it is a Foreign Key in Orders table.
Syntax:
C_ID int,
(i) CHECK –
Using the CHECK constraint we can specify a condition for a field, which should be satisfied at the time of entering
values for this field.
For example, the below query creates a table Student and specifies the condition for the field AGE as (AGE >= 18 ).
That is, the user will not be allowed to enter any record in the table with AGE < 18. Check constraint in detail
);
(ii) DEFAULT –
This constraint is used to provide a default value for the fields. That is, if at the time of entering new records in the
table if the user does not specify any value for these fields then the default value will be assigned to them.
For example, the below query will create a table named Student and specify the default value for the field AGE as
18.
);
UNIT 4: Structured Query Language (should be taught preferably in Laboratory)
Primary key constraint in SQL
A primary key constraint depicts a key comprising one or more columns that will help uniquely identify every
tuple/record in a table.
Properties :
1. No duplicate values are allowed, i.e. Column assigned as primary key should have UNIQUE values only.
2. NO NULL values are present in column with Primary key. Hence there is Mandatory value in column having
Primary key.
3. Only one primary key per table exist although Primary key may have multiple columns.
4. No new row can be inserted with the already existing primary key.
5. Classified as : a) Simple primary key that has a Single column 2) Composite primary key has Multiple column.
The primary key can be created in a table using PRIMARY KEY constraint. It can be created at two levels.
1. Column
2. Table.
Syntax :
Name varchar2(20),
Address varchar2(50)
);
Here NOT NULL is added to make sure ID should have unique values. SQL will automatically set null values to the
primary key if it is not specified.
Example-1 :
To verify the working of Primary key :
Output :
1 row created
Example-2 :
Let’s see if you will insert the same values again.
Output :
Example-3 :
Output :
Primary Key cannot contain Null Values so That too will throw an error.
Syntax:
Name varchar2(20),
Address varchar2(50),
);
Here, you have only one Primary Key in a table but it consists of Multiple Columns(Id, Name). However, the
Following are permissible.
Since multiple columns make up Primary Key so both the rows are considered different. SQL permits either of the
two values can be duplicated but the combination must be unique.
Syntax :
It is necessary that the column added as primary key MUST contain unique values or else it will be violated. An id
cannot be made Primary key if it contains duplicate values. It violates the basic rule of Primary Key. Altering the table
to add Id as a primary key that may contain duplicate values generates an error.
Output :
Also, A column added as primary key using alter statement should not have null values. Altering table to add Id as
primary key that may contain null values generates an error.
UNIT 4: Structured Query Language (should be taught preferably in Laboratory)
Output :
Error at line 1: column contains NULL values; cannot alter to NOT NULL
Set Operations
Union
Union of the sets A and B, denoted by A ∪ B, is the set of distinct elements that belong to set A or set B, or both.
Venn diagram of A ∪ B
Intersection
The intersection of the sets A and B, denoted by A ∩ B, is the set of elements that belong to both A and B i.e. set of
the common elements in A and B.
Venn diagram of A ∩ B
A and B are disjoint sets since both of them have no common elements.
Set Difference
The difference between sets is denoted by ‘A – B’, which is the set containing elements that are in A but not in B. i.e.,
all elements of A except the element of B.
Complement
The complement of a set A, denoted by AC is the set of all the elements except the elements in A. Complement of the
set A is U – A.
UNIT 4: Structured Query Language (should be taught preferably in Laboratory)
Find AC
Addition of sets A and B, referred to as Minkowski addition, is the set in whose elements are the sum of each
possible pair of elements from the 2 sets (that is one element is from set A and the other is from set B).
Set subtraction follows the same rule, but with the subtraction operation on the elements. It is to be observed that
these operations are operable only on numeric data types. Even if operated otherwise, it would only be a symbolic
representation without any significance. Further, it can be seen easily that set addition is commutative, while
subtraction is not.
1) Count()
2) Sum()
3) Avg()
4) Min()
5) Max()
UNIT 4: Structured Query Language (should be taught preferably in Laboratory)
Now let us understand each Aggregate function with a example:
Id Name Salary
-----------------------
1 A 80
2 B 40
3 C 60
4 D 70
5 E 60
6 F Null
Count():
Count(salary): Return number of Non Null values over the column salary. i.e 5.
Count(Distinct Salary): Return number of distinct Non Null values over the column salary .i.e 4
Sum():
sum(salary): Sum all Non Null values of Column salary i.e., 310
Avg():
Min():
Min(salary): Minimum value in the salary column except NULL i.e., 40.
Output: 116
3. CHAR_LENGTH(): Doesn’t work for SQL Server. Use LEN() for SQL Server. This function is used to find the
length of a word.
Output: 6
5. CHARACTER_LENGTH(): Doesn’t work for SQL Server. Use LEN() for SQL Server. This function is used to find
the length of a line.
UNIT 4: Structured Query Language (should be taught preferably in Laboratory)
6. Syntax: SELECT CHARACTER_LENGTH('geeks for geeks');
Output: 15
Output: ‘GeeksforGeeks’
9. CONCAT_WS(): This function is used to add two words or strings with a symbol as concatenating symbol.
Output: geeks_for_geeks
11. FIND_IN_SET(): This function is used to find a symbol from a set of symbols.
Output: 2
13. FORMAT(): This function is used to display a number in the given format.
Output: ‘98.10%’
15. INSERT(): This function is used to insert the data into a database.
16. Syntax: INSERT INTO database (geek_id, geek_name) VALUES (5000, 'abc');
19. LCASE(): This function is used to convert the given string into lower case.
21. LEFT(): This function is used to SELECT a sub string from the left of given size or characters.
Output: geeks
Output: 13
25. LOCATE(): This function is used to find the nth position of the given word in a string.
27. LOWER(): This function is used to convert the upper case string into lower case.
Output: geeksforgeeks.org
29. LPAD(): This function is used to make the given string of the given size by adding the given symbol.
31. Output:
000geeks
32. LTRIM(): This function is used to cut the given sub string from the original string.
Output: geeks
34. MID(): This function is to find a word from the given position and of the given size.
Output: for
36. POSITION(): This function is used to find position of the first occurrence of the given alphabet.
Output: 2
38. REPEAT(): This function is used to write the given string again and again till the number of times mentioned.
Output: geeksgeeks
40. REPLACE(): This function is used to cut the given string by removing the given sub string.
Output: geeks
Output: ‘gro.skeegrofskeeg’
44. RIGHT(): This function is used to SELECT a sub string from the right end of the given size.
Output: ‘.org’
46. RPAD(): This function is used to make the given string as long as the given size by adding the given symbol on
the right.
Output: ‘geeks000’
48. RTRIM(): This function is used to cut the given sub string from the original string.
UNIT 4: Structured Query Language (should be taught preferably in Laboratory)
49. Syntax: RTRIM('geeksxyxzyyy', 'xyz');
Output: ‘geeks’
50. SPACE(): This function is used to write the given number of spaces.
Output: ‘ ‘
If string1 and string2 are the same, the STRCMP function will return 0.
If string1 is smaller than string2, the STRCMP function will return -1.
Output: -1
54. SUBSTR(): This function is used to find a sub string from the a string from the given position.
Output: ‘geeks’
56. SUBSTRING(): This function is used to find an alphabet from the mentioned size and the given string.
Output: ‘G’
58. SUBSTRING_INDEX(): This function is used to find a sub string before the given symbol.
Output: ‘www’
60. TRIM(): This function is used to cut the given symbol from the string.
Output: 123
62. UCASE(): This function is used to make the string in upper case.
64. Output:
GEEKSFORGEEKS
For storing a date or a date and time value in a database,MySQL offers the following data types:
UNIT 4: Structured Query Language (should be taught preferably in Laboratory)
DATE format YYYY-MM-DD
NOW()
Query:
SELECT NOW();
Output:
CURDATE()
Query:
SELECT CURDATE();
Output:
UNIT 4: Structured Query Language (should be taught preferably in Laboratory)
CURTIME()
Query:
SELECT CURTIME();
Output:
DATE()
Extracts the date part of a date or date/time expression. Example: For the below table named ‘Test’
Id Name BirthTime
Query:
Output:
Name BirthDate
Pratik 1996-09-26
EXTRACT()
Syntax
Several units can be considered but only some are used such as MICROSECOND, SECOND, MINUTE, HOUR, DAY,
WEEK, MONTH, QUARTER, YEAR, etc. And ‘date’ is a valid date expression. Example: For the below table named
‘Test’
UNIT 4: Structured Query Language (should be taught preferably in Laboratory)
Id Name BirthTime
Query:
Output:
Name Birthday
Pratik 26
Query:
Output:
Name BirthYear
Pratik 1996
Query:
Output:
Name BirthSecond
Pratik 581
DATE_ADD()
Syntax:
Where, date – valid date expression, and expr is the number of intervals we want to add. and type can be one of the
following: MICROSECOND, SECOND, MINUTE, HOUR, DAY, WEEK, MONTH, QUARTER, YEAR, etc. Example: For the
below table named ‘Test’
UNIT 4: Structured Query Language (should be taught preferably in Laboratory)
Id Name BirthTime
Query:
Output:
Name BirthTimeModified
Query:
Output:
Name BirthDayModified
Query:
Output:
Name BirthSecond
DATE_SUB()
Subtracts a specified time interval from a date. The syntax for DATE_SUB is the same as DATE_ADD just the
difference is that DATE_SUB is used to subtract a given interval of date.
DATEDIFF()
Syntax:
DATEDIFF(date1, date2);
UNIT 4: Structured Query Language (should be taught preferably in Laboratory)
date1 & date2- date/time expression
Query:
Output:
DateDiff
10
DATE_FORMAT()
Syntax:
DATE_FORMAT(date,format);
the date is a valid date and the format specifies the output format for the date/time. The formats that can be used
are:
%f-Microseconds (000000-999999)
%H-Hour (00-23)
%h-Hour (01-12)
%I-Hour (01-12)
%k-Hour (0-23)
%l-Hour (1-12)
%p-AM or PM
%S-Seconds (00-59)
%s-Seconds (00-59)
UNIT 4: Structured Query Language (should be taught preferably in Laboratory)
%T-Time, 24-hour (hh:mm: ss)
%V-Week (01-53) where Sunday is the first day of the week, used with %X
%v-Week (01-53) where Monday is the first day of the week, used with %x
%X-Year for the week where Sunday is the first day of the week, four digits, used with %V
%x-Year for the week where Monday is the first day of the week, four digits, used with %v
NULL values could be possible because at the time of data entry information is not available. So SQL supports a
special value known as NULL which is used to represent the values of attributes that may be unknown or not apply
to a tuple. SQL places a NULL value in the field in the absence of a user-defined value. For example, the
Apartment_number attribute of an address applies only to addresses that are in apartment buildings and not to
other types of residences.
So, NULL values are those values in which there is no data value in the particular field in the table.
A NULL value is used to represent a missing value, but it usually has one of three different interpretations:
It is often not possible to determine which of the meanings is intended. Hence, SQL does not distinguish
between the different meanings of NULL.
Setting a NULL value is appropriate when the actual value is unknown, or when a value is not meaningful.
A NULL value is not equivalent to a value of ZERO if the data type is a number and is not equivalent to spaces
if the data type is a character.
Suppose if any column has a NULL value, then UNIQUE, FOREIGN key, and CHECK constraints will ignore by
SQL.
UNIT 4: Structured Query Language (should be taught preferably in Laboratory)
In general, each NULL value is considered to be different from every other NULL in the database. When a NULL is
involved in a comparison operation, the result is considered to be UNKNOWN. Hence, SQL uses a three-valued logic
with values True, False, and Unknown. It is, therefore, necessary to define the results of three-valued logical
expressions when the logical connectives AND, OR, and NOT are used.
SQL allows queries that check whether an attribute value is NULL. Rather than using = or to compare an attribute
value to NULL, SQL uses IS and IS NOT. This is because SQL considers each NULL value as being distinct from every
other NULL value, so equality comparison is not appropriate.
Query:
Fname VARCHAR(50),
Lname VARCHAR(50),
SSN VARCHAR(11),
Phoneno VARCHAR(15),
Salary FLOAT
);
VALUES
Suppose we find the Fname and Lname of the Employee having no Super_ssn then the query will be:
Query:
Output:
Query:
Output:
We can update the NULL values present in a table using the UPDATE statement in SQL. To do so, we can use the IS
NULL operator in the WHERE clause to select the rows with NULL values and then we can set the new value using the
SET keyword.
Let’s suppose that we want to update SSN in the row where it is NULL.
UNIT 4: Structured Query Language (should be taught preferably in Laboratory)
Query:
UPDATE Employee
Output:
Basics of SQL
Nested queries are a way to perform more complex queries by embedding one query within another. A nested
query is a query that appears inside another query, and it helps retrieve data from multiple tables or apply
conditions based on the results of another query. The result of inner query is used in execution of outer query. We
will use STUDENT, COURSE, STUDENT_COURSE tables for understanding nested queries.
STUDENT
C_ID C_NAME
C1 DSA
C2 Programming
C3 DBMS
STUDENT_COURSE
S_ID C_ID
S1 C1
S1 C3
S2 C1
S3 C2
S4 C2
S4 C3
Independent Nested Queries: In independent nested queries, query execution starts from innermost query
to outermost queries. The execution of inner query is independent of outer query, but the result of inner
query is used in execution of outer query. Various operators like IN, NOT IN, ANY, ALL etc are used in writing
independent nested queries. IN: If we want to find out S_ID who are enrolled in C_NAME ‘DSA’ or ‘DBMS’,
we can write it with the help of independent nested query and IN operator. From COURSE table, we can find
out C_ID for C_NAME ‘DSA’ or DBMS’ and we can use these C_IDs for finding S_IDs
from STUDENT_COURSE TABLE. STEP 1: Finding C_ID for C_NAME =’DSA’ or ‘DBMS’
Select C_ID from COURSE where C_NAME = ‘DSA’ or C_NAME = ‘DBMS’ STEP 2: Using C_ID of step 1 for
finding S_ID Select S_ID from STUDENT_COURSE where C_ID IN
(SELECT C_ID from COURSE where C_NAME = ‘DSA’ or C_NAME=’DBMS’); The inner query will return a set
with members C1 and C3 and outer query will return those S_IDs for which C_ID is equal to any member of
set (C1 and C3 in this case). So, it will return S1, S2 and S4. Note: If we want to find out names of STUDENTs
who have either enrolled in ‘DSA’ or ‘DBMS’, it can be done as: Select S_NAME from STUDENT where S_ID IN
UNIT 4: Structured Query Language (should be taught preferably in Laboratory)
(Select S_ID from STUDENT_COURSE where C_ID IN (SELECT C_ID from COURSE where C_NAME=’DSA’
or C_NAME=’DBMS’)); NOT IN: If we want to find out S_IDs of STUDENTs who have neither enrolled in
‘DSA’ nor in ‘DBMS’, it can be done as: Select S_ID from STUDENT where S_ID NOT IN
(Select S_ID from STUDENT_COURSE where C_ID IN (SELECT C_ID from COURSE where C_NAME=’DSA’
or C_NAME=’DBMS’)); The innermost query will return a set with members C1 and C3. Second inner query
will return those S_IDs for which C_ID is equal to any member of set (C1 and C3 in this case) which are S1, S2
and S4. The outermost query will return those S_IDs where S_ID is not a member of set (S1, S2 and S4). So it
will return S3.
Co-related Nested Queries: In co-related nested queries, the output of inner query depends on the row
which is being currently executed in outer query. e.g.; If we want to find out S_NAME of STUDENTs who are
enrolled in C_ID ‘C1’, it can be done with the help of co-related nested query as: Select S_NAME
from STUDENT S where EXISTS ( select * from STUDENT_COURSE SC where S.S_ID=SC.S_ID and
SC.C_ID=’C1’); For each row of STUDENT S, it will find the rows from STUDENT_COURSE where S.S_ID =
SC.S_ID and SC.C_ID=’C1’. If for a S_ID from STUDENT S, atleast a row exists in STUDENT_COURSE SC
with C_ID=’C1’, then inner query will return true and corresponding S_ID will be returned as output.
SELECT StudentName
FROM Students
WHERE StudentID IN (
SELECT StudentID
FROM Grades
);
This article has been contributed by Muhammed Sami. Please write comments if you find anything incorrect, or
you want to share more information about the topic discussed above
Overview :
SQL is a computer language that is used for storing, manipulating, and retrieving data in a structured format. This
language was invented by IBM. Here SQL stands for Structured Query Language. Interacting databases with SQL
queries, we can handle a large amount of data. There are several SQL-supported database servers such as
MySQL, PostgreSQL, sqlite3 and so on. Data can be stored in a secured and structured format through these
database servers. SQL queries are often used for data manipulation and business insights better.
SQL Database :
Here, we will discuss the queries and will understand with the help of examples.
Query-1 :
Show existing databases –
Let’s consider the existing database like information_schema, mysql, performance_schema, sakila, student, sys, and
world. And if you want to show the exiting database then we will use the show database query as follows.
SHOW DATABASES;
UNIT 4: Structured Query Language (should be taught preferably in Laboratory)
Output :
information_schema
mysql
performance_schema
sakila
student
sys
world
Query-2 :
Drop a database –
Suppose we want to drop the database namely student.
SHOW DATABASES;
Database Name
information_schema
mysql
performance_schema
sakila
sys
UNIT 4: Structured Query Language (should be taught preferably in Laboratory)
Database Name
world
Query-3 :
Create a database –
Suppose we want to create a database namely a bank.
SHOW DATABASES;
Database Name
bank
information_schema
mysql
performance_schema
sakila
sys
world
Query-4 :
Using a database –
USE bank;
Query-5 :
Create a Table –
Here data type may be varchar, integer, date, etc.
column1 datatype,
column2 datatype,
....
);
UNIT 4: Structured Query Language (should be taught preferably in Laboratory)
Example –
EmployeeID int,
FirstName varchar(55),
LastName varchar(55),
Email varchar(150),
DOB date
);
Query-6 :
Show tables in the same database –
SHOW TABLES;
Query-7 :
Dropping a Table –
Query-8 :
Inserting values into an existing table –
VALUES(1111,'Dipak','Bera','dipakbera@gmail.com','1994-11-22');
Query-9 :
Fetching values in a table –
Query-10 :
Not Null –
We can specify which column does not accept the null value when we insert a value(row) in a table. It will be done at
the time of table creation.
column2 datatype,
....
);
Query-11 :
Unique –
We can also specify that entries in a particular column should be unique.
column2 datatype,
....
UNIT 4: Structured Query Language (should be taught preferably in Laboratory)
);
Example –
FirstName varchar(55),
LastName varchar(55)
);
PRIMARY KEY –
The constraint PRIMARY KEY suggests that entries should be neither null nor duplicate corresponding to the
specified column.
FName varchar(55),
LName varchar(55),
Email varchar(100),
DOB date,
);
FOREIGN KEY –
The FOREIGN KEY is used to build a connection between the current table and the previous table containing
the primary key.
AType varchar(20),
OBal int,
OD date,
CurBal int,
);
Here, AccNo column in the Account table is referred to by CustID column in the Customer table. Here
Account table is a child table and the Customer table is the parent table.
UNIT 4: Structured Query Language (should be taught preferably in Laboratory)
ORDER BY :
The ORDER BY keyword is used to show the result in ascending or descending order. By default, it is in ascending
order.
Syntax –
FROM table_name
Scenario-1 :
Suppose we have the Account Table as follows.
Output :
(By default it will be in increasing order)
Scenario-2 :
For descending order :
GROUP BY :
This keyword is used for grouping the results.
Example –
Output :
AType count(AType)
savings 2
current 2
JOIN CONCEPTS :
Here, we will discuss the join concept as follows.
LEFT JOIN :
The LEFT JOIN keyword returns all records from the left table (table1) along with the matching records from
the right table (table2).
Syntax –
SELECT column_name(s)
FROM table1
ON table1.column_name = table2.column_name;
UNIT 4: Structured Query Language (should be taught preferably in Laboratory)
LEFT JOIN
RIGHT JOIN :
The RIGHT JOIN keyword returns all records from the right table (table2) along with the matching records
from the left table (table1).
Syntax –
SELECT column_name(s)
FROM table1
ON table1.column_name = table2.column_name;
RIGHT JOIN
INNER JOIN :
The INNER JOIN keyword returns all matching records from both the table.
Syntax –
SELECT column_name(s)
FROM table1
ON table1.column_name = table2.column_name;
UNIT 4: Structured Query Language (should be taught preferably in Laboratory)
INNER JOIN
FULL JOIN :
The FULL JOIN or FULL OUTER JOIN keyword returns all records from both the table.
Syntax –
SELECT column_name(s)
FROM table1
ON table1.column_name = table2.column_name;
FULL JOIN
Note –
This keyword is not used in MySQL’s latest version. Instead, the keyword UNION is used. Here the syntax is
as follows.
UNION
SELF JOIN :
This is a regular join between aliases of the same table.
Syntax –
SELECT column_name(s)
WHERE condition;
WHERE CLAUSE :
This clause is used for filtering our data.
Syntax –
FROM table_name
WHERE condition;
Example –
Output :
UNIT 4: Structured Query Language (should be taught preferably in Laboratory)
AccNo CurBal
1111 6000
1113 4000
1114 1000
HAVING CLAUSE :
This is required as the WHERE clause does not support aggregate functions such as count, min, max, avg, sum, and so
on.
FROM table_name
HAVING condition;
Example –
Output :
AccNo CurBal
1111 6000
INNER JOIN
LEFT JOIN
RIGHT JOIN
FULL JOIN
NATURAL JOIN
StudentCourse
A. INNER JOIN
The INNER JOIN keyword selects all rows from both the tables as long as the condition is satisfied. This keyword will
create the result-set by combining all rows from both the tables where the condition satisfies i.e value of the
common field will be the same.
Syntax:
SELECT table1.column1,table1.column2,table2.column1,....
FROM table1
INNER JOIN table2
ON table1.matching_column = table2.matching_column;
UNIT 4: Structured Query Language (should be taught preferably in Laboratory)
table1: First table.
table2: Second table
matching_column: Column common to both the tables.
Note: We can also write JOIN instead of INNER JOIN. JOIN is same as INNER JOIN.
This query will show the names and age of students enrolled in different courses.
Output:
B. LEFT JOIN
This join returns all the rows of the table on the left side of the join and matches rows for the table on the right side
of the join. For the rows for which there is no matching row on the right side, the result-set will contain null. LEFT
JOIN is also known as LEFT OUTER JOIN.
Syntax:
SELECT table1.column1,table1.column2,table2.column1,....
FROM table1
LEFT JOIN table2
ON table1.matching_column = table2.matching_column;
UNIT 4: Structured Query Language (should be taught preferably in Laboratory)
Note: We can also use LEFT OUTER JOIN instead of LEFT JOIN, both are the same.
SELECT Student.NAME,StudentCourse.COURSE_ID
FROM Student
LEFT JOIN StudentCourse
ON StudentCourse.ROLL_NO = Student.ROLL_NO;
Output:
UNIT 4: Structured Query Language (should be taught preferably in Laboratory)
C. RIGHT JOIN
RIGHT JOIN is similar to LEFT JOIN. This join returns all the rows of the table on the right side of the join and
matching rows for the table on the left side of the join. For the rows for which there is no matching row on the left
side, the result-set will contain null. RIGHT JOIN is also known as RIGHT OUTER JOIN.
Syntax:
SELECT table1.column1,table1.column2,table2.column1,....
FROM table1
RIGHT JOIN table2
ON table1.matching_column = table2.matching_column;
Note: We can also use RIGHT OUTER JOIN instead of RIGHT JOIN, both are the same.
SELECT Student.NAME,StudentCourse.COURSE_ID
FROM Student
RIGHT JOIN StudentCourse
ON StudentCourse.ROLL_NO = Student.ROLL_NO;
Output:
UNIT 4: Structured Query Language (should be taught preferably in Laboratory)
D. FULL JOIN
FULL JOIN creates the result-set by combining results of both LEFT JOIN and RIGHT JOIN. The result-set will contain
all the rows from both tables. For the rows for which there is no matching, the result-set will contain NULL values.
Syntax:
SELECT table1.column1,table1.column2,table2.column1,....
FROM table1
FULL JOIN table2
ON table1.matching_column = table2.matching_column;
SELECT Student.NAME,StudentCourse.COURSE_ID
FROM Student
FULL JOIN StudentCourse
ON StudentCourse.ROLL_NO = Student.ROLL_NO;
UNIT 4: Structured Query Language (should be taught preferably in Laboratory)
Output:
NAME COURSE_ID
HARSH 1
PRATIK 2
RIYANKA 2
DEEP 3
SAPTARHI 1
DHANRAJ NULL
ROHIT NULL
NIRAJ NULL
NULL 4
NULL 5
NULL 4
Natural join can join tables based on the common columns in the tables being joined. A natural join returns all rows
by matching values in common columns having same name and data type of columns and that column should be
present in both tables.
Both table must have at list one common column with same column name and same data type.
Example:
Employee
1 Ram 10
2 Jon 30
3 Bob 50
Department
Dept_id Dept_name
10 IT
30 HR
40 TIS
1 Ram 10 10 IT
2 Jon 30 30 HR
PL/SQL Introduction
PL/SQL is a block structured language that enables developers to combine the power of SQL with procedural
statements.All the statements of a block are passed to oracle engine all at once which increases processing speed
and decreases the traffic.
UNIT 4: Structured Query Language (should be taught preferably in Laboratory)
Basics of PL/SQL
•PL/SQL stands for Procedural Language extensions to the Structured Query Language (SQL).
PL/SQL is a combination of SQL along with the procedural features of programming languages.
PL/SQL includes procedural language elements like conditions and loops. It allows declaration of constants
and variables, procedures and functions, types and variable of those types and triggers.
Disadvantages of SQL:
SQL doesn’t provide the programmers with a technique of condition checking, looping and branching.
SQL statements are passed to Oracle engine one at a time which increases traffic and decreases speed.
Features of PL/SQL:
1. PL/SQL is basically a procedural language, which provides the functionality of decision making, iteration and
many more features of procedural programming languages.
2. PL/SQL can execute a number of queries in one block using single command.
3. One can create a PL/SQL unit such as procedures, functions, packages, triggers, and types, which are stored
in the database for reuse by applications.
4. PL/SQL provides a feature to handle the exception which occurs in PL/SQL block known as exception
handling block.
5. Applications written in PL/SQL are portable to computer hardware or operating system where Oracle is
operational.
SQL PL/SQL
SQL is a single query that is used to perform DML and PL/SQL is a block of codes that used to write the entire
DDL operations. program blocks/ procedure/ function, etc.
It is declarative, that defines what needs to be done, PL/SQL is procedural that defines how the things needs
rather than how things need to be done. to be done.
Cannot contain PL/SQL code in it. It is an extension of SQL, so it can contain SQL inside it.
UNIT 4: Structured Query Language (should be taught preferably in Laboratory)
Structure of PL/SQL Block:
PL/SQL extends SQL by adding constructs found in procedural languages, resulting in a structural language that is
more powerful than SQL. The basic unit in PL/SQL is a block. All PL/SQL programs are made up of blocks, which can
Typically, each block performs a logical action in the program. A block has the following structure:
DECLARE
declaration statements;
BEGIN
executable statements
EXCEPTIONS
END;
Declare section starts with DECLARE keyword in which variables, constants, records as cursors can be
declared which stores data temporarily. It basically consists definition of PL/SQL identifiers. This part of the
code is optional.
Execution section starts with BEGIN and ends with END keyword.This is a mandatory section and here the
program logic is written to perform any task like loops and conditional statements. It supports
all DML commands, DDL commands and SQL*PLUS built-in functions as well.
Exception section starts with EXCEPTION keyword.This section is optional which contains statements that
are executed when a run-time error occurs. Any exceptions can be handled in this section.
PL/SQL identifiers
There are several PL/SQL identifiers such as variables, constants, procedures, cursors, triggers etc.
1. Variables: Like several other programming languages, variables in PL/SQL must be declared prior to its use.
They should have a valid name and data type as well. Syntax for declaration of variables:
UNIT 4: Structured Query Language (should be taught preferably in Laboratory)
variable_name datatype [NOT NULL := value ];
Blocks in PL/SQL
In PL/SQL, All statements are classified into units that is called Blocks. PL/SQL blocks can include variables, SQL
statements, loops, constants, conditional statements and exception handling. Blocks can also build a function or a
procedure or a package.
The Declaration section: Code block start with a declaration section, in which memory variables, constants, cursors
and other oracle objects can be declared and if required initialized.
The Begin section: Consist of set of SQL and PL/SQL statements, which describe processes that have to be applied to
table data. Actual data manipulation, retrieval, looping and branching constructs are specified in this section.
The Exception section: This section deals with handling errors that arise during execution data manipulation
statements, which make up PL/SQL code block. Errors can arise due to syntax, logic and/or validation rule.
Broadly, PL/SQL blocks are two types: Anonymous blocks and Named blocks are as follows:
1. Anonymous blocks: In PL/SQL, That’s blocks which is not have header are known as anonymous blocks. These
blocks do not form the body of a function or triggers or procedure. Example: Here a code example of find greatest
number with Anonymous blocks.
2. Named blocks: That’s PL/SQL blocks which having header or labels are known as Named blocks. These blocks can
either be subprograms like functions, procedures, packages or Triggers. Example: Here a code example of find
greatest
UNIT 4: Structured Query Language (should be taught preferably in Laboratory)
PL/SQL Architecture
Prerequisite : PL/SQL Introduction
In Oracle, PL/SQL (Procedural Language/SQL) is the procedural language extension to the non-procedural SQL. It
combines the data manipulation power of SQL and the procedural power of standard programming
languages. PL/SQL was developed by Oracle Corporation within the early ’90s to reinforce the capabilities of SQL. It
integrates well with SQL* PLUS and other application development products of Oracle. PL/SQL is the superset of SQL.
It provides SQL data manipulation commands and SQL data types. In PL/SQL, a block without any name is called
Anonymous Block. PL/ SQL block consists of various functions, library, procedures, trigger, packages etc.
In PL/SQL the semicolon (;) is placed at the end of an SQL statement or PL/SQL control statement.
Section keyword DECLARE, BEGIN and EXCEPTION are not followed by semicolons.
END keyword and all other PL/SQL statements require a semicolon to terminate the statements.
Before you start creating code, define the goals and capabilities of your PL/SQL software.
Select names for constants, variables, and other identifiers that are both meaningful and descriptive.
To keep track of the modifications in your PL/SQL code, use version control tools (like Git).
Use the COMMIT and ROLLBACK instructions to provide proper transaction management.
Features of PL/SQL :
PL/SQL have an error-checking facility and displays user-friendly messages when an error occurs in a
program.
It offers logging and debugging capabilities, including the capacity to use exception messages.
When certain data events, such as INSERT, UPDATE, or DELETE actions on a table, occur, triggers are
specialized forms of stored processes that are automatically invoked.
The PL/SQL runtime system is a technology and not an independent product. This technology is actually like an
engine that exhibits PL/SQL blocks, subprograms like functions and procedures. This engine can be installed in an
Oracle Server or in application development tools such as Oracle Form Builder, Oracle Reports Builder etc.
UNIT 4: Structured Query Language (should be taught preferably in Laboratory)
PL/SQL Architecture
These two environments are independent of each other. In either environment, the PL/SQL engine accepts any valid
PL/SQL block as input. The PL/SQL engine executes the procedural part of the statements and sends the SQL
statement executer in the Oracle Server. A single transfer is required to send the block from the application to the
Oracle Server, thus improving performance, especially in a Client-Server network. PL/SQL code can also be stored in
the Oracle server as subprograms that can be referenced by any number of applications connected to the database.
Advantages of PL/SQL :
Disadvantages of PL/SQL :
Subsequently, R, Boyce, and E. F. Codd introduced a stronger definition of 3NF called Boyce-Codd Normal Form. With
the exception of 1NF, all these normal forms are based on functional dependency among the attributes of a table.
Higher normal forms that go beyond BCNF were introduced later such as Fourth Normal Form (4NF) and Fifth
Normal Form (5NF). However, these later normal forms deal with situations that are very rare.
Now, we are going to describe the process of Normalization by considering an Example. Let us look into that.
In this example, we will look into that how we can normalize the database with the help of different Normal Forms.
We will look into each of the Normal Forms separately in this example.
UNIT 5: Functional Dependencies & Normalization
Consider the table 1 shown below:
Now, we are re-structuring the table according to the 1st Normal Form.
Primary Key does not functionally dependent on any subset of Candidate Key.
Table 1
Table 2
ID Subject
1 DBMS
1 OS
2 CN
2 COA
3 OS
Here, the Id in Table 2 is Foreign Key to the Id in Table 1. Now, we have to remove Transitive Functional
Dependency from our Table to Normalize our Database. A Transitive Functional Dependency basically tells us that
there is an indirect relationship between functional dependency.
Table 1
Table 2
ID Subject
1 DBMS
1 OS
2 CN
2 COA
3 OS
UNIT 5: Functional Dependencies & Normalization
Table 3
Result_ID Result
1 Pass
2 Fail
3 On Hold
Finally, Our Database is Normalized. From the above-mentioned example, we have reached our level of
Normalization. In fact, there are also some higher forms or next levels of Normalization. Now, we are going to
discuss them one by one.
Sometimes, when the database is in the 3rd Normal Form, there exist some anomalies in DBMS, like when more
than one Candidate Keys is present in the Database. This has to be removed under BCNF. BCNF is also called 3.5NF.
Whenever a Database contains multivalued and independent data in two or more tables, then the database is to be
considered in the 4th Normal Form.
Whenever a Database Table is not in 4NF, and we cannot divide it into smaller tables keeping our data safe with us,
then our Database is in 5th Normal Form.
UNIT 5: Functional Dependencies & Normalization
Summary of Normalization in a Nutshell
Normal
Form Test Remedy (Normalization)
The relation should have no non-atomic attributes or Form a name relation for each non-
1NF
nested relations. atomic attribute or nested relation.
4NF two attributes of a table are independent of each other Decompose the table into two subtables.
but both depend on a
third attribute.
It can be observed that values of attribute college name, college rank, and course are being repeated which can lead
to problems. Problems caused due to redundancy are:
Insertion anomaly
Deletion anomaly
Updation anomaly
Insertion Anomaly
If a student detail has to be inserted whose course is not being decided yet then insertion will not be possible till the
time course is decided for the student.
This problem happens when the insertion of a data record is not possible without adding some additional unrelated
data to the record.
Deletion Anomaly
If the details of students in this table are deleted then the details of the college will also get deleted which should
not occur by common sense. This anomaly happens when the deletion of a data record results in losing some
unrelated information that was stored as part of the record that was deleted from a table.
It is not possible to delete some information without losing some other information in the table as well.
Updation Anomaly
Suppose the rank of the college changes then changes will have to be all over the database which will be time-
consuming and computationally costly.
UNIT 5: Functional Dependencies & Normalization
Student_ID Name Contact College Course Rank
All places should be updated, If updation does not occur at all places then the database will be in an inconsistent
state.
Redundancy in a database occurs when the same data is stored in multiple places. Redundancy can cause various
problems such as data inconsistencies, higher storage requirements, and slower data retrieval.
Data Inconsistency: Redundancy can lead to data inconsistencies, where the same data is stored in multiple
locations, and changes to one copy of the data are not reflected in the other copies. This can result in
incorrect data being used in decision-making processes and can lead to errors and inconsistencies in the
data.
Storage Requirements: Redundancy increases the storage requirements of a database. If the same data is
stored in multiple places, more storage space is required to store the data. This can lead to higher costs and
slower data retrieval.
Update Anomalies: Redundancy can lead to update anomalies, where changes made to one copy of the data
are not reflected in the other copies. This can result in incorrect data being used in decision-making
processes and can lead to errors and inconsistencies in the data.
Performance Issues: Redundancy can also lead to performance issues, as the database must spend more
time updating multiple copies of the same data. This can lead to slower data retrieval and slower overall
performance of the database.
Security Issues: Redundancy can also create security issues, as multiple copies of the same data can be
accessed and manipulated by unauthorized users. This can lead to data breaches and compromise
the confidentiality, integrity, and availability of the data.
Maintenance Complexity: Redundancy can increase the complexity of database maintenance, as multiple
copies of the same data must be updated and synchronized. This can make it more difficult to troubleshoot
and resolve issues and can require more time and resources to maintain the database.
Data Duplication: Redundancy can lead to data duplication, where the same data is stored in multiple
locations, resulting in wasted storage space and increased maintenance complexity. This can also lead to
confusion and errors, as different copies of the data may have different values or be out of sync.
Data Integrity: Redundancy can also compromise data integrity, as changes made to one copy of the data
may not be reflected in the other copies. This can result in inconsistencies and errors and can make it
difficult to ensure that the data is accurate and up-to-date.
UNIT 5: Functional Dependencies & Normalization
Usability Issues: Redundancy can also create usability issues, as users may have difficulty accessing the
correct version of the data or may be confused by inconsistencies and errors. This can lead to frustration and
decreased productivity, as users spend more time searching for the correct data or correcting errors.
To prevent redundancy in a database, normalization techniques can be used. Normalization is the process of
organizing data in a database to eliminate redundancy and improve data integrity. Normalization involves breaking
down a larger table into smaller tables and establishing relationships between them. This reduces redundancy and
makes the database more efficient and reliable.
Enhanced Query Performance: By eliminating the need for intricate joins, redundancy helps expedite data
retrieval.
Offline Access: In offline circumstances, redundant copies allow data access even in the absence of
continuous connectivity.
Increased Availability: Redundancy helps to increase fault tolerance, which makes data accessible even in
the event of server failures.
Increased storage requirements: Redundant data takes up additional storage space within the database,
which can increase costs and slow down performance.
Inconsistency: If the same data is stored in multiple places within the database, there is a risk that updates
or changes made to one copy of the data may not be reflected in other copies, leading to inconsistency and
potentially incorrect results.
Difficulty in maintenance: With redundant data, it becomes more difficult to maintain the accuracy and
consistency of the data. It requires more effort and resources to ensure that all copies of the data are
updated correctly.
Increased risk of errors: When data is redundant, there is a greater risk of errors in the database. For
example, if the same data is stored in multiple tables, there is a risk of inconsistencies between the tables.
Reduced flexibility: Redundancy can reduce the flexibility of the database. For example, if a change needs to
be made to a particular piece of data, it may need to be updated in multiple places, which can be time-
consuming and error-prone.
Functional dependencies are used to mathematically express relations among database entities and are very
important to understand advanced concepts in Relational Database System and understanding problems in
competitive exams like Gate.
Example:
UNIT 5: Functional Dependencies & Normalization
roll_no name dept_name dept_building
42 abc CO A4
43 pqr IT A3
44 xyz CO A4
45 xyz IT A3
46 mno EC B2
47 jkl ME B2
From the above table we can conclude some valid functional dependencies:
roll_no → { name, dept_name, dept_building },→ Here, roll_no can determine values of fields name,
dept_name and dept_building, hence a valid Functional dependency
roll_no → dept_name , Since, roll_no can determine whole set of {name, dept_name, dept_building}, it can
determine its subset dept_name also.
dept_name → dept_building , Dept_name can identify the dept_building accurately, since departments with
different dept_name will also have a different dept_building
More valid functional dependencies: roll_no → name, {roll_no, name} ⇢ {dept_name, dept_building}, etc.
name → dept_name Students with the same name can have different dept_name, hence this is not a valid
functional dependency.
dept_building → dept_name There can be multiple departments in the same building. Example, in the
above table departments ME and EC are in the same building B2, hence dept_building → dept_name is an
invalid functional dependency.
More invalid functional dependencies: name → roll_no, {name, dept_name} → roll_no, dept_building →
roll_no, etc.
3. Transitivity: If X → Y and Y → Z are both valid dependencies, then X→Z is also valid by the Transitivity rule.
Example, roll_no → dept_name & dept_name → dept_building, then roll_no → dept_building is also valid.
UNIT 5: Functional Dependencies & Normalization
Types of Functional Dependencies in DBMS
In Trivial Functional Dependency, a dependent is always a subset of the determinant. i.e. If X → Y and Y is the
subset of X, then it is called trivial functional dependency
Example:
42 abc 17
43 pqr 18
44 xyz 18
Here, {roll_no, name} → name is a trivial functional dependency, since the dependent name is a subset of
determinant set {roll_no, name}. Similarly, roll_no → roll_no is also an example of trivial functional dependency.
In Non-trivial functional dependency, the dependent is strictly not a subset of the determinant. i.e. If X → Y and Y is
not a subset of X, then it is called Non-trivial functional dependency.
Example:
42 abc 17
43 pqr 18
44 xyz 18
Here, roll_no → name is a non-trivial functional dependency, since the dependent name is not a subset
of determinant roll_no. Similarly, {roll_no, name} → age is also a non-trivial functional dependency, since age is not
a subset of {roll_no, name}
UNIT 5: Functional Dependencies & Normalization
3. Multivalued Functional Dependency
In Multivalued functional dependency, entities of the dependent set are not dependent on each other. i.e. If a →
{b, c} and there exists no functional dependency between b and c, then it is called a multivalued functional
dependency.
For example,
42 abc 17
43 pqr 18
44 xyz 18
45 abc 19
Here, roll_no → {name, age} is a multivalued functional dependency, since the dependents name & age are not
dependent on each other(i.e. name → age or age → name doesn’t exist !)
In transitive functional dependency, dependent is indirectly dependent on determinant. i.e. If a → b & b → c, then
according to axiom of transitivity, a → c. This is a transitive functional dependency.
For example,
42 abc CO 4
43 pqr EC 2
44 xyz IT 1
45 abc EC 2
Here, enrol_no → dept and dept → building_no. Hence, according to the axiom of transitivity, enrol_no →
building_no is a valid functional dependency. This is an indirect functional dependency, hence called Transitive
functional dependency.
UNIT 5: Functional Dependencies & Normalization
5. Fully Functional Dependency
In full functional dependency an attribute or a set of attributes uniquely determines another attribute or set of
attributes. If a relation R has attributes X, Y, Z with the dependencies X->Y and X->Z which states that those
dependencies are fully functional.
In partial functional dependency a non key attribute depends on a part of the composite key, rather than the whole
key. If a relation R has attributes X, Y, Z where X and Y are the composite key and Z is non key attribute. Then X->Z is
a partial functional dependency in RBDMS.
Functional dependencies having numerous applications in the field of database management system. Here are some
applications listed below:
1. Data Normalization
Data normalization is the process of organizing data in a database in order to minimize redundancy and increase
data integrity. Functional dependencies play an important part in data normalization. With the help of functional
dependencies we are able to identify the primary key, candidate key in a table which in turns helps in normalization.
2. Query Optimization
With the help of functional dependencies we are able to decide the connectivity between the tables and the
necessary attributes need to be projected to retrieve the required data from the tables. This helps in query
optimization and improves performance.
3. Consistency of Data
Functional dependencies ensures the consistency of the data by removing any redundancies or inconsistencies that
may exist in the data. Functional dependency ensures that the changes made in one attribute does not affect
inconsistency in another set of attributes thus it maintains the consistency of the data in database.
Functional dependencies ensure that the data in the database to be accurate, complete and updated. This helps to
improve the overall quality of the data, as well as it eliminates errors and inaccuracies that might occur during data
analysis and decision making, thus functional dependency helps in improving the quality of data in database.
Normalization is the process of minimizing redundancy from a relation or set of relations. Redundancy in relation
may cause insertion, deletion, and update anomalies. So, it helps to minimize the redundancy in relations. Normal
forms are used to eliminate or reduce redundancy in database tables.
Levels of Normalization
If a relation contains a composite or multi-valued attribute, it violates the first normal form, or the relation is in the
first normal form if it does not contain any composite or multi-valued attribute. A relation is in first normal form if
every attribute in that relation is single-valued attribute.
A table is in 1 NF if:
Example 1:
Relation STUDENT in table 1 is not in 1NF because of multi-valued attribute STUD_PHONE. Its decomposition into
1NF has been shown in table 2.
Example
UNIT 5: Functional Dependencies & Normalization
Example 2:
ID Name Courses
------------------
1 A c1, c2
2 E c3
3 M c2, c3
ID Name Course
------------------
1 A c1
1 A c2
2 E c3
3 M c2
3 M c3
Note: A database design is considered as bad if it is not even in the First Normal Form (1NF).
The second Normal Form (2NF) is based on the concept of fully functional dependency. The second Normal Form
applies to relations with composite keys, that is, relations with a primary key composed of two or more attributes. A
relation with a single-attribute primary key is automatically in at least 2NF. A relation that is not in 2NF may suffer
from the update anomalies. To be in the second normal form, a relation must be in the first normal form and the
relation must not contain any partial dependency. A relation is in 2NF if it has No Partial Dependency, i.e., no non-
prime attribute (attributes that are not part of any candidate key) is dependent on any proper subset of any
candidate key of the table. In other words,
A relation that is in First Normal Form and every non-primary-key attribute is fully functionally dependent on the
primary key, then the relation is in Second Normal Form (2NF).
Note – If the proper subset of the candidate key determines a non-prime attribute, it is called partial dependency.
The normalization of 1NF relations to 2NF involves the removal of partial dependencies. If a partial dependency
exists, we remove the partially dependent attribute(s) from the relation by placing them in a new relation along with
a copy of their determinant. Consider the examples given below.
Table 1 Table 2
STUD_NO COURSE_NO COURSE_NO COURSE_FEE
1 C1 C1 1000
2 C2 C2 1500
1 C4 C3 1000
4 C3 C4 2000
4 C1 C5 2000
2 C5
Note – 2NF tries to reduce the redundant data getting stored in memory. For instance, if there are 100 students
taking C1 course, we dont need to store its Fee as 1000 for all the 100 records, instead once we can store it in the
second table as the course fee for C1 is 1000.
In this case, we can see that the relation R has a composite candidate key {A,B} as AB->C. Therefore, A and B
together uniquely determine the value of C. Similarly, BC -> D shows that B and C together uniquely determine the
value of D.
The relation R is already in 1NF because it does not have any repeating groups or nested relations.
However, we can see that the non-prime attribute D is functionally dependent on only part of a candidate key, BC.
This violates the 2NF condition.
A relation is in the third normal form, if there is no transitive dependency for non-prime attributes as well as it is in
the second normal form. A relation is in 3NF if at least one of the following conditions holds in every non-trivial
function dependency X –> Y.
X is a super key.
In other words,
A relation that is in First and Second Normal Form and in which no non-primary-key attribute is transitively
dependent on the primary key, then it is in Third Normal Form (3NF).
UNIT 5: Functional Dependencies & Normalization
Example 1:
FD set: {STUD_NO -> STUD_NAME, STUD_NO -> STUD_STATE, STUD_STATE -> STUD_COUNTRY, STUD_NO ->
STUD_AGE} Candidate Key: {STUD_NO} For this relation in table 4, STUD_NO -> STUD_STATE and STUD_STATE ->
STUD_COUNTRY are true. So STUD_COUNTRY is transitively dependent on STUD_NO. It violates the third normal
form. To convert it in third normal form, we will decompose the relation STUDENT (STUD_NO, STUD_NAME,
STUD_PHONE, STUD_STATE, STUD_COUNTRY_STUD_AGE) as:
Example 2:
A -> BC,
CD -> E,
B -> D,
E -> A
All possible candidate keys in above relation are {A, E, CD, BC} All attribute are on right sides of all functional
dependencies are prime.
MVD or multivalued dependency means that for a single value of attribute ‘a’ multiple values of attribute ‘b’ exist.
We write it as,
a --> --> b
It is read as a is multi-valued dependent on b. Suppose a person named Geeks is working on 2 projects Microsoft and
Oracle and has 2 hobbies namely Reading and Music. This can be expressed in a tabular format in the following way.
UNIT 5: Functional Dependencies & Normalization
Example
Project and Hobby are multivalued attributes as they have more than one value for a single person i.e., Geeks.
When one attribute in a database depends on another attribute and has many independent values, it is said to have
multivalued dependency (MVD). It supports maintaining data accuracy and managing intricate data interactions.
We can say that multivalued dependency exists if the following conditions are met.
Any attribute say a multiple define another attribute b; if any legal relation r(R), for all pairs of tuples t1 and t2 in r,
such that,
t1[a] = t2[a]
Then multivalued (MVD) dependency exists. To check the MVD in given table, we apply the conditions stated above
and we check it with the values in the given table.
UNIT 5: Functional Dependencies & Normalization
Example
t1[b] = t3[b]
And
t2[b] = t4[b]
t1[b] = t3[b] = MS
And
t2[b] = t4[b] = Oracle
t1 = t4 = Reading
And
t2 = t3 = Music
a --> --> b
UNIT 5: Functional Dependencies & Normalization
According to table we have got,
And for,
a --> --> C
We get,
Hence, we know that MVD exists in the above table and it can be stated by,
Although, 3NF is an adequate normal form for relational databases, still, this (3NF) normal form may not remove
100% redundancy because of X−>Y functional dependency if X is not a candidate key of the given relation. This can
be solved by Boyce-Codd Normal Form (BCNF).
Boyce–Codd Normal Form (BCNF) is based on functional dependencies that take into account all candidate keys in a
relation; however, BCNF also has additional constraints compared with the general definition of 3NF.
Rule 2: X should be a superkey for every functional dependency (FD) X−>Y in a given relation.
Note: To test whether a relation is in BCNF, we identify all the determinants and make sure that they are candidate
keys.
BCNF in DBMS
UNIT 5: Functional Dependencies & Normalization
You came across a similar hierarchy known as the Chomsky Normal Form in the Theory of Computation. Now,
carefully study the hierarchy above. It can be inferred that every relation in BCNF is also in 3NF. To put it another
way, a relation in 3NF need not be in BCNF. Ponder over this statement for a while.
To determine the highest normal form of a given relation R with functional dependencies, the first step is to check
whether the BCNF condition holds. If R is found to be in BCNF, it can be safely deduced that the relation is also
in 3NF, 2NF, and 1NF as the hierarchy shows. The 1NF has the least restrictive constraint – it only requires a relation
R to have atomic values in each tuple. The 2NF has a slightly more restrictive constraint.
The 3NF has a more restrictive constraint than the first two normal forms but is less restrictive than the BCNF. In this
manner, the restriction increases as we traverse down the hierarchy.
Examples
Here, we are going to discuss some basic examples which let you understand the properties of BCNF. We will discuss
multiple examples here.
Example 1
Let us consider the student database, in which data of the student are mentioned.
The table present above is not in BCNF, because as we can see that neither Stu_ID nor Stu_Course is a Super Key. As
the rules mentioned above clearly tell that for a table to be in BCNF, it must follow the property that for functional
dependency X−>Y, X must be in Super Key and here this property fails, that’s why this table is not in BCNF.
For satisfying this table in BCNF, we have to decompose it into further tables. Here is the full procedure through
which we transform this table into BCNF. Let us first divide this main table into two
tables Stu_Branch and Stu_Course Table.
UNIT 5: Functional Dependencies & Normalization
Stu_Branch Table
Stu_ID Stu_Branch
Stu_Course Table
Stu_ID Stu_Course_No
101 201
101 202
102 401
102 402
After decomposing into further tables, now it is in BCNF, as it is passing the condition of Super Key, that in functional
dependency X−>Y, X is a Super Key.
UNIT 5: Functional Dependencies & Normalization
Example 2
Find the highest normal form of a relation R(A, B, C, D, E) with FD set as:
Explanation:
Step-1: As we can see, (AC)+ ={A, C, B, E, D} but none of its subsets can determine all attributes of the
relation, So AC will be the candidate key. A or C can’t be derived from any other attribute of the relation, so
there will be only 1 candidate key {AC}.
Step-2: Prime attributes are those attributes that are part of candidate key {A, C} in this example and others
will be non-prime {B, D, E} in this example.
Step-3: The relation R is in 1st normal form as a relational DBMS does not allow multi-valued or composite
attributes.
The relation is in 2nd normal form because BC->D is in 2nd normal form (BC is not a proper subset of candidate key
AC) and AC->BE is in 2nd normal form (AC is candidate key) and B->E is in 2nd normal form (B is not a proper subset
of candidate key AC).
The relation is not in 3rd normal form because in BC->D (neither BC is a super key nor D is a prime attribute) and in
B->E (neither B is a super key nor E is a prime attribute) but to satisfy 3rd normal for, either LHS of an FD should be
super key or RHS should be a prime attribute. So the highest normal form of relation will be the 2nd Normal form.
AB ->C
C ->B
AB ->B
Suppose, it is known that the only candidate key of R is AB. A careful observation is required to conclude that the
above dependency is a Transitive Dependency as the prime attribute B transitively depends on the key AB through C.
Now, the first and the third FD are in BCNF as they both contain the candidate key (or simply KEY) on their left sides.
The second dependency, however, is not in BCNF but is definitely in 3NF due to the presence of the prime attribute
on the right side. So, the highest normal form of R is 3NF as all three FDs satisfy the necessary conditions to be in
3NF.
Example 3
A -> BC,
B -> A
Note: BCNF decomposition may always not be possible with dependency preserving, however, it always satisfies
the lossless join condition. For example, relation R (V, W, X, Y, Z), with functional dependencies:
V, W -> X
Y, Z -> X
W -> Y
UNIT 5: Functional Dependencies & Normalization
Introduction of 4th and 5th Normal Form in DBMS
Two of the highest levels of database normalization are the fourth normal form (4NF) and the fifth normal form
(5NF). Multivalued dependencies are handled by 4NF, whereas join dependencies are handled by 5NF.
If two or more independent relations are kept in a single relation or we can say multivalue dependency occurs when
the presence of one or more rows in a table implies the presence of one or more other rows in that same table. Put
another way, two attributes (or columns) in a table are independent of one another, but both depend on a third
attribute. A multivalued dependency always requires at least three attributes because it consists of at least two
attributes that are dependent on a third.
For a dependency A -> B, if for a single value of A, multiple values of B exist, then the table may have a multi-valued
dependency. The table should have at least 3 attributes and B and C should be independent for A ->> B multivalued
dependency.
Example:
Person->-> mobile,
Person ->-> food_likes
This is read as “person multi determines mobile” and “person multi determines food_likes.”
Note that a functional dependency is a special case of multivalued dependency. In a functional dependency X -> Y,
every x determines exactly one y, never more than one.
The Fourth Normal Form (4NF) is a level of database normalization where there are no non-trivial multivalued
dependencies other than a candidate key. It builds on the first three normal forms (1NF, 2NF, and 3NF) and
the Boyce-Codd Normal Form (BCNF). It states that, in addition to a database meeting the requirements of BCNF, it
must not contain more than one multivalued dependency.
Properties
A table with a multivalued dependency violates the normalization standard of the Fourth Normal Form (4NF)
because it creates unnecessary redundancies and can contribute to inconsistent data. To bring this up to 4NF, it is
necessary to break this information into two tables.
Example: Consider the database table of a class that has two relations R1 contains student ID(SID) and student name
(SNAME) and R2 contains course id(CID) and course name (CNAME).
UNIT 5: Functional Dependencies & Normalization
Table R1
SID SNAME
S1 A
S2 B
Table R2
CID CNAME
C1 C
C2 D
Table R1 X R2
S1 A C1 C
S1 A C2 D
S2 B C1 C
S2 B C2 D
Join Dependency
Join decomposition is a further generalization of Multivalued dependencies. If the join of R1 and R2 over C is equal to
relation R then we can say that a join dependency (JD) exists, where R1 and R2 are the decomposition R1(A, B, C) and
R2(C, D) of a given relations R (A, B, C, D). Alternatively, R1 and R2 are a lossless decomposition of R. A JD ⋈ {R1, R2,
…, Rn} is said to hold over a relation R if R1, R2, ….., Rn is a lossless-join decomposition. The *(A, B, C, D), (C, D) will
be a JD of R if the join of joins attribute is equal to the relation R. Here, *(R1, R2, R3) is used to indicate that relation
R1, R2, R3 and so on are a JD of R. Let R is a relation schema R1, R2, R3……..Rn be the decomposition of R. r( R ) is
said to satisfy join dependency if and only if
UNIT 5: Functional Dependencies & Normalization
Joint Dependency
Example:
Table R1
Company Product
C1 Pendrive
C1 mic
C2 speaker
C2 speaker
Company->->Product
Table R2
Agent Company
Aman C1
Aman C2
Mohan C1
Agent->->Company
Table R3
Agent Product
Aman Pendrive
Aman Mic
Aman speaker
UNIT 5: Functional Dependencies & Normalization
Agent Product
Mohan speaker
Agent->->Product
Table R1⋈R2⋈R3
C1 Pendrive Aman
C1 mic Aman
C2 speaker speaker
C1 speaker Aman
Agent->->Product
A relation R is in Fifth Normal Form if and only if everyone joins dependency in R is implied by the candidate keys of
R. A relation decomposed into two relations must have lossless join Property, which ensures that no spurious or
extra tuples are generated when relations are reunited through a natural join.
Properties
Example – Consider the above schema, with a case as “if a company makes a product and an agent is an agent for
that company, then he always sells that product for the company”. Under these circumstances, the ACP table is
shown as:
Table ACP
A1 PQR Nut
A1 PQR Bolt
UNIT 5: Functional Dependencies & Normalization
Agent Company Product
A1 XYZ Nut
A1 XYZ Bolt
A2 PQR Nut
The relation ACP is again decomposed into 3 relations. Now, the natural Join of all three relations will be shown as:
Table R1
Agent Company
A1 PQR
A1 XYZ
A2 PQR
Table R2
Agent Product
A1 Nut
A1 Bolt
A2 Nut
Table R3
Company Product
PQR Nut
PQR Bolt
UNIT 5: Functional Dependencies & Normalization
Company Product
XYZ Nut
XYZ Bolt
The result of the Natural Join of R1 and R3 over ‘Company’ and then the Natural Join of R13 and R2 over ‘Agent’and
‘Product’ will be Table ACP.
Hence, in this example, all the redundancies are eliminated, and the decomposition of ACP is a lossless join
decomposition. Therefore, the relation is in 5NF as it does not violate the property of lossless join.
Conclusion
Multivalued dependencies are removed by 4NF, and join dependencies are removed by 5NF.
The greatest degrees of database normalization, 4NF and 5NF, might not be required for every application.
Normalizing to 4NF and 5NF might result in more complicated database structures and slower query speed,
but it can also increase data accuracy, dependability, and simplicity.