Subject Name: DATABASE MANAGEMENT SYSTEM (DBMS)
Subject Code: BCS501
Unit 1: Syllabus
• Introduction: Overview, Database System vs File System, Database System Concept and Architecture.
• Data Models, Schema and Instances, Data Independence, and Database Languages and Interfaces.
• Data Modeling Using the Entity Relationship Model: ER Model Concepts, Notation
for ER Diagram.
• Mapping Constraints, Keys, Concepts of Super Key, Candidate Key, Primary Key.
Introduction
Overview of Database Systems
What is Data?
Data refers to raw facts and figures without context. It can be in the form of numbers,
text, images, or other formats that are collected for reference or analysis. Data itself does not
carry any meaning until it is processed or interpreted.
Example: The number '2024' is data, but it does not convey any meaning until we associate it with a year or a quantity.
What is a Database?
A Database is an organized collection of data that can be easily accessed, managed, and
updated. Databases store data in a structured format, using tables, records, and fields,
which allows for efficient querying and manipulation of the data.
Example: A customer database in a retail store contains information like customer names,
addresses, purchase history, and contact details.
• Data Redundancy Control: Minimizes data duplication and ensures data consistency across multiple locations.
• Data Integrity: Maintains the accuracy and consistency of data over its lifecycle.
• Backup and Recovery: Provides tools to recover data in case of system failures
or data corruption.
• Data Independence: Allows changes in data structure without affecting the application programs.
Examples of DBMS: MySQL, PostgreSQL, and Oracle.
Advantages of DBMS
A Database Management System (DBMS) offers several advantages:
• Data Integrity and Consistency: Ensures that data remains accurate, consistent, and reliable across the database.
• Data Security: Provides robust security measures to protect data from unauthorized access and breaches.
• Efficient Data Access: Uses indexing, query optimization, and caching techniques to enhance data retrieval and manipulation speed.
• Concurrent Access and Crash Recovery: Allows multiple users to access the data simultaneously and ensures data recovery in case of system failures.
• Improved Data Sharing: Facilitates data sharing among multiple users or applications while maintaining data consistency and integrity.
Disadvantages of DBMS
While a DBMS provides numerous benefits, it also has some disadvantages:
Types of Database Users
• Database Administrators (DBAs): Responsible for managing and maintaining
the overall database environment, including user management, backup, recovery, and
security.
Example: A DBA in a large corporation might configure database servers, monitor
performance, and handle disaster recovery planning.
• Application Programmers: Developers who write application programs that
interact with the database. They use programming languages like Java, Python, or
SQL to access and manipulate data.
Example: An application programmer might create an e-commerce application that
retrieves product data from a database and displays it on a website.
• End Users: The individuals who interact with the database through applications to
perform tasks like data entry, retrieval, or reporting.
Example: A bank customer using an online portal to check their account balance
is an end user of the database.
• System Analysts: Professionals who design and develop the overall system architecture, including database design, to meet business requirements.
Example: A system analyst might work with both end-users and DBAs to create
a database schema that supports new business processes.
• Database Designers: Individuals responsible for designing the structure of the
database, including defining schemas, relationships, and constraints.
Example: A database designer might define the relationships between tables in a
hospital management system.
• Naive Users: Users who interact with the database through pre-defined applications without writing any queries or using advanced features.
Example: A cashier at a retail store using a point-of-sale system to process transactions is a naive user of the database.
Role of the Data Administrator
• Data Policy Development: Establishing policies, standards, and procedures for
data management, ensuring data quality, consistency, and security across the
organization.
Example: Developing guidelines for data entry to reduce errors and maintain
consistency.
• Data Standardization: Ensuring uniformity in data formats, definitions, and
representations to facilitate data integration and interoperability among different
systems.
Example: Standardizing date formats across multiple databases (e.g., using YYYY-MM-DD).
• Data Security and Privacy: Establishing rules and protocols to protect sensitive data
from unauthorized access, breaches, and misuse, in compliance with legal and
regulatory requirements.
Example: Implementing data masking techniques for personally identifiable information (PII).
• Data Quality Management: Monitoring and managing data accuracy, completeness, consistency, and reliability to ensure high-quality data across the organization.
Example: Setting up data validation rules to detect and correct errors during data
entry.
• Data Lifecycle Management: Overseeing the complete lifecycle of data, from creation and storage to archiving and deletion, ensuring that data is properly managed throughout its lifespan.
Example: Developing retention policies that specify how long certain types of data
should be stored.
• Collaboration with IT and Business Teams: Working closely with database administrators (DBAs), system analysts, and business users to align data management strategies with organizational objectives.
Example: Coordinating with the IT team to ensure data backup and disaster
recovery processes are in place.
• Documentation: Maintaining comprehensive documentation of data models, standards, policies, and procedures to support data governance and compliance initiatives.
Example: Creating a data dictionary that details data definitions, formats, and
relationships.
Data Abstraction
Data Abstraction refers to the process of hiding the complexities of the database from
the user and providing a simplified view of the data. It helps in managing the large
amounts of data stored in the database by abstracting its details, enabling users to
interact with the data without needing to understand its internal structure or storage details.
Data abstraction is achieved through three different levels:
• Physical Level: This is the lowest level of data abstraction, which describes how
the data is physically stored in the database. It deals with the storage of data on
storage media, such as hard drives, and the implementation details like file
organization, indexing, and data compression techniques.
Example: At this level, the database administrator might work with storage blocks and sectors, or manage how data is indexed in the database for quick retrieval.
• Logical Level: This level provides a higher level of abstraction and focuses on what
data is stored in the database and what the relationships are among those data.
It describes the structure of the entire database for a group of users. This level is
independent of how the data is stored physically and provides a logical view of the
data.
Example: At this level, the data might be represented using tables, columns, rows, and
relationships like one-to-one, one-to-many, or many-to-many, without concern for
physical storage details.
• View Level: This is the highest level of data abstraction and describes only a part of
the entire database. The view level simplifies the interaction for the end-users by
providing only the relevant data needed for their specific tasks or applications. It is
also used to enhance security by restricting access to certain data.
Example: A bank employee might only see the customer details relevant to their
role, like name and account balance, without access to sensitive data like Social Security
numbers.
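The view level is commonly implemented with SQL views. The following is a minimal sketch using Python's built-in sqlite3 module; the Customer table and its column names are illustrative, not taken from the text:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Logical level: the full Customer table, including sensitive columns.
cur.execute("""CREATE TABLE Customer (
    CustID  INTEGER PRIMARY KEY,
    Name    TEXT NOT NULL,
    Balance REAL,
    SSN     TEXT)""")
cur.execute("INSERT INTO Customer VALUES (1, 'Ram', 5000.0, '123-45-6789')")

# View level: a bank employee's view exposes only name and balance,
# hiding the sensitive SSN column entirely.
cur.execute("""CREATE VIEW EmployeeView AS
               SELECT CustID, Name, Balance FROM Customer""")

row = cur.execute("SELECT * FROM EmployeeView").fetchone()
print(row)  # (1, 'Ram', 5000.0)
```

Queries against EmployeeView cannot see the SSN column at all, which is how the view level restricts access to sensitive data.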
Comparison of the Three Levels of Data Abstraction

Definition
  Physical Level: Describes how data is physically stored in storage devices.
  Logical Level: Describes what data is stored and the relationships among those data.
  View Level: Shows only a subset of the database that is relevant to the user or application.
Focus
  Physical Level: Storage structure and access methods.
  Logical Level: Overall data structure, schema, and relationships.
  View Level: User-specific views, simplifying data interaction.
Visibility
  Physical Level: Low-level details visible to DBAs only.
  Logical Level: Mid-level abstraction visible to developers and designers.
  View Level: High-level abstraction visible to end-users.
Data Independence
  Physical Level: Provides low data independence; changes affect physical storage.
  Logical Level: Provides logical data independence; changes do not affect storage.
  View Level: Offers external data independence; changes do not affect the internal schema.
Security
  Physical Level: Minimal impact on security; deals with storage details.
  Logical Level: Moderate impact; focuses on logical data security.
  View Level: High impact; restricts user access to sensitive data.
Example
  Physical Level: File storage formats, indexing, data compression.
  Logical Level: ER diagrams, tables, relationships.
  View Level: Customer view, employee view, product catalog view.
Users
  Physical Level: Database Administrators (DBAs).
  Logical Level: Database Designers and Developers.
  View Level: End-users and Application Programmers.
Database System Architecture

• 1-Tier Architecture:
In 1-tier architecture, the database is directly accessible to the user without any intermediary application. The user directly interacts with the DBMS, which is usually installed on their local machine. This architecture is mainly used for development purposes, where the developer directly communicates with the database for testing and design.
Table 1: Differences Between DBMS and File System

Definition
  DBMS: A software system that facilitates the creation, management, and manipulation of databases.
  File System: A method for storing, organizing, and retrieving files on a storage device.
Data Redundancy
  DBMS: Minimizes redundancy by using normalization techniques.
  File System: High redundancy due to independent file storage, leading to data duplication.
Data Consistency
  DBMS: Ensures data consistency through integrity constraints and transactions.
  File System: Lacks mechanisms for maintaining data consistency across multiple files.
Data Security
  DBMS: Provides robust security features, including access control, encryption, and user authentication.
  File System: Limited security features; relies on operating system security measures.
Backup and Recovery
  DBMS: Offers automated and systematic backup and recovery processes.
  File System: Backup and recovery processes are manual and less reliable.
Data Access
  DBMS: Supports complex querying and data manipulation using SQL or similar languages.
  File System: Limited to basic file operations (create, read, update, delete).
Concurrency Control
  DBMS: Manages multiple users accessing the data simultaneously through concurrency control mechanisms.
  File System: Lacks concurrency control; file locking is often required to prevent conflicts.
Data Integrity
  DBMS: Maintains data integrity through constraints, triggers, and rules.
  File System: No built-in support for enforcing data integrity rules.
Performance
  DBMS: Optimized for large-scale data management and complex operations.
  File System: May have performance issues with large volumes of data or complex operations.
Example: SQL*Plus, Oracle Forms, etc., where the developer interacts directly with
the database system.
• 2-Tier Architecture:
In 2-tier architecture, the DBMS system is split into two parts: the client side and
the server side. The client directly communicates with the database server. This
type of architecture is used in small to medium-sized applications where the client (user interface) directly connects to the database server through an application programming interface (API).
Figure 1: 1-Tier Architecture of DBMS
Example: Applications using client-server models like Microsoft Access and Fox-
Pro.
• 3-Tier Architecture:
The 3-tier architecture is the most commonly used architecture for DBMS systems.
It divides the application into three layers: the presentation layer (client), the application layer (business logic), and the database layer (server). The client interacts with
the application server, which further communicates with the database server. This
architecture offers better security, scalability, and flexibility.
Example: Web applications where the client (browser) sends requests to the web
server (application server), which then interacts with the database server.
– Relational Data Model
∗ Explanation: Organizes data in tables (relations) consisting of rows and
columns. Each table has a unique key, and relationships between tables
are defined using foreign keys.
– Entity-Relationship (ER) Model
∗ Explanation: Represents data using entities (objects) and relationships
between them. Widely used for conceptual modeling of databases.
• Object-Oriented Data Model
– Explanation: Combines object-oriented programming principles with database
management. Data is represented as objects, similar to classes in object-
oriented languages.
Data Schema
– Definition: The logical structure of the database that defines the organiza-
tion of data, such as tables, fields, and relationships.
Instances
– Definition: The actual content stored in the database at a given time. For
example, in a relational model, instances are the rows stored in a table.
Database Languages and Interfaces
Database languages are used to define, manipulate, control, and query the data
within a database. They include several types:
Example: SQL (Structured Query Language) is a widely used language that includes commands for DDL, DML, DCL, DQL, and VDL.
– Menu-Based Interface
∗ Provides a list of options or commands in a menu format.
∗ Users can navigate through different menus to execute specific database
operations.
∗ Commonly used in applications where users prefer easy and guided navigation.
– Forms-Based Interface
∗ Allows users to enter data and interact with the database using forms.
∗ Suitable for data entry tasks where structured input is required.
∗ Often used in applications where non-technical users interact with the
database.
– Graphical User Interface (GUI)
∗ Provides a visual interface with icons, buttons, and other graphical elements.
∗ Allows users to interact with the database through point-and-click actions.
∗ Commonly used in modern applications to enhance usability and user
experience.
– Natural Language Interface
∗ Enables users to interact with the database using natural language queries.
∗ Suitable for users who are not familiar with query languages like SQL.
∗ Relies on natural language processing (NLP) to interpret user input.
– Interfaces for Database Administrators (DBAs)
∗ Provides advanced tools and options for managing the database system.
∗ Includes functionalities like performance monitoring, backup, and security management.
∗ Tailored for experienced users who need to maintain and optimize the
database.
Overall Database Structure
– Storage Manager:
– Query Processor:
– Transaction Manager:
∗ Ensures that all database transactions are processed reliably and adhere
to ACID properties (Atomicity, Consistency, Isolation, Durability).
∗ Manages transaction logs, concurrency control, and recovery processes.
∗ Includes components like the lock manager and log manager.
– Buffer Manager:
∗ Manages the buffer pool in main memory to reduce disk I/O operations.
∗ Decides which data pages to cache in memory and which to flush back
to disk.
∗ Ensures efficient data retrieval by optimizing access patterns.
– Index Manager:
– Metadata Manager:
– Recovery Manager:
– Data Modeling Using the Entity-Relationship (ER) Model:
Purpose of an ER Diagram:
– Mapping Constraints and Keys:
∗ Mapping Constraints: Define how entities are associated with one
another in a database. They specify the cardinality of relationships, which
determines the number of occurrences of one entity that can be associated
with occurrences of another entity. The types include:
· One-to-One (1:1): Each entity in the first entity set is associated with at most one entity in the second entity set, and vice versa.
· One-to-Many (1:N): An entity in the first entity set can be associated with many entities in the second entity set, but each entity in the second entity set is associated with at most one entity in the first.
· Many-to-One (N:1): Many entities in the first entity set can be associated with at most one entity in the second entity set.
· Many-to-Many (M:N): Entities in the first entity set can be associated with multiple entities in the second entity set, and entities in the second entity set can be associated with multiple entities in the first entity set.
· Super Key: A set of one or more attributes that can uniquely identify an entity; it may include extra attributes.
Example: In a student database, StudentID, or the combination of StudentID and Email, can be a super key if each combination uniquely identifies a student.
· Candidate Key: A minimal super key, meaning it contains no
redundant attributes. Each candidate key can uniquely identify an
entity without any unnecessary attributes.
Example: In the same student database, StudentID alone can be a
candidate key if it is sufficient to uniquely identify each student.
· Primary Key: A candidate key chosen by the database designer to
uniquely identify each entity within an entity set. It must be unique
and not null.
Example: StudentID might be chosen as the primary key in the student
database because it uniquely identifies each student and is not null.
· Composite Key: A key that consists of two or more attributes
that together uniquely identify an entity.
Example: In an enrollment database, a combination of StudentID
and CourseID might be used as a composite key to uniquely identify
each enrollment record.
· Alternate Key: A candidate key that was not chosen as the primary key but can still uniquely identify an entity.
Example: In the student database, Email might be an alternate key
if StudentID is chosen as the primary key.
· Foreign Key: An attribute or set of attributes in one table that refers
to the primary key in another table. It establishes a link between the
two tables.
Example: In an enrollment database, CourseID in the Enrollment
table might be a foreign key that references the CourseID primary
key in the Course table.
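The key concepts above can be sketched with SQLite through Python's built-in sqlite3 module. The table layouts mirror the student/course examples in the text; the PRAGMA line is an SQLite-specific detail (foreign keys are not enforced by default), and the sample values are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite needs this to enforce FKs
cur = conn.cursor()

# StudentID is the primary key; Email is a candidate (alternate) key.
cur.execute("""CREATE TABLE Student (
    StudentID INTEGER PRIMARY KEY,
    Email     TEXT UNIQUE NOT NULL,
    Name      TEXT)""")
cur.execute("CREATE TABLE Course (CourseID TEXT PRIMARY KEY, Title TEXT)")

# Composite primary key (StudentID, CourseID); each component is also a
# foreign key referencing the primary key of another table.
cur.execute("""CREATE TABLE Enrollment (
    StudentID INTEGER REFERENCES Student(StudentID),
    CourseID  TEXT REFERENCES Course(CourseID),
    PRIMARY KEY (StudentID, CourseID))""")

cur.execute("INSERT INTO Student VALUES (101, 'ram@example.com', 'Ram')")
cur.execute("INSERT INTO Course VALUES ('CS320', 'Database Systems')")
cur.execute("INSERT INTO Enrollment VALUES (101, 'CS320')")

# A duplicate (101, 'CS320') row violates the composite primary key:
try:
    cur.execute("INSERT INTO Enrollment VALUES (101, 'CS320')")
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

Inserting an Enrollment row for a StudentID that does not exist in Student would likewise be rejected, which is exactly the foreign-key relationship described above.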
Table 2: Differences Between Super Key, Candidate Key, and Primary Key

Super Key
  Definition: A set of one or more attributes that can uniquely identify an entity.
  Uniqueness: Must uniquely identify each entity, but can include extra attributes.
  Redundancy: May contain redundant attributes.
Candidate Key
  Definition: A minimal super key, meaning it has no redundant attributes.
  Uniqueness: Uniquely identifies each entity without any unnecessary attributes.
  Redundancy: No redundancy; minimal set of attributes needed for uniqueness.
Primary Key
  Definition: A candidate key chosen by the database designer to uniquely identify each entity.
  Uniqueness: Uniquely identifies each entity; must be unique and not null.
  Redundancy: No redundancy; exactly one primary key per table.
∗ Generalization, Aggregation, and Reduction of an ER Diagram
to Tables:
· Generalization: A process of extracting common characteristics from multiple entities and creating a generalized entity that represents these shared characteristics. This is often used in hierarchical data modeling.
Specialization
Definition
Specialization is a process where a general entity is divided into more specific sub-entities or subclasses. Each subclass inherits attributes and relationships from the general entity but may also have additional attributes or relationships.
Comparison of Generalization, Specialization, and Aggregation

Definition
  Generalization: The process of extracting common characteristics from multiple entities and combining them into a generalized entity.
  Specialization: The process of defining a new subclass from an existing class to capture more specific characteristics.
  Aggregation: A process of creating a higher-level abstraction by combining several entities into a single entity.
Purpose
  Generalization: To simplify and unify similar entities by identifying common attributes.
  Specialization: To refine and categorize a general entity into more specific sub-entities.
  Aggregation: To simplify complex ER diagrams by grouping related entities into a single abstraction.
Direction
  Generalization: From specific entities to a generalized entity (bottom-up).
  Specialization: From a generalized entity to more specific sub-entities (top-down).
  Aggregation: Combining multiple entities into a higher-level entity.
Use Case
  Generalization: Useful when multiple entities share common attributes or relationships.
  Specialization: Useful when an entity needs to be divided into more detailed types to represent specific characteristics.
  Aggregation: Useful when representing complex relationships between entities in a simplified manner.
ER Diagram Representation
  Generalization: Often represented with a triangle pointing to a single generalized entity.
  Specialization: Represented as a hierarchy, with the general entity at the top and specific sub-entities below.
  Aggregation: Represented by a diamond or an oval encompassing multiple entities to show a higher-level relationship.
8. If the relationship has attributes, these attributes become additional columns in the relationship table.
9. Handle Primary and Foreign Keys:
10. Primary keys in entity tables are used to establish relationships with
other tables.
11. Foreign keys are added to tables to represent the relationships
between entities. They are columns that reference the primary
keys of other tables.
12. Ensure that referential integrity is maintained, meaning that for-
eign keys must correspond to valid primary keys in the referenced
tables.
13. Convert Multi-Valued Attributes:
14. Multi-valued attributes (attributes that can have multiple values
for a single entity) are handled by creating a new table.
15. The new table includes a foreign key that references the primary
key of the entity and a column for the multi-valued attribute.
16. This table essentially captures the one-to-many relationship between the original entity and the multi-valued attribute.
17. Convert Weak Entities:
18. Weak entities (entities that do not have a sufficient primary key on
their own) are handled by creating a table that includes the primary
key of the strong entity it depends on.
19. The table includes the partial key of the weak entity along with the
foreign key from the strong entity.
20. The combination of the strong entity’s primary key and the weak
entity’s partial key forms the primary key of the weak entity’s
table.
21. Normalize the Tables:
22. After creating the tables, normalize them to eliminate redundancy and ensure data integrity.
23. Apply normalization rules (1NF, 2NF, 3NF) to ensure that the
tables are free from anomalies and that the relationships between
tables are accurately represented.
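The multi-valued-attribute and weak-entity steps above can be sketched as SQLite DDL via Python's sqlite3 module. The Student, StudentPhone, and Dependent tables are invented for illustration, and the foreign-key PRAGMA is an SQLite-specific requirement:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
cur = conn.cursor()

cur.execute("CREATE TABLE Student (StudentID INTEGER PRIMARY KEY, Name TEXT)")

# Multi-valued attribute: a student may have several phone numbers, so the
# attribute moves into its own table with a foreign key back to Student.
cur.execute("""CREATE TABLE StudentPhone (
    StudentID INTEGER REFERENCES Student(StudentID),
    Phone     TEXT,
    PRIMARY KEY (StudentID, Phone))""")

# Weak entity: Dependent has no sufficient key of its own; its primary key
# combines the strong entity's key with the partial key DependentName.
cur.execute("""CREATE TABLE Dependent (
    StudentID     INTEGER REFERENCES Student(StudentID),
    DependentName TEXT,
    Relationship  TEXT,
    PRIMARY KEY (StudentID, DependentName))""")

cur.execute("INSERT INTO Student VALUES (101, 'Ram')")
cur.executemany("INSERT INTO StudentPhone VALUES (?, ?)",
                [(101, '555-0101'), (101, '555-0102')])
phones = cur.execute("""SELECT Phone FROM StudentPhone
                        WHERE StudentID = 101 ORDER BY Phone""").fetchall()
print(phones)  # [('555-0101',), ('555-0102',)]
```

The StudentPhone table captures the one-to-many relationship of step 16, and the composite primary key of Dependent is exactly the combination described in step 20.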
· Extended ER Model (EER): Extends the original ER model by adding more modeling constructs, such as specialization, generalization, categorization, and inheritance, to better represent more complex database designs.
Use Case Diagram
Timing Diagram
Sequence Diagram
Class Diagram
Subject Name: DATABASE MANAGEMENT
SYSTEM
Subject Code: BCS501
UNIT 2
UNIT 2 SYLLABUS
• Relational Data Model Concepts
• Relational Algebra
• Aggregate Functions
• Insert, Update, and Delete Operations
1 Relational Data Model Concepts
The relational data model is one of the most popular models for organizing data in
databases. In this model, data is structured into relations, which are conceptually represented as tables. Each table consists of rows (also known as tuples) and columns (called
attributes). The relational model was proposed by E. F. Codd in 1970 and forms the
foundation of relational databases such as MySQL, PostgreSQL, and Oracle.
Definition 1: A relation is a two-dimensional table with the following characteristics:
• All values in a column come from the same domain (a predefined set of values, such
as integers, characters, etc.).
Definition 2: In the relational data model, relationships between different data entities
are represented through foreign keys and primary keys, ensuring data consistency and
referential integrity.
Key Components of the Relational Model:
• Foreign Key: An attribute in one table that refers to the primary key of another
table, establishing a relationship between the two.
Example 1:
Consider a table ‘Students‘ that stores basic information about students in a college.
The table contains the following attributes: ‘StudentID‘, ‘Name‘, and ‘Age‘.
StudentID Name Age
101 Shyam 20
102 Ram 21
103 Sita 19
104 Radha 22
105 Mohan 20
Explanation:
• Each row in this table represents a tuple (or record) of a student.
• The columns represent the attributes of the students, such as ‘StudentID‘, ‘Name‘,
and ‘Age‘.
• The StudentID is a unique identifier for each student, which can be considered as the primary key.
• The Age attribute is restricted to a specific domain (positive integers).
Example 2:
Now consider a second table called ‘Courses‘, which lists the courses taken by the
students:
CourseID StudentID CourseName
501 101 Database Systems
502 102 Operating Systems
503 101 Data Structures
504 104 Algorithms
505 103 Computer Networks
Explanation:
• The ‘Courses‘ table has three attributes: ‘CourseID‘, ‘StudentID‘, and ‘Course-
Name‘.
• The ‘StudentID‘ here is a foreign key that references the ‘StudentID‘ in the ‘Students‘ table. This establishes a relationship between the ‘Students‘ and ‘Courses‘ tables.
• For example, the record with ‘StudentID = 101‘ in the ‘Courses‘ table indicates that student Shyam (from the ‘Students‘ table) is enrolled in the courses "Database Systems" and "Data Structures."
Relationship Between Tables: The two tables (‘Students‘ and ‘Courses‘) are linked via
the ‘StudentID‘ column. The use of foreign keys allows for the establishment of relationships
across multiple tables without duplicating data. This concept of referential integrity ensures
that the relationships between tables remain consistent.
The relational model enables you to efficiently query and manipulate data. For
instance, you can retrieve all the courses taken by a specific student (e.g., Shyam) using
the ‘StudentID‘ as the common link between the ‘Students‘ and ‘Courses‘ tables.
Such a query fetches all the courses taken by Shyam from the ‘Courses‘ and ‘Students‘ tables by leveraging the relational model's linking feature.
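A runnable sketch of that join, using Python's built-in sqlite3 module and the sample rows from the two tables above (the SQL formulation is one possible way to express it):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Students (StudentID INTEGER PRIMARY KEY, Name TEXT, Age INTEGER)")
cur.execute("CREATE TABLE Courses (CourseID INTEGER PRIMARY KEY, StudentID INTEGER, CourseName TEXT)")
cur.executemany("INSERT INTO Students VALUES (?, ?, ?)",
                [(101, 'Shyam', 20), (102, 'Ram', 21), (103, 'Sita', 19),
                 (104, 'Radha', 22), (105, 'Mohan', 20)])
cur.executemany("INSERT INTO Courses VALUES (?, ?, ?)",
                [(501, 101, 'Database Systems'), (502, 102, 'Operating Systems'),
                 (503, 101, 'Data Structures'), (504, 104, 'Algorithms'),
                 (505, 103, 'Computer Networks')])

# Join the two tables on their common StudentID column.
rows = cur.execute("""
    SELECT c.CourseName
    FROM Students s JOIN Courses c ON s.StudentID = c.StudentID
    WHERE s.Name = 'Shyam'
    ORDER BY c.CourseID""").fetchall()
print([r[0] for r in rows])  # ['Database Systems', 'Data Structures']
```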
2 Constraints and Their Types
In relational databases, constraints are rules applied to the data to maintain accuracy,
consistency, and reliability. There are several types of constraints:
• Key Constraints: Ensure that key attributes, such as the primary key, uniquely identify each tuple and contain no duplicate or null values.
• Domain Constraints: Ensures that all values in a column fall within a specified
domain (set of acceptable values).
3 Integrity Constraints
Integrity constraints are essential for ensuring that the data in the database is accurate and remains consistent across operations. The main types of integrity constraints are:
Entity Integrity: Ensures that the primary key of a table is unique and does not contain
null values. Every row must have a unique identifier.
Referential Integrity: Ensures that foreign keys in a table accurately reference valid
primary keys in another table, ensuring consistent relationships between tables.
Domain Integrity: Ensures that the values entered into a column are valid according to
the domain constraint set for that column (e.g., a column for age can only accept integer
values between 0 and 100).
Key Integrity: Ensures that the key constraints (primary key and foreign key) are always
maintained.
Unique Integrity: Ensures that specific columns marked as unique will have distinct
values in every row.
Null Integrity: Ensures that the columns marked as NOT NULL will always contain a
value.
Check Integrity: Ensures that the CHECK constraints on a column will restrict the data to a specific condition (e.g., salary > 0).
4 Example of Integrity Constraints
To understand integrity constraints, let’s consider the following example of two tables,
Students and Courses:
Students Table:
StudentID Name Age
101 Ram 20
102 Shyam 21
Courses Table:
CourseID StudentID CourseName
501 101 Database Systems
502 102 Operating Systems
Entity Integrity Constraint: In the Students table, the primary key is ‘StudentID‘.
According to the entity integrity constraint: - No ‘StudentID‘ can be null. - Every
‘StudentID‘ must be unique.
Thus, the table ensures that each student is uniquely identifiable, and no student can
have a missing or duplicate ‘StudentID‘.
Referential Integrity Constraint: In the Courses table, ‘StudentID‘ is a foreign key that references the ‘StudentID‘ in the Students table. The referential integrity constraint ensures that: - Every ‘StudentID‘ in the Courses table must exist in the Students table.
For example, if we try to add a course with ‘StudentID = 103‘, it would violate the
referential integrity constraint because no student with ‘StudentID = 103‘ exists in the
Students table.
Domain Constraint: A domain constraint specifies the permissible values for a
column. For instance: - The ‘Age‘ column in the Students table should only accept positive
integer values.
If we try to enter a value like ‘Age = -5‘, it would violate the domain constraint.
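Both violations described above (a course for the non-existent ‘StudentID = 103‘, and ‘Age = -5‘) can be reproduced with SQLite. The CHECK clause and the foreign-key PRAGMA are SQLite specifics assumed for this sketch:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
cur = conn.cursor()

cur.execute("""CREATE TABLE Students (
    StudentID INTEGER PRIMARY KEY,    -- entity integrity
    Name      TEXT NOT NULL,          -- null integrity
    Age       INTEGER CHECK (Age > 0) -- domain/check integrity
)""")
cur.execute("""CREATE TABLE Courses (
    CourseID   INTEGER PRIMARY KEY,
    StudentID  INTEGER REFERENCES Students(StudentID), -- referential integrity
    CourseName TEXT)""")
cur.execute("INSERT INTO Students VALUES (101, 'Ram', 20)")

violations = []
for sql in ("INSERT INTO Students VALUES (102, 'Shyam', -5)",      # Age = -5
            "INSERT INTO Courses VALUES (503, 103, 'Algorithms')"):  # no student 103
    try:
        cur.execute(sql)
    except sqlite3.IntegrityError as e:
        violations.append(str(e))
print(violations)
```

Each bad insert raises an IntegrityError instead of corrupting the tables, which is exactly what the constraints are for.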
5 Key and Domain Constraints

• Primary Key: In the Students table, the ‘StudentID‘ is the primary key, ensuring
that each student has a unique identifier.
• Foreign Key: In the Courses table, the ‘StudentID‘ acts as a foreign key referencing the primary key in the Students table, maintaining a relationship between the two tables.
Domain Constraints: These constraints define the valid set of values for a column.
For example:
• The ‘Age‘ column in the Students table has a domain constraint that limits values to
integers between 18 and 25.
6 Relational Algebra
Relational algebra is a procedural query language that operates on relations (tables). It
provides a set of operations for manipulating relations to retrieve desired data. The basic
operations include:
• Selection (σ): Retrieves the tuples (rows) of a relation that satisfy a given condition.
• Projection (π): Retrieves the specified attributes (columns) of a relation, eliminating duplicate tuples.
• Set Difference (−): Retrieves tuples that are in one relation but not in another.
Example:
πssn(Student) − πssn(Registered)
This returns the SSNs of students who have not registered for any courses.
• Cartesian Product (×): Combines each row of the first relation with every row
of the second relation.
Example:
Student × Course
This gives all possible combinations of students and courses, resulting in a large
relation with all attributes of both relations.
Consider a relation Registered(ssn, code) with the following tuples:

ssn    code
101    CS320
102    CS150

Example queries over the Student, Registered, Course, and Subject relations:

πname(σcode = ’CS320’(Registered ⋈ Student))
πtitle(σname = ’Ram’(Registered ⋈ Student ⋈ Course))
πlecturer(σcode = ’CS150’(Subject))
πlecturer(Subject) ∩ πlecturer(Subject)
πname(σcode = ’CS150’(Registered ⋈ Student)) ∩ πname(σcode = ’CS307’(Registered ⋈ Student))
πname(σcode = ’CS150’(Registered ⋈ Student)) ∩ πname(σcode = ’CS1200’(Registered ⋈ Student))
• Natural Join (⋈): Combines two relations by matching columns that have the same name in both tables.
Example:
Student ⋈ Registered
This will join the two relations on the common attribute ssn.
• Division (÷): Used to find tuples in one relation that are related to all tuples in
another relation.
Example:
R(A, B) ÷ S(B)
This returns all values of A for which every B in S exists in R.
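As a sketch, σ, π, and ⋈ can be imitated over plain Python lists of dictionaries. The Student and Registered data follow the examples above; the helper functions are illustrative, not a real algebra engine:

```python
# Relations as lists of dicts keyed by attribute name (toy data from the text).
Student = [{"ssn": 101, "name": "Ram"},
           {"ssn": 102, "name": "Shyam"},
           {"ssn": 103, "name": "Sita"}]
Registered = [{"ssn": 101, "code": "CS320"},
              {"ssn": 102, "code": "CS150"}]

def select(rel, pred):
    # sigma: keep only tuples satisfying the predicate
    return [t for t in rel if pred(t)]

def project(rel, attrs):
    # pi: keep only the named attributes, eliminating duplicates
    seen = sorted({tuple(t[a] for a in attrs) for t in rel})
    return [dict(zip(attrs, v)) for v in seen]

def natural_join(r, s):
    # join: combine tuples that agree on all attribute names common to both
    common = set(r[0]) & set(s[0])
    return [{**t, **u} for t in r for u in s
            if all(t[a] == u[a] for a in common)]

# pi_ssn(Student) - pi_ssn(Registered): students registered for no course
diff = [t["ssn"] for t in Student
        if t["ssn"] not in {u["ssn"] for u in Registered}]
print(diff)  # [103]

# pi_name(sigma_code='CS320'(Registered joined with Student))
cs320 = select(natural_join(Registered, Student), lambda t: t["code"] == "CS320")
names = [t["name"] for t in project(cs320, ["name"])]
print(names)  # ['Ram']
```

The set-difference query returns 103 (Sita, who appears in Student but not in Registered), matching the πssn(Student) − πssn(Registered) example above.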
7 Relational Calculus
Relational calculus is a non-procedural query language. Instead of specifying how to
retrieve data, you specify what data to retrieve. There are two types: Tuple Relational
Calculus (TRC) and Domain Relational Calculus (DRC).
• Non-Procedural: Unlike procedural languages, relational calculus does not involve specifying the steps to execute queries. Instead, it uses logical expressions to define the data requirements.
• Logical Formulas: Queries in relational calculus are expressed using logical formulas involving variables and predicates. The result is a set of tuples or values that satisfy the formula.
• Tuple and Domain Variables: There are two types of variables in relational
calculus:
– Tuple Variables: Represent entire tuples from relations (used in Tuple Relational Calculus).
– Domain Variables: Represent individual attribute values from the domains
of relations (used in Domain Relational Calculus).
• Logical Connectives: Uses logical connectives such as AND (∧), OR (∨), and
NOT (¬) to combine conditions.
• Quantifiers: Uses quantifiers such as EXISTS (∃) and FOR ALL (∀) to specify
the presence or absence of tuples or values that meet the criteria.
• Query Result: The result of a relational calculus query is a set of tuples or values
that meet the specified conditions.
7.2 Tuple Relational Calculus (TRC)
In Tuple Relational Calculus (TRC), variables represent tuples from relations. Queries
are expressed as logical formulas where variables are tuples and the formula defines the
condition that tuples must satisfy.
Characteristics of TRC:
• Queries are expressed in the form of logical formulas involving tuple variables.
• The result of a TRC query is a set of tuples that satisfy the given formula.
General Form:
{T | P(T )}
Where T is a tuple variable, and P(T ) is a predicate or condition that must be true for
T.
Example Queries in TRC:
• Find the names of students who are registered for course CS320:
{t.name | Student(t) ∧ ∃r (Registered(r) ∧ r.ssn = t.ssn ∧ r.code = ’CS320’)}
This query retrieves the names of students for whom there exists a registration
tuple where the student’s SSN matches and the course code is CS320.
• Find the names of students who are not enrolled in any course:
{t.name | Student(t) ∧ ¬∃r (Registered(r) ∧ r.ssn = t.ssn)}
This query retrieves the names of students who do not have any corresponding
entries in the Registered relation.
• Find the SSNs of students who are registered for every course:
{t.ssn | Student(t) ∧ ∀c (Course(c) → ∃r (Registered(r) ∧ r.ssn = t.ssn ∧ r.code = c.code))}
This query finds SSNs of students who are registered for every course listed in the
Course relation.
7.3 Domain Relational Calculus (DRC)
In Domain Relational Calculus (DRC), variables represent individual attribute values
drawn from attribute domains rather than whole tuples.
• The result of a DRC query is a set of values that satisfy the given formula.
General Form:
{v | P(v)}
Where v is a domain variable and P(v) is a predicate that must be true for v.
Example Queries in DRC:
• Find the names of students who are registered for course CS320:
{n | ∃s (Student(s, n) ∧ Registered(s, ’CS320’))}
This query retrieves all names where there exists a student with a matching SSN
and a corresponding registration for course CS320.
• Find the SSNs of students who are enrolled in at least one course:
{s | ∃c (Registered(s, c))}
This query retrieves SSNs of students who have at least one entry in the Registered
relation.
• Find the codes of courses whose lecturer is Hector:
{c | ∃l (Subject(c, l) ∧ l = ’Hector’)}
This query finds all course codes where the lecturer is Hector.
SQL Introduction
SQL (Structured Query Language) is a standardized programming language used to
manage and manipulate relational databases. It is the primary tool for interacting with data
stored in relational database management systems (RDBMS).
Definition of SQL:
• SQL is used to create, read, update, and delete data from a database.
• It allows users to define the structure of data and establish relationships between
data sets.
• SQL is a declarative language, meaning users specify what they want, not how to
get it.
• It follows the ANSI (American National Standards Institute) and ISO (International
Organization for Standardization) standards.
• Data Definition Language (DDL): These commands are used to define and
modify the structure of a database. Examples include:
– CREATE: To create databases, tables, and other objects.
– ALTER: To modify the structure of existing objects.
– DROP: To delete databases, tables, or other objects.
– TRUNCATE: To remove all records from a table, but keep its structure.
Characteristics of SQL:
• SQL is a declarative language, so it focuses on the what rather than the how.
• It supports both Data Definition Language (DDL) and Data Manipulation
Language (DML) operations.
• SQL is versatile and can be used with different database management systems like
MySQL, PostgreSQL, Oracle, SQL Server, etc.
Advantages of SQL:
• Simplicity: SQL commands are easy to learn and use, even for beginners.
Disadvantages of SQL:
• Complexity in Advanced Queries: While simple queries are easy, more complex
queries (e.g., involving multiple joins or subqueries) can be difficult to construct.
• Limited Control: SQL abstracts the process of retrieving and manipulating data,
giving users limited control over the execution of the queries.
• Overhead: SQL operations, especially on very large datasets, can introduce
overhead and may require optimization techniques.
SQL Data Types and Literals
SQL supports various data types that allow users to define the kind of data that can be
stored in a database table. These data types ensure data integrity and optimize storage.
SQL supports the following data types:
• Numeric types: INT, SMALLINT, DECIMAL(p, s), FLOAT.
• Character types: CHAR(n), VARCHAR(n).
• Date and time types: DATE, TIME, TIMESTAMP.
• Boolean type: BOOLEAN.
• Large-object types: BLOB, CLOB.
SQL Literals
SQL literals are constant values that are used in SQL queries for comparison, insertion,
and manipulation of data. They represent fixed values within the SQL statement.
Types of Literals:
• Bit String Literals: These represent binary data (bits) and are usually prefixed
with a B. Example: B’10101’.
• Exact Numeric Literals: Represent numeric values with an exact, fixed represen-
tation. These can be integers or decimals. Example: 42, 123.45.
SQL Commands:
SQL commands are broadly classified into different categories based on their functionality:
• DDL (Data Definition Language): These commands are used to define or alter
the structure of database objects. Examples: CREATE, ALTER, DROP, TRUNCATE.
• DML (Data Manipulation Language): These commands are used to insert,
retrieve, and modify the data itself. Example:
UPDATE table_name
SET column1 = value1, column2 = value2, ...
WHERE condition;
• ALTER TABLE: Modifies the structure of an existing table (e.g., adding a column).
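A runnable sketch of ALTER TABLE in SQLite (the table and column names here are illustrative, not from the text):

```python
import sqlite3

# ALTER TABLE ... ADD COLUMN extends an existing table's structure.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Students (StudentID INTEGER, Name TEXT)")
cur.execute("ALTER TABLE Students ADD COLUMN Age INTEGER")
cur.execute("INSERT INTO Students VALUES (1, 'Radha', 20)")
# PRAGMA table_info lists one row per column; index 1 holds the column name.
cols = [d[1] for d in cur.execute("PRAGMA table_info(Students)").fetchall()]
print(cols)  # ['StudentID', 'Name', 'Age']
```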
SQL Operators and Procedures
SQL supports various operators such as arithmetic, comparison, and logical operators.
Arithmetic Operators:
• + (Addition)
• − (Subtraction)
• ∗ (Multiplication)
• / (Division)
• % (Modulo)
Unary Operators
• + (Positive)
• - (Negative)
Binary Operators
• + (Addition)
• - (Subtraction)
• * (Multiplication)
• / (Division)
Comparison Operators
Operator Description
= Equal to
!= Not equal to
> Greater than
< Less than
<= Less than or equal to
>= Greater than or equal to
Logical Operators
• AND
• OR
• NOT
Set Operators
• UNION
• INTERSECT
• MINUS
Operator Precedence (highest to lowest)
• Parentheses ()
• Unary operators +, -
• Arithmetic operators *, /, %
• Arithmetic operators +, -
• Comparison operators =, !=, <, >, <=, >=
• Logical operators NOT, AND, OR
Projection
Projection (π) is used to select specific columns from a table. It is similar to selecting
columns in SQL.
Example:
πName, Age(Students)
Set Difference
Set Difference (−) finds rows present in one table but not in another. It is equivalent to
SQL’s EXCEPT clause.
Example:
SELECT Name FROM Employees WHERE Age < 35 EXCEPT SELECT Name FROM
Employees WHERE Name = ’Ram’;
Table: Employees
EmployeeID Name Age
1 Radha 28
2 Shyam 35
3 Ram 40
Result:
Radha
Cartesian Product
Cartesian Product (×) combines each row from one table with each row from another
table. It is similar to a SQL join without a condition.
Example:
Employees × Departments
This pairs every row of Employees with every row of Departments.
Rename
Rename (ρ) is used to rename a table or its columns. In SQL this is done with the AS
keyword.
Example:
ρEmp(Employees) renames the relation Employees to Emp.
Views
A view is a virtual table based on the result of a SQL query. It does not store data itself
but provides a way to simplify complex queries or present data in a particular format.
Syntax:
CREATE VIEW ViewName AS SELECT Column1, Column2 FROM TableName WHERE
Condition;
Example:
CREATE VIEW StudentView AS SELECT Name, Age FROM Students WHERE Age
> 21;
Table: Students
StudentID Name Age
1 Radha 20
2 Shyam 22
3 Ram 23
Resulting View: StudentView
Name Age
Shyam 22
Ram 23
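The StudentView example can be reproduced in SQLite; note that the view stores no data of its own and re-runs its defining query each time it is read:

```python
import sqlite3

# Create the Students table, define StudentView, then read through the view.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Students (StudentID INTEGER, Name TEXT, Age INTEGER)")
cur.executemany("INSERT INTO Students VALUES (?, ?, ?)",
                [(1, "Radha", 20), (2, "Shyam", 22), (3, "Ram", 23)])
cur.execute("""CREATE VIEW StudentView AS
               SELECT Name, Age FROM Students WHERE Age > 21""")
view_rows = cur.execute("SELECT * FROM StudentView ORDER BY Age").fetchall()
print(view_rows)  # [('Shyam', 22), ('Ram', 23)]
```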
Indexes
Indexes are used to improve the speed of data retrieval operations on a database table. They
create an internal structure that allows the database to find data quickly without scanning
the entire table.
Syntax:
CREATE INDEX IndexName ON TableName (ColumnName);
Example:
CREATE INDEX idx_student_name ON Students (Name);
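A sketch of index creation in SQLite; EXPLAIN QUERY PLAN shows whether a lookup uses the index instead of scanning the whole table (table and index names follow the example above):

```python
import sqlite3

# Create the table and index, then inspect the planner's strategy.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Students (StudentID INTEGER, Name TEXT)")
cur.execute("CREATE INDEX idx_student_name ON Students (Name)")
# Each plan row is (id, parent, notused, detail); detail names the index used.
plan = cur.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM Students WHERE Name = 'Radha'"
).fetchall()
uses_index = any("idx_student_name" in row[3] for row in plan)
print(uses_index)  # True
```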
Subqueries and Joins
Subqueries can be used to retrieve data from one table based on the results of a query from
another table. Here, we will use subqueries to work with two tables: Students and Marks.
Tables:
Table: Students
StudentID Name Age
1 Radha 20
2 Shyam 22
3 Ram 23
Table: Marks
StudentID Subject Marks
1 Math 85
1 Science 90
2 Math 78
3 Science 88
Syntax for Subquery with Join:
SELECT Name, Marks FROM Students JOIN Marks ON Students.StudentID
= Marks.StudentID WHERE Marks.Subject = ’Math’;
Resulting Data:
Name Marks
Radha 85
Shyam 78
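The join above can be executed against the sample Students and Marks tables with SQLite:

```python
import sqlite3

# Build the two sample tables and run the Math-marks join from the text.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Students (StudentID INTEGER, Name TEXT, Age INTEGER)")
cur.execute("CREATE TABLE Marks (StudentID INTEGER, Subject TEXT, Marks INTEGER)")
cur.executemany("INSERT INTO Students VALUES (?, ?, ?)",
                [(1, "Radha", 20), (2, "Shyam", 22), (3, "Ram", 23)])
cur.executemany("INSERT INTO Marks VALUES (?, ?, ?)",
                [(1, "Math", 85), (1, "Science", 90),
                 (2, "Math", 78), (3, "Science", 88)])
rows = cur.execute("""
    SELECT Name, Marks FROM Students
    JOIN Marks ON Students.StudentID = Marks.StudentID
    WHERE Marks.Subject = 'Math'
    ORDER BY Marks DESC
""").fetchall()
print(rows)  # [('Radha', 85), ('Shyam', 78)]
```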
Aggregate Functions
Aggregate Functions perform calculations on a set of values and return a single result.
Example:
SELECT SUM(Marks) FROM Marks;
Table: Marks
StudentID Subject Marks
1 Math 85
1 Science 90
2 Math 78
3 Science 88
Output:
341
Example:
SELECT MAX(Name) FROM Students;
Table: Students
StudentID Name Age
1 Radha 20
2 Shyam 22
3 Ram 23
Output:
Shyam
Union
Purpose: Combine results from two or more SELECT statements, removing duplicates.
Syntax:
SELECT Column1 FROM Table1 UNION SELECT Column1 FROM Table2;
Example:
SELECT Name FROM Students UNION SELECT Name FROM Teachers;
Table: Students
StudentID Name Age
1 Radha 20
2 Shyam 22
Table: Teachers
TeacherID Name Subject
1 Priya Math
2 Raj Science
Output:
Name
Priya
Radha
Raj
Shyam
Grouping
Purpose: Group rows that have the same values into summary rows.
Syntax:
SELECT Column1, COUNT(*) FROM TableName GROUP BY Column1;
Example:
SELECT Subject, COUNT(*) AS NumberOfStudents FROM Marks GROUP BY
Subject;
Table: Marks
StudentID Subject Marks
1 Math 85
1 Science 90
2 Math 78
3 Science 88
Output:
Subject NumberOfStudents
Math 2
Science 2
Group By Clause
Purpose: Used with aggregate functions to group the result set by one or more columns.
Syntax:
SELECT Column1, AggregateFunction(Column2) FROM TableName GROUP BY Column1;
Example:
SELECT Subject, AVG(Marks) AS AverageMarks FROM Marks GROUP BY Subject;
Output:
Subject AverageMarks
Math 81.5
Science 89
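The grouped average above can be reproduced in SQLite (toy data matching the Marks table; the AVG query is implied by the AverageMarks output):

```python
import sqlite3

# GROUP BY collapses the rows per Subject; AVG aggregates each group.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Marks (StudentID INTEGER, Subject TEXT, Marks INTEGER)")
cur.executemany("INSERT INTO Marks VALUES (?, ?, ?)",
                [(1, "Math", 85), (1, "Science", 90),
                 (2, "Math", 78), (3, "Science", 88)])
rows = cur.execute("""
    SELECT Subject, AVG(Marks) AS AverageMarks
    FROM Marks GROUP BY Subject ORDER BY Subject
""").fetchall()
print(rows)  # [('Math', 81.5), ('Science', 89.0)]
```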
Database Modifications
Commands to Modify Database:
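The basic modification commands (INSERT, UPDATE, DELETE) can be sketched in SQLite; the Employees table and the sample rows here are illustrative:

```python
import sqlite3

# Insert two rows, update one, delete the other, then inspect the result.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Employees (EmployeeID INTEGER, Name TEXT)")
cur.execute("INSERT INTO Employees VALUES (1, 'Alice')")
cur.execute("INSERT INTO Employees VALUES (2, 'Bob')")
cur.execute("UPDATE Employees SET Name = 'Robert' WHERE EmployeeID = 2")
cur.execute("DELETE FROM Employees WHERE EmployeeID = 1")
rows = cur.execute("SELECT * FROM Employees").fetchall()
print(rows)  # [(2, 'Robert')]
```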
Types of Join Operations
Inner Join
Purpose: Retrieve rows with matching values in both tables.
Syntax:
SELECT * FROM Table1 INNER JOIN Table2 ON Table1.Column = Table2.Column;
Example Tables:
EmployeeID Name
1 Alice
2 Bob
3 Carol
Table 5: Employee Table
EmployeeID Department
1 HR
2 IT
4 Finance
Table 6: Department Table
Example Query:
SELECT Employees.Name, Departments.Department FROM Employees INNER
JOIN Departments ON Employees.EmployeeID = Departments.EmployeeID;
Output:
Name Department
Alice HR
Bob IT
Table 7: Inner Join Result
Outer Join
Purpose: Retrieve all rows from one table and matched rows from another table.
Left Outer Join Syntax:
SELECT * FROM Table1 LEFT OUTER JOIN Table2 ON Table1.Column = Table2.Column;
Right Outer Join Syntax:
SELECT * FROM Table1 RIGHT OUTER JOIN Table2 ON Table1.Column = Table2.Column;
Example Query:
SELECT Employees.Name, Departments.Department FROM Employees LEFT
OUTER JOIN Departments ON Employees.EmployeeID = Departments.EmployeeID;
Output:
Name Department
Alice HR
Bob IT
Carol NULL
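Running the left outer join above in SQLite shows the unmatched row surfacing with NULL (Python's None) for Department:

```python
import sqlite3

# LEFT OUTER JOIN keeps every Employees row, matched or not.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Employees (EmployeeID INTEGER, Name TEXT)")
cur.execute("CREATE TABLE Departments (EmployeeID INTEGER, Department TEXT)")
cur.executemany("INSERT INTO Employees VALUES (?, ?)",
                [(1, "Alice"), (2, "Bob"), (3, "Carol")])
cur.executemany("INSERT INTO Departments VALUES (?, ?)",
                [(1, "HR"), (2, "IT"), (4, "Finance")])
rows = cur.execute("""
    SELECT Employees.Name, Departments.Department
    FROM Employees LEFT OUTER JOIN Departments
    ON Employees.EmployeeID = Departments.EmployeeID
    ORDER BY Employees.EmployeeID
""").fetchall()
print(rows)  # [('Alice', 'HR'), ('Bob', 'IT'), ('Carol', None)]
```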
Minus
Purpose: Returns rows present in the first query but not in the second.
Syntax:
SELECT Column1 FROM Table1 MINUS SELECT Column1 FROM Table2;
Example:
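MINUS is Oracle's spelling of this operator; SQLite and standard SQL use the equivalent EXCEPT, which a runnable sketch can demonstrate (tables and names illustrative):

```python
import sqlite3

# Names in Students that do not also appear in Teachers.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Students (Name TEXT)")
cur.execute("CREATE TABLE Teachers (Name TEXT)")
cur.executemany("INSERT INTO Students VALUES (?)",
                [("Radha",), ("Shyam",), ("Priya",)])
cur.executemany("INSERT INTO Teachers VALUES (?)", [("Priya",), ("Raj",)])
rows = cur.execute("""
    SELECT Name FROM Students
    EXCEPT
    SELECT Name FROM Teachers
    ORDER BY Name
""").fetchall()
print(rows)  # [('Radha',), ('Shyam',)]
```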
Cursors, Triggers, and Procedures in SQL/PL-SQL
Cursors
Purpose: Cursors provide a mechanism to handle a set of rows returned by a query.
They are useful for processing each row individually in a set of rows.
Syntax:
DECLARE cursor_name CURSOR FOR select_statement;
Example:
DECLARE emp_cursor CURSOR FOR SELECT Name FROM Employees;
Steps to Use a Cursor:
1. Declare Cursor: Define the cursor with the query.
2. Open Cursor: Execute the query and position the cursor before the first row.
3. Fetch Rows: Retrieve rows one at a time into program variables.
4. Close Cursor: Release the cursor once processing is complete.
Triggers
Purpose: Triggers are special types of stored procedures that automatically execute
SQL code in response to certain events on a table.
Data Manipulation Triggers: These triggers execute in response to ‘INSERT‘,
‘UPDATE‘, or ‘DELETE‘ operations on a table.
Types of Data Manipulation Triggers: BEFORE triggers (fire before the triggering
statement), AFTER triggers (fire after it), and INSTEAD OF triggers (replace the
triggering statement, typically on views).
DDL Triggers: These triggers respond to Data Definition Language events such as
‘CREATE‘, ‘ALTER‘, and ‘DROP‘.
Syntax for DDL Trigger:
CREATE OR REPLACE TRIGGER ddl_trigger
AFTER DDL ON DATABASE
BEGIN
-- SQL statements
END;
Logon Trigger: These triggers execute when a user connects to the database.
Syntax for Logon Trigger:
29
CREATE OR REPLACE TRIGGER logon_trigger
AFTER LOGON ON DATABASE
BEGIN
-- SQL statements
END;
CLR (Common Language Runtime) Trigger: Used for managing SQL Server
triggers written in .NET languages.
Procedures
Purpose: Procedures are reusable SQL code blocks that can be executed with a single
call. They encapsulate logic and can perform operations like data manipulation and
complex calculations.
Syntax:
CREATE PROCEDURE procedure_name
AS
BEGIN
-- SQL statements
END;
Example Procedure:
Embedded SQL
Purpose: Embedded SQL allows the integration of SQL statements within a
programming language. It provides a way to interact with a database from within a
host language.
Syntax:
EXEC SQL sql_statement;
Example in C:
EXEC SQL SELECT Name INTO :name FROM Employees WHERE EmployeeID =
:empID;
Dynamic SQL
Purpose: Dynamic SQL allows the execution of SQL statements that are constructed
at runtime. It is useful for scenarios where SQL statements are not known until
execution time.
Syntax:
30
EXECUTE IMMEDIATE ’SQL STATEMENT’;
Example:
EXECUTE IMMEDIATE ’UPDATE Employees SET Salary = Salary * 1.10’;
8 Procedures in PL/SQL
Definition: A procedure in PL/SQL is a named block of code that performs a specific
task and can be executed whenever needed. It is a reusable piece of logic that can be
invoked using a call statement. Unlike functions, procedures do not need to return a
value, although they can return data using output parameters.
Syntax:
CREATE OR REPLACE PROCEDURE procedure_name IS
BEGIN
-- SQL and PL/SQL statements
END;
Example:
CREATE OR REPLACE PROCEDURE IncreaseSalary (emp_id NUMBER, amount NUMBER) IS
BEGIN
UPDATE Employees SET Salary = Salary + amount WHERE EmployeeID = emp_id;
END;
Advantages of Using Procedures:
• Modularity: Procedures allow for code to be divided into smaller, manageable pieces,
promoting code reuse and better organization.
• Performance: Since procedures are stored in the database, they can be compiled once
and executed many times, resulting in faster execution.
• Security: Procedures can control user access by limiting the scope of database
operations they can perform.
Disadvantages of Using Procedures:
• Complex Debugging: Debugging procedures can be more complex than debugging
regular code, especially when procedures call other procedures or involve dynamic
SQL.
• Ensuring data consistency.
• Database normalization.
Constraints on Tuples in Functional Dependencies
Let R be a relation and consider two tuples t1 and t2 in R. If a functional dependency α → γ
holds, then:
• If t1[α] = t2[α], then it must be true that t1[γ] = t2[γ].
Problem: Given a relation R(A, B, C) with functional dependency set F = {A → B,
B → C, A → C}, we will compute the canonical cover step by step:
Step 1: Decompose Functional Dependencies
In this case, the dependencies A → B and B → C are already decomposed, so no further
decomposition is needed.
Step 2: Remove Redundant Dependencies
We now check for any redundant dependencies. Here, we can derive A → C using the
transitivity rule:
A → B and B → C =⇒ A → C
Thus, the canonical cover is:
Fc = {A → B, B → C}
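The redundancy check above rests on the attribute-closure algorithm, which can be sketched directly (the FD set is the one from the canonical-cover example; the function name is illustrative):

```python
# Attribute closure: repeatedly apply FDs whose left side is already covered.
def closure(attrs, fds):
    """Compute the closure of a set of attributes under functional dependencies.

    fds is a list of (lhs, rhs) pairs, each side an iterable of attribute letters.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

F = [("A", "B"), ("B", "C"), ("A", "C")]
# A -> C is redundant if C is already in the closure of A under F \ {A -> C}.
reduced = [fd for fd in F if fd != ("A", "C")]
print(closure("A", reduced))  # {'A', 'B', 'C'} -> A -> C is redundant
```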
Problem: Convert Functional Dependency Set into Minimal Cover
Given a relation R(A, B, C) and a functional dependency set F = {A → B, B → C,
AB → B, AB → C, AC → B}, convert this into a minimal cover step by step:
Step 1: Simplify Right-Hand Side
Every dependency in F already has a single attribute on the right-hand side, so no
change is needed.
Step 2: Remove Extraneous Left-Hand-Side Attributes
AB → B is trivial (B ⊆ AB) and is removed. In AB → C, B is extraneous because
A → C already follows from A → B and B → C. In AC → B, C is extraneous because
A → B holds on its own.
Step 3: Remove Redundant Dependencies
The derived A → C is redundant by transitivity from A → B and B → C.
Thus, the minimal cover is Fc = {A → B, B → C}.
First Normal Form (1NF)
A relation is in 1NF if every attribute holds only atomic (indivisible) values, one value
per row and column. A multi-valued phone attribute is made atomic by repeating the
row, one phone number per row:
EmpID EmpName PhoneNumber
1 John 1234567890
1 John 9876543210
Second Normal Form (2NF)
A relation is in 2NF if it is in 1NF and every non-prime attribute is fully functionally dependent
on the primary key.
Example:
Consider a relation with attributes (EmpID, ProjectID, HoursWorked). If (EmpID,
ProjectID) is the primary key and HoursWorked depends on the whole key, the relation
is in 2NF; a non-prime attribute that depended on EmpID alone would violate it.
Boyce-Codd Normal Form (BCNF)
BCNF removes the remaining redundancy
possibility by ensuring that all functional dependencies have a superkey on the left-hand side.
Example:
Consider a relation with the following functional dependencies:
FD1: StudentID → Course, Instructor
FD2: Course → Instructor
Here, StudentID is a superkey, but Course is not. Since Course is determining Instructor,
this violates BCNF because Course is not a superkey. To achieve BCNF, we split the relation into
two tables:
Table 1: (StudentID, Course)
Table 2: (Course, Instructor)
B → F =⇒ AB+ = {A, B, C, D, E, F }
F → GH =⇒ AB+ = {A, B, C, D, E, F, G, H}
D → IJ =⇒ AB+ = {A, B, C, D, E, F, G, H, I, J}
Thus, AB+ = {A, B, C, D, E, F, G, H, I, J} = R, so AB is a candidate key.
Closure of A:
A+ = {A}
A → DE =⇒ A+ = {A, D, E}
D → IJ =⇒ A+ = {A, D, E, I, J}
S.No. 3NF (Third Normal Form) BCNF (Boyce-Codd Normal Form)
1. A relation is in 3NF if it is in 2NF and no non-prime attribute is transitively
dependent on the primary key. A relation is in BCNF if it is in 3NF and, for every
functional dependency X → Y, X is a superkey.
• A relation is in Boyce-Codd Normal Form (BCNF) if, for every functional
dependency X → Y , X is a superkey.
Proof:
Let us consider the conditions required by 3NF and BCNF:
• In 3NF, a functional dependency X → Y is allowed if Y is a prime attribute, even if X
is not a superkey.
• In BCNF, the only condition that is allowed for X → Y is that X must be a superkey.
Thus, BCNF is stricter because it allows a dependency X → Y only when X is a
superkey, with no exception for prime attributes.
Closure of X:
X+ = {X}
Using X → Z:
X+ = {X, Z}
Using Z → W :
X+ = {X, Z, W }
X+ does not cover all attributes of R, so X is not a superkey.
Closure of Y
Now, calculating the closure of Y :
Y + = {Y }
Using Y → Z:
Y + = {Y, Z}
R1 = {X, W }
R2 = {X, Y }
R3 = {Y, Q}
R4 = {Z, W, Q}
R5 = {X, Q}
Checking Pairs of Decomposed Relations
We check the pairwise intersection of the decomposed relations.
1. R1(X, W ) and R2(X, Y ):
R1 ∩ R2 = {X}
Multi-Valued Dependency (MVD)
A Multi-Valued Dependency occurs when one attribute determines a set of values for
another attribute, independently of other attributes. In formal terms, if a relation R
has a dependency X →→ Y, then Y is multi-valued with respect to X, meaning that
for every value of X there can be a set of associated Y values, independent of the
remaining attributes.
FACULTY COURSE
Prof. A DBMS
Prof. A DAA
Prof. B OS
Prof. B CN
FACULTY COMMITTEE
Prof. A Exam
Prof. A Sports
Prof. B Placement
BCS501: Database Management System
Serializability of Schedules
• Atomicity: Ensures that all operations within a transaction are completed
successfully, or none at all. If any part of the transaction fails, the entire transaction is
aborted, and the database is restored to its previous state.
• Consistency: Ensures that a transaction brings the database from one valid state to
another, maintaining defined rules, constraints, and integrity.
– Example: In a bank database, the rule is that the total balance across all
accounts must remain the same after a transaction. If $500 is transferred from
Account A to Account B, the total balance before and after the transaction
should remain the same. Consistency ensures that the sum of balances before
and after the transaction is maintained correctly.
• Isolation: Ensures that the execution of a transaction is isolated from other
transactions. This means that the intermediate states of a transaction are invisible to
other transactions until the transaction is completed.
• Durability: Guarantees that once a transaction is committed, its results are
permanent, even in the case of a system failure.
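Atomicity and consistency can be sketched with SQLite's transaction rollback, following the bank-transfer example (the account rows and the simulated failure are illustrative):

```python
import sqlite3

# Transfer 500 from A to B; a failure midway triggers ROLLBACK, which
# undoes the partial debit and preserves the total balance.
conn = sqlite3.connect(":memory:")
conn.isolation_level = None  # manage transactions explicitly
cur = conn.cursor()
cur.execute("CREATE TABLE Accounts (Name TEXT, Balance INTEGER)")
cur.executemany("INSERT INTO Accounts VALUES (?, ?)",
                [("A", 1000), ("B", 500)])
try:
    cur.execute("BEGIN")
    cur.execute("UPDATE Accounts SET Balance = Balance - 500 WHERE Name = 'A'")
    # The matching credit to B never runs: simulate a crash mid-transaction.
    raise RuntimeError("simulated failure before crediting B")
except RuntimeError:
    cur.execute("ROLLBACK")  # atomicity: the partial debit is undone
total = cur.execute("SELECT SUM(Balance) FROM Accounts").fetchone()[0]
print(total)  # 1500 -- the consistency rule (constant total) is preserved
```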
3. In the event of a failure, the recovery manager checks the log to determine the
status of the transaction. If the transaction was incomplete, it rolls back the changes
using the log entries, ensuring that partial changes do not remain.
4. The system can use the log to perform a redo or undo operation. Redo ensures
that committed changes are applied to the database, while undo ensures that changes
from uncommitted transactions are reversed.
This process ensures that the transaction is either completed successfully or entirely rolled
back, maintaining atomicity.
• Consistency ensures that all transactions maintain the integrity of the database.
• Isolation ensures that transactions do not interfere with each other, providing a
predictable environment for concurrent operations.
• Durability ensures that once a transaction is committed, it is not lost, even in the
event of system failure.
• Active: This is the initial state of a transaction. The transaction remains in this state
while its instructions are being executed. Any errors at this stage may cause the
transaction to fail.
• Partially Committed: Once a transaction has executed its final operation but before
it is committed, it enters the partially committed state. In this state, the changes
made by the transaction are not yet permanent but are ready to be made
permanent if no failure occurs.
• Failed: If a transaction encounters an error or failure during its execution, it moves
to the failed state. In this state, the transaction cannot proceed further.
• Aborted: After a transaction has failed, the system moves it to the aborted state.
In this state, the system must roll back the transaction, undoing any changes it made
to the database, and restore the database to its previous consistent state.
• Committed: If a transaction completes successfully and all its operations are
executed without error, it moves to the committed state. In this state, the changes
made by the transaction are permanently saved to the database and are visible to
other transactions.
• Conflict Serializability occurs when a schedule can be transformed into a serial schedule
by swapping non-conflicting operations. Two operations conflict if they access the
same data item, and at least one of them is a write operation.
• View Serializability ensures that two schedules are equivalent if they produce the
same result, even if their execution order is different. This form of serializability is
more relaxed than conflict serializability.
• Serializability is critical to prevent problems like lost updates (where one transaction
overwrites another’s changes), dirty reads (where a transaction reads uncommitted
data), and inconsistent data.
In this example, the operations r(A) in Ti and w(A) in Tj conflict because both access
the same data item A, and one is a write operation. Similarly, w(B) in Ti and r(B) in
Tj conflict for the same reason. These conflicts prevent the schedule from being conflict
serializable without further modifications.
• Initial Reads: If a transaction reads a data item in a serial schedule, the same
transaction must read the same data item in the non-serial schedule.
• Final Writes: If a transaction writes the final value of a data item in a serial schedule,
the same transaction must perform the final write in the non-serial schedule.
• Intermediate Writes: Any data item written by one transaction and later read
by another in a serial schedule must also have the same write-read relationship in
the non-serial schedule.
• Strict Schedule: A strict schedule ensures that a transaction can neither read
nor write a data item until the previous transaction that wrote the data item has
committed. This prevents dirty reads and ensures easier recovery.
In this example, Tj reads A only after Ti commits, making the schedule recoverable.
In this example, we would draw an edge from Ti to Tj for the conflicting operations
on A and B. If the resulting graph contains no cycles, the schedule is conflict serializable.
In this example, T2 reads data A written by T1. When T1 fails and rolls back, T2 must
also roll back to ensure consistency, causing a cascading rollback.
• It simplifies the recovery process by eliminating the need for cascading rollbacks.
• Ensure that conflicting operations (e.g., read/write on the same data item) are
properly ordered to maintain serializability.
• Use locks or other concurrency control mechanisms to prevent overlapping operations
that would violate serializability.
• Transactions should not read uncommitted data from other transactions to avoid
inconsistent states (ensure strict schedules).
• Commit a transaction only after ensuring that all preceding dependent transactions
have committed.
Why Prefer Serializable Schedules Over Serial Schedules? Serializable schedules
allow transactions to execute concurrently while maintaining the same outcome as
serial execution, leading to:
Example:
In this example, an edge is drawn from T1 to T2 for the conflict on data item A, and
another edge is drawn for the conflict on B. If the graph contains no cycles, the
schedule is conflict serializable.
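The precedence-graph test described above amounts to cycle detection over the conflict edges, which can be sketched directly (the edge sets and function name are illustrative):

```python
# Conflict serializability test: build a precedence graph with one edge per
# ordered conflict, then check for a cycle via depth-first search.
def has_cycle(edges, nodes):
    """Return True if the directed graph contains a cycle."""
    WHITE, GRAY, BLACK = 0, 1, 2  # unvisited / on current path / done
    color = {n: WHITE for n in nodes}

    def dfs(n):
        color[n] = GRAY
        for m in edges.get(n, []):
            if color[m] == GRAY or (color[m] == WHITE and dfs(m)):
                return True  # back edge found: cycle
        color[n] = BLACK
        return False

    return any(color[n] == WHITE and dfs(n) for n in nodes)

# T1 -> T2 on A and B only: acyclic, hence conflict serializable as (T1, T2).
print(has_cycle({"T1": ["T2"]}, ["T1", "T2"]))  # False
# A conflict ordering T2 before T1 as well creates a cycle: not serializable.
print(has_cycle({"T1": ["T2"], "T2": ["T1"]}, ["T1", "T2"]))  # True
```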
– <Ti start>
– <Ti, Xj, V1, V2> (indicating that transaction Ti updated data item Xj from value
V1 to V2)
– <Ti commit>
– <Ti abort>
– Changes made by a transaction are not written to the database until the
transaction is committed.
– This ensures that if a transaction fails, no changes are made to the database.
Features of Shadow Paging:
– Changes are made in a new set of pages rather than overwriting the original
pages.
– If a transaction fails, the system can revert to the shadow page table.
– Only committed transactions update the current page table.
– This technique is efficient for read operations.
– Reduces the need for log maintenance, as changes are done in place.
– Shadow paging does not require a log if the system uses only the shadow page
table for recovery.
1.20 Checkpointing
Checkpointing is the process of saving the state of a database at a specific point
in time to facilitate recovery in case of failure.
– Consistent Checkpointing:
∗ Involves creating a checkpoint when the database is in a consistent state.
∗ All transactions that were active at the time of the checkpoint are either
committed or aborted to ensure consistency.
∗ Reduces the recovery time by limiting the number of transactions that
need to be rolled back.
∗ Requires synchronization among transactions to maintain consistency.
– Fuzzy Checkpointing:
∗ Allows for checkpoints to be created during transaction execution, not
requiring a consistent state.
∗ Provides flexibility in managing large transactions.
∗ Easier to implement as it doesn’t require strict synchronization.
∗ May lead to a longer recovery time since some transactions might be
partially completed.
– Log-Based Recovery:
∗ Advantages: Simple to implement, allows for quick recovery.
∗ Disadvantages: Log management can become complex, leading to
performance overhead.
– Shadow Paging:
∗ Advantages: Efficient recovery without extensive logging, quick access to
uncommitted data.
∗ Disadvantages: More memory usage due to duplicate pages, can be
inefficient for large transactions.
– Checkpoints:
∗ Advantages: Reduces recovery time, simplifies log management.
∗ Disadvantages: May impact performance during checkpoint creation,
requires careful management of transaction states.
2 Deadlock
A deadlock occurs in a database or concurrent system when two or more
transactions are unable to proceed because each is waiting for the other to release a
resource. This situation results in a standstill, where none of the transactions can
continue executing.
Deadlock prevention involves ensuring that the system will never enter a deadlock state
by following certain protocols:
– Pre-declaration Method: Transactions must declare all the resources they will
need before execution. If the resources are not available, the transaction must
wait until all required resources are free.
– Partial Ordering Method: Establishes a global order of resource acquisition.
Transactions must acquire resources in a predefined order, preventing circular
wait conditions.
– Timestamp Ordering: Each transaction is given a unique timestamp.
Resources can only be allocated if the requesting transaction’s timestamp is
greater than that of the holding transaction.
In deadlock detection, the system allows deadlocks to occur and then detects them
using various algorithms. The steps typically involve:
Once a deadlock is detected, the system must recover from it. Methods include:
Deadlock avoidance ensures that the system never enters a deadlock state by
managing how resources are allocated:
3 Distributed Database
– Phase 1 - Prepare Phase: The coordinator sends a request to all participants
to prepare the transaction. Each participant responds with a ’ready’ or ’abort’
message.
– Phase 2 - Commit Phase: If all participants are ’ready’, the coordinator sends
a commit request. If any participant sends an ’abort’ message, the coordinator
sends an abort message to all participants, and the transaction is rolled back.
– The directory contains metadata about the distributed data, such as data
location, partitioning, replication details, and access paths.
– It provides transparency to users, hiding the complexities of data distribution
and allowing users to access data as if it were stored in a single location.
– The directory manages mapping between logical data and physical storage,
making data retrieval efficient.
– It plays a crucial role in query optimization by guiding the system to the
appropriate location of data fragments or replicas.
– The directory system helps in load balancing by distributing query loads across
different sites.
– It supports fault tolerance by allowing the system to locate alternative replicas
if the primary data source fails.
– The directory can either be centralized (with one global directory) or
distributed, with each site maintaining its own directory but interconnected.
Database Management System (BCS501)
Syllabus:
• Concurrency Control
• Multiple Granularity
• Multi-Version Schemes
1 Concurrency Control
Concurrency control in DBMS refers to the techniques used to manage simultaneous
operations on the database, ensuring that the database remains consistent even when
many transactions execute at once.
Why Concurrency Control is Needed
Concurrency control is necessary in DBMS for the following reasons:
• To ensure consistency in the database during simultaneous transactions.
1. Lost Update:
– Occurs when two or more transactions read the same data and update it
simultaneously.
– The final value depends on the last update, leading to a lost update by
the earlier transaction.
2. Dirty Read:
– Occurs when a transaction reads data that has been updated by another
uncommitted transaction.
– If the second transaction is rolled back, the first transaction would have
used invalid or dirty data.
Example: Consider two transactions, T 1 and T 2, both trying to update the same
record in a database. Without proper concurrency control, T 1 could overwrite changes
made by T 2, leading to inconsistent data.
• Shared Lock: Allows multiple transactions to read the data item but prevents any
updates until all shared locks are released.
• Exclusive Lock: Prevents any other transaction from accessing or modifying the
data item.
If the simple binary locking scheme is used, every transaction must obey the following
rules:
1. A transaction must issue a lock operation on a data item before any read or write
operation on it.
2. A transaction cannot unlock a data item until it has completed its operations.
3. No transaction can lock a data item that is currently locked by another transaction.
• Shared Lock: Allows concurrent transactions to read a data item but not modify it.
• Exclusive Lock: Only one transaction can hold the exclusive lock, preventing
others from reading or writing.
If the shared/exclusive locking scheme is used, every transaction must obey the
following rules:
1. A transaction must hold a shared or exclusive lock on a data item before reading it.
2. A transaction must hold an exclusive lock on a data item before writing it.
3. Once an exclusive lock is acquired, no other transaction can hold any lock on that
data item until the exclusive lock is released.
4. A shared lock may be upgraded to an exclusive lock only when no other transaction
holds a lock on the data item.
5. Locks must be released only after the transaction completes to ensure data integrity.
Lock Compatibility
Lock compatibility refers to the conditions under which multiple transactions can acquire
locks on the same data item without causing conflicts. The compatibility of shared and
exclusive locks can be illustrated as follows:
Lock Mode Shared Lock (S-lock) Exclusive Lock (X-lock)
Shared Lock (S-lock) Yes No
Exclusive Lock (X-lock) No No
Table 1: Lock Compatibility Table
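The compatibility table can be expressed as a small lookup used by a lock-manager sketch (the function and table names here are illustrative):

```python
# Lock compatibility matrix: (held mode, requested mode) -> compatible?
# Two shared locks coexist; an exclusive lock is compatible with nothing.
COMPATIBLE = {
    ("S", "S"): True,
    ("S", "X"): False,
    ("X", "S"): False,
    ("X", "X"): False,
}

def can_grant(requested, held_modes):
    """Grant the requested lock only if it is compatible with every held lock."""
    return all(COMPATIBLE[(held, requested)] for held in held_modes)

print(can_grant("S", ["S", "S"]))  # True: multiple readers coexist
print(can_grant("X", ["S"]))       # False: a writer must wait for readers
```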
1. Lock Request: A transaction requests a lock on a data item from the lock manager
before accessing it.
2. Lock Granting:
• If the lock is compatible, the lock manager grants the lock to the transaction.
• If not compatible, the transaction may be put into a wait state until the lock
becomes available.
3. Lock Types: The lock manager supports different lock modes, typically shared
locks for reading and exclusive locks for writing.
4. Locking Protocols: Transactions follow a locking protocol, such as two-phase
locking, that governs when locks may be acquired and released.
5. Unlock Request:
• After completing its operations, the transaction releases the lock on the data
item.
• This is also communicated to the lock manager, which updates the lock status.
6. Lock Timeout: If a transaction waits too long for a lock, it may time out and be
rolled back; timeouts also provide a simple way to break potential deadlocks.
7. Deadlock Detection:
• The system periodically checks for deadlocks, where two or more transactions
are waiting indefinitely for locks held by each other.
• If detected, the system resolves the deadlock by aborting one of the transac-
tions.
8. Lock Upgrade/Downgrade:
• A transaction holding a shared lock may request to upgrade it to an exclusive
lock when it needs to write to the data item.
• Similarly, a transaction may release an exclusive lock and downgrade it to a
shared lock if it only needs to read.
Convoy
A convoy forms when one transaction holds a heavily used lock for an extended period
and other transactions queue up behind it. Even after the lock is released, the queue
drains slowly, reducing overall system throughput.
Short Notes on Lock-Based Protocol
Lock-based protocols manage concurrent access to database resources using locks. These
protocols ensure that transactions follow strict rules for acquiring and releasing locks,
preventing conflicts and ensuring consistency.
• Growing Phase:
– In this phase, a transaction can acquire locks on data items but is not allowed
to release any locks.
– The transaction continues to obtain the necessary locks to ensure that it can
read or write the data it requires for its operations.
– This phase continues until the transaction has acquired all the locks it needs
or until it has been forced to stop due to waiting on another transaction.
– Example: If Transaction T1 needs to read data items A, B, and C, it will
request and acquire locks on these items during the Growing Phase.
• Shrinking Phase:
– In this phase, the transaction can release the locks it has acquired but cannot
acquire any new locks.
– This phase allows for the release of locks, thereby making data items available
for other transactions to access.
– Once a transaction releases a lock, it cannot obtain any additional locks, which
helps prevent deadlocks.
– Example: After completing its operations, Transaction T1 releases the locks
on data items A, B, and C, allowing other transactions to access these items.
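The two phases can be enforced mechanically: once a transaction releases its first lock, any further lock request is rejected. A minimal sketch of this discipline (class and method names are hypothetical):

```python
# Sketch of the two-phase locking discipline for a single transaction:
# all lock() calls must precede the first unlock().
class TwoPhaseTransaction:
    def __init__(self):
        self.held = set()
        self.shrinking = False  # flips once the first lock is released

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError("2PL violation: cannot lock in shrinking phase")
        self.held.add(item)

    def unlock(self, item):
        self.shrinking = True   # the growing phase is over for good
        self.held.discard(item)

t1 = TwoPhaseTransaction()
t1.lock("A"); t1.lock("B"); t1.lock("C")   # growing phase
t1.unlock("A")                             # shrinking phase begins
try:
    t1.lock("D")                           # illegal under 2PL
except RuntimeError:
    pass                                   # the protocol rejects the request
```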
Advantages of 2PL:
• Simple to implement and understand.
• Effective in preventing lost updates, dirty reads, and other concurrency-related
issues.
Disadvantages of 2PL:
• Potential for deadlocks if transactions are not managed properly.
• May lead to reduced concurrency since transactions may need to wait for locks.
Example:
• Transaction T1:
– Acquire a read (shared) lock on item Y.
– Read item Y.
Salient Features of Graph-Based Locking Protocol
Graph-based techniques represent transactions and their lock waits as a directed
wait-for graph: each node represents a transaction, and an edge from T1 to T2 means
that T1 is waiting for a lock currently held by T2. If a cycle is detected in this graph,
a deadlock exists, and the system resolves it by aborting one of the transactions in
the cycle.
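Cycle detection on the wait-for graph is a standard depth-first search. A minimal sketch (the graph representation is hypothetical: a dict mapping each transaction to the transactions it waits for):

```python
# Deadlock detection sketch: an edge T -> U in the wait-for graph means
# "T waits for a lock held by U". A cycle means deadlock.
def has_cycle(graph):
    """Detect a cycle in a directed graph given as {node: [successors]}."""
    WHITE, GRAY, BLACK = 0, 1, 2       # unvisited / on stack / finished
    color = {n: WHITE for n in graph}

    def visit(n):
        color[n] = GRAY
        for m in graph.get(n, []):
            if color.get(m, WHITE) == GRAY:
                return True            # back edge: cycle found
            if color.get(m, WHITE) == WHITE and visit(m):
                return True
        color[n] = BLACK
        return False

    return any(color[n] == WHITE and visit(n) for n in graph)

# T1 waits for T2 and T2 waits for T1: deadlock.
assert has_cycle({"T1": ["T2"], "T2": ["T1"]})
# T1 waits for T2 only: no deadlock.
assert not has_cycle({"T1": ["T2"], "T2": []})
```

When a cycle is found, the system typically aborts the "cheapest" transaction in it (a victim-selection policy) to break the deadlock.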
1. The Lost Update Problem: One transaction's update to a data item is overwritten
by another concurrent transaction that read the same old value.
3. The Incorrect Summary Problem: A transaction computing an aggregate (summary)
reads some data items before and some after they are updated by a concurrent
transaction, producing an inconsistent result.
Role of Locks
Locks play a crucial role in managing concurrent access to data in a database. They ensure
that only one transaction can modify a data item at a time, thereby preventing conflicts
and maintaining data integrity during concurrent operations.
3.1 Example
Consider two transactions, T1 and T2, arriving at times t1 and t2, respectively. If t1 < t2, T1
will be executed first, followed by T2. This establishes a clear order of execution based on
the timestamps.
3.2 Timestamp-based Protocols
3.2.1 Timestamps
• W-timestamp(Q): The latest write timestamp for data item Q, i.e., the timestamp
of the youngest transaction that successfully wrote Q.
• R-timestamp(Q): The latest read timestamp for data item Q, i.e., the timestamp
of the youngest transaction that successfully read Q.
• Disadvantages: Long transactions may suffer starvation through repeated restarts,
and cascading rollbacks are possible because schedules are not guaranteed to be
recoverable.
4 Short Notes on
4.1 Thomas’s Write Rule
Thomas’s Write Rule is a refinement of the basic timestamp ordering protocol. It allows
a transaction to write an item even if its timestamp is older than the current W-
timestamp of that item, provided that:
• The read timestamp (R-timestamp) is older than the writing transaction’s times-
tamp, meaning it has been read by a transaction that has already committed.
This rule helps reduce unnecessary aborts and improves concurrency by allowing more writes.
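The write check with Thomas's rule can be sketched as a small function. The item record below (a dict with `r_ts`/`w_ts` fields) is a hypothetical structure for illustration:

```python
# Timestamp-ordering write check with Thomas's write rule.
# item: {"value": ..., "r_ts": ..., "w_ts": ...}; txn_ts is TS(T).
def write_item(item, txn_ts, value):
    """Return 'abort', 'ignore', or 'write' per Thomas's write rule."""
    if txn_ts < item["r_ts"]:
        return "abort"    # a younger transaction already read the old value
    if txn_ts < item["w_ts"]:
        return "ignore"   # obsolete write: a newer value already exists
    item["w_ts"] = txn_ts
    item["value"] = value
    return "write"

q = {"value": 0, "r_ts": 5, "w_ts": 10}
assert write_item(q, 3, 99) == "abort"    # TS(T)=3 < R-timestamp=5
assert write_item(q, 7, 99) == "ignore"   # 5 <= 7 < W-timestamp=10: skip, no abort
assert write_item(q, 12, 99) == "write"   # the newest write proceeds
```

The middle case is exactly where Thomas's rule improves on basic timestamp ordering: the basic protocol would abort the transaction, while the rule silently drops the stale write.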
4.2 Strict Timestamp Ordering Protocol
The strict timestamp ordering protocol is a variation of basic timestamp ordering that
requires:
• A transaction that issues a read or write on a data item Q must be delayed until
the transaction that wrote the current value of Q has committed or aborted.
Because transactions only ever read committed data, the resulting schedules are strict
(recoverable and free of cascading rollbacks) as well as conflict serializable, at the
cost of increased transaction wait times and reduced concurrency.
6 Phantom Phenomenon
The phantom phenomenon occurs when a transaction reads a set of rows that match a
certain condition, and another concurrent transaction inserts or deletes rows that affect the
result set of the first transaction, leading to inconsistent results.
6.1 Timestamp-Based Protocol to Avoid Phantom Phenomenon
To devise a timestamp-based protocol that avoids the phantom phenomenon, we can
implement the following steps:
1. Each transaction will acquire locks on the range of data items it intends to read or
write.
2. Before committing, a transaction will check whether any new transactions have
modified the range it accessed since it started.
3. If modifications are detected, the transaction will be aborted and must restart.
This approach ensures that transactions maintain consistency and avoid reading in-
consistent or invalid data.
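The commit-time check in step 2 can be sketched as a range-validation function. The data structures below (a key range and a list of timestamped modifications) are hypothetical, chosen only to illustrate the idea:

```python
# Sketch of range validation against phantoms: a transaction records the
# key range it read; at commit time it checks whether any later insert or
# delete fell inside that range.
def validate(read_range, modifications, start_ts):
    """Abort if any modification after start_ts falls inside read_range."""
    lo, hi = read_range
    for mod_ts, key in modifications:
        if mod_ts > start_ts and lo <= key <= hi:
            return False  # phantom detected: the transaction must restart
    return True

mods = [(5, 42), (9, 77)]           # (timestamp, inserted/deleted key)
assert validate((40, 50), mods, start_ts=7)      # key 42 changed before we started: ok
assert not validate((70, 80), mods, start_ts=7)  # key 77 inserted at ts 9 > 7: abort
```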
Example: During the validation phase, if two transactions T1 and T2 conflict, one
is rolled back.
8 Multiple Granularity
Multiple granularity allows locking at different levels of granularity, such as tuples, pages,
or entire tables. It provides more flexibility by allowing transactions to lock only the
required data granularity, reducing the possibility of conflicts.
Example: If a transaction only needs to update a single record, it can lock that record
without locking the entire table.
8.2 How It Is Implemented in Transaction System
The implementation of multiple granularity in transaction systems is typically organized
as a tree (hierarchy) of lockable units, where:
• The root node represents the entire database.
• Lower levels represent progressively finer units, such as files, pages, and individual
records.
• Locking a node implicitly locks all of its descendants, while intention locks (IS, IX)
on ancestor nodes signal that finer-grained locks are held below.
This tree structure allows transactions to lock only the required nodes instead of the
entire hierarchy, improving concurrency and reducing conflicts.
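The ancestor rule can be sketched concretely: to take an exclusive lock on a record, a transaction first takes intention-exclusive (IX) locks top-down on every ancestor. The hierarchy names below (`record7`, `page2`, `file1`) are hypothetical:

```python
# Multiple-granularity sketch: child -> parent links for a tiny hierarchy.
hierarchy = {"record7": "page2", "page2": "file1", "file1": "database"}

def locks_needed(item, mode):
    """Return the (node, mode) pairs acquired top-down for one item:
    intention locks ("I" + mode) on ancestors, the real lock on the item."""
    path = [item]
    while path[-1] in hierarchy:
        path.append(hierarchy[path[-1]])
    path.reverse()  # root first
    return [(n, "I" + mode) for n in path[:-1]] + [(item, mode)]

assert locks_needed("record7", "X") == [
    ("database", "IX"), ("file1", "IX"), ("page2", "IX"), ("record7", "X"),
]
```

The intention locks let another transaction that wants to lock, say, the whole file detect immediately that finer-grained locks are held somewhere beneath it.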
8.4.1 How Does the Granularity of Data Items Affect the Performance of Concurrency
Control?
The granularity of data items significantly affects concurrency control performance:
• Fine Granularity (e.g., locking individual records): Increases concurrency but may result
in higher overhead due to frequent locking and unlocking.
• Coarse Granularity (e.g., locking entire tables): Reduces overhead but can lead to
conflicts and decreased concurrency, as multiple transactions cannot access different
records in the same table simultaneously.
• System Load: Higher system load may benefit from finer granularity to improve
concurrency.
• Data Access Patterns: The nature of data access (read-heavy vs. write-heavy) can
guide granularity selection.
In the multiversion timestamp ordering protocol:
• Each data item has multiple versions, each associated with a timestamp indicating
when it was created.
• When a transaction wants to read a data item, it retrieves the version whose times-
tamp is closest but less than or equal to its own timestamp.
• When a transaction writes to a data item, it creates a new version with the current
timestamp.
This protocol enhances concurrency by allowing transactions to read old versions while
other transactions update data.
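The read rule above (pick the version with the largest write timestamp not exceeding the reader's timestamp) can be sketched directly. The version list format is a hypothetical structure for illustration:

```python
# Multiversion read sketch: choose the version with the largest write
# timestamp that is <= the reading transaction's timestamp.
def read_version(versions, txn_ts):
    """versions: list of (w_ts, value) pairs; return the visible value."""
    visible = [(w_ts, v) for w_ts, v in versions if w_ts <= txn_ts]
    if not visible:
        return None   # no version of the item existed at this timestamp
    return max(visible)[1]   # the latest version not newer than the reader

versions = [(1, "v1"), (5, "v2"), (9, "v3")]
assert read_version(versions, 7) == "v2"   # sees the version written at ts 5
assert read_version(versions, 20) == "v3"  # sees the newest version
assert read_version(versions, 0) is None   # the item did not yet exist
```

Note that reads never block or abort in this scheme: an older reader simply sees an older version, which is the source of the concurrency gain.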
12.1 Recovery Mechanisms
12.1.1 Interaction with Concurrency Control
• Recovery mechanisms work in tandem with concurrency control to maintain database
consistency. They ensure that transactions are either fully completed or completely
rolled back, preventing partial updates that could lead to inconsistencies.
• Concurrency control ensures that concurrent transactions do not interfere with each
other, while recovery protocols guarantee that the effects of committed transactions
persist despite failures.
12.1.3 Checkpoints
• Checkpoints are predetermined points in time when the database system saves a
snapshot of its current state. This allows the recovery process to start from the last
checkpoint rather than from the very beginning, reducing recovery time.
• During recovery, the system can quickly determine which transactions were com-
mitted or rolled back since the last checkpoint, minimizing the amount of work needed
to restore the database to a consistent state.
• The process typically includes applying the effects of committed transactions and
undoing the effects of transactions that were not completed at the time of the crash,
ensuring that the database is restored to a consistent state.
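The redo/undo step can be sketched with a toy log. The log format below (tuples of transaction, operation, item, value) is a hypothetical simplification of a real write-ahead log:

```python
# Recovery sketch from a log after a crash: redo the writes of committed
# transactions; writes of uncommitted transactions are discarded (undone).
log = [
    ("T1", "write", "A", 10),
    ("T2", "write", "B", 20),
    ("T1", "commit", None, None),
    # crash occurs here: T2 never committed
]

committed = {t for t, op, _, _ in log if op == "commit"}
db = {"A": 0, "B": 0}   # state on disk before recovery

for t, op, item, value in log:
    if op == "write" and t in committed:
        db[item] = value   # redo committed work
    # writes by uncommitted transactions are simply not applied

assert db == {"A": 10, "B": 0}   # T1's write survives; T2's is undone
```

With checkpoints, the scan would start from the last checkpoint record instead of the beginning of the log, shortening recovery in exactly the way described above.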
13.1 Data Storage in Oracle RDBMS
Oracle RDBMS stores data in a structured format, ensuring efficient data retrieval and
management.
1. Data Files: These are the physical files on the disk that store the actual data. Each
tablespace in Oracle is associated with one or more data files.
2. Control Files: These files contain metadata about the database, such as the database
name, the names and locations of data files, and the current state of the database.
3. Redo Log Files: These files record all changes made to the database, ensuring that
data can be recovered in the event of a failure.
• Each data file belongs to exactly one tablespace and cannot be shared among
multiple tablespaces.
• Control files are required for the database to start up and to ensure data integrity.
3. Indexes: Data structures that improve the speed of data retrieval operations.
13.4 Definitions of Key Terms
13.4.1 Tablespace
1. A logical storage unit in Oracle that groups related data files.
2. It can contain multiple data files, allowing for flexible data management.
13.4.2 Package
1. A package is a collection of related PL/SQL types, variables, and subprograms.
2. Packages provide encapsulation and improved performance by reducing the need
for recompilation.
13.4.3 Schema
1. A schema is the structure that represents the logical view of the database.
2. In Oracle, each schema is owned by a database user and contains that user's
objects, such as tables, views, and indexes.
13.5.2 SQL*Net
1. SQL*Net is a networking protocol that allows communication between Oracle database
clients and servers.
13.5.3 SQL*Loader
1. SQL*Loader is a utility for loading data from external files into Oracle tables.
2. It can handle various file formats and allows for data transformation during loading.
• Binary Locking: A simple locking mechanism that allows only two states: locked
or unlocked. It can lead to deadlocks if not managed properly.
• Exclusive Locking: A lock that prevents other transactions from accessing the
locked resource. It is useful for write operations but can lead to contention.
• Shared Locking: Allows multiple transactions to read the same resource simultane-
ously, promoting concurrency for read operations.