DBMS - Unit3 - Notes
Unit – 3
For example:
Here, the Emp_Id attribute uniquely determines the Emp_Name attribute of the employee
table: if we know the Emp_Id, we can tell the employee name associated with it.
Emp_Id → Emp_Name
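A functional dependency can be tested against sample rows. The sketch below is illustrative only: the helper name `fd_holds` and the sample employees are assumptions, not part of the notes. Note that sample data can only disprove a dependency; whether it truly holds is a rule about the data's meaning.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if every value of `lhs` maps to exactly one value of `rhs`."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical sample data: two employees share a name.
employees = [
    {"Emp_Id": 1, "Emp_Name": "Asha"},
    {"Emp_Id": 2, "Emp_Name": "Ravi"},
    {"Emp_Id": 3, "Emp_Name": "Asha"},
]

print(fd_holds(employees, ["Emp_Id"], ["Emp_Name"]))  # Emp_Id → Emp_Name holds
print(fd_holds(employees, ["Emp_Name"], ["Emp_Id"]))  # the reverse does not
```

The second call fails because the name "Asha" is associated with two different Emp_Id values, so a name does not determine an ID.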
Example:
1. ID → Name
2. Name → DOB
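If the two dependencies above are read together, transitivity implies ID → DOB as well. A minimal sketch of the usual attribute-closure computation shows this; the function name `closure` and the representation of dependencies as pairs of sets are assumptions made for illustration.

```python
def closure(attrs, fds):
    """Return every attribute determined by `attrs` under the dependencies `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If we already know the left side, we also learn the right side.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


fds = [({"ID"}, {"Name"}), ({"Name"}, {"DOB"})]
print(sorted(closure({"ID"}, fds)))    # attributes reachable from ID
print(sorted(closure({"Name"}, fds)))  # attributes reachable from Name
```

Because DOB appears in the closure of ID, the dependency ID → DOB follows from the two listed dependencies.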
Normalization of DBMS
Normal forms help to reduce data redundancy and increase data consistency, and they
make updates cheaper and safer because each fact is stored in one place. However,
higher levels of normalization spread data across more tables, which leads to more
complex designs and to queries that need more joins. It is important to strike a
balance between normalization and practicality when designing a database.
Multivalued dependencies occur when a single value in one column is associated with
a set of values in another column, and that set is independent of the remaining
columns. For example, a table with the columns student ID, course, and hobby, in
which a student's courses and hobbies have nothing to do with each other, violates
4NF: every course of a student must be paired with every hobby of that student, so
the same facts are stored many times.
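A multivalued dependency X →→ Y can be checked on sample rows with the standard "swap" test: whenever two rows agree on X, the row that takes its Y values from the first and its remaining values from the second must also be present. The sketch below uses a hypothetical table with the columns Student, Course, and Hobby; the names and the helper `mvd_holds` are assumptions for illustration.

```python
from itertools import product


def mvd_holds(rows, x, y):
    """Check X ->> Y on a list of dict rows that all share the same keys."""
    z = [a for a in rows[0] if a not in x and a not in y]
    present = {frozenset(r.items()) for r in rows}
    for t1, t2 in product(rows, repeat=2):
        if all(t1[a] == t2[a] for a in x):
            # Take X and Y values from t1 and the remaining values from t2.
            mixed = {a: (t2[a] if a in z else t1[a]) for a in rows[0]}
            if frozenset(mixed.items()) not in present:
                return False
    return True


# One student with two courses and two hobbies: all four combinations stored.
full = [
    {"Student": "S1", "Course": c, "Hobby": h}
    for c, h in product(["DBMS", "OS"], ["Chess", "Music"])
]

print(mvd_holds(full, ["Student"], ["Course"]))       # every combination present
print(mvd_holds(full[:-1], ["Student"], ["Course"]))  # one combination missing
```

Storing every course and hobby combination is exactly the redundancy that a 4NF decomposition into (Student, Course) and (Student, Hobby) would remove.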
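Join dependencies occur when a table can be rebuilt, without losing or inventing
rows, by joining two or more of its projections. A table is in 5NF when every such
nontrivial join dependency is implied by its candidate keys. For example, a single
table that stores customer orders together with customer addresses can be rebuilt
by joining an orders table and an addresses table on customer ID. That join
dependency is not implied by the key (order ID), so the combined table violates 5NF
and stores each address redundantly.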
Example of 4NF
Consider a table that lists student grades with the following columns:
Student ID, Course ID, and Grade.
In this table, a (Student ID, Course ID) pair does not identify a single
grade: a student can have multiple grades for one course, one for each
section of the course they take. Grade is therefore multivalued for that
pair, and the table gives no way to tell which section a grade belongs to.
To resolve this and reach 4NF, we record the section explicitly and split
the data into two tables:
• Student Grades: This table would have the following columns:
o Student ID
o Course ID
o Section ID
o Grade
• Course Sections: This table would have the following columns:
o Course ID
o Section ID
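With Section ID recorded, each combination of Student ID, Course ID, and
Section ID has exactly one Grade. This eliminates the multivalued
dependency and ensures that the table is in 4NF.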
Example of 5NF
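Consider a single table that combines customer orders with customer
addresses, with the columns Order ID, Customer ID, Order Items, and Address.
This table violates 5NF because it can be rebuilt by joining two smaller
tables on the Customer ID column, and that join dependency does not follow
from the table's key (Order ID). As a result, the same address is repeated
on every order a customer places. (The same redundancy already breaks lower
normal forms, since Address depends on Customer ID alone.)
To normalize this table to 5NF, we would need to split it into two tables: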
• Customer Orders: This table would have the following columns:
o Order ID
o Customer ID
o Order Items
• Customer Addresses: This table would have the following columns:
o Customer ID
o Address
This would eliminate the join dependency and ensure that the table is in
5NF.
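The split described above can be checked on sample data: projecting a combined table onto the two column sets and joining the results on Customer ID should give back exactly the original rows. The sketch below assumes a small hypothetical data set; the helper names `project` and `natural_join` are illustrative.

```python
# Hypothetical combined table: the address repeats on every order of a customer.
combined = [
    {"Order ID": 1, "Customer ID": "C1", "Order Items": "Pen", "Address": "12 Hill Rd"},
    {"Order ID": 2, "Customer ID": "C1", "Order Items": "Ink", "Address": "12 Hill Rd"},
    {"Order ID": 3, "Customer ID": "C2", "Order Items": "Pad", "Address": "9 Lake St"},
]


def project(rows, cols):
    """Project rows onto `cols`, dropping duplicate rows."""
    return {tuple((c, r[c]) for c in cols) for r in rows}


def natural_join(left, right):
    """Join two projections on the columns they share."""
    joined = set()
    for left_row in left:
        for right_row in right:
            l, r = dict(left_row), dict(right_row)
            if all(l[c] == r[c] for c in l.keys() & r.keys()):
                joined.add(frozenset({**l, **r}.items()))
    return joined


customer_orders = project(combined, ["Order ID", "Customer ID", "Order Items"])
customer_addresses = project(combined, ["Customer ID", "Address"])

rejoined = natural_join(customer_orders, customer_addresses)
original = {frozenset(r.items()) for r in combined}

print(rejoined == original)     # the decomposition loses nothing
print(len(customer_addresses))  # each address is now stored once per customer
```

The join is lossless here because Customer ID determines Address, so each order row matches exactly one address row.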
Conclusion
4NF and 5NF are two higher levels of database normalization that can be
used to further reduce data redundancy and improve data integrity. They
are not always necessary, but they can be useful in cases where data has
multiple dimensions, attributes, or values, or when ensuring the highest
level of data quality and consistency is important.
ACID stands for atomicity, consistency, isolation, and durability. These are
four key principles that Database Management Systems (DBMS) use to
maintain data integrity and consistency. A database operation that has
these properties is called an ACID transaction.
• Atomicity
A transaction is treated as a single unit: either all of its operations take effect or none
of them do.
• Consistency
Only valid data is written to the database. Every transaction must take the database from
one valid state to another, so that all defined rules and constraints still hold afterwards.
• Isolation
Concurrent transactions run as if each were alone in the system. For example, changes made
by one transaction are not visible to other transactions until it commits.
• Durability
Once a transaction commits, its changes are permanent: the database must retain them even
if the system fails or restarts.
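• View serializability
A schedule is view serializable if it is view equivalent to some serial schedule: each
transaction reads the same values and the final write on each data item is the same, so the
schedule yields the same final result as that serial schedule.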
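Atomicity and consistency from the ACID list can be seen in a small experiment with Python's built-in `sqlite3` module. This is a sketch under stated assumptions: the `accounts` table, the `transfer` and `balances` helpers, and the amounts are hypothetical. The `CHECK` constraint plays the role of a consistency rule, and the `with conn:` block commits on success and rolls back on an exception.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one transaction; return True on success."""
    try:
        with conn:  # commit if the block succeeds, roll back if it raises
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False  # the CHECK constraint rejected a negative balance


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))


print(transfer(conn, "alice", "bob", 500), balances(conn))  # rejected, nothing changed
print(transfer(conn, "alice", "bob", 30), balances(conn))   # applied as a whole
```

In the rejected transfer the credit to bob is executed first, yet it does not survive: the rollback undoes it together with the failed debit.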
Both the Two-Phase Locking (2PL) protocol and the Timestamp protocol are
popular concurrency control mechanisms used in database systems, aiming to
ensure data consistency and prevent conflicts between transactions. Let's
compare and contrast them:
1. **Two-Phase Locking (2PL) Protocol**:
- **Operation**: In 2PL, transactions acquire locks in two phases: the growing phase,
during which locks can be acquired but not released, and the shrinking phase, during which
locks can be released but not acquired.
- **Advantages**:
- Simplicity: 2PL is conceptually straightforward, making it easier to implement and
understand.
- Serializability: Because no lock is acquired after the first release, every schedule
that 2PL allows is conflict serializable.
- **Limitations**:
- Strictness: 2PL can result in unnecessary blocking, where transactions wait for locks
even if they could proceed without causing conflicts.
- Deadlock Occurrence: 2PL does not prevent deadlocks. Two transactions in their growing
phase can each hold a lock the other needs, so the system must still detect deadlocks or
avoid them with a separate scheme such as lock timeouts or wait-die.
2. **Timestamp Protocol**:
- **Operation**: In the Timestamp protocol, each transaction is assigned a unique
timestamp representing its order of execution. Transactions are allowed to read or write
data only if their timestamp is compatible with the timestamps of other transactions
accessing the same data.
- **Advantages**:
- Optimistic: The Timestamp protocol is optimistic in nature, allowing transactions to
proceed without acquiring locks, assuming that conflicts will be rare. Because
transactions never wait for locks, deadlocks cannot occur.
- Reduced Blocking: Since transactions are not immediately blocked by locks, the
Timestamp protocol can lead to fewer instances of blocking and potentially better
throughput.
- **Limitations**:
- Increased Rollback: In case of conflicts, transactions may need to be rolled back and
re-executed with a new timestamp, which adds overhead and can starve a transaction that
is restarted repeatedly.
- Timestamp Assignments: Generating unique timestamps and managing them efficiently
can be challenging, especially in distributed systems or in scenarios with high transaction
rates.
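The timestamp compatibility check described above is usually stated as the basic timestamp-ordering rules: each data item remembers the largest timestamps that have read and written it, and an operation that arrives "too late" is rejected. The sketch below follows those textbook rules; the class and function names are assumptions, and a rejected operation simply returns `False` where a real system would roll the transaction back.

```python
class Item:
    """A data item with the largest read and write timestamps seen so far."""

    def __init__(self):
        self.read_ts = 0
        self.write_ts = 0


def read(item, ts):
    """Transaction with timestamp `ts` reads `item`; False means roll back."""
    if ts < item.write_ts:
        return False  # a younger transaction has already overwritten the value
    item.read_ts = max(item.read_ts, ts)
    return True


def write(item, ts):
    """Transaction with timestamp `ts` writes `item`; False means roll back."""
    if ts < item.read_ts or ts < item.write_ts:
        return False  # a younger transaction has already read or written it
    item.write_ts = ts
    return True


x = Item()
print(read(x, 2))   # T2 reads x first
print(write(x, 1))  # T1 (older) now tries to write x: rejected
print(write(x, 2))  # T2 writes x: allowed
print(read(x, 1))   # T1 tries to read x after T2's write: rejected
```

No operation ever waits in this scheme; a conflict is resolved by rejecting the older transaction's operation instead.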
2PL offers simplicity and guaranteed serializability at the cost of blocking and possible
deadlocks, while the Timestamp protocol avoids locks and deadlocks and potentially gives
better throughput, but it may roll back more transactions and requires mechanisms to manage
timestamps. The choice between them depends on factors such as the specific requirements of
the database system, the workload characteristics, and the trade-offs that the system
designers are willing to make.
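The two-phase rule of 2PL described earlier can be sketched in a few lines. This is a simplified model under stated assumptions: only exclusive locks exist, a refused lock returns `False` where a real system would make the transaction wait, and the names `Transaction` and `locks` are hypothetical.

```python
class Transaction:
    """Tracks the two-phase rule for one transaction (exclusive locks only)."""

    def __init__(self, name, lock_table):
        self.name = name
        self.lock_table = lock_table  # shared dict: item -> name of the holder
        self.shrinking = False

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError(f"{self.name}: cannot lock after releasing a lock")
        holder = self.lock_table.setdefault(item, self.name)
        return holder == self.name  # False: a real system would wait here

    def unlock(self, item):
        if self.lock_table.get(item) == self.name:
            del self.lock_table[item]
            self.shrinking = True  # the growing phase is over


locks = {}
t1, t2 = Transaction("T1", locks), Transaction("T2", locks)

print(t1.lock("A"), t2.lock("B"))  # both granted
print(t1.lock("B"), t2.lock("A"))  # both refused: each holds what the other wants

t2.unlock("B")                     # T2 enters its shrinking phase
print(t1.lock("B"))                # now granted to T1
try:
    t2.lock("C")                   # violates the two-phase rule
except RuntimeError as err:
    print(err)
```

The second pair of calls shows the classic deadlock pattern: both transactions obey the two-phase rule, yet each would wait for the other indefinitely if nothing intervened.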
• Performance optimization
Checkpoints can help optimize database performance by reducing the amount of data that
needs to be processed during recovery.
• ACID properties
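Checkpoints support the ACID properties of a database (Atomicity, Consistency, Isolation,
Durability), in particular atomicity and durability: after a failure, recovery only has to
process the work logged since the last checkpoint to restore a consistent state.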
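The way a checkpoint shortens recovery can be sketched with a toy log. The log format here is an assumption made for illustration: a record is either `("set", key, value)` or `("checkpoint", snapshot)`. The sketch only replays writes and ignores uncommitted transactions and undo processing, which real recovery must also handle.

```python
def recover(log):
    """Rebuild state from the most recent checkpoint plus the records after it."""
    start, state = 0, {}
    for i, record in enumerate(log):
        if record[0] == "checkpoint":
            # Restart from this snapshot; earlier records need not be replayed.
            start, state = i + 1, dict(record[1])
    replayed = 0
    for _kind, key, value in log[start:]:
        state[key] = value
        replayed += 1
    return state, replayed


writes = [("set", "a", 1), ("set", "b", 2), ("set", "a", 5)]
with_checkpoint = writes[:2] + [("checkpoint", {"a": 1, "b": 2})] + writes[2:]

print(recover(writes))           # no checkpoint: the whole log is replayed
print(recover(with_checkpoint))  # only the record after the checkpoint is replayed
```

Both calls reach the same final state, but the second replays a single record instead of three.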