DBMS - Unit3 - Notes

The document discusses key concepts in database management, including functional dependencies, normalization forms (1NF, 2NF, 3NF, BCNF, 4NF, 5NF), ACID properties, serializability, concurrency control protocols (2PL and Timestamp), log-based recovery, and checkpoints. It emphasizes the importance of functional dependencies in database design, the need for normalization to reduce redundancy, and the role of ACID properties in ensuring data integrity. Additionally, it highlights the significance of transaction logs and checkpoints in efficient database recovery management.

Uploaded by

Priyanshu Gupta
Copyright © All Rights Reserved

Subject: Database Management System (AIDS 204)

Unit – 3

1. Explain the concept of functional dependencies in a relational database. Provide an example to illustrate your explanation.
A functional dependency is a relationship that exists between two sets of attributes of a table, typically between the primary key and a non-key attribute. Functional dependencies are written with an arrow: the left side of the arrow is the determinant and the right side is the dependent. For example, if X is a relation with attributes P and Q, and P determines Q, the dependency is written P → Q.

For example:

Assume we have an employee table with attributes: Emp_Id, Emp_Name, Emp_Address.

Here the Emp_Id attribute can uniquely identify the Emp_Name attribute of the employee table, because if we know the Emp_Id we can tell the employee name associated with it.

Functional dependency can be written as:

Emp_Id → Emp_Name

We can say that Emp_Name is functionally dependent on Emp_Id.

Functional dependencies are an essential factor in designing a database schema and the functions that store and manage its data. They help to avoid data redundancy and identify bad designs.
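As a quick illustration, checking whether Emp_Id → Emp_Name holds in a set of rows can be sketched in Python (the table data below is invented for illustration):

```python
def fd_holds(rows, determinant, dependent):
    """Return True if every value of `determinant` maps to one `dependent` value."""
    seen = {}
    for row in rows:
        key = row[determinant]
        if key in seen and seen[key] != row[dependent]:
            return False  # same determinant value, different dependent: FD violated
        seen[key] = row[dependent]
    return True

employees = [
    {"Emp_Id": 1, "Emp_Name": "Asha", "Emp_Address": "Delhi"},
    {"Emp_Id": 2, "Emp_Name": "Ravi", "Emp_Address": "Pune"},
    {"Emp_Id": 1, "Emp_Name": "Asha", "Emp_Address": "Delhi"},
]

print(fd_holds(employees, "Emp_Id", "Emp_Name"))  # True: Emp_Id -> Emp_Name
```

Note that such a check only tests the data at hand; a functional dependency is a design rule about all possible data, not an accident of one sample.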

Types of Functional dependency


1. Trivial functional dependency

o A → B is a trivial functional dependency if B is a subset of A.

o Dependencies of the form A → A and B → B are also trivial.

Example:

1. Consider a table with two columns, Employee_Id and Employee_Name.
2. {Employee_Id, Employee_Name} → Employee_Id is a trivial functional dependency, as Employee_Id is a subset of {Employee_Id, Employee_Name}.
3. Also, Employee_Id → Employee_Id and Employee_Name → Employee_Name are trivial dependencies.

2. Non-trivial functional dependency

o A → B is a non-trivial functional dependency if B is not a subset of A.

o When the intersection of A and B is empty, A → B is called a completely non-trivial dependency.

Example:

1. ID → Name,
2. Name → DOB
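The subset and intersection rules above can be sketched as a small Python helper (an illustrative sketch, not part of any DBMS API):

```python
def classify_fd(lhs, rhs):
    """Classify the dependency lhs -> rhs by the subset/intersection rules."""
    lhs, rhs = set(lhs), set(rhs)
    if rhs <= lhs:
        return "trivial"                     # B is a subset of A
    if lhs & rhs == set():
        return "completely non-trivial"      # A and B share no attributes
    return "non-trivial"

print(classify_fd({"Employee_Id", "Employee_Name"}, {"Employee_Id"}))  # trivial
print(classify_fd({"ID"}, {"Name"}))                  # completely non-trivial
print(classify_fd({"ID", "Name"}, {"Name", "DOB"}))   # non-trivial
```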

2. Define the following normal forms: a) First Normal Form (1NF) b) Second Normal Form (2NF) c) Third Normal Form (3NF) d) Boyce-Codd Normal Form (BCNF).

Normalization is the process of minimizing redundancy in a relation or set of relations. Redundancy in a relation may cause insertion, deletion, and update anomalies. Normal forms are the rules used to eliminate or reduce this redundancy in database tables.

Normalization of DBMS

In database management systems (DBMS), normal forms are a series of guidelines that help to ensure that the design of a database is efficient, organized, and free from data anomalies. There are several levels of normalization, each with its own set of guidelines, known as normal forms.
Important Points Regarding Normal Forms in DBMS

• First Normal Form (1NF): This is the most basic level of normalization. In 1NF, each table cell should contain only a single value, and each column should have a unique name. The first normal form helps to eliminate duplicate data and simplify queries.
• Second Normal Form (2NF): 2NF eliminates redundant data by requiring that each non-key attribute depend on the whole primary key, not on just part of it (no partial dependencies).
• Third Normal Form (3NF): 3NF builds on 2NF by requiring that non-key attributes do not depend on other non-key attributes (no transitive dependencies); each column should depend directly on the primary key.
• Boyce-Codd Normal Form (BCNF): BCNF is a stricter form of 3NF that requires every determinant in a table to be a candidate key.
• Fourth Normal Form (4NF): 4NF is a further refinement of BCNF that ensures that a table does not contain any multivalued dependencies.
• Fifth Normal Form (5NF): 5NF is the highest level of normalization and involves decomposing a table into smaller tables to remove join-dependency redundancy and improve data integrity.

Normal forms help to reduce data redundancy, increase data consistency, and
improve database performance. However, higher levels of normalization can lead
to more complex database designs and queries. It is important to strike a balance
between normalization and practicality when designing a database.
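As a minimal sketch of what decomposition buys, the following Python snippet splits a redundant table on the dependency Course → Instructor (table, column, and name values are invented for illustration):

```python
# Unnormalized rows repeat the instructor for every (student, course) pair.
enrolments = [
    {"Student": "S1", "Course": "DBMS", "Instructor": "Dr. Rao"},
    {"Student": "S2", "Course": "DBMS", "Instructor": "Dr. Rao"},
    {"Student": "S1", "Course": "OS",   "Instructor": "Dr. Iyer"},
]

# Project onto (Student, Course) and (Course, Instructor), dropping duplicates,
# so each instructor fact is stored exactly once.
takes = sorted({(r["Student"], r["Course"]) for r in enrolments})
teaches = sorted({(r["Course"], r["Instructor"]) for r in enrolments})

print(takes)    # [('S1', 'DBMS'), ('S1', 'OS'), ('S2', 'DBMS')]
print(teaches)  # [('DBMS', 'Dr. Rao'), ('OS', 'Dr. Iyer')]
```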

3. Discuss the need for higher normal forms such as Fourth Normal Form (4NF) and Fifth Normal Form (5NF). Provide examples to support your explanation.
Fourth Normal Form (4NF) and Fifth Normal Form (5NF) are two higher levels of
database normalization that are used to further reduce data redundancy and
improve data integrity. 4NF addresses multivalued dependencies, while 5NF
addresses join dependencies.

Multivalued dependencies occur when a single value in one column determines a set of values in another column, independently of the remaining columns. For example, a table keyed on order ID that also stores the customer's details and a list of order items violates 4NF if the items and the customer details vary independently of each other.

Join dependencies occur when a table can be reconstructed, without loss of information, by joining several of its smaller projections. For example, a wide table that mixes customer orders with customer addresses repeats the address for every order; the same information can be recovered by joining smaller tables on the customer ID column, so keeping the wide table stores redundant data and violates 5NF.

Example of 4NF
Consider a table that lists student grades with the following columns:
Student ID, Course ID, and Grade.
This table violates 4NF when a student can have multiple grades for a single course, one for each section of the course they are taking: Grade then depends on the section as well as on the student and course.
To normalize this table to 4NF, we would need to split it into two tables:
• Student Grades: This table would have the following columns:
o Student ID
o Course ID
o Section ID
o Grade
• Course Sections: This table would have the following columns:
o Course ID
o Section ID

This would eliminate the multivalued dependency and ensure that the table
is in 4NF.

Example of 5NF
Consider a single wide table that stores each customer's orders together with the customer's address. The address is repeated for every order, and the table can be reconstructed without loss by joining smaller projections on the Customer ID column, which is a join dependency.
To normalize this table to 5NF, we would need to split it into two tables:
• Customer Orders: This table would have the following columns:
o Order ID
o Customer ID
o Order Items
• Customer Addresses: This table would have the following columns:
o Customer ID
o Address

This would eliminate the join-dependency redundancy: joining the two tables on Customer ID reconstructs the original information without loss, leaving each table in 5NF.
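A decomposition is only acceptable if it is lossless: joining the pieces must reproduce exactly the original information. A minimal Python sketch of that check (data invented for illustration):

```python
def natural_join(left, right, on):
    """Join two lists of row dicts on a shared column."""
    return [{**l, **r} for l in left for r in right if l[on] == r[on]]

orders = [
    {"Customer_ID": "C1", "Order_ID": "O1"},
    {"Customer_ID": "C1", "Order_ID": "O2"},
    {"Customer_ID": "C2", "Order_ID": "O3"},
]
addresses = [
    {"Customer_ID": "C1", "Address": "Delhi"},
    {"Customer_ID": "C2", "Address": "Pune"},
]

joined = natural_join(orders, addresses, on="Customer_ID")
print(len(joined))  # 3 rows: each order paired with its customer's address
```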

Conclusion
4NF and 5NF are two higher levels of database normalization that can be
used to further reduce data redundancy and improve data integrity. They
are not always necessary, but they can be useful in cases where data has
multiple dimensions, attributes, or values, or when ensuring the highest
level of data quality and consistency is important.

4. Describe the ACID properties in transaction management. Explain each property with an example.

ACID stands for atomicity, consistency, isolation, and durability. These are
four key principles that Database Management Systems (DBMS) use to
maintain data integrity and consistency. A database operation that has
these properties is called an ACID transaction.

Here are some examples of each ACID property:


• Atomicity
A transaction must be all or nothing. For example, if a user tries to book a ticket, selects a
seat, and goes to the payment gateway but fails, the seat will not be reserved.

• Consistency
A transaction must take the database from one valid state to another, preserving all integrity constraints. For example, a transfer between two accounts must leave the total of the two balances unchanged.

• Isolation
Running transactions independently is the core of isolation. For example, changes in one
transaction will not affect others until committed.

• Durability
Once a transaction commits, its updates must be retained even if the system fails or restarts.
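The all-or-nothing behaviour of atomicity can be sketched with a toy in-memory transfer (a simplified model; this is not how a real DBMS implements rollback):

```python
def transfer(accounts, src, dst, amount):
    """Move `amount` from src to dst; commit both updates or neither."""
    snapshot = dict(accounts)          # state before the transaction
    try:
        accounts[src] -= amount
        if accounts[src] < 0:
            raise ValueError("insufficient funds")
        accounts[dst] += amount        # both updates applied: commit
    except ValueError:
        accounts.clear()
        accounts.update(snapshot)      # roll back: restore the old state
        return False
    return True

accounts = {"A": 100, "B": 50}
print(transfer(accounts, "A", "B", 30), accounts)   # True {'A': 70, 'B': 80}
print(transfer(accounts, "A", "B", 500), accounts)  # False {'A': 70, 'B': 80}
```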

5. Discuss the concept of serializability in transaction management. How does it ensure consistency in a database system?

Serializability in a Database Management System (DBMS) ensures that concurrent transactions are correct and that the database stays consistent. It does this by making sure that the combined effect of multiple transactions on the database is the same as if they had run serially, one after the other.

Serializability prevents the data inconsistencies and anomalies that can happen when multiple transactions try to access and modify the same data at the same time. It is achieved through concurrency control mechanisms such as locking, timestamp ordering, and optimistic concurrency control. These mechanisms allow concurrent access to the database while still ensuring that transactions execute in a serializable order.

There are two types of serializability:
• Conflict serializability
A schedule is conflict serializable if it can be transformed into a serial schedule by swapping adjacent non-conflicting operations. Two operations conflict when they belong to different transactions, access the same data item, and at least one of them is a write; every conflicting pair must occur in an order consistent with some serial schedule.

• View serializability
A schedule is view serializable if its transactions read the same values and produce the same final writes as some serial schedule of the same transactions.
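The standard test for conflict serializability builds a precedence graph and checks it for cycles. A minimal Python sketch, where each operation is a (transaction, action, item) tuple:

```python
def conflict_serializable(schedule):
    """True iff the schedule's precedence graph is acyclic."""
    # Add an edge Ti -> Tj when an op of Ti conflicts with a later op of Tj.
    edges = set()
    for i, (t1, a1, x1) in enumerate(schedule):
        for t2, a2, x2 in schedule[i + 1:]:
            if t1 != t2 and x1 == x2 and "W" in (a1, a2):  # R/W, W/R, W/W
                edges.add((t1, t2))
    # Cycle check: repeatedly remove nodes with no incoming edges (Kahn's idea).
    nodes = {t for t, _, _ in schedule}
    while nodes:
        free = {n for n in nodes
                if not any(src in nodes and dst == n for src, dst in edges)}
        if not free:
            return False   # every remaining node has a predecessor: cycle
        nodes -= free
    return True

# R1(A) W2(A) W1(A): edges T1->T2 and T2->T1, a cycle, so not serializable.
print(conflict_serializable([("T1", "R", "A"), ("T2", "W", "A"), ("T1", "W", "A")]))  # False
print(conflict_serializable([("T1", "R", "A"), ("T1", "W", "A"), ("T2", "R", "A")]))  # True
```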

6. Compare and contrast the Two-Phase Locking (2PL) protocol and the Timestamp protocol for concurrency control in a database system. Highlight their advantages and limitations.

Both the Two-Phase Locking (2PL) protocol and the Timestamp protocol are
popular concurrency control mechanisms used in database systems, aiming to
ensure data consistency and prevent conflicts between transactions. Let's
compare and contrast them:
1. Two-Phase Locking (2PL) Protocol:
- Operation: In 2PL, transactions acquire locks in two phases: the growing phase, during which locks can be acquired but not released, and the shrinking phase, during which locks can be released but not acquired.
- Advantages:
- Simplicity: 2PL is conceptually straightforward, making it easier to implement and understand.
- Serializability guarantee: following the two-phase rule is sufficient to ensure conflict-serializable schedules.
- Limitations:
- Strictness: 2PL can result in unnecessary blocking, where transactions wait for locks even if they could proceed without causing conflicts.
- Deadlock occurrence: 2PL does not prevent deadlocks; two transactions can each hold a lock the other is waiting for, so a separate deadlock detection or prevention scheme is required.
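The growing/shrinking rule of 2PL can be sketched as a small Python class (a toy model of the rule itself, not a full lock manager):

```python
class TwoPhaseTxn:
    """Toy transaction that enforces the two-phase locking rule."""

    def __init__(self, name):
        self.name = name
        self.locks = set()
        self.shrinking = False   # becomes True at the first release

    def acquire(self, item):
        if self.shrinking:
            # Growing phase is over: acquiring now would break 2PL.
            raise RuntimeError(f"{self.name}: cannot lock {item!r} after first unlock")
        self.locks.add(item)

    def release(self, item):
        self.shrinking = True    # shrinking phase starts
        self.locks.discard(item)

t = TwoPhaseTxn("T1")
t.acquire("A")
t.acquire("B")       # growing phase: allowed
t.release("A")       # first release: shrinking phase begins
try:
    t.acquire("C")   # violates the 2PL rule
except RuntimeError as err:
    print(err)
```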

2. Timestamp Protocol:
- Operation: In the Timestamp protocol, each transaction is assigned a unique timestamp representing its order of execution. Transactions are allowed to read or write data only if their timestamp is compatible with the timestamps of other transactions accessing the same data.
- Advantages:
- Optimistic: the Timestamp protocol is optimistic in nature, allowing transactions to proceed without acquiring locks, assuming that conflicts will be rare.
- Reduced blocking: since transactions are not blocked waiting for locks, the Timestamp protocol can lead to fewer instances of blocking and potentially better throughput.
- Limitations:
- Increased rollbacks: in case of conflicts, transactions may need to be rolled back and re-executed, potentially leading to increased overhead.
- Timestamp assignment: generating unique timestamps and managing them efficiently can be challenging, especially in distributed systems or in scenarios with high transaction rates.
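The read/write compatibility checks of basic timestamp ordering can be sketched as follows (a simplified single-version model; a rejected operation would cause its transaction to roll back and restart with a new timestamp):

```python
read_ts, write_ts = {}, {}   # largest read/write timestamp seen per item

def read(item, ts):
    if ts < write_ts.get(item, 0):
        return False          # item was overwritten by a younger transaction
    read_ts[item] = max(read_ts.get(item, 0), ts)
    return True

def write(item, ts):
    if ts < read_ts.get(item, 0) or ts < write_ts.get(item, 0):
        return False          # a younger transaction already read or wrote it
    write_ts[item] = ts
    return True

print(read("A", ts=5))    # True
print(write("A", ts=3))   # False: T3 is older than the T5 that read A
print(write("A", ts=7))   # True
```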
2PL offers simplicity and a direct serializability guarantee at the cost of potential blocking and deadlocks, while the Timestamp protocol offers optimism and potentially better throughput but may require more sophisticated mechanisms to manage timestamps and handle rollbacks. The choice between them depends on factors such as the specific requirements of the database system, the workload characteristics, and the trade-offs that the system designers are willing to make.

7. Explain the process of database recovery management using the log-based recovery technique. Discuss the role of transaction logs in database recovery.
Log-based recovery is a technique used in database management systems (DBMS)
to recover a database to a consistent state after a crash or failure. It involves
recording each transaction in a log, which is stored in a stable storage device so that
it can be recovered if there is a failure.

The role of transaction logs:

Log-based recovery relies on transaction logs, which are records of all the update operations performed on the database. Here are the key aspects of log-based recovery:

1. Transaction Logs (Log Records):

- The log is a sequence of log records capturing all the update activities in the database; it is kept in stable storage so that it survives failures.
- Whenever an operation modifies the database, an update log record is created to reflect that modification.
- An update log record has the following fields:
- Transaction identifier (Ti): a unique identifier for the transaction that performed the write operation.
- Data item (Xj): a unique identifier for the data item being written.
- Old value (V1): the value of the data item before the write operation.
- New value (V2): the value of the data item after the write operation.
- Other types of log records include:
- <Ti start>: indicates that transaction Ti has started.
- <Ti commit>: indicates that transaction Ti has committed.
- <Ti abort>: indicates that transaction Ti has aborted.

2. Undo and Redo Operations:

- Since every database modification must be preceded by the creation of a log record, the system has access to both the old value (prior to modification) and the new value (to be written) for each data item.
- This allows the system to perform two operations:
- Undo: using a log record, set the data item specified in the record back to its old value.
- Redo: using a log record, set the data item specified in the record to its new value.

3. Deferred vs. Immediate Modification Technique:

- Deferred Modification Technique: transactions do not modify the database until they have partially committed; modifications are applied only after the transaction reaches that point.
- Immediate Modification Technique: database modifications occur while the transaction is still active.

4. Recovery Using Log Records:

- After a system crash, the system consults the log to determine which transactions need to be redone and which need to be undone.
- Transaction Ti needs to be:
- Undone if the log contains the record <Ti start> but contains neither <Ti commit> nor <Ti abort>.
- Redone if the log contains the record <Ti start> and either <Ti commit> or <Ti abort> (redoing an aborted transaction is safe because the log also contains the records written while it was rolled back).
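The undo/redo decision can be sketched over a toy log (a simplified in-memory list; a real log lives in stable storage):

```python
log = [
    ("start", "T1"), ("update", "T1", "A", 10, 20),
    ("start", "T2"), ("update", "T2", "B", 5, 7),
    ("commit", "T1"),
    # crash here: T2 has neither a commit nor an abort record
]

started = {rec[1] for rec in log if rec[0] == "start"}
finished = {rec[1] for rec in log if rec[0] in ("commit", "abort")}

redo = started & finished    # committed/aborted: replay their new values
undo = started - finished    # incomplete: restore their old values
print(sorted(redo), sorted(undo))   # ['T1'] ['T2']
```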

In summary, log-based recovery ensures that committed transactions' modifications are reflected in the database while preventing any persistent modifications made by aborted transactions. Transaction logs play a pivotal role in maintaining data integrity and consistency during recovery.

8. Define checkpoints in database recovery management. How are they helpful in improving the efficiency of recovery operations? Provide examples.
A checkpoint in a Database Management System (DBMS) is a point in time at which all modified buffers and log records up to that point are written to stable storage and a checkpoint record is added to the log. Checkpoints are important for maintaining a database's integrity and for efficient recovery after system failures. They can also be used for performance optimization and auditing.

Checkpoints are helpful in improving the efficiency of recovery operations in the following ways:
• Recovery point
Checkpoints provide a recovery point in the event of an unexpected shutdown, ensuring the
database can be restored to a consistent state.

• Performance optimization
Checkpoints can help optimize database performance by reducing the amount of data that
needs to be processed during recovery.

• ACID properties
Checkpoints are crucial for maintaining the ACID properties of a database (Atomicity,
Consistency, Isolation, Durability).

• Log file truncation

Checkpoints shorten the log by making the effects of old transactions permanent in stable storage, so their records are no longer needed during recovery.

The recovery process can be further optimized by configuring the checkpoint interval appropriately. For example, if the checkpoint interval is set to a shorter duration, the recovery process has to roll forward a shorter period of time, hence faster recovery.
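Why a recent checkpoint shortens recovery can be sketched over a toy log (a simplified in-memory model):

```python
log = [
    ("start", "T1"), ("commit", "T1"),
    ("checkpoint",),                 # buffers flushed up to this point
    ("start", "T2"), ("update", "T2", "A", 1, 2),
]

# Recovery scans only the suffix after the most recent checkpoint record;
# everything before it is already safely on disk.
last_cp = max(i for i, rec in enumerate(log) if rec[0] == "checkpoint")
to_scan = log[last_cp + 1:]
print(len(to_scan))   # 2 records instead of the whole log
```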
