The document summarizes database recovery techniques. It discusses types of failures like transaction failures, system failures, and media failures. It describes the purpose of database recovery as bringing the database to the most recent consistent state before failure. It outlines recovery concepts like catastrophic vs. non-catastrophic failures, the log file, caching of disk blocks, write-ahead logging, and checkpoints. It provides details on each of these recovery techniques.
Chapter 5 Outline
1. Types of Failure
2. Purpose of Database Recovery
3. Recovery Concepts
4. Recovery Techniques
Samara University Department of Computer Science 2
Types of Failure
• Transaction failure: transactions may fail because of errors (e.g., integer overflow or division by zero), incorrect input, deadlock, incorrect synchronization, etc.
• System failure: the system may fail because of an application error, an operating system fault, RAM failure, etc.
• Media failure: disk head crash, power disruption, etc.
Purpose of Database Recovery
• To bring the database to the most recent consistent state that existed just before the failure.
• To ensure the transaction properties of atomicity and durability: a committed transaction cannot be canceled, and all of its updates must be applied permanently to the database.
• To ensure robustness: after a failure, the DBMS recovery manager is responsible for bringing the system into a consistent state before transactions can resume.

Recovery Concepts outline
1. Catastrophic & Non-catastrophic Failure
2. The Log File
3. Caching (Buffering) of Disk Blocks
4. Write-Ahead Logging, Steal/No-Steal, and Force/No-Force
5. Checkpoints in the System Log and Fuzzy Checkpointing
6. Transaction Rollback and Cascading Rollback
7. Transaction Actions That Do Not Affect the Database

1. Catastrophic & Non-catastrophic Failure (1)
• A typical strategy for recovery may be summarized informally as follows:
1. If there is extensive damage to a wide portion of the database due to a catastrophic failure, such as a disk crash, the recovery method restores a past copy of the database that was backed up to archival storage (typically tape or other large-capacity offline storage media) and reconstructs a more current state by reapplying or redoing the operations of committed transactions from the backed-up log, up to the time of failure.

Catastrophic & Non-catastrophic Failure (2)
2. When the database on disk is not physically damaged and a non-catastrophic failure has occurred, the recovery strategy is to identify any changes that may cause an inconsistency in the database. For example, a transaction that has updated some database items on disk but has not committed needs to have its changes reversed by undoing its write operations.
It may also be necessary to redo some operations in order to restore a consistent state of the database.

2. The Log File (1)
The system must keep information about the changes that were applied to data items by the various transactions. This information is typically kept in the system log, which:
• Holds the information that is necessary for the recovery process
• Records all relevant operations in the order in which they occur
• Is an append-only file
• Holds various types of log records (or log entries)

Log File Entries (2)
Types of records (entries) in the log file:
• [start_transaction, T]: records that transaction T has started execution.
• [write_item, T, X, old_value, new_value]: T has changed the value of item X from old_value to new_value.
• [read_item, T, X]: T has read the value of item X (not needed in many cases).
• [end_transaction, T]: T has ended execution.
• [commit, T]: T has completed successfully and committed.
• [abort, T]: T has been aborted.

The Log File (3)
For a write_item log entry, the old value of the item before modification (BFIM, the BeFore IMage) and the new value after modification (AFIM, the AFter IMage) are stored. The BFIM is needed for UNDO; the AFIM is needed for REDO. A sample log is given below. Back P and Next P point to the previous and next log records of the same transaction.

T ID  Back P  Next P  Operation  Data item  BFIM     AFIM
T1    0       1       Begin
T1    1       4       Write      X          X = 100  X = 200
T2    0       8       Begin
T1    2       5       W          Y          Y = 50   Y = 100
T1    4       7       R          M          M = 200  M = 200
T3    0       9       R          N          N = 400  N = 400
T1    5       nil     End

3. Caching (Buffering) of Disk Blocks (1)
• The disk pages (blocks) containing the database items to be updated are cached into main-memory buffers.
• The main-memory buffers are then updated before being written back to disk.
• For efficient recovery, the caching of disk pages is handled by the DBMS rather than by the OS.
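To make the log entry types and the BFIM/AFIM images concrete, here is a minimal sketch (the `LogRecord` and `Log` names are illustrative, not any particular DBMS's API):

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class LogRecord:
    op: str                      # 'start', 'write', 'read', 'commit', 'abort', 'end'
    txn: str                     # transaction ID, e.g. 'T1'
    item: Optional[str] = None   # data item, for read/write entries
    bfim: Optional[int] = None   # BeFore IMage (old value), needed for UNDO
    afim: Optional[int] = None   # AFter IMage (new value), needed for REDO

class Log:
    """Append-only system log: records are kept in operation order."""
    def __init__(self) -> None:
        self.records: List[LogRecord] = []

    def append(self, rec: LogRecord) -> int:
        self.records.append(rec)
        return len(self.records) - 1   # position in the file, a crude sequence number

log = Log()
log.append(LogRecord('start', 'T1'))
log.append(LogRecord('write', 'T1', item='X', bfim=100, afim=200))
log.append(LogRecord('commit', 'T1'))
```

The write entry here mirrors the first write row of the sample log above: T1 changes X from 100 to 200.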
• Typically, a collection of in-memory buffers, called the DBMS cache, is kept under the control of the DBMS.
• A directory for the cache is used to keep track of which database items are in the buffers: a table of <disk page address, buffer location> entries.
• The DBMS cache holds the database disk blocks, including data blocks, index blocks, and log blocks.

Caching (Buffering) of Disk Blocks (2)
• When the DBMS requests action on some item, it first checks the cache directory to determine whether the corresponding disk page is in the cache. If not, the item must be located on disk, and the appropriate disk pages are copied into the cache. It may be necessary to replace (flush) some cache buffers to make space available for the new item; a flushed buffer is written back only if its dirty bit = 1 (modified). FIFO or LRU can be used as replacement strategies.

Caching (Buffering) of Disk Blocks (3)
• Dirty bit
– Associate a dirty bit with each buffer in the cache; the dirty bit can be included in the directory entry.
– It indicates whether or not the buffer has been modified: set dirty bit = 0 when the page is first read from disk into the buffer cache, and set dirty bit = 1 as soon as the corresponding buffer is modified.
– When the buffer content is replaced (flushed) from the cache, write it back to the corresponding disk page only if dirty bit = 1.

Caching (Buffering) of Disk Blocks (4)
• Pin-unpin bit
– A page is pinned (i.e., pin-unpin bit value = 1) if it cannot be written back to disk yet.
• Strategies that can be used when flushing occurs:
– In-place updating: writes the buffer back to the same original disk location, overwriting the old value on disk.
– Shadowing: writes the updated buffer to a different disk location, so multiple versions of data items can be maintained. The old value (BFIM, before image) and the new value (AFIM, after image) are both kept on disk, so no log is needed for recovery.
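The cache directory, dirty bit, and LRU flushing described above can be sketched as follows (a toy model; the class and its dict-backed "disk" are illustrative assumptions, not a real buffer manager):

```python
from collections import OrderedDict

class BufferCache:
    """Toy DBMS cache: an LRU directory of page id -> [contents, dirty bit].
    On eviction, a buffer is written back to disk only if its dirty bit is 1."""
    def __init__(self, capacity, disk):
        self.capacity = capacity
        self.disk = disk                 # stands in for disk pages: id -> contents
        self.buffers = OrderedDict()     # cache directory, kept in LRU order

    def fetch(self, pid):
        if pid in self.buffers:
            self.buffers.move_to_end(pid)            # cache hit
        else:
            if len(self.buffers) >= self.capacity:   # need to flush a victim
                victim, (data, dirty) = self.buffers.popitem(last=False)
                if dirty:                            # write back only if modified
                    self.disk[victim] = data
            self.buffers[pid] = [self.disk[pid], 0]  # dirty bit = 0 on read-in
        return self.buffers[pid][0]

    def update(self, pid, value):
        self.fetch(pid)                  # bring the page into the cache if needed
        self.buffers[pid][0] = value
        self.buffers[pid][1] = 1         # dirty bit = 1 as soon as modified

disk = {'P1': 'a', 'P2': 'b', 'P3': 'c'}
cache = BufferCache(capacity=2, disk=disk)
cache.update('P1', 'a2')   # modify P1 in the cache only
cache.fetch('P2')
cache.fetch('P3')          # evicts P1 (least recently used) and flushes it to disk
```

Note that the modified value of P1 reaches "disk" only when P1 is evicted, which is exactly the behavior the Steal/No-Steal policies below regulate.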
4. Write-Ahead Logging Protocol (1)
The information needed for recovery must be written to the log file on disk before changes are made to the database on disk. The Write-Ahead Logging (WAL) protocol consists of two rules:
• For UNDO: before a data item's AFIM is flushed to the database on disk (overwriting the BFIM), its BFIM must be written to the log, and the log must be saved to disk.
• For REDO: before a transaction executes its commit operation, all its AFIMs must be written to the log, and the log must be saved on stable storage.

Write-Ahead Logging Protocol (2)
• Without ensuring that the BFIM is recorded in the appropriate log entry and that the log is flushed to disk before the BFIM is overwritten with the AFIM in the database on disk, recovery will not be possible.
• With write-ahead logging, the log blocks containing the log records for a particular data block update must be written to disk before the data block itself can be written back to disk.

Write-Ahead Logging Protocol (3)
• Commercial DBMSs and WAL: IBM DB2, Informix, Microsoft SQL Server, Oracle 8, and Sybase ASE all use a WAL scheme for recovery.
• To facilitate the recovery process, the DBMS recovery subsystem may need to maintain a number of lists:
– List of active transactions: transactions that have started but not yet committed.
– List of committed transactions since the last checkpoint.
– List of aborted transactions since the last checkpoint.

Steal/No-Steal and Force/No-Force (1)
These policies specify how to flush database cache buffers to the database on disk:
• Steal: cache buffers updated by a transaction may be flushed to disk before the transaction commits (recovery may require UNDO). Advantage: avoids the need for a very large buffer space.
• No-Steal: cache buffers cannot be flushed until after the transaction commits (no UNDO needed); buffers are pinned until transactions commit. Deferred update follows this approach.

Steal/No-Steal and Force/No-Force (2)
• Force: the cache is flushed (forced) to disk before the transaction commits (no REDO needed).
• No-Force: some cache flushing may be deferred until after the transaction commits (recovery may require REDO).
• These give rise to four different ways of handling recovery: Steal/No-Force (UNDO/REDO), Steal/Force (UNDO/NO-REDO), No-Steal/No-Force (REDO/NO-UNDO), No-Steal/Force (NO-UNDO/NO-REDO).
• Typical database systems follow a Steal/No-Force strategy.

Checkpointing (1)
• In case of failure, the recovery manager would otherwise have to examine the entire log to process recovery, which is time consuming.
• A quick way to limit the amount of log to scan on recovery is to use checkpoints.
• A [checkpoint] record is written into the log periodically, at the point when the system writes out to the database on disk all DBMS buffers that have been modified.
• Hence, all transactions with a [commit, T] entry in the log before the [checkpoint] entry do not need to have their WRITE operations redone in case of a crash, since all their updates have been recorded in the database on disk during checkpointing.

Checkpointing (2)
• The recovery manager must decide at what intervals to take a checkpoint. The interval may be measured in time (say, every m minutes) or in the number of committed transactions since the last checkpoint (say, t transactions), where m and t are system parameters.
• Taking a checkpoint consists of the following steps:
1. Suspend execution of transactions temporarily.
2. Force-write all main-memory buffers that have been modified to disk.
3. Write a [checkpoint] record to the log, and force-write the log to disk.
4. Resume executing transactions.

Checkpointing (3)
• The time needed in step 2 to force-write all modified memory buffers may delay transaction processing.
• Fuzzy checkpointing can be used to reduce this delay:
– The system can resume transaction processing after the [checkpoint] record is written to the log, without having to wait for step 2 to finish.
– However, until step 2 is completed, the previous [checkpoint] record remains valid.
• The system maintains a pointer to the valid checkpoint, which points to the previous [checkpoint] record in the log. Once step 2 is completed, that pointer is changed to point to the new checkpoint in the log.

Example of Checkpoints (4)
[Figure: timeline of transactions T1–T4 against the checkpoint time Tc and the system failure time Tf]
• T1 can be ignored (updates already output to disk due to checkpoint)
• T2 and T3 are redone.
• T4 is undone.
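The classification in the example above can be written out as a small sketch (the timeline values below are made up to match the figure: T1 commits before the checkpoint, T2 and T3 commit after it, and T4 is still active at the failure):

```python
def classify(transactions, checkpoint_time):
    """Decide the recovery action for each transaction relative to the last
    checkpoint. Each transaction is (name, start, commit), with commit=None
    if the transaction had not committed by the time of failure."""
    actions = {}
    for name, start, commit in transactions:
        if commit is not None and commit <= checkpoint_time:
            actions[name] = 'ignore'   # updates were forced to disk at the checkpoint
        elif commit is not None:
            actions[name] = 'redo'     # committed after the checkpoint
        else:
            actions[name] = 'undo'     # still uncommitted at the failure
    return actions

# Tc = 4 is the checkpoint; the failure happens after time 7 (illustrative values)
txns = [('T1', 1, 3), ('T2', 2, 6), ('T3', 5, 7), ('T4', 6, None)]
actions = classify(txns, checkpoint_time=4)
```

Running this gives exactly the outcome in the bullets above: T1 is ignored, T2 and T3 are redone, and T4 is undone.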
6. Transaction Rollback & Cascading Rollback (1)
• If a transaction fails, roll back this transaction.
• Any data item values changed by this transaction and written to the database must be restored to their previous values (BFIMs). The UNDO-type log entries are used to restore the old values of these data items.
• Only the write_item operations need to be undone during transaction rollback.
• If the recovery protocol ensures recoverable schedules without ensuring strict or cascadeless schedules, cascading rollback can occur.

Transaction Rollback & Cascading Rollback (2)
• If a transaction T is rolled back, every transaction T' that read a value of a data item written by T must also be rolled back, and so on.
• read_item operations are recorded in the log only to determine whether cascading rollback of additional transactions is necessary.
• Cascading rollback is complex and time consuming. Almost all recovery mechanisms are designed so that cascading rollback is never required: they guarantee strict or cascadeless schedules.

7. Transaction Actions That Do Not Affect the Database
• Examples include generating and printing messages or reports from information retrieved from the database.
• If a transaction fails before completion, we may not want the user to get these reports, since the transaction failed to complete.
• If such erroneous reports are produced, part of the recovery process would have to inform the user that the reports are wrong, since the user may take an action based on them that affects the database.
• Hence, such reports should be generated only after the transaction reaches its commit point.
• A common method of dealing with such actions is to issue the commands that generate the reports but keep them as batch jobs, which are executed only after the transaction reaches its commit point. If the transaction fails, the batch jobs are canceled.

Recovery Techniques outline
1. Recovery Algorithms
2. NO-UNDO/REDO Recovery Based on Deferred Update
3. Recovery Techniques Based on Immediate Update
4. Shadow Paging
5. The ARIES Recovery Algorithm
6. Recovery in Multi-database Systems
7. Database Backup and Recovery from Catastrophic Failures
1. Recovery Algorithms
• Recovery algorithms have two parts:
1. Actions taken during normal transaction processing to ensure that enough information exists to recover from failures.
2. Actions taken after a failure to recover the database contents to a state that ensures atomicity, consistency, and durability.

2. Deferred Update (NO-UNDO/REDO) Recovery Protocol (1)
• Deferred update: a modified data item in the cache cannot be written back to disk until after the transaction commits (the buffer is pinned).
• If a transaction fails before reaching its commit point, it will not have changed the database in any way, so UNDO is not needed.
• It may be necessary to REDO the effects of the operations of a committed transaction from the log, because those effects may not yet have been recorded in the database on disk.
• Hence, deferred update is also known as the NO-UNDO/REDO algorithm.

Deferred Update (NO-UNDO/REDO) Recovery Protocol (2)
A typical deferred update protocol can be stated as follows:
1. A transaction cannot change the database on disk until it reaches its commit point.
2. A transaction does not reach its commit point until all its update operations are recorded in the log and the log is force-written to disk (i.e., WAL).
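A minimal sketch of these two rules, assuming a dict-backed "disk" and an in-memory log (the class and method names are illustrative, not a real DBMS interface):

```python
class DeferredUpdateDB:
    """NO-UNDO/REDO sketch: writes are buffered per transaction and reach
    the disk copy only after the log is (conceptually) force-written at commit."""
    def __init__(self, disk):
        self.disk = disk        # the database on disk: item -> value
        self.log = []           # append-only log, force-written at commit
        self.pending = {}       # txn -> {item: new value}, pinned in the cache

    def write(self, txn, item, value):
        self.pending.setdefault(txn, {})[item] = value  # rule 1: no disk change yet

    def commit(self, txn):
        for item, value in self.pending.get(txn, {}).items():
            self.log.append(('write', txn, item, value))
        self.log.append(('commit', txn))                # rule 2: log before commit
        for item, value in self.pending.pop(txn, {}).items():
            self.disk[item] = value                     # only now apply to disk

    def abort(self, txn):
        self.pending.pop(txn, None)   # nothing reached disk, so no UNDO is needed

db = DeferredUpdateDB({'X': 1})
db.write('T1', 'X', 5)
db.abort('T1')            # disk untouched: X is still 1
db.write('T2', 'X', 7)
db.commit('T2')           # X becomes 7 on disk, after logging
```

The abort path illustrates why no UNDO is ever needed: an uncommitted transaction's writes exist only in the pinned buffers.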
Deferred Update (NO-UNDO/REDO) Recovery Protocol (3)
• Advantages:
1. A transaction does not record any changes in the database on disk until it commits, so it never needs to be rolled back because of a failure during its execution.
2. A transaction will never read the value of an item written by an uncommitted transaction, so no cascading rollback will occur.
• Drawbacks:
– Limits the concurrent execution of transactions, because all items remain locked until the transaction reaches its commit point.
– Requires excessive buffer space to hold all updates.

3. Recovery Techniques Based on Immediate Update
• Immediate update: a data item modified in the cache can be written back to disk before the transaction commits.
• However, the update operation must still be recorded in the log (on disk) before it is applied to the database, following the WAL protocol.
• If a transaction fails after recording some changes in the database on disk but before reaching its commit point, the effects of its operations on the database must be undone; that is, the transaction must be rolled back.
• In the general case of immediate update, both UNDO and REDO may be required during recovery. This technique, known as the UNDO/REDO algorithm, requires both operations during recovery and is the one used most often in practice.

UNDO and REDO Recovery Actions
• To maintain atomicity and durability, some transactions may have their operations redone or undone during recovery.
• UNDO (roll back) is needed for transactions that have not yet committed.
• REDO (roll forward) is needed for committed transactions whose writes may not yet have been flushed from cache to disk.
• UNDO: restore all BFIMs from the log to the database on disk. UNDO proceeds backward through the log (from the most recent entry to the oldest).
• REDO: restore all AFIMs from the log to the database on disk. REDO proceeds forward through the log (from the oldest entry to the most recent).
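The two passes just described can be sketched directly (the flat log format and dict-backed disk are simplifications for illustration):

```python
def recover(disk, log, committed):
    """UNDO/REDO recovery sketch for immediate update. `log` holds
    (txn, item, bfim, afim) write entries in operation order; `committed`
    is the set of transactions with a commit record in the log."""
    # UNDO: scan backward (most recent first), restoring the BFIMs
    # written by transactions that had not committed
    for txn, item, bfim, afim in reversed(log):
        if txn not in committed:
            disk[item] = bfim
    # REDO: scan forward (oldest first), reapplying the AFIMs
    # of committed transactions
    for txn, item, bfim, afim in log:
        if txn in committed:
            disk[item] = afim
    return disk

state = {'X': 200, 'Y': 100}   # disk contents at the moment of the crash
wal = [('T1', 'X', 100, 200), ('T2', 'Y', 50, 100)]
recover(state, wal, committed={'T1'})
```

Here T1 committed, so X keeps its AFIM of 200; T2 did not, so Y is restored to its BFIM of 50.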
4. Shadow Paging (NO-UNDO/NO-REDO)
• The AFIM does not overwrite the BFIM; it is recorded at another place (a new version) on disk. Thus, a data item can have its AFIM and its BFIM (the shadow copy of the data item) at two different places on disk.
[Figure: shadow paging — the database on disk holds both old and new copies of the updated data items X and Y]
X and Y: Shadow (old) copies of data items
X' and Y': Current (new) copies of data items

5. ARIES Database Recovery (1)
ARIES is used in practice. It is based on:
1. WAL (Write-Ahead Logging).
2. Repeating history during redo: ARIES retraces all actions of the database system prior to the crash to reconstruct the correct database state.
3. Logging changes during undo: this prevents ARIES from repeating completed undo operations if a failure occurs during recovery and causes a restart of the recovery process.

ARIES Database Recovery (2)
The log and the Log Sequence Number (LSN):
• A log record is written for (a) a data update (write), (b) transaction commit, (c) transaction abort, (d) undo, and (e) transaction end. In the case of an undo, a compensating log record is written.
• A unique LSN is associated with every log record. LSNs increase monotonically, and each LSN indicates the disk address of the log record it is associated with.
• In addition, each data page stores the LSN of the latest log record corresponding to a change to that page.
ARIES Database Recovery (3)
The log and the Log Sequence Number (LSN), continued. A log record stores:
1. The LSN of the previous log record for the same transaction (this links the log records of each transaction).
2. The transaction ID.
3. The type of log record.
For a write operation, additional information is logged:
1. The page ID of the page that includes the item.
2. The length of the updated item.
3. Its offset from the beginning of the page.
4. The BFIM of the item.
5. The AFIM of the item.

ARIES Database Recovery (4)
The transaction table and the dirty page table. For efficient recovery, the following tables are also stored in the log during checkpointing:
• Transaction table: contains an entry for each active transaction, with information such as the transaction ID, transaction status, and the LSN of the most recent log record for the transaction.
• Dirty page table: contains an entry for each dirty page (buffer) in the cache, with the page ID and the LSN corresponding to the earliest update to that page.
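As a sketch, the two tables can be rebuilt by a forward scan of the log of this kind (the `AriesRecord` type and field names are illustrative, and a real ARIES analysis pass handles more record types than shown here):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class AriesRecord:
    lsn: int                    # monotonically increasing log sequence number
    prev_lsn: Optional[int]     # links the records of the same transaction
    txn: str
    rtype: str                  # 'update', 'commit', 'end', ...
    page: Optional[str] = None  # page touched, for update records

def analysis(log, txn_table, dirty_pages):
    """Bring the checkpointed transaction table and dirty page table up to
    date by scanning the log forward from the checkpoint."""
    for rec in log:
        if rec.rtype == 'update':
            txn_table[rec.txn] = rec.lsn               # most recent LSN per txn
            dirty_pages.setdefault(rec.page, rec.lsn)  # keep the earliest LSN
        elif rec.rtype == 'end':
            txn_table.pop(rec.txn, None)               # the transaction is finished
    return txn_table, dirty_pages

log = [AriesRecord(1, None, 'T1', 'update', page='P1'),
       AriesRecord(2, 1, 'T1', 'update', page='P1'),
       AriesRecord(3, None, 'T2', 'update', page='P2'),
       AriesRecord(4, 2, 'T1', 'commit'),
       AriesRecord(5, 4, 'T1', 'end')]
tt, dp = analysis(log, {}, {})
```

After the scan, only T2 (still active) remains in the transaction table, and each dirty page keeps the LSN of its earliest update, which is where the redo pass would later start from.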
ARIES Database Recovery (5)
Fuzzy checkpointing. A checkpointing process does the following:
1. Writes a begin_checkpoint record in the log, then forces updated (dirty) buffers to disk.
2. Writes an end_checkpoint record in the log, along with the contents of the transaction table and the dirty page table.
3. Writes the LSN of the begin_checkpoint record to a special file. This special file is accessed during recovery to locate the last checkpoint information.
To allow the system to continue executing transactions, ARIES uses fuzzy checkpointing.

ARIES Database Recovery (6)
The following steps are performed for recovery:
1. Analysis phase: start at the begin_checkpoint record and proceed to the end_checkpoint record. Access the transaction table and dirty page table that were appended to the log. During this phase some other records may be written to the log, and the transaction table may be modified. The analysis phase compiles the set of REDO and UNDO operations to be performed, then ends.
2. Redo phase: start from the point in the log where all dirty pages have been flushed, and move forward. Operations of committed transactions are redone.
3. Undo phase: start from the end of the log and proceed backward, performing the appropriate undo operations. For each undo, a compensating record is written to the log. Recovery completes at the end of the undo phase.

6. Recovery in Multi-database Systems
• A multi-database transaction requires access to multiple databases, which may even be stored on different types of DBMS (some relational, others object oriented, etc.).
• Each DBMS involved in the multi-database transaction may have its own recovery technique and transaction manager.
• A two-level recovery mechanism is used to maintain the atomicity of a multi-database transaction:
– A global recovery manager, or coordinator.
– The local recovery managers.

Two-Phase Commit (cont.)
• Phase 1: the coordinator (usually an application program running in the middle tier of a 3-tier architecture) sends a "ready-to-commit?" query to each participating database, then waits for replies. A participating database replies ready-to-commit only after saving all its actions in its local log on disk.
• Phase 2: if the coordinator receives ready-to-commit signals from all participating databases, it sends Commit to all of them; otherwise, it sends Abort to all of them.
• This protocol can survive most types of crashes.

7. Database Backup and Recovery from Catastrophic Failures (1)
• All the techniques discussed so far apply to non-catastrophic failures: the system log is maintained on disk and is not lost as a result of the failure; similarly, the shadow directory is not lost when shadow paging is used.
• The recovery manager must also handle more catastrophic failures, such as disk crashes. The main technique used is database backup: the whole database and the log are periodically copied onto cheap media such as tapes.

Database Backup and Recovery from Catastrophic Failures (2)
• It is customary to back up the system log at more frequent intervals than the full database backup: the log is substantially smaller and hence can be backed up more frequently than the database itself. Thus users do not lose all the transactions they have performed since the last database backup. A new log is started after each database backup.
• To recover from a disk crash:
– Recreate the database on disk from the latest backup copy on tape.
– Reconstruct the effects of all committed transactions from the backup copies of the log.
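The two-phase commit exchange described above can be sketched as follows (participants are modeled as callables that vote after "saving" to their local log; this is an illustrative simplification that ignores timeouts and coordinator crashes):

```python
def two_phase_commit(participants):
    """Sketch of two-phase commit. Each participant is a callable that
    returns True for a ready-to-commit reply (sent only after it has saved
    its actions in its local log) and False otherwise."""
    # Phase 1: the coordinator asks every participating database
    votes = [ask() for ask in participants]
    # Phase 2: commit only if every participant replied ready-to-commit
    return 'commit' if all(votes) else 'abort'

ready = lambda: True      # a database that saved its log and can commit
crashed = lambda: False   # a database that cannot reply ready-to-commit

outcome_ok = two_phase_commit([ready, ready])
outcome_bad = two_phase_commit([ready, crashed])
```

A single negative (or missing) vote forces the coordinator to send Abort to every participant, which is what preserves global atomicity.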