Advanced Database Systems Handout

Chapter Four
Concurrency Control Techniques
Introduction
Concurrency control is the activity of coordinating the simultaneous execution of transactions
in a multiprocessing or multi-user database management system. The objective of concurrency
control is to ensure the serializability of transactions in a multi-user database management
system. Serializability can be achieved in several ways. Concurrency control enforces
isolation (through mutual exclusion) among conflicting transactions, preserves database
consistency through consistency-preserving execution of transactions, and resolves read-
write and write-write conflicts. There are two main concurrency control techniques that allow
transactions to execute safely in parallel subject to certain constraints: locking and timestamp
methods.
Locking and timestamping are essentially conservative (or pessimistic) approaches in that
they cause transactions to be delayed in case they conflict with other transactions at some time
in the future. Optimistic methods, are based on the premise that conflict is rare so they allow
transactions to proceed unsynchronized and only check for conflicts at the end, when a
transaction commits.
Locking
A procedure used to control concurrent access to data. When one transaction is accessing the
database, a lock may deny access to other transactions to prevent incorrect results. A lock is an
operation that secures:
(a) permission to read, or
(b) permission to write, a data item on behalf of a transaction.
Example:
• Lock (X). Data item X is locked on behalf of the requesting transaction.
Unlocking is an operation which removes these permissions from the data item.
Example: Unlock (X): Data item X is made available to all other transactions.
Lock and Unlock are Atomic operations.
Locking methods are the most widely used approach to ensure serializability of concurrent
transactions. There are several variations, but all share the same fundamental characteristic,
namely that a transaction must claim a shared (read) or exclusive (write) lock on a data item
before the corresponding database read or write operation.

• Shared lock: If a transaction has a shared lock on a data item, it can read the item
but not update it.
• Exclusive lock: If a transaction has an exclusive lock on a data item, it can both
read and update the item.
Since read operations cannot conflict, it is permissible for more than one transaction to hold
shared locks simultaneously on the same item. On the other hand, an exclusive lock gives a
transaction exclusive access to that item. Thus, as long as a transaction holds the exclusive lock
on the item, no other transactions can read or update that data item. Locks are used in the
following way:
• Any transaction that needs to access a data item must first lock the item, requesting a
shared lock for read only access or an exclusive lock for both read and write access.
• If the item is not already locked by another transaction, the lock will be granted.
• If the item is currently locked, the DBMS determines whether the request is compatible
with the existing lock. If a shared lock is requested on an item that already has a shared
lock on it, the request will be granted; otherwise, the transaction must wait until the
existing lock is released.
• A transaction continues to hold a lock until it explicitly releases it either during
execution or when it terminates (aborts or commits). It is only when the exclusive lock
has been released that the effects of the write operation will be made visible to other
transactions.
In addition to these rules, some systems permit a transaction to issue a shared lock on an item
and then later to upgrade the lock to an exclusive lock. This in effect allows a transaction to
examine the data first and then decide whether it wishes to update it. For the same reason, some
systems also permit a transaction to issue an exclusive lock and then later to downgrade the
lock to a shared lock.
To guarantee serializability, we must follow an additional protocol concerning the positioning
of the lock and unlock operations in every transaction. The best-known protocol is two-phase
locking (2PL).
Two-phase locking (2PL)
A transaction follows the two-phase locking protocol if all locking operations precede the first
unlock operation in the transaction. According to the rules of this protocol, every transaction
can be divided into two phases: first a growing phase, in which it acquires all the locks needed
but cannot release any locks, and then a shrinking phase, in which it releases its locks but
cannot acquire any new locks. There is no requirement that all locks be obtained
simultaneously. Normally, the transaction acquires some locks, does some processing, and
goes on to acquire additional locks as needed. However, it never releases any lock until it has
reached a stage where no new locks are needed. The rules are:
• A transaction must acquire a lock on an item before operating on the item. The lock
may be read or write, depending on the type of access needed.
• Once the transaction releases a lock, it can never acquire any new locks.
Two-Phase Locking Techniques: Essential components
• Lock Manager: is responsible for deciding the appropriate lock type (shared,
exclusive, update, and so on)
• Lock table: Lock manager uses it to store the identification of the transaction locking
a data item, the data item, lock mode and pointer to the next data item locked. One
simple way to implement a lock table is through a linked list.

Transaction ID    Data item ID    Lock mode    Ptr to next data item
T1                X1              Read         Next
• The DBMS requires that all transactions be well-formed. A transaction is well-formed if:
o It locks a data item before it reads or writes it.
o It does not lock an already locked data item and does not try to unlock a free
data item.
• Examples of locks:
o The following code performs the lock operation:

o The following code performs the unlock operation:

o The following code performs the read lock operation:

o The following code performs the write lock operation:

o The following code performs the unlock operation:
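As an illustration of these operations, the following is a minimal Python sketch over a simple in-memory lock table. The LockTable class and its representation are assumptions for illustration, not the code of any particular DBMS; a transaction that cannot obtain a lock simply receives False here rather than being blocked.

    class LockTable:
        def __init__(self):
            # data item -> {"mode": "read" or "write", "holders": set of transaction ids}
            self.locks = {}

        def read_lock(self, txn, item):
            entry = self.locks.get(item)
            if entry is None:                         # item is currently unlocked
                self.locks[item] = {"mode": "read", "holders": {txn}}
                return True
            if entry["mode"] == "read":               # shared locks are compatible with each other
                entry["holders"].add(txn)
                return True
            return False                              # item is write-locked: the transaction must wait

        def write_lock(self, txn, item):
            entry = self.locks.get(item)
            if entry is None:                         # item is currently unlocked
                self.locks[item] = {"mode": "write", "holders": {txn}}
                return True
            return False                              # any existing lock conflicts: the transaction must wait

        def unlock(self, txn, item):
            entry = self.locks.get(item)
            if entry is not None and txn in entry["holders"]:
                entry["holders"].discard(txn)
                if not entry["holders"]:              # last holder released the item
                    del self.locks[item]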

Lock conversion
Lock conversion occurs when a process accesses a data object on which it already holds a lock,
and the access mode requires a more restrictive lock than the one already held. A process can
hold only one lock on a data object at any given time, although it can request a lock on the
same data object many times indirectly through a query.
• Lock upgrade: existing read lock to write lock.
      if Ti has a read-lock (X) and no other transaction Tj (i ≠ j) has a read-lock (X) then
          convert read-lock (X) to write-lock (X)
      else
          force Ti to wait until the other transactions unlock X
• Lock downgrade: existing write lock to read lock.
      Ti has a write-lock (X)
      (* no other transaction can hold any lock on X *)
      convert write-lock (X) to read-lock (X)

Two-Phase Locking Techniques


The following examples show how two transactions can execute in two phase locking:
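One possible interleaving, with transaction names T1, T2 and data items X, Y assumed purely for illustration, is shown below; each transaction issues all of its lock requests before its first unlock, so both obey the two-phase rule:

    T1                              T2
    write_lock(X)
                                    read_lock(Y)
    read(X)
                                    read(Y)
    write(X)
                                    read_lock(X)    <- T2 must wait: T1 holds write_lock(X)
    unlock(X)
                                    read(X)         <- lock granted after T1 released X
                                    unlock(Y)
                                    unlock(X)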

A problem with two-phase locking, which applies to all locking-based schemes, is that
it can cause deadlock, since transactions can wait for locks on data items. If two transactions
each wait for locks on items held by the other, deadlock will occur.

The two-phase policy gives rise to the following locking algorithms:


• Conservative: Prevents deadlock by locking all desired data items before transaction
begins execution.

• Basic: Transactions lock data items incrementally. This may cause deadlock, which must
then be detected and resolved.
• Strict: A stricter version of the Basic algorithm in which unlocking is performed only after
a transaction terminates (commits, or aborts and is rolled back). This is the most
commonly used two-phase locking algorithm.

Deadlock
Deadlock: An impasse that may result when two (or more) transactions are each waiting for
locks to be released that are held by the other.

Dealing with Deadlock


There are three general techniques for handling deadlock: timeouts, deadlock prevention, and
deadlock detection and recovery.
With timeouts, the transaction that has requested a lock waits for at most a specified period of
time.
Using deadlock prevention, the DBMS looks ahead to determine if a transaction would cause
deadlock, and never allows deadlock to occur. A transaction locks all data items it refers to
before it begins execution. This way of locking prevents deadlock since a transaction never
waits for a data item. The conservative two-phase locking uses this approach.
Using deadlock detection and recovery, the DBMS allows deadlock to occur but recognizes
occurrences of deadlock and breaks them. In this approach, deadlocks are allowed to happen.
The scheduler maintains a wait-for graph for detecting cycles. If a cycle exists, then one
transaction involved in the cycle is selected as the victim and rolled back. The wait-for graph is
built from the lock table: as soon as a transaction is blocked, an edge is added to the graph. When
a chain such as Ti waits for Tj, Tj waits for Tk, and Tk waits for Ti occurs, a cycle is created, and
some of the transactions causing the deadlock must be aborted.
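As a rough sketch (the function and variable names are illustrative), cycle detection over a wait-for graph can be implemented with a depth-first search:

    # wait_for maps each blocked transaction to the set of transactions it waits for,
    # e.g. {"T1": {"T2"}, "T2": {"T3"}, "T3": {"T1"}} contains the cycle T1 -> T2 -> T3 -> T1.
    def find_cycle(wait_for):
        visited, on_path = set(), set()

        def dfs(txn):
            visited.add(txn)
            on_path.add(txn)
            for waited_on in wait_for.get(txn, ()):
                if waited_on in on_path:              # back edge: a cycle (deadlock) exists
                    return True
                if waited_on not in visited and dfs(waited_on):
                    return True
            on_path.discard(txn)
            return False

        return any(dfs(t) for t in list(wait_for) if t not in visited)

If a cycle is found, one transaction in the cycle would be chosen as the victim and rolled back.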

Since it is more difficult to prevent deadlock than to use timeouts or testing for deadlock and
breaking it when it occurs, systems generally avoid the deadlock prevention method.
Deadlock avoidance
There are many variations of the two-phase locking algorithm. Some avoid deadlock by never
letting a cycle form: as soon as the algorithm discovers that blocking a transaction is likely to
create a cycle, it rolls back the transaction. The Wound-Wait and Wait-Die algorithms
use timestamps to avoid deadlocks by rolling back a victim.
• wait-die: When an older transaction tries to lock a DB element that has been locked by
a younger transaction, it waits. When a younger transaction tries to lock a DB element
that has been locked by an older transaction, it dies.
• wound-wait: When an older transaction tries to lock a DB element that has been
locked by a younger transaction, it wounds the younger transaction. When
a younger transaction tries to lock a DB element that has been locked by
an older transaction, it waits.
Assume that Tn requests a lock held by Tk. The following table summarizes the actions taken
for the wait-die and wound-wait schemes:

                    Tn is older than Tk                Tn is younger than Tk
    Wait-die        Tn waits                           Tn dies (is rolled back and restarted)
    Wound-wait      Tk is wounded (rolled back)        Tn waits
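A minimal sketch of the decision, assuming that a smaller timestamp means an older transaction and that Tn (the requester) asks for a lock held by Tk:

    def resolve(scheme, ts_requester, ts_holder):
        # Returns what happens to the requesting transaction Tn (or to the holder Tk).
        requester_is_older = ts_requester < ts_holder
        if scheme == "wait-die":
            return "Tn waits" if requester_is_older else "Tn dies (rolled back)"
        if scheme == "wound-wait":
            return "Tk is wounded (rolled back)" if requester_is_older else "Tn waits"
        raise ValueError("unknown scheme")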

Starvation
Starvation occurs when a particular transaction consistently waits or is restarted and never gets a
chance to proceed further. In deadlock resolution it is possible that the same transaction is
repeatedly selected as the victim and rolled back. This limitation is inherent in all priority-
based scheduling mechanisms. In the Wound-Wait scheme, a younger transaction may always be
wounded (aborted) by a long-running older transaction, which may create starvation.
Timeouts
A simple approach to deadlock prevention is based on lock timeouts. With this approach, a
transaction that requests a lock will wait for only a system-defined period of time. If the lock
has not been granted within this period, the lock request times out. In this case, the DBMS
assumes the transaction may be deadlocked, even though it may not be, and it aborts and
automatically restarts the transaction. This is a very simple and practical solution to deadlock
prevention and is used by several commercial DBMSs.

The use of locks, combined with the two-phase locking protocol, guarantees serializability of
schedules. The order of transactions in the equivalent serial schedule is based on the order in
which the transactions lock the items they require. If a transaction needs an item that is already
locked, it may be forced to wait until the item is released. A different approach that also
guarantees serializability uses transaction timestamps to order transaction execution for an
equivalent serial schedule.
Timestamping Methods
Timestamp methods for concurrency control are quite different from locking methods. No
locks are involved, and therefore there can be no deadlock. Locking methods generally prevent
conflicts by making transactions wait. With timestamp methods, there is no waiting:
transactions involved in conflict are simply rolled back and restarted.
• Timestamp: A unique identifier created by the DBMS that indicates the relative
starting time of a transaction. Timestamps can be generated by simply using the system
clock at the time the transaction started, or, more normally, by incrementing a logical
counter every time a new transaction starts.
• Timestamping: A concurrency control protocol that orders transactions in such a way
that older transactions, transactions with smaller timestamps, get priority in the event
of conflict.
With timestamping, if a transaction attempts to read or write a data item, then the read or write
is only allowed to proceed if the last update on that data item was carried out by an older
transaction. Otherwise, the transaction requesting the read/write is restarted and given a new
timestamp. New timestamps must be assigned to restarted transactions to prevent their being
continually aborted and restarted. Without new timestamps, a transaction with an old
timestamp might not be able to commit owing to younger transactions having already
committed.
Besides timestamps for transactions, there are timestamps for data items. Each data item
contains a read_timestamp, giving the timestamp of the last transaction to read the item, and
a write_timestamp, giving the timestamp of the last transaction to write (update) the item. For
a transaction T with timestamp ts(T), the timestamp ordering protocol works as follows.
Basic Timestamp Ordering
1. Transaction T issues a read(x)
a. Transaction T asks to read an item (x) that has already been updated by a
younger (later) transaction, that is ts(T) < write_timestamp(x). This means that
an earlier transaction is trying to read a value of an item that has been updated
by a later transaction. The earlier transaction is too late to read the previous
outdated value, and any other values it has acquired are likely to be inconsistent
with the updated value of the data item. In this situation, transaction T must be
aborted and restarted with a new (later) timestamp.
b. Otherwise, ts(T) ≥ write_timestamp(x), and the read operation can proceed. We
set read_timestamp(x) = max(ts(T), read_timestamp(x)).
2. Transaction T issues a write(x)
a. Transaction T asks to write an item (x) whose value has already been read by a
younger transaction, that is ts(T) < read_timestamp(x). This means that a later
transaction is already using the current value of the item and it would be an error
to update it now. This occurs when a transaction is late in doing a write and a
younger transaction has already read the old value or written a new one. In this
case, the only solution is to roll back transaction T and restart it using a later
timestamp.
b. Transaction T asks to write an item (x) whose value has already been written by
a younger transaction, that is ts(T) < write_timestamp(x). This means that
transaction T is attempting to write an obsolete value of data item x. Transaction
T should be rolled back and restarted using a later timestamp.
c. Otherwise, the write operation can proceed. We set write_timestamp(x) = ts(T).
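The rules above can be summarized in a short sketch; the read_ts and write_ts dictionaries and the returned strings are assumptions for illustration (a real scheduler would abort and restart the transaction rather than return a value):

    # read_ts[x] and write_ts[x] hold the current read and write timestamps of item x
    # (a missing entry is treated as timestamp 0).
    def read_item(ts_T, x, read_ts, write_ts):
        if ts_T < write_ts.get(x, 0):
            return "abort"                            # rule 1a: x was already updated by a younger transaction
        read_ts[x] = max(ts_T, read_ts.get(x, 0))     # rule 1b: the read proceeds
        return "ok"

    def write_item(ts_T, x, read_ts, write_ts):
        if ts_T < read_ts.get(x, 0) or ts_T < write_ts.get(x, 0):
            return "abort"                            # rules 2a/2b: T is too late to write x
        write_ts[x] = ts_T                            # rule 2c: the write proceeds
        return "ok"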
Basic timestamp ordering guarantees that transactions are conflict serializable, and the results
are equivalent to a serial schedule in which the transactions are executed in chronological order
of the timestamps. However, basic timestamp ordering does not guarantee recoverable
schedules. Strict Timestamp Ordering can overcome the limitation of the basic timestamp
ordering.
Strict Timestamp Ordering
1. Transaction T issues a write_item(X) operation: If TS(T) > write_TS(X), then delay T
until the transaction T’ that wrote X has terminated (committed or aborted).
2. Transaction T issues a read_item(X) operation: If TS(T) > write_TS(X), then delay T
until the transaction T’ that wrote X has terminated (committed or aborted).
Thomas’s write rule
A modification to the basic timestamp ordering protocol that relaxes conflict serializability can
be used to provide greater concurrency by rejecting obsolete write operations. The extension,
known as Thomas’s write rule, modifies the checks for a write operation by transaction T as
follows:
a) Transaction T asks to write an item (x) whose value has already been read by a
younger transaction, that is ts(T) < read_timestamp(x). As before, roll back
transaction T and restart it using a later timestamp.
b) Transaction T asks to write an item (x) whose value has already been written by a
younger transaction, that is ts(T) < write_timestamp(x). This means that a later
transaction has already updated the value of the item, and the value that the older
transaction is writing must be based on an obsolete value of the item. In this case,
the write operation can safely be ignored.
c) Otherwise, as before, the write operation can proceed. We set write_timestamp(x)
= ts(T).
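Under Thomas’s write rule only the write check changes; using the same assumed dictionaries as in the earlier sketch:

    def write_item_thomas(ts_T, x, read_ts, write_ts):
        if ts_T < read_ts.get(x, 0):
            return "abort"                            # rule (a): a younger transaction already read x
        if ts_T < write_ts.get(x, 0):
            return "ignore"                           # rule (b): obsolete write, safely skipped
        write_ts[x] = ts_T                            # rule (c): the write proceeds
        return "ok"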
Multiversion Timestamp Ordering
Versioning of data can also be used to increase concurrency, since different users may work
concurrently on different versions of the same object instead of having to wait for each other’s
transactions to complete. In the event that the work appears faulty at any stage, it should be
possible to roll back the work to some valid state.
The basic timestamp ordering protocol assumes that only one version of a data item exists, and
so only one transaction can access a data item at a time. This restriction can be relaxed if we
allow multiple transactions to read and write different versions of the same data item, and
ensure that each transaction sees a consistent set of versions for all the data items it accesses.
In multiversion concurrency control, each write operation creates a new version of a data item
while retaining the old version. When a transaction attempts to read a data item, the system
selects one of the versions that ensures serializability.
For each data item x, we assume that the database holds n versions x1, x2, . . ., xn. For each
version i, the system stores three values:
• the value of version xi;
• read_timestamp(xi), which is the largest timestamp of all transactions that have
successfully read version xi;
• write_timestamp(xi), which is the timestamp of the transaction that created version xi.
Let ts(T) be the timestamp of the current transaction. The multiversion timestamp ordering
protocol uses the following two rules to ensure serializability:

1. Transaction T issues a write(x): If transaction T wishes to write data item x, we must


ensure that the data item has not been read already by some other transaction Tj such
that ts(T) < ts(Tj). If we allow transaction T to perform this write operation, then for
serializability its change should be seen by Tj but clearly Tj, which has already read the
value, will not see T’s change. Thus, if version xj has the largest write timestamp of
data item x that is less than or equal to ts(T) (that is, write_timestamp(xj) ≤ ts(T)) and
read_timestamp(xj) > ts(T), transaction T must be aborted and restarted with a new
timestamp. Otherwise, we create a new version xi of x and set read_timestamp(xi) =
write_timestamp(xi) = ts(T).
2. Transaction T issues a read(x): If transaction T wishes to read data item x, we must
return the version xj that has the largest write timestamp of data item x that is less than
or equal to ts(T); in other words, return the version xj with the largest write_timestamp(xj)
such that write_timestamp(xj) ≤ ts(T). Set read_timestamp(xj) = max(ts(T),
read_timestamp(xj)). Note that with this protocol a read operation never fails.
Versions can be deleted once they are no longer required. To determine whether a version is
required, we find the timestamp of the oldest transaction in the system. Then, for any two
versions xi and xj of data item x with write timestamps less than this oldest timestamp, we can
delete the older version.
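A rough sketch of the two rules, assuming each version of x is stored as a dictionary with value, read_ts and write_ts fields in a list called versions, and that an initial version with write_ts = 0 always exists (both assumptions are for illustration only):

    def mv_read(ts_T, versions):
        # Rule 2: return the version with the largest write timestamp <= ts(T); a read never fails.
        xj = max((v for v in versions if v["write_ts"] <= ts_T), key=lambda v: v["write_ts"])
        xj["read_ts"] = max(ts_T, xj["read_ts"])
        return xj["value"]

    def mv_write(ts_T, versions, new_value):
        # Rule 1: find the version xj with the largest write timestamp <= ts(T).
        xj = max((v for v in versions if v["write_ts"] <= ts_T), key=lambda v: v["write_ts"])
        if xj["read_ts"] > ts_T:
            return "abort"                            # a younger transaction already read xj
        versions.append({"value": new_value, "read_ts": ts_T, "write_ts": ts_T})
        return "ok"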
Optimistic Techniques
In some environments, conflicts between transactions are rare, and the additional processing
required by locking or timestamping protocols is unnecessary for many of the transactions.
Optimistic techniques are based on the assumption that conflict is rare, and that it is more
efficient to allow transactions to proceed without imposing delays to ensure serializability.
When a transaction wishes to commit, a check is performed to determine whether conflict has
occurred. If there has been a conflict, the transaction must be rolled back and restarted. Since
the premise is that conflict occurs very infrequently, rollback will be rare.
There are two or three phases to an optimistic concurrency control protocol, depending on
whether it is a read-only or an update transaction:
• Read phase: This extends from the start of the transaction until immediately before the
commit. The transaction reads the values of all data items it needs from the database
and stores them in local variables. Updates are applied to a local copy of the data, not
to the database itself.

• Validation phase: This follows the read phase. Checks are performed to ensure
serializability is not violated if the transaction updates are applied to the database. For
a read-only transaction, this consists of checking that the data values read are still the
current values for the corresponding data items. If no interference occurred, the
transaction is committed. If interference occurred, the transaction is aborted and
restarted. For a transaction that has updates, validation consists of determining whether
the current transaction leaves the database in a consistent state, with serializability
maintained. If not, the transaction is aborted and restarted.
• Write phase: This follows the successful validation phase for update transactions.
During this phase, the updates made to the local copy are applied to the database.
The validation phase examines the reads and writes of transactions that may cause interference.
Each transaction T is assigned a timestamp at the start of its execution, start (T), one at the start
of its validation phase, validation(T), and one at its finish time, finish(T), including its write
phase, if any. To pass the validation test, one of the following must be true:
1. All transactions S with earlier timestamps must have finished before transaction T
started; that is, finish(S) < start (T).
2. If transaction T starts before an earlier one S finishes, then:
a. the set of data items written by the earlier transaction are not the ones read by
the current transaction; and
b. the earlier transaction completes its write phase before the current transaction
enters its validation phase, that is start (T) < finish(S) < validation(T).
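A sketch of the validation test of a transaction T against one earlier transaction S, assuming that the timestamps and the read/write sets were recorded during the read phase (the dictionary layout is illustrative):

    def validates_against(start_T, validation_T, read_set_T, S):
        # S is a dict holding finish(S) and the set of data items S wrote.
        if S["finish"] < start_T:                        # condition 1: S finished before T started
            return True
        no_overlap = not (S["write_set"] & read_set_T)   # condition 2a: S wrote nothing that T read
        s_done_in_time = S["finish"] < validation_T      # condition 2b: S finished before T validates
        return no_overlap and s_done_in_time

    # T passes validation only if validates_against(...) holds for every earlier transaction S
    # that overlaps with T; otherwise T is aborted and restarted.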
Although optimistic techniques are very efficient when there are few conflicts, they can result
in the rollback of individual transactions. Note that the rollback involves only a local copy of
the data so there are no cascading rollbacks, since the writes have not actually reached the
database. However, if the aborted transaction is of a long duration, valuable processing time
will be lost since the transaction must be restarted. If rollback occurs often, it is an indication
that the optimistic method is a poor choice for concurrency control in that particular
environment.
Granularity of Data Items
Granularity is the size of data items chosen as the unit of protection by a concurrency control
protocol. A lockable unit of data defines its granularity. Granularity can be coarse (entire
database) or it can be fine (a tuple or an attribute of a relation). Data item granularity
significantly affects concurrency control performance. Thus, the degree of concurrency is low
for coarse granularity and high for fine granularity. Example of data item granularity:
1. A field of a database record (an attribute of a tuple)
2. A database record (a tuple or a relation)
3. A disk block
4. An entire file
5. The entire database
The size or granularity of the data item that can be locked in a single operation has a significant
effect on the overall performance of the concurrency control algorithm. However, there are
several tradeoffs that have to be considered in choosing the data item size. Typically, a data
item size is chosen somewhere between coarse and fine, where fine granularity refers to small
item sizes and coarse granularity refers to large item sizes.
However, escalating the granularity from field or record to file may increase the likelihood of
deadlock occurring. Thus, the coarser the data item size, the lower the degree of concurrency
permitted. On the other hand, the finer the item size, the more locking information that needs
to be stored. The best item size depends upon the nature of the transactions. The following
diagram illustrates a hierarchy of granularity from coarse (database) to fine (record).
Figure: Hierarchy of granularity, with the database (DB) at the root, files (f1, f2) below it,
pages (p11 ... p1n) under each file, and records (r111 ... r11j) at the leaves.
We could represent the granularity of locks in a hierarchical structure where each node
represents data items of different sizes. Here, the root node represents the entire database, the
level 1 nodes represent files, the level 2 nodes represent pages, the level 3 nodes represent
records, and the level 4 leaves represent individual fields. Whenever a node is locked, all its
descendants are also locked.

Figure: Levels of locking

• For example, if a transaction locks a page, Page2, all its records (Record1 and Record2)
as well as all their fields (Field1 and Field2) are also locked. If another transaction
requests an incompatible lock on the same node, the DBMS clearly knows that the lock
cannot be granted.
• If another transaction requests a lock on any of the descendants of the locked node, the
DBMS checks the hierarchical path from the root to the requested node to determine if
any of its ancestors are locked before deciding whether to grant the lock. Thus, if the
request is for an exclusive lock on record Record1, the DBMS checks its parent (Page2),
its grandparent (File2), and the database itself to determine if any of them are locked.
When it finds that Page2 is already locked, it denies the request.
• Additionally, a transaction may request a lock on a node and a descendant of the node
is already locked. For example, if a lock is requested on File2, the DBMS checks every
page in the file, every record in those pages, and every field in those records to
determine if any of them are locked.
Multiple-granularity locking
To reduce the searching involved in locating locks on descendants, the DBMS can use another
specialized locking strategy called multiple-granularity locking. This strategy uses a new
type of lock called an intention lock. When any node is locked, an intention lock is placed on
all the ancestors of the node. Thus, if some descendant of File2 (in our example, Page2) is
locked and a request is made for a lock on File2, the presence of an intention lock on File2
indicates that some descendant of that node is already locked.

Intention locks may be either Shared (read) or eXclusive (write). An intention shared (IS)
lock conflicts only with an exclusive lock; an intention exclusive (IX) lock conflicts with both
a shared and an exclusive lock.
In addition, a transaction can hold a shared and intention exclusive (SIX) lock that is logically
equivalent to holding both a shared and an IX lock. A SIX lock conflicts with any lock that
conflicts with either a shared or IX lock; in other words, a SIX lock is compatible only with
an IS lock. The lock compatibility table for multiple-granularity locking is shown in the next
table.

Figure: Lock compatibility table for multiple-granularity locking

            IS      IX      S       SIX     X
    IS      Yes     Yes     Yes     Yes     No
    IX      Yes     Yes     No      No      No
    S       Yes     No      Yes     No      No
    SIX     Yes     No      No      No      No
    X       No      No      No      No      No

To ensure serializability with locking levels, a two-phase locking protocol is used as follows:
• No lock can be granted once any node has been unlocked.
• No node may be locked until its parent is locked by an intention lock.
• No node may be unlocked until all its descendants are unlocked.
In this way, locks are applied from the root down using intention locks until the node requiring
an actual shared or exclusive lock is reached, and locks are released from the bottom up.
However, deadlock is still possible and must be handled as discussed previously.
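A sketch of how a lock request might be processed under this protocol follows; the compatibility matrix encodes the table above, while the function, its arguments, and the way held locks are represented are assumptions for illustration (waiting, lock release, and error handling are omitted):

    # COMPATIBLE[requested][held] is True when the requested mode can coexist with a held mode.
    COMPATIBLE = {
        "IS":  {"IS": True,  "IX": True,  "S": True,  "SIX": True,  "X": False},
        "IX":  {"IS": True,  "IX": True,  "S": False, "SIX": False, "X": False},
        "S":   {"IS": True,  "IX": False, "S": True,  "SIX": False, "X": False},
        "SIX": {"IS": True,  "IX": False, "S": False, "SIX": False, "X": False},
        "X":   {"IS": False, "IX": False, "S": False, "SIX": False, "X": False},
    }

    def lock_node(path_from_root, node, mode, held_locks):
        # path_from_root lists the ancestors of node, root first, e.g. ["DB", "File2", "Page2"].
        # held_locks maps each node to the set of modes currently held on it by other transactions.
        intention = "IS" if mode == "S" else "IX"
        for ancestor in path_from_root:               # intention locks are taken top-down
            if not all(COMPATIBLE[intention][held] for held in held_locks.get(ancestor, set())):
                return False                          # conflict on an ancestor: the requester must wait
            held_locks.setdefault(ancestor, set()).add(intention)
        if not all(COMPATIBLE[mode][held] for held in held_locks.get(node, set())):
            return False                              # conflict on the node itself
        held_locks.setdefault(node, set()).add(mode)
        return True

For example, lock_node(["DB", "File2", "Page2"], "Record1", "X", held_locks) places IX locks on DB, File2 and Page2 and then an X lock on Record1, mirroring the example in the text.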
Using Locks for Concurrency Control in Indexes
• Two-phase locking can also be applied to indexes, where the nodes of an index correspond
to disk pages.
• However, holding locks on index pages until the shrinking phase of 2PL could cause an
undue amount of transaction blocking because searching an index always starts at the root.
• Therefore, if a transaction wants to insert a record (write operation), the root would be
locked in exclusive mode, so all other conflicting lock requests for the index must wait until
the transaction enters its shrinking phase.
• This blocks all other transactions from accessing the index, so in practice other approaches
to locking an index must be used.

Chapter Five
Database Recovery Techniques
Database recovery is the process of restoring the database to a correct state in the event of a
failure, so as to preserve the transaction properties (Atomicity, Consistency, Isolation and Durability).
The Need for Recovery
The storage of data generally includes four different types of media with an increasing degree
of reliability: main memory, magnetic disk, magnetic tape, and optical disk. Main memory is
volatile storage that usually does not survive system crashes. Magnetic disks provide online
non-volatile storage. Compared with main memory, disks are more reliable and much cheaper,
but slower by three to four orders of magnitude.
Magnetic tape is an offline non-volatile storage medium, which is far more reliable than disk
and fairly inexpensive, but slower, providing only sequential access. Optical disk is more
reliable than tape, generally cheaper, faster, and providing random access. Main memory is
also referred to as primary storage and disks and tape as secondary storage. Stable storage
represents information that has been replicated in several non-volatile storage media (usually
disk) with independent failure modes.
There are many different types of failure that can affect database processing, each of which has
to be dealt with in a different manner. Some failures affect main memory only, while others
involve non-volatile (secondary) storage. Among the causes of failure are:
• system crashes due to hardware or software errors, resulting in loss of main memory;
• media failures, such as head crashes or unreadable media, resulting in the loss of parts
of secondary storage;
• application software errors, such as logical errors in the program that is accessing the
database, which cause one or more transactions to fail;
• natural physical disasters, such as fires, floods, earthquakes, or power failures;
• carelessness or unintentional destruction of data or facilities by operators or users;
• sabotage, or intentional corruption or destruction of data, hardware, or software
facilities.
Transactions and Recovery
Transactions represent the basic unit of recovery in a database system. It is the role of the
recovery manager to guarantee two of the four ACID properties of transactions, namely
atomicity and durability, in the presence of failures. The recovery manager has to ensure that,
on recovery from failure, either all the effects of a given transaction are permanently recorded
in the database or none of them are.
Types of Failure
The database may become unavailable for use due to:
• Transaction failure: Transactions may fail because of incorrect input, deadlock,
incorrect synchronization.
• System failure: System may fail because of addressing error, application error,
operating system fault, RAM failure, etc.
• Media failure: Disk head crash, power disruption, etc.

Recovery Facilities
A DBMS should provide the following facilities to assist with recovery:
• a backup mechanism, which makes periodic backup copies of the database;
• logging facilities, which keep track of the current state of transactions and database
changes;
• a checkpoint facility, which enables updates to the database that are in progress to be
made permanent;
• a recovery manager, which allows the system to restore the database to a consistent
state following a failure.

Backup mechanism
The DBMS should provide a mechanism to allow backup copies of the database and the log
file to be made at regular intervals without necessarily having to stop the system first. The
backup copy of the database can be used in the event that the database has been damaged or
destroyed. A backup can be a complete copy of the entire database or an incremental backup,
consisting only of modifications made since the last complete or incremental backup.
Typically, the backup is stored on offline storage, such as magnetic tape.

Log file
To keep track of database transactions, the DBMS maintains a special file called a log (or
journal) that contains information about all updates to the database. The log may contain the
following data:
• Transaction records, containing:
o transaction identifier;

o type of log record (transaction start, insert, update, delete, abort, commit);
o identifier of data item affected by the database action (insert, delete, and
update operations);
o before-image of the data item, that is, its value before change (update and
delete operations only);
o after-image of the data item, that is, its value after change (insert and update
operations only);
o log management information, such as a pointer to previous and next log
records for that transaction (all operations).
• Checkpoint records

Data Update
• Immediate Update: As soon as a data item is modified in cache, the disk copy is
updated.
• Deferred Update: All modified data items in the cache are written either after a
transaction ends its execution or after a fixed number of transactions have completed
their execution.
• Shadow update: The modified version of a data item does not overwrite its disk copy
but is written at a separate disk location.
• In-place update: The disk version of the data item is overwritten by the cache version.

Data Caching
Data items to be modified are first brought into the database cache by the Cache Manager (CM)
and, after modification, they are flushed (written) to the disk. The flushing is controlled by
Modified and Pin-Unpin bits:
• Pin-Unpin: A pinned data item instructs the operating system not to flush it from the cache.
• Modified: Indicates that the data item has been updated, i.e. the cache holds its AFIM (after image).

Transaction Roll-back (Undo) and Roll-Forward (Redo)


To maintain atomicity, a transaction’s operations are redone or undone.
• Undo: Restore all BFIMs (before images) onto disk (remove all AFIMs).
• Redo: Restore all AFIMs (after images) onto disk.
Database recovery is achieved either by performing only Undos or only Redos or by a
combination of the two. These operations are recorded in the log as they happen.

Write-Ahead Logging
When in-place updating (immediate or deferred) is used, the log is necessary for recovery and
it must be available to the recovery manager. This is achieved by the Write-Ahead Logging (WAL)
protocol. WAL states that:
• For Undo: Before a data item’s AFIM is flushed to the database disk (overwriting the
BFIM) its BFIM must be written to the log and the log must be saved on a stable store
(log disk).
• For Redo: Before a transaction executes its commit operation, all its AFIMs must be
written to the log and the log must be saved on a stable store.
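A minimal sketch of the two rules; the log and page objects and their methods are illustrative assumptions, not a real DBMS interface:

    def flush_page(page, log):
        # Undo rule: the BFIM must reach the stable log before the AFIM overwrites the disk copy.
        log.append({"page": page.id, "bfim": page.before_image, "afim": page.after_image})
        log.force_to_stable_storage()
        page.write_to_database_disk()

    def commit(transaction, log):
        # Redo rule: all of the transaction's AFIMs must reach the stable log
        # before the commit record is written and the transaction is committed.
        for record in transaction.update_records:     # each record carries the AFIM of an updated item
            log.append(record)
        log.append({"type": "commit", "txn": transaction.id})
        log.force_to_stable_storage()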

Checkpointing
A checkpoint is the point of synchronization between the database and the transaction log file.
All buffers are force-written to secondary storage. The information in the log file is used to
recover from a database failure. One difficulty with this scheme is that when a failure occurs,
we may not know how far back in the log to search and we may end up redoing transactions
that have been safely written to the database. To limit the amount of searching and subsequent
processing that we need to carry out on the log file, we can use a technique called
checkpointing. Checkpoints are scheduled at predetermined intervals and involve the
following operations:
• writing all log records in main memory to secondary storage;
• writing the modified blocks in the database buffers to secondary storage;
• writing a checkpoint record to the log file. This record contains the identifiers of all
transactions that are active at the time of the checkpoint.
If transactions are performed serially, then, when a failure occurs, we check the log file to find
the last transaction that started before the last checkpoint. Any earlier transactions would have
committed previously and would have been written to the database at the checkpoint.
Therefore, we need only redo the one that was active at the checkpoint and any subsequent
transactions for which both start and commit records appear in the log. If a transaction is active
at the time of failure, the transaction must be undone. If transactions are performed
concurrently, we redo all transactions that have committed since the checkpoint and undo all
transactions that were active at the time of the crash.
Generally, checkpointing is a relatively inexpensive operation, and it is often possible to take
three or four checkpoints an hour. In this way, no more than 15–20 minutes of work will need
to be recovered.

Possible ways for flushing database cache to database disk


The following terminology is used in database recovery when pages are written back to disk:
• A steal policy allows the buffer manager to write a buffer to disk before a transaction
commits (the buffer is unpinned). In other words, the buffer manager ‘steals’ a page
from the transaction. The alternative policy is no-steal.
• A force policy ensures that all pages updated by a transaction are immediately written
to disk when the transaction commits. The alternative policy is no-force.
These give rise to four different ways for handling recovery:
• Steal/No-Force (Undo/Redo)
• Steal/Force (Undo/No-redo)
• No-Steal/No-Force (Redo/No-undo)
• No-Steal/Force (No-undo/No-redo)
The simplest approach from an implementation perspective is to use a no-steal, force policy:
with no-steal we do not have to undo changes of an aborted transaction because the changes
will not have been written to disk, and with force we do not have to redo the changes of a
committed transaction if there is a subsequent crash because all the changes will have been
written to disk at commit. The deferred update recovery protocol uses a no-steal policy.
On the other hand, the steal policy avoids the need for a very large buffer space to store all
updated pages by a set of concurrent transactions, which in practice may be unrealistic anyway.
In addition, the no-force policy has the distinct advantage that a later transaction does not have
to rewrite to disk a page that has been updated by an earlier committed transaction and may
still be in a database buffer. For these reasons, most DBMSs employ a steal, no-force policy.

Recovery Techniques
The particular recovery procedure to be used is dependent on the extent of the damage that has
occurred to the database. We consider two cases:
1. If the database has been extensively damaged, for example a disk head crash has
occurred and destroyed the database, then it is necessary to restore the last backup copy
of the database and reapply the update operations of committed transactions using the
log file. This assumes, of course, that the log file has not been damaged as well.
2. If the database has not been physically damaged but has become inconsistent, for
example the system crashed while transactions were executing, then it is necessary to
undo the changes that caused the inconsistency. It may also be necessary to redo some
transactions to ensure that the updates they performed have reached secondary storage.

Here, we do not need to use the backup copy of the database but can restore the database
to a consistent state using the before- and after-images held in the log file.
There are two techniques for recovery from the second situation, that is, the case where the
database has not been destroyed but is in an inconsistent state. The techniques, known as
deferred update and immediate update, differ in the way that updates are written to
secondary storage. There is also an alternative technique called shadow paging.

Recovery Schemes
Recovery techniques using deferred update
Using the deferred update recovery protocol, updates are not written to the database until after
a transaction has reached its commit point. If a transaction fails before it reaches this point, it
will not have modified the database and so no undoing of changes will be necessary. However,
it may be necessary to redo the updates of committed transactions as their effect may not have
reached the database. In this case, we use the log file to protect against system failures in the
following way:
• When a transaction starts, write a transaction start record to the log.
• When any write operation is performed, write a log record containing all the log data
specified previously (excluding the before-image of the update). Do not actually write
the update to the database buffers or the database itself.
• When a transaction is about to commit, write a transaction commit log record, write all
the log records for the transaction to disk, and then commit the transaction. Use the log
records to perform the actual updates to the database.
• If a transaction aborts, ignore the log records for the transaction and do not perform the
writes.
Note that we write the log records to disk before the transaction is actually committed, so that
if a system failure occurs while the actual database updates are in progress, the log records will
survive and the updates can be applied later. In the event of a failure, we examine the log to
identify the transactions that were in progress at the time of failure. Starting at the last entry in
the log file, we go back to the most recent checkpoint record:
• Any transaction with transaction start and transaction commit log records should be
redone. The redo procedure performs all the writes to the database using the afterimage
log records for the transactions, in the order in which they were written to the log. If
this writing has been performed already, before the failure, the write has no effect on
the data item, so there is no damage done if we write the data again (that is, the operation
is idempotent). However, this method guarantees that we will update any data item
that was not properly updated prior to the failure.
• For any transactions with transaction start and transaction abort log records, we do
nothing since no actual writing was done to the database, so these transactions do not
have to be undone.
If a second system crash occurs during recovery, the log records are used again to restore the
database. With the form of the write log records, it does not matter how many times we redo
the writes.
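A sketch of the redo-only restart procedure for deferred update, assuming the log is a list of dictionaries with type, txn, item and after_image fields (this layout is an illustrative assumption):

    def restart_deferred(log, database):
        committed = {r["txn"] for r in log if r["type"] == "commit"}
        # Redo, in log order, every write of a transaction that committed.
        for r in log:
            if r["type"] == "write" and r["txn"] in committed:
                database[r["item"]] = r["after_image"]   # idempotent: re-applying a write is harmless
        # Transactions with no commit record never reached the database, so nothing is undone.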

Deferred Update in a single-user system


There is no concurrent data sharing in a single-user system. The data update proceeds as follows:
• A set of transactions record their updates in the log.
• At the commit point, under the WAL scheme, these updates are saved on the database disk.
After rebooting from a failure, the log is used to redo all the transactions affected by the failure.
No undo is required because no AFIM is flushed to the disk before a transaction commits.

Deferred Update with concurrent users


This environment requires a concurrency control mechanism to guarantee the isolation
property of transactions. During system recovery, transactions recorded in the log after
the last checkpoint are redone. The recovery manager may also scan some of the transactions
recorded before the checkpoint to obtain their AFIMs.
Two tables are required for implementing this protocol:
• Active table: All active transactions are entered in this table.
• Commit table: Transactions to be committed are entered in this table.
Recovery techniques using immediate update (Undo/No-redo Algorithm)
Using the immediate update recovery protocol, updates are applied to the database as they
occur without waiting to reach the commit point. As well as having to redo the updates of
committed transactions following a failure, it may now be necessary to undo the effects of
transactions that had not committed at the time of failure. In this case, we use the log file to
protect against system failures in the following way:
• When a transaction starts, write a transaction start record to the log.
• When a write operation is performed, write a record containing the necessary data
to the log file.
• Once the log record is written, write the update to the database buffers.

• The updates to the database itself are written when the buffers are next flushed to
secondary storage.
• When the transaction commits, write a transaction commit record to the log.
It is essential that log records (or at least certain parts of them) are written before the
corresponding write to the database. This is known as the write-ahead log protocol. If updates
were made to the database first, and failure occurred before the log record was written, then
the recovery manager would have no way of undoing (or redoing) the operation. Under the
write-ahead log protocol, the recovery manager can safely assume that, if there is no transaction
commit record in the log file for a particular transaction then that transaction was still active at
the time of failure and must therefore be undone.
If a transaction aborts, the log can be used to undo it since it contains all the old values for the
updated fields. As a transaction may have performed several changes to an item, the writes are
undone in reverse order. Regardless of whether the transaction’s writes have been applied to
the database itself, writing the before-images guarantees that the database is restored to its state
prior to the start of the transaction. If the system fails, recovery involves using the log to undo
or redo transactions:
• For any transaction for which both a transaction start and transaction commit record
appear in the log, we redo using the log records to write the after-image of updated
fields, as described above. Note that if the new values have already been written to the
database, these writes, although unnecessary, will have no effect. However, any write
that did not actually reach the database will now be performed.
• For any transaction for which the log contains a transaction start record but not a
transaction commit record, we need to undo that transaction. This time the log records
are used to write the before-image of the affected fields, and thus restore the database
to its state prior to the transaction’s start. The undo operations are performed in the
reverse order to which they were written to the log.
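A sketch of restart for immediate update, using the same illustrative log layout as before but with before images as well:

    def restart_immediate(log, database):
        started = {r["txn"] for r in log if r["type"] == "start"}
        committed = {r["txn"] for r in log if r["type"] == "commit"}
        # Redo committed transactions forwards, writing after-images.
        for r in log:
            if r["type"] == "write" and r["txn"] in committed:
                database[r["item"]] = r["after_image"]
        # Undo uncommitted transactions backwards, writing before-images.
        for r in reversed(log):
            if r["type"] == "write" and r["txn"] in (started - committed):
                database[r["item"]] = r["before_image"]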
Undo/Redo Algorithm (Single-user environment)
• Recovery schemes of this category apply undo and also redo for recovery.
• In a single-user environment no concurrency control is required but a log is maintained
under WAL. Note that at any time there will be one transaction in the system and it will
be either in the commit table or in the active table. The recovery manager performs:
– Undo of a transaction if it is in the active table.
– Redo of a transaction if it is in the commit table.

Undo/Redo Algorithm (Concurrent execution)


• Recovery schemes of this category apply both undo and redo to recover the database
from failure.
• In a concurrent execution environment, concurrency control is required and the log is
maintained under WAL.
• The commit table records transactions to be committed and the active table records active
transactions. To minimize the work of the recovery manager, checkpointing is used.
The recovery performs:
– Undo of a transaction if it is in the active table.
– Redo of a transaction if it is in the commit table.
Shadow paging
An alternative to the log-based recovery schemes described above is shadow paging. This
scheme maintains two page tables during the life of a transaction: a current page table and a
shadow page table. When the transaction starts, the two page tables are the same. The shadow
page table is never changed thereafter, and is used to restore the database in the event of a
system failure. During the transaction, the current page table is used to record all updates to the
database. When the transaction completes, the current page table becomes the shadow page
table.
Shadow paging has several advantages over the log-based schemes: the overhead of
maintaining the log file is eliminated, and recovery is significantly faster since there is no need
for undo or redo operations. However, it has disadvantages as well, such as data fragmentation
and the need for periodic garbage collection to reclaim inaccessible blocks.

The ARIES Recovery Algorithm


The ARIES Recovery Algorithm is based on:
• WAL (Write Ahead Logging)
• Repeating history during redo:
o ARIES will retrace all actions of the database system prior to the crash to
reconstruct the database state when the crash occurred.
• Logging changes during undo:
o It will prevent ARIES from repeating the completed undo operations if a failure
occurs during recovery, which causes a restart of the recovery process.
The ARIES recovery algorithm consists of three steps:

1. Analysis: identifies the dirty (updated) pages in the buffer and the set of
transactions active at the time of the crash. The appropriate point in the log where
redo is to start is also determined.
2. Redo: the necessary redo operations are applied.
3. Undo: the log is scanned backwards and the operations of transactions that were active at
the time of the crash are undone in reverse order.
The Log and Log Sequence Number (LSN)
• A log record is written for:
(a) data update
(b) transaction commit
(c) transaction abort
(d) undo
(e) transaction end
• In the case of undo a compensating log record is written.
• A unique LSN is associated with every log record.
o LSN increases monotonically and indicates the disk address of the log record it
is associated with.
o In addition, each data page stores the LSN of the latest log record corresponding
to a change for that page.
• A log record stores
(a) the previous LSN of that transaction
(b) the transaction ID
(c) the type of log record.
• For a write operation the following additional information is logged:
1. Page ID for the page that includes the item
2. Length of the updated item
3. Its offset from the beginning of the page
4. BFIM of the item
5. AFIM of the item
The Transaction table and the Dirty Page table
• For efficient recovery, the following tables are also stored in the log during checkpointing:

o Transaction table: Contains an entry for each active transaction, with


information such as transaction ID, transaction status and the LSN of the
most recent log record for the transaction.
o Dirty Page table: Contains an entry for each dirty page in the buffer, which
includes the page ID and the LSN corresponding to the earliest update to that
page.
The following steps are performed for recovery:
– Analysis phase: Starts at the begin_checkpoint record and proceeds to the
end_checkpoint record, retrieving the transaction table and dirty page table that were
appended to the log at the checkpoint. The log is then scanned forward to its end, during
which the transaction table and dirty page table may be modified by further log records.
The analysis phase compiles the set of redo and undo operations to be performed and then ends.
– Redo phase: Starts from the point in the log before which all dirty pages are known to
have been flushed (determined from the dirty page table) and moves forward to the end of
the log. Any change that affects a page in the dirty page table is redone.
– Undo phase: Starts from the end of the log and proceeds backwards, undoing the
operations of transactions that were active at the time of the crash. For each undo it
writes a compensating log record.
Recovery completes at the end of the undo phase.

Recovery in Multidatabase system


• A multidatabase system is a special distributed database system in which one node may
be running a relational database system under UNIX, another may be running an object-
oriented system under Windows, and so on.
• A transaction may run in a distributed fashion at multiple nodes.
• In this execution scenario the transaction commits only when all these multiple nodes
agree to commit individually the part of the transaction they were executing.
• This commit scheme is referred to as “two-phase commit” (2PC).
– If any one of these nodes fails or cannot commit the part of the transaction, then
the transaction is aborted.
– Each node recovers the transaction under its own recovery protocol.

Chapter Six
Database Security and Authorization
Introduction to DB Security Issues
Database security involves the mechanisms that protect the database against intentional or
accidental threats. Security considerations apply not only to the data held in a database:
breaches of security may affect other parts of the system, which may in turn affect the database.
Consequently, database security encompasses hardware, software, people, and data. To
effectively implement security requires appropriate controls, which are defined in specific
mission objectives for the system.
A database represents an essential corporate resource that should be properly secured using
appropriate controls. We consider database security in relation to the following situations:
• theft and fraud: affects not only the database environment but also the entire
organization;
• loss of confidentiality (secrecy): refers to the need to maintain secrecy over data,
usually only that which is critical to the organization;
• loss of privacy: refers to the need to protect data about individuals;
• loss of integrity: results in invalid or corrupted data, which may seriously affect the
operation of an organization;
• loss of availability: means that the data, or the system, or both cannot be accessed,
which can seriously affect an organization’s financial performance. In some cases,
events that cause a system to be unavailable may also cause data corruption.
Threats
A threat is any situation or event, whether intentional or accidental, that may adversely affect
a system and consequently the organization. A threat may be caused by a situation or event
involving a person, action, or circumstance that is likely to bring harm to an organization. The
harm may be tangible, such as loss of hardware, software, or data, or intangible, such as loss
of credibility or client confidence.
The extent that an organization suffers as a result of a threat’s succeeding depends upon a
number of factors, such as the existence of countermeasures and contingency plans. For
example, if a hardware failure occurs corrupting secondary storage, all processing activity must
cease until the problem is resolved. The recovery will depend upon a number of factors, which
include when the last backups were taken and the time needed to restore the system. An
organization needs to identify the types of threat it may be subjected to and initiate appropriate
plans and countermeasures, bearing in mind the costs of implementing them.

Countermeasures
The types of countermeasure to threats on computer systems range from physical controls to
administrative procedures. To protect databases against these types of threats four kinds of
countermeasures can be implemented:
• Access control
• Inference control
• Flow control
• Encryption
A DBMS typically includes a database security and authorization subsystem that is responsible
for ensuring the security of portions of a database against unauthorized access. There are two types of
database security mechanisms:
• Discretionary security mechanisms
• Mandatory security mechanisms
Access Controls
The typical way to provide access controls for a database system is based on the granting and
revoking of privileges.
• A privilege allows a user to create or access (that is read, write, or modify) some
database object (such as a relation, view, or index) or to run certain DBMS utilities.
• Because excessive granting of unnecessary privileges can compromise security, a privilege
should only be granted to a user if that user cannot accomplish his or her work without
that privilege.
• A user who creates a database object such as a relation or a view automatically gets all
privileges on that object.
• The DBMS subsequently keeps track of how these privileges are granted to other users,
and possibly revoked, and ensures that at all times only users with necessary privileges
can access an object.
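To make this concrete, a brief sketch of granting and revoking a privilege in SQL follows; the relation EMPLOYEE and the account clerk1 are hypothetical, and fuller GRANT/REVOKE examples are worked through later in this chapter:
    -- grant only what the account needs for its work (least privilege)
    GRANT SELECT ON EMPLOYEE TO clerk1;
    -- remove the privilege again once it is no longer required
    REVOKE SELECT ON EMPLOYEE FROM clerk1;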
Inference control
One security problem associated with databases is that of controlling access to a statistical
database, which is used to provide statistical information or summaries of values based on


various criteria. A statistical database is a database used for statistical analysis purposes; it
typically supports OLAP (online analytical processing). The countermeasures to the statistical
database security problem are called inference control measures.
Flow control
Another security issue is that of flow control, which prevents information from flowing in such a
way that it reaches unauthorized users.
• Channels that are pathways for information to flow implicitly in ways that violate the
security policy of an organization are called covert channels.
Data encryption
A final security issue is data encryption, which is used to protect sensitive data (such as credit
card numbers) that is being transmitted via some type of communication network. Encryption
involves the encoding of the data by a special algorithm that renders the data unreadable by
any program without the decryption key.
• The data is encoded using some encoding algorithm.
• An unauthorized user who accesses encoded data will have difficulty deciphering it, but
authorized users are given decoding or decrypting algorithms (or keys) to decipher data.
Encryption also protects data transmitted over communication lines. There are a number of
techniques for encoding data to conceal the information; some are termed ‘irreversible’ and
others ‘reversible’. Irreversible techniques, as the name implies, do not permit the original
data to be known. However, the data can be used to obtain valid statistical information.
Reversible techniques are more commonly used. To transmit data securely over insecure
networks requires the use of a cryptosystem, which includes:
• an encryption key to encrypt the data (plaintext);
• an encryption algorithm that, with the encryption key, transforms the plaintext into
ciphertext;
• a decryption key to decrypt the ciphertext;
• a decryption algorithm that, with the decryption key, transforms the ciphertext back
into plaintext.
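As an illustration of reversible encryption inside the database itself, the sketch below assumes a PostgreSQL system with the pgcrypto extension; the payment table, its bytea column card_no_enc, and the key string are all hypothetical:
    CREATE EXTENSION IF NOT EXISTS pgcrypto;
    -- store the card number as ciphertext rather than plaintext
    INSERT INTO payment (customer_id, card_no_enc)
    VALUES (42, pgp_sym_encrypt('4111111111111111', 'encryption-key'));
    -- an authorized application holding the key can recover the plaintext
    SELECT pgp_sym_decrypt(card_no_enc, 'encryption-key')
    FROM payment
    WHERE customer_id = 42;
In practice the key would be held by the application or a key-management service rather than embedded in queries, and protecting data in transit (for example with TLS) is handled separately at the network layer.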
Authorization
Authorization: The granting of a right or privilege that enables a subject to have legitimate
access to a system or a system’s object. Authorization controls can be built into the software,
and govern not only what system or object a specified user can access, but also what the user


may do with it. The process of authorization involves authentication of subjects requesting
access to objects.
Authentication: A mechanism that determines whether a user is who he or she claims
to be.
Database Security and the DBA
A system administrator is usually responsible for allowing users to have access to a computer
system by creating individual user accounts.
• Each user is given a unique identifier, which is used by the operating system to
determine who they are.
• Associated with each identifier is a password, chosen by the user and known to the
operating system.
The responsibility to authorize use of the DBMS usually rests with the Database Administrator
(DBA), who must also set up individual user accounts and passwords using the DBMS itself.
The DBA’s responsibilities include:
• granting privileges to users who need to use the system
• classifying users and data in accordance with the policy of the organization
The DBA has a DBA account in the DBMS, sometimes called a system or superuser account.
These accounts provide powerful capabilities such as:
1. Account creation
2. Privilege granting
3. Privilege revocation
4. Security level assignment
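As a rough sketch of the first three capabilities, assuming an Oracle-style DBMS (the user name a1 and the password are hypothetical; security level assignment has no standard SQL form and is DBMS-specific):
    -- 1. account creation
    CREATE USER a1 IDENTIFIED BY s3cret_pw;
    -- 2. privilege granting: the right to connect and to create tables
    GRANT CREATE SESSION, CREATE TABLE TO a1;
    -- 3. privilege revocation
    REVOKE CREATE TABLE FROM a1;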

Access Protection, User Accounts, and Database Audits


Whenever a person or group of persons needs to access a database system, the individual or
group must first apply for a user account.
• The DBA will then create a new account id and password for the user if he/she deems
there is a legitimate need to access the database.
• The user must log in to the DBMS by entering account id and password whenever
database access is needed.


• To keep a record of all updates applied to the database and of the particular user who
applied each update, we can modify the system log, which includes an entry for each
operation applied to the database that may be required for recovery from a transaction
failure or system crash.
• If any tampering with the database is suspected, a database audit is performed
o A database audit consists of reviewing the log to examine all accesses and
operations applied to the database during a certain time period.
• A database log that is used mainly for security purposes is sometimes called an audit
trail.
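As a minimal sketch of a do-it-yourself audit trail (most DBMSs offer built-in audit facilities instead), the trigger below assumes MySQL-style syntax and a hypothetical employee relation:
    -- table that records who changed what, and when
    CREATE TABLE audit_trail (
        account_id  VARCHAR(60),
        table_name  VARCHAR(30),
        operation   VARCHAR(10),
        changed_at  TIMESTAMP
    );

    -- write one audit row for every update applied to employee
    CREATE TRIGGER employee_audit
    AFTER UPDATE ON employee
    FOR EACH ROW
        INSERT INTO audit_trail
        VALUES (CURRENT_USER, 'employee', 'UPDATE', CURRENT_TIMESTAMP);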

Discretionary Access Control (DAC)


Most commercial DBMSs provide an approach to managing privileges, based on SQL, called
Discretionary Access Control (DAC). The SQL standard supports DAC through the GRANT
and REVOKE commands. The GRANT command gives privileges to users, and the REVOKE
command takes away privileges.
Types of Discretionary Privileges
• The account level:
o At this level, the DBA specifies the particular privileges that each account holds
independently of the relations in the database.
• The relation level (or table level):
o At this level, the DBA can control the privilege to access each individual
relation or view in the database.
The privileges at the account level apply to the capabilities provided to the account itself and
can include:
– the CREATE SCHEMA or CREATE TABLE privilege, to create a schema or base
relation;
– the CREATE VIEW privilege;
– the ALTER privilege, to apply schema changes such as adding or removing attributes
from relations;
– the DROP privilege, to delete relations or views;
– the MODIFY privilege, to insert, delete, or update tuples;
– and the SELECT privilege, to retrieve information from the database by using a
SELECT query.


The second level of privileges applies to the relation level. This includes base relations and
virtual (view) relations. The granting and revoking of privileges generally follow an
authorization model for discretionary privileges known as the access matrix model where:
– The rows of a matrix M represent subjects (users, accounts, programs)
– The columns represent objects (relations, records, columns, views, operations).
– Each position M(i,j) in the matrix represents the types of privileges (read, write, update)
that subject i holds on object j.
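For illustration, a small fragment of such a matrix might look as follows (the accounts and relations are hypothetical):
                    EMPLOYEE         DEPARTMENT
    A1 (owner)      read, write      read, write
    A2              read             -
    A3              -                read
Each row lists what one account may do; granting or revoking a privilege simply updates the corresponding matrix entry.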
To control the granting and revoking of relation privileges, each relation R in a database is
assigned an owner account, which is typically the account that was used when the relation was
created in the first place.
– The owner of a relation is given all privileges on that relation.
– In SQL2, the DBA can assign an owner to a whole schema by creating the schema and
associating the appropriate authorization identifier with that schema, using the
CREATE SCHEMA command.
– The owner account holder can pass privileges on any of the owned relations to other
users by granting privileges to their accounts.
In SQL the following types of privileges can be granted on each individual relation R:
– SELECT (retrieval or read) privilege on R:
– Gives the account retrieval privilege.
– In SQL this gives the account the privilege to use the SELECT statement to
retrieve tuples from R.
– MODIFY privileges on R:
– This gives the account the capability to modify tuples of R.
– In SQL this privilege is further divided into UPDATE, DELETE, and INSERT
privileges to apply the corresponding SQL command to R.
– In addition, both the INSERT and UPDATE privileges can specify that only
certain attributes can be updated by the account.
– REFERENCES privilege on R:
– This gives the account the capability to reference relation R when specifying
integrity constraints.
– The privilege can also be restricted to specific attributes of R.
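For instance (relation, attribute, and account names are hypothetical), the REFERENCES privilege can be granted on a single attribute so that another account may use it in a foreign key:
    GRANT REFERENCES (Dnumber) ON DEPARTMENT TO B;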
Notice that to create a view, the account must have SELECT privilege on all relations involved
in the view definition.

Specifying Privileges Using Views


The mechanism of views is an important discretionary authorization mechanism in its own
right. For example,
• If the owner A of a relation R wants another account B to be able to retrieve only some
fields of R, then A can create a view V of R that includes only those attributes and then
grant SELECT on V to B.
• The same applies to limiting B to retrieving only certain tuples of R; a view V’ can be
created by defining the view by means of a query that selects only those tuples from R
that A wants to allow B to access.

Revoking Privileges
In some cases it is desirable to grant a privilege to a user temporarily. For example,
• The owner of a relation may want to grant the SELECT privilege to a user for a specific
task and then revoke that privilege once the task is completed.
• Hence, a mechanism for revoking privileges is needed. In SQL, a REVOKE command
is included for the purpose of canceling privileges.
Propagation of Privileges using the GRANT OPTION
Whenever the owner A of a relation R grants a privilege on R to another account B, the privilege
can be given to B with or without the GRANT OPTION.
If the GRANT OPTION is given, this means that B can also grant that privilege on R to other
accounts.
• Suppose that B is given the GRANT OPTION by A and that B then grants the privilege
on R to a third account C, also with GRANT OPTION. In this way, privileges on R
can propagate to other accounts without the knowledge of the owner of R.
• If the owner account A now revokes the privilege granted to B, all the privileges that B
propagated based on that privilege should automatically be revoked by the system.
Example:
• Suppose that the DBA creates four accounts
– A1, A2, A3, A4
• and wants only A1 to be able to create base relations. Then the DBA must issue the
following GRANT command in SQL
GRANT CREATETAB TO A1;

Example:


• Suppose that A1, who owns the EMPLOYEE and DEPARTMENT relations, wants to allow A3
to retrieve information from either of the two tables and also to be able to propagate the
SELECT privilege to other accounts.
• A1 can issue the command:
GRANT SELECT ON EMPLOYEE, DEPARTMENT
TO A3 WITH GRANT OPTION;
• A3 can grant the SELECT privilege on the EMPLOYEE relation to A4 by issuing:
GRANT SELECT ON EMPLOYEE TO A4;
– Notice that A4 can’t propagate the SELECT privilege because GRANT
OPTION was not given to A4
Example:
• Suppose that A1 decides to revoke the SELECT privilege on the EMPLOYEE relation
from A3; A1 can issue:
REVOKE SELECT ON EMPLOYEE FROM A3;
• The DBMS must now automatically revoke the SELECT privilege on EMPLOYEE
from A4, too, because A3 granted that privilege to A4 and A3 does not have the
privilege any more.
Example:
• Suppose that A1 wants to give back to A3 a limited capability to SELECT from the
EMPLOYEE relation and wants to allow A3 to be able to propagate the privilege.
• The limitation is to retrieve only the NAME, BDATE, and ADDRESS
attributes and only for the tuples with DNO=5.
• A1 then creates the view:
CREATE VIEW A3EMPLOYEE AS
SELECT NAME, BDATE, ADDRESS
FROM EMPLOYEE
WHERE DNO = 5;
After the view is created, A1 can grant SELECT on the view A3EMPLOYEE to A3
as follows:
GRANT SELECT ON A3EMPLOYEE TO A3
WITH GRANT OPTION;
Example:
• Finally, suppose that A1 wants to allow A4 to update only the SALARY attribute of
EMPLOYEE;


• A1 can issue:
GRANT UPDATE ON EMPLOYEE (SALARY) TO A4;
– The UPDATE or INSERT privilege can specify particular attributes that may
be updated or inserted in a relation.
– Other privileges (SELECT, DELETE) are not attribute specific.

Specifying Limits on Propagation of Privileges


Techniques to limit the propagation of privileges have been developed, although they have not
yet been implemented in most DBMSs and are not a part of SQL.
– Limiting horizontal propagation to an integer number i means that an account B given
the GRANT OPTION can grant the privilege to at most i other accounts.
– Vertical propagation is more complicated; it limits the depth of the granting of
privileges.

Mandatory Access Control and Role-Based Access Control for Multilevel Security
The discretionary access control techniques of granting and revoking privileges on relations
have traditionally been the main security mechanism for relational database systems. This is an
all-or-nothing method:
• A user either has or does not have a certain privilege.
In many applications, an additional security policy is needed that classifies data and users based
on security classes. This approach, known as mandatory access control, would typically be
combined with the discretionary access control mechanisms.
Mandatory Access Control (MAC) for Multilevel Security
Mandatory Access Control (MAC) is based on system-wide policies that cannot be changed by
individual users. In this approach each database object is assigned a security class and each
user is assigned a clearance for a security class, and rules are imposed on reading and writing
of database objects by users.
The DBMS determines whether a given user can read or write a given object based on certain
rules that involve the security level of the object and the clearance of the user. These rules seek
to ensure that sensitive data can never be passed on to another user without the necessary
clearance. The SQL standard does not include support for MAC.
A popular model for MAC is the Bell–LaPadula model (Bell and LaPadula, 1974), which
is described in terms of objects (such as relations, views, tuples, and attributes), subjects (such
as users and programs), security classes, and clearances.


• Each database object is assigned a security class, and each subject is assigned a
clearance for a security class.
• The security classes in a system are ordered, with a most secure class and a least
secure class.
• Examples of security classes are top secret (TS), secret (S), confidential (C), and
unclassified (U), where TS is the highest and U the lowest level: TS ≥ S ≥ C ≥ U.
Two restrictions are enforced on data access based on the subject/object classifications:
• Simple security property: a subject S is not allowed read access to an object O unless
class(S) ≥ class(O) (no read up).
• Star property (or * property): a subject S is not allowed to write an object O unless
class(S) ≤ class(O) (no write down).
For example, a subject with clearance S may read objects classified S, C, or U, but may write
only objects classified S or TS; together the two rules prevent information from flowing from
higher to lower classification levels.

To incorporate multilevel security notions into the relational database model, it is common to
consider attribute values and tuples as data objects. Hence, each attribute A is associated with
a classification attribute C in the schema, and each attribute value in a tuple is associated with
a corresponding security classification. In addition, in some models, a tuple classification
attribute TC is added to the relation attributes to provide a classification for each tuple as a
whole. Hence, a multilevel relation schema R with n attributes would be represented as

• R(A1,C1,A2,C2, …, An,Cn,TC)

o where each Ci represents the classification attribute associated with attribute Ai.

The value of the TC attribute in each tuple t – which is the highest of all attribute classification
values within t – provides a general classification for the tuple itself, whereas each Ci provides
a finer security classification for each attribute value within the tuple.

The apparent key of a multilevel relation is the set of attributes that would have formed
the primary key in a regular (single-level) relation.

A multilevel relation will appear to contain different data to subjects (users) with different
clearance levels. In some cases, it is possible to store a single tuple in the relation at a higher
classification level and produce the corresponding tuples at a lower-level classification through
a process known as filtering. In other cases, it is necessary to store two or more tuples at
different classification levels with the same value for the apparent key. This leads to the
concept of polyinstantiation where several tuples can have the same apparent key value but
have different attribute values for users at different classification levels.
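As a hypothetical illustration, a multilevel EMPLOYEE relation with apparent key Name might store the following two polyinstantiated tuples for the same employee:
    Name     C1   Salary   C2   JobPerformance   C3   TC
    Smith    U    40000    C    Fair             S    S
    Smith    U    40000    C    Excellent        C    C
A subject cleared for S sees the first tuple with the secret performance rating, whereas a subject cleared only for C sees the second tuple and is unaware that a higher-classified version exists.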


In general, the entity integrity rule for multilevel relations states that all attributes that are
members of the apparent key must not be null and must have the same security classification
within each individual tuple. In addition, all other attribute values in the tuple must have a
security classification greater than or equal to that of the apparent key. This constraint ensures
that a user can see the key if the user is permitted to see any part of the tuple at all.

Other integrity rules, called null integrity and interinstance integrity, informally ensure that
if a tuple value at some security level can be filtered (derived) from a higher-classified tuple,
then it is sufficient to store the higher-classified tuple in the multilevel relation.

Comparing Discretionary Access Control and Mandatory Access Control


• Discretionary Access Control (DAC) policies are characterized by a high degree of
flexibility, which makes them suitable for a large variety of application domains.
– The main drawback of DAC models is their vulnerability to malicious attacks,
such as Trojan horses embedded in application programs.
• By contrast, mandatory policies ensure a high degree of protection; in effect, they
prevent any illegal flow of information.
– Mandatory policies have the drawback of being too rigid and they are only
applicable in limited environments.
– In many practical situations, discretionary policies are preferred because they
offer a better trade-off between security and applicability.
Role-Based Access Control
Role-based access control (RBAC) emerged rapidly in the 1990s as a proven technology for
managing and enforcing security in large-scale enterprise wide systems.
• Its basic notion is that permissions are associated with roles, and users are assigned
to appropriate roles.
• Roles can be created and destroyed using the CREATE ROLE and DROP ROLE commands
(see the sketch after this list).
– The GRANT and REVOKE commands discussed under DAC can then be used
to assign and revoke privileges from roles.
• RBAC appears to be a viable alternative to traditional discretionary and mandatory
access controls; it ensures that only authorized users are given access to certain data
or resources.
• Many DBMSs have allowed the concept of roles, where privileges can be assigned to
roles.


• Role hierarchy in RBAC is a natural way of organizing roles to reflect the
organization’s lines of authority and responsibility.
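A minimal sketch, assuming a DBMS that supports SQL roles (the role name payroll_clerk, the relation EMPLOYEE, and the account A4 are hypothetical):
    CREATE ROLE payroll_clerk;
    -- attach the permissions to the role rather than to individual users
    GRANT SELECT ON EMPLOYEE TO payroll_clerk;
    GRANT UPDATE (SALARY) ON EMPLOYEE TO payroll_clerk;
    -- assign the role to a user; revoking the role later removes all of its permissions at once
    GRANT payroll_clerk TO A4;
Changing what payroll clerks may do then requires editing a single role definition instead of revisiting every user account.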
Another important consideration in RBAC systems is the possible temporal constraints that
may exist on roles, such as time and duration of role activations, and timed triggering of a role
by an activation of another role.
• Using an RBAC model is a highly desirable goal for addressing the key security
requirements of Web-based applications.
• In contrast, discretionary access control (DAC) and mandatory access control (MAC)
models lack the capabilities needed to support the security requirements of emerging
enterprises and Web-based applications.

Access Control Policies for E-Commerce and the Web


E-commerce environments require elaborate access control policies that go beyond those of traditional DBMSs.
• In an e-commerce environment the resources to be protected are not only traditional
data but also knowledge and experience.
• The access control mechanism should be flexible enough to support a wide spectrum
of heterogeneous protection objects.
A related requirement is the support for content-based access control. Another requirement
is related to the heterogeneity of subjects, which requires access control policies based on user
characteristics and qualifications.
• A possible solution, to better take into account user profiles in the formulation of access
control policies, is to support the notion of credentials.
• A credential is a set of properties concerning a user that are relevant for security
purposes.
– For example, age or position within an organization.
• It is believed that the XML language can play a key role in access control for e-
commerce applications.
Introduction to Statistical Database Security
Statistical databases are used mainly to produce statistics on various populations. The database
may contain confidential data on individuals, which should be protected from user access.
Users are permitted to retrieve statistical information on the populations, such as averages,
sums, counts, maximums, minimums, and standard deviations. A population is a set of tuples


of a relation (table) that satisfy some selection condition. Statistical queries involve applying
statistical functions to a population of tuples.
For example, we may want to retrieve the number of individuals in a population or the average
income in the population. However, statistical users are not allowed to retrieve individual data,
such as the income of a specific person. Statistical database security techniques must prohibit
the retrieval of individual data. This can be achieved by prohibiting queries that retrieve
attribute values and by allowing only queries that involve statistical aggregate functions such
as:
• COUNT, SUM, MIN, MAX, AVERAGE, and STANDARD DEVIATION.
Such queries are sometimes called statistical queries.
It is the DBMS’s responsibility to ensure the confidentiality of information about individuals, while
still providing useful statistical summaries of data about those individuals to users. Provision
of privacy protection for the individuals represented in a statistical database is paramount.
In some cases it is possible to infer the values of individual tuples from a sequence of statistical
queries.
• This is particularly true when the conditions result in a population consisting of a small
number of tuples.
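As an illustration (the PERSON relation and its attributes are hypothetical), the first query below is a legitimate statistical query, while the second shows how a narrow condition can expose an individual:
    -- permitted: an aggregate over a population
    SELECT AVG(Income)
    FROM PERSON
    WHERE City = 'Adama' AND Profession = 'Nurse';

    -- risky: the same kind of query over a population of size one
    SELECT COUNT(*)
    FROM PERSON
    WHERE City = 'Adama' AND Profession = 'Nurse' AND Age = 52;
    -- if the COUNT is 1, repeating the AVG query with this condition
    -- reveals that single individual's exact income.
Typical inference control measures therefore reject statistical queries whose population falls below a minimum size, or slightly perturb the returned statistics.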
