OS Unit III
5.1 Background
Recall that back in Chapter 3 we looked at cooperating processes ( those that can
affect or be affected by other simultaneously running processes ), and as an example,
we used the producer-consumer cooperating processes:
( The producer and consumer code segments from Chapter 3 appeared here. )
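They can be reconstructed from the Chapter 3 shared-memory solution as follows ( a
sketch; buffer, in, and out are the shared variables from that chapter ):

item nextProduced;
while( true ) {
    /* produce an item and store it in nextProduced */
    while( ( ( in + 1 ) % BUFFER_SIZE ) == out )
        ;   /* do nothing -- buffer is full */
    buffer[ in ] = nextProduced;
    in = ( in + 1 ) % BUFFER_SIZE;
}

item nextConsumed;
while( true ) {
    while( in == out )
        ;   /* do nothing -- buffer is empty */
    nextConsumed = buffer[ out ];
    out = ( out + 1 ) % BUFFER_SIZE;
    /* consume the item in nextConsumed */
}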
The only problem with the above code is that the maximum number of items
which can be placed into the buffer is BUFFER_SIZE - 1. One slot is
unavailable because there always has to be a gap between the producer and the
consumer.
We could try to overcome this deficiency by introducing a counter variable, as
shown in the following code segments:
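A sketch of those segments, assuming a shared integer counter ( initialized to 0 )
that tracks the number of full slots:

/* Producer: */
while( true ) {
    /* produce an item and store it in nextProduced */
    while( counter == BUFFER_SIZE )
        ;   /* do nothing -- buffer is full */
    buffer[ in ] = nextProduced;
    in = ( in + 1 ) % BUFFER_SIZE;
    counter++;
}

/* Consumer: */
while( true ) {
    while( counter == 0 )
        ;   /* do nothing -- buffer is empty */
    nextConsumed = buffer[ out ];
    out = ( out + 1 ) % BUFFER_SIZE;
    counter--;
    /* consume the item in nextConsumed */
}

Note that counter++ and counter-- are not atomic operations; if the producer and the
consumer interleave in the middle of updating counter, it can end up with the wrong
value. This race condition is exactly what the critical-section machinery below is
designed to handle.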
A solution to the critical section problem must satisfy the following three conditions:
1. Mutual Exclusion - Only one process at a time can be executing in their
critical section.
2. Progress - If no process is currently executing in their critical section, and one
or more processes want to execute their critical section, then only the
processes not in their remainder sections can participate in the decision, and
the decision cannot be postponed indefinitely. ( I.e. processes cannot be
blocked forever waiting to get into their critical sections. )
3. Bounded Waiting - There exists a limit as to how many other processes can
get into their critical sections after a process requests entry into their critical
section and before that request is granted. ( I.e. a process requesting entry into
their critical section will get a turn eventually, and there is a limit as to how
many other processes get to go first. )
We assume that all processes proceed at a non-zero speed, but no assumptions can be
made regarding the relative speed of one process versus another.
Kernel processes can also be subject to race conditions, which can be especially
problematic when updating commonly shared kernel data structures such as open file
tables or virtual memory management. Accordingly kernels can take on one of two
forms:
o Non-preemptive kernels, which do not allow a process to be preempted while
running in kernel mode, and which are therefore free from race conditions on
kernel data ( at least on single-processor systems ).
o Preemptive kernels, which allow preemption even in kernel mode. These are more
responsive and more suitable for real-time work, but must be carefully written to
avoid race conditions.
Semaphores
A more robust alternative to simple mutexes is to use semaphores, which are integer
variables for which only two ( atomic ) operations are defined, the wait and signal
operations, as shown in the following figure.
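The classical busy-waiting definitions shown in that figure are:

wait( S ) {
    while( S <= 0 )
        ;   /* busy wait until the semaphore is positive */
    S--;
}

signal( S ) {
    S++;
}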
Note that not only must the variable-changing steps ( S-- and S++ ) be indivisible, it is
also necessary that for the wait operation when the test proves false that there be no
interruptions before S gets decremented. It IS okay, however, for the busy loop to be
interrupted when the test is true, which prevents the system from hanging forever.
Semaphore Usage
In practice, semaphores can take on one of two forms:
o Binary semaphores can take on one of two values, 0 or 1. They can be used
to solve the critical section problem as described above, and can be used as
mutexes on systems that do not provide a separate mutex mechanism. The use
of mutexes for this purpose is shown in Figure 6.9 ( from the 8th edition )
below.
Mutual-exclusion implementation with semaphores.
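The structure shown in that figure is essentially the following, with mutex a binary
semaphore initialized to 1:

do {
    wait( mutex );
        /* critical section */
    signal( mutex );
        /* remainder section */
} while( true );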
o Counting semaphores can take on any integer value, and are usually used to count
the number remaining of some limited resource. The counter is initialized to the
number of such resources available in the system, and whenever the counting
semaphore is greater than zero, then a process can enter a critical section and use one
of the resources. When the counter gets to zero ( or negative in some implementations
), then the process blocks until another process frees up a resource and increments the
counting semaphore with a signal call. ( The binary semaphore can be seen as just a
special case where the number of resources initially available is just one. )
Semaphores can also be used to synchronize operations between processes. For
example, to guarantee that statement S1 in process P1 executes before statement S2
in process P2, use a semaphore synch initialized to 0:

In P1:
    S1;
    signal( synch );

In P2:
    wait( synch );
    S2;
Semaphore Implementation
The big problem with semaphores as described above is the busy loop in the wait call,
which consumes CPU cycles without doing any useful work. This type of lock is
known as a spinlock, because the lock just sits there and spins while it waits. While
this is generally a bad thing, it does have the advantage of not invoking context
switches, and so it is sometimes used in multi-processing systems when the wait time
is expected to be short - One thread spins on one processor while another completes
their critical section on another processor.
An alternative approach is to block a process when it is forced to wait for an available
semaphore, and swap it out of the CPU. In this implementation each semaphore needs
to maintain a list of processes that are blocked waiting for it, so that one of the
processes can be woken up and swapped back in when the semaphore becomes
available. ( Whether it gets swapped back into the CPU immediately or whether it
needs to hang out in the ready queue for a while is a scheduling problem. )
The new definition of a semaphore and the corresponding wait and signal operations
are shown as follows:
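A sketch of that definition ( block( ) suspends the calling process and wakeup( P )
moves process P back to the ready queue; both are assumed to be basic system calls ):

typedef struct {
    int value;
    struct process *list;       /* queue of processes blocked on this semaphore */
} semaphore;

wait( semaphore *S ) {
    S->value--;
    if( S->value < 0 ) {
        /* add this process to S->list */
        block( );
    }
}

signal( semaphore *S ) {
    S->value++;
    if( S->value <= 0 ) {
        /* remove a process P from S->list */
        wakeup( P );
    }
}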
Note that in this implementation the value of the semaphore can actually become
negative, in which case its magnitude is the number of processes waiting for that
semaphore. This is a result of decrementing the counter before checking its value.
Key to the success of semaphores is that the wait and signal operations be atomic,
that is, no other process can execute a wait or signal on the same semaphore at the
same time. ( Other processes could be allowed to do other things, including working
with other semaphores, they just can't have access to this semaphore. ) On single
processors this can be implemented by disabling interrupts during the execution of
wait and signal; multiprocessor systems have to use more complex methods,
including the use of spinlocking.
Deadlocks and Starvation
One important problem that can arise when using semaphores to block processes
waiting for a limited resource is the problem of deadlocks, which occur when
multiple processes are blocked, each waiting for a resource that can only be freed by
one of the other ( blocked ) processes, as illustrated in the following example.
( Deadlocks are covered more completely in chapter 7. )
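The classic illustration uses two processes and two semaphores, S and Q, both
initialized to 1:

    P0:                     P1:
    wait( S );              wait( Q );
    wait( Q );              wait( S );
      . . .                   . . .
    signal( S );            signal( Q );
    signal( Q );            signal( S );

If P0 executes wait( S ) and is then interrupted so that P1 executes wait( Q ), each
process next blocks on the semaphore the other one holds, and neither can ever reach
its signal calls.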
Priority Inversion
A challenging scheduling problem arises when a high-priority process gets blocked
waiting for a resource that is currently held by a low-priority process.
If the low-priority process gets pre-empted by one or more medium-priority
processes, then the high-priority process is essentially made to wait for the medium
priority processes to finish before the low-priority process can release the needed
resource, causing a priority inversion. If there are enough medium-priority
processes, then the high-priority process may be forced to wait for a very long time.
One solution is a priority-inheritance protocol, in which a low-priority process
holding a resource for which a high-priority process is waiting will temporarily inherit
the high priority from the waiting process. This prevents the medium-priority
processes from preempting the low-priority process until it releases the resource,
blocking the priority inversion problem.
The book has an interesting discussion of how a priority inversion almost doomed the
Mars Pathfinder mission, and how the problem was solved by ( remotely ) enabling
priority inheritance on the mutex involved. Full details are available online at
http://research.microsoft.com/en-us/um/people/mbj/mars_pathfinder/authoritative_account.htm
The Dining-Philosophers Problem
One possible solution, as shown in the following code section, is to use a set of five
semaphores ( chopsticks[ 5 ] ), and to have each hungry philosopher first wait on their
left chopstick ( chopsticks[ i ] ), and then wait on their right chopstick
( chopsticks[ ( i + 1 ) % 5 ] ).
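The structure of philosopher i under this scheme is then:

while( true ) {
    wait( chopsticks[ i ] );
    wait( chopsticks[ ( i + 1 ) % 5 ] );
    /* eat for a while */
    signal( chopsticks[ i ] );
    signal( chopsticks[ ( i + 1 ) % 5 ] );
    /* think for a while */
}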
But suppose that all five philosophers get hungry at the same time, and each starts by
picking up their left chopstick. They then look for their right chopstick, but because it
is unavailable, they wait for it, forever, and eventually all the philosophers starve due
to the resulting deadlock.
Monitors
Semaphores can be very useful for solving concurrency problems, but only if
programmers use them properly. If even one process fails to abide by the proper
use of semaphores, either accidentally or deliberately, then the whole system breaks
down. ( And since concurrency problems are by definition rare events, the problem
code may easily go unnoticed and/or be heinous to debug. )
For this reason a higher-level language construct has been developed,
called monitors.
Monitor Usage
A monitor is essentially a class, in which all data is private, and with the special
restriction that only one method within any given monitor object may be active at the
same time. An additional restriction is that monitor methods may only access the
shared data within the monitor and any data passed to them as parameters. I.e. they
cannot access any data external to the monitor.
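Schematically, a monitor takes the following form ( pseudocode in the style of the
text ):

monitor monitor_name
{
    /* shared ( private ) variable declarations */

    function P1( . . . ) {
        . . .
    }

    function P2( . . . ) {
        . . .
    }

    initialization_code( . . . ) {
        . . .
    }
}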
In order to fully realize the potential of monitors, we need to introduce one additional
new data type, known as a condition.
o A variable of type condition has only two legal
operations, wait and signal. I.e. if X was defined as type condition, then legal
operations would be X.wait( ) and X.signal( ).
o The wait operation blocks a process until some other process calls signal, and
adds the blocked process onto a list associated with that condition.
o The signal process does nothing if there are no processes waiting on that
condition. Otherwise it wakes up exactly one process from the condition's list
of waiting processes. ( Contrast this with counting semaphores, which always
affect the semaphore on a signal call. )
Figure 5.17 below illustrates a monitor that includes condition variables within its
data space. Note that the condition variables, along with the list of processes currently
waiting for the conditions, are in the data space of the monitor - The processes on
these lists are not "in" the monitor, in the sense that they are not executing any code in
the monitor.
Figure 5.17 - Monitor with condition variables
But now there is a potential problem - If process P within the monitor issues a signal
that would wake up process Q also within the monitor, then there would be two
processes running simultaneously within the monitor, violating the exclusion
requirement. Accordingly there are two possible solutions to this dilemma:
Signal and wait - When process P issues the signal to wake up process Q, P then waits,
either for Q to leave the monitor or on some other condition.
Signal and continue - When P issues the signal, Q waits, either for P to exit the monitor or
for some other condition.
There are arguments for and against either choice. Concurrent Pascal offers a third alternative
- The signal call causes the signaling process to immediately exit the monitor, so that the
waiting process can then wake up and proceed.
Java and C# offer monitors built in to the language. Erlang offers similar
but different constructs.
Unfortunately the use of monitors to restrict access to resources still only works if
programmers make the requisite acquire and release calls properly. One option would
be to place the resource allocation code into the monitor, thereby eliminating the
option for programmers to bypass or ignore the monitor, but then that would
substitute the monitor's scheduling algorithms for whatever other scheduling
algorithms may have been chosen for that particular resource. Chapter 14 on
Protection presents more advanced methods for enforcing "nice" cooperation among
processes contending for shared resources.
Concurrent Pascal, Mesa, C#, and Java all implement monitors as described here.
Erlang provides concurrency support using a similar mechanism.
Adaptive Mutexes
Adaptive mutexes are basically binary semaphores that are implemented differently
depending upon the conditions:
o On a single-processor system, a thread that blocks on the mutex sleeps until
the lock is released.
o On a multi-processor system, if the thread that is blocking the semaphore is
running on the same processor as the thread that is blocked, or if the blocking
thread is not running at all, then the blocked thread sleeps just like a single
processor system.
o However if the blocking thread is currently running on a different processor
than the blocked thread, then the blocked thread does a spinlock, under the
assumption that the block will soon be released.
o Adaptive mutexes are only used for protecting short critical sections, where
the benefit of not doing context switching is worth a short bit of spinlocking.
Otherwise traditional semaphores and condition variables are used.
Reader-Writer Locks
Reader-writer locks are used only for protecting longer sections of code which are
accessed frequently but which are changed infrequently.
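As a concrete illustration, here is a minimal sketch using the POSIX threads
reader-writer lock API ( which is not otherwise covered in these notes ):

#include <pthread.h>

pthread_rwlock_t rwlock = PTHREAD_RWLOCK_INITIALIZER;
int shared_data;

void *reader( void *arg ) {
    pthread_rwlock_rdlock( &rwlock );   /* many readers may hold this at once */
    int value = shared_data;            /* read under the shared lock */
    pthread_rwlock_unlock( &rwlock );
    return 0;
}

void *writer( void *arg ) {
    pthread_rwlock_wrlock( &rwlock );   /* writers get exclusive access */
    shared_data++;
    pthread_rwlock_unlock( &rwlock );
    return 0;
}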
Turnstiles
A turnstile is a queue of threads waiting on a lock.
Each synchronized object which has threads blocked waiting for access to it needs a
separate turnstile. For efficiency, however, the turnstile is associated with the thread
currently holding the object, rather than the object itself.
In order to prevent priority inversion, the thread holding a lock for an object will
temporarily acquire the highest priority of any process in the turnstile waiting for the
blocked object. This is called a priority-inheritance protocol.
User threads are controlled the same as for kernel threads, except that the priority-
inheritance protocol does not apply.
Atomic Transactions
Database operations frequently need to carry out atomic transactions, in which the
entire transaction must either complete or not occur at all. The classic example is a
transfer of funds, which involves withdrawing funds from one account and depositing
them into another - Either both halves of the transaction must complete, or neither
must complete.
Operating Systems can be viewed as having many of the same needs and problems as
databases, in that an OS can be said to manage a small database of process-related
information. As such, OSes can benefit from emulating some of the techniques
originally developed for databases. Here we first look at some of those techniques,
and then how they can be used to benefit OS development.
System Model
A transaction is a series of actions that must either complete in its entirety or must
be rolled-back as if it had never commenced.
The system is considered to have three types of storage:
o Volatile storage - information that does not survive system crashes, such as
main memory and caches.
o Nonvolatile storage - information that ordinarily survives crashes, such as
disks and tapes.
o Stable storage - information that is never lost, approximated in practice by
replicating the data on several nonvolatile devices with independent failure
modes.
Log-Based Recovery
Before each step of a transaction is conducted, an entry is written to a log on stable
storage:
o Each transaction has a unique serial number.
o The first entry is a "start".
o Every data changing entry specifies the transaction number, the old value, and
the new value.
o The final entry is a "commit".
o All transactions are idempotent - They can be repeated any number of times and
the effect is the same as doing them once. Likewise they can be undone any
number of times and the effect is the same as undoing them once. ( I.e.
"change x from 5 to 6", rather than "add 1 to x". )
After a crash, any transaction which has "commit" recorded in the log can be redone
from the log information. Any which has "start" but not "commit" can be undone.
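For example, a transfer of 50 from account A ( old value 1000 ) to account B ( old
value 2000 ) by transaction T0 might be logged as follows ( the exact record format
here is illustrative ):

< T0 starts >
< T0, A, 1000, 950 >
< T0, B, 2000, 2050 >
< T0 commits >

If a crash occurs after the commit record is written, T0 is redone from the log; if it
occurs before, T0 is undone by restoring the old values.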
Checkpoints
In the event of a crash, all data can be recovered using the system described above, by
going through the entire log and performing either redo or undo operations on all the
transactions listed there.
Unfortunately this approach can be slow and wasteful, because many transactions are
repeated that were not lost in the crash.
Alternatively, one can periodically establish a checkpoint, as follows:
o Write all data that has been affected by recent transactions ( since the last
checkpoint ) to stable storage.
o Write a <checkpoint> entry to the log.
Now for crash recovery one only needs to find transactions that did not commit prior
to the most recent checkpoint. Specifically one looks backwards from the end of the
log for the last <checkpoint> record, and then looks backward from there for the most
recent transaction that started before the checkpoint. Only that transaction and the
ones more recent need to be redone or undone.
Serializability
Figure 6.22 below shows a schedule in which transaction 0 reads and writes data
items A and B, followed by transaction 1 which also reads and writes A and B.
This is termed a serial schedule, because the two transactions are conducted serially.
For any N transactions, there are N! possible serial schedules.
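A serial schedule of the kind shown in Figure 6.22 looks like this, with time running
downward:

        T0                  T1
    read( A )
    write( A )
    read( B )
    write( B )
                        read( A )
                        write( A )
                        read( B )
                        write( B )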
A nonserial schedule is one in which the steps of the transactions are not completely
serial, i.e. they interleave in some manner.
Nonserial schedules are not necessarily bad or wrong, so long as they can be shown to
be equivalent to some serial schedule. A nonserial schedule that can be converted to a
serial one is said to be conflict serializable, such as that shown in Figure 6.23 below.
Legal steps in the conversion are as follows:
o Two operations from different transactions are said to be conflicting if they
involve the same data item and at least one of the operations is a write
operation. Operations from two transactions are non-conflicting if they either
involve different data items or do not involve any write operations.
o Two operations in a schedule can be swapped if they are from two different
transactions and if they are non-conflicting.
o A schedule is conflict serializable if there exists a series of valid swaps that
converts the schedule into a serial schedule.
Locking Protocol
One way to ensure serializability is to use locks on data items during atomic
transactions.
Shared and Exclusive locks correspond to the Readers and Writers problem discussed
above.
The two-phase locking protocol operates in two phases:
o A growing phase, in which the transaction continues to gain additional locks
on data items as it needs them, and has not yet relinquished any of its locks.
o A shrinking phase, in which it relinquishes locks. Once a transaction releases
any lock, then it is in the shrinking phase and cannot acquire any more locks.
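As an illustrative sketch ( lock-X denotes an exclusive lock; the exact locking
primitives are assumed ):

lock-X( A );        /* growing phase: locks are acquired as needed */
read( A );
write( A );
lock-X( B );        /* still growing -- no lock has been released yet */
read( B );
unlock( A );        /* first release: the shrinking phase begins */
write( B );         /* OK: B is still held, but no new locks may be acquired */
unlock( B );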
The two-phase locking protocol can be proven to yield serializable schedules, but it
does not guarantee avoidance of deadlock. There may also be perfectly valid
serializable schedules that are unobtainable with this protocol.
Timestamp-Based Protocols
Under the timestamp protocol, each transaction is issued a unique timestamp
before it begins execution. This can be the system time on systems where all
processes access the same clock, or some non-decreasing serial number. The
timestamp for transaction Ti is denoted TS( Ti ).
The schedules generated are all equivalent to a serial schedule occurring in timestamp
order.
Each data item also has two timestamp values associated with it - The W-timestamp
is the timestamp of the last transaction to successfully write the data item, and the R-
timestamp is the stamp of the last transaction to successfully read from it. ( Note that
these are the timestamps of the respective transactions, not the time at which the read
or write occurred. )
The timestamps are used in the following manner:
o Suppose transaction Ti issues a read on data item Q:
If TS(Ti ) < the W-timestamp for Q, then it is attempting to read data
that has been changed. Transaction Ti is rolled back, and restarted with
a new timestamp.
If TS( Ti ) >= the W-timestamp for Q, then the read proceeds, and the
R-timestamp for Q is updated to the later of its current value and TS( Ti ).
o Suppose Ti issues a write on Q:
If TS(Ti ) < the R-timestamp for Q, then it is attempting to change data
that has already been read in its unaltered state. Ti is rolled back and
restarted with a new timestamp.
If TS(Ti ) < the W-timestamp for Q it is also rolled back and restarted,
for similar reasons.
Otherwise, the operation proceeds, and the W-timestamp for Q is
updated to TS(Ti ).
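Sketched as C-style pseudocode ( RTS and WTS stand for the R- and W-timestamps
above; rollbackAndRestart is a hypothetical helper that aborts Ti and reissues it with
a fresh timestamp ):

/* Ti issues read( Q ): */
if( TS( Ti ) < WTS( Q ) )
    rollbackAndRestart( Ti );       /* Q was overwritten after Ti began */
else {
    /* carry out the read */
    RTS( Q ) = max( RTS( Q ), TS( Ti ) );
}

/* Ti issues write( Q ): */
if( TS( Ti ) < RTS( Q ) || TS( Ti ) < WTS( Q ) )
    rollbackAndRestart( Ti );       /* a younger transaction already read or wrote Q */
else {
    /* carry out the write */
    WTS( Q ) = TS( Ti );
}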
Deadlocks
Deadlock Characterization
Necessary Conditions
There are four conditions that are necessary to achieve deadlock:
1. Mutual Exclusion - At least one resource must be held in a non-sharable
mode; If any other process requests this resource, then that process must wait
for the resource to be released.
2. Hold and Wait - A process must be simultaneously holding at least one
resource and waiting for at least one resource that is currently being held by
some other process.
3. No preemption - Once a process is holding a resource ( i.e. once its request
has been granted ), then that resource cannot be taken away from that process
until the process voluntarily releases it.
4. Circular Wait - A set of processes { P0, P1, P2, . . ., PN } must exist such
that every P[ i ] is waiting for P[ ( i + 1 ) % ( N + 1 ) ]. ( Note that this
condition implies the hold-and-wait condition, but it is easier to deal with the
conditions if the four are considered separately. )
Resource-Allocation Graph
In some cases deadlocks can be understood more clearly through the use
of Resource-Allocation Graphs, having the following properties:
o A set of resource categories, { R1, R2, R3, . . ., RN }, which appear as square
nodes on the graph. Dots inside the resource nodes indicate specific instances
of the resource. ( E.g. two dots might represent two laser printers. )
o A set of processes, { P1, P2, P3, . . ., PN }
o Request Edges - A set of directed arcs from Pi to Rj, indicating that process
Pi has requested Rj, and is currently waiting for that resource to become
available.
o Assignment Edges - A set of directed arcs from Rj to Pi indicating that
resource Rj has been allocated to process Pi, and that Pi is currently holding
resource Rj.
o Note that a request edge can be converted into an assignment edge by
reversing the direction of the arc when the request is granted. ( However note
also that request edges point to the category box, whereas assignment edges
emanate from a particular instance dot within the box. )
o For example:
Figure 7.1 - Resource allocation graph
Deadlock Prevention
Deadlocks can be prevented by preventing at least one of the four required conditions:
Mutual Exclusion
Shared resources such as read-only files do not lead to deadlocks.
Unfortunately some resources, such as printers and tape drives, require exclusive
access by a single process.
Hold and Wait
To prevent this condition, processes must not hold one or more resources while
simultaneously waiting for others. One approach is to require a process to request all
of the resources it will need at one time, before it begins execution; another is to
require a process to release all of its currently held resources before requesting new
ones. Both approaches suffer from low resource utilization and possible starvation.
No Preemption
Preemption of process resource allocations can prevent this condition of deadlocks,
when it is possible.
o One approach is that if a process is forced to wait when requesting a new
resource, then all other resources previously held by this process are implicitly
released, ( preempted ), forcing this process to re-acquire the old resources
along with the new resources in a single request, similar to the previous
discussion.
o Another approach is that when a resource is requested and not available, then
the system looks to see what other processes currently have those
resources and are themselves blocked waiting for some other resource. If such
a process is found, then some of their resources may get preempted and added
to the list of resources for which the process is waiting.
o Either of these approaches may be applicable for resources whose states are
easily saved and restored, such as registers and memory, but are generally not
applicable to other devices such as printers and tape drives.
Circular Wait
One way to avoid circular wait is to number all resources, and to require that
processes request resources only in strictly increasing ( or decreasing ) order.
In other words, in order to request resource Rj, a process must first release all Ri such
that i >= j.
One big challenge in this scheme is determining the relative ordering of the different
resources
Deadlock Avoidance
The general idea behind deadlock avoidance is to never allow the system to enter a
state from which deadlock could become unavoidable, by examining each resource
request and granting it only when it is safe to do so.
This requires more information about each process, AND tends to lead to low device
utilization. ( I.e. it is a conservative approach. )
In some algorithms the scheduler only needs to know the maximum number of each
resource that a process might potentially use. In more complex algorithms the
scheduler can also take advantage of the schedule of exactly what resources may be
needed in what order.
When a scheduler sees that starting a process or granting resource requests may lead
to future deadlocks, then that process is just not started or the request is not granted.
A resource allocation state is defined by the number of available and allocated
resources, and the maximum requirements of all processes in the system.
Safe State
A state is safe if the system can allocate all resources requested by all processes ( up
to their stated maximums ) without entering a deadlock state.
More formally, a state is safe if there exists a safe sequence of processes { P0, P1, P2,
..., PN } such that all of the resource requests for Pi can be granted using the resources
currently allocated to Pi and all processes Pj where j < i. ( I.e. if all the processes prior
to Pi finish and free up their resources, then Pi will be able to finish also, using the
resources that they have freed up. )
If a safe sequence does not exist, then the system is in an unsafe state,
which MAY lead to deadlock. ( All safe states are deadlock free, but not all unsafe
states lead to deadlocks. )
For example, consider a system with 12 tape drives, allocated as follows. Is this a safe state?
What is the safe sequence?
            Maximum Needs       Current Allocation
P0                10                    5
P1                 4                    2
P2                 9                    2
What happens to the above table if process P2 requests and is granted one more tape
drive?
Key to the safe state approach is that when a request is made for resources, the request
is granted only if the resulting allocation state is a safe one.
In the resource-allocation-graph approach to deadlock avoidance ( usable when every
resource category has only a single instance ), claim edges mark resources that a
process may request in the future, and a request is granted only if converting the
request edge into an assignment edge does not create a cycle. In the state shown in
Figure 7.8, granting the request would put a cycle in the graph, and so the request
cannot be granted.
Figure 7.8 - An unsafe state in a resource allocation graph
Banker's Algorithm
For resource categories that contain more than one instance the resource-allocation
graph method does not work, and more complex ( and less efficient ) methods must be
chosen.
The Banker's Algorithm gets its name because it is a method that bankers could use to
assure that when they lend out resources they will still be able to satisfy all their
clients. ( A banker won't loan out a little money to start building a house unless they
are assured that they will later be able to loan out the rest of the money to finish the
house. )
When a process starts up, it must state in advance the maximum allocation of
resources it may request, up to the amount available on the system.
When a request is made, the scheduler determines whether granting the request would
leave the system in a safe state. If not, then the process must wait until the request can
be granted safely.
The banker's algorithm relies on several key data structures: ( where n is the number
of processes and m is the number of resource categories. )
o Available[ m ] indicates how many resources are currently available of each
type.
o Max[ n ][ m ] indicates the maximum demand of each process of each
resource.
o Allocation[ n ][ m ] indicates the number of each resource category allocated
to each process.
o Need[ n ][ m ] indicates the remaining resources needed of each type for each
process. ( Note that Need[ i ][ j ] = Max[ i ][ j ] - Allocation[ i ][ j ] for all i, j. )
For simplification of discussions, we make the following notations / observations:
o One row of the Need vector, Need[ i ], can be treated as a vector
corresponding to the needs of process i, and similarly for Allocation and Max.
o A vector X is considered to be <= a vector Y if X[ i ] <= Y[ i ] for all i.
Safety Algorithm
In order to apply the Banker's algorithm, we first need an algorithm for determining
whether or not a particular state is safe.
This algorithm determines if the current state of a system is safe, according to the
following steps:
1. Let Work and Finish be vectors of length m and n respectively.
Work is a working copy of the available resources, which will be
modified during the analysis.
Finish is a vector of booleans indicating whether a particular process
can finish. ( or has finished so far in the analysis. )
Initialize Work to Available, and Finish to false for all elements.
2. Find an i such that both (A) Finish[ i ] == false, and (B) Need[ i ] <= Work.
This process has not finished, but could with the given available working set.
If no such i exists, go to step 4.
3. Set Work = Work + Allocation[ i ], and set Finish[ i ] to true. This corresponds
to process i finishing up and releasing its resources back into the work pool.
Then loop back to step 2.
4. If Finish[ i ] == true for all i, then the state is a safe state, because a safe
sequence has been found.
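A sketch of this algorithm in C ( the sizes N and M and the data arrays are assumed
to be set up as described in the Banker's algorithm section above ):

#include <stdbool.h>

#define N 5     /* number of processes ( assumed ) */
#define M 3     /* number of resource categories ( assumed ) */

/* Returns true if the state described by available, allocation, and need
   is safe, following the four steps of the safety algorithm. */
bool isSafe( int available[ M ], int allocation[ N ][ M ], int need[ N ][ M ] ) {
    int work[ M ];
    bool finish[ N ] = { false };

    for( int j = 0; j < M; j++ )            /* Step 1: Work = Available */
        work[ j ] = available[ j ];

    while( true ) {
        int i;
        for( i = 0; i < N; i++ ) {          /* Step 2: find unfinished i */
            if( !finish[ i ] ) {            /*         with Need[ i ] <= Work */
                int j;
                for( j = 0; j < M; j++ )
                    if( need[ i ][ j ] > work[ j ] )
                        break;
                if( j == M )
                    break;                  /* process i could finish */
            }
        }
        if( i == N )
            break;                          /* no candidate found: go to step 4 */
        for( int j = 0; j < M; j++ )        /* Step 3: i finishes, releasing */
            work[ j ] += allocation[ i ][ j ];  /* its resources into the pool */
        finish[ i ] = true;
    }

    for( int i = 0; i < N; i++ )            /* Step 4: safe iff all can finish */
        if( !finish[ i ] )
            return false;
    return true;
}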
( JTB's Modification:
1. In step 1. instead of making Finish an array of booleans initialized to false,
make it an array of ints initialized to 0. Also initialize an int s = 0 as a step
counter.
2. In step 2, look for Finish[ i ] == 0.
3. In step 3, set Finish[ i ] to ++s. Here s counts the number of finished processes.
4. For step 4, the test can be either Finish[ i ] > 0 for all i, or s >= n. The benefit
of this method is that if a safe state exists, then Finish[ ] indicates one safe
sequence ( of possibly many. ) )
An Illustrative Example
Consider the following situation:
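( The situation itself did not survive in these notes. The text's standard example,
reconstructed here, has five processes and three resource types: A with 10 instances,
B with 5, and C with 7. )

            Allocation          Max           Available
            A  B  C           A  B  C          A  B  C
    P0      0  1  0           7  5  3          3  3  2
    P1      2  0  0           3  2  2
    P2      3  0  2           9  0  2
    P3      2  1  1           2  2  2
    P4      0  0  2           4  3  3

Applying the safety algorithm to this state ( with Need = Max - Allocation ) yields
the safe sequence < P1, P3, P4, P2, P0 >, so the state is safe.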
Deadlock Detection
If deadlocks are not avoided, then another approach is to detect when they have
occurred and recover somehow.
In addition to the performance hit of constantly checking for deadlocks, a policy /
algorithm must be in place for recovering from deadlocks, and there is potential for
lost work when processes must be aborted or have their resources preempted.
Single Instance of Each Resource Type
If each resource category has a single instance, then we can use a variation of the
resource-allocation graph known as a wait-for graph.
A wait-for graph can be constructed from a resource-allocation graph by eliminating
the resources and collapsing the associated edges, as shown in the figure below.
An arc from Pi to Pj in a wait-for graph indicates that process Pi is waiting for a
resource that process Pj is currently holding.
Figure 7.9 - (a) Resource allocation graph. (b) Corresponding wait-for graph
Now suppose that process P2 makes a request for an additional instance of type C,
yielding the state shown below. Is the system now deadlocked?
Detection-Algorithm Usage
When should the deadlock detection be done? Frequently, or infrequently?
The answer may depend on how frequently deadlocks are expected to occur, as well
as the possible consequences of not catching them immediately. ( If deadlocks are not
removed immediately when they occur, then more and more processes can "back up"
behind the deadlock, making the eventual task of unblocking the system more
difficult and possibly damaging to more processes. )
There are two obvious approaches, each with trade-offs:
1. Do deadlock detection after every resource allocation which cannot be
immediately granted. This has the advantage of detecting the deadlock right
away, while the minimum number of processes are involved in the deadlock.
( One might consider that the process whose request triggered the deadlock
condition is the "cause" of the deadlock, but realistically all of the processes in
the cycle are equally responsible for the resulting deadlock. ) The down side of
this approach is the extensive overhead and performance hit caused by
checking for deadlocks so frequently.
2. Do deadlock detection only when there is some clue that a deadlock may have
occurred, such as when CPU utilization reduces to 40% or some other magic
number. The advantage is that deadlock detection is done much less
frequently, but the down side is that it becomes impossible to detect the
processes involved in the original deadlock, and so deadlock recovery can be
more complicated and damaging to more processes.
3. ( As I write this, a third alternative comes to mind: Keep a historical log of
resource allocations, since that last known time of no deadlocks. Do deadlock
checks periodically ( once an hour or when CPU usage is low?), and then use
the historical log to trace through and determine when the deadlock occurred
and what processes caused the initial deadlock. Unfortunately I'm not certain
that breaking the original deadlock would then free up the resulting log jam. )
Process Termination
Two basic approaches, both of which recover resources allocated to terminated
processes:
o Terminate all processes involved in the deadlock. This definitely solves the
deadlock, but at the expense of terminating more processes than would be
absolutely necessary.
o Terminate processes one by one until the deadlock is broken. This is more
conservative, but requires doing deadlock detection after each step.
In the latter case there are many factors that can go into deciding which processes to
terminate next:
1. Process priorities.
2. How long the process has been running, and how close it is to finishing.
3. How many and what type of resources is the process holding. ( Are they easy
to preempt and restore? )
4. How many more resources does the process need to complete.
5. How many processes will need to be terminated.
6. Whether the process is interactive or batch.
7. ( Whether or not the process has made non-restorable changes to any resource.
)
Resource Preemption
When preempting resources to relieve deadlock, there are three important issues to be
addressed:
1. Selecting a victim - Deciding which resources to preempt from which
processes involves many of the same decision criteria outlined above.
2. Rollback - Ideally one would like to roll back a preempted process to a safe
state prior to the point at which that resource was originally allocated to the
process. Unfortunately it can be difficult or impossible to determine what such
a safe state is, and so the only safe rollback is to roll back all the way back to
the beginning. ( I.e. abort the process and make it start over. )
3. Starvation - How do you guarantee that a process won't starve because its
resources are constantly being preempted? One option would be to use a
priority system, and increase the priority of a process every time its resources
get preempted. Eventually it should get a high enough priority that it won't get
preempted any more.