Chapter 1: Multi Threaded Programming: (Operating Systems-18Cs43)
MODULE 2
Chapter 1: Multi threaded programming
1.1 Introduction:
A thread is a basic unit of CPU utilization. It consists of a thread ID, program counter, a
stack, and a set of registers.
Traditional processes have a single thread of control; such a process is also called a heavyweight
process. It has one program counter, and only one sequence of instructions can be
carried out at any given time.
A multi-threaded application has multiple threads within a single process, each having
its own program counter, stack and set of registers, but sharing common code, data,
and certain structures such as open files. Such threads are often called lightweight
processes.
1.1.1 Motivation
Threads are very useful in modern programming whenever a process has multiple tasks
to perform independently of the others.
This is particularly true when one of the tasks may block, and it is desired to allow the
other tasks to proceed without blocking.
For example, in a word processor, a background thread may check spelling and grammar
while a foreground thread processes user input (keystrokes), a third thread
loads images from the hard drive, and a fourth does periodic automatic backups of the
file being edited.
In a web server, multiple threads allow multiple requests to be served
simultaneously: a thread is created to service each request, while another thread
listens for more client requests.
In a web browser, one thread may be used to display images while another thread
retrieves data from the network.
[OPERATING SYSTEMS-18CS43]
1.1.2 Benefits
1. Responsiveness - One thread may provide rapid response while other threads
are blocked or slowed down doing intensive calculations.
Multi threading allows a program to continue running even if part of it
is blocked or is performing a lengthy operation, thereby increasing
responsiveness to the user.
2. Resource sharing - By default threads share common code, data, and other
resources, which allows multiple tasks to be performed simultaneously in a
single address space.
3. Economy - Creating and managing threads is much faster than performing the
same tasks for processes. Context switching between threads takes less time.
4. Scalability (utilization of multiprocessor architectures) - Multithreading
can be greatly utilized in a multiprocessor architecture. A single-threaded
process can make use of only one CPU, whereas the execution of a multi-threaded
application may be split among the available processors.
Multithreading on a multi-CPU machine increases parallelism. In a single-processor
architecture, the CPU generally switches between threads so
quickly as to create an illusion of parallelism, but in reality only one thread is
running at a time.
Multicore Programming
a) Many-To-One Model
In the many-to-one model, many user-level threads are all mapped onto a
single kernel thread.
Thread management is handled by the thread library in user space, which is very efficient.
If one of the threads makes a blocking system call, the entire process blocks,
preventing the other user threads from continuing execution.
Only one user thread can access the kernel at a time, as there is only one kernel thread.
Thus the threads are unable to run in parallel on multiprocessors.
Green threads of Solaris and GNU Portable Threads implement the many-to-one model.
b) One-To-One Model
The one-to-one model creates a separate kernel thread to handle each user
thread.
The one-to-one model overcomes the problems listed above involving blocking
system calls and the splitting of processes across multiple CPUs.
However, creating a kernel thread for every user thread adds significant
overhead and can slow down the system.
Most implementations of this model therefore limit the number of threads that can be created.
Linux and Windows from 95 to XP implement the one-to-one model for
threads.
c) Many-To-Many Model
If a thread invokes the exec( ) system call, the program specified in the
parameter to exec( ) replaces the entire process, including all of its threads.
b) Cancellation
Terminating a thread before it has completed its task is called thread
cancellation. The thread to be cancelled is called the target thread.
Example: if the browser window is closed while a web page is loading, the
multiple threads loading the page are cancelled.
Threads that are no longer needed may be cancelled in one of two ways:
1. Asynchronous Cancellation - one thread terminates the target thread immediately.
2. Deferred Cancellation - the target thread periodically checks whether
it has to terminate, which gives it an opportunity to terminate
itself in an orderly fashion.
With deferred cancellation, the thread can release its resources before it
terminates. With asynchronous cancellation, the operating system reclaims system
resources from the cancelled thread but may not reclaim all of them.
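A minimal sketch of deferred cancellation with POSIX threads, assuming a Linux-like system (compile with -pthread). The names worker() and cancel_worker_demo() are hypothetical. pthread_cancel() only requests cancellation; because the worker's cancellation type is PTHREAD_CANCEL_DEFERRED (the default), the request takes effect only when the worker reaches a cancellation point, here pthread_testcancel().

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

/* Counts units of work so the demo knows the worker has really started. */
static atomic_long iterations;

static void *worker(void *arg) {
    int old_type;
    (void)arg;
    /* Deferred is already the default type; it is set here only for clarity. */
    pthread_setcanceltype(PTHREAD_CANCEL_DEFERRED, &old_type);
    for (;;) {
        atomic_fetch_add(&iterations, 1);   /* one unit of work */
        pthread_testcancel();               /* cancellation point: exit here if a request is pending */
    }
    return NULL;                            /* never reached */
}

/* Starts a worker, waits until it has done some work, then cancels it.
   Returns 1 if pthread_join() reports that the target thread was cancelled. */
int cancel_worker_demo(void) {
    pthread_t target;
    void *status = NULL;

    if (pthread_create(&target, NULL, worker, NULL) != 0)
        return 0;
    while (atomic_load(&iterations) < 1000)
        sched_yield();
    pthread_cancel(target);          /* only requests cancellation */
    pthread_join(target, &status);   /* returns once the target has actually terminated */
    return status == PTHREAD_CANCELED;
}
```

Under these assumptions, cancel_worker_demo() returns 1: the worker is never interrupted in the middle of a unit of work, only at its pthread_testcancel() check.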
c) Signal Handling
A signal is used to notify a process that a particular event has occurred.
In a single-threaded program, the signal is delivered to that one thread. But in a multi-threaded
environment, the signal may be delivered in a variety of ways, depending on the
type of signal:
Deliver the signal to the thread to which the signal applies.
Deliver the signal to every thread in the process.
Deliver the signal to certain threads in the process.
Assign a specific thread to receive all signals for the process.
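The last option, one thread that receives all signals, can be sketched with POSIX calls, assuming a Linux-like system. The signal is blocked in every thread (threads inherit the mask of their creator), so a process-directed signal stays pending until the dedicated thread accepts it synchronously with sigwait(). SIGUSR1 and the function names are illustrative choices, not part of the notes.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <signal.h>
#include <stdint.h>
#include <unistd.h>

static sigset_t handled;   /* the signals handled by the dedicated thread */

/* Dedicated thread: receives the signals in 'handled' synchronously. */
static void *signal_thread(void *arg) {
    int sig = 0;
    (void)arg;
    sigwait(&handled, &sig);             /* sleeps until a signal in the set is pending */
    return (void *)(intptr_t)sig;        /* report which signal arrived */
}

/* Blocks SIGUSR1, sends it to the whole process, and lets one dedicated
   thread receive it. Returns the signal number that thread received. */
int dedicated_signal_thread_demo(void) {
    pthread_t tid;
    void *result = NULL;

    sigemptyset(&handled);
    sigaddset(&handled, SIGUSR1);
    /* Block before creating threads: new threads inherit this signal mask,
       so no thread takes SIGUSR1 asynchronously. */
    pthread_sigmask(SIG_BLOCK, &handled, NULL);

    pthread_create(&tid, NULL, signal_thread, NULL);
    kill(getpid(), SIGUSR1);             /* process-directed: pending until sigwait() accepts it */
    pthread_join(tid, &result);
    return (int)(intptr_t)result;
}
```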
d) Thread Pools
In a multithreaded process, a thread is created for every service. E.g. in a web server,
a thread is created to service every client request.
Creating a new thread every time one is needed and then deleting it when
it is done can be inefficient, as
Time is consumed in creating the thread.
A limit has to be placed on the number of active threads in the system. Unlimited
thread creation may exhaust system resources.
An alternative solution is to create a number of threads when the process first starts,
and put those threads into a thread pool.
Threads are allocated from the pool when a request comes, and returned to the
pool when no longer needed (after the completion of the request).
When no threads are available in the pool, the process may have to wait until
one becomes available.
No time is spent creating a thread: the request is serviced by a thread that already
exists in the pool. Servicing a request with an existing thread is faster than waiting to
create a thread.
The thread pool limits the number of threads in the system. This is important
on systems that cannot support a large number of concurrent threads.
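A minimal thread-pool sketch with POSIX threads, assuming a fixed pool of 4 workers and a bounded queue of 64 pending requests (both numbers, and all function names, are hypothetical). Workers are created once in pool_start() and then repeatedly take requests from the queue; a submitter waits when the queue is full, which is one way of limiting work in the system.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define POOL_THREADS 4     /* hypothetical pool size, fixed when the pool starts */
#define QUEUE_CAP    64    /* hypothetical capacity of the pending-request queue */

typedef void (*task_fn)(void *);
typedef struct { task_fn fn; void *arg; } task_t;

static task_t queue[QUEUE_CAP];
static int head, tail, count;
static bool shutting_down;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t not_empty = PTHREAD_COND_INITIALIZER;
static pthread_cond_t not_full = PTHREAD_COND_INITIALIZER;
static pthread_t workers[POOL_THREADS];

/* Each pool thread loops: take a request, service it, go back for more. */
static void *worker_loop(void *unused) {
    (void)unused;
    for (;;) {
        pthread_mutex_lock(&lock);
        while (count == 0 && !shutting_down)
            pthread_cond_wait(&not_empty, &lock);
        if (count == 0 && shutting_down) {    /* no work left: leave the pool */
            pthread_mutex_unlock(&lock);
            return NULL;
        }
        task_t t = queue[head];
        head = (head + 1) % QUEUE_CAP;
        count--;
        pthread_cond_signal(&not_full);
        pthread_mutex_unlock(&lock);
        t.fn(t.arg);                          /* service the request outside the lock */
    }
}

/* Create all worker threads up front. */
void pool_start(void) {
    shutting_down = false;
    for (int i = 0; i < POOL_THREADS; i++)
        pthread_create(&workers[i], NULL, worker_loop, NULL);
}

/* Hand a request to the pool; the caller waits while the queue is full. */
void pool_submit(task_fn fn, void *arg) {
    pthread_mutex_lock(&lock);
    while (count == QUEUE_CAP)
        pthread_cond_wait(&not_full, &lock);
    queue[tail] = (task_t){ fn, arg };
    tail = (tail + 1) % QUEUE_CAP;
    count++;
    pthread_cond_signal(&not_empty);
    pthread_mutex_unlock(&lock);
}

/* Let the workers finish every queued request, then join them. */
void pool_shutdown(void) {
    pthread_mutex_lock(&lock);
    shutting_down = true;
    pthread_cond_broadcast(&not_empty);
    pthread_mutex_unlock(&lock);
    for (int i = 0; i < POOL_THREADS; i++)
        pthread_join(workers[i], NULL);
}

/* Example request handler (hypothetical): counts serviced requests. */
atomic_int requests_served;
void count_request(void *arg) { (void)arg; atomic_fetch_add(&requests_served, 1); }
```

For example, after pool_start(), submitting count_request 100 times and calling pool_shutdown() leaves requests_served at 100, although only 4 threads were ever created.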
e) Thread-Specific Data
Data of a thread that is not shared with other threads is called thread-specific
data.
Most major thread libraries (Pthreads, Win32, Java) provide support for
thread-specific data.
Example: if threads are used to process transactions and each transaction has a
unique ID, that ID is thread-specific data.
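The transaction-ID example can be sketched with the Pthreads thread-specific-data calls pthread_key_create(), pthread_setspecific() and pthread_getspecific(). One key is shared by all threads, but each thread sees its own value under it. The helper names and the IDs 1000, 1001, ... are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

static pthread_key_t txn_key;   /* one key; each thread stores its own value under it */

/* Hypothetical helper: code deep inside a transaction can fetch the ID
   without it being passed down as a parameter. */
static long current_txn_id(void) {
    return (long)(intptr_t)pthread_getspecific(txn_key);
}

static void *transaction(void *arg) {
    pthread_setspecific(txn_key, arg);            /* store this thread's ID */
    return (void *)(intptr_t)current_txn_id();    /* read it back */
}

/* Runs n transactions (at most 16) in separate threads; returns 1 if every
   thread read back its own ID rather than another thread's. */
int tsd_demo(int n) {
    pthread_t tids[16];
    int ok = 1;
    if (n > 16) n = 16;
    pthread_key_create(&txn_key, NULL);
    for (int i = 0; i < n; i++)
        pthread_create(&tids[i], NULL, transaction, (void *)(intptr_t)(1000 + i));
    for (int i = 0; i < n; i++) {
        void *seen;
        pthread_join(tids[i], &seen);
        if ((intptr_t)seen != 1000 + i) ok = 0;
    }
    pthread_key_delete(txn_key);
    return ok;
}
```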
f) Scheduler Activations
Scheduler Activation is the technique used for communication between the user-
thread library and the kernel.
It works as follows:
The kernel provides the application with a set of virtual processors and must
inform the application about certain events. This procedure
is known as an upcall.
Upcalls are handled by the thread library with an upcall handler, and
upcall handlers must run on a virtual processor.
For example, when an application thread is about to block, the kernel makes an
upcall identifying that thread and allocates a new virtual processor to the
application. The upcall handler, running on this new virtual processor, saves the
state of the blocking thread and relinquishes the virtual processor on which the
blocking thread was running.
The upcall handler then schedules another thread that is eligible to run on the
new virtual processor. When the event that the blocking thread was waiting for occurs,
the kernel makes another upcall to the thread library, informing it that the
previously blocked thread is now eligible to run. The thread library can then assign
that thread to an available virtual processor.
1.4.1 Pthreads
The POSIX standard (IEEE 1003.1c) defines the specification for Pthreads,
not the implementation.
Pthreads are available on Solaris, Linux, Mac OS X, Tru64, and via public
domain shareware for Windows.
Global variables are shared amongst all threads.
One thread can wait for the others to rejoin (pthread_join()) before continuing.
Pthreads begin execution in a specified function; in this example, the runner( )
function.
The pthread_create() function is used to create a thread.
Figure 4.9
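The runner() program itself is not reproduced in these notes. The following is a sketch of that kind of program, assuming POSIX threads: a new thread sums 1..upper into the global variable sum, and the creating thread reads sum after pthread_join(). The wrapper sum_in_thread() is hypothetical; a complete program would call it from main() with a command-line argument.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stdlib.h>

long sum;   /* global data: shared by all threads of the process */

/* The new thread begins execution in this function. */
void *runner(void *param) {
    long upper = atol(param);
    sum = 0;
    for (long i = 1; i <= upper; i++)
        sum += i;
    pthread_exit(0);
}

/* Hypothetical wrapper: compute 1 + 2 + ... + upper in a separate thread. */
long sum_in_thread(const char *upper) {
    pthread_t tid;        /* thread identifier */
    pthread_attr_t attr;  /* set of thread attributes */

    pthread_attr_init(&attr);                            /* default attributes */
    pthread_create(&tid, &attr, runner, (void *)upper);  /* start runner(upper) */
    pthread_join(tid, NULL);                             /* wait for the thread to exit */
    pthread_attr_destroy(&attr);
    return sum;
}
```

For instance, sum_in_thread("10") returns 55. Because sum is a global variable, the value written by the runner thread is visible to the thread that joined it.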
In a single-processor system, only one process can run at a time; other processes must
wait until the CPU is free. The objective of multiprogramming is to have some process
running at all times, to maximize CPU utilization.
In multiprogramming, several processes are kept in memory at one time. When one
process has to wait, the operating system takes the CPU away from that process and gives the
CPU to another process. This pattern continues. Every time one process has to wait, another
process can take over use of the CPU. Scheduling of this kind is a fundamental operating-
system function. Almost all computer resources are scheduled before use. The CPU is one of
the primary computer resources. Thus, its scheduling is central to operating-system design.
2.1.4 Dispatcher
Another component involved in the CPU-scheduling function is the dispatcher. The
dispatcher is the module that gives control of the CPU to the process selected by the short-
term scheduler. This function involves the following:
Switching context
Switching to user mode
Jumping to the proper location in the user program to restart that program
The dispatcher should be as fast as possible, since it is invoked during every process
switch. The time it takes for the dispatcher to stop one process and start another running is
known as the dispatch latency.
Different CPU scheduling algorithms have different properties, and the choice of a
particular algorithm may favour one class of processes over another. Many criteria have been
suggested for comparing CPU scheduling algorithms. The criteria include the following:
CPU utilization - The CPU must be kept as busy as possible. Conceptually, CPU
utilization can range from 0 to 100 percent. In a real system, it should range from
40 percent (lightly loaded system) to 90 percent (heavily used system).
Throughput - If the CPU is busy executing processes, then work is being done. One
measure of work is the number of processes that are completed per time unit, called
throughput.
Turnaround time - From the point of view of a particular process, the important
criterion is how long it takes to execute that process. The interval from the time of
submission of a process to the time of completion is the turnaround time. Turnaround
time is the sum of the periods spent waiting to get into memory, waiting in the ready
queue, executing on the CPU, and doing I/O:
Turnaround time = waiting to get into memory + waiting in the ready queue + executing on the CPU + doing I/O
Waiting time - The total amount of time the process spends waiting in the ready queue.
Response time - The time from the submission of a request until the first response
is produced is called the response time. It is the time taken to start responding. In
an interactive system, response time is the preferred criterion.
Advantages:
more predictable than other schemes since it offers time
the code for FCFS scheduling is simple to write and understand
Disadvantages:
short jobs (processes) may have to wait for a long time
important jobs (with higher priority) have to wait
cannot guarantee good response time
average waiting time and turnaround time are often quite long
lower CPU and device utilization.
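Example:
Process   Burst Time (ms)
P1        24
P2        3
P3        3
Suppose that the processes arrive in the order P1, P2, P3 (all at time 0).
The Gantt chart for the schedule is:
| P1 | P2 | P3 |
0    24   27   30
Waiting time: P1 = 0, P2 = 24, P3 = 27; average waiting time = (0 + 24 + 27)/3 = 17 ms.
If the processes instead arrive in the order P2, P3, P1, the Gantt chart is:
| P2 | P3 | P1 |
0    3    6    30
Waiting time: P1 = 6, P2 = 0, P3 = 3; average waiting time = (6 + 0 + 3)/3 = 3 ms.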
Here there is a convoy effect: all the short processes wait for one big process to
finish, resulting in lower CPU and device utilization.
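The waiting-time arithmetic above can be checked with a few lines of C. fcfs_avg_wait() is a hypothetical helper that assumes all processes arrive at time 0 in array order and do no I/O, so each process waits exactly as long as the bursts of the processes ahead of it.

```c
/* FCFS with every process arriving at time 0 in array order.
   burst[i] is the CPU burst of the i-th process in the ready queue.
   Returns the average waiting time. */
double fcfs_avg_wait(const int burst[], int n) {
    int now = 0;              /* time at which the next process gets the CPU */
    long total_wait = 0;
    for (int i = 0; i < n; i++) {
        total_wait += now;    /* process i waits for all processes ahead of it */
        now += burst[i];
    }
    return (double)total_wait / n;
}
```

With the bursts in the order {24, 3, 3} it returns 17.0, and with {3, 3, 24} it returns 3.0, matching the two Gantt charts.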
SJF (Shortest-Job-First) cannot be implemented exactly at the level of short-term
scheduling, because there is no way to know the length of the next CPU burst; it can
only be approximated by predicting that length.
Advantage:
The SJF is optimal, i.e. it gives the minimum average waiting time for a given set
of processes.
Disadvantage:
Determining the length of the next CPU burst.
The SJF algorithm may be either 1) non-preemptive or
2) preemptive.
Non preemptive SJF
The current process is allowed to finish its CPU burst.
Preemptive SJF
If the new process has a shorter next CPU burst than what is left of the executing
process, that process is preempted.
It is also known as SRTF scheduling (Shortest-Remaining-Time-First).
Example (for preemptive SJF): Consider the following set of processes, with the length
of the CPU-burst time given in milliseconds.
Process   Arrival Time   Burst Time
P1        0              8
P2        1              4
P3        2              9
P4        3              5
Gantt chart:
| P1 | P2 | P4 | P1 | P3 |
0    1    5    10   17   26
Process P1 is started at time 0, since it is the only process in the queue. Process P2 arrives at
time 1. The remaining time for process P1 (7 milliseconds) is larger than the time required by
process P2 (4 milliseconds), so process P1 is preempted, and process P2 is scheduled. The
average waiting time for this example is ((10 - 1) + (1 - 1) + (17 - 2) + (5 - 3))/4 = 26/4 = 6.5
milliseconds.
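Both SJF variants can be checked with a small tick-by-tick simulation. This is a sketch, not an efficient scheduler: sjf_avg_wait() is hypothetical, assumes integer millisecond arrivals and bursts, no I/O, at most 16 processes, and breaks ties in favour of the lower-numbered process.

```c
#include <stdbool.h>

#define MAX_PROCS 16   /* assumed upper bound on n */

/* Simulates SJF one millisecond at a time; arrival[i] and burst[i] describe process i.
   preemptive = true:  every tick, run the arrived process with the shortest
                       remaining time (preemptive SJF / SRTF).
   preemptive = false: a process keeps the CPU until its burst completes.
   Returns the average waiting time (turnaround time minus burst time). */
double sjf_avg_wait(const int arrival[], const int burst[], int n, bool preemptive) {
    int remaining[MAX_PROCS], finish[MAX_PROCS];
    int done = 0, now = 0, current = -1;

    for (int i = 0; i < n; i++)
        remaining[i] = burst[i];

    while (done < n) {
        if (preemptive || current == -1) {
            int best = -1;   /* shortest remaining time among arrived, unfinished processes */
            for (int i = 0; i < n; i++)
                if (arrival[i] <= now && remaining[i] > 0 &&
                    (best == -1 || remaining[i] < remaining[best]))
                    best = i;
            current = best;
        }
        now++;                          /* one millisecond passes */
        if (current == -1)
            continue;                   /* CPU idle: nothing has arrived yet */
        if (--remaining[current] == 0) {
            finish[current] = now;
            done++;
            current = -1;
        }
    }

    long total_wait = 0;
    for (int i = 0; i < n; i++)
        total_wait += finish[i] - arrival[i] - burst[i];
    return (double)total_wait / n;
}
```

With arrivals {0, 1, 2, 3} and bursts {8, 4, 9, 5}, the preemptive run gives 6.5 ms, as computed above; the non-preemptive run gives 7.75 ms, because P1 keeps the CPU until time 8.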
There must be scheduling among the queues, which is commonly implemented as fixed-
priority preemptive scheduling.
For example, the foreground queue may have absolute priority over the background
queue.
Alternatively, the queues can be time-sliced: each queue gets a certain portion of the
CPU time, which it can schedule among its processes; e.g., 80% to the foreground
queue, scheduled by RR, and 20% to the background queue, scheduled by FCFS.
Let's look at an example of a multilevel queue scheduling algorithm with five
queues, listed below in order of priority:
1. System processes
2. Interactive processes
3. Interactive editing processes
4. Batch processes
5. Student processes
1) Asymmetric Multiprocessing
The basic idea is:
A single processor, the master server, is responsible for all scheduling
decisions, I/O processing and other system activities.
The other processors execute only user code.
Advantage:
i) This is simple because only one processor accesses the system data
structures, reducing the need for data sharing.
2) Symmetric Multiprocessing
The basic idea is:
i) Each processor is self-scheduling.
ii) To do scheduling, the scheduler for each processor
examines the ready queue and
selects a process to execute.
Restriction: We must ensure that two processors do not choose the same process
and that processes are not lost from the queue.
Another strategy is to provide multiple logical, rather than physical, processors.
Such a strategy is known as symmetric multithreading (SMT); on Intel processors it is
termed Hyper-Threading Technology.
The basic idea:
1) Create multiple logical processors on the same physical processor.
2) Present a view of several logical processors to the OS.
Each logical processor has its own architecture state, which includes general-purpose and
machine-state registers.
Each logical processor is responsible for its own interrupt handling.
SMT is a feature provided in hardware, not software.
The following figure illustrates a typical SMT architecture with two physical processors, each
housing two logical processors. From the operating system's perspective, four processors are
available for work on this system.
The first parameter of both pthread_attr_setscope() and pthread_attr_getscope() is a
pointer to the attribute set for the thread.
The second parameter of pthread_attr_setscope() is either the
PTHREAD_SCOPE_SYSTEM or the PTHREAD_SCOPE_PROCESS value, indicating how the
contention scope is to be set. In the case of pthread_attr_getscope(), the second parameter
is a pointer to an int that is set to the current value of the contention scope. If an
error occurs, each of these functions returns a nonzero value.
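A minimal sketch of querying and then setting the contention scope with these two functions. scope_demo() is hypothetical. Some systems support only one scope; Linux, for example, supports only PTHREAD_SCOPE_SYSTEM and rejects PTHREAD_SCOPE_PROCESS with ENOTSUP, so the sketch requests system scope and propagates any nonzero error code.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>

/* Reads the default contention scope, requests system contention scope (SCS),
   and reads the scope back. Returns 0 on success or the failing call's error code. */
int scope_demo(int *default_scope, int *final_scope) {
    pthread_attr_t attr;
    int rc;

    pthread_attr_init(&attr);
    rc = pthread_attr_getscope(&attr, default_scope);          /* current scope */
    if (rc == 0)
        rc = pthread_attr_setscope(&attr, PTHREAD_SCOPE_SYSTEM);
    if (rc == 0)
        rc = pthread_attr_getscope(&attr, final_scope);        /* should now be SCS */
    pthread_attr_destroy(&attr);
    return rc;
}
```

A thread created with these attributes (pthread_create(&tid, &attr, ...)) would compete for the CPU with all threads in the system.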
3.1 Background
A co-operating process is one that can affect or be affected by other processes.
Co-operating processes may either
directly share a logical address space (i.e. both code and data), which is achieved through threads, or
share data only through files or
messages.
Concurrent access to shared data may result in data inconsistency.
To maintain data consistency:
The orderly execution of co-operating processes is necessary.
Suppose that we want to provide a solution to the producer-consumer problem that
can fill all the buffers. We can do so by having an integer variable counter that keeps
track of the number of full buffers.
Initially, counter = 0.
counter is incremented by the producer after it produces a new buffer.
counter is decremented by the consumer after it consumes a buffer.
Shared-data:
A situation where several processes access and manipulate the same data concurrently and the
outcome of the execution depends on the particular order in which the access takes place is
called a race condition.
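Example:
counter++ could be implemented in machine language as:
register1 = counter; register1 = register1 + 1; counter = register1;
and counter-- could be implemented as:
register2 = counter; register2 = register2 - 1; counter = register2;
If counter is initially 5 and the producer and consumer execute counter++ and counter--
concurrently, these low-level statements may be interleaved in an arbitrary order. The
value of counter may then be either 4 or 6, whereas the correct result is 5. This is an
example of a race condition.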
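The interleaving that produces a wrong value can be replayed deterministically. This sketch, with hypothetical functions interleaved_counter() and serial_counter(), performs the register-level steps of counter++ and counter-- in one fixed order chosen for illustration; each local variable stands for a CPU register.

```c
/* Replays one interleaving of counter++ (producer) and counter-- (consumer).
   Both processes load counter before either stores it back, so one update is lost.
   producer_stores_last selects which store happens last. */
int interleaved_counter(int counter, int producer_stores_last) {
    int register1, register2;
    register1 = counter;          /* T0: producer loads counter          */
    register1 = register1 + 1;    /* T1: producer increments register1   */
    register2 = counter;          /* T2: consumer loads the same value   */
    register2 = register2 - 1;    /* T3: consumer decrements register2   */
    if (producer_stores_last) {
        counter = register2;      /* T4: consumer stores                 */
        counter = register1;      /* T5: producer stores: counter++ wins */
    } else {
        counter = register1;      /* T4: producer stores                 */
        counter = register2;      /* T5: consumer stores: counter-- wins */
    }
    return counter;
}

/* The correct result: one complete operation, then the other. */
int serial_counter(int counter) {
    int r1 = counter; r1 = r1 + 1; counter = r1;   /* counter++ */
    int r2 = counter; r2 = r2 - 1; counter = r2;   /* counter-- */
    return counter;
}
```

Starting from 5, interleaved_counter(5, 0) returns 4, interleaved_counter(5, 1) returns 6, and serial_counter(5) returns 5.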
A solution to the critical-section problem must satisfy the following three requirements:
1. Mutual exclusion. If process Pi is executing in its critical section, then no other processes
can be executing in their critical sections.
2. Progress. If no process is executing in its critical section and some processes wish to enter
their critical sections, then only those processes that are not executing in their remainder
sections can participate in deciding which will enter its critical section next, and this selection
cannot be postponed indefinitely.
3. Bounded waiting. There exists a bound, or limit, on the number of times that other
processes are allowed to enter their critical sections after a process has made a request to
enter its critical section and before that request is granted.
Two general approaches are used to handle critical sections in operating systems:
Preemptive kernels: A preemptive kernel allows a process to be preempted while it
is running in kernel mode.
Nonpreemptive kernels: A nonpreemptive kernel does not allow a process running
in kernel mode to be preempted; a kernel-mode process will run until it exits kernel
mode, blocks, or voluntarily yields control of the CPU.
6.3 Peterson's Solution
It can be proved that Peterson's solution satisfies all three requirements:
1. Mutual exclusion is preserved.
2. The progress requirement is satisfied.
3. The bounded-waiting requirement is met.
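A sketch of Peterson's solution for two processes, written with POSIX threads standing in for processes 0 and 1. One assumption departs from the classic pseudocode: flag and turn are sequentially consistent C11 atomics, because with plain variables a modern compiler or CPU may reorder the loads and stores and break mutual exclusion. The spin loop calls sched_yield() so the demo also finishes quickly on a single CPU. The round count and function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

#define ROUNDS 20000             /* hypothetical number of entries per process */

static atomic_bool flag[2];      /* flag[i] == true: process i wants to enter */
static atomic_int turn;          /* whose turn it is to enter */
static long shared_counter;      /* touched only inside the critical section */

static void *process_i(void *arg) {
    int i = *(const int *)arg, j = 1 - i;
    for (int k = 0; k < ROUNDS; k++) {
        atomic_store(&flag[i], true);                 /* entry section */
        atomic_store(&turn, j);                       /* let the other go first */
        while (atomic_load(&flag[j]) && atomic_load(&turn) == j)
            sched_yield();                            /* busy wait, yielding each spin */
        shared_counter++;                             /* critical section */
        atomic_store(&flag[i], false);                /* exit section */
    }
    return NULL;
}

/* Runs processes 0 and 1 concurrently; returns 2 * ROUNDS if mutual exclusion held. */
long peterson_demo(void) {
    static const int ids[2] = {0, 1};
    pthread_t t[2];
    shared_counter = 0;
    for (int i = 0; i < 2; i++)
        pthread_create(&t[i], NULL, process_i, (void *)&ids[i]);
    for (int i = 0; i < 2; i++)
        pthread_join(t[i], NULL);
    return shared_counter;
}
```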
TestAndSet()
The definition of the test_and_set() instruction:
Using the test_and_set() instruction, mutual exclusion can be implemented by declaring a
boolean variable lock, initialized to false. The structure of process Pi is shown in the
figure:
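C11 offers a portable form of the test-and-set instruction: atomic_flag_test_and_set() atomically sets the flag and returns its previous value. The sketch below, assuming POSIX threads, uses it as the lock of the structure described above: spin while the old value was true, enter the critical section, then clear the lock. The sched_yield() in the spin loop and the demo function are additions for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <sched.h>
#include <stdatomic.h>
#include <stddef.h>

#define ROUNDS 50000     /* hypothetical number of entries per thread */
#define MAX_THREADS 8

/* Textbook semantics, executed atomically:
       bool test_and_set(bool *target) { bool rv = *target; *target = true; return rv; } */
static atomic_flag lock = ATOMIC_FLAG_INIT;   /* clear (false): lock is free */
static long tas_counter;

static void *process_i(void *arg) {
    (void)arg;
    for (int k = 0; k < ROUNDS; k++) {
        while (atomic_flag_test_and_set(&lock))   /* old value true: someone holds it */
            sched_yield();                        /* spin, giving up the CPU each time */
        tas_counter++;                            /* critical section */
        atomic_flag_clear(&lock);                 /* lock = false */
        /* remainder section */
    }
    return NULL;
}

/* Runs n threads (at most MAX_THREADS); returns n * ROUNDS if mutual exclusion held. */
long test_and_set_demo(int n) {
    pthread_t t[MAX_THREADS];
    if (n > MAX_THREADS) n = MAX_THREADS;
    tas_counter = 0;
    for (int i = 0; i < n; i++)
        pthread_create(&t[i], NULL, process_i, NULL);
    for (int i = 0; i < n; i++)
        pthread_join(t[i], NULL);
    return tas_counter;
}
```

Nothing in this lock decides which spinning thread gets in next, which is why this simple scheme does not guarantee bounded waiting.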
The simple mutual-exclusion algorithms based on the test_and_set() and swap() instructions do not
satisfy the bounded-waiting requirement.
The hardware-based solutions to the critical-section problem are complicated and generally
inaccessible to application programmers. So operating-system designers build software tools to
solve the critical-section problem; one such synchronization tool is the semaphore.
A semaphore S is an integer variable.
Two standard operations modify S: wait() and signal(),
originally called P() and V().
S can be accessed only via these two indivisible (atomic) operations.
It must be guaranteed that no two processes can execute wait() and signal() on the same
semaphore at the same time.
3.5.1 Usage:
Semaphores are classified into:
Counting semaphore: value can range over an unrestricted domain.
Binary semaphore (mutex lock): value can be only 0 or 1. It provides
mutual exclusion.
Solution for Critical-section Problem using Binary Semaphores
Binary semaphores can be used to solve the critical-section problem for multiple processes.
The processes share a semaphore mutex initialized to 1 (Figure 2.20).
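A sketch of this structure with POSIX unnamed semaphores, where sem_wait() plays the role of wait() and sem_post() of signal(). It assumes a system that supports sem_init(), such as Linux (macOS does not support unnamed semaphores); the loop count and function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <semaphore.h>
#include <stddef.h>

static sem_t mutex;      /* binary semaphore, initialized to 1 */
static long balance;     /* shared data, touched only inside the critical section */

static void *process_body(void *arg) {
    (void)arg;
    for (int k = 0; k < 100000; k++) {
        sem_wait(&mutex);    /* wait(mutex): entry section */
        balance++;           /* critical section */
        sem_post(&mutex);    /* signal(mutex): exit section */
        /* remainder section */
    }
    return NULL;
}

/* Runs n processes (at most 8) as threads; returns n * 100000 if mutual exclusion held. */
long binary_semaphore_demo(int n) {
    pthread_t t[8];
    if (n > 8) n = 8;
    balance = 0;
    sem_init(&mutex, 0, 1);      /* 0: shared between threads of this process */
    for (int i = 0; i < n; i++)
        pthread_create(&t[i], NULL, process_body, NULL);
    for (int i = 0; i < n; i++)
        pthread_join(t[i], NULL);
    sem_destroy(&mutex);
    return balance;
}
```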
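Semaphores can also be used to order execution. Suppose process P1 executes statement S1,
process P2 executes statement S2, and S2 must execute only after S1 has completed. P1 and P2
share a semaphore synch, initialized to 0; process P1 executes S1; signal(synch); and
process P2 executes wait(synch); S2;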
Because synch is initialized to 0, P2 will execute S2 only after P1 has invoked signal (synch), which is
after statement S1 has been executed.
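The same ordering can be sketched with a POSIX semaphore initialized to 0, assuming Linux-style unnamed semaphores. The demo deliberately starts P2 first; it still cannot run S2 until P1 has posted synch. Recording the statements in order[] is a hypothetical way to observe the result.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <semaphore.h>
#include <stddef.h>

static sem_t synch;          /* initialized to 0 */
static int order[2], pos;    /* records which statement ran first */

static void *p1(void *arg) {
    (void)arg;
    order[pos++] = 1;        /* S1 */
    sem_post(&synch);        /* signal(synch) */
    return NULL;
}

static void *p2(void *arg) {
    (void)arg;
    sem_wait(&synch);        /* wait(synch): blocks until P1 has executed S1 */
    order[pos++] = 2;        /* S2 */
    return NULL;
}

/* Starts P2 before P1 on purpose; returns 1 if S1 still ran before S2. */
int ordering_demo(void) {
    pthread_t t1, t2;
    pos = 0;
    sem_init(&synch, 0, 0);
    pthread_create(&t2, NULL, p2, NULL);
    pthread_create(&t1, NULL, p1, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    sem_destroy(&synch);
    return pos == 2 && order[0] == 1 && order[1] == 2;
}
```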
3.5.2 Implementation:
The main disadvantage of the semaphore definition given above is busy waiting.
Busy waiting: While a process is in its critical-section, any other process that tries to enter its
critical-section must loop continuously in the entry code.
Busy waiting wastes CPU cycles that some other process might be able to use productively.
This type of semaphore is also called a spinlock (because the process "spins" while waiting for
the lock).
To overcome busy waiting, we can modify the definition of the wait() and signal() as follows:
When a process executes the wait() and finds that the semaphore-value is not
positive, it must wait. However, rather than engaging in busy waiting, the process can
block itself.
A process that is blocked (waiting on a semaphore S) should be restarted when some
other process executes a signal(). The process is restarted by a wakeup().
We assume 2 simple operations:
block() suspends the process that invokes it.
wakeup(P) resumes the execution of a blocked process P.
We define a semaphore as follows:
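A minimal sketch of such a blocking semaphore in C, assuming POSIX threads. This is not the notes' own definition: a pthread mutex makes wait() and signal() atomic, and a condition variable stands in for the list of waiting processes, with pthread_cond_wait() acting as block() and pthread_cond_signal() acting as wakeup(). When value is negative, its magnitude is the number of blocked waiters. All type and function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stddef.h>

typedef struct {
    int value;
    int wakeups;               /* wakeup() calls not yet taken by a blocked waiter */
    pthread_mutex_t guard;     /* makes wait() and signal() atomic */
    pthread_cond_t waiters;    /* where blocked processes sleep */
} semaphore;

void semaphore_init(semaphore *S, int value) {
    S->value = value;
    S->wakeups = 0;
    pthread_mutex_init(&S->guard, NULL);
    pthread_cond_init(&S->waiters, NULL);
}

void semaphore_wait(semaphore *S) {
    pthread_mutex_lock(&S->guard);
    S->value--;
    if (S->value < 0) {
        do {
            pthread_cond_wait(&S->waiters, &S->guard);   /* block() */
        } while (S->wakeups == 0);                       /* ignore spurious wakeups */
        S->wakeups--;
    }
    pthread_mutex_unlock(&S->guard);
}

void semaphore_signal(semaphore *S) {
    pthread_mutex_lock(&S->guard);
    S->value++;
    if (S->value <= 0) {                      /* some process is blocked */
        S->wakeups++;
        pthread_cond_signal(&S->waiters);     /* wakeup(P) */
    }
    pthread_mutex_unlock(&S->guard);
}

/* Usage check: the semaphore, initialized to 1, guards a counter shared by 4 threads. */
static semaphore S;
static long guarded;

static void *incrementer(void *arg) {
    (void)arg;
    for (int k = 0; k < 50000; k++) {
        semaphore_wait(&S);
        guarded++;
        semaphore_signal(&S);
    }
    return NULL;
}

long blocking_semaphore_demo(void) {
    pthread_t t[4];
    semaphore_init(&S, 1);
    guarded = 0;
    for (int i = 0; i < 4; i++)
        pthread_create(&t[i], NULL, incrementer, NULL);
    for (int i = 0; i < 4; i++)
        pthread_join(t[i], NULL);
    return guarded;
}
```

A waiting thread sleeps inside pthread_cond_wait() instead of looping, so it uses no CPU time. Unlike a textbook FIFO list, a condition variable does not promise which waiter wakes first.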
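In the bounded-buffer problem with n buffers, the producer and consumer share three
semaphores, initialized as mutex = 1, empty = n and full = 0, where,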
¤ mutex provides mutual-exclusion for accesses to the buffer-pool.
¤ empty counts the number of empty buffers.
¤ full counts the number of full buffers.
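A sketch of the bounded-buffer producer and consumer with these three semaphores, assuming POSIX unnamed semaphores (Linux) and a hypothetical buffer of 5 slots. The producer waits on empty before inserting and posts full afterwards; the consumer does the reverse; mutex guards the buffer indices.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <semaphore.h>
#include <stddef.h>

#define N 5                          /* hypothetical number of buffers */
static int buffer[N];
static int in, out;                  /* next slot to fill / to empty */
static sem_t mutex, empty, full;     /* mutex = 1, empty = N, full = 0 */
static long consumed_sum;

static void *producer(void *arg) {
    int items = *(int *)arg;
    for (int item = 1; item <= items; item++) {
        sem_wait(&empty);            /* wait for an empty buffer */
        sem_wait(&mutex);
        buffer[in] = item;           /* add the item to the buffer */
        in = (in + 1) % N;
        sem_post(&mutex);
        sem_post(&full);             /* one more full buffer */
    }
    return NULL;
}

static void *consumer(void *arg) {
    int items = *(int *)arg;
    for (int k = 0; k < items; k++) {
        sem_wait(&full);             /* wait for a full buffer */
        sem_wait(&mutex);
        int item = buffer[out];      /* remove an item from the buffer */
        out = (out + 1) % N;
        sem_post(&mutex);
        sem_post(&empty);            /* one more empty buffer */
        consumed_sum += item;        /* consume the item */
    }
    return NULL;
}

/* Produces 1..items through the N-slot buffer; returns the sum the consumer saw. */
long bounded_buffer_demo(int items) {
    pthread_t p, c;
    in = out = 0;
    consumed_sum = 0;
    sem_init(&mutex, 0, 1);
    sem_init(&empty, 0, N);
    sem_init(&full, 0, 0);
    pthread_create(&p, NULL, producer, &items);
    pthread_create(&c, NULL, consumer, &items);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    sem_destroy(&mutex);
    sem_destroy(&empty);
    sem_destroy(&full);
    return consumed_sum;
}
```

For 1000 items, every value from 1 to 1000 reaches the consumer exactly once, so bounded_buffer_demo(1000) returns 500500.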
In the first readers-writers problem, the reader and writer processes share the semaphores
mutex and wrt and the integer readcount, where,
¤ mutex is used to ensure mutual exclusion when the variable readcount is updated.
¤ wrt is common to both reader and writer processes.
wrt is used as a mutual-exclusion semaphore for the writers.
wrt is also used by the first/last reader that enters/exits the critical section.
¤ readcount counts the number of processes currently reading the object.
Initialization
mutex = 1, wrt = 1, readcount = 0
Writer Process: Reader Process:
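A sketch of the writer and reader processes of the first readers-writers problem, using the semaphores mutex and wrt and the counter readcount with the initial values above. It assumes POSIX unnamed semaphores (Linux). The shared object, loop counts and readers_writers_demo() are hypothetical. wrt is a semaphore rather than a mutex because the last reader that releases it need not be the reader that acquired it. In this solution, writers can starve while readers keep arriving.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <semaphore.h>
#include <stddef.h>

static sem_t mutex, wrt;               /* mutex = 1, wrt = 1 */
static int readcount;                  /* readcount = 0 */
static struct { int x, y; } object;    /* shared object; writers keep x == y */
static int torn_reads;                 /* reads that saw x != y; updated under mutex */

static void *writer(void *arg) {
    (void)arg;
    for (int k = 0; k < 1000; k++) {
        sem_wait(&wrt);
        object.x++;                    /* writing is performed */
        object.y++;
        sem_post(&wrt);
    }
    return NULL;
}

static void *reader(void *arg) {
    (void)arg;
    for (int k = 0; k < 1000; k++) {
        sem_wait(&mutex);
        readcount++;
        if (readcount == 1)
            sem_wait(&wrt);            /* first reader locks out writers */
        sem_post(&mutex);

        int torn = object.x != object.y;   /* reading is performed */

        sem_wait(&mutex);
        torn_reads += torn;
        readcount--;
        if (readcount == 0)
            sem_post(&wrt);            /* last reader lets writers back in */
        sem_post(&mutex);
    }
    return NULL;
}

/* 2 writers and 3 readers; returns 1 if no reader saw a half-finished write
   and all 2000 writes were kept. */
int readers_writers_demo(void) {
    pthread_t t[5];
    sem_init(&mutex, 0, 1);
    sem_init(&wrt, 0, 1);
    readcount = 0;
    torn_reads = 0;
    object.x = object.y = 0;
    for (int i = 0; i < 5; i++)
        pthread_create(&t[i], NULL, i < 2 ? writer : reader, NULL);
    for (int i = 0; i < 5; i++)
        pthread_join(t[i], NULL);
    sem_destroy(&mutex);
    sem_destroy(&wrt);
    return torn_reads == 0 && object.x == 2000 && object.y == 2000;
}
```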
The readers-writers problem and its solutions are used to provide reader-writer locks on some
systems.
Acquiring a reader-writer lock requires specifying the mode of the lock:
1) read mode
When a process wishes to read shared data, it requests the lock in read mode.
2) write mode
When a process wishes to modify shared data, it requests the lock in write mode.
Multiple processes are permitted to concurrently acquire a lock in read
mode, but only one process may acquire the lock for writing.
These locks are most useful in the following situations:
In applications where it is easy to identify
which processes only read shared data and
which processes only write shared data.
In applications that have more readers than writers.
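POSIX threads provide reader-writer locks directly as pthread_rwlock_t. In the sketch below, pthread_rwlock_rdlock() requests the lock in read mode and pthread_rwlock_wrlock() in write mode. rwlock_modes_demo() uses the non-blocking try variants from helper threads to show that a second reader is admitted while a writer is refused. The table and all function names are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <stddef.h>

static pthread_rwlock_t table_lock = PTHREAD_RWLOCK_INITIALIZER;
static int table[16];                       /* hypothetical data: read often, written rarely */

int table_read(int i) {
    pthread_rwlock_rdlock(&table_lock);     /* read mode: shared */
    int v = table[i];
    pthread_rwlock_unlock(&table_lock);
    return v;
}

void table_write(int i, int v) {
    pthread_rwlock_wrlock(&table_lock);     /* write mode: exclusive */
    table[i] = v;
    pthread_rwlock_unlock(&table_lock);
}

/* Helpers: try one mode without blocking and report the result code. */
static void *try_read(void *rc) {
    *(int *)rc = pthread_rwlock_tryrdlock(&table_lock);
    if (*(int *)rc == 0) pthread_rwlock_unlock(&table_lock);
    return NULL;
}

static void *try_write(void *rc) {
    *(int *)rc = pthread_rwlock_trywrlock(&table_lock);
    if (*(int *)rc == 0) pthread_rwlock_unlock(&table_lock);
    return NULL;
}

/* While this thread holds the lock in read mode, another reader should get in
   and a writer should be refused. Returns 1 if that is what happens. */
int rwlock_modes_demo(void) {
    pthread_t t;
    int read_rc = -1, write_rc = -1;

    pthread_rwlock_rdlock(&table_lock);
    pthread_create(&t, NULL, try_read, &read_rc);
    pthread_join(t, NULL);
    pthread_create(&t, NULL, try_write, &write_rc);
    pthread_join(t, NULL);
    pthread_rwlock_unlock(&table_lock);
    return read_rc == 0 && write_rc != 0;
}
```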
Figure 2.21 Situation of dining philosophers Figure 2.22 The structure of philosopher
Disadvantage:
Deadlock may occur if all 5 philosophers become hungry simultaneously and each grabs her left
chopstick. When each philosopher then tries to grab her right chopstick, she will be delayed
forever.
Three possible remedies to the deadlock problem:
Allow at most 4 philosophers to be sitting simultaneously at the table.
Allow a philosopher to pick up her chopsticks only if both chopsticks are available.
Use an asymmetric solution; i.e. an odd philosopher picks up first her left chopstick and
then her right chopstick, whereas an even philosopher picks up her right chopstick and then her
left chopstick.
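The third remedy, the asymmetric solution, can be sketched with one POSIX semaphore per chopstick, assuming Linux-style unnamed semaphores. Philosopher i's left chopstick is taken to be i and her right chopstick (i + 1) % 5; that numbering, the meal count and the function names are assumptions. Odd philosophers pick up left then right, even philosophers right then left, so the five cannot all hold one chopstick while waiting for the next.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <semaphore.h>
#include <stddef.h>
#include <stdint.h>

#define PHILOSOPHERS 5
#define MEALS 1000                        /* hypothetical meals per philosopher */

static sem_t chopstick[PHILOSOPHERS];     /* each initialized to 1 */
static int meals_eaten[PHILOSOPHERS];     /* written only by philosopher i */

static void *philosopher(void *arg) {
    int i = (int)(intptr_t)arg;
    int left = i, right = (i + 1) % PHILOSOPHERS;
    /* Asymmetric order: odd philosophers take left first, even ones right first. */
    int first  = (i % 2 == 1) ? left : right;
    int second = (i % 2 == 1) ? right : left;
    for (int m = 0; m < MEALS; m++) {
        sem_wait(&chopstick[first]);
        sem_wait(&chopstick[second]);
        meals_eaten[i]++;                 /* eat */
        sem_post(&chopstick[second]);
        sem_post(&chopstick[first]);
        /* think */
    }
    return NULL;
}

/* Runs all philosophers to completion; returns the total number of meals eaten. */
int dining_demo(void) {
    pthread_t t[PHILOSOPHERS];
    int total = 0;
    for (int i = 0; i < PHILOSOPHERS; i++) {
        sem_init(&chopstick[i], 0, 1);
        meals_eaten[i] = 0;
    }
    for (int i = 0; i < PHILOSOPHERS; i++)
        pthread_create(&t[i], NULL, philosopher, (void *)(intptr_t)i);
    for (int i = 0; i < PHILOSOPHERS; i++)
        pthread_join(t[i], NULL);
    for (int i = 0; i < PHILOSOPHERS; i++) {
        total += meals_eaten[i];
        sem_destroy(&chopstick[i]);
    }
    return total;
}
```

If every philosopher instead took her left chopstick first, the same program could hang in exactly the deadlock described above.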
3.7 Monitors
Monitor is a high-level synchronization construct.
It provides a convenient and effective mechanism for process synchronization.
Need for Monitors
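When programmers use semaphores incorrectly, the following types of errors may occur:
Suppose that a process interchanges the order in which the wait() and signal() operations on
the semaphore mutex are executed, resulting in the execution signal(mutex); ... critical
section ... wait(mutex); In this situation, several processes may be executing in their
critical sections simultaneously, violating the mutual-exclusion requirement.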
3.7.1 Usage
A monitor type presents a set of programmer-defined operations that are provided to ensure
mutual-exclusion within the monitor.
It also contains (Figure 2.23):
declaration of variables
bodies of procedures (or functions).
A procedure defined within a monitor can access only those variables declared locally within the
monitor and its formal parameters.
Similarly, the local variables of a monitor can be accessed only by the local procedures.
Only one process at a time is active within the monitor (Figure 2.24).
To allow a process to wait within the monitor, a condition variable must be declared, as
condition x, y;
A condition variable can be used only with the following 2 operations (Figure 2.25):
1) x.signal()
This operation resumes exactly one suspended process. If no process is suspended, then
the signal operation has no effect.
2) x.wait()
The process invoking this operation is suspended until another process invokes x.signal().
Figure 2.24 Schematic view of a monitor Figure 2.25 Monitor with condition variables
Suppose that, when the x.signal() operation is invoked by a process P, there is a suspended
process Q associated with condition x.
Both processes can conceptually continue with their execution. Two possibilities exist:
Signal and wait
P either waits until Q leaves the monitor or waits for another condition.
Signal and continue
Q either waits until P leaves the monitor or waits for another condition.
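A monitor can be implemented using semaphores. For each monitor, the following variables
are used: semaphore mutex (initially 1), semaphore next (initially 0) and int next_count = 0;
where
¤ mutex is provided for each monitor.
¤ next is used by a signaling process to wait until the resumed process either leaves or waits.
¤ next_count is used to count the number of processes suspended on next.
Each external procedure F is replaced by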
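A sketch of this semaphore-based implementation in C with POSIX semaphores (Linux assumed). Every external procedure is wrapped in monitor_enter() and monitor_exit(). Condition x uses its own semaphore x_sem and counter x_count, with signal-and-wait behaviour: the signaller suspends on next until the resumed process leaves or waits again. The procedures increment(), await_ready() and set_ready() are hypothetical monitor operations added only to exercise the scheme.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <semaphore.h>
#include <stddef.h>

static sem_t mutex;      /* initially 1: guards entry to the monitor */
static sem_t next;       /* initially 0: signalling processes suspend here */
static int next_count;   /* number of processes suspended on next */
static sem_t x_sem;      /* initially 0: processes waiting on condition x */
static int x_count;      /* number of processes waiting on x */

/* Prologue and epilogue wrapped around the body of every external procedure F. */
static void monitor_enter(void) { sem_wait(&mutex); }
static void monitor_exit(void) {
    if (next_count > 0)
        sem_post(&next);     /* resume a suspended signaller first */
    else
        sem_post(&mutex);    /* otherwise let a new process into the monitor */
}

static void x_wait(void) {
    x_count++;
    monitor_exit();          /* same hand-off as leaving the monitor */
    sem_wait(&x_sem);
    x_count--;
}

static void x_signal(void) {  /* signal-and-wait */
    if (x_count > 0) {
        next_count++;
        sem_post(&x_sem);    /* resume one waiting process ... */
        sem_wait(&next);     /* ... and wait until it leaves or waits again */
        next_count--;
    }
}

/* Hypothetical monitor data and procedures. */
static long count;
static int ready;
static void increment(void)   { monitor_enter(); count++; monitor_exit(); }
static void await_ready(void) { monitor_enter(); while (!ready) x_wait(); monitor_exit(); }
static void set_ready(void)   { monitor_enter(); ready = 1; x_signal(); monitor_exit(); }

static void *waiter(void *arg) { (void)arg; await_ready(); return NULL; }
static void *adder(void *arg) {
    (void)arg;
    for (int k = 0; k < 10000; k++)
        increment();
    return NULL;
}

/* One thread waits on x until set_ready(); three threads call increment(). */
long monitor_demo(void) {
    pthread_t w, a[3];
    sem_init(&mutex, 0, 1);
    sem_init(&next, 0, 0);
    sem_init(&x_sem, 0, 0);
    next_count = x_count = 0;
    count = 0;
    ready = 0;
    pthread_create(&w, NULL, waiter, NULL);
    for (int i = 0; i < 3; i++)
        pthread_create(&a[i], NULL, adder, NULL);
    set_ready();
    pthread_join(w, NULL);
    for (int i = 0; i < 3; i++)
        pthread_join(a[i], NULL);
    return count;
}
```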
ResourceAllocator monitor controls the allocation of a single resource among competing processes.
Each process, when requesting an allocation of the resource, specifies the maximum time it plans
to use the resource.
The monitor allocates the resource to the process that has the shortest time-allocation request.
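One possible C sketch of such a ResourceAllocator, assuming POSIX threads. A pthread mutex provides the monitor's one-process-at-a-time rule. Instead of a priority-ordered condition wait, every waiter sleeps on one condition variable and re-checks after each release whether its request is now the shortest one pending. The struct layout, the bound of 16 pending requests and the demo are all hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <pthread.h>
#include <sched.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_WAITERS 16   /* hypothetical bound on pending requests */

typedef struct {
    pthread_mutex_t monitor;          /* only one thread active inside at a time */
    pthread_cond_t x;                 /* processes waiting for the resource */
    bool busy;
    int pending[MAX_WAITERS];         /* time requests of processes inside acquire() */
    int npending;
} ResourceAllocator;

static int shortest_pending(const ResourceAllocator *R) {
    int best = R->pending[0];
    for (int i = 1; i < R->npending; i++)
        if (R->pending[i] < best) best = R->pending[i];
    return best;
}

void acquire(ResourceAllocator *R, int time) {
    pthread_mutex_lock(&R->monitor);
    R->pending[R->npending++] = time;
    while (R->busy || time != shortest_pending(R))   /* busy, or a shorter request waits */
        pthread_cond_wait(&R->x, &R->monitor);
    for (int i = 0; i < R->npending; i++)            /* withdraw this request */
        if (R->pending[i] == time) { R->pending[i] = R->pending[--R->npending]; break; }
    R->busy = true;
    pthread_mutex_unlock(&R->monitor);
}

void release(ResourceAllocator *R) {
    pthread_mutex_lock(&R->monitor);
    R->busy = false;
    pthread_cond_broadcast(&R->x);    /* waiters re-check who is shortest */
    pthread_mutex_unlock(&R->monitor);
}

/* Demo: requests for 30, 10 and 20 time units queue up while the resource is held. */
static ResourceAllocator R = { PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, false, {0}, 0 };
static int order[3], served;

static void *user(void *arg) {
    int t = (int)(intptr_t)arg;
    acquire(&R, t);
    order[served++] = t;              /* access the resource */
    release(&R);
    return NULL;
}

/* Returns 1 if the queued requests were granted shortest first. */
int allocator_demo(void) {
    static const int times[3] = {30, 10, 20};
    pthread_t t[3];
    int queued = 0;
    acquire(&R, 1);                                  /* hold the resource */
    for (int i = 0; i < 3; i++)
        pthread_create(&t[i], NULL, user, (void *)(intptr_t)times[i]);
    while (queued < 3) {                             /* wait until all three are waiting */
        sched_yield();
        pthread_mutex_lock(&R.monitor);
        queued = R.npending;
        pthread_mutex_unlock(&R.monitor);
    }
    release(&R);
    for (int i = 0; i < 3; i++)
        pthread_join(t[i], NULL);
    return order[0] == 10 && order[1] == 20 && order[2] == 30;
}
```

The broadcast-and-recheck loop is simple but wakes every waiter on each release. A textbook monitor with priority condition waits would wake only the shortest request.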
A process that needs to access the resource in question must observe the following sequence: