Chapter 3
Introduction to processes
A process is an instance of a program running in a computer. A process consists of an
execution environment together with one or more threads. A thread is the operating
system abstraction of an activity.
The term process is close in meaning to task, a term used in some operating systems.
Like a task, a process is a running program with which a particular set of data is
associated so that the process can be kept track of. An application that is being shared
by multiple users will generally have one process at some stage of execution for each
user. A process contains the program code and its activity, and is defined as an entity
which represents the basic unit of work to be implemented in the system. A process can
initiate a sub-process, which is called a child process (the initiating process is
sometimes referred to as its parent). A child process is a replica of the parent process
and shares some of its resources, but cannot exist if the parent is terminated. Processes
can exchange information or synchronize their operation through several methods of
inter-process communication.
Figure: Layout of a process inside main memory
Heap: the segment where dynamic memory allocation usually takes place during
run time.
Text: the segment containing the program code; the current activity is represented by
the value of the program counter and the contents of the processor's registers.
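As a small illustration (not part of the original figure), the following C program prints the address of one item from each region of the layout; on many systems the text, data, and heap addresses are low, with the stack near the top of the address space.

/* Minimal sketch: where variables land in the process layout above. */
#include <stdio.h>
#include <stdlib.h>

int initialized = 42;      /* data segment (initialized globals)  */
int uninitialized;         /* BSS segment (zero-initialized)      */

int main(void)             /* function code lives in the text segment */
{
    int local = 7;                          /* stack */
    int *dynamic = malloc(sizeof *dynamic); /* heap  */

    printf("text  (code): %p\n", (void *)main);
    printf("data        : %p\n", (void *)&initialized);
    printf("bss         : %p\n", (void *)&uninitialized);
    printf("heap        : %p\n", (void *)dynamic);
    printf("stack       : %p\n", (void *)&local);

    free(dynamic);
    return 0;
}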
Another example is a web server: multiple threads allow multiple requests to be
satisfied simultaneously, without having to service requests sequentially or to fork
off a separate process for every incoming request. (The latter is how this sort of
thing was done before the concept of threads was developed: a daemon would
listen at a port, fork off a child for every incoming request to be processed, and
then go back to listening at the port, as in the sketch below.)
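The following is a minimal C sketch (not from the original text) of that pre-threads design: a daemon that listens on a port and forks a child per request. The port number and handle_request() are illustrative placeholders, and error handling is omitted for brevity.

#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <sys/wait.h>
#include <unistd.h>

#define PORT 8080   /* illustrative port number */

static void handle_request(int fd) { /* serve the client ... */ close(fd); }

int main(void)
{
    int listener = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr = { .sin_family = AF_INET,
                                .sin_port = htons(PORT),
                                .sin_addr.s_addr = htonl(INADDR_ANY) };
    bind(listener, (struct sockaddr *)&addr, sizeof addr);
    listen(listener, 16);

    for (;;) {
        int client = accept(listener, NULL, NULL);
        if (fork() == 0) {              /* child: process this request */
            close(listener);
            handle_request(client);
            _exit(0);
        }
        close(client);                  /* parent: back to listening   */
        while (waitpid(-1, NULL, WNOHANG) > 0)
            ;                           /* reap finished children      */
    }
}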
Major benefits of multi-threading
● Responsiveness: If a process is divided into multiple threads and one thread completes
its execution, its output can be returned immediately.
● Faster context switch: Context switch time between threads is lower compared to the
process context switch. Process context switching requires more overhead from the CPU.
● Effective utilization of multiprocessor system: If we have multiple threads in a single
process, then we can schedule multiple threads on multiple processors. This will make
process execution faster.
● Resource sharing: Resources like code, data, and files can be shared among all threads
within a process. Note: Stacks and registers canʼt be shared among the threads. Each thread
has its own stack and registers.
● Communication: Communication between multiple threads is easier, as the threads share
a common address space, whereas processes must follow specific inter-process
communication techniques to communicate with one another.
● Enhanced throughput of the system: If a process is divided into multiple threads, and
each thread function is considered as one job, then the number of jobs completed per unit
of time is increased, thus increasing the throughput of the system.
Properties of a Thread
● A single system call can create more than one thread (lightweight process).
● Threads share data and information.
● Threads share the instruction, global, and heap regions, but each thread has its
own individual stack and registers.
● Thread management consumes few or no system calls, as communication between
threads can be achieved using shared memory (as in the sketch after this list).
● The isolation property of the process increases its overhead in terms of
resource consumption.
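The following minimal POSIX threads sketch (an illustration, not from the notes) shows these properties: both threads update a shared global counter through a mutex in shared memory, while each keeps a private counter on its own stack. Compile with: gcc demo.c -pthread

#include <pthread.h>
#include <stdio.h>

int shared = 0;                        /* shared: global/data region */
pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

void *worker(void *arg)
{
    int private = 0;                   /* per-thread: on this thread's own stack */
    for (int i = 0; i < 100000; i++) {
        private++;
        pthread_mutex_lock(&lock);     /* communication via shared memory,     */
        shared++;                      /* synchronized without extra syscalls  */
        pthread_mutex_unlock(&lock);
    }
    printf("thread %ld: private = %d\n", (long)arg, private);
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    pthread_create(&t1, NULL, worker, (void *)1L);
    pthread_create(&t2, NULL, worker, (void *)2L);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    printf("shared = %d\n", shared);   /* 200000: both threads updated it */
    return 0;
}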
Thread Model
There are two types of threads to be managed in a modern system: User threads
and kernel threads. User threads are supported above the kernel, without kernel
support; these are the threads that application programmers would put into their
programs. Kernel threads are supported within the kernel of the OS itself. All
modern operating systems support kernel level threads, allowing the kernel to
perform multiple simultaneous tasks and/or to service multiple kernel system calls
simultaneously. In a specific implementation, the user threads must be mapped to
kernel threads, using one of the following strategies.
Many-to-one multithreading model:
The many-to-one model maps many user-level threads to one kernel thread. This type
of relationship facilitates an effective context-switching environment, easily
implemented even on a simple kernel with no thread support.
The disadvantage of this model is that, since only one kernel-level thread is
scheduled at any given time, it cannot take advantage of the hardware
acceleration offered by multithreaded processors or multi-processor systems. In this
model, all thread management is done in user space; if one thread makes a blocking
call, the whole process blocks.
One-to-one multithreading model
The one-to-one model maps a single user-level thread to a single kernel-level thread.
This type of relationship facilitates the running of multiple threads in parallel.
However, this benefit comes with a drawback: the creation of every new user thread
must include creating a corresponding kernel thread, causing an overhead which can
hinder the performance of the parent process. The Windows and Linux families of
operating systems try to tackle this problem by limiting the growth of the thread
count.
Many-to-many multithreading model
In this type of model, there are several user-level threads and several kernel-level
threads. The number of kernel threads created depends upon the particular
application. The developer can create as many threads as needed at both levels, but
the two numbers may not be the same. The many-to-many model is a compromise
between the other two models. In this model, if any thread makes a blocking system
call, the kernel can schedule another thread for execution. Also, with the introduction
of multiple threads, the complexity of the previous models is not present. Though this
model allows the creation of multiple kernel threads, true concurrency cannot be
achieved, because the kernel can schedule only one process at a time.
Differences between Process and Thread
As we mentioned earlier, in many respects threads operate in the same way as
processes. Some of the similarities and differences are:
Similarities:
● Like processes, threads share the CPU, and only one thread is active (running) at a
time.
● Like processes, threads within a process execute sequentially.
● Like processes, threads can create children.
● And like processes, if one thread is blocked, another thread can run.
Major differences between process and thread:
Virtualization
Virtualization is one of the most effective ways to reduce IT expenses and boost
efficiency for large, mid-size, and small organizations. Virtualization lets you run
multiple operating systems and applications on a single server, consolidate hardware
to get vastly higher productivity from fewer servers, and simplify the management,
maintenance, and deployment of new applications.
1. Application Virtualization: Application virtualization helps a user to have remote
access to an application from a server. The server stores all personal information and
other characteristics of the application but can still run on a local workstation through
the internet. An example of this would be a user who needs to run two different versions
of the same software. Technologies that use application virtualization are hosted
applications and packaged applications.
2. Network Virtualization: The ability to run multiple virtual networks, each having a
separate control and data plane, co-existing on top of one physical network. The
virtual networks can be managed by individual parties that are potentially confidential to
each other. Network virtualization provides a facility to create and provision virtual
networks, logical switches, routers, firewalls, load balancers, Virtual Private Networks
(VPN), and workload security within days or even weeks.
3. Desktop Virtualization: Desktop virtualization allows the user's OS to be remotely
stored on a server in the data center. It allows users to access their desktops virtually,
from any location, on a different machine. Users who want specific operating systems
other than Windows Server will need to have a virtual desktop. The main benefits of
desktop virtualization are user mobility, portability, and easy management of software
installation, updates, and patches.
CLIENTS
The client is the machine (workstation or PC) running the front-end applications. It
interacts with a user through the keyboard, display, and pointing device such as a
mouse. The client also refers to the client process that runs on the client machine.
The client has no direct data access responsibilities. It simply requests processes
from the server and displays data managed by the server. Therefore, the client
workstation can be optimized for its job. For example, it might not need large disk
capacity, or it might benefit from graphics capabilities.
Networked user interfaces
A major task of client machines is to provide the means for users to interact with
remote servers. There are roughly two ways in which this interaction can be
supported. First, for each remote service the client machine will have a separate
counterpart that can contact the service over the network. A typical example is a
calendar running on a user's Smartphone that needs to synchronize with a remote,
possibly shared calendar. In this case, an application-level protocol will handle the
synchronization, as shown in the figure below.
A second solution is to provide direct
access to remote services by offering
only a convenient user interface.
Effectively, this means that the client
machine is used only as a terminal
with no need for local storage, leading
to an application-neutral solution as
shown in Figure below. In the case of
networked user interfaces, everything
is processed and stored at the server.
This thin-client approach has received
much attention with the increase of
Internet connectivity and the use of
mobile devices. Thin-client solutions
are also popular as they ease the task
of system management.
Client-side software can transparently collect all responses and pass a single
response to the client application. Regarding failure transparency, masking
communication failures with a server is typically done through client middleware.
For example, client middleware can be configured to repeatedly attempt to
connect to a server, or perhaps try another server after several attempts. There are
even situations in which the client middleware returns data it had cached during a
previous session, as is sometimes done by Web browsers that fail to connect to a
server.
SERVERS
A server is a piece of computer hardware or software
that provides functionality for other programs or
devices, called clients. This architecture is called the
client-server model. Servers can provide various
functionalities, often called services, such as sharing
data or resources among multiple clients, or performing
computation for a client. A single server can serve
multiple clients, and a single client can use multiple
servers. A client process may run on the same device or
may connect over a network to a server on a different
device. Typical servers are database servers, file servers,
mail servers, print servers, web servers, game servers,
and application servers.
General server design issues in distributed system
A server is a process implementing a specific service on behalf of a collection of clients. In essence,
each server is organized in the same way: it waits for an incoming request from a client and
subsequently ensures that the request is taken care of, after which it waits for the next incoming
request.
1. Concurrent versus iterative servers: There are two types of server design choices:
Iterative server: a single-threaded server which processes requests from a queue. While
processing the current request it adds incoming requests to a wait queue, and once the processing is
done, it takes the next request from the queue. Essentially, requests are never executed in parallel
on the server.
Concurrent server: in this case, when a new request comes in, the server spawns a new thread to
service the request; thus all requests are executed in parallel. This is the thread-per-request
model (sketched below). There is also one more flavor of the multi-threaded server: unlike the
previous case, where a new thread is spawned every time a new request comes in, there is a pool of
pre-spawned threads in the server which are ready to serve requests. A thread dispatcher or
scheduler hands requests to this pool of pre-spawned threads.
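As a rough illustration of the thread-per-request model (not from the original text), the following C sketch spawns a detached POSIX thread for every accepted connection. handle_request() and the listener setup are assumed to exist elsewhere, and error handling is omitted for brevity.

#include <pthread.h>
#include <stdlib.h>
#include <sys/socket.h>
#include <unistd.h>

void handle_request(int fd);   /* assumed application-specific handler */

static void *serve(void *arg)
{
    int client = *(int *)arg;  /* each thread gets its own copy of the fd */
    free(arg);
    handle_request(client);
    close(client);
    return NULL;
}

void concurrent_server(int listener)   /* 'listener' is a bound, listening socket */
{
    for (;;) {
        int *client = malloc(sizeof *client);
        *client = accept(listener, NULL, NULL);    /* wait for a request      */
        pthread_t tid;
        pthread_create(&tid, NULL, serve, client); /* one thread per request  */
        pthread_detach(tid);                       /* no join needed          */
    }
}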
2. Contacting a server: end points
The client can determine the server it needs to communicate with using one of two
techniques.
● Hard code the port number of the server in the client. If the server moves then
the client will need to be recompiled.
● A directory service is used to register a service with a port number and IP
address. The client connects to the directory service to determine the location
of the server offering a particular service, and then sends the request to it. In
this case, the client only needs to know the location of the directory service.
In all cases, clients send requests to an end point, also called a port, at the machine where the
server is running. Each server listens to a specific end point. How do clients know the end point of
a service? One approach is to globally assign end points for well-known services. For example,
servers that handle Internet FTP requests always listen to TCP port 21. Likewise, an HTTP server
for the World Wide Web will always listen to TCP port 80. These end points have been assigned by
the Internet Assigned Numbers Authority (IANA). With assigned end points, the client needs to find
only the network address of the machine where the server is running. Name services can be used
for that purpose.
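As a small illustration of looking up a well-known end point, the standard C library call getservbyname() consults the local services database (typically /etc/services) for the IANA-assigned port of a named service; this sketch prints the ports for ftp and http.

#include <arpa/inet.h>
#include <netdb.h>
#include <stdio.h>

int main(void)
{
    const char *names[] = { "ftp", "http" };
    for (int i = 0; i < 2; i++) {
        struct servent *s = getservbyname(names[i], "tcp");
        if (s)
            printf("%s/tcp listens on port %d\n",
                   s->s_name, ntohs(s->s_port));   /* ftp -> 21, http -> 80 */
    }
    return 0;
}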
There are many services that do not require a pre-assigned end point. For example, a time-of-day
server may use an end point that is dynamically assigned to it by its local operating system. In that
case, a client will first have to look up the end point. One solution is to have a special daemon
running on each machine that runs servers. The daemon keeps track of the current end point of
each service implemented by a co-located server. The daemon itself listens to a well-known end
point. A client will first contact the daemon, request the end point, and then contact the specific
server, as shown in Figure below. It is common to associate an end point with a specific service.
However, actually implementing each service by means of a separate server may be a waste of
resources. For example, in a typical UNIX system, it is common to have lots of servers running
simultaneously, with most of them passively waiting until a client request comes in.
3. Interrupting a server
Another issue that needs to be taken into account when designing a server is whether
and how a server can be interrupted. For example, consider a user who has just decided
to upload a huge file to an FTP server. Then, suddenly realizing that it is the wrong file,
he wants to interrupt the server to cancel further data transmission. There are several
ways to do this. One approach that works only too well in the current Internet (and is
sometimes the only alternative) is for the user to abruptly exit the client application
(which will automatically break the connection to the server), immediately restart it,
and pretend nothing happened. The server will eventually tear down the old
connection, thinking the client has probably crashed.
A much better approach for handling communication interrupts is to develop the client
and server such that it is possible to send out-of-band data, which is data that is to be
processed by the server before any other data from that client. One solution is to let the
server listen to a separate control end point to which the client sends out-of-band data,
while at the same time listening (with a lower priority) to the end point through which
the normal data passes. Another solution is to send out-of-band data across the same
connection through which the client is sending the original request. In TCP, for example,
it is possible to transmit urgent data. When urgent data are received at the server, the
latter is interrupted (e.g., through a signal in UNIX systems), after which it can inspect
the data and handle them accordingly.
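A minimal C sketch of the urgent-data mechanism just described, assuming an already-connected TCP socket: the client marks one byte as out-of-band with MSG_OOB, and the server (e.g., from a SIGURG handler) reads it with MSG_OOB.

#include <sys/socket.h>

/* Client side: signal the server to cancel the transfer in progress.
 * 'sock' is assumed to be a connected TCP socket. */
void send_cancel(int sock)
{
    char mark = '!';
    send(sock, &mark, 1, MSG_OOB);   /* sets the TCP urgent pointer */
}

/* Server side (e.g., inside a SIGURG handler, or after select()
 * reports an exceptional condition): inspect the urgent byte. */
char read_urgent(int sock)
{
    char mark;
    recv(sock, &mark, 1, MSG_OOB);   /* reads the out-of-band byte  */
    return mark;
}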
4. Stateless versus stateful servers
A final, important design issue is whether or not the server is stateless.
Stateful servers are those which keep state about their connected clients. For example,
push servers are stateful, since the server needs to know the list of clients it must
send messages to.
Stateless servers, on the other hand, do not keep any information on the state of their
connected clients, and can change their own state without having to inform any client. In
this case, the client keeps its own session information. Pull servers, where the clients
send requests to the server as and when required, are examples of stateless servers.
The advantages of stateless servers are that a server crash will not impact clients
(making the system more resilient to failures) and that the design is more scalable,
since clients take on some of the workload.
Soft-state servers maintain the state of a client for a limited period of time; when the
session times out, the server discards the information (a small sketch of such a session
table follows).
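As a toy illustration of soft state (not from the original text), the sketch below keeps a fixed-size session table in which each client entry carries a lease; a periodic sweep discards entries whose lease has timed out. All names and the 60-second lease are invented for the example.

#include <string.h>
#include <time.h>

#define MAX_SESSIONS  64
#define LEASE_SECONDS 60   /* illustrative lease length */

struct session {
    char   client[64];     /* client identifier          */
    time_t expires;        /* when this state goes stale */
    int    in_use;
};

static struct session table[MAX_SESSIONS];

void remember_client(const char *client)
{
    for (int i = 0; i < MAX_SESSIONS; i++)
        if (!table[i].in_use) {
            strncpy(table[i].client, client, sizeof table[i].client - 1);
            table[i].expires = time(NULL) + LEASE_SECONDS;
            table[i].in_use = 1;
            return;
        }
}

/* Called periodically: drop any session whose lease has timed out. */
void expire_sessions(void)
{
    time_t now = time(NULL);
    for (int i = 0; i < MAX_SESSIONS; i++)
        if (table[i].in_use && table[i].expires <= now)
            table[i].in_use = 0;
}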
Advantages of both Stateful and Stateless servers
The main advantage of stateful servers is that they can provide better
performance for clients: because clients do not have to provide full file information
every time they perform an operation, the size of messages to and from the server
can be significantly decreased. Likewise, the server can make use of knowledge of
access patterns to perform read-ahead and do other optimizations. Stateful servers
can also offer clients extra services such as file locking, and remember read and
write positions.
The main advantage of stateless servers is that they can easily recover from failure.
Because there is no state that must be restored, a failed server can simply restart
after a crash and immediately provide services to clients as though nothing
happened. Furthermore, if clients crash the server is not stuck with abandoned
opened or locked files. Another benefit is that the server implementation remains
simple because it does not have to implement the state accounting associated with
opening, closing, and locking of files.
Object Servers
An object server is a server tailored to support distributed objects. The important
difference between a general object server and other servers is that an object
server by itself does not provide a specific service. Specific services are
implemented by the objects that reside in the server. Essentially, the server
provides only the means to invoke local objects, based on requests from remote
clients. As a consequence, it is relatively easy to change services by simply adding
and removing objects.
An object server thus acts as a place where objects live. An object consists of two
parts: data representing its state and the code for executing its methods. Whether
or not these parts are separated, or whether method implementations are shared
by multiple objects, depends on the object server. Also, there are differences in the
way an object server invokes its objects. For example, in a multithreaded server,
each object may be assigned a separate thread, or a separate thread may be used
for each invocation request.
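To make the idea concrete, here is a toy C sketch (all names invented for illustration) in which the server itself only dispatches: each object bundles the data representing its state with the code for its methods, so services can be changed simply by adding or removing entries in the registry.

#include <stdio.h>
#include <string.h>

struct object {
    const char *name;                        /* object identifier        */
    void       *state;                       /* data representing state  */
    void (*invoke)(void *state, const char *request); /* method code     */
};

static void counter_invoke(void *state, const char *request)
{
    int *count = state;
    if (strcmp(request, "increment") == 0)
        printf("counter is now %d\n", ++*count);
}

static int counter_state = 0;
static struct object registry[] = {
    { "counter", &counter_state, counter_invoke },
};

/* The object server: look up the target object and invoke it on
 * behalf of a remote client's request. */
void dispatch(const char *name, const char *request)
{
    for (size_t i = 0; i < sizeof registry / sizeof registry[0]; i++)
        if (strcmp(registry[i].name, name) == 0)
            registry[i].invoke(registry[i].state, request);
}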
CODE MIGRATION
Traditionally, code migration in distributed systems took place in the form of
process migration in which an entire process was moved from one machine to
another. The basic idea is that overall system performance can be improved if
processes are moved from heavily-loaded to lightly-loaded machines.
Code migration in the broadest sense deals with moving programs between
machines, with the intention to have those programs be executed at the target. In
some cases, as in process migration, the execution status of a program, pending
signals, and other parts of the environment must be moved as well.
Reasons for Migrating Code
Code migration is essential in distributed systems for several reasons:
Dynamic Environment: Code migration allows systems to adapt to frequent changes
by transferring code to different locations as needed.
Resource Optimization: It optimizes resource utilization by moving code to nodes with
excess capacity, balancing workload.
System Maintenance and Updates: Facilitates seamless deployment of updates,
ensuring applications remain functional.
Scaling: Enables dynamic resource allocation to handle increased traffic and workload
demands.
Fault Tolerance and Resilience: Aids in quick recovery from failures by migrating code
to ensure continuity of service.