Distributed System Notes
Distributed Systems
A distributed system contains multiple nodes that are physically separate but linked together
using the network. All the nodes in this system communicate with each other and handle
processes in tandem. Each of these nodes contains a small part of the distributed operating
system software.
The nodes in the distributed systems can be arranged in the form of client/server systems or peer
to peer systems. Details about these are as follows −
Client/Server Systems
In client server systems, the client requests a resource and the server provides that resource. A
server may serve multiple clients at the same time while a client is in contact with only one
server. Both the client and server usually communicate via a computer network and so they are a
part of distributed systems.
Peer to Peer Systems
Peer to peer systems contain nodes that are equal participants in data sharing. All the tasks
are equally divided between all the nodes. The nodes interact with each other as required and share
resources. This is done with the help of a network.
Advantages of Distributed Systems
All the nodes in the distributed system are connected to each other. So nodes can easily share
data with other nodes.
More nodes can easily be added to the distributed system i.e. it can be scaled as required.
Failure of one node does not lead to the failure of the entire distributed system. Other nodes
can still communicate with each other.
Resources like printers can be shared with multiple nodes rather than being restricted to just
one.
Disadvantages of Distributed Systems
It is difficult to provide adequate security in distributed systems because the nodes as well as
the connections need to be secured.
Some messages and data can be lost in the network while moving from one node to another.
The database connected to the distributed systems is quite complicated and difficult to handle
as compared to a single user system.
Overloading may occur in the network if all the nodes of the distributed system try to send data
at once.
Distributed System is a collection of autonomous computer systems that are physically separated
but are connected by a centralized computer network that is equipped with distributed system
software. The autonomous computers communicate with each other by sharing
resources and files and performing the tasks assigned to them.
For example, a social media platform can have its centralized computer network at its headquarters,
while the computer systems that users access to use its services are the autonomous
systems in the distributed system architecture.
Distributed System Software: This Software enables computers to coordinate their activities
and to share the resources such as Hardware, Software, Data, etc.
Database: It is used to store the data processed by each node/system of the
distributed system that is connected to the centralized network.
As we can see, each autonomous system runs a common application and can have its own
data, which is shared through the centralized database system.
To transfer data to the autonomous systems, the centralized system should have a
middleware service and should be connected to a network.
Middleware services enable functionality that is not present by default in the local systems or the
centralized system, by acting as an interface between the centralized system and the
local systems. Using the components of the middleware services, the systems communicate and manage
data.
The data transferred from the database is divided into segments or
modules and shared with the autonomous systems for processing.
The processed data is then transferred back to the centralized system over the
network and stored in the database.
Resource Sharing: It is the ability to use any Hardware, Software, or Data anywhere in the
System.
Openness: It is concerned with Extensions and improvements in the system (i.e., How openly
the software is developed and shared with others)
Concurrency: It is naturally present in distributed systems: the same activity
or functionality can be performed concurrently by separate users who are in remote locations. Every
local system has its own independent operating system and resources.
Scalability: The scale of the system can be increased by adding processors so that it can serve
more users while maintaining the responsiveness of the system.
Fault tolerance: It is concerned with the reliability of the system; if there is a failure in hardware or
software, the system continues to operate properly without degrading the performance of the
system.
Transparency: It hides the complexity of the distributed system from users and application
programs, so the system appears as a single unit while each local system keeps its privacy.
Heterogeneity: Networks, computer hardware, operating systems, programming languages, and
developer implementations can all vary and differ among dispersed system components.
Goals of Distributed Systems
1. Transparency:
An important goal of a distributed system is to hide the fact that its processes and resources
are physically distributed across multiple computers. A distributed system that is capable
of presenting itself to users and applications as if it were only a single computer system
is called transparent.
2. Openness:
Another important goal of distributed systems is openness. An open distributed system is
a system that offers services according to standard rules that describe the syntax and semantics of
those services. In computer networks, standard rules control the format, content,
and meaning of messages sent and received; such rules are formalized in protocols. In
distributed systems, services are typically specified through interfaces, often written in an
interface definition language (IDL). Interface definitions written in an IDL almost always
capture only the syntax of services: they specify the names of the functions that
are available together with the types of parameters, return values, possible exceptions that can be
raised, and so on.
3. Scalability:
The clear trend in distributed systems is towards larger and larger systems. This observation has
implications for distributed file system design. Algorithms that work well for systems
with 100 machines may work poorly for systems with 1,000 machines and not at all for systems
with 10,000 machines. For starters, centralized algorithms do not scale well. If
opening a file requires contacting a single centralized server to record the fact that the file
is open, then that server will eventually become a bottleneck as the system grows.
4. Reliability:
The main goal of building distributed systems was to make them more reliable than
single-processor systems. The idea is that if some machine goes down, another
machine takes over its job. In other words, theoretically the reliability of the overall system
can be a Boolean OR of the component reliabilities. For example, with four file servers,
each with a 0.95 chance of being up at any instant, the probability of all four being down
simultaneously is about 0.000006, so the probability of at least one being available is (1-
0.000006) = 0.999994, far better than any individual server.
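To check the arithmetic (assuming the four servers fail independently):
P(all four down) = (1 - 0.95)^4 = 0.05^4 ≈ 0.000006, so P(at least one up) = 1 - 0.05^4 ≈ 0.999994.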
5. Performance:
Building a transparent, flexible, reliable distributed system is useless if it is as slow as
molasses. In particular, running an application on a distributed system should not be noticeably worse
than running the same application on a single processor. Various performance metrics can be
used. Response time is one, but so are throughput, system utilization, and the amount of
network capacity consumed. Furthermore, the results of any benchmark are often highly
dependent on the nature of the benchmark. A benchmark that involves a large number of
independent, highly CPU-bound computations may give radically different results than a
benchmark that consists of scanning a single large file for some pattern.
Hardware concepts :-
Multiprocessors can be interconnected in different ways, for example:
a) A crossbar switch, in which every CPU can be connected to every memory module through a grid of crosspoint switches.
b) An omega switching network, a multistage network built from small 2x2 switches.
Multicomputers :-
Homogeneous: all nodes use the same processor and are connected by a single, uniform interconnection network.
Heterogeneous: the nodes may use different processors and be connected by different kinds of networks.
Software Concepts :-
The software of a distributed system is essentially the choice of operating system
platform.
The operating system is the interface between the user and the hardware.
A multiprocessor OS uses different system services to manage the resources connected in a system and
uses system calls to communicate with the processors.
Distributed Operating System
In a distributed OS, a common set of services is shared among multiple processors in such a way
that a distributed application can be executed effectively, and services are also provided to
the separate independent computers connected in a network, as shown in the figure below.
It uses data structures such as queues to manage messages and avoid message loss between the
sender and receiver computers.
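As a rough sketch of the queue idea (an illustration only, not the actual OS internals; the names are made up), a sender and a receiver thread could share a bounded message queue like this:

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Sketch: a bounded queue buffers messages between a sender and a receiver
// so that a message is not lost if the receiver is temporarily busy.
public class MessageQueueDemo {
    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(16);

        Thread sender = new Thread(() -> {
            try {
                queue.put("hello from node A");   // blocks if the queue is full
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        Thread receiver = new Thread(() -> {
            try {
                String msg = queue.take();        // blocks until a message arrives
                System.out.println("received: " + msg);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        sender.start();
        receiver.start();
        sender.join();
        receiver.join();
    }
}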
Disadvantages: The distributed operating system lacks scalability.
Network Operating System
It is specifically designed for heterogeneous multicomputer systems, where multiple hardware and
network platforms are supported.
It has multiple operating systems running on different hardware platforms connected in a network.
It follows a loosely coupled architecture pattern, which allows users to use the services provided by
their local machine itself, as shown in the figure below.
E.g. remote login, where the user's workstation is used to log in to a remote server and execute
commands over the network.
Advantage:
It is scalable: a large number of resources and users can be supported.
Disadvantage: It fails to provide a single coherent view of the system.
Middleware Operating System
Since the distributed operating system lacks scalability and the network operating system fails to
provide a single coherent view, a new layer, called the middleware operating system, is formed
between the distributed and network operating systems.
It provides a common set of services for the local applications and an independent set of
services for the remote applications.
It supports heterogeneity, that is, it supports multiple languages and operating systems, and the user
is free to write an application using any supported language on any
platform.
It provides services such as locating objects or interfaces by their names, finding the
location of objects, maintaining quality of service, handling protocol information, and managing
synchronization, concurrency, and security of the objects.
Fig(a) Middleware operating system
Client-Server Model
The client-server model is a distributed application structure that partitions tasks or workloads between the providers of a
resource or service, called servers, and service requesters, called clients. In the client-server
architecture, when the client computer sends a request for data to the server through the internet,
the server accepts the request, processes it, and delivers the requested data packets back to the client.
Clients do not share any of their resources. Examples of Client-Server Model are Email, World
Wide Web, etc.
Client: When we use the word client, we usually mean a person or an organization
using a particular service. Similarly, in the digital world a client is a computer (host) that is
capable of receiving information or using a particular service from the service providers
(servers).
Server: Similarly, when we use the word server, we mean a person or medium that
serves something. In the digital world a server is a remote computer which
provides information (data) or access to particular services.
So, it is basically the client requesting something and the server serving it, as long as it is present
in the server's database.
How does the browser interact with the servers?
A client interacts with a server through the following steps.
User enters the URL (Uniform Resource Locator) of the website or file. The browser then
queries the DNS (Domain Name System) server.
DNS server looks up the address of the web server.
DNS server responds with the IP address of the web server.
Browser sends over an HTTP/HTTPS request to WEB Server’s IP (provided by DNS
server).
Server sends over the necessary files of the website.
Browser then renders the files and the website is displayed. This rendering is done with
the help of the DOM (Document Object Model) interpreter, CSS interpreter, and JS engine;
the JS engine typically includes a JIT (Just-In-Time) compiler.
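A minimal sketch of steps 2 to 5 in Java (the host example.com is only an illustrative choice; a real browser performs the DNS lookup and HTTP request internally):

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.InetAddress;
import java.net.URL;

public class FetchPage {
    public static void main(String[] args) throws Exception {
        // Steps 2-3: resolve the host name to an IP address via DNS.
        InetAddress ip = InetAddress.getByName("example.com");
        System.out.println("Resolved to " + ip.getHostAddress());

        // Step 4: send an HTTP GET request to the web server.
        HttpURLConnection conn =
                (HttpURLConnection) new URL("http://example.com/").openConnection();
        conn.setRequestMethod("GET");

        // Step 5: the server sends back the necessary files (here, the HTML of the page).
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(conn.getInputStream()))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}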
Peer-to-peer: Both remote processes are executing at same level and they exchange data
using some shared resource.
Client-Server: One remote process acts as a Client and requests some resource from
another application process acting as Server.
In client-server model, any process can act as Server or Client. It is not the type of machine, size
of the machine, or its computing power which makes it a server; it is the ability to serve requests
that makes a machine a server.
A system can act as Server and Client simultaneously. That is, one process is acting as Server
and another is acting as a client. It may also happen that both client and server processes
reside on the same machine.
Communication
Sockets
Remote Procedure Calls (RPC)
Sockets
In this paradigm, the process acting as the server opens a socket using a well-known port (or one
known by the client) and waits until some client request comes. The second process, acting as the client,
also opens a socket, but instead of waiting for an incoming request, the client sends its request
first.
When the request reaches the server, it is served. It can be either an information-sharing request or a
resource request.
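A minimal sketch of this paradigm in Java (port 5000 is an arbitrary choice, and both roles run in one program only to keep the example self-contained): the server opens a socket on a well-known port and waits, while the client connects and sends its request first.

import java.io.*;
import java.net.ServerSocket;
import java.net.Socket;

public class SocketDemo {
    // Server: opens a well-known port and waits for a client request.
    static void server() throws IOException {
        try (ServerSocket listener = new ServerSocket(5000);
             Socket client = listener.accept();
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(client.getInputStream()));
             PrintWriter out = new PrintWriter(client.getOutputStream(), true)) {
            String request = in.readLine();      // wait for the client's request
            out.println("served: " + request);   // serve it
        }
    }

    // Client: opens a socket and sends its request first, then reads the reply.
    static void client() throws IOException {
        try (Socket socket = new Socket("localhost", 5000);
             PrintWriter out = new PrintWriter(socket.getOutputStream(), true);
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(socket.getInputStream()))) {
            out.println("GET resource-X");
            System.out.println(in.readLine());
        }
    }

    public static void main(String[] args) throws Exception {
        Thread serverThread = new Thread(() -> {
            try { server(); } catch (IOException e) { e.printStackTrace(); }
        });
        serverThread.start();
        Thread.sleep(500);   // crude wait for the server to start listening
        client();
        serverThread.join();
    }
}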
Remote Procedure Calls (RPC)
This is a mechanism where one process interacts with another by means of procedure calls. One
process (the client) calls a procedure lying on a remote host. The process on the remote host is said to be
the server. Both processes are allocated stubs. This communication happens in the following way:
The client process calls the client stub, passing the parameters in the normal way, as if the
procedure were local to it.
All parameters are then packed (marshalled) and a system call is made to send them to
other side of the network.
Kernel sends the data over the network and the other end receives it.
The remote host passes data to the server stub where it is unmarshalled.
The parameters are passed to the procedure and the procedure is then executed.
The result is sent back to the client in the same manner.
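A very simplified sketch of the stub idea (an illustration, not a real RPC framework; the procedure add(a, b) and port 6000 are made up): the client stub marshals the parameters and sends them, and the server stub unmarshals them, calls the real procedure, and marshals the result back.

import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.net.ServerSocket;
import java.net.Socket;

public class TinyRpc {
    // The "real" procedure, which lives on the server.
    static int add(int a, int b) { return a + b; }

    // Server stub: unmarshals the parameters, runs the procedure, marshals the result.
    static void serverStub() throws Exception {
        try (ServerSocket listener = new ServerSocket(6000);
             Socket conn = listener.accept();
             DataInputStream in = new DataInputStream(conn.getInputStream());
             DataOutputStream out = new DataOutputStream(conn.getOutputStream())) {
            int a = in.readInt();          // unmarshal parameters
            int b = in.readInt();
            out.writeInt(add(a, b));       // execute and marshal the result
        }
    }

    // Client stub: marshals the parameters, sends them, waits for the result.
    static int addRemote(int a, int b) throws Exception {
        try (Socket socket = new Socket("localhost", 6000);
             DataOutputStream out = new DataOutputStream(socket.getOutputStream());
             DataInputStream in = new DataInputStream(socket.getInputStream())) {
            out.writeInt(a);               // marshal parameters
            out.writeInt(b);
            out.flush();
            return in.readInt();           // the result comes back the same way
        }
    }

    public static void main(String[] args) throws Exception {
        Thread server = new Thread(() -> {
            try { serverStub(); } catch (Exception e) { e.printStackTrace(); }
        });
        server.start();
        Thread.sleep(500);
        System.out.println("3 + 4 = " + addRemote(3, 4));  // looks like a local call
        server.join();
    }
}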
It is based on the client-server concept. The client is the program that makes the request, and the
server is the program that provides the service. An RPC, like a local procedure call, is based on
synchronous operation: the requesting application is suspended until the remote
procedure returns its results. Multiple RPCs can be executed concurrently by utilizing lightweight
processes or threads that share the same address space. Remote Procedure Call programs frequently
use an Interface Definition Language (IDL), a specification language for
describing a software component's Application Programming Interface (API). In this
circumstance, IDL acts as an interface between machines at either end of the connection, which
may be running different operating systems and programming languages.
The caller places the procedure's arguments in a precise location when the procedure needs
to be called.
Control is then passed to the body of the procedure, which consists of a series of instructions.
The procedure body runs in a newly created execution environment that has copies of
the caller's arguments.
At the end, after the operation completes, control returns to the calling point, along with the
result.
o In RPC, the called procedure is not within the caller's address space: the caller and
callee processes have distinct address spaces, and the remote procedure has no access to
the data and variables of the caller's environment.
o The caller and callee processes in the RPC communicate to exchange information via the
message-passing scheme.
o On the server side, the first task when a request message arrives is to extract the
procedure's parameters, then compute the result, send a reply message, and finally wait for the
next call message.
o Only one process is enabled at a certain point in time.
o The caller is not always required to be blocked.
o The asynchronous mechanism could be employed in the RPC that permits the client to
work even if the server has not responded yet.
o In order to handle incoming requests, the server might create a thread that frees the
server for handling subsequent requests.
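A small sketch of that thread-per-request idea (port 7000 and the echo behaviour are arbitrary illustrative choices): the main loop only accepts connections and hands each one to a new thread, so it is immediately free for the next request.

import java.io.*;
import java.net.ServerSocket;
import java.net.Socket;

public class ThreadedServer {
    public static void main(String[] args) throws IOException {
        try (ServerSocket listener = new ServerSocket(7000)) {
            while (true) {
                Socket client = listener.accept();        // wait for the next request
                new Thread(() -> handle(client)).start(); // free the server immediately
            }
        }
    }

    static void handle(Socket client) {
        try (Socket c = client;
             BufferedReader in = new BufferedReader(
                     new InputStreamReader(c.getInputStream()));
             PrintWriter out = new PrintWriter(c.getOutputStream(), true)) {
            out.println("echo: " + in.readLine());        // serve this one request
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}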
Types of RPC:
Callback RPC: In callback RPC, a P2P paradigm is adopted between the participating processes,
allowing a process to act as both a client and a server. Callback RPC's features include:
It handles the problems encountered with interactive applications that are processed remotely.
It provides a server for clients to use.
Due to the callback mechanism, the client process is delayed.
Deadlocks need to be managed in callbacks.
It promotes a Peer-to-Peer (P2P) paradigm among the processes involved.
RPC for Broadcast: A client’s request that is broadcast all through the network and handled by
all servers that possess the method for handling that request is known as a broadcast RPC.
Broadcast RPC’s features include:
You have an option of selecting whether or not the client’s request message ought to be
broadcast.
It also gives you the option of declaring broadcast ports.
It helps in diminishing physical network load.
Batch-mode RPC: Batch-mode RPC enables the client to queue separate RPC requests in a
transmission buffer before sending them to the server in a single batch over the network. Batch-
mode RPC's features include:
It diminishes the overhead of requesting the server by sending them all at once using the
network.
It is used for applications that require low call rates.
It necessitates the use of a reliable transmission protocol.
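A rough sketch of the batching idea (the names here are entirely hypothetical; a real batch-mode RPC library does this inside its runtime): calls are queued in a buffer and sent as one message when the buffer is flushed.

import java.util.ArrayList;
import java.util.List;

public class BatchingClient {
    private final List<String> buffer = new ArrayList<>();

    // Queue a request locally instead of sending it immediately.
    public void call(String request) {
        buffer.add(request);
    }

    // Send all queued requests to the server in a single message.
    public void flush() {
        String batch = String.join("\n", buffer);
        buffer.clear();
        sendOverNetwork(batch);   // one network transmission for the whole batch
    }

    private void sendOverNetwork(String batch) {
        // Placeholder: a real implementation would write the batch to a socket.
        System.out.println("sending batch:\n" + batch);
    }

    public static void main(String[] args) {
        BatchingClient client = new BatchingClient();
        client.call("update record 1");
        client.call("update record 2");
        client.call("update record 3");
        client.flush();   // all three requests travel in one transmission
    }
}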
Differences from Local Procedure Calls:
Remote Procedure Calls have a disjoint (i.e. different) address space, unlike Local
Procedure Calls.
Remote Procedure Calls are more prone to failures due to possible processor failure or
communication issues of a network than Local Procedure Calls.
Because of the communication network, remote procedure calls take longer than local
procedure calls.
Advantages of RPC
The technique of using procedure calls in RPC permits high-level languages to provide
communication between clients and servers.
This method is like a local procedure call but with the difference that the called procedure is
executed on another process and a different computer.
The thread-oriented model is also supported by RPC in addition to the process model.
The RPC mechanism is employed to conceal the core message passing method.
The amount of time and effort required to rewrite and develop the code is minimal.
The distributed and local environments can both benefit from remote procedure calls.
To increase performance, it omits several of the protocol layers.
Abstraction is provided via RPC. For example, the user is not aware of the nature of
message passing in network communication.
RPC empowers the utilization of applications in a distributed environment.
Disadvantages of RPC
In Remote Procedure Calls, parameters are passed only by value, as pointer values are not
allowed.
It involves a communication system with another machine and another process, so this
mechanism is extremely prone to failure.
The RPC concept can be implemented in a variety of ways, hence there is no standard.
Due to the interaction-based nature, there is no flexibility for hardware architecture in RPC.
Due to a remote procedure call, the process’s cost has increased.
What is RPC?
Remote Procedure Call (RPC) is an interprocess communication technique used for client-server
applications. RPC mechanisms are used when a computer program causes a procedure or subroutine
to execute in a different address space, coded as a normal procedure call without the programmer
specifically writing the details of the remote interaction.
This procedure call also manages the low-level transport protocol, such as the User Datagram Protocol
(UDP) or Transmission Control Protocol/Internet Protocol (TCP/IP), which is used for carrying the
message data between programs.
Types of RPC
Callback RPC
Broadcast RPC
Batch-mode RPC
Callback RPC
This type of RPC enables a P2P paradigm between participating processes. It helps a process to
provide both client and server services.
Broadcast RPC
Broadcast RPC is a client's request that is broadcast on the network and processed by all servers
which have the method for processing that request.
Allows you to specify that the client’s request message has to be broadcasted.
You can declare broadcast ports.
It helps to reduce the load on the physical network
Batch-mode RPC
Batch-mode RPC helps to queue separate RPC requests in a transmission buffer on the client
side and then send them over the network to the server in one batch.
RPC Architecture
1. Client
2. Client Stub
3. RPC Runtime
4. Server Stub
5. Server
How RPC Works?
Step 1) The client, the client stub, and one instance of RPC run time execute on the client
machine.
Step 2) A client starts a client stub process by passing parameters in the usual way. The client
stub resides within the client's own address space. It also asks the local RPC Runtime to send
the request to the server stub.
Step 3) In this stage, RPC is accessed by the user by making a regular Local Procedure Call. The RPC
Runtime manages the transmission of messages across the network between client and server. It
also performs the jobs of retransmission, acknowledgment, routing, and encryption.
Step 4) After the server procedure completes, it returns to the server stub, which packs
(marshalls) the return values into a message. The server stub then sends the message back to the
transport layer.
Step 5) In this step, the transport layer sends back the result message to the client transport layer,
which returns back a message to the client stub.
Step 6) In this stage, the client stub demarshalls (unpacks) the return parameters from the resulting
packet, and the execution process returns to the caller.
Characteristics of RPC
The called procedure is in another process, which is likely to reside in another machine.
The processes do not share address space.
Parameters are passed only by values.
RPC executes within the environment of the server process.
It doesn’t offer access to the calling procedure’s environment.
Features of RPC
RPC method helps clients to communicate with servers by the conventional use of procedure
calls in high-level languages.
RPC method is modeled on the local procedure call, but the called procedure is most likely to be
executed in a different process and usually a different computer.
RPC supports process and thread-oriented models.
RPC makes the internal message passing mechanism hidden from the user.
The effort needed to re-write and re-develop the code is minimal.
Remote procedure calls can be used for the purpose of distributed and the local environment.
It omits many of the protocol layers to improve performance.
RPC provides abstraction. For example, the message-passing nature of network communication
remains hidden from the user.
RPC allows the usage of the applications in a distributed environment that is not only in the local
environment.
With RPC code, re-writing and re-developing effort is minimized.
Process-oriented and thread-oriented models are supported by RPC.
Disadvantages of RPC
Remote Procedure Calls pass parameters by value only; pointer values are not
allowed.
Remote procedure call (and return) time (i.e., overhead) can be significantly higher than that
for a local procedure.
This mechanism is highly vulnerable to failure as it involves a communication system, another
machine, and another process.
The RPC concept can be implemented in different ways, so there is no single standard.
RPC does not offer any flexibility for hardware architecture, as it is mostly interaction-based.
The cost of the process is increased because of a remote procedure call.
RMI (Remote Method Invocation)
RMI is used to build distributed applications; it provides remote communication between Java
programs. It is provided in the package java.rmi.
In an RMI application, we write two programs, a server program (resides on the server) and a
client program (resides on the client).
Inside the server program, a remote object is created and a reference to that object is made
available to the client (using the registry).
The client program requests the remote object on the server and tries to invoke its
methods, as sketched below.
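A minimal server-side sketch (the interface name Hello, the bind name "HelloService", and port 1099 are illustrative choices, not fixed by RMI): the remote interface declares the method, and the server creates the remote object and registers it. The matching client is sketched at the end of these notes.

import java.rmi.Remote;
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.rmi.server.UnicastRemoteObject;

// The remote interface: every remote method must declare RemoteException.
interface Hello extends Remote {
    String sayHello(String name) throws RemoteException;
}

// Server program: creates the remote object and registers it so clients can find it.
public class HelloServer implements Hello {
    public String sayHello(String name) throws RemoteException {
        return "Hello, " + name;
    }

    public static void main(String[] args) throws Exception {
        HelloServer server = new HelloServer();
        // Export the object so it can receive remote calls; a stub is returned.
        Hello stub = (Hello) UnicastRemoteObject.exportObject(server, 0);
        // Start a registry on the default RMI port and bind the object under a name.
        Registry registry = LocateRegistry.createRegistry(1099);
        registry.bind("HelloService", stub);
        System.out.println("HelloService is ready");
    }
}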
Transport Layer − This layer connects the client and the server. It manages the existing
connection and also sets up new connections.
Stub − A stub is a representation (proxy) of the remote object at client. It resides in the
client system; it acts as a gateway for the client program.
Skeleton − This is the object which resides on the server side. The stub communicates with
this skeleton to pass requests to the remote object.
RRL(Remote Reference Layer) − It is the layer which manages the references made by
the client to the remote object.
When the client makes a call to the remote object, it is received by the stub which
eventually passes this request to the RRL.
When the client-side RRL receives the request, it invokes a method called invoke() of the
object remoteRef. It passes the request to the RRL on the server side.
The RRL on the server side passes the request to the Skeleton (proxy on the server)
which finally invokes the required object on the server.
The result is passed all the way back to the client.
Whenever a client invokes a method that accepts parameters on a remote object, the parameters
are bundled into a message before being sent over the network. These parameters may be of
primitive type or objects. In case of primitive type, the parameters are put together and a header
is attached to it. In case the parameters are objects, then they are serialized. This process is
known as marshalling.
At the server side, the packed parameters are unbundled and then the required method is
invoked. This process is known as unmarshalling.
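A small sketch of how an object parameter could be marshalled into bytes using Java serialization (the Point class is only an example; RMI performs this internally):

import java.io.*;

public class MarshalDemo {
    // An object parameter must be Serializable so it can be turned into bytes.
    static class Point implements Serializable {
        int x, y;
        Point(int x, int y) { this.x = x; this.y = y; }
    }

    public static void main(String[] args) throws Exception {
        // Marshalling: serialize the parameter into a byte array (the "message").
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(new Point(3, 4));
        }

        // Unmarshalling on the receiving side: rebuild the object from the bytes.
        try (ObjectInputStream in = new ObjectInputStream(
                new ByteArrayInputStream(bytes.toByteArray()))) {
            Point p = (Point) in.readObject();
            System.out.println("unmarshalled point: (" + p.x + ", " + p.y + ")");
        }
    }
}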
RMI Registry
RMI registry is a namespace in which all server objects are placed. Each time the server creates
an object, it registers this object with the RMI registry (using the bind() or rebind() methods). Objects
are registered using a unique name known as the bind name.
To invoke a remote object, the client needs a reference to that object. At that time, the client
fetches the object from the registry using its bind name (using the lookup() method).
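Continuing the server sketch from earlier (it assumes the same Hello interface, the bind name "HelloService", and the host "localhost"), the client program fetches the remote object from the registry by its bind name and then calls its method as if it were local:

import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;

public class HelloClient {
    public static void main(String[] args) throws Exception {
        // Connect to the registry running on the server machine.
        Registry registry = LocateRegistry.getRegistry("localhost", 1099);
        // Look up the remote object by its bind name; what comes back is the stub.
        Hello hello = (Hello) registry.lookup("HelloService");
        // Invoke the remote method; marshalling and the network are hidden by RMI.
        System.out.println(hello.sayHello("distributed systems"));
    }
}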