
Class Notes


Chapter 1: Fundamentals of Distributed Systems

1.1 Definition of a Distributed System


A distributed system is a “collection of independent computers that
appear to the users of the system as a single computer.” There are two
essential points in this definition:
• Independent: Architecturally, the machines are capable of operating
independently.
• Single computer: The software enables this set of connected machines
to appear as a single computer to the users of the system. This is
known as the single system image and is a major goal in designing
distributed systems that are easy to maintain and operate.

The figure below shows a simple distributed system for a number of
applications running on different operating systems.

Fig. Distributed System


A distributed system is needed mainly for the following reasons:
1. Economics: A collection of microprocessors offers a better
price/performance ratio than mainframes.
2. Speed: A distributed system may have more total computing power
than a mainframe, and load distribution enhances performance.
3. Inherent distribution: Some applications are inherently
distributed, e.g., a supermarket chain.
4. Reliability: If one machine crashes, the system as a whole can
still survive, giving higher availability and improved reliability.
5. Incremental growth: Computing power can be added in small
increments (modular expandability).
6. Data sharing: Many users can access a common database.
7. Resource sharing: Expensive peripherals, such as color printers,
can be shared.
8. Communication: Human-to-human communication is enhanced, e.g.,
email and chat.
9. Flexibility: The workload can be spread over the available machines.
10. Mobility: The system, data, or resources can be accessed from any
place or device.
1.2 Working of Distributed System

Distributed System Software: This software enables computers to
coordinate their activities and to share resources such as hardware,
software, and data.
Database: It stores the data processed by each node/system of the
distributed system that is connected to the centralized network.

Each autonomous system runs a common application and can have its own
data, which is shared through the centralized database system.
To transfer data to the autonomous systems, the centralized system
must have a middleware service and must be connected to a network.
Middleware services provide services that are not present by default in
the local systems or the centralized system, acting as an interface
between the centralized system and the local systems. Systems
communicate and manage data by using the components of the middleware
services.
The data transferred from the database is divided into segments or
modules and shared with the autonomous systems for processing.
The processed data is then transferred back to the centralized system
through the network and stored in the database.
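
The divide, process, and collect flow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: worker threads stand in for the autonomous systems, and the names split_into_segments, process_segment, and run_distributed are hypothetical, not part of any real distributed system software.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_segments(data, segment_count):
    """Divide the data into roughly equal contiguous segments."""
    size, extra = divmod(len(data), segment_count)
    segments, start = [], 0
    for i in range(segment_count):
        end = start + size + (1 if i < extra else 0)
        segments.append(data[start:end])
        start = end
    return segments


def process_segment(segment):
    """Stand-in for the work one autonomous node does (here: sum of squares)."""
    return sum(x * x for x in segment)


def run_distributed(data, node_count=3):
    """Hand one segment to each worker and combine the partial results centrally."""
    segments = split_into_segments(data, node_count)
    with ThreadPoolExecutor(max_workers=node_count) as pool:
        partial_results = list(pool.map(process_segment, segments))
    # The centralized system would store this combined result in its database.
    return sum(partial_results)


print(run_distributed(list(range(10))))
```

In a real deployment the segments would travel over the network through the middleware; the threads here only imitate that hand-off.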

1.3 Characteristics of Distributed System


• Resource sharing – A distributed system can share hardware,
software, or data.

• Simultaneous processing – Multiple machines can process the same
function simultaneously.

• Scalability – The computing and processing capacity can scale up as
needed when extended to additional machines.

• Error detection – Failures can be more easily detected.

• Transparency – A node can access and communicate with other nodes in
the system as if they formed a single system.
1.4 Advantages of Distributed System
1. Applications in distributed systems are inherently distributed
applications.
2. Information in distributed systems is shared among geographically
distributed users.
3. Resource sharing: autonomous systems can share resources from
remote locations.
4. It has a better price/performance ratio and flexibility.
5. It has a shorter response time and higher throughput.
6. It has higher reliability and availability against component
failure.
7. It is extensible, so systems can be extended to more remote
locations and grown incrementally.

1.5 Disadvantages of Distributed System


1. Comparatively little software has been written specifically for
distributed systems.
2. Security poses a problem because data is easy to access when
resources are shared among multiple systems.
3. Network saturation may hinder data transfer, i.e., if the network
lags, users will have trouble accessing data.
4. In comparison to a single-user system, the database associated
with a distributed system is much more complex and challenging to
manage.
5. If every node in a distributed system tries to send data at once, the
network may become overloaded.
1.6 Goals Of A Distributed System

1. Resource Sharing
Distributed systems enable efficient sharing of resources such as
processing power, storage, and data among the different nodes in the
distributed network.
2. Transparency
Distributed systems hide the distribution of resources and processes,
providing a unified and transparent view of the system.
3. Concurrency
Distributed systems support concurrent execution of multiple tasks or
processes across the different nodes.
4. Scalability
Distributed systems facilitate easy expansion of the system by adding
more nodes to handle increased workload.
5. Performance Optimization
Distributed systems optimize overall system performance by
distributing computation and data processing.
1.7 Comparison Between Centralized System and Distributed
System

Centralized System | Distributed System
Centralized systems have non-autonomous components. | Distributed systems have autonomous components.
Centralized systems are often built using homogeneous technology. | Distributed systems may be built using heterogeneous technology.
Multiple users share the resources of a centralized system at all times. | Distributed system components may be used exclusively and executed in concurrent processes.
Centralized systems have a single point of control and of failure. | Distributed systems have multiple points of failure.
Lower initial cost, higher long-term maintenance. | Higher initial cost, lower long-term maintenance.

1.8 Comparison between Parallel Systems and Distributed Systems

Parallel Systems | Distributed Systems
Tightly coupled shared memory (UMA, NUMA). | Distributed memory; message passing, RPC, and/or use of distributed shared memory.
Global clock control (SIMD, MIMD). | No global clock control; synchronization algorithms needed.
Interconnect: bus, mesh, tree, mesh of trees, and hypercube networks. | Interconnect: Ethernet (bus), token ring and SCI (ring), switching networks.
Focus: performance for scientific computing. | Focus: performance, reliability/availability, information/resource sharing.
Occurs in a single computer. | Involves multiple computers.
1.9 Transparency
Transparency means that software hides some of the details of the
distribution of system resources, which makes the system more
user-friendly.
A distributed system that appears to its users and applications to be a
single computer system is said to be transparent.
Users and applications should be able to access remote resources in the
same way they access local resources.
The distributed system should be perceived as a single entity by users
and application programmers rather than as a collection of cooperating
autonomous systems.
Users should be unaware of where services are located, and the transfer
from a local machine to a remote one should also be transparent.
The aim is to make certain aspects of the distributed system invisible
to application programmers so that they need only be concerned with the
design of their particular application.

Types of transparency
a. Access transparency: enables local and remote resources to be
accessed using identical operations.
b. Location transparency: enables resources to be accessed without
knowledge of their location.
c. Concurrency transparency: enables several processes to operate
concurrently using shared resources without interference between
them.
d. Replication transparency: enables multiple instances of resources to
be used to increase reliability and performance without knowledge of
the replicas by users or application programmers.
e. Failure transparency: enables the concealment of faults, allowing
users and application programs to complete their tasks despite the
failure of hardware or software components.
f. Mobility transparency: allows the movement of resources and clients
within a system without affecting the operation of users or programs.
g. Performance transparency: allows the system to be reconfigured to
improve performance as loads vary.
h. Scaling transparency: allows the system and applications to expand
in scale without change to the system structure or the application
algorithms.
Access and location transparency together are referred to as network
transparency.
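
Access, location, and mobility transparency can be illustrated with a small name service. The sketch below is a toy model under stated assumptions: NameService, Node, read, and migrate are hypothetical names chosen for this example, and everything runs in one process rather than across real machines.

```python
class Node:
    """One machine in the system, holding some resources in its local store."""

    def __init__(self, node_id):
        self.node_id = node_id
        self.store = {}


class NameService:
    """Maps logical resource names to the node that currently holds them."""

    def __init__(self):
        self._locations = {}

    def register(self, name, node):
        self._locations[name] = node

    def lookup(self, name):
        return self._locations[name]


def read(name_service, name):
    """Same operation for local and remote resources; the caller never names a node."""
    node = name_service.lookup(name)
    return node.store[name]


def migrate(name_service, name, target):
    """Move a resource to another node without changing how clients read it."""
    source = name_service.lookup(name)
    target.store[name] = source.store.pop(name)
    name_service.register(name, target)


names = NameService()
node_a, node_b = Node("A"), Node("B")
node_a.store["report.txt"] = "quarterly figures"
names.register("report.txt", node_a)

before = read(names, "report.txt")
migrate(names, "report.txt", node_b)
after = read(names, "report.txt")
print(before == after)
```

The client code calls read with the same argument before and after the move, which is the point of location and mobility transparency.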

1.10 What is Scalability?


Scalability refers to the ability of a system, network, or application to
handle a growing amount of work or to be easily expanded to
accommodate growth. In computing and distributed systems,
scalability is crucial for maintaining performance, reliability, and
efficiency as demand increases.

Importance of Scalability in Distributed Systems


Scalability is important in distributed systems for the following
reasons:

1. Performance Maintenance: Ensures that a system remains responsive
and effective even as the number of users or the volume of data
increases.
2. Cost Efficiency: Allows for incremental growth, where additional
resources are added as needed, rather than over-provisioning upfront.
3. Future-Proofing: Helps accommodate future growth and
technological advancements without requiring a complete
redesign or overhaul of the system.

Types of Scalability in Distributed Systems


1. Horizontal Scalability (Scaling Out)
• Horizontal scalability, or scaling out, involves adding more
machines or nodes to a distributed system to handle increased
load or demand.
2. Vertical Scalability (Scaling Up)
• Vertical scalability, or scaling up, involves increasing the capacity
of a single machine or node by adding more resources such as
CPU, memory, or storage.
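
The difference between the two approaches can be shown with simple arithmetic. The sketch below is a back-of-the-envelope model that assumes load divides evenly across nodes; the function names and the request rates are made up for illustration.

```python
import math


def nodes_needed(total_requests_per_sec, capacity_per_node):
    """Horizontal scaling: how many equal nodes are needed to carry the load."""
    return math.ceil(total_requests_per_sec / capacity_per_node)


def capacity_needed(total_requests_per_sec, node_count):
    """Vertical scaling: the capacity each of a fixed number of nodes must reach."""
    return total_requests_per_sec / node_count


# Assumed load of 2500 requests/s and nodes that each handle 1000 requests/s.
print(nodes_needed(2500, 1000))   # scale out: add nodes
print(capacity_needed(2500, 1))   # scale up: one bigger machine
```

Real systems rarely scale this linearly, because coordination between nodes adds overhead, so the numbers are best read as a lower bound.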

1.11 Types of Distributed Systems


A distributed system is a network of machines that can exchange
information with each other through message passing. It is useful
because it enables resource sharing: computers coordinate their
activities and share the resources of the system so that users
perceive the system as a single, integrated computing facility.

Types of Distributed Systems


1. Client/Server Systems
2. Peer-to-Peer Systems
3. Middleware
4. Three-tier
5. N-tier
1. Client/Server Systems: The client/server system is the most basic
communication model: the client sends a request to the server and the
server replies with a response. The client asks the server for a
resource or for a task to be done; the server allocates the resource or
performs the task and sends the result back as a response to the
client's request. A client/server system can be built with multiple
servers.
Examples of the client/server model are email and the World Wide Web.

In short, the client requests something and the server serves it, as
long as it is in the server's database.
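
A single request/response exchange can be sketched with Python's standard socket module. This is a minimal illustration, not a production server: it handles exactly one client, runs over the loopback interface, and the RESOURCES table and function names are assumptions made for the example.

```python
import socket
import threading

RESOURCES = {"index.html": "<h1>Welcome</h1>"}  # hypothetical server-side data


def handle_one_request(server_sock):
    """Server side: accept one client, read the requested name, send the reply."""
    conn, _addr = server_sock.accept()
    with conn:
        name = conn.recv(1024).decode("utf-8")
        reply = RESOURCES.get(name, "NOT FOUND")
        conn.sendall(reply.encode("utf-8"))


def request(host, port, name):
    """Client side: send a request and wait for the server's response."""
    with socket.create_connection((host, port)) as sock:
        sock.sendall(name.encode("utf-8"))
        return sock.recv(1024).decode("utf-8")


def demo(name):
    """Run one request/response exchange over the loopback interface."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server_sock:
        server_sock.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
        server_sock.listen(1)
        port = server_sock.getsockname()[1]
        worker = threading.Thread(target=handle_one_request, args=(server_sock,))
        worker.start()
        response = request("127.0.0.1", port, name)
        worker.join()
    return response


print(demo("index.html"))
```

Asking for a name that is not in RESOURCES returns "NOT FOUND", which mirrors the idea that the server can only serve what it actually holds.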

2. Peer-to-Peer Systems: The peer-to-peer communication model is
decentralized: each node acts as both client and server. Nodes are the
basic building blocks of the system. Each node performs its task using
its local memory and shares data through the supporting medium, and it
can work as a server or as a client for the system. Programs in a
peer-to-peer system communicate at the same level without any
hierarchy.
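
The dual client/server role of a peer can be sketched as a small class. This is an in-memory toy under stated assumptions: the Peer class and its method names are hypothetical, and direct method calls stand in for network messages.

```python
class Peer:
    """A node that both serves its own data and requests data from other peers."""

    def __init__(self, name):
        self.name = name
        self.local_data = {}   # each peer works on its own local memory
        self.neighbours = []   # no hierarchy: every peer is an equal

    def connect(self, other):
        """Link two peers in both directions."""
        self.neighbours.append(other)
        other.neighbours.append(self)

    def serve(self, key):
        """Server role: answer a request coming from another peer."""
        return self.local_data.get(key)

    def fetch(self, key):
        """Client role: look locally first, then ask each neighbour in turn."""
        if key in self.local_data:
            return self.local_data[key]
        for peer in self.neighbours:
            value = peer.serve(key)
            if value is not None:
                return value
        return None


alice, bob = Peer("alice"), Peer("bob")
alice.local_data["song.mp3"] = "audio bytes"
bob.local_data["notes.txt"] = "lecture notes"
alice.connect(bob)
print(alice.fetch("notes.txt"), "/", bob.fetch("song.mp3"))
```

Each object plays both roles: alice acts as a client when she fetches notes.txt and as a server when bob fetches song.mp3.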

3. Middleware: Middleware can be thought of as an application that sits
between two separate applications and provides services to both. It
works as a base for interoperability between applications running on
different operating systems. Data can be transferred between those
applications by using this service.
4. Three-tier: A three-tier system uses a separate layer and server for
each function of a program. Client data is stored in the middle tier
rather than on the client system or on its server, which makes
development easier. It includes a Presentation Layer, an Application
Layer, and a Data Layer. This architecture is mostly used in web or
online applications.
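
The separation of the three layers can be sketched as three small classes, each talking only to the layer directly beneath it. The class names follow the layer names in the text; the account data and method names are invented for this example, and an in-memory dictionary stands in for a real database.

```python
class DataLayer:
    """Data tier: owns storage (a dict stands in for a database)."""

    def __init__(self):
        self._rows = {1: {"name": "Asha", "balance": 120.0}}

    def get_account(self, account_id):
        return self._rows.get(account_id)


class ApplicationLayer:
    """Middle tier: business rules; the only tier that talks to the data tier."""

    def __init__(self, data_layer):
        self._data = data_layer

    def balance_summary(self, account_id):
        row = self._data.get_account(account_id)
        if row is None:
            return None
        return {"name": row["name"], "balance": row["balance"],
                "overdrawn": row["balance"] < 0}


class PresentationLayer:
    """Top tier: formats results for the user and holds no business logic."""

    def __init__(self, app_layer):
        self._app = app_layer

    def render(self, account_id):
        summary = self._app.balance_summary(account_id)
        if summary is None:
            return "Account not found"
        return f"{summary['name']}: {summary['balance']:.2f}"


ui = PresentationLayer(ApplicationLayer(DataLayer()))
print(ui.render(1))
```

Because the presentation layer never touches storage directly, the data tier could be swapped for a real database server without changing the top tier.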

5. N-tier: An N-tier system is also called a multitier distributed
system. It can contain any number of functions in the network and has a
structure similar to the three-tier architecture. It is used when an
application sends a request to another application to perform a task or
to provide a service. N-tier is commonly used in web applications and
data systems.
1.12 Distributed Computing System
1. Cluster Computing
Cluster computing uses a collection of tightly or loosely connected
computers that work together so that they act as a single entity. The
connected computers execute operations together, creating the
impression of a single system. Clusters are generally connected through
fast local area networks (LANs).

2. Grid Computing
Grid computing is a distributed architecture of multiple computers
connected by networks to accomplish a joint task. These tasks are
compute-intensive and difficult for a single machine to handle. Several
machines on a network collaborate under a common protocol and work
as a single virtual supercomputer to get complex tasks done. This offers
powerful virtualization by creating a single system image that grants
users and applications seamless access to IT capabilities.
1.13 Basic of Operating System

The Operating System (OS) is fundamental software that manages
computer hardware and software resources and provides common
services for computer programs. It acts as an intermediary between
users and the computer hardware.

Key Functions of an Operating System


1.Process Management
• The OS handles processes, which are instances of programs in
execution. It manages process creation, scheduling, and
termination.
• Multitasking: The OS allows multiple processes to run
simultaneously by managing CPU time between them.
• Process Scheduling: The OS decides the order in which processes
access the CPU, ensuring fair and efficient use of resources.

2.Memory Management
• The OS manages the computer's memory, including RAM and
cache.
• Allocation and Deallocation: It allocates memory to processes
when they need it and deallocates it when it's no longer required.
• Virtual Memory: The OS uses techniques like paging and
segmentation to extend the available memory by using disk space,
making it appear as if there is more memory than physically
available.
3.File System Management
• The OS provides a way to store, retrieve, and organize data on
storage devices (e.g., hard drives, SSDs).
• File Organization: It organizes data in files and directories,
providing a structured way to manage information.
• Access Control: The OS controls who can access or modify files,
ensuring data security.

4.Device Management
• The OS manages input and output devices, such as keyboards,
mice, printers, and storage devices.
• Device Drivers: The OS uses device drivers to communicate with
hardware devices, ensuring they operate correctly.
• I/O Operations: It manages all input/output

1.14 Difference Between Process and Program

Process | Program
A process is an active entity. | A program is a passive entity.
A process is a sequence of instruction executions. | A program contains the instructions.
A process is a dynamic entity. | A program is a static entity.
A process is an execution instance of a program. | A program is oriented to a specific task.
Multiple entities working together. | Single entity.
Limited lifespan. | Longer lifespan.
1.15 Process State Diagram
A process is a program in execution. It is more than the program code, which is
called the text section. This concept applies in every operating system, because
every task the operating system performs needs a process to carry it out.

A process changes state as it executes. The state of a process is defined by the
current activity of that process.

Each process may be in any one of the following states −

1. New − The process is being created.


2. Running − In this state the instructions are being executed.
3. Waiting − The process is in waiting state until an event occurs like I/O
operation completion or receiving a signal.
4. Ready − The process is waiting to be assigned to a processor.
5. Terminated − the process has finished execution.

It is important to know that only one process can be running on any processor at
any instant. Many processes may be ready and waiting.

Now let us see the state diagram of these process states −


Explanation
Step 1 − Whenever a new process is created, it is admitted into the ready state.
Step 2 − If no other process is in the running state, the process is dispatched to
running by the scheduler's dispatcher.
Step 3 − If a higher-priority process becomes ready, the running process is
preempted and sent back to the ready state. If the running process has to wait for
I/O or another event, it is sent to the waiting state.
Step 4 − Whenever the I/O or event completes, the process is sent back to the ready
state in response to the interrupt that signals completion.
Step 5 − Whenever the execution of a process completes in the running state, it
exits to the terminated state, which is the completion of the process.
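
The five states can be modelled as a small transition table. The sketch follows the standard five-state model, in which a preempted process returns to ready and only a process waiting for I/O or an event enters waiting; the event names ("admit", "dispatch", and so on) are labels chosen for this example.

```python
from enum import Enum


class State(Enum):
    NEW = "new"
    READY = "ready"
    RUNNING = "running"
    WAITING = "waiting"
    TERMINATED = "terminated"


# (current state, event) -> next state
TRANSITIONS = {
    (State.NEW, "admit"): State.READY,
    (State.READY, "dispatch"): State.RUNNING,
    (State.RUNNING, "preempt"): State.READY,
    (State.RUNNING, "wait_for_event"): State.WAITING,
    (State.WAITING, "event_completed"): State.READY,
    (State.RUNNING, "exit"): State.TERMINATED,
}


def next_state(state, event):
    """Return the state reached by `event`, or raise if the move is not allowed."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"illegal transition: {event!r} from {state.name}") from None


def run(events, state=State.NEW):
    """Apply a sequence of events to a process that starts in the NEW state."""
    for event in events:
        state = next_state(state, event)
    return state


lifecycle = ["admit", "dispatch", "wait_for_event", "event_completed",
             "dispatch", "exit"]
print(run(lifecycle).name)
```

Note that there is no direct move from waiting to running: a process whose I/O has completed must pass through ready and be dispatched again.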
1.16 OSI Architecture
Any number of users, located all over the world, use computer networks. To ensure
national and worldwide data communication, systems must be developed that are
compatible to communicate with each other. ISO, the International Organization
for Standardization, has developed a standard for this. It is called the model
for Open Systems Interconnection (OSI) and is commonly known as the OSI model.

The ISO-OSI model is a seven layer architecture. It defines seven layers or levels
in a complete communication system. They are:

7. Application Layer
6. Presentation Layer
5. Session Layer
4. Transport Layer
3. Network Layer
2. Data Link Layer
1. Physical Layer
Fig. OSI seven-layer network architecture

1. Physical Layer
Purpose: Deals with the physical connection between devices and the
transmission and reception of raw binary data over a physical medium (like
cables, radio frequencies).
Functions: Bit transmission, signaling, interfacing, hardware (cables, switches,
NICs).
2. Data Link Layer
Purpose: Responsible for node-to-node data transfer and for detecting and
correcting errors that arise in the Physical layer.
Functions: Frame creation, MAC addressing, error detection/correction, flow
control, switches.

3. Network Layer
Purpose: Manages data routing, forwarding, and addressing in the network.
Functions: Logical addressing (IP addresses), packet forwarding, routing, error
handling, congestion control, routers.
4. Transport Layer
Purpose: Ensures complete data transfer between host systems.
Functions: End-to-end communication, error recovery, flow control, data
segmentation, reliability (TCP, UDP).

5. Session Layer
Purpose: Manages sessions or connections between applications.
Functions: Session establishment, maintenance, and termination,
synchronization, dialog control.

6. Presentation Layer
Purpose: Translates data between the application layer and the network format.
Ensures that data is readable by the receiving system.
Functions: Data encryption/decryption, data compression, data translation
(syntax and semantics).

7. Application Layer
Purpose: Closest to the end-user, it interacts with software applications that
implement a communicating component.
Functions: Provides network services to end-users, such as email, file transfer,
and remote desktop.
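
The way data passes down the seven layers on the sender and back up on the receiver can be sketched as nested wrapping. This is a deliberately simplified picture: real layers add binary headers (and sometimes trailers), not text labels, and the bracket format below is an assumption made only for readability.

```python
# Layers in the order the sender traverses them, from top to bottom.
LAYERS = ["Application", "Presentation", "Session", "Transport",
          "Network", "Data Link", "Physical"]


def encapsulate(payload):
    """Sender side: walk down the stack; each layer wraps the data from above."""
    data = payload
    for layer in LAYERS:
        data = f"[{layer}|{data}]"
    return data


def decapsulate(frame):
    """Receiver side: walk up the stack, removing one wrapper per layer."""
    data = frame
    for layer in reversed(LAYERS):
        prefix = f"[{layer}|"
        if not (data.startswith(prefix) and data.endswith("]")):
            raise ValueError(f"missing {layer} wrapper")
        data = data[len(prefix):-1]
    return data


frame = encapsulate("hello")
print(frame)
print(decapsulate(frame))
```

The outermost wrapper belongs to the Physical layer because it is the last one applied by the sender and the first one removed by the receiver.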

1.17 Network Architecture


Computer network architecture is defined as the physical and logical design of
the software, hardware, protocols, and media used for the transmission of data.
Put simply, it describes how computers are organized and how tasks are allocated
among them.
Two types of network architecture are used:
1. Peer-To-Peer network
2. Client/Server network
1.Peer-To-Peer network
➢ A Peer-To-Peer network is a network in which all the computers are linked
together with equal privileges and responsibilities for processing the data.
➢ A Peer-To-Peer network is useful for small environments, usually up to 10
computers.
➢ A Peer-To-Peer network has no dedicated server.
➢ Special permissions are assigned to each computer for sharing resources,
but this can lead to a problem if the computer holding the resource is
down.
Advantages Of Peer-To-Peer Network:
o It is less costly, as it does not contain a dedicated server.
o If one computer stops working, the other computers keep working.
o It is easy to set up and maintain, as each computer manages itself.
Disadvantages Of Peer-To-Peer Network:
o A Peer-To-Peer network has no centralized system. Therefore, data cannot
be backed up centrally, as the data differs from one location to
another.
o It has security issues, as each device manages itself.

2.Client/Server Network
➢ A Client/Server network is a network model designed for end users, called
clients, to access resources such as songs and videos from a central
computer known as the server.
➢ The central controller is known as the server, while all other computers in
the network are called clients.
➢ A server performs all the major operations, such as security and network
management.
➢ A server is responsible for managing all the resources, such as files,
directories, and printers.
➢ All the clients communicate with each other through the server. For example,
if client 1 wants to send some data to client 2, it first sends a request
to the server for permission. The server sends a response to client 1 so
that it can initiate communication with client 2.

Advantages Of Client/Server network:


➢ A Client/Server network contains a centralized system. Therefore, data can
be backed up easily.
➢ A Client/Server network has a dedicated server that improves the overall
performance of the whole system.
➢ Security is better in a Client/Server network, as a single server
administers the shared resources.
➢ It also increases the speed of resource sharing.
Disadvantages Of Client/Server network:
➢ A Client/Server network is expensive, as it requires a server with large
memory.
➢ A server needs a Network Operating System (NOS) to provide resources
to the clients, and the cost of an NOS is very high.
➢ It requires a dedicated network administrator to manage all the resources.

1.19 Distributed Computing System Models


Distributed Computing system models can be broadly classified into five
categories. They are
➢ Minicomputer model
➢ Workstation model
➢ Workstation – server model
➢ Processor – pool model
➢ Hybrid model

1.Minicomputer Model

➢ The minicomputer model is a simple extension of the centralized
time-sharing system.
➢ A distributed computing system based on this model consists of a few
minicomputers interconnected by a communication network, where each
minicomputer usually has multiple users simultaneously logged on to it.
➢ Several interactive terminals are connected to each minicomputer. Each
user logged on to one specific minicomputer has remote access to the other
minicomputers.
➢ The network allows a user to access remote resources that are available on
some machine other than the one the user is currently logged on to.
The minicomputer model may be used when resource sharing with remote
users is desired.
➢ The early ARPAnet is an example of a distributed computing system based
on the minicomputer model.

2.Workstation Model

➢ A distributed computing system based on the workstation model consists
of several workstations interconnected by a communication network.
➢ An organization may have several workstations located throughout its
infrastructure, where each workstation is equipped with its own disk and
serves as a single-user computer.
➢ In such an environment, at any one time a significant proportion of the
workstations are idle, which wastes large amounts of CPU time.
➢ Therefore, the idea of the workstation model is to interconnect all these
workstations by a high-speed LAN so that idle workstations may be used
to process jobs of users who are logged onto other workstations and do not
have sufficient processing power at their own workstations to get their jobs
processed efficiently.
➢ Example: The Sprite system and Xerox PARC.

3.Workstation-Server Model

➢ The workstation model is a network of personal workstations, each having
its own disk and a local file system.
➢ A workstation with its own local disk is usually called a diskful
workstation, and a workstation without a local disk is called a diskless
workstation. Diskless workstations have become more popular in network
environments than diskful workstations, making the workstation-server
model more popular than the workstation model for building distributed
computing systems.
➢ A distributed computing system based on the workstation-server model
consists of a few minicomputers and several workstations interconnected by
a communication network.
➢ In this model, a user logs onto a workstation called his or her home
workstation. Normal computation activities required by the user's processes
are performed at the user's home workstation, but requests for services
provided by special servers are sent to a server providing that type of
service, which performs the user's requested activity and returns the
result of request processing to the user's workstation.
➢ Therefore, in this model, the user's processes need not be migrated to the
server machines to get work done by those machines.
➢ Example: The V-System.

4.Processor-Pool Model

➢ The processor-pool model is based on the observation that most of the time
a user does not need any computing power but once in a while the user may
need a very large amount of computing power for a short time.
➢ Therefore, unlike the workstation-server model, in which a processor is
allocated to each user, in the processor-pool model the processors are
pooled together to be shared by the users as needed.
➢ The pool of processors consists of a large number of microcomputers &
minicomputers attached to the network.
➢ Each processor in the pool has its own memory to load & run a system
program or an application program of the distributed computing system.
➢ In this model no home machine is present & the user does not log onto any
machine.
➢ This model has better utilization of processing power & greater flexibility.
➢ Example: Amoeba & the Cambridge Distributed Computing System.
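
The allocate-on-demand idea behind the processor-pool model can be sketched as a small class. This is a toy model, not how Amoeba or the Cambridge system is implemented: the ProcessorPool class, its methods, and the user names are assumptions made for illustration.

```python
class ProcessorPool:
    """Shared pool of processors that are lent to users only while needed."""

    def __init__(self, processor_ids):
        self._free = list(processor_ids)
        self._allocated = {}  # user -> list of processor ids

    def allocate(self, user, count):
        """Grant `count` processors to a user, or raise if the pool is too small."""
        if count > len(self._free):
            raise RuntimeError("not enough free processors")
        granted = [self._free.pop() for _ in range(count)]
        self._allocated.setdefault(user, []).extend(granted)
        return granted

    def release(self, user):
        """Return all of a user's processors to the pool when the job finishes."""
        self._free.extend(self._allocated.pop(user, []))

    @property
    def free_count(self):
        return len(self._free)


pool = ProcessorPool(range(4))
granted = pool.allocate("alice", 3)   # a short burst of heavy computation
print(len(granted), pool.free_count)
pool.release("alice")                 # processors go back for others to use
print(pool.free_count)
```

Because no processor is tied to a user between jobs, the same four processors can serve many users whose bursts of demand do not overlap.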

5.Hybrid Model:

➢ The workstation-server model suits environments in which a large number
of computer users only perform simple interactive tasks and execute small
programs.
➢ In a working environment that has groups of users who often perform jobs
needing massive computation, the processor-pool model is more attractive
and suitable.
➢ To combine the advantages of the workstation-server and processor-pool
models, a hybrid model can be used to build a distributed system.
➢ The processors in the pool can be allocated dynamically for computations
that are too large or require several computers for execution.
➢ The hybrid model gives a guaranteed response to interactive jobs by
allowing them to be processed on the users' local workstations.

1.20 Comparison of the OSI and TCP/IP Protocol Suite

OSI | TCP/IP
OSI stands for Open Systems Interconnection. | TCP/IP stands for Transmission Control Protocol/Internet Protocol.
It has 7 layers. | It has 4 layers.
It is low in usage. | It is mostly used.
It follows a vertical approach. | It follows a horizontal approach.
Delivery of the packet is guaranteed in the OSI model. | Delivery of the packet is not guaranteed in the TCP/IP model.
It is less reliable than the TCP/IP model. | It is more reliable than the OSI model.
Error handling is built into the Data Link and Transport layers. | Error handling is built into protocols such as TCP.
1.21 Comparison of Network Operating System and Distributed Operating System

Network Operating System | Distributed Operating System
Its objective is to provide local services to remote clients. | Its objective is to manage the hardware resources.
Communication takes place on the basis of files. | Communication takes place on the basis of messages and shared memory.
It is more scalable than a distributed operating system. | It is less scalable than a network operating system.
Fault tolerance is low. | Fault tolerance is high.
The degree of autonomy is high. | The degree of autonomy is low.
Ease of implementation is high. | Ease of implementation is low.
All nodes can have different operating systems. | All nodes have the same operating system.

1.23 The TCP/IP Protocol Suite


The Transmission Control Protocol/Internet Protocol (TCP/IP) suite is a set of
network protocols that supports a four-layer network architecture. It is
currently the protocol suite employed on the Internet.
The TCP/IP reference model is a set of protocols that allow communication across
multiple diverse networks.
The internet layer implements the Internet Protocol, which provides the
functionality for transmitting data between any two hosts on the Internet.
The transport layer delivers the transmitted data to a specific process running on
an Internet host.
The application layer supports the programming interface used for building a
program.
Physical Layer: It is responsible for accepting and transmitting IP datagrams.
This layer may consist of a device driver in the operating system and the
corresponding network interface card in the machine.
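
The division of work between the internet layer (host-to-host) and the transport layer (delivery to a specific process) can be sketched as a two-step lookup. This is a conceptual model only: the addresses, port numbers, handler functions, and the deliver function are all made up for the example, and no real network traffic is involved.

```python
def web_server(data):
    return f"web server received: {data}"


def mail_server(data):
    return f"mail server received: {data}"


# Hypothetical hosts, each mapping port numbers to the process listening there.
HOSTS = {
    "10.0.0.5": {80: web_server, 25: mail_server},
}


def deliver(datagram, hosts):
    """Pick the host by IP address, then the process on that host by port."""
    host = hosts.get(datagram["dst_ip"])          # internet layer: which host
    if host is None:
        return "no route to host"
    process = host.get(datagram["dst_port"])      # transport layer: which process
    if process is None:
        return "port unreachable"
    return process(datagram["data"])


print(deliver({"dst_ip": "10.0.0.5", "dst_port": 80, "data": "GET /"}, HOSTS))
```

The IP address alone identifies only the machine; the port number is what lets the transport layer hand the data to the right program on it.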

1.24 Network Resources
