
Distributed Computing Systems

(CS504)
Syllabus

Module 1: Characterization of Distributed Systems:
Introduction, Examples of distributed systems, Issues in
distributed operating systems, Resource sharing and the Web,
Challenges. System Models: Architectural models,
Fundamental models.

Module 2: Distributed Objects and Remote Invocation:
Communication between distributed objects, Remote procedure
call, Events and notifications, Java RMI case study.

Module 3: Transactions and Concurrency Control:
Transactions, Nested transactions.
Text Books:
1. Distributed Systems: Concepts and Design, by Coulouris, Dollimore, Kindberg, Pearson Education, 2006.
2. Advanced Concepts in Operating Systems, by Mukesh Singhal & Niranjan G. Shivaratri, Tata McGraw Hill, 2001.
Reference Books:
1. Tanenbaum, A. S., Distributed Operating Systems, Pearson Education, 2005.
2. P. K. Sinha, Distributed Operating Systems: Concepts and Design, PHI, 2004.
Pre-Requisites
The pre-requisites are significant programming experience with a language such as C, a basic understanding of networking, and data structures & algorithms.
The Basics
 What is a distributed system?
 It's one of those things that's hard to define without first defining many other things.
Here is a "cascading" definition of a distributed system:
 A program is the code you write.
 A process is what you get when you run it.
 A message is used to communicate between processes.
 A packet is a fragment of a message that might travel on a wire.
 A protocol is a formal description of the message formats and the rules that two processes must follow in order to exchange those messages.
 A network is the infrastructure that links computers together.
Characterization of Distributed Systems:
 Introduction: A distributed system is a software system in which components located on networked computers communicate and coordinate their actions by passing messages.
 The components interact with each other in order to achieve a common goal.
A distributed system
 is an application that executes a collection of protocols to coordinate the actions of multiple processes on a network, such that all components cooperate to perform a single task or a small set of related tasks.
What is a distributed system?
 A collection of independent computers that appears to its users as a single coherent system.
 It consists of multiple computers that do not share memory.
 Each computer has its own memory and runs its own operating system.
 The computers can communicate with each other through a communication network.
1.2 Distributed System Characteristics
 Multiple autonomous components
 Components are not shared by all users
 Resources may not be accessible
 Software runs in concurrent processes on different processors
 Multiple points of control
 Multiple points of failure
2. Examples of Distributed Systems
 Local Area Network and Intranet
 Database Management System
 Automatic Teller Machine Network
 Internet / World Wide Web
 Mobile and Ubiquitous Computing
2.1 Local Area Network

[Figure: a local area network with desktop computers, email, print, file, web and other servers, connected through a router/firewall to the rest of the Internet.]
2.2 Database Management System

2.3 Automatic Teller Machine Network

2.4 Internet

[Figure: the Internet as a set of intranets linked by ISPs, backbones and satellite links; the key distinguishes desktop computers, servers and network links.]
2.4.1 World-Wide-Web

3. Common Characteristics
 What are we trying to achieve when we construct a distributed system?
 Certain common characteristics can be used to assess distributed systems:
 Heterogeneity
 Openness
 Security
 Scalability
 Failure Handling
 Concurrency
 Transparency
3.1 Heterogeneity
 Variety and differences in:
 Networks (IPv4 & IPv6)
 Computer hardware (32-bit and 64-bit)
 Operating systems (Windows and Linux)
 Programming languages (C and Java)
 Implementations by different developers (e.g., Biswajit & Rajib)
 Middleware is a software layer that provides a programming abstraction as well as masking the heterogeneity of the underlying networks, hardware, operating systems and programming languages (e.g., CORBA).
 Mobile code refers to code that can be sent from one computer to another and run at the destination (e.g., Java applets and the Java virtual machine).

3.2 Openness
 Openness is concerned with extensions and improvements of distributed systems.
 Detailed interfaces of components need to be published.
 New components have to be integrated with existing components.
 Differences in data representation of interface types on different processors (of different vendors, e.g., Intel & AMD) have to be resolved.
3.3 Security
 In a distributed system, clients send requests to access data managed by servers and resources in the network:
 Doctors requesting records from hospitals
 Users purchasing products through electronic commerce
 Security is required for:
 Concealing the contents of messages: security and privacy
 Identifying a remote user or other agent correctly (authentication)
 New challenges:
 Denial of service attacks
 Security of mobile code
3.4 Scalability
 Adaptation of distributed systems to
 accommodate more users
 respond faster (this is the hard one)
 Usually done by adding more and/or faster processors.
 Components should not need to be changed when the scale of a system increases.
 Design components to be scalable!
3.5 Failure Handling (Fault Tolerance)
 Hardware, software and networks fail!
 Distributed systems must maintain availability even at low levels of hardware/software/network reliability.
 Fault tolerance is achieved by
 recovery
 redundancy
3.6 Concurrency
 Components in distributed systems are executed in concurrent
processes.
 Components access and update shared resources (e.g. variables,
databases, device drivers).
 Integrity of the system may be violated if concurrent updates
are not coordinated.
 Lost updates
 Inconsistent analysis

3.7 Transparency
 Distributed systems should be perceived by users and application programmers as a whole rather than as a collection of cooperating components.
 Transparency has different aspects.
 These represent various properties that distributed systems should have.
Distributed Systems: Hardware Concepts
 Multiprocessors
 Multicomputers
 Networks of Computers
Multiprocessors and Multicomputers
Distinguishing features:
 Private versus shared memory
 Bus versus switched interconnection
Distributed Systems: Software Concepts
 Distributed operating system
 Network operating system
 Middleware
Distributed Operating System
Some characteristics:
 OS on each computer knows about the other computers
 OS on different computers is generally the same
 Services are generally (transparently) distributed across computers
Network Operating System
Some characteristics:
 Each computer has its own operating system with networking facilities
 Computers work independently (i.e., they may even have different operating systems)
 Services are tied to individual nodes (ftp, telnet, WWW)
 Highly file oriented (basically, processors share only files)
Distributed System (Middleware)
Some characteristics:
 OS on each computer need not know about the other computers
 OS on different computers need not generally be the same
 Services are generally (transparently) distributed across computers
Need for Middleware
 Too many networked applications were hard to integrate:
 Departments are running different NOSs
 Integration and interoperability only at the level of primitive NOS services
 Need for federated information systems:
– Combining different databases, but providing a single view to applications
– Setting up enterprise-wide Internet services, making use of existing information systems
– Allowing transactions across different databases
– Allowing extensibility for future services (e.g., mobility, teleworking, collaborative applications)
 Constraint: use the existing operating systems, and treat them as the underlying environment (they provide the basic functionality anyway)
Communication services: Abandon primitive socket-based message passing in favor of:
 Procedure calls across networks
 Remote-object method invocation
 Message-queuing systems
 Advanced communication streams
 Event notification services
Information system services: Services that help manage data in a distributed system:
 Large-scale, system-wide naming services
 Advanced directory services (search engines)
 Location services for tracking mobile objects
 Persistent storage facilities
 Data caching and replication
Control services: Services giving applications control over when, where, and how they access data:
 Distributed transaction processing
 Code migration
Security services: Services for secure processing and communication:
 Authentication and authorization services
 Simple encryption services
 Auditing services
Comparison of DOS, NOS, and Middleware
Distributed systems have the following significant consequences:
 Concurrency
 No global clock
 Independent failures
Concurrency
In a network of computers, concurrent program execution is the norm. I can do my work on my computer while you do your work on yours, sharing resources such as web pages or files when necessary.
 Example of a lost update, starting from p = 4:
   Ticket producer (p = p + 1): [r = p; r = r + 1; (time-out) p = r]
   Ticket consumer (p = p - 1): [r = p; r = r - 1; (time-out) p = r]
If both read p = 4 before either writes back, one update is lost: here the producer's, leaving p = 3 instead of the expected 4.
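A minimal Python sketch of this interleaving (straight-line code rather than real threads, so the outcome is deterministic; the variable names are illustrative, not from the slides):

# Simulate the uncoordinated interleaving from the example, starting at p = 4.
p = 4

r_prod = p           # producer reads p (4)
r_prod = r_prod + 1  # producer computes 5, then times out

r_cons = p           # consumer reads p (still 4)
r_cons = r_cons - 1  # consumer computes 3

p = r_prod           # producer writes back 5
p = r_cons           # consumer writes back 3 -- the producer's update is lost

print(p)             # 3, not the expected 4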
No global clock
 When programs need to cooperate they coordinate their actions by exchanging messages.
 Close coordination often depends on a shared idea of the time at which the programs' actions occur.
 But it turns out that there are limits to the accuracy with which the computers in a network can synchronize their clocks – there is no single global notion of the correct time.
Independent failures
 All computer systems can fail, and it is the responsibility of system designers to plan for the consequences of possible failures. Distributed systems can fail in new ways.
 Faults in the network result in the isolation of the computers that are connected to it, but that doesn't mean that they stop running. In fact, the programs on them may not even be able to detect whether the network has failed or has become unusually slow.
TRENDS IN DISTRIBUTED SYSTEMS
Distributed systems are undergoing a period of significant change, and this can be traced back to a number of influential trends:
 The emergence of pervasive networking technology;
 The emergence of ubiquitous computing coupled with the desire to support user mobility in distributed systems;
 The increasing demand for multimedia services;
 The view of distributed systems as a utility.
The delay between the start of a message’s
transmission from one process and the beginning of
its receipt by another is referred to as latency.

[Figure: clients invoke a server, which may in turn invoke another server; results flow back along the same paths. Key: process, computer.]
Computer clocks and timing events
 Each computer in a distributed system has its own internal clock, which can be used by local processes to obtain the value of the current time.
 Therefore two processes running on different computers can each associate timestamps with their events. However, even if the two processes read their clocks at the same time, their local clocks may supply different time values.
 This is because computer clocks drift from perfect time and, more importantly, their drift rates differ from one another.
Two variants of the interaction model
 In a distributed system it is hard to set limits on the time that can be taken for process execution, message delivery or clock drift.
 Two opposing extreme positions provide a pair of simple models – the first makes a strong assumption about time and the second makes no assumptions about time.
Synchronous distributed systems
 Hadzilacos and Toueg define a synchronous distributed system to be one in which the following bounds are defined:
 The time to execute each step of a process has known lower and upper bounds.
 Each message transmitted over a channel is received within a known bounded time.
 Each process has a local clock whose drift rate from real time has a known bound.
Asynchronous distributed systems
 Many distributed systems, such as the Internet, are very useful without being able to qualify as synchronous systems. Therefore we need an alternative model. An asynchronous distributed system is one in which there are no bounds on:
 Process execution speeds – for example, one process step may take only a picosecond and another a century; all that can be said is that each step may take an arbitrarily long time.
 Message transmission delays – for example, one message from process A to process B may be delivered in negligible time and another may take several years. In other words, a message may be received after an arbitrarily long time.
 Clock drift rates – again, the drift rate of a clock is arbitrary.
Event ordering
 In many cases, we are interested in knowing whether an event (sending or receiving a message) at one process occurred before, after or concurrently with another event at another process.
 The execution of a system can be described in terms of events and their ordering despite the lack of accurate clocks.
Event ordering
 For example, consider the following set of exchanges between a group of email users, X, Y, Z and A, on a mailing list:
 1. User X sends a message with the subject Meeting.
 2. Users Y and Z reply by sending a message with the subject Re: Meeting.
 In real time, X's message is sent first; Y reads it and replies; Z then reads both X's message and Y's reply and sends another reply, which references both X's and Y's messages. But due to the independent delays in message delivery, the messages may be delivered out of order, and some users may view these two messages in the wrong order.
Failure model
 In a distributed system both processes and communication channels may fail – that is, they may depart from what is considered to be correct or desirable behaviour.
 The failure model defines the ways in which failure may occur, in order to provide an understanding of the effects of failures.
Omission failures
 The faults classified as omission failures refer to cases when a process or communication channel fails to perform actions that it is supposed to do.
 Process omission failures: The chief omission failure of a process is to crash. When we say that a process has crashed we mean that it has halted and will not execute any further steps of its program ever.
Communication omission failures: Consider the communication
primitives send and receive. A process p performs a send by
inserting the message m in its outgoing message buffer. The
communication channel transports m to q’s incoming message
buffer. Process q performs a receive by taking m from its incoming
message buffer and delivering it. The outgoing and incoming
message buffers are typically provided by the operating system.

Arbitrary failures
 The term arbitrary or Byzantine failure is used to describe the worst possible failure semantics, in which any type of error may occur. For example, a process may set wrong values in its data items, or it may return a wrong value in response to an invocation.
Timing failures
 Timing failures are applicable in synchronous distributed systems, where time limits are set on process execution time, message delivery time and clock drift rate.
 Any one of these failures may result in responses being unavailable to clients within a specified time interval.
Masking failures
 Each component in a distributed system is generally constructed from a collection of other components. It is possible to construct reliable services from components that exhibit failures.
 For example, multiple servers that hold replicas of data can continue to provide a service when one of them crashes. Knowledge of the failure characteristics of a component can enable a new service to be designed to mask the failure of the components on which it depends.
CLOCKS, EVENTS AND PROCESS STATES
 Each process executes on a single processor, and the processors do not share memory.
 Each process pi has a state si that, in general, it transforms as it executes. The process's state includes the values of all the variables within it.
 Its state may also include the values of any objects in its local operating system environment that it affects, such as files.
 We assume that processes cannot communicate with one another in any way except by sending messages through the network.
CLOCKS, EVENTS AND PROCESS STATES
 We define an event to be the occurrence of a single action that a process carries out as it executes – a communication action or a state-transforming action.
 The sequence of events within a single process pi can be placed in a single, total ordering, which we denote by the relation →i between the events.
 That is, e →i e′ if and only if the event e occurs before e′ at pi. This ordering is well defined, whether or not the process is multithreaded.
Software clock
 The operating system reads the node's hardware clock value, Hi(t), scales it and adds an offset so as to produce a software clock Ci(t) = αHi(t) + β that approximately measures real, physical time t for process pi.
 In other words, when the real time in an absolute frame of reference is t, Ci(t) is the reading on the software clock.
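A one-function sketch of this scaling in Python; the values chosen for the scale α and offset β are purely illustrative, not from the slides:

ALPHA = 0.999998   # hypothetical scale factor correcting the hardware tick rate
BETA = 42.5        # hypothetical offset in seconds

def software_clock(hardware_clock: float) -> float:
    # Compute C_i(t) = alpha * H_i(t) + beta from the hardware clock reading.
    return ALPHA * hardware_clock + BETA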
Synchronizing physical clocks
 In order to know at what time of day events occur at the processes in our distributed system – for example, for accountancy purposes – it is necessary to synchronize the processes' clocks, Ci, with an authoritative, external source of time. This is external synchronization.
 And if the clocks Ci are synchronized with one another to a known degree of accuracy, then we can measure the interval between two events occurring at different computers by appealing to their local clocks, even though they are not necessarily synchronized to an external source of time. This is internal synchronization.
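As a sketch of external synchronization, a process can ask a time server for its reading and compensate for half the round-trip delay (a simplification in the spirit of Cristian's method, which the slides do not cover; request_time_from_server is a hypothetical stub):

import time

def request_time_from_server() -> float:
    # Hypothetical stub: in reality this would be a network call to an
    # authoritative time server; here it just returns the local clock.
    return time.time()

def externally_synchronized_time() -> float:
    t0 = time.time()
    server_time = request_time_from_server()
    t1 = time.time()
    # Assume the server's reading was taken halfway through the round trip.
    return server_time + (t1 - t0) / 2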
Logical clocks
• Why do we need clocks?
– To determine when one thing happened before another
• Can we determine that without using a "clock" at all?
– Then we don't need to worry about synchronisation, millisecond errors, etc.
Happened before
• a⟶b : a happened before b
– If a and b are successive events in the same process, then a⟶b
– Send before receive:
• if a is the "send" event of message m
• and b is the "receive" event of message m,
• then a⟶b
– Transitive: a⟶b and b⟶c ⟹ a⟶c
Happened before is a partial ordering
Happened before & causal order
• Happened before == could have caused/influenced
• Preserves causal relations
• Implies a partial order
– Implies a time ordering between certain pairs of events
– Does not imply anything about the ordering between concurrent events
Logical clocks
• Idea: use a counter at each process
• Increment it after each event
• Can also increment when there are no events
– e.g., a clock
• An actual clock can be thought of as such an event counter
• It counts the states of the process
• Each event has an associated time: the count of the state when the event happened
Lamport clocks
• Keep a logical clock (counter)
• Send it with every message
• On receiving a message, set own clock to max({own counter, message counter}) + 1
• For any event e, write c(e) for the logical time
• Property:
– If a⟶b, then c(a) < c(b)
– If a || b, then no guarantees
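A minimal Python sketch of these rules (the class and method names are illustrative, not from the slides; here sending is itself treated as a local event, a common convention):

class LamportClock:
    # Logical clock: a per-process counter maintained by Lamport's rules.

    def __init__(self):
        self.time = 0

    def local_event(self) -> int:
        # Increment the counter on every local event.
        self.time += 1
        return self.time

    def send(self) -> int:
        # Sending is an event; the current time is piggybacked on the message.
        return self.local_event()

    def receive(self, message_time: int) -> int:
        # On receipt, set the clock to max(own, message) + 1.
        self.time = max(self.time, message_time) + 1
        return self.time

# Usage: p sends to q; the receive is ordered after the send.
p, q = LamportClock(), LamportClock()
t_send = p.send()           # c(a) = 1
t_recv = q.receive(t_send)  # c(b) = 2, so c(a) < c(b), matching a ⟶ b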
Concurrency and Lamport clocks
• If e1⟶e2:
– then no Lamport clock C exists with C(e1) == C(e2)
• If e1 || e2, then there exists a Lamport clock C such that C(e1) == C(e2)
The Purpose of Lamport Clocks
• If a⟶b, then c(a) < c(b)
• If we order all events by their Lamport clock times:
– We get a partial order, since some events have the same time
– The partial order satisfies "causal relations"
• Suppose there are events in different machines
– Transactions, money in/out, file read, write, copy
• An ordering of events that guarantees preserving causality
Total order from Lamport clocks
• If event e occurs in process j at time C(e):
– Give it the timestamp (C(e), j)
– Order events by (C, process id)
– For events e1 in process i, e2 in process j:
• if C(e1) < C(e2), then e1 < e2
• else if C(e1) == C(e2) and i < j, then e1 < e2
• Leslie Lamport. Time, Clocks, and the Ordering of Events in a Distributed System.
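A short Python illustration of this tie-breaking rule (the event data is invented for the example):

# Events as (lamport_time, process_id, label).
events = [(2, 1, "b"), (1, 0, "a"), (2, 0, "c"), (3, 2, "d")]

# Python tuples compare lexicographically, so sorting orders first by
# Lamport time C(e) and breaks ties by process id -- a total order.
for c, pid, label in sorted(events):
    print(f"event {label}: C={c}, process={pid}")
# Prints a, then c (C=2, process 0), then b (C=2, process 1), then d.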
Vector Clocks
• We want a clock such that:
– if a⟶b, then c(a) < c(b)
– AND
– if c(a) < c(b), then a⟶b
– Ref: Coulouris et al., V. Garg
Vector Clocks
• Each process i maintains a vector Vi
• Vi has n elements
– It keeps a clock Vi[j] for every other process j
– On every local event: Vi[i] = Vi[i] + 1
– On sending a message, i sends the entire Vi
– On receiving a message at process j:
• take the max element by element:
• Vj[k] = max(Vj[k], Vi[k]), for k = 1, 2, ..., n
• and add 1 to Vj[j]
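A minimal Python sketch of these rules (names are illustrative; whether the send itself ticks the clock varies by formulation, and here it is treated as a local event):

class VectorClock:
    # Vector clock for process `pid` in a system of n processes.

    def __init__(self, pid: int, n: int):
        self.pid = pid
        self.v = [0] * n

    def local_event(self):
        self.v[self.pid] += 1       # tick own entry on every local event

    def send(self) -> list:
        self.local_event()          # sending counts as an event here
        return list(self.v)         # the whole vector travels with the message

    def receive(self, other: list):
        # Element-wise max merges the sender's knowledge, then tick own entry.
        self.v = [max(a, b) for a, b in zip(self.v, other)]
        self.v[self.pid] += 1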
Comparing Timestamps
• V = V′ iff V[i] == V′[i] for i = 1, 2, ..., n
• V ≤ V′ iff V[i] ≤ V′[i] for i = 1, 2, ..., n
• V < V′ iff V ≤ V′ and V ≠ V′
• For events a, b and vector clock V:
– a⟶b iff V(a) < V(b)
• Is this a total order? (No – some pairs of timestamps are incomparable.)
• Two events are concurrent if:
– neither V(a) < V(b) nor V(b) < V(a)
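The same comparisons as a small Python sketch (function names are illustrative):

def leq(v1, v2):
    # V <= V' iff every component is <=.
    return all(a <= b for a, b in zip(v1, v2))

def less(v1, v2):
    # V < V' iff V <= V' and V != V' (a partial, not total, order).
    return leq(v1, v2) and v1 != v2

def concurrent(v1, v2):
    # a || b iff neither timestamp dominates the other.
    return not less(v1, v2) and not less(v2, v1)

print(less([1, 0, 2], [2, 0, 2]))        # True: causally ordered
print(concurrent([1, 0, 0], [0, 1, 0]))  # True: incomparable, hence concurrent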
Vector Clocks
• What are the drawbacks?
– The entire vector is sent with every message
– All n vector elements have to be checked on every message
• What is the communication complexity?
– Ω(n) per message
– And it increases with time, as the counter values grow without bound
Logical Clocks
• There is no way to have perfect knowledge of the ordering of events
– A "true" ordering may not exist
– Logical and vector clocks give us a way to have an ordering consistent with causality
Distributed Snapshots
• Take a "snapshot" of a system
• E.g., for backup: if the system fails, it can start up from a meaningful state
• Problem:
– Imagine a sky filled with birds. The sky is too large to cover in a single picture.
– We want to take multiple pictures that are consistent in a suitable sense
• e.g., we can correctly count the number of birds from the snapshot
Distributed Snapshots
• Global state:
– the state of all processes and communication channels
• Consistent cuts:
– a set of states of all processes is a consistent cut if:
– for any states s, t in the cut, s || t
• If a⟶b, then the following is not allowed:
– b is before the cut, a is after the cut
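One way to make the pairwise-concurrency condition concrete: given a vector timestamp for each candidate state, check every pair (a self-contained Python sketch of the slide's condition; the data is invented for the example):

from itertools import combinations

def less(v1, v2):
    return all(a <= b for a, b in zip(v1, v2)) and v1 != v2

def concurrent(v1, v2):
    return not less(v1, v2) and not less(v2, v1)

def is_consistent_cut(states):
    # A cut (one vector-timestamped state per process) is consistent,
    # per the slide's condition, iff every pair of states is concurrent.
    return all(concurrent(s, t) for s, t in combinations(states, 2))

# Two processes: states [2, 0] and [0, 1] are concurrent -> consistent cut.
print(is_consistent_cut([[2, 0], [0, 1]]))  # True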
Distributed Snapshot Algorithm
• Ask each process to record its state
• The set of states must be a consistent cut
• Assumptions:
– Communication channels are FIFO
– Processes communicate only with neighbors
– (We assume for now that everyone is a neighbor of everyone)
– Processes do not fail
Global Snapshot: Chandy and Lamport Algorithm
• One process initiates the snapshot and sends a marker
• The marker is the boundary between "before" and "after" the snapshot
Global snapshot: Chandy and Lamport algorithm
• Marker send rule (process i):
– Process i records its state
– On every outgoing channel where a marker has not been sent:
• i sends a marker on the channel
• before sending any other message
• Marker receive rule (process i receives a marker on channel C):
– If i has not received a marker before:
• record the state of i
• record the state of C as empty
• follow the marker send rule
– Else:
• record the state of C as the set of messages received on C since recording i's state and before receiving the marker on C
• The algorithm stops when all processes have received a marker on all incoming channels
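A compact Python sketch of the two rules (class and method names are illustrative; the transport layer is abstracted as a send(channel, message) callback supplied by the caller):

class SnapshotProcess:
    # Chandy-Lamport marker rules for one process, as a sketch.

    def __init__(self, pid, in_channels, out_channels, state):
        self.pid = pid
        self.state = state
        self.out_channels = out_channels
        self.recorded_state = None                         # own state, once recorded
        self.channel_log = {c: [] for c in in_channels}    # messages recorded per channel
        self.marker_seen = {c: False for c in in_channels}

    def record_and_send_markers(self, send):
        # Marker send rule: record own state, then put a marker on every
        # outgoing channel before any other message.
        self.recorded_state = self.state
        for c in self.out_channels:
            send(c, "MARKER")

    def on_message(self, channel, msg, send):
        # Marker receive rule.
        if msg == "MARKER":
            if self.recorded_state is None:
                # First marker: record own state; this channel's state is empty.
                self.record_and_send_markers(send)
            self.marker_seen[channel] = True    # stop recording this channel
        elif self.recorded_state is not None and not self.marker_seen[channel]:
            # In-transit message: part of the channel's recorded state.
            self.channel_log[channel].append(msg)

    def done(self):
        # Snapshot complete when markers arrived on all incoming channels.
        return all(self.marker_seen.values())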


Complexity
• Message?
• Time?
Where snapshots are not useful: non-stable predicates
• E.g.:
– Was the antenna accessed for two transmissions at the same time?
– Non-stable predicates may have happened, but then the system state changes...
Non-stable predicates
• Possibly B:
– B could have happened
• Definitely B:
– B definitely happened
• How can we check for definitely B and possibly B?
Collecting global states
• Each process notes its state & vector timestamp
– and sends it to a server for recording
– Note: we do not need to save the state every time it changes: only when it affects the predicates to be checked
• Assuming we know what predicates will be checked
• The server looks at these and tries to figure out if predicate B was possibly or definitely true
[Figure: the circles are 'states' and the bars are 'events'; we are concerned with which pairs of states form consistent cuts.]
Mutual exclusion: Concurrent access by processes to a shared resource or data is executed in a mutually exclusive manner. In a distributed system, shared variables (semaphores) or a local kernel cannot be used to implement mutual exclusion. Message passing is the sole means for implementing distributed mutual exclusion.
Remote Procedure Call
• Remote Procedure Call (RPC) is a protocol that allows programs to call procedures located on other machines.
• RPC uses the client/server model. The requesting program is the client and the service-providing program is the server.
• The client stub acts as a proxy for the remote procedure.
• The server stub acts as a correspondent to the client stub.
Client and Server Stubs
[Figure: principle of RPC between a client and server program.]
Steps of a Remote Procedure Call
1. Client procedure calls the client stub in the normal way
2. Client stub builds a message, calls the local OS
3. Client's OS sends the message to the remote OS
4. Remote OS gives the message to the server stub
5. Server stub unpacks the parameters, calls the server
6. Server does the work, returns the result to the stub
7. Server stub packs the result in a message, calls its local OS
8. Server's OS sends the message to the client's OS
9. Client's OS gives the message to the client stub
10. Client stub unpacks the result, returns it to the client
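To make the steps concrete, here is a minimal RPC sketch using Python's standard xmlrpc library (the procedure name add and port 8000 are illustrative; the stubs and message packing of steps 2-9 are handled by the library):

# server.py -- steps 4-8 happen inside SimpleXMLRPCServer
from xmlrpc.server import SimpleXMLRPCServer

def add(a, b):
    return a + b                    # step 6: the server does the work

server = SimpleXMLRPCServer(("localhost", 8000))
server.register_function(add)       # expose the procedure to remote callers
server.serve_forever()

# client.py -- the proxy is the client stub (steps 1-3 and 9-10)
import xmlrpc.client

proxy = xmlrpc.client.ServerProxy("http://localhost:8000/")
print(proxy.add(2, 3))              # looks like a local call; prints 5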
Passing Value Parameters
[Figure 2-8: steps involved in doing remote computation through RPC.]
RPC in Practice – DCE RPC
• The Distributed Computing Environment (DCE) RPC was developed by the Open Software Foundation (OSF)/Open Group.
• DCE is middleware executing as an abstraction layer between (network) operating systems and distributed applications.
• Microsoft derived its version of RPC from DCE RPC (e.g., the MIDL compiler, etc.)
• DCE includes a number of services:
– Distributed file service
– Directory service
– Security service
– Distributed time service
DCE RPC
• The main goal of DCE RPC is to make it possible for a client to access a remote service by simply calling a local procedure.
• Developing an RPC application:
– Writing a client and a server: let the developer concentrate on only the client- and server-specific code; let the RPC system (generators and libraries) do the rest. An Interface Definition Language (IDL) is used to specify the variables (data) and functions (methods).
– Binding a client to a server
– Performing an RPC
Writing a Client and a Server
[Figure 2-14: the steps in writing a client and a server in DCE RPC.]

Client-to-Server Binding (DCE RPC)
• Issues: the client must locate the server machine and then locate the server on it.
• Execution of client and server:
– The server registers its procedures with the portmapper.
– The client contacts the portmapper to determine which port the server is listening on.
– The client communicates with the server on the assigned port.
• DCE uses a separate daemon for each server machine.
Binding a Client to a Server
[Figure 2-15: client-to-server binding in DCE.]
