

Lecture Notes
Dr. Tong Lai Yu, March 2010

0. Review and Overview
1. B-Trees
2. An Introduction to Distributed Systems
3. Deadlocks
4. Distributed Systems Architecture
5. Processes
6. Communication
7. Distributed OS Theories
8. Distributed Mutual Exclusions
9. Agreement Protocols
10. Distributed Scheduling
11. Distributed Resource Management
12. Recovery and Fault Tolerance
13. Security and Protection
 
Communication
1. Layered Protocols
Low-level layers
Transport layer
Application layer
Middleware layer

Basic networking model

ISO OSI model

Drawbacks:
Focus on message-passing only
Often unneeded or unwanted functionality

Low-level layers

Recap
Physical layer: contains the specification and implementation of bits, and their transmission between sender and receiver.
Data link layer: prescribes the transmission of a series of bits into a frame to allow for error and flow control.
Network layer: describes how packets in a network of computers are to be routed.

Observation

For many distributed systems, the lowest-level interface is that of the network layer.

Transport Layer

Important

The transport layer provides the actual communication facilities for most distributed systems.

Standard Internet Protocols:

TCP: connection-oriented, reliable, stream-oriented communication
UDP: unreliable (best-effort) datagram communication

Note

IP multicasting is often considered a standard available service (which may be dangerous to assume).

Middleware Layer

Observation

Middleware is invented to provide common services and protocols that can be used by many different applications:

A rich set of communication protocols
(Un)marshaling of data, necessary for integrated systems
Naming protocols, to allow easy sharing of resources
Security protocols for secure communication
Scaling mechanisms, such as for replication and caching

Note

What remains are truly application-specific protocols... such as?

2. Types of communication
We can view the middleware as an additional service in client-server computing:

(Consider, for example an email system.)

Traditional Client-Server

Client-server with Middleware

Distinguish:

Transient versus persistent communication
Asynchronous versus synchronous communication

Transient versus persistent:

Transient communication: The communication server discards a message when it cannot be delivered at the next server, or at the receiver.
Persistent communication: A message is stored at a communication server as long as it takes to deliver it.

Asynchronous versus synchronous:

Asynchronous communication: A sender continues immediately after it has submitted the message for transmission.
Synchronous communication: The sender is blocked until its request is known to be accepted. Synchronization can take place at three points (see the figure above):

At request submission
At request delivery
After request processing

Client/Server

Some observations

Client/Server computing is generally based on a model of transient synchronous communication:

Client and server have to be active at the time of communication
Client issues a request and blocks until it receives a reply
Server essentially waits only for incoming requests, and subsequently processes them

Drawbacks of synchronous communication

Client cannot do any other work while waiting for the reply
Failures have to be handled immediately: the client is waiting
The model may simply not be appropriate (mail, news)

Messaging

Message-oriented middleware (MOM)

Aims at high-level persistent asynchronous communication:

Processes send each other messages, which are queued
Sender need not wait for an immediate reply, but can do other things
Middleware often ensures fault tolerance

3. Remote Procedure Call (RPC)


Basic RPC operation

Observations

Application developers are familiar with the simple procedure model
Well-engineered procedures operate in isolation (black box)
There is no fundamental reason not to execute procedures on a separate machine

Conclusion

Communication between caller and callee can be hidden by using a procedure-call mechanism.

1 Client procedure calls client stub.
2 Stub builds message; calls local OS.
3 OS sends message to remote OS.
4 Remote OS gives message to stub.
5 Stub unpacks parameters and calls server.
6 Server returns result to stub.
7 Stub builds message; calls OS.
8 OS sends message to client's OS.
9 Client's OS gives message to stub.
10 Client stub unpacks result and returns to the client.
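The ten steps can be traced in a small in-process sketch. Everything here is an illustrative assumption rather than part of the notes: the `add` procedure, the JSON message format, and the `transport` function, which simply stands in for both operating systems and the network.

```python
import json

def add(a, b):
    """The procedure that lives on the (simulated) server."""
    return a + b

PROCEDURES = {"add": add}

def server_stub(request_bytes):
    # Step 5: unpack the parameters and call the server procedure.
    request = json.loads(request_bytes.decode("utf-8"))
    result = PROCEDURES[request["proc"]](*request["args"])
    # Steps 6-7: the server returns the result; the stub builds the reply.
    return json.dumps({"result": result}).encode("utf-8")

def transport(request_bytes):
    # Stand-in for steps 3-4 and 8-9 (OS-to-OS message transfer).
    return server_stub(request_bytes)

def client_stub_add(a, b):
    # Step 2: build the request message and hand it to the "OS".
    message = json.dumps({"proc": "add", "args": [a, b]}).encode("utf-8")
    reply = transport(message)
    # Step 10: unpack the result and return it to the caller.
    return json.loads(reply.decode("utf-8"))["result"]

# Step 1: the client calls the stub as if it were a local procedure.
total = client_stub_add(2, 3)
```

The caller only ever sees `client_stub_add`, which is the point of the conclusion above: the message exchange is hidden behind an ordinary procedure call.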

RPC: Parameter passing

Parameter marshaling

There's more than just wrapping parameters into a message:

Client and server machines may have different data representations (think of byte ordering)
Wrapping a parameter means transforming a value into a sequence of bytes
Client and server have to agree on the same encoding:
How are basic data values represented (integers, floats, characters)?
How are complex data values represented (arrays, unions)?
Client and server need to properly interpret messages, transforming them into machine-dependent representations.
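As a minimal sketch of an agreed encoding, the following uses Python's standard `struct` module with network (big-endian) byte order. The message layout (a 32-bit integer, a 64-bit float, then a length-prefixed UTF-8 string) is an assumption chosen for illustration, not a format defined in these notes.

```python
import struct

HEADER = "!idH"  # "!" = network byte order; int32, float64, uint16 length

def marshal(number, ratio, text):
    """Turn (int, float, str) into a machine-independent byte sequence."""
    encoded = text.encode("utf-8")
    return struct.pack(HEADER, number, ratio, len(encoded)) + encoded

def unmarshal(message):
    """Rebuild the local representation from the agreed wire format."""
    header_size = struct.calcsize(HEADER)
    number, ratio, length = struct.unpack(HEADER, message[:header_size])
    text = message[header_size:header_size + length].decode("utf-8")
    return number, ratio, text

wire = marshal(1, 2.5, "hello")
```

Because both sides use the same explicit byte order and field sizes, the result does not depend on whether either machine is little-endian or big-endian.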

RPC parameter passing: some assumptions

Copy in/copy out semantics: while the procedure is executed, nothing can be assumed about parameter values.
All data that is to be operated on is passed by parameters. This excludes passing references to (global) data.
If a parameter must be passed by reference: copy the array data into the message (this still cannot handle arbitrary data structures).

Asynchronous RPCs

Essence

Try to get rid of the strict request-reply behavior, but let the client continue without waiting for an answer from the server.

(a) Traditional RPC

(b) Asynchronous RPC ( no returned result required )

Deferred Synchronous RPCs

Variation

Client can also do a (non)blocking poll at the server to see whether results are available.
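A deferred synchronous call can be imitated with a future: the client submits the request, keeps working, and polls for the result. This is only a sketch; a thread pool stands in for the remote server, and `remote_square` is a hypothetical procedure.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def remote_square(x):
    """Hypothetical remote procedure; the sleep imitates network and service time."""
    time.sleep(0.05)
    return x * x

def deferred_call(x):
    with ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(remote_square, x)  # request sent, client not blocked
        polls = 0
        while not future.done():                # nonblocking poll for the result
            polls += 1                          # the client could do useful work here
            time.sleep(0.001)
        return future.result(), polls

result, polls = deferred_call(7)
```

Replacing the polling loop with a direct `future.result()` call would give the blocking variant of the same poll.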

RPC in Practice
Client-to-server binding (DCE)

Issues: (1) The client must locate the server machine, and (2) locate the server process on that machine.

4. Message-Oriented Communication


Transient Messaging
Message-Queuing System
Message Brokers
Example: IBM WebSphere

Transient messaging: sockets

Berkeley socket interface
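The Berkeley socket primitives (socket, bind, listen, accept, connect, send, receive, close) map directly onto Python's standard `socket` module. The sketch below runs a one-shot echo server and a client in the same process over the loopback interface; the helper name `echo_once` and the use of a thread for the server are assumptions made to keep the example self-contained.

```python
import socket
import threading

def echo_once(payload):
    # Server side: socket, bind, listen (port 0 lets the OS pick a free port).
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))
    server.listen(1)
    port = server.getsockname()[1]

    def serve():
        conn, _ = server.accept()          # accept blocks until a client connects
        with conn:
            conn.sendall(conn.recv(1024))  # receive, then send the same bytes back

    worker = threading.Thread(target=serve)
    worker.start()

    # Client side: socket + connect, send, receive, close.
    with socket.create_connection(("127.0.0.1", port)) as client:
        client.sendall(payload)
        reply = client.recv(1024)

    worker.join()
    server.close()
    return reply

reply = echo_once(b"ping")
```

This is transient communication: if the server is not running when the client connects, the message is simply not delivered.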


Persistent messaging

Message-Queuing Model

Loosely coupled communication using queues.

Sender and receiver can execute completely independently of each other.

Essence

Asynchronous persistent communication through support of middleware-level queues. Queues correspond to buffers at communication servers.

Basic interface to a queue in a message-queuing system

PUT: Append a message to a specified queue
GET: Block until the specified queue is nonempty, and remove the first message
POLL: Check a specified queue for messages, and remove the first. Never block
NOTIFY: Install a handler to be called when a message is put into the specified queue
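The four primitives can be sketched as an in-memory class. This is an illustration of the interface only: the class name `MessageQueue` is hypothetical, and a real message-queuing system would also persist messages, which this sketch does not.

```python
import threading
from collections import deque

class MessageQueue:
    def __init__(self):
        self._items = deque()
        self._cond = threading.Condition()
        self._handlers = []

    def put(self, message):
        """PUT: append a message and wake up one blocked receiver."""
        with self._cond:
            self._items.append(message)
            handlers = list(self._handlers)
            self._cond.notify()
        for handler in handlers:          # NOTIFY handlers run outside the lock
            handler(message)

    def get(self):
        """GET: block until the queue is nonempty, then remove the first message."""
        with self._cond:
            while not self._items:
                self._cond.wait()
            return self._items.popleft()

    def poll(self):
        """POLL: remove the first message if there is one; never block."""
        with self._cond:
            return self._items.popleft() if self._items else None

    def notify(self, handler):
        """NOTIFY: install a handler called whenever a message is put."""
        with self._cond:
            self._handlers.append(handler)

queue = MessageQueue()
seen = []
queue.notify(seen.append)
queue.put("first")
queue.put("second")
```

Note that in this sketch a handler only observes the message; the message stays in the queue until a GET or POLL removes it.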

Message Broker

Observation

Message queuing systems assume a common messaging protocol: all applications agree on the message format (i.e., structure and data representation). In other words, the sender must produce its outgoing messages in the same format as that expected by the receiver.

Message broker

Centralized component that takes care of application heterogeneity in an MQ system:

Transforms incoming messages to the target format
Very often acts as an application gateway
May provide subject-based routing capabilities => Enterprise Application Integration (publish/subscribe)
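A toy broker illustrates both roles: subject-based routing and conversion to each receiver's format. The `Broker` class, the subject names, and the two target formats (JSON and a comma-separated line) are assumptions made for this sketch.

```python
import json
from collections import defaultdict

class Broker:
    def __init__(self):
        # subject -> list of (converter, deliver) pairs, one per subscriber
        self._subscribers = defaultdict(list)

    def subscribe(self, subject, converter, deliver):
        """Register a receiver together with the format it expects."""
        self._subscribers[subject].append((converter, deliver))

    def publish(self, subject, message):
        """Route by subject and transform the message for each receiver."""
        for converter, deliver in self._subscribers[subject]:
            deliver(converter(message))

orders_json, orders_csv = [], []
broker = Broker()
broker.subscribe("orders", lambda m: json.dumps(m, sort_keys=True), orders_json.append)
broker.subscribe("orders", lambda m: f"{m['id']},{m['qty']}", orders_csv.append)

broker.publish("orders", {"id": 7, "qty": 2})
broker.publish("billing", {"id": 8, "qty": 1})  # no subscribers: dropped
```

The sender publishes one message in its own format; neither receiver needs to know what that format was.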

IBM's WebSphere MQ

Basic concepts:

All queues are managed by queue managers
Application-specific messages are put into, and removed from, queues
Queues reside under the regime of a queue manager
Processes can put messages only in local queues, or through an RPC mechanism

Message transfer:

Messages are transferred between queues
Message transfer between queues at different processes requires a channel
At each endpoint of a channel is a message channel agent
Message channel agents (MCAs) are responsible for:
Setting up channels using lower-level network communication facilities (e.g., TCP/IP)
(Un)wrapping messages from/in transport-level packets
Sending/receiving packets

Channels are inherently unidirectional
MCAs are started automatically when messages arrive
Any network of queue managers can be created
Routes are set up manually (system administration)

Routing

By using logical names, in combination with name resolution to local queues, it is possible to put a message in a remote queue.

Entry in a routing table: (destQM, sendQ)

Local aliases for queue manager names are used to improve management flexibility.

5. Stream-oriented communication


Support for continuous media
Streams in distributed systems
Stream management

Continuous media

Observation

All communication facilities discussed so far are essentially based on a discrete, that is, time-independent exchange of information.

Continuous media

Characterized by the fact that values are time dependent:

Audio
Video
Animations
Sensor data (temperature, pressure, etc.)

Transmission modes

Different timing guarantees with respect to data transfer:

Asynchronous: no restrictions with respect to when data is to be delivered
Synchronous: define a maximum end-to-end delay for individual data packets
Isochronous: define a maximum end-to-end delay and a maximum delay variance (jitter is bounded)

Stream

Definition

A (continuous) data stream is a connection-oriented communication facility that supports isochronous data transmission.

Some common stream characteristics

Streams are unidirectional
There is generally a single source, and one or more sinks
Often, the sink and/or the source is a wrapper around hardware (e.g., camera, CD device, TV monitor)
Simple stream: a single flow of data, e.g., audio or video
Complex stream: multiple data flows, e.g., stereo audio or a combination of audio and video

Streams and QoS


Essence

Streams are all about timely delivery of data. How do you specify this Quality of Service (QoS)? Basics:

The required bit rate at which data should be transported.
The maximum delay until a session has been set up (i.e., when an application can start sending data).
The maximum end-to-end delay (i.e., how long it will take until a data unit makes it to a recipient).
The maximum delay variance, or jitter.
The maximum round-trip delay.

Enforcing QoS

Observation

There are various network-level tools, such as differentiated services, by which certain packets can be prioritized.

Also

Use buffers to reduce jitter.

Problem

How to reduce the effects of packet loss (when multiple samples are in a single packet)?

The effect of packet loss in (a) noninterleaved transmission and (b) interleaved transmission
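The difference between the two transmission schemes can be made concrete with a short sketch. The frame numbers, the packet size of four, and the helper names are assumptions for illustration; the code also assumes the number of frames is a multiple of the packet size.

```python
def packetize(frames, per_packet, interleaved):
    """Group frames into packets, either consecutively or interleaved."""
    n_packets = len(frames) // per_packet
    if interleaved:
        # Packet i carries frames i, i + n_packets, i + 2 * n_packets, ...
        return [frames[i::n_packets] for i in range(n_packets)]
    return [frames[i * per_packet:(i + 1) * per_packet] for i in range(n_packets)]

def longest_gap(frames, packets, lost_index):
    """Length of the longest run of consecutive frames missing at playout."""
    lost = set(packets[lost_index])
    longest = run = 0
    for frame in frames:
        run = run + 1 if frame in lost else 0
        longest = max(longest, run)
    return longest

frames = list(range(1, 17))  # 16 frames, 4 per packet
plain = packetize(frames, 4, interleaved=False)
mixed = packetize(frames, 4, interleaved=True)

gap_plain = longest_gap(frames, plain, lost_index=2)  # frames 9-12 lost together
gap_mixed = longest_gap(frames, mixed, lost_index=2)  # frames 3, 7, 11, 15 lost
```

Losing one packet removes four frames in both cases, but interleaving spreads the loss into isolated single-frame gaps, which are easier to conceal. The price is a larger receive buffer and a higher start-up delay, since the receiver must wait for several packets before it can play out frames in order.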

Stream synchronization

Problem

Given a complex stream, how do you keep the different substreams in sync?

Example

Think of playing out two channels that together form stereo sound. The difference should be less than 20-30 μsec!

Alternative

Multiplex all substreams into a single stream, and demultiplex at the receiver. Synchronization is handled at the multiplexing/demultiplexing point (MPEG).

Time-division multiplexing

6. Multicast communication


See video Multicast Fundamentals

Multicast communication

Application-level multicasting
Gossip-based data dissemination

Application-level multicasting ( ALM )

Essence

Organize nodes of a distributed system into an overlay network and use that network to disseminate data.

Two approaches in organizing the network:

a tree: a unique path between every pair of nodes
a mesh: multiple paths between every pair of nodes (more robust)

Chord-based tree building

1 Initiator generates a multicast identifier (mid).
2 Look up succ(mid), the node responsible for mid (see also previous chapter), and promote that node to become the root of the tree.
3 Request is routed to succ(mid), which has become the root.
4 If P wants to join, it sends a join request to the root.
5 When the request arrives at Q:
If Q has not seen a join request before, it becomes a forwarder for the group; P becomes a child of Q. The join request continues to be forwarded.
If Q already knows about the tree, P becomes a child of Q. There is no need to forward the join request anymore.

ALM: Some Costs


Link stress: How often does an ALM message cross the same physical link? Example: a message from A to D needs to cross (Ra,Rb) twice.
Stretch: Ratio in delay between the ALM-level path and the network-level path. Example: messages from B to C

B --> Rb --> Ra --> Rc --> C (total cost = 59)
B --> Rb --> Rd --> Rc --> C (total cost = 47)

=> stretch = 59/47 ≈ 1.255.

Epidemic Algorithms

General background
Update models
Removing objects

Basic idea

Assume there are no write conflicts:

Update operations are performed at a single server
A replica passes updated state to only a few neighbors
Update propagation is lazy, i.e., not immediate
Eventually, each update should reach every replica

Two forms of epidemics

Anti-entropy: Each replica regularly chooses another replica at random, and exchanges state differences, leading to identical states at both afterwards
Gossiping: A replica which has just been updated (i.e., has been contaminated) tells a number of other replicas about its update (contaminating them as well).

Anti-entropy

Principal operations:

A node P selects another node Q from the system at random.
Push: P only sends its updates to Q
Pull: P only retrieves updates from Q
Push-Pull: P and Q exchange mutual updates (after which they hold the same information).

Observation

For push-pull it takes O(log(N)) rounds to disseminate updates to all N nodes (round = when every node has taken the initiative to start an exchange).
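A small simulation gives a feel for the O(log(N)) behavior of push-pull. It is a sketch under simplifying assumptions that are not part of the notes: a single update, one initially updated node, and nodes taking the initiative one after another within a round.

```python
import random

def push_pull_rounds(n, seed=0):
    """Count rounds until a single update has reached all n nodes (n >= 2)."""
    rng = random.Random(seed)
    updated = [False] * n
    updated[0] = True                # node 0 holds the new update
    rounds = 0
    while not all(updated):
        rounds += 1
        for p in range(n):           # every node takes the initiative once
            q = rng.randrange(n - 1)
            if q >= p:
                q += 1               # pick a random partner q != p
            if updated[p] or updated[q]:
                # Push-pull: afterwards both hold the same information.
                updated[p] = updated[q] = True
    return rounds

rounds_needed = push_pull_rounds(1000)
```

With 1000 nodes the number of rounds stays small compared with N, which is consistent with the logarithmic bound stated above.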

Gossiping

Basic model

A server S having an update to report contacts other servers. If a server is contacted to which the update has already propagated, S stops contacting other servers with probability 1/k.

Observation

If s is the fraction of ignorant servers (i.e., which are unaware of the update), it can be shown that with many servers

s = e^{-(k+1)(1-s)}

Note

If we really have to ensure that all servers are eventually updated, gossiping alone is not enough.
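The equation for s has no closed-form solution, but it can be solved numerically. The sketch below uses simple fixed-point iteration starting from s = 0, which converges to the root below 1 (s = 1 also satisfies the equation, but corresponds to the update never spreading). The function name and iteration count are assumptions.

```python
import math

def ignorant_fraction(k, iterations=200):
    """Solve s = exp(-(k + 1) * (1 - s)) by fixed-point iteration."""
    s = 0.0
    for _ in range(iterations):
        s = math.exp(-(k + 1) * (1 - s))
    return s

s_k1 = ignorant_fraction(1)  # roughly 0.2: about one server in five is missed
s_k4 = ignorant_fraction(4)  # below 0.01, but still not zero
```

Even for larger k the fraction stays strictly positive, which is why gossiping alone cannot guarantee that every server is updated.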

Deleting Values

Fundamental problem

We cannot simply remove an old value from a server and expect the removal to propagate. Instead, a mere removal will be undone in due time by the epidemic algorithm, because other replicas still hold the old value and will spread it back.

Solution

Removal has to be registered as a special update by inserting a death certificate.
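The sketch below shows why a plain removal is undone and how a death certificate prevents that. The store layout (key mapped to a timestamp and a value), the "newest timestamp wins" merge rule, and the `TOMBSTONE` marker are assumptions made for illustration.

```python
TOMBSTONE = object()  # marker value that plays the role of a death certificate

def merge(local, remote):
    """Anti-entropy style merge: for each key, keep the newest (timestamp, value)."""
    merged = dict(local)
    for key, (stamp, value) in remote.items():
        if key not in merged or stamp > merged[key][0]:
            merged[key] = (stamp, value)
    return merged

def delete(store, key, stamp):
    """Record the removal as a special, timestamped update."""
    store[key] = (stamp, TOMBSTONE)

def visible(store):
    """Entries that clients can still see (death certificates are hidden)."""
    return {k: v for k, (_, v) in store.items() if v is not TOMBSTONE}

replica_b = {"x": (1, "old")}

# Naive removal: the entry is just dropped, so merging with B resurrects it.
naive_a = {}
resurrected = visible(merge(naive_a, replica_b))

# Removal with a death certificate: the newer certificate wins the merge.
replica_a = {"x": (1, "old")}
delete(replica_a, "x", stamp=2)
after_merge = visible(merge(replica_b, replica_a))
```

In practice death certificates must themselves be cleaned up eventually, which this sketch does not attempt.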

7. Naming

a. Naming Entities

Names are used to denote entities in a distributed system.

To operate on an entity, we need to access it at an access point.
Access points are entities that are named by means of an address.
A location-independent name for an entity E is independent of the addresses of the access points offered by E.

b. Identifier

A name with the following properties:

Each identifier refers to at most one entity
Each entity is referred to by at most one identifier
An identifier always refers to the same entity (prohibits reusing an identifier)

c. Flat Naming

Given an essentially unstructured name (e.g., an identifier), how can we locate its associated access point?

Simple solutions:

Broadcasting: cannot scale beyond a LAN
Forwarding pointers: When an entity moves, it leaves behind a pointer to its next location

Home-based approaches
Use a home location to keep track of the current location of an entity.

Distributed Hash Tables (DHT) (e.g., the Chord system)

Organize many nodes into a logical ring:

Each node is assigned a random m-bit identifier.
Every entity is assigned a unique m-bit key.
The entity with key k is associated with the node with the smallest id ≥ k (called its successor, denoted by succ(k)).

Nonsolution: Let node p keep track of succ(p + 1) as well as its predecessor pred(p), and do a linear search along the ring.

Use finger tables:

Each node p maintains a finger table FT_p[] with at most m entries (using mod 2^m arithmetic):
FT_p[i] = succ(p + 2^{i-1}),   1 ≤ i ≤ m
Note: FT_p[i] points to the first node succeeding p by at least 2^{i-1}.
To look up a key k, node p forwards the request to the node q = FT_p[j], where the index j satisfies
FT_p[j] ≤ k < FT_p[j+1]
(The lookup stops when the request reaches the node q with k ≤ q that is actually responsible for k.)
If p < k < FT_p[1], the request is also forwarded to FT_p[1].
Example: consider resolving k = 26 from node 1.
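The finger-table rules can be sketched in a few lines. The ring figure is not reproduced in these notes, so the node identifiers below (a 5-bit ring with nodes 1, 4, 9, 11, 14, 18, 20, 21, 28) are an assumption chosen for illustration; the class `ChordRing` and the helper `between` are likewise hypothetical names. All comparisons are done modulo 2^m.

```python
from bisect import bisect_left

def between(x, a, b, size):
    """True if x lies in the ring interval (a, b], modulo size."""
    if a == b:
        return True  # single-node ring: the interval covers the whole ring
    return 0 < (x - a) % size <= (b - a) % size

class ChordRing:
    def __init__(self, m, node_ids):
        self.size = 2 ** m
        self.nodes = sorted(node_ids)
        # FT_p[i] = succ(p + 2^(i-1)) for i = 1..m (stored 0-based)
        self.finger = {
            p: [self.succ(p + 2 ** (i - 1)) for i in range(1, m + 1)]
            for p in self.nodes
        }
        self.pred = {p: self.nodes[i - 1] for i, p in enumerate(self.nodes)}

    def succ(self, k):
        """First node with id >= k, wrapping around the ring."""
        k %= self.size
        return self.nodes[bisect_left(self.nodes, k) % len(self.nodes)]

    def lookup(self, start, key):
        """Return the list of nodes visited while resolving key from start."""
        path, p = [start], start
        while not between(key, self.pred[p], p, self.size):
            table = self.finger[p]
            if between(key, p, table[0], self.size):
                nxt = table[0]  # the immediate successor is responsible
            else:
                # farthest finger that does not overshoot the key
                candidates = (q for q in reversed(table)
                              if between(q, p, key, self.size))
                nxt = next(candidates, table[0])
            path.append(nxt)
            p = nxt
        return path

ring = ChordRing(5, [1, 4, 9, 11, 14, 18, 20, 21, 28])
path = ring.lookup(1, 26)
```

On this assumed ring, resolving k = 26 from node 1 visits nodes 1, 18, 20, 21 and ends at node 28, which is succ(26). Each hop roughly halves the remaining distance on the ring, which is what gives Chord its logarithmic lookup cost.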

Improvements (with modifications):


topology-based assignment of node identifiers
proximity routing
proximity neighbour selection

d. Hierarchical Location Service (HLS)

Build a large-scale search tree for which the underlying network is divided into hierarchical domains. Each domain is represented by a separate directory node.

The root has a location record for every entity, but each record only points to the next-level directory node on the path toward that entity.

DNS vs. Chord

DNS | Chord
provides a host name to IP address mapping | can provide the same service: name = key, value = IP address
relies on a set of special root servers | requires no special servers
names reflect administrative boundaries | has no naming structure
is specialized to finding named hosts or services | can also be used to find data objects that are not tied to certain machines
