Layered Protocols: Communication

Lecture Notes
Dr. Tong Lai Yu, March 2010
0. Review and Overview
1. B-Trees
2. An Introduction to Distributed Systems
3. Deadlocks
4. Distributed Systems Architecture
5. Processes
6. Communication
7. Distributed OS Theories
8. Distributed Mutual Exclusions
9. Agreement Protocols
10. Distributed Scheduling
11. Distributed Resource Management
12. Recovery and Fault Tolerance
13. Security and Protection

Communication
1. Layered Protocols
Low-level layers
Transport layer
Application layer
Middleware layer
Basic networking model
ISO OSI model
Drawbacks:
Focus on message-passing only
Often unneeded or unwanted functionality
Low-level layers
Recap
Physical layer: contains the specification and implementation of
bits, and their transmission between sender and receiver

Data link layer: prescribes the transmission of a series of bits into
a frame to allow for error and flow control

Network layer: describes how packets in a network of computers
are to be routed.
Observation
For many distributed systems, the lowest-level interface
is that of the network layer.
Transport Layer
Important
The transport layer provides the actual communication facilities for
most distributed systems.
Standard Internet Protocols:
TCP: connection-oriented, reliable, stream-oriented
communication
UDP: unreliable (best-effort) datagram communication
Note
IP multicasting is often considered a standard available service (which
may be dangerous to assume).
Middleware Layer
Observation
Middleware is invented to provide common services and protocols
that can be used by many different applications
A rich set of communication protocols

(Un)marshaling of data, necessary for integrated systems
Naming protocols, to allow easy sharing of resources
Security protocols for secure communication
Scaling mechanisms, such as for replication and caching
Note
What remains are truly application-specific protocols...
such as?
9. Types of communication
We can view the middleware as an additional service in
client server computing:
(Consider, for example an email system.)
Traditional Client-Server
Client-server with Middleware
Distinguish:
Transient versus persistent communication

Asynchronous versus synchronous communication
Transient versus persistent:
Transient communication: A communication server discards a message when
it cannot be delivered to the next server or to the receiver.
Persistent communication: A message is stored at a communication
server as long as it takes to deliver it.
Asynchronous versus synchronous:
Asynchronous communication: A sender continues immediately after
it has submitted the message for transmission.

Synchronous communication: The sender is blocked until its request
is known to be accepted. Synchronization can take place at three points
(see figure above):

At request submission
At request delivery
After request processing
Client/Server
Some observations
Client/Server computing is generally based on a model of transient
synchronous communication:
Client and server have to be active at the time of communication

Client issues request and blocks until it receives reply
Server essentially waits only for incoming requests, and
subsequently processes them
Drawbacks of synchronous communication
Client cannot do any other work while waiting for reply

Failures have to be handled immediately: the client is waiting
The model may simply not be appropriate (mail, news)
Messaging
Message-oriented middleware ( MOM )

Aims at high-level persistent asynchronous communication:
Processes send each other messages, which are queued

Sender need not wait for immediate reply, but can do other things
Middleware often ensures fault tolerance
26. Remote Procedure Call (RPC)

Basic RPC operation
Observations
Application developers are familiar with simple procedure model

Well-engineered procedures operate in isolation (black box)
There is no fundamental reason not to execute procedures on
separate machine
Conclusion
Communication between caller &
callee can be hidden by using
procedure-call mechanism.
1 Client procedure calls client stub.
2 Stub builds message; calls local OS.
3 OS sends message to remote OS.
4 Remote OS gives message to stub.
5 Stub unpacks parameters and calls server.
6 Server returns result to stub.
7 Stub builds message; calls OS.
8 OS sends message to client's OS.
9 Client's OS gives message to stub.
10 Client stub unpacks result and returns to the client.
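The ten steps can be mimicked inside a single process. The following is only a minimal sketch, not a real RPC runtime: the names `client_stub`, `server_stub`, `transport` and `PROCEDURES`, as well as the JSON wire format, are assumptions made for illustration, and the "network" is a plain function call standing in for the two operating systems.

```python
import json

def add(a, b):
    """The actual server procedure."""
    return a + b

PROCEDURES = {"add": add}  # hypothetical dispatch table on the server side

def server_stub(request_bytes):
    # Steps 5-7: unpack parameters, call the server procedure, pack the result.
    request = json.loads(request_bytes.decode("utf-8"))
    result = PROCEDURES[request["proc"]](*request["args"])
    return json.dumps({"result": result}).encode("utf-8")

def transport(request_bytes):
    # Steps 3-4 and 8-9: stands in for the OS-to-OS message exchange.
    return server_stub(request_bytes)

def client_stub(proc, *args):
    # Step 2: build the request message.
    request_bytes = json.dumps({"proc": proc, "args": list(args)}).encode("utf-8")
    reply_bytes = transport(request_bytes)
    # Step 10: unpack the result and return it to the caller.
    return json.loads(reply_bytes.decode("utf-8"))["result"]

# Step 1: the client calls the stub as if it were a local procedure.
total = client_stub("add", 2, 3)
```

The caller only ever sees `client_stub("add", 2, 3)`; message construction and transfer stay hidden behind the procedure-call interface.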
RPC: Parameter passing
Parameter marshaling
There's more than just wrapping parameters into a message:

Client and server machines may have different data
representations (think of byte ordering)

Wrapping a parameter means transforming a value into a
sequence of bytes
Client and server have to agree on the same encoding:
How are basic data values represented (integers, floats, characters)
How are complex data values represented (arrays, unions)
Client and server need to properly interpret messages,
transforming them into machine-dependent representations.
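As a small illustration of agreeing on an encoding, the sketch below uses Python's standard `struct` module with network (big-endian) byte order. The message layout (one 32-bit integer, one 64-bit float, a length-prefixed UTF-8 string) and the function names are assumptions chosen for this example, not a standard RPC format.

```python
import struct

# Hypothetical layout agreed on by client and server:
# '!' = network (big-endian) byte order, 'i' = 32-bit int, 'd' = 64-bit float,
# 'H' = 16-bit length of the UTF-8 encoded string that follows the header.
HEADER = "!idH"

def marshal(count, ratio, name):
    """Transform the parameter values into a sequence of bytes."""
    encoded = name.encode("utf-8")
    return struct.pack(HEADER, count, ratio, len(encoded)) + encoded

def unmarshal(message):
    """Turn the byte sequence back into machine-dependent values."""
    size = struct.calcsize(HEADER)
    count, ratio, length = struct.unpack(HEADER, message[:size])
    name = message[size:size + length].decode("utf-8")
    return count, ratio, name

message = marshal(258, 0.5, "chord")
```

Because the byte order is fixed by the format string, the integer 258 always travels as the bytes `00 00 01 02`, regardless of whether the sending machine is little-endian or big-endian.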
RPC parameter passing: some assumptions
Copy in/copy out semantics: while procedure is executed, nothing can
be assumed about parameter values.

All data that is to be operated on is passed by parameters. Excludes
passing references to (global) data.

If a parameter must be passed by reference: copy the array data into the message
(this still cannot handle arbitrary data structures)
Asynchronous RPCs
Essence
Try to get rid of the strict request-reply behavior, but let the client
continue without waiting for an answer from the server.
(a) Traditional RPC
(b) Asynchronous RPC ( no returned result required )
Deferred Synchronous RPCs
Variation
Client can also do a (non)blocking poll at the server to see whether
results are available.
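The difference between waiting for the reply and continuing immediately can be imitated locally with a thread pool. This is a sketch under assumptions: `remote_square` is a hypothetical procedure, the sleep stands in for network and server delay, and a `Future` from `concurrent.futures` plays the role of the deferred result.

```python
from concurrent.futures import ThreadPoolExecutor
import time

def remote_square(x):
    """Stands in for a remote procedure; the sleep mimics network and server delay."""
    time.sleep(0.05)
    return x * x

with ThreadPoolExecutor(max_workers=1) as pool:
    future = pool.submit(remote_square, 7)  # returns at once: the request is only accepted
    local_work = sum(range(1000))           # the client continues with other work
    ready_early = future.done()             # nonblocking poll: may still be False here
    result = future.result()                # blocking poll: wait until the result is there
```

Dropping the final `future.result()` call corresponds to the asynchronous RPC without a returned result; keeping it corresponds to the deferred synchronous variant.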
RPC in Practice
Client-to-server binding (DCE)
Issues: (1) the client must locate the server machine, and (2) it must locate the server on that machine.
36. Message-Oriented Communication

Transient Messaging
Message-Queuing System
Message Brokers
Example: IBM Websphere
Transient messaging: sockets
Berkeley socket interface

Persistent messaging
Message-Queuing Model
Loosely-coupled communications using Queues.
Sender and receiver can execute completely independently of each other.
Essence
Asynchronous persistent communication through support of
middleware-level queues. Queues correspond to buffers at
communication servers.
Basic interface to a queue in message-queuing system

PUT     Append a message to a specified queue
GET     Block until the specified queue is nonempty, and remove the first message
POLL    Check a specified queue for messages, and remove the first. Never block
NOTIFY  Install a handler to be called when a message is put into the specified queue
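The four primitives can be sketched on top of Python's standard `queue` module. The class `MessageQueue` and its method names are assumptions for illustration; the queue lives in memory only, so unlike a real message-queuing system it is not persistent.

```python
import queue

class MessageQueue:
    """Toy in-memory queue exposing PUT, GET, POLL and NOTIFY."""

    def __init__(self):
        self._q = queue.Queue()
        self._handlers = []

    def put(self, message):
        """PUT: append a message, then call every installed handler."""
        self._q.put(message)
        for handler in self._handlers:
            handler(message)

    def get(self):
        """GET: block until the queue is nonempty, then remove the first message."""
        return self._q.get()

    def poll(self):
        """POLL: remove the first message if there is one; never block."""
        try:
            return self._q.get_nowait()
        except queue.Empty:
            return None

    def notify(self, handler):
        """NOTIFY: install a handler to be called when a message is put."""
        self._handlers.append(handler)

mq = MessageQueue()
seen = []
mq.notify(seen.append)
mq.put("m1")
mq.put("m2")
first = mq.get()     # "m1"
second = mq.poll()   # "m2"
empty = mq.poll()    # None: the queue is empty, but the call does not block
```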
Message Broker
Observation
Message-queuing systems assume a common messaging protocol: all
applications agree on the message format (i.e., structure and data
representation). In other words, the sender must produce its outgoing messages
in the same format as that expected by the receiver.
Message broker
Centralized component that takes care of application heterogeneity in
an MQ system:
Transforms incoming messages to target format

Very often acts as an application gateway
May provide subject-based routing capabilities => Enterprise
Application Integration ( publish/subscribe )
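A broker that converts formats and routes by subject can be sketched in a few lines. Everything here is hypothetical: the `Broker` class, the subject name `orders`, and the two converter functions are made up for illustration and do not correspond to any particular product.

```python
from collections import defaultdict

class Broker:
    """Toy message broker: subject-based routing plus per-subscriber format conversion."""

    def __init__(self):
        self._subs = defaultdict(list)  # subject -> list of (convert, deliver) pairs

    def subscribe(self, subject, convert, deliver):
        self._subs[subject].append((convert, deliver))

    def publish(self, subject, message):
        # Transform the incoming message to each target format, then deliver it.
        for convert, deliver in self._subs[subject]:
            deliver(convert(message))

def to_csv(msg):
    return f"{msg['id']},{msg['amount']}"

def to_upper_keys(msg):
    return {key.upper(): value for key, value in msg.items()}

csv_inbox, dict_inbox = [], []
broker = Broker()
broker.subscribe("orders", to_csv, csv_inbox.append)
broker.subscribe("orders", to_upper_keys, dict_inbox.append)
broker.publish("orders", {"id": 7, "amount": 12.5})
broker.publish("invoices", {"id": 8, "amount": 1.0})  # no subscribers: nothing delivered
```

The publisher never needs to know which formats its receivers expect; that knowledge is concentrated in the broker.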
IBM's WebSphere MQ
Basic concepts:
All queues are managed by queue managers

Application-specific messages are put into, and removed from
queues
Queues reside under the regime of a queue manager
Processes can put messages only in local queues, or through an
RPC mechanism
Message transfer:
Messages are transferred between queues

Message transfer between queues at different processes requires
a channel
At each endpoint of a channel is a message channel agent
Message channel agents (MCAs) are responsible for:
Setting up channels using lower-level network communication
facilities (e.g., TCP/IP)

(Un)wrapping messages from/in transport-level packets
Sending/receiving packets
Channels are inherently unidirectional

Automatically start MCAs when messages arrive
Any network of queue managers can be created
Routes are set up manually (system administration)
Routing
By using logical names, in combination with name resolution to local queues,
it is possible to put a message in a remote queue
Entry in a routing table: (destQM, sendQ)
Local alias for queue manager names is used to improve management flexibility.
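The lookup implied by a (destQM, sendQ) routing table with local aliases might look as follows. This is a sketch only: the table contents and the names `QMB`, `QMC`, `QMD`, `SQ1`, `SQ2`, `LA1`, `LA2` are invented for the example and are not WebSphere MQ configuration syntax.

```python
# Hypothetical configuration held by one queue manager.
ALIASES = {"LA1": "QMC", "LA2": "QMD"}                      # local alias -> queue manager name
ROUTING_TABLE = {"QMB": "SQ1", "QMC": "SQ1", "QMD": "SQ2"}  # destQM -> sendQ

def select_send_queue(dest):
    """Resolve a logical destination to the local send queue leading toward it."""
    dest_qm = ALIASES.get(dest, dest)  # alias lookup first, otherwise use the name as is
    return ROUTING_TABLE[dest_qm]
```

If a queue manager is renamed, only the alias table changes; applications that address `LA1` are unaffected, which is the management flexibility the alias provides.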
49. Stream-oriented communication

Support for continuous media
Streams in distributed systems
Stream management
Continuous media
Observation
All communication facilities discussed so far are essentially based on a
discrete, that is time-independent exchange of information
Continuous media
Characterized by the fact that values are time dependent:
Audio
Video
Animations
Sensor data (temperature, pressure, etc.)
Transmission modes
Different timing guarantees with respect to data transfer:
Asynchronous: no restrictions with respect to when data is to be
delivered
Synchronous: define a maximum end-to-end delay for individual
data packets
Isochronous: define a maximum end-to-end delay and maximum delay variance
(jitter is bounded)
Stream
Definition
A (continuous) data stream is a connection-oriented communication
facility that supports isochronous data transmission.
Some common stream characteristics
Streams are unidirectional

There is generally a single source, and one or more sinks
Often, either the sink and/or source is a wrapper around hardware
(e.g., camera, CD device, TV monitor)

Simple stream: a single flow of data, e.g., audio or video
Complex stream: multiple data flows, e.g., stereo audio or
combination audio/video
Streams and QoS

Essence
Streams are all about timely delivery of data. How do you specify this
Quality of Service (QoS)? Basics:
The required bit rate at which data should be transported.

The maximum delay until a session has been set up (i.e., when an
application can start sending data).

The maximum end-to-end delay (i.e., how long it will take until a
data unit makes it to a recipient).

The maximum delay variance, or jitter.
The maximum round-trip delay.
Enforcing QoS
Observation
There are various network-level tools, such as differentiated services
by which certain packets can be prioritized.
Also
Use buffers to reduce jitter:
Problem
How to reduce the effects of packet loss (when multiple samples are in
a single packet)?
The effect of packet loss in (a) noninterleaved transmission and
(b) interleaved transmission
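The benefit of interleaving can be checked with a short calculation. The sketch assumes 16 frames sent 4 per packet and a single lost packet; the function names are hypothetical, and the frame count is assumed to be a multiple of the packet size.

```python
def packetize(frames, per_packet, interleave):
    """Group frames into packets, either consecutively or interleaved."""
    n_packets = len(frames) // per_packet
    if interleave:
        # Packet i carries every n_packets-th frame, starting at frame i.
        return [frames[i::n_packets] for i in range(n_packets)]
    return [frames[i * per_packet:(i + 1) * per_packet] for i in range(n_packets)]

def longest_gap(frames, packets, lost_index):
    """Longest run of consecutive missing frames when one packet is lost."""
    lost = set(packets[lost_index])
    longest = run = 0
    for frame in frames:
        run = run + 1 if frame in lost else 0
        longest = max(longest, run)
    return longest

frames = list(range(1, 17))  # 16 frames, 4 per packet
plain = packetize(frames, 4, interleave=False)
mixed = packetize(frames, 4, interleave=True)
gap_plain = longest_gap(frames, plain, 2)  # frames 9-12 are lost together
gap_mixed = longest_gap(frames, mixed, 2)  # frames 3, 7, 11, 15 are lost
```

The same number of frames is lost in both cases, but interleaving spreads the loss into isolated single-frame gaps. The price is that the receiver must buffer several packets before it can play out frames in order.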
Stream synchronization
Problem
Given a complex stream, how do you keep the different substreams in
synch?
Example
Think of playing out two channels, that together form stereo sound.
Difference should be less than 20-30 μsec!

Alternative
Multiplex all substreams into a single stream, and demultiplex at the
receiver. Synchronization is handled at multiplexing/demultiplexing
point (MPEG).
Time-division multiplexing
70. Multicast communication

See video Multicast Fundamentals
Multicast communication
Application-level multicasting
Gossip-based data dissemination
Application-level multicasting ( ALM )
Essence
Organize nodes of a distributed system into an overlay network and use that
network to disseminate data.
Two approaches in organizing the network:
a tree: unique paths between 2 nodes

a mesh: multiple paths between 2 nodes (more robust)
Chord-based tree building
1 Initiator generates a multicast identifier (mid).
2 Look up succ(mid), the node responsible for mid (see also the previous chapter),
and promote that node to become the root of the tree.
3 Request is routed to succ(mid), which has become the root.
4 If P wants to join, it sends a join request to the root.
5 When the request arrives at Q:
If Q has not seen a join request before, it becomes a forwarder for the group;
P becomes a child of Q. The join request continues to be forwarded.
If Q already knows about the tree, P becomes a child of Q. There is no need to forward
the join request anymore.
ALM: Some Costs

Link stress: How often does an ALM message cross the same
physical link? Example: message from A to D needs to cross
(Ra,Rb) twice.
Stretch: Ratio in delay between ALM-level path and network-level
path. Example: messages B to C
B --> Rb --> Ra --> Rc --> C (total cost = 59)
B --> Rb --> Rd --> Rc --> C (total cost = 47)
=> stretch = 59/47 ≈ 1.255.
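Both metrics are simple to compute once the physical route of every overlay hop is known. Since the figure is not reproduced here, the routes in `OVERLAY_HOPS` are hypothetical and were chosen only so that the A-to-D message crosses (Ra, Rb) twice; the two delay totals are the ones given above.

```python
from collections import Counter

# Hypothetical physical routes for three overlay hops (the original figure is missing).
OVERLAY_HOPS = {
    ("A", "B"): ["A", "Ra", "Rb", "B"],
    ("B", "E"): ["B", "Rb", "Ra", "Re", "E"],
    ("E", "D"): ["E", "Re", "Rd", "D"],
}

def link_stress(overlay_path):
    """Count how often each undirected physical link is crossed along an overlay path."""
    stress = Counter()
    for hop in zip(overlay_path, overlay_path[1:]):
        route = OVERLAY_HOPS[hop]
        for a, b in zip(route, route[1:]):
            stress[frozenset((a, b))] += 1
    return stress

def stretch(overlay_delay, network_delay):
    """Ratio between the ALM-level delay and the network-level delay."""
    return overlay_delay / network_delay

stress = link_stress(["A", "B", "E", "D"])
ratio = stretch(59, 47)
```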
Epidemic Algorithms
General background
Update models
Removing objects
Basic idea
Assume there are no write conflicts:
Update operations are performed at a single server

A replica passes updated state to only a few neighbors
Update propagation is lazy, i.e., not immediate
Eventually, each update should reach every replica
Two forms of epidemics
Anti-entropy: Each replica regularly chooses another replica at random,
and exchanges state differences, leading to identical states at both
afterwards
Gossiping: A replica which has just been updated (i.e., has been
contaminated), tells a number of other replicas about its update
(contaminating them as well).
Anti-entropy
Principal operations:
A node P selects another node Q from the system at random.

Push: P only sends its updates to Q
Pull: P only retrieves updates from Q
Push-Pull: P and Q exchange mutual updates (after which they
hold the same information).
Observation
For push-pull it takes O(log(N)) rounds to disseminate updates to all
N nodes (round = when every node has taken the initiative to start an
exchange).
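A small simulation gives a feel for how quickly push-pull spreads a single update. It is a sketch under simplifying assumptions: one update starts at node 0, partners are chosen uniformly at random, and an exchange takes effect immediately within the round. The function name and parameters are hypothetical.

```python
import random

def push_pull_rounds(n, seed=0, max_rounds=10_000):
    """Rounds until all n nodes (n >= 2) hold the update, starting from node 0."""
    rng = random.Random(seed)
    updated = [False] * n
    updated[0] = True
    rounds = 0
    while not all(updated) and rounds < max_rounds:
        rounds += 1
        for p in range(n):                # every node takes the initiative once per round
            q = rng.randrange(n - 1)
            if q >= p:
                q += 1                    # pick a partner other than p itself
            if updated[p] or updated[q]:  # push-pull: afterwards both hold the update
                updated[p] = updated[q] = True
    return rounds

rounds_needed = push_pull_rounds(1000)
```

Running it for growing N should show the round count increasing slowly, in line with the O(log(N)) observation; the exact numbers depend on the random seed.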
Gossiping
Basic model
A server S having an update to report, contacts other servers. If a
server is contacted to which the update has already propagated, S
stops contacting other servers with probability 1/k.
Observation
If s is the fraction of ignorant servers (i.e., which are unaware of the
update), it can be shown that with many servers
$s = e^{-(k+1)(1-s)}$
Note
If we really have to ensure that all servers are eventually updated,
gossiping alone is not enough
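The fraction of ignorant servers can be found numerically from $s = e^{-(k+1)(1-s)}$. The sketch below uses plain fixed-point iteration starting from s = 0, which is an implementation choice made here; note that s = 1 also satisfies the equation trivially, and starting from 0 avoids that solution.

```python
import math

def ignorant_fraction(k, iterations=200):
    """Solve s = exp(-(k + 1) * (1 - s)) by fixed-point iteration from s = 0."""
    s = 0.0
    for _ in range(iterations):
        s = math.exp(-(k + 1) * (1.0 - s))
    return s

s_k1 = ignorant_fraction(1)  # roughly 0.20
s_k4 = ignorant_fraction(4)  # just under 0.007
```

With k = 1 about 20% of the servers never hear about the update, and even with k = 4 a small fraction (under 0.7%) remains ignorant, which is why gossiping alone cannot guarantee that every server is updated.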
Deleting Values
Fundamental problem
We cannot remove an old value from a server and expect the removal
to propagate. Instead, mere removal will be undone in due time using
epidemic algorithms
Solution
Removal has to be registered as a special update by inserting a death
certificate
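The idea of treating a removal as an update can be sketched with timestamped entries. The use of integer timestamps, the `TOMBSTONE` marker, and all function names are assumptions for this illustration.

```python
TOMBSTONE = object()  # marker stored in place of a deleted value (the death certificate)

def write(store, key, value, ts):
    store[key] = (ts, value)

def delete(store, key, ts):
    store[key] = (ts, TOMBSTONE)  # the removal is registered as an update

def anti_entropy(a, b):
    """Push-pull exchange: for every key, both replicas keep the newest entry."""
    for key in set(a) | set(b):
        newest = max((s[key] for s in (a, b) if key in s), key=lambda e: e[0])
        a[key] = b[key] = newest

def read(store, key):
    entry = store.get(key)
    return None if entry is None or entry[1] is TOMBSTONE else entry[1]

r1, r2 = {}, {}
write(r1, "x", "old", ts=1)
anti_entropy(r1, r2)   # both replicas now hold x
delete(r1, "x", ts=2)  # r1 records a death certificate for x
anti_entropy(r1, r2)   # the certificate spreads instead of "old" coming back
```

Had `delete` simply removed the key from `r1`, the second exchange would have copied the old value back from `r2`, which is exactly the problem described above.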
90. Naming
a. Naming Entities
Names are used to denote entities in a distributed system.
To operate on an entity, we need to access it at an access point.

Access points are entities that are named by means of an address.
A location-independent name for an entity E, is independent from the
addresses of the access points offered by E.
d. Identifier
A name with the following properties:
Each identifier refers to at most one entity

Each entity is referred to by at most one identifier
An identifier always refers to the same entity (prohibits reusing
an identifier)
h. Flat Naming
Given an essentially unstructured name (e.g., an identifier), how can
we locate its associated access point?
Simple solutions:
Broadcasting: cannot scale beyond a LAN

Forwarding pointers: When an entity moves, it leaves behind a pointer to next location
Home-based approaches
Use a home location to keep track of the current location of an entity.
Distributed Hash Tables (DHT) (e.g. Chord system)
Organize many nodes into a logical ring:

Each node is assigned a random m-bit identifier.
Every entity is assigned a unique m-bit key.
Entity with key k is associated with the node with the smallest id ≥ k
(called its successor, denoted by succ(k)).
Nonsolution: Let node p keep track of succ(p + 1) as well
as its predecessor pred(p), and start a linear search along the ring.
Use Finger Tables:
Each node p maintains a finger table $FT_p[\,]$ with at most m entries
(using mod $2^m$ arithmetic):
$FT_p[i] = succ(p + 2^{i-1})$, $1 \le i \le m$
Note: $FT_p[i]$ points to the first node succeeding p by at least $2^{i-1}$.
To look up a key k, node p forwards the request to the node q with index
j satisfying
$q = FT_p[j]$; $FT_p[j] \le k < FT_p[j+1]$
(Stops when k ≤ q, which is the actual node.)
If $p < k < FT_p[1]$, the request is also forwarded to $FT_p[1]$.
e.g. Consider resolving k = 26 from node 1.
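Finger-table lookup can be traced with a short program. The ring used here (m = 5 and the node identifiers in `NODES`) is an assumption chosen for illustration, since the figure is not reproduced; `succ` is computed from global knowledge of the ring, which a real node would not have.

```python
from bisect import bisect_left

M = 5                                          # identifier bits (assumption)
NODES = [1, 4, 9, 11, 14, 18, 20, 21, 28]      # hypothetical ring, sorted

def succ(k):
    """First node with id >= k, wrapping around the ring (mod 2^M)."""
    k %= 2 ** M
    return NODES[bisect_left(NODES, k) % len(NODES)]

def finger_table(p):
    """FT_p[i] = succ(p + 2^(i-1)) for i = 1..M."""
    return [succ(p + 2 ** (i - 1)) for i in range(1, M + 1)]

def between(x, a, b, inclusive_right=False):
    """Is x in the ring interval (a, b), or (a, b] if inclusive_right is set?"""
    if inclusive_right and x == b:
        return True
    if a < b:
        return a < x < b
    return x > a or x < b                      # interval wraps past zero

def lookup(p, k):
    """Return the list of nodes visited while resolving key k from node p."""
    path = [p]
    while True:
        ft = finger_table(p)
        if between(k, p, ft[0], inclusive_right=True):
            path.append(ft[0])                 # p's successor is responsible for k
            return path
        # Otherwise forward to the farthest finger that still precedes k.
        p = next((q for q in reversed(ft) if between(q, p, k)), ft[0])
        path.append(p)

path = lookup(1, 26)
```

On this assumed ring, resolving k = 26 from node 1 visits nodes 1, 18, 20, 21 and ends at node 28. Each forwarding step roughly halves the remaining distance, which is where the logarithmic lookup cost comes from.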
Improvements (with modifications):

topology-based assignment of node identifiers
proximity routing
proximity neighbour selection
l. Hierarchical Location Service (HLS)
Build a large-scale search tree for which the underlying network is
divided into hierarchical domains. Each domain is represented by a
separate directory node.
The root knows the location of every entity (but only up to the next level)!
DNS vs. Chord
DNS | Chord
provides a host name to IP address mapping | can provide the same service: name = key, value = IP address
relies on a set of special root servers | requires no special servers
names reflect administrative boundaries | has no naming structure
is specialized to finding named hosts or services | can also be used to find data objects that are not tied to certain machines
