Distributed System 2
Distributed System 2
Slide 1
System Architectures Processes & Server Architecture Communication in a Distributed System Communication Abstractions
Slide 3
A RCHITECTURE
System Architecture:
placement of machines placement of software on machines
B UILDING
Slide 2 Two questions:
D ISTRIBUTED S YSTEM
Slide 4
Where to place?:
processing capacity, load balancing communication capacity locality
A RCHITECTURE
A RCHITECTURE
C LIENT-S ERVER
Software Architecture: Logical organisation and roles of software components
Layered Object-oriented
Request Client
Slide 7
Server Reply
Slide 5
Data-centered Event-based
Kernel
Kernel
Client
Slide 6
A RCHITECTURAL PATTERNS
Slide 8
Server
Request
Reply
Provide service
Time
C LIENT-S ERVER
C LIENT-S ERVER
Example client-server code in Erlang: % Client code using the increment server client (Server) -> Server ! {self (), 10}, receive {From, Reply} -> io:format ("Result: ~w~n", [Reply]) end. Slide 9 % Server loop for increment server loop () -> receive {From, Msg} -> From ! {self (), Msg + 1}, loop (); stop -> true end. % Initiate the server start_server() -> spawn (fun () -> loop () end).
server(void) { struct sockaddr_in cin, sin; int sd, sd_client; sd = socket(AF_INET,SOCK_STREAM,0); bind(sd,(struct sockaddr *)&sin,sizeof(sin)); listen(sd, queuesize); while (true) { sd_client = accept(sd,(struct sockaddr *)&cin,&addrlen)); recv(sd_client,buffer,sizeof(buffer),0); DoService(buffer); send(sd_client,buffer,strlen(buffer),0); close (sd_client); } close (sd); }
Slide 11
client(void) { struct sockaddr_in cin; char buffer[bufsize]; int sd; Slide 10 sd = socket(AF_INET,SOCK_STREAM,0); connect(sd,(void *)&cin,sizeof(cin)); send(sd,buffer,strlen(buffer),0); recv(sd,buffer,bufsize,0); close (sd); } Slide 12
User interface Application Database Application Database Application Database Server machine (a) (b) (c) (d) (e) Database Database
C LIENT-S ERVER
H ORIZONTAL D ISTRIBUTION
App. Server
Request Reply
Dbase Server
Replicated Web servers each containing the same Web pages Requests handled in round-robin fashion Disks
Kernel
Kernel
Slide 15
Internet Internet
P EER
Vertical Distribution from another perspective:
request reply
TO
P EER
Peer
Peer
Kernel
request reply
Slide 14
Slide 16
Return data
request
Peer
Peer
reply
Time
Kernel
H ORIZONTAL D ISTRIBUTION
P EER
AND
OVERLAY N ETWORKS
Slide 17
Slide 19
Overlay Network:
Application-specic network Addressing Routing Specialised features (e.g., encryption, multicast, etc.) Data stored at random nodes Partial view: nodes list of neighbours Exchange partial views with neighbours to update
Example:
111 000 111 000 111 000 111 000 111 000 111 000 111 000 111 000
S TRUCTURED OVERLAY
Distributed Hash Table:
Actual node
11 00 11 00 11 00 11 00 11 00 11 00 11 00 11 00 11 00 11 00 11 00 11 00 11 00 11 00 11 00 11 00
15 14 13 12 {8,9,10,11,12}
1 {0,1} 2 3
1111111 0000000 1111111 0000000 11111111111111111 00000000000000000 1111111 0000000 11111111111111111 00000000000000000
1111111 0000000 1111111 0000000 1111111111111111 0000000000000000 1111111 0000000 1111111111111111 0000000000000000
{13,14,15}
1111111 0000000 1111111 0000000 1111111111111111 0000000000000000 1111111 0000000 1111111111111111 0000000000000000
111 000 111 000 111 000 111 000 111 000 111 000 111 000 111 000
{2,3,4}
Slide 18
111 000 111 000 111 000 111 000 111 000 111 000 111 000 111 000
Slide 20
11 10
1111111 0000000 1111111 0000000 1111111111111111 0000000000000000 1111111 0000000 1111111111111111 0000000000000000
11 00 11 00 11 00 11 00 11 00 11 00 11 00 11 00 111 000 111 000 111 000 111 000 111 000 111 000 111 000 111 000
1111111 0000000 1111111 0000000 11111111111111111 00000000000000000 1111111 0000000 11111111111111111 00000000000000000
111 000 111 000 111 000 111 000 111 000 111 000 111 000 111 000
1111111 0000000 1111111 0000000 11111111111111111 00000000000000000 1111111 0000000 11111111111111111 00000000000000000
1111111 0000000 1111111 0000000 1111111111111111 0000000000000000 1111111 0000000 1111111111111111 0000000000000000
1111111 0000000 1111111 0000000 1111111111111111 0000000000000000 1111111 0000000 1111111111111111 0000000000000000
Nodes have identier and range Data has identier Node is responsible for data that falls in its range Search is routed to appropriate node
U NSTRUCTURED OVERLAY
H YBRID A RCHITECTURES
10
H YBRID A RCHITECTURES
Combination of architectures. Examples: Slide 21 Superpeer networks Collaborative distributed systems Edge-server systems Slide 23
Node provides downloaded chunks to other nodes Tracker keeps track of active nodes that have chunks of le Enforce collaboration by penalising selsh nodes
Node 1 File server Node 2
Superpeer Networks:
Regular peers are clients of superpeers Superpeers are servers for regular peers Superpeers are peers among themselves Superpeers may maintain large index, or act as brokers
Edge-Server Networks:
Servers placed at the edge of the network Servers replicate content Mostly used for content and application distribution Content Distribution Networks
Origin server
Slide 22
Regular peers
ISPs
Slide 24
Internet Enterprise networks
Superpeer network
H YBRID A RCHITECTURES
11
S ERVER D ESIGN
12
S ERVER D ESIGN
Dispatcher thread Request dispatched to a worker thread
C LUSTERED S ERVERS
Server
Worker thread
Application/compute servers
Slide 25
Operating system
Slide 27
Client requests
Dispatched request
Characteristics No parallelism, blocking system calls Parallelism, blocking system calls Parallelism, non-blocking system calls
First tier Second tier Third tier
S TATEFUL
Stateful:
VS
Keeps persistent information about clients V Improved performance X Expensive crash recovery X Must track clients
Server Guest OS
Server Guest OS
Server Guest OS
Slide 26
Stateless:
Does not keep state of clients soft state design: limited client state V Can change own state without informing clients V No cleanup after crash V Easy to replicate X Increased communication
Slide 28
Virtual Machine Monitor
Host OS
Hardware
C LUSTERED S ERVERS
13
R EQUEST S WITCHING
14
R EQUEST S WITCHING
Transport layer switch:
Logically a single TCP connection Response Server
Client
Request
Switch
Slide 29
Server
Slide 31
C OMMUNICATION
DNS-based:
Round-robin DNS
C ODE M OBILITY
Why move code?
Optimise computation (load balancing) Optmise communication
Weak vs Strong Mobility: Slide 30 Weak transfer only code Strong transfer code and execution segment Sender vs Receiver Intitated migration: Sender Send program to compute server Receiver Download applets What are the challenges of code mobility? Slide 32
C OMMUNICATION
15
C OMMUNICATION
16
Slide 33
Slide 35
Shared Memory:
Message Passing:
x=12
i=x
Slide 34
Process A
Process B
Slide 36
Shared memory
Process A
Process B
Address space 1
Address space 2
Address space 1
Address space 2
C OMMUNICATION
17
18
C OMMUNICATION
IN A
Previous slides assumed a uniprocessor or a multiprocessor. In a distributed system (multicomputer) things change: Shared Memory: Slide 37
There is no way to physically share memory
Slide 39
Synchronous vs asynchronous communication Transient vs persistent communication Provider-initiated vs consumer-initiated communication Direct -addressing vs indirect-addressing communication
Message Passing:
Over the network Introduces latencies Introduces higher chances of failure Heterogeneity introduces possible incompatibilities
M ESSAGE PASSING
Basics:
send() receive()
Variations:
Connection oriented vs Connectionless
Slide 38
Point-to-point vs Group Synchronous vs Asynchronous Buffered vs Unbuffered Reliable vs Unreliable Message ordering guarantees
Slide 40
Control-oriented communication
Associates a transfer of control with communication Active messages, remote procedure call (RPC) & remote method invocation (RMI)
Data Representation:
Marshalling Endianness
C OMMUNICATION M ODES
19
C OMMUNICATION M ODES
20
Slide 41
Slide 43
Example: notications
Consumer-Initiated
Request sent for data Example: HTTP request
Asynchronous
Sender continues execution after sending message (does not block waiting for reply) Message may be queued if receiver not active Message may be processed later at receivers convenience
Slide 42
Slide 44
Persistent
Message stored (somewhere) until receiver can accept it Example: email
Indirect-Addressing
Message not sent to a particular receiver Example: broadcast, publish/subscribe
Coupling: Time
Coupling: Space
C OMMUNICATION M ODES
21
C OMMUNICATION M ODES
22
Combinations:
A A Accepted B Persistent Asynchronous B Persistent Synchronous A ACK B Transient Synchronous (Receipt Based) Starts processing request Request Received B Transient Synchronous (Delivery Based) Accepted A Request Received B Transient Synchronous (Response Based) Accepted B Transient Asynchronous A Message can be sent only if B is running
Slide 45
Slide 47
Transient communication
Examples?
C OMMUNICATION A BSTRACTIONS
Abstractions above simple message passing make communication easier for the programmer. Provided by higher level APIs Slide 46
Message-Oriented Communication Request-Reply, Remote Procedure Call (RPC) & Remote Method Invocation (RMI) Group Communication Event-based Communication Shared Space
Slide 48
MPI is BIG. Standard reference has over 100 functions and is over 350 pages long!
23
24
MPI primitives:
Slide 49
Slide 51
Application
R2
Slide 50
Model:
Application-specic queues Messages addressed to specic queues Only guarantee delivery to queue. Not when. Message transfer can be in the order of minutes
Slide 52
Very similar to email but more general purpose (i.e., enables communication between applications and not just people)
25
26
Slide 53
Source and destination MCAs must be running to use channel Queue manager responsible for routing
Slide 55
Message-passing details hidden from application Procedure call parameters used to transmit data Client calls local stub which does messaging and marshalling
Confusion of local and remote operations can be dangerous. More on that later...
R EQUEST-R EPLY
Request:
a service data
RPC Implementation:
Client machine Server machine
Client process
Slide 54
Reply:
result of executing service data
Slide 56
j = inc(i);
2
proc: "inc" int: val(i)
Client stub
Server stub
Requirement:
Message formatting Protocol
Server OS
27
28
RPC Implementation:
client calls client stub (normal procedure call) client stub packs parameters into message data structure client stub performs send() syscall and blocks kernel transfers message to remote kernel remote kernel delivers to server stub, blocked in receive() server stub unpacks message, calls server (normal proc call) server returns to stub, which packs result into message server stub performs send() syscall kernel delivers to client stub, which unpacks and returns
Example server stub in Erlang: % increment implementation inc (Value) -> Value + 1. % RPC Server dispatch loop server () -> receive {From, inc, Value} -> From ! {self(), inc, inc(Value)} end, server().
Slide 57
Slide 59
Example client stub in Erlang: % Client code using RPC stub client (Server) -> register(server, Server), Result = inc (10), io:format ("Result: ~w~n", [Result]). Slide 58 % RPC stub for the increment server inc (Value) -> server ! {self (), inc, Value}, receive {From, inc, Reply} -> Reply end. Slide 60 Parameter marshalling:
stub must pack (marshal) parameters into message structure message data must be pointer free (by-reference data must be passed by-value) may have to perform other conversions: byte order (big endian vs little endian) oating point format dealing with pointers convert everything to standard (network) format, or message indicates format, receiver converts if necessary stubs may be generated automatically from interface specs
29
30
Sun RPC - client code: #include <rpc/rpc.h> /* standard RPC include file */ #include "date.h" /* this file is generated by rpcgen */ ... main(int argc, char **argv) { CLIENT *cl; /* RPC handle */ ... cl = clnt_create(argv[1], DATE_PROG, DATE_VERS, "udp"); Slide 63 lresult = bin_date_1(NULL, cl); printf("time on host %s = %ld\n", server, *lresult); sresult = str_date_1(lresult, cl); printf("time on host %s = %s", server, *sresult); clnt_destroy(cl); } /* done with the handle */
Slide 61
XML (data representation) and HTTP (transport) Text-based data stream is easier to debug HTTP simplies integration with web servers and works through rewalls For example, XML-RPC (lightweight) and SOAP (more powerful, but often unnecessarily complex)
Sun RPC - server code: #include <rpc/rpc.h> #include "date.h" /* standard RPC include file */ /* this file is generated by rpcgen */
Sun RPC - interface denition: program DATE_PROG { version DATE_VERS { long BIN_DATE(void) = 1; string STR_DATE(long) = 2; } = 1; } = 0x31234567;
Slide 62
/* /* /* /*
Slide 64
long * bin_date_1() { static long timeval; /* must be static */ long time(); /* Unix function */ timeval = time((long *) 0); return(&timeval); } char ** str_date_1(long *bintime) { static char *ptr; /* must be static */ char *ctime(); /* Unix function */ ptr = ctime(bintime); /* convert to local time */ return(&ptr); /* return the address of pointer */ }
31
DYNAMIC B INDING
32
The transition from Remote Procedure Call (RPC) to Remote Method Invocation (RMI) is a transition from the server metaphor to the object metaphor. Why is this important? Slide 67
RPC: explicit handling of host identication to determine the destination RMI: addressed to a particular object Objects are rst-class citizens Can pass object references as parameters More natural resource management and error handling But still, only a small evolutionary step
Slide 65
A SYNCHRONOUS RPC
Client Call remote procedure Request Wait for result Return from call Reply Time Server Client Wait for acceptance Return from call Accept request Time
Slide 66
Server
Slide 68
When no reply is required When reply isnt needed immediately (2 asynchronous RPCs deferred synchronous RPC)
33
34
Slide 69
Slide 71
Used for:
Replication of services Replication of data Service discovery
Slide 70
Event notication
Slide 72
Issues:
Reliability Ordering
Component
Event bus
Example:
IP multicast Flooding
35
S HARED S PACE
36
S HARED S PACE
Distributed Shared Memory:
Shared global address space
10 11 12 13 14 15 16
R EADING L IST
Slide 75 Implementing Remote Procedure Calls A classic paper about the design and implementation of one of the rst RPC systems.
Slide 73
0 9 2 5 1 8 3 10 6 4 12 7 14
11
13
15
16
CPU 1
CPU 2
CPU 3
CPU 4
Tuple Space:
A Write A B Write B T Read T C Look for tuple that matches T A B C Return C (and optionally remove it)
Slide 74
Insert a copy of A
Insert a copy of B B B
A Tuple instance
A JavaSpace
R EADING L IST
37
R EADING L IST
38