Distributed Memory Architectures


• Processing nodes are not able to share a physical memory space
– a node cannot address the memory of another node

• I/O is the only primitive mechanism for node cooperation


– cooperation by explicit value exchange
– possibly, shared memory can be emulated

[Figure: two processing nodes, each with memory modules M, CPUs, Bus DMA and Bus I/O, connected through a Communication Network. The I/O unit(s) dedicated to node interfacing (UC) act as Communication Unit, Network Interface Unit, or Network Card. Any architecture, e.g. a multiprocessor, can be adopted for the Processing Nodes.]

2
Kinds of distributed memory architectures
• PC/Workstation Cluster
• Multicluster
• Massively Parallel Processor (MPP)
• …
• Grid
• Data Center
• Server Farm
• Cloud
• …

Dedicated processor architectures: static allocation of processes to processing nodes; possibly, dynamic reconfiguration for load balancing or fault-tolerance reasons.

3
Interprocess communication support
Executable version of parallel application:
collection of communicating processes

[Figure: layered structure of the support. In each processing node, the run-time support of the communication primitives runs on top of the network communication protocols; interprocess communication channels connect the run-time supports of different nodes, internode communication channels connect the protocol layers, and all processing nodes are attached to the physical communication network.]

Run-time support exploits


• network communication protocols
• architectural features internal to processing nodes (notably, I/O mechanisms
via shared memory: DMA and/or Memory Mapped I/O)
4
Communication networks
• Simple cases of networks of computers: usual network architectures (LAN / MAN / WAN) with serial links and the standard IP protocol
• High-performance architectures: very local interconnection network (“switch”) according to the structures studied for Shared Memory Architectures:
– multistage Fat Tree, Generalized Fat Tree
– low-dimension cubes
– in the most powerful machines: wormhole flow control
• Typical link technologies:
– Fast Ethernet (100 Mb/s)
– Gigabit Ethernet (1 Gb/s)
– Myrinet (1.28 Gb/s)
– Infiniband (up to 10 Gb/s)
– optical technology; photonic networks are emerging (10 – 100 Gb/s)
5
Communication networks and communication processors
• Example: Myrinet
– KP included in the network, connected as an I/O unit to the processing node
– used for the interprocess communication run-time support and/or as Network Interface Unit (Network Card)

[Figure: Myrinet network card. A PCI-DMA chip (DMA controller and PCI Interface bridge), a local memory, the node processor acting as Communication Processor, and the network interface are connected on the card; the card is attached to the host through the PCI bus (32-64 bit) and, through the network interface, to/from a switching unit of the communication network.]
6
Interprocess communication run-time support
A Source process, allocated to a processing node Ni, and a Destination process, allocated to a processing node Nj, communicate through a channel of type T:

    send (channel_identifier, message_value)

    receive (channel_identifier, target_variable)

[Figure: nodes Ni and Nj (each with memory M, CPUs, Bus DMA, Bus I/O and interface unit UC), connected by the Communication Network; the send implementation involves a message copy plus scheduling actions on the destination node.]
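As a minimal sketch of the two primitives' interfaces (the C declarations and type names below are illustrative assumptions, not part of the slides):

    #include <stddef.h>

    /* Illustrative channel identifier: denotes a typed channel whose
       "real" descriptor is allocated on the destination node. */
    typedef unsigned int ch_id_t;

    /* Asynchronous send: may suspend the caller when the channel's
       asynchrony degree (buffering capacity) is saturated. */
    void send(ch_id_t channel_identifier, const void *message_value, size_t size);

    /* receive: copies the next message into the target variable,
       suspending the caller if no message is available. */
    void receive(ch_id_t channel_identifier, void *target_variable, size_t size);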
7
Distributed run-time support
Principles:
• Channel descriptor allocated in destination node Nj
• receive is executed locally by Destination process in Nj
• A send call by the Source process in Ni is delegated to the destination node Nj
• Delegation consists of a firmware message from Ni to Nj via the communication network, containing (a field-level sketch follows this list):
FW_MSG = (header, channel identifier, message value, Source identifier)
• In Nj, this message is received by the network interface unit (UCj) and transformed into an interrupt (for CPUj or KPj)
• The interrupt handler executes the send primitive locally, according to a shared memory implementation
– and possibly returns an outcome to Ni (Source) via the communication network; this action can be avoided depending on the detailed implementation scheme
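A minimal C sketch of the delegation message (the field widths and the 64-byte value area are illustrative assumptions; the slides fix only the logical content):

    #include <stdint.h>

    /* Illustrative layout of FW_MSG: (header, channel identifier,
       message value, Source identifier). Sizes are assumptions. */
    typedef struct {
        uint32_t header;             /* routing / flow-control information */
        uint32_t channel_id;         /* selects the channel descriptor in Nj */
        uint32_t source_id;          /* Source process identifier (for wake-up) */
        uint8_t  message_value[64];  /* copied message value; 64 is an assumed size */
    } fw_msg_t;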

8
Implementation
• Channel descriptor
– data structure CHsource allocated in Ni: contains information about the current number of buffered messages and the sender_wait boolean
– data structure CHdest allocated in Nj: the “real” channel descriptor, with the usual structure of a shared memory implementation
• Send
– verifies the asynchrony degree saturation and, if buffer_full, suspends the Source process
– in any case, the interprocessor message FW_MSG is sent to UCi, then to Nj via the communication network
– local execution of send on Nj, without checking buffer_full; no outcome is returned to Ni in this scheme
• Receive
– causes the updating of the number of buffered messages in CHsource (interprocessor message to Ni)
– in Ni, sender_wait is checked: if true, the Source process is woken up (see the sketch after this list).
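A hedged C sketch of the two descriptors (the names follow the slides; the layouts are illustrative):

    #include <stdbool.h>

    /* CHsource, allocated in Ni: the Source-side view of the channel. */
    typedef struct {
        int  buffered_msgs;      /* current number of buffered messages */
        int  asynchrony_degree;  /* buffer capacity: saturation threshold */
        bool sender_wait;        /* true if the Source is suspended on buffer_full */
    } ch_source_t;

    /* CHdest, allocated in Nj: the "real" channel descriptor, with the usual
       structure of a shared memory implementation (details omitted). */
    typedef struct {
        int   buffered_msgs;     /* messages currently queued in Nj */
        void *msg_fifo;          /* FIFO of buffered message values */
        void *dest_pcb;          /* PCB of a waiting Destination process, if any */
    } ch_dest_t;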

9
send implementation – source node
[Figure: node Ni, with the message msg and the descriptor CHsource in memory, and node Nj, with msg, the descriptor CHdest and the target variable vtg in memory, connected by the Communication Network.]

The source-node part of the send:
– verifies the asynchrony degree saturation and, if buffer_full, suspends the Source process;
– in any case, produces the interprocessor message FW_MSG = (header, channel identifier, message value, Source process identifier) and passes it to UCi by reference;
– UCi exploits DMA and transmits FW_MSG to UCj, via the network, directly in pipeline (flit by flit, without any intermediate copy in UCi).
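Putting the source-node steps together, building on the structures sketched above (suspend_current_process, current_process_id, make_fw_msg and uc_dma_transmit are hypothetical helpers, named here only for illustration):

    /* Source-node part of send, executed in Ni: a sketch, not the
       definitive implementation. */
    void send_source_side(ch_source_t *chs, ch_id_t ch,
                          const void *msg, size_t size)
    {
        /* Verify asynchrony degree saturation on the local view CHsource. */
        if (chs->buffered_msgs == chs->asynchrony_degree) {
            chs->sender_wait = true;
            suspend_current_process();   /* woken up by a later receive */
        }
        chs->buffered_msgs++;

        /* Produce FW_MSG and pass it to UCi by reference; UCi reads it via
           DMA and streams it to UCj flit by flit, with no intermediate copy. */
        fw_msg_t fw = make_fw_msg(ch, msg, size, current_process_id());
        uc_dma_transmit(&fw);
    }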

10
send implementation – destination node
[Figure: same scheme as the previous slide: Ni with msg and CHsource in memory, Nj with msg, CHdest and the target variable vtg in memory, connected by the Communication Network.]

The pipeline transmission is continued in Nj: UCj copies FW_MSG, via DMA, directly into Nj's memory (without any intermediate copy).
The running process (or KP) is interrupted by UCj; the interrupt message is the reference (capability) to FW_MSG.
The running process (or KP) acquires FW_MSG, then CHdest and VTG, into its own addressing space; then the local send can be executed (without checking buffer_full).
Optimization: if UC is KP, the additional copy of FW_MSG is saved (on-the-fly execution).
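A matching sketch of the destination-node handler, again building on the structures above (acquire_via_capability, copy_to_target_variable, wake_up and enqueue_message are hypothetical helpers; the capability mechanism itself is discussed two slides ahead):

    /* Interrupt handler in Nj: executes the send locally, as in the shared
       memory implementation, without checking buffer_full. */
    void fw_msg_handler(fw_msg_t *fw)   /* reference received as a capability */
    {
        /* Bring the needed objects into the handler's addressing space. */
        ch_dest_t *chd = acquire_via_capability(fw->channel_id);

        if (chd->dest_pcb != NULL) {
            /* Destination already waiting: copy the value into the target
               variable VTG and wake the Destination process up. */
            copy_to_target_variable(chd, fw->message_value);
            wake_up(chd->dest_pcb);
            chd->dest_pcb = NULL;
        } else {
            enqueue_message(chd, fw->message_value);  /* buffer in CHdest */
            chd->buffered_msgs++;
        }
        /* No outcome is returned to Ni in this scheme. */
    }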
11
send implementation – memory-to-memory copy

[Figure: the overall effect is a direct copy from the message area msg in Ni's memory (descriptor CHsource) to the target variable vtg in Nj's memory (descriptor CHdest), through the Communication Network.]

In practice, even in a distributed memory architecture, a memory-to-memory copy can be implemented (plus additional operations for low level scheduling), provided that the communication network protocol is the primitive, firmware one.

If the IP protocol is adopted, several additional copies and administrative operations are done: the IP overhead dominates, even compared to the network latency.
12
Implementation
• A key point for the local send execution on Nj is the addressing space of the process executing the interrupt handler.
• In principle, any process should contain all possible channel descriptors, all possible target variables, and the process control blocks of all processes allocated on Nj.
• In practice, such a static allocation of objects is impossible.
• Solution: dynamic allocation by means of the Capability mechanism, sketched below.
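A heavily hedged illustration of the idea (this capability API is invented for the sketch; the slides only state that the objects are brought dynamically into the addressing space):

    /* A capability: an unforgeable reference that lets its holder map the
       referenced object (channel descriptor, target variable, PCB) into
       its own addressing space on demand. */
    typedef struct {
        unsigned long object_id;  /* which object is referenced */
        unsigned long rights;     /* operations permitted on it */
    } capability_t;

    /* Map the referenced object into the current addressing space and
       return a usable pointer; unmap it when the handler is done. */
    void *capability_map(capability_t cap);
    void  capability_unmap(capability_t cap);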

13
Interprocess communication cost model
• Base latency: takes into account
– latency on Ni:
• operations on CHsource,
• formatting of FW_MSG and delegation to KPj or UCj
• operations in KPj or UCj
– network latency (depending on network kind and dimension, routing and flow
control strategies, link latency, link size, number of crossed units: SEE Shared
Memory Arch.)
– latency on Nj: latency of local send execution (SEE Shared Memory Arch.)
• Under-load latency:
– resolution of a client-server model (SEE Shared Memory Arch.), where the destination node (thus, any node) is the server and the possible source nodes (thus, any node) are the clients
– M/M/1 is a typical (worst-case) assumption; see the sketch after this list
– parameter p: average number of nodes acting as clients, according to the structure of the parallel program and to the process mapping strategies
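As an illustrative summary in the slides' notation (the symbols λ and Ts are assumptions consistent with the client-server model of the Shared Memory part, not symbols defined on this slide):

    Lbase ≈ (latency on Ni) + (network latency) + (latency on Nj)

    ρ = λ · Ts                 utilization of the destination node, with λ the
                               aggregate request rate of the p client nodes and
                               Ts the service time of the local send execution

    WQ = (ρ / (1 − ρ)) · Ts    M/M/1 waiting time

    Lunder-load ≈ Lbase + WQ

The waiting term diverges as ρ → 1, i.e. as the destination node saturates; moreover, the exponential-service assumption of M/M/1 over-estimates the waiting time with respect to, e.g., M/D/1, which is why it is taken as a worst case.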

14
Typical latencies
• The communication network is used with the primitive firmware routing and flow-control protocol:
– result similar to the shared memory run-time, for systems realized in a rack:
Tsetup ≈ 10^3 τ, Ttransm ≈ 10^2 τ
– otherwise, for long distance networks, the transmission latency dominates, e.g.
Tsetup ≈ 10^3 τ, Ttransm ≈ 10^4 τ up to 10^6 τ

• The communication network is used with the IP protocol, i.e., the application is IP-dependent:
– the network is exploited in the primitive way; however, an additional overhead is paid due to the protocol actions (e.g., formatting, de-formatting) inside the nodes (plus transmission overhead on long distance networks):
Rack: Tsetup ≈ 10^5 τ, Ttransm ≈ 10^4 τ
Long distance: Tsetup ≈ 10^7 τ, Ttransm ≈ 10^8 τ
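A worked comparison, assuming the linear cost model Lcom = Tsetup + L · Ttransm and an illustrative message length of L = 1024 words (both are assumptions, for illustration only):

    Firmware, rack:  Lcom ≈ 10^3 τ + 1024 · 10^2 τ ≈ 1.0 · 10^5 τ
    IP, rack:        Lcom ≈ 10^5 τ + 1024 · 10^4 τ ≈ 1.0 · 10^7 τ

Under these assumptions the IP protocol costs about two orders of magnitude more for the same message, consistent with the earlier observation that the IP overhead dominates even the network latency.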

15
Exercises
1. Describe the interprocess communication run-time support in detail, in particular the actions inside the source and destination nodes.
2. Evaluate the interprocess communication latency in detail, according to the implementation scheme of Exercise 1.
3. Study the interprocess communication run-time support for clusters whose nodes are SMP or NUMA machines.

16
