Computer Architecture: Multiprocessors Shared Memory Architectures Prof. Jerry Breecher CSCI 240 Fall 2003
Chapter 8
Multiprocessors
Shared Memory Architectures
Chap. 8 - Multiprocessors 2
Introduction
The Big Picture: Where are We Now?
The Multiprocessor Picture
[Figure: Example: Pentium System Organization, showing the Processor/Memory Bus, the PCI Bus, and the I/O Busses.]
Shared Memory Multiprocessor
[Figure: four processors, each with its own registers.]
Shared Memory Multiprocessor
• Several processors share one address space
– conceptually a shared memory
– often implemented just like a multicomputer
• address space distributed over private memories
• Communication is implicit
– read and write accesses to shared memory locations
• Synchronization
– via shared memory locations
• spin waiting for non-zero
– barriers
[Figure: Conceptual Model, with processors (P) connected through a Network/Bus to memory (M).]
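The "spin waiting for non-zero" bullet can be sketched with two threads standing in for processors. This is only an illustrative sketch: the names `producer`, `consumer`, and the `shared` dictionary are hypothetical, and Python threads only approximate real shared-memory hardware (memory-ordering effects are not modeled).

```python
import threading
import time

def producer(shared):
    shared["data"] = 42          # implicit communication: an ordinary write
    shared["flag"] = 1           # publish by making the flag non-zero

def consumer(shared, seen):
    while shared["flag"] == 0:   # spin waiting for non-zero
        time.sleep(0)            # yield so the producer thread can run
    seen.append(shared["data"])  # flag is set, so the data has been written

def run():
    shared = {"data": 0, "flag": 0}   # stands in for shared memory locations
    seen = []
    c = threading.Thread(target=consumer, args=(shared, seen))
    p = threading.Thread(target=producer, args=(shared,))
    c.start()
    p.start()
    c.join()
    p.join()
    return seen[0]

result = run()
```

Neither thread sends a message: the consumer learns that the data is ready only by reading a shared location.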
Message Passing Multicomputers
• Computers (nodes) connected by a network
– Fast network interface
• Send, receive, barrier
– Nodes are no different from a regular PC or workstation
• Cluster conventional workstations or PCs with a fast network
– cluster computing
– Berkeley NOW
– IBM SP2
[Figure: nodes, each with a processor (P) and its own memory (M), connected by a network.]
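For contrast with shared memory, the send, receive, and barrier operations listed above can be sketched as follows. This is a minimal sketch under stated assumptions: two threads play the nodes, `queue.Queue` objects play the network links, and the function `node` and its arguments are hypothetical names.

```python
import queue
import threading

def node(rank, inbox, outbox, barrier, log):
    """One node: private memory, explicit send and receive."""
    local = rank * 10          # private memory, not visible to the other node
    outbox.put(local)          # send
    received = inbox.get()     # receive (blocks until a message arrives)
    barrier.wait()             # barrier: wait until both nodes get here
    log[rank] = local + received

def run():
    a_to_b, b_to_a = queue.Queue(), queue.Queue()
    barrier = threading.Barrier(2)
    log = {}
    t0 = threading.Thread(target=node, args=(0, b_to_a, a_to_b, barrier, log))
    t1 = threading.Thread(target=node, args=(1, a_to_b, b_to_a, barrier, log))
    t0.start()
    t1.start()
    t0.join()
    t1.join()
    return log

result = run()
```

Here all communication is explicit: a value moves between nodes only when one node sends it and the other receives it.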
Large-Scale MP Designs
Memory: distributed with nonuniform memory access time (“NUMA”) and scalable interconnect (distributed memory)
[Figure: access times labeled 1 cycle, 40 cycles, and 100 cycles; interconnect labeled Low Latency, High Reliability.]
Shared Memory Architectures
8.1 Introduction
8.3 Centralized Shared Memory Architectures
In this section we will understand the issues around:
• Sharing one memory space among several processors.
• Maintaining coherence among several copies of a data item.
Shared Memory Architectures
The Problem of Cache Coherency
Shared Memory Architectures
What Does Coherency Mean?
• Informally:
– “Any read must return the most recent write”
– Too strict and too difficult to implement
• Better:
– “Any write must eventually be seen by a read”
– All writes are seen in proper order (“serialization”)
• Two rules to ensure this:
– “If P writes x and P1 reads it, P’s write will be seen by P1 if the
read and write are sufficiently far apart”
– Writes to a single location are serialized:
seen in one order
• Latest write will be seen
• Otherwise could see writes in illogical order
(could see older value after a newer value)
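The rule that a write must eventually be seen by a read can be illustrated with a toy model. This is a sketch under assumptions, not a real cache design: the `Cache` class, its write-through policy, and the optional invalidation of other copies are all hypothetical choices made for the illustration.

```python
class Cache:
    """Hypothetical private cache: maps address -> cached value."""
    def __init__(self, memory):
        self.memory = memory
        self.lines = {}

    def read(self, addr):
        if addr not in self.lines:              # miss: fetch from memory
            self.lines[addr] = self.memory[addr]
        return self.lines[addr]                 # hit: may return a stale copy

    def write(self, addr, value, others=()):
        self.lines[addr] = value
        self.memory[addr] = value               # write through to memory
        for other in others:                    # invalidate the other copies
            other.lines.pop(addr, None)

def demo(invalidate):
    memory = {"x": 0}
    p, p1 = Cache(memory), Cache(memory)
    p1.read("x")                                # P1 caches x = 0
    p.write("x", 1, others=[p1] if invalidate else [])
    return p1.read("x")                         # what does P1 see now?

stale = demo(invalidate=False)   # P1 keeps hitting on its old copy
fresh = demo(invalidate=True)    # P1 misses and picks up P's write
```

Without invalidation, P1 can read the old value indefinitely, which is exactly what the coherency rules forbid.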
Shared Memory Architectures
There are Different Types of Memory In The Cache
What kinds of memory are there in the cache?
Test_and_set(lock);
shared_data = xyz;
Clear(lock);
* Write Back gives good performance, but if you use write through here, there will be performance degradation.
** Write through here means the lock state is seen immediately. You want a write through here to flush the cache.
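The Test_and_set / Clear sequence above can be sketched as a runnable lock. This is a minimal sketch, not the hardware mechanism: the class `TasLock` is a hypothetical name, and the atomicity that a real test-and-set instruction provides is emulated with an internal `threading.Lock`.

```python
import threading
import time

class TasLock:
    """Test-and-set lock sketch; atomicity is emulated, not real hardware."""
    def __init__(self):
        self._flag = 0
        self._atomic = threading.Lock()

    def test_and_set(self):
        with self._atomic:          # stands in for one atomic instruction
            old = self._flag
            self._flag = 1
            return old              # 0 means the caller now holds the lock

    def clear(self):
        self._flag = 0              # release the lock

def worker(lock, shared, n):
    for _ in range(n):
        while lock.test_and_set() == 1:   # spin until the old value was 0
            time.sleep(0)
        shared["data"] += 1               # critical section on the shared data
        lock.clear()

def run(threads=4, n=500):
    lock, shared = TasLock(), {"data": 0}
    ts = [threading.Thread(target=worker, args=(lock, shared, n))
          for _ in range(threads)]
    for t in ts:
        t.start()
    for t in ts:
        t.join()
    return shared["data"]

result = run()
```

The lock word and the protected data are both shared locations, but they are used differently, which is why the slide distinguishes how each should be cached.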
Shared Memory Architectures
Potential HW Coherency Solutions
Shared Memory Architectures
An Example Snoopy Protocol Maintained by Hardware
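Shared Memory Architectures
Snoopy-Cache State Machine-I
• State machine for CPU requests for each cache block
• Applies to Write Back data caches
• Cache block states: Invalid, Shared (read/only), Exclusive (read/write)
• Transitions (state, CPU request: action; next state):
– Invalid, CPU Read: place read miss on bus; Shared
– Invalid, CPU Write: place write miss on bus; Exclusive
– Shared, CPU Read hit: no bus action; Shared
– Shared, CPU Read miss: place read miss on bus; Shared
– Shared, CPU Write: place write miss on bus; Exclusive
– Exclusive, CPU read hit or CPU write hit: no bus action; Exclusive
– Exclusive, CPU read miss: write back block, place read miss on bus; Shared
– Exclusive, CPU Write Miss: write back cache block, place write miss on bus; Exclusive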
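The CPU-side state machine can be written down as a lookup table. This is a sketch of the three-state write-back invalidate protocol as commonly drawn for this diagram; the event names ("read", "read hit", and so on) and the function `cpu_request` are hypothetical labels chosen for the illustration.

```python
INVALID, SHARED, EXCLUSIVE = "Invalid", "Shared", "Exclusive"

# (current state, CPU event) -> (next state, bus actions)
CPU_TRANSITIONS = {
    (INVALID,   "read"):       (SHARED,    ["place read miss on bus"]),
    (INVALID,   "write"):      (EXCLUSIVE, ["place write miss on bus"]),
    (SHARED,    "read hit"):   (SHARED,    []),
    (SHARED,    "read miss"):  (SHARED,    ["place read miss on bus"]),
    (SHARED,    "write"):      (EXCLUSIVE, ["place write miss on bus"]),
    (EXCLUSIVE, "read hit"):   (EXCLUSIVE, []),
    (EXCLUSIVE, "write hit"):  (EXCLUSIVE, []),
    (EXCLUSIVE, "read miss"):  (SHARED,    ["write back block",
                                            "place read miss on bus"]),
    (EXCLUSIVE, "write miss"): (EXCLUSIVE, ["write back block",
                                            "place write miss on bus"]),
}

def cpu_request(state, event):
    """Return (next_state, actions) for a CPU request on one cache block."""
    return CPU_TRANSITIONS[(state, event)]

# A block that is written, then replaced by a read to a different address:
s1, a1 = cpu_request(INVALID, "write")
s2, a2 = cpu_request(s1, "read miss")
```

Only transitions out of Exclusive require a write back, because only an Exclusive block can hold data that memory does not yet have.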
Shared Memory Architectures
Snoopy-Cache State Machine-II
• State machine for bus requests for each cache block
• Appendix E gives details of bus requests
• Transitions (state, bus request: action; next state):
– Shared, Write miss for this block: no action; Invalid
– Exclusive, Write miss for this block: Write Back Block (abort memory access); Invalid
– Exclusive, Read miss for this block: Write Back Block (abort memory access); Shared
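The bus-side (snooping) state machine is small enough to sketch as a single function. The function name `bus_request` and the event strings are hypothetical; the sketch assumes the request is for the block this cache currently holds.

```python
def bus_request(state, event):
    """Return (next_state, actions) when this cache snoops a bus request
    for a block it holds. States: "Invalid", "Shared", "Exclusive"."""
    if event == "write miss":
        if state == "Exclusive":
            # We hold the only up-to-date copy: supply it, then give it up.
            return "Invalid", ["write back block", "abort memory access"]
        return "Invalid", []            # a Shared copy is simply dropped
    if event == "read miss" and state == "Exclusive":
        # Another cache wants to read: supply the data and keep a read copy.
        return "Shared", ["write back block", "abort memory access"]
    return state, []                    # e.g. read miss while Shared

after_remote_write = bus_request("Shared", "write miss")
after_remote_read = bus_request("Exclusive", "read miss")
```

A remote read miss never invalidates a Shared copy, so any number of caches can hold the block read-only at the same time.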
Shared Memory Architectures
Example
                   | P1               | P2               | Bus                     | Memory
step               | State Addr Value | State Addr Value | Action Proc. Addr Value | Addr Value
P1: Write 10 to A1 |                  |                  |                         |
P1: Read A1        |                  |                  |                         |
P2: Read A1        |                  |                  |                         |
P2: Write 20 to A1 |                  |                  |                         |
P2: Write 40 to A2 |                  |                  |                         |
[Figure: state diagram with states Invalid, Shared, and Exclusive; label: Write miss on bus.]
Shared Memory Architectures
Example: Step 2
                   | P1               | P2               | Bus                     | Memory
step               | State Addr Value | State Addr Value | Action Proc. Addr Value | Addr Value
P1: Write 10 to A1 | Excl. A1   10    |                  | WrMs   P1    A1         |
P1: Read A1        | Excl. A1   10    |                  |                         |
P2: Read A1        |                  |                  |                         |
P2: Write 20 to A1 |                  |                  |                         |
P2: Write 40 to A2 |                  |                  |                         |
[Figure: state diagram; labels: Exclusive, Remote Read, Write Back.]
Shared Memory Architectures
Example: Step 4
                   | P1               | P2               | Bus                     | Memory
step               | State Addr Value | State Addr Value | Action Proc. Addr Value | Addr Value
P1: Write 10 to A1 | Excl. A1   10    |                  | WrMs   P1    A1         |
P1: Read A1        | Excl. A1   10    |                  |                         |
P2: Read A1        |                  | Shar. A1         | RdMs   P2    A1         |
                   | Shar. A1   10    |                  | WrBk   P1    A1   10    | A1   10
                   |                  | Shar. A1   10    | RdDa   P2    A1   10    |      10
P2: Write 20 to A1 | Inv.             | Excl. A1   20    | WrMs   P2    A1         |      10
P2: Write 40 to A2 |                  |                  |                         |      10
                   |                  |                  |                         |      10
Assumes initial cache state is invalid and A1 and A2 map to same cache block, but A1 ≠ A2.
[Figure: state diagram with states Invalid, Shared, and Exclusive; label: Remote Write.]
Shared Memory Architectures
Example: Step 5
                   | P1               | P2               | Bus                     | Memory
step               | State Addr Value | State Addr Value | Action Proc. Addr Value | Addr Value
P1: Write 10 to A1 | Excl. A1   10    |                  | WrMs   P1    A1         |
P1: Read A1        | Excl. A1   10    |                  |                         |
P2: Read A1        |                  | Shar. A1         | RdMs   P2    A1         |
                   | Shar. A1   10    |                  | WrBk   P1    A1   10    | A1   10
                   |                  | Shar. A1   10    | RdDa   P2    A1   10    |      10
P2: Write 20 to A1 | Inv.             | Excl. A1   20    | WrMs   P2    A1         |      10
P2: Write 40 to A2 |                  |                  | WrMs   P2    A2         |      10
                   |                  | Excl. A2   40    | WrBk   P2    A1   20    | A1   20
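The five steps of the example can be replayed with a small simulator. This is a sketch under assumptions: each cache is modeled as a single block (reflecting the assumption that A1 and A2 map to the same cache block), bus entries are logged as (action, processor, address, value) tuples using the table's abbreviations (WrMs = write miss, RdMs = read miss, WrBk = write back, RdDa = read data), and all class and function names are hypothetical.

```python
class Cache:
    """One-block cache: enough here because A1 and A2 share a cache block."""
    def __init__(self, name):
        self.name = name
        self.state, self.addr, self.value = "Invalid", None, None

def write_back(cache, memory, bus):
    """Copy an Exclusive (dirty) block to memory and log it on the bus."""
    memory[cache.addr] = cache.value
    bus.append(("WrBk", cache.name, cache.addr, cache.value))

def snoop(cache, kind, addr, memory, bus):
    """Reaction of the other cache to a bus miss for addr."""
    if cache.state == "Invalid" or cache.addr != addr:
        return
    if cache.state == "Exclusive":
        write_back(cache, memory, bus)
    cache.state = "Invalid" if kind == "WrMs" else "Shared"

def miss(cache, other, kind, addr, memory, bus):
    """Place a miss on the bus, let the other cache snoop it, then write
    back our own victim block (logged in the order the table lists them)."""
    bus.append((kind, cache.name, addr, None))
    snoop(other, kind, addr, memory, bus)
    if cache.state == "Exclusive":
        write_back(cache, memory, bus)

def read(cache, other, addr, memory, bus):
    if cache.state == "Invalid" or cache.addr != addr:      # read miss
        miss(cache, other, "RdMs", addr, memory, bus)
        cache.state, cache.addr, cache.value = "Shared", addr, memory.get(addr)
        bus.append(("RdDa", cache.name, addr, cache.value))
    return cache.value

def write(cache, other, addr, value, memory, bus):
    if not (cache.state == "Exclusive" and cache.addr == addr):
        miss(cache, other, "WrMs", addr, memory, bus)       # not a write hit
    cache.state, cache.addr, cache.value = "Exclusive", addr, value

def run_example():
    p1, p2, memory, bus = Cache("P1"), Cache("P2"), {}, []
    write(p1, p2, "A1", 10, memory, bus)   # P1: Write 10 to A1
    read(p1, p2, "A1", memory, bus)        # P1: Read A1 (hit, no bus traffic)
    read(p2, p1, "A1", memory, bus)        # P2: Read A1
    write(p2, p1, "A1", 20, memory, bus)   # P2: Write 20 to A1
    write(p2, p1, "A2", 40, memory, bus)   # P2: Write 40 to A2
    return p1, p2, memory, bus

p1, p2, memory, bus = run_example()
```

The resulting `bus` list contains the same seven bus actions, in the same order, as the Bus column of the table.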
Summary
8.1 Introduction – the big picture
8.3 Centralized Shared Memory Architectures