
Computer Architecture

Chapter 8
Multiprocessors
Shared Memory Architectures

Prof. Jerry Breecher


CSCI 240
Fall 2003
Chapter Overview
We're going to cover only a small part of this chapter: the material on how
caches from multiple processors interact with each other.

8.1 Introduction – the big picture


8.3 Centralized Shared Memory Architectures

The Big Picture: Where Are We Now?

The major issue is this: we've taken copies of the contents of main memory and
put them in caches closer to the processors. But what happens to those copies
if someone else wants to use the main memory data? How do we keep all copies
of the data in synch with each other?
The Multiprocessor Picture

Example: Pentium system organization.

[Figure: processors and memory share a processor/memory bus, bridged to a PCI
bus, which in turn connects to the I/O busses.]
Shared Memory Multiprocessor
[Figure: four processors, each with its own registers and caches, connected
through a chipset to memory, disk, and other I/O.]

• Memory: centralized, with Uniform Memory Access time ("UMA") and bus
  interconnect, I/O
• Examples: Sun Enterprise 6000, SGI Challenge, Intel SystemPro
Shared Memory Multiprocessor
• Several processors share one address space
  – conceptually a shared memory
  – often implemented just like a multicomputer
    • address space distributed over private memories
• Communication is implicit
  – read and write accesses to shared memory locations
• Synchronization
  – via shared memory locations
    • spin waiting for non-zero (a small sketch follows)
  – barriers

[Conceptual model: processors P connected by a network/bus to a single memory M.]
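A minimal sketch of spin waiting on a shared memory location, assuming C11
atomics and POSIX threads; the flag name "ready" and the value 42 are
illustrative choices, not from the slides:

/* Spin-wait synchronization through a shared flag (compile with -pthread). */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

atomic_int ready = 0;   /* shared flag: non-zero means "data is available" */
int shared_value;       /* data produced by one thread, consumed by another */

void *producer(void *arg)
{
    (void)arg;
    shared_value = 42;              /* write the data first                 */
    atomic_store(&ready, 1);        /* then publish the flag                */
    return NULL;
}

void *consumer(void *arg)
{
    (void)arg;
    while (atomic_load(&ready) == 0)
        ;                           /* spin, waiting for non-zero           */
    printf("consumer saw %d\n", shared_value);
    return NULL;
}

int main(void)
{
    pthread_t p, c;
    pthread_create(&c, NULL, consumer, NULL);
    pthread_create(&p, NULL, producer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return 0;
}

The shared location itself is the synchronization mechanism: the consumer
simply loops until the flag becomes non-zero.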

Message Passing Multicomputers

• Computers (nodes) connected by a network
  – Fast network interface
    • Send, receive, barrier (an MPI-style sketch follows)
  – Nodes no different from a regular PC or workstation
• Cluster of conventional workstations or PCs with a fast network
  – cluster computing
  – Berkeley NOW
  – IBM SP2

[Figure: each node pairs a processor P with its own memory M; nodes
communicate over the network.]
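The send/receive/barrier primitives map directly onto a message-passing
library. Here is a sketch using MPI, one common choice; the slides name the
machines, not a specific library, and the ranks, tag, and value sent are made
up for illustration:

/* Node 0 sends a value to node 1, then all nodes meet at a barrier.
   Compile with mpicc, run with mpirun -np 2 ./a.out. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, value = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);      /* which node am I?           */

    if (rank == 0) {
        value = 42;
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);       /* send    */
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);                              /* receive */
        printf("node 1 received %d\n", value);
    }

    MPI_Barrier(MPI_COMM_WORLD);               /* all nodes synchronize here */
    MPI_Finalize();
    return 0;
}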

Large-Scale MP Designs

Memory: distributed with nonuniform memory access time ("NUMA") and scalable
interconnect (distributed memory).

[Figure: distributed-memory nodes connected by a low-latency, high-reliability
interconnect; the access times shown range from 1 cycle (nearest) through
40 cycles to 100 cycles (most remote).]
Shared Memory Architectures

In this section we will understand the issues around:
• Sharing one memory space among several processors.
• Maintaining coherence among several copies of a data item.
The Problem of Cache Coherency

[Figure: one CPU with a cache (holding copies A' and B') in front of memory
(holding A and B) and an I/O device, shown in three snapshots.]

a) Cache and memory coherent: A' = A = 100, B' = B = 200.
b) The CPU writes 550 into A'. With a write-back cache, memory still holds
   A = 100, so cache and memory are incoherent: A' ≠ A. An I/O output of A
   from memory gives the stale value 100.
c) I/O inputs 440 directly into B in memory. The cache still holds B' = 200,
   so cache and memory are incoherent: B' ≠ B.
Some Simple Definitions

Write Back: write modified data from cache to memory only when necessary.
  Performance: good, because it doesn't tie up memory bandwidth.
  Coherency issues: can have problems, with various copies containing
  different values.

Write Through: write modified data from cache to memory immediately.
  Performance: not so good - uses a lot of memory bandwidth.
  Coherency issues: modified values are always written to memory, so the data
  always matches.

(A small sketch contrasting the two policies follows.)
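A toy software model of the two mechanisms above, just to make the difference
concrete; the structure and names (store_word, write_back, a one-block cache)
are invented for this sketch and are not how real hardware is built:

#include <stdbool.h>
#include <stdio.h>

enum policy { WRITE_BACK, WRITE_THROUGH };

int memory[16];                       /* "main memory"                       */
struct cache_line { int addr; int value; bool dirty; } line = { -1, 0, false };

/* A processor store to an address held in the cache. */
void store_word(enum policy p, int addr, int value)
{
    line.addr  = addr;
    line.value = value;               /* the cache copy is always updated    */
    if (p == WRITE_THROUGH)
        memory[addr] = value;         /* memory updated immediately: costs   */
                                      /* bus/memory bandwidth on every store */
    else
        line.dirty = true;            /* memory updated only later, when the */
                                      /* block is evicted or snooped         */
}

/* Eviction or snoop-forced write back under the write-back policy. */
void write_back(void)
{
    if (line.dirty) {
        memory[line.addr] = line.value;
        line.dirty = false;
    }
}

int main(void)
{
    store_word(WRITE_BACK, 3, 99);
    printf("after store: cache=%d memory=%d (stale until write back)\n",
           line.value, memory[3]);
    write_back();
    printf("after write back: memory=%d\n", memory[3]);
    return 0;
}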

What Does Coherency Mean?

• Informally:
– “Any read must return the most recent write”
– Too strict and too difficult to implement
• Better:
– “Any write must eventually be seen by a read”
– All writes are seen in proper order (“serialization”)
• Two rules to ensure this:
– “If P writes x and P1 reads it, P’s write will be seen by P1 if the
read and write are sufficiently far apart”
– Writes to a single location are serialized:
seen in one order
• Latest write will be seen
• Otherwise could see writes in illogical order
(could see older value after a newer value)

There are Different Types of Memory in the Cache

What kinds of memory are there in the cache? Consider a critical section like
this one (a C sketch of such a lock follows the table):

  Test_and_set(lock)
  shared_data = xyz;
  Clear(lock);

TYPE            Shared?     Writable?   How Kept Coherent
Code            Shared      No          No need.
Private Data    Exclusive   Yes         Write Back
Shared Data     Shared      Yes         Write Back *
Interlock Data  Shared      Yes         Write Through **

* Write Back gives good performance, but if you use Write Through here, there
  will be performance degradation.
** Write Through here means the lock state is seen immediately; you want
  Write Through here to flush the cache.
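A sketch of the Test_and_set / Clear pseudocode above using C11's atomic_flag;
shared_data and xyz are the placeholders from the slide, and the lock word is
exactly the kind of "interlock data" listed in the table:

#include <stdatomic.h>

atomic_flag lock = ATOMIC_FLAG_INIT;
int shared_data;

void update_shared(int xyz)          /* xyz is a placeholder value, as above */
{
    /* Test_and_set(lock): atomically read the old value and set the flag;
       keep spinning while some other processor already holds the lock. */
    while (atomic_flag_test_and_set(&lock))
        ;                            /* spin */

    shared_data = xyz;               /* critical section */

    atomic_flag_clear(&lock);        /* Clear(lock): release */
}

int main(void)
{
    update_shared(5);
    return shared_data == 5 ? 0 : 1;
}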
Potential HW Coherency Solutions

• Snooping Solution (Snoopy Bus):


– Send all requests for data to all processors
– Processors snoop to see if they have a copy and respond accordingly
– Requires broadcast, since caching information is at processors
– Works well with bus (natural broadcast medium)
– Dominates for small scale machines (most of the market)
• Directory-Based Schemes (a directory-entry sketch follows this list)
– Keep track of what is being shared in one centralized place
– Distributed memory => distributed directory for scalability
(avoids bottlenecks)
– Send point-to-point requests to processors via network
– Scales better than Snooping
– Actually existed BEFORE Snooping-based schemes
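As a rough illustration of what a directory keeps per memory block, here is a
hypothetical directory entry with a sharer bit vector; the field names, the
64-processor limit, and the helper function are all invented for this sketch:

#include <stdint.h>
#include <stdio.h>

enum dir_state { UNCACHED, SHARED_CLEAN, EXCLUSIVE_DIRTY };

struct dir_entry {
    enum dir_state state;     /* analogous to the block states two slides on */
    uint64_t       sharers;   /* bit i set => processor i has a copy         */
};

/* On a write miss from processor p, the home node consults the entry and
   sends point-to-point invalidations only to processors whose bits are set;
   no broadcast is needed. */
static inline uint64_t invalidation_targets(const struct dir_entry *e, int p)
{
    return e->sharers & ~(UINT64_C(1) << p);
}

int main(void)
{
    struct dir_entry e = { SHARED_CLEAN, (1u << 0) | (1u << 3) };  /* P0, P3 */
    printf("invalidate mask for a write by P0: 0x%llx\n",
           (unsigned long long)invalidation_targets(&e, 0));
    return 0;
}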

An Example Snoopy Protocol (Maintained by Hardware)

Invalidation protocol, write-back cache


Each block of memory is in one state:
Clean in all caches and up-to-date in memory (Shared)
OR Dirty in exactly one cache (Exclusive)
OR Not in any caches
Each cache block is in one state (track these):
Shared : block can be read
OR Exclusive : cache has the only copy, it's writeable, and dirty
OR Invalid : block contains no data
Read misses: cause all caches to snoop bus
Writes to a clean line are treated as misses

Snoopy-Cache State Machine I

State machine for CPU requests, for each cache block (applies to write-back
data). The block states are those of the previous slide:

• Invalid: a CPU read places a read miss on the bus and moves the block to
  Shared; a CPU write places a write miss on the bus and moves it to Exclusive.
• Shared (read only): a CPU read hit stays in Shared; a CPU read miss places a
  read miss on the bus and refills the block in Shared; a CPU write places a
  write miss on the bus and moves the block to Exclusive.
• Exclusive (read/write): CPU read hits and write hits stay in Exclusive; a
  CPU read miss writes the block back, places a read miss on the bus, and the
  new block is loaded Shared; a CPU write miss writes the block back, places a
  write miss on the bus, and the new block is loaded Exclusive.
Snoopy-Cache State Machine II

State machine for bus requests, for each cache block (Appendix E gives details
of the bus requests). A code sketch of both state machines follows.

• Shared (read only): a write miss on the bus for this block invalidates it;
  a read miss for this block leaves it Shared.
• Exclusive (read/write): a read miss on the bus for this block forces the
  cache to write the block back (aborting the memory access) and drop to
  Shared; a write miss for this block forces a write back (aborting the memory
  access) and drops the block to Invalid.
• Invalid: bus requests for this block have no effect.
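The two state machines above can be written down compactly as transition
functions. The following is a sketch, not the book's implementation; the enum
and function names are invented, and details such as choosing a victim block
are omitted:

#include <stdbool.h>
#include <stdio.h>

enum block_state { INVALID, SHARED, EXCLUSIVE };
enum cpu_op  { CPU_READ, CPU_WRITE };
enum bus_op  { BUS_NONE, BUS_READ_MISS, BUS_WRITE_MISS };

/* CPU-request side (State Machine I). *bus reports what to place on the bus;
   *write_back reports whether the dirty block must be written back first.  */
enum block_state cpu_transition(enum block_state s, enum cpu_op op, bool hit,
                                enum bus_op *bus, bool *write_back)
{
    *bus = BUS_NONE;
    *write_back = false;

    switch (s) {
    case INVALID:
        *bus = (op == CPU_READ) ? BUS_READ_MISS : BUS_WRITE_MISS;
        return (op == CPU_READ) ? SHARED : EXCLUSIVE;
    case SHARED:
        if (op == CPU_READ) {                 /* hit stays Shared; a miss    */
            if (!hit) *bus = BUS_READ_MISS;   /* just refetches into Shared  */
            return SHARED;
        }
        *bus = BUS_WRITE_MISS;                /* write: gain ownership       */
        return EXCLUSIVE;
    case EXCLUSIVE:
        if (hit) return EXCLUSIVE;            /* read/write hits stay put    */
        *write_back = true;                   /* miss: write dirty block back*/
        *bus = (op == CPU_READ) ? BUS_READ_MISS : BUS_WRITE_MISS;
        return (op == CPU_READ) ? SHARED : EXCLUSIVE;
    }
    return s;
}

/* Bus-request side (State Machine II): what a snooping cache does when it
   sees someone else's miss for a block it holds.                           */
enum block_state snoop_transition(enum block_state s, enum bus_op seen,
                                  bool *write_back)
{
    *write_back = false;
    if (s == SHARED && seen == BUS_WRITE_MISS)
        return INVALID;                       /* someone else wants to write */
    if (s == EXCLUSIVE && seen == BUS_READ_MISS) {
        *write_back = true;                   /* supply the data, abort the  */
        return SHARED;                        /* memory access, keep a copy  */
    }
    if (s == EXCLUSIVE && seen == BUS_WRITE_MISS) {
        *write_back = true;
        return INVALID;
    }
    return s;                                 /* otherwise no change         */
}

int main(void)
{
    enum bus_op bus; bool wb;
    enum block_state s = cpu_transition(INVALID, CPU_WRITE, false, &bus, &wb);
    printf("Invalid + CPU write -> %s (bus op %d)\n",
           s == EXCLUSIVE ? "Exclusive" : "other", (int)bus);
    return 0;
}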

Example

We will trace the following sequence of accesses, filling in the table one
step at a time on the next few slides:

  P1: Write 10 to A1
  P1: Read A1
  P2: Read A1
  P2: Write 20 to A1
  P2: Write 40 to A2

The table records, for each step, the state/address/value in P1's cache and in
P2's cache, the bus action (action, processor, address, value), and the memory
contents. Assume the initial cache state is Invalid and that A1 and A2 map to
the same cache block, but A1 ≠ A2. The state diagram shown on the slide is the
cache for P1: remote reads and writes seen on the bus cause the write backs
and invalidations described on the previous two slides.
Example: Step 1

P1 writes 10 to A1. P1's block is Invalid, so P1 places a write miss on the
bus (WrMs P1 A1) and moves to Exclusive with A1 = 10. P2's cache and memory
are unchanged. (In the state machine, this is the Invalid to Exclusive arc
taken on a CPU write, placing a write miss on the bus.)
Example: Step 2

P1 reads A1. This is a read hit in P1's cache (Exclusive, A1 = 10), so nothing
is placed on the bus and no state changes.
Example: Step 3

P2 reads A1. P2 places a read miss on the bus (RdMs P2 A1). P1 snoops the
miss, writes its dirty block back (WrBk P1 A1 10, so memory now holds
A1 = 10), and downgrades from Exclusive to Shared. P2 then receives the data
(RdDa P2 A1 10) and caches A1 = 10 in the Shared state.
Example: Step 4

P2 writes 20 to A1. P2 places a write miss on the bus (WrMs P2 A1). P1 snoops
the miss and invalidates its Shared copy; P2's block becomes Exclusive with
A1 = 20. Memory still holds the stale value A1 = 10.
Example: Step 5

P2 writes 40 to A2. A2 maps to the same cache block as A1, so P2 must replace
its dirty copy of A1: it places a write miss on the bus (WrMs P2 A2), writes
the old block back (WrBk P2 A1 20, so memory now holds A1 = 20), and caches
A2 = 40 in the Exclusive state.

The complete trace (initial cache state Invalid; A1 and A2 map to the same
cache block, but A1 ≠ A2; a simulation sketch follows):

step                  P1               P2               Bus                 Memory
P1: Write 10 to A1    Excl. A1 10                       WrMs P1 A1
P1: Read A1           Excl. A1 10
P2: Read A1           Shar. A1                          RdMs P2 A1
                      Shar. A1 10                       WrBk P1 A1 10       A1 10
                      Shar. A1 10      Shar. A1 10      RdDa P2 A1 10       A1 10
P2: Write 20 to A1    Inv.             Excl. A1 20      WrMs P2 A1          A1 10
P2: Write 40 to A2                                      WrMs P2 A2          A1 10
                                       Excl. A2 40      WrBk P2 A1 20       A1 20
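To tie the example together, here is a small, hypothetical simulation that
replays the five steps with one cache block per processor and prints the bus
actions; its structure and names are invented for illustration, but under the
slide's assumptions its output reproduces the trace in the table above:

#include <stdio.h>

enum state { INV, SHA, EXC };
static const char *name[] = { "Inv.", "Shar.", "Excl." };

struct block { enum state st; int addr; int val; };

int mem[3];                  /* mem[1] = A1, mem[2] = A2                     */
struct block cache[2];       /* one block each for P1 (index 0), P2 (index 1) */

static void write_back(int p)
{
    if (cache[p].st == EXC) {                 /* only dirty (Exclusive) data */
        mem[cache[p].addr] = cache[p].val;    /* goes back to memory         */
        printf("  WrBk P%d A%d %d\n", p + 1, cache[p].addr, cache[p].val);
    }
}

static void cpu_write(int p, int addr, int val)
{
    int other = 1 - p;
    printf("P%d: Write %d to A%d\n", p + 1, val, addr);
    if (cache[p].st == EXC && cache[p].addr == addr) {
        cache[p].val = val;                   /* write hit: no bus traffic   */
        return;
    }
    printf("  WrMs P%d A%d\n", p + 1, addr);  /* place write miss on bus     */
    if (cache[other].addr == addr && cache[other].st != INV) {
        write_back(other);                    /* owner writes back...        */
        cache[other].st = INV;                /* ...then invalidates         */
    }
    if (cache[p].st == EXC && cache[p].addr != addr)
        write_back(p);                        /* replace our own dirty block */
    cache[p] = (struct block){ EXC, addr, val };
}

static void cpu_read(int p, int addr)
{
    int other = 1 - p;
    printf("P%d: Read A%d\n", p + 1, addr);
    if (cache[p].addr == addr && cache[p].st != INV)
        return;                               /* read hit                    */
    printf("  RdMs P%d A%d\n", p + 1, addr);  /* place read miss on bus      */
    if (cache[other].addr == addr && cache[other].st == EXC) {
        write_back(other);                    /* dirty copy written back...  */
        cache[other].st = SHA;                /* ...and downgraded to Shared */
    }
    if (cache[p].st == EXC) write_back(p);    /* replace our own dirty block */
    cache[p] = (struct block){ SHA, addr, mem[addr] };
    printf("  RdDa P%d A%d %d\n", p + 1, addr, mem[addr]);
}

static void show(void)
{
    for (int p = 0; p < 2; p++)
        printf("  P%d: %s A%d %d  ", p + 1, name[cache[p].st],
               cache[p].addr, cache[p].val);
    printf("Memory: A1=%d A2=%d\n", mem[1], mem[2]);
}

int main(void)
{
    cache[0] = cache[1] = (struct block){ INV, 0, 0 };   /* initially invalid */
    cpu_write(0, 1, 10); show();              /* step 1 */
    cpu_read (0, 1);     show();              /* step 2 */
    cpu_read (1, 1);     show();              /* step 3 */
    cpu_write(1, 1, 20); show();              /* step 4 */
    cpu_write(1, 2, 40); show();              /* step 5 */
    return 0;
}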

Summary
8.1 Introduction – the big picture
8.3 Centralized Shared Memory Architectures

We've looked at what happens to caches when we have multiple processors or
devices looking at memory.
