
Computer Architecture

Chapter 8
Multiprocessors
Shared Memory Architectures

Prof. Jerry Breecher


CSCI 240
Fall 2003
Chapter Overview
We're going to cover only a small part of this chapter: the material on how
caches from multiple processors interact with each other.

8.1 Introduction – the big picture


8.3 Centralized Shared Memory Architectures

The Big Picture: Where Are We Now?

The major issue is this: we've taken copies of the contents of main memory and
put them in caches closer to the processors. But what happens to those copies
if someone else wants to use the main memory data? How do we keep all copies
of the data in synch with each other?
The Multiprocessor Picture

Example: Pentium system organization.

[Figure: processors and memory share a processor/memory bus, bridged to a PCI
bus, which in turn connects to the I/O busses.]
Shared Memory Multiprocessor
[Figure: four processors, each with its own registers and caches, connected
through a chipset to memory, disk, and other I/O.]

• Memory: centralized, with Uniform Memory Access time ("UMA") and bus
  interconnect, I/O
• Examples: Sun Enterprise 6000, SGI Challenge, Intel SystemPro
Shared Memory Multiprocessor
• Several processors share one address space
  – conceptually a shared memory
  – often implemented just like a multicomputer
    • address space distributed over private memories
• Communication is implicit
  – read and write accesses to shared memory locations
• Synchronization
  – via shared memory locations
    • spin waiting for non-zero (a small sketch follows)
  – barriers

[Conceptual model: processors P connected by a network/bus to a single memory M.]
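A minimal sketch of spin waiting on a shared memory location, assuming C11
atomics and POSIX threads; the flag name "ready" and the value 42 are
illustrative choices, not from the slides:

/* Spin-wait synchronization through a shared flag (compile with -pthread). */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

atomic_int ready = 0;   /* shared flag: non-zero means "data is available" */
int shared_value;       /* data produced by one thread, consumed by another */

void *producer(void *arg)
{
    (void)arg;
    shared_value = 42;              /* write the data first                 */
    atomic_store(&ready, 1);        /* then publish the flag                */
    return NULL;
}

void *consumer(void *arg)
{
    (void)arg;
    while (atomic_load(&ready) == 0)
        ;                           /* spin, waiting for non-zero           */
    printf("consumer saw %d\n", shared_value);
    return NULL;
}

int main(void)
{
    pthread_t p, c;
    pthread_create(&c, NULL, consumer, NULL);
    pthread_create(&p, NULL, producer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return 0;
}

The shared location itself is the synchronization mechanism: the consumer
simply loops until the flag becomes non-zero.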

Message Passing Multicomputers

• Computers (nodes) connected by a network
  – Fast network interface
    • Send, receive, barrier (an MPI-style sketch follows)
  – Nodes no different from a regular PC or workstation
• Cluster of conventional workstations or PCs with a fast network
  – cluster computing
  – Berkeley NOW
  – IBM SP2

[Figure: each node pairs a processor P with its own memory M; nodes
communicate over the network.]
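The send/receive/barrier primitives map directly onto a message-passing
library. Here is a sketch using MPI, one common choice; the slides name the
machines, not a specific library, and the ranks, tag, and value sent are made
up for illustration:

/* Node 0 sends a value to node 1, then all nodes meet at a barrier.
   Compile with mpicc, run with mpirun -np 2 ./a.out. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, value = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);      /* which node am I?           */

    if (rank == 0) {
        value = 42;
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);       /* send    */
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);                              /* receive */
        printf("node 1 received %d\n", value);
    }

    MPI_Barrier(MPI_COMM_WORLD);               /* all nodes synchronize here */
    MPI_Finalize();
    return 0;
}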

Large-Scale MP Designs

Memory: distributed with nonuniform memory access time ("NUMA") and scalable
interconnect (distributed memory).

[Figure: distributed-memory nodes connected by a low-latency, high-reliability
interconnect; the access times shown range from 1 cycle (nearest) through
40 cycles to 100 cycles (most remote).]
Shared Memory Architectures

In this section we will understand the issues around:
• Sharing one memory space among several processors.
• Maintaining coherence among several copies of a data item.
The Problem of Cache Coherency

[Figure: one CPU with a cache (holding copies A' and B') in front of memory
(holding A and B) and an I/O device, shown in three snapshots.]

a) Cache and memory coherent: A' = A = 100, B' = B = 200.
b) The CPU writes 550 into A'. With a write-back cache, memory still holds
   A = 100, so cache and memory are incoherent: A' ≠ A. An I/O output of A
   from memory gives the stale value 100.
c) I/O inputs 440 directly into B in memory. The cache still holds B' = 200,
   so cache and memory are incoherent: B' ≠ B.
Some Simple Definitions

Write Back: write modified data from cache to memory only when necessary.
  Performance: good, because it doesn't tie up memory bandwidth.
  Coherency issues: can have problems, with various copies containing
  different values.

Write Through: write modified data from cache to memory immediately.
  Performance: not so good - uses a lot of memory bandwidth.
  Coherency issues: modified values are always written to memory, so the data
  always matches.

(A small sketch contrasting the two policies follows.)
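A toy software model of the two mechanisms above, just to make the difference
concrete; the structure and names (store_word, write_back, a one-block cache)
are invented for this sketch and are not how real hardware is built:

#include <stdbool.h>
#include <stdio.h>

enum policy { WRITE_BACK, WRITE_THROUGH };

int memory[16];                       /* "main memory"                       */
struct cache_line { int addr; int value; bool dirty; } line = { -1, 0, false };

/* A processor store to an address held in the cache. */
void store_word(enum policy p, int addr, int value)
{
    line.addr  = addr;
    line.value = value;               /* the cache copy is always updated    */
    if (p == WRITE_THROUGH)
        memory[addr] = value;         /* memory updated immediately: costs   */
                                      /* bus/memory bandwidth on every store */
    else
        line.dirty = true;            /* memory updated only later, when the */
                                      /* block is evicted or snooped         */
}

/* Eviction or snoop-forced write back under the write-back policy. */
void write_back(void)
{
    if (line.dirty) {
        memory[line.addr] = line.value;
        line.dirty = false;
    }
}

int main(void)
{
    store_word(WRITE_BACK, 3, 99);
    printf("after store: cache=%d memory=%d (stale until write back)\n",
           line.value, memory[3]);
    write_back();
    printf("after write back: memory=%d\n", memory[3]);
    return 0;
}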

What Does Coherency Mean?

• Informally:
– “Any read must return the most recent write”
– Too strict and too difficult to implement
• Better:
– “Any write must eventually be seen by a read”
– All writes are seen in proper order (“serialization”)
• Two rules to ensure this:
– “If P writes x and P1 reads it, P’s write will be seen by P1 if the
read and write are sufficiently far apart”
– Writes to a single location are serialized:
seen in one order
• Latest write will be seen
• Otherwise could see writes in illogical order
(could see older value after a newer value)

There are Different Types of Memory in the Cache

What kinds of memory are there in the cache? Consider a critical section like
this one (a C sketch of such a lock follows the table):

  Test_and_set(lock)
  shared_data = xyz;
  Clear(lock);

TYPE            Shared?     Writable?   How Kept Coherent
Code            Shared      No          No need.
Private Data    Exclusive   Yes         Write Back
Shared Data     Shared      Yes         Write Back *
Interlock Data  Shared      Yes         Write Through **

* Write Back gives good performance, but if you use Write Through here, there
  will be performance degradation.
** Write Through here means the lock state is seen immediately; you want
  Write Through here to flush the cache.
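A sketch of the Test_and_set / Clear pseudocode above using C11's atomic_flag;
shared_data and xyz are the placeholders from the slide, and the lock word is
exactly the kind of "interlock data" listed in the table:

#include <stdatomic.h>

atomic_flag lock = ATOMIC_FLAG_INIT;
int shared_data;

void update_shared(int xyz)          /* xyz is a placeholder value, as above */
{
    /* Test_and_set(lock): atomically read the old value and set the flag;
       keep spinning while some other processor already holds the lock. */
    while (atomic_flag_test_and_set(&lock))
        ;                            /* spin */

    shared_data = xyz;               /* critical section */

    atomic_flag_clear(&lock);        /* Clear(lock): release */
}

int main(void)
{
    update_shared(5);
    return shared_data == 5 ? 0 : 1;
}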
Potential HW Coherency Solutions

• Snooping Solution (Snoopy Bus):


– Send all requests for data to all processors
– Processors snoop to see if they have a copy and respond accordingly
– Requires broadcast, since caching information is at processors
– Works well with bus (natural broadcast medium)
– Dominates for small scale machines (most of the market)
• Directory-Based Schemes (a directory-entry sketch follows this list)
– Keep track of what is being shared in one centralized place
– Distributed memory => distributed directory for scalability
(avoids bottlenecks)
– Send point-to-point requests to processors via network
– Scales better than Snooping
– Actually existed BEFORE Snooping-based schemes
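As a rough illustration of what a directory keeps per memory block, here is a
hypothetical directory entry with a sharer bit vector; the field names, the
64-processor limit, and the helper function are all invented for this sketch:

#include <stdint.h>
#include <stdio.h>

enum dir_state { UNCACHED, SHARED_CLEAN, EXCLUSIVE_DIRTY };

struct dir_entry {
    enum dir_state state;     /* analogous to the block states two slides on */
    uint64_t       sharers;   /* bit i set => processor i has a copy         */
};

/* On a write miss from processor p, the home node consults the entry and
   sends point-to-point invalidations only to processors whose bits are set;
   no broadcast is needed. */
static inline uint64_t invalidation_targets(const struct dir_entry *e, int p)
{
    return e->sharers & ~(UINT64_C(1) << p);
}

int main(void)
{
    struct dir_entry e = { SHARED_CLEAN, (1u << 0) | (1u << 3) };  /* P0, P3 */
    printf("invalidate mask for a write by P0: 0x%llx\n",
           (unsigned long long)invalidation_targets(&e, 0));
    return 0;
}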

An Example Snoopy Protocol (Maintained by Hardware)

Invalidation protocol, write-back cache


Each block of memory is in one state:
Clean in all caches and up-to-date in memory (Shared)
OR Dirty in exactly one cache (Exclusive)
OR Not in any caches
Each cache block is in one state (track these):
Shared : block can be read
OR Exclusive : cache has the only copy, it's writeable, and dirty
OR Invalid : block contains no data
Read misses: cause all caches to snoop bus
Writes to a clean line are treated as misses

Snoopy-Cache State Machine I

State machine for CPU requests, for each cache block (applies to write-back
data). The block states are those of the previous slide:

• Invalid: a CPU read places a read miss on the bus and moves the block to
  Shared; a CPU write places a write miss on the bus and moves it to Exclusive.
• Shared (read only): a CPU read hit stays in Shared; a CPU read miss places a
  read miss on the bus and refills the block in Shared; a CPU write places a
  write miss on the bus and moves the block to Exclusive.
• Exclusive (read/write): CPU read hits and write hits stay in Exclusive; a
  CPU read miss writes the block back, places a read miss on the bus, and the
  new block is loaded Shared; a CPU write miss writes the block back, places a
  write miss on the bus, and the new block is loaded Exclusive.
Snoopy-Cache State Machine II

State machine for bus requests, for each cache block (Appendix E gives details
of the bus requests). A code sketch of both state machines follows.

• Shared (read only): a write miss on the bus for this block invalidates it;
  a read miss for this block leaves it Shared.
• Exclusive (read/write): a read miss on the bus for this block forces the
  cache to write the block back (aborting the memory access) and drop to
  Shared; a write miss for this block forces a write back (aborting the memory
  access) and drops the block to Invalid.
• Invalid: bus requests for this block have no effect.
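The two state machines above can be written down compactly as transition
functions. The following is a sketch, not the book's implementation; the enum
and function names are invented, and details such as choosing a victim block
are omitted:

#include <stdbool.h>
#include <stdio.h>

enum block_state { INVALID, SHARED, EXCLUSIVE };
enum cpu_op  { CPU_READ, CPU_WRITE };
enum bus_op  { BUS_NONE, BUS_READ_MISS, BUS_WRITE_MISS };

/* CPU-request side (State Machine I). *bus reports what to place on the bus;
   *write_back reports whether the dirty block must be written back first.  */
enum block_state cpu_transition(enum block_state s, enum cpu_op op, bool hit,
                                enum bus_op *bus, bool *write_back)
{
    *bus = BUS_NONE;
    *write_back = false;

    switch (s) {
    case INVALID:
        *bus = (op == CPU_READ) ? BUS_READ_MISS : BUS_WRITE_MISS;
        return (op == CPU_READ) ? SHARED : EXCLUSIVE;
    case SHARED:
        if (op == CPU_READ) {                 /* hit stays Shared; a miss    */
            if (!hit) *bus = BUS_READ_MISS;   /* just refetches into Shared  */
            return SHARED;
        }
        *bus = BUS_WRITE_MISS;                /* write: gain ownership       */
        return EXCLUSIVE;
    case EXCLUSIVE:
        if (hit) return EXCLUSIVE;            /* read/write hits stay put    */
        *write_back = true;                   /* miss: write dirty block back*/
        *bus = (op == CPU_READ) ? BUS_READ_MISS : BUS_WRITE_MISS;
        return (op == CPU_READ) ? SHARED : EXCLUSIVE;
    }
    return s;
}

/* Bus-request side (State Machine II): what a snooping cache does when it
   sees someone else's miss for a block it holds.                           */
enum block_state snoop_transition(enum block_state s, enum bus_op seen,
                                  bool *write_back)
{
    *write_back = false;
    if (s == SHARED && seen == BUS_WRITE_MISS)
        return INVALID;                       /* someone else wants to write */
    if (s == EXCLUSIVE && seen == BUS_READ_MISS) {
        *write_back = true;                   /* supply the data, abort the  */
        return SHARED;                        /* memory access, keep a copy  */
    }
    if (s == EXCLUSIVE && seen == BUS_WRITE_MISS) {
        *write_back = true;
        return INVALID;
    }
    return s;                                 /* otherwise no change         */
}

int main(void)
{
    enum bus_op bus; bool wb;
    enum block_state s = cpu_transition(INVALID, CPU_WRITE, false, &bus, &wb);
    printf("Invalid + CPU write -> %s (bus op %d)\n",
           s == EXCLUSIVE ? "Exclusive" : "other", (int)bus);
    return 0;
}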

Example

We will trace the following sequence of accesses, filling in the table one
step at a time on the next few slides:

  P1: Write 10 to A1
  P1: Read A1
  P2: Read A1
  P2: Write 20 to A1
  P2: Write 40 to A2

The table records, for each step, the state/address/value in P1's cache and in
P2's cache, the bus action (action, processor, address, value), and the memory
contents. Assume the initial cache state is Invalid and that A1 and A2 map to
the same cache block, but A1 ≠ A2. The state diagram shown on the slide is the
cache for P1: remote reads and writes seen on the bus cause the write backs
and invalidations described on the previous two slides.
Example: Step 1

P1 writes 10 to A1. P1's block is Invalid, so P1 places a write miss on the
bus (WrMs P1 A1) and moves to Exclusive with A1 = 10. P2's cache and memory
are unchanged. (In the state machine, this is the Invalid to Exclusive arc
taken on a CPU write, placing a write miss on the bus.)
Example: Step 2

P1 reads A1. This is a read hit in P1's cache (Exclusive, A1 = 10), so nothing
is placed on the bus and no state changes.
Example: Step 3

P2 reads A1. P2 places a read miss on the bus (RdMs P2 A1). P1 snoops the
miss, writes its dirty block back (WrBk P1 A1 10, so memory now holds
A1 = 10), and downgrades from Exclusive to Shared. P2 then receives the data
(RdDa P2 A1 10) and caches A1 = 10 in the Shared state.
Example: Step 4

P2 writes 20 to A1. P2 places a write miss on the bus (WrMs P2 A1). P1 snoops
the miss and invalidates its Shared copy; P2's block becomes Exclusive with
A1 = 20. Memory still holds the stale value A1 = 10.
Example: Step 5

P2 writes 40 to A2. A2 maps to the same cache block as A1, so P2 must replace
its dirty copy of A1: it places a write miss on the bus (WrMs P2 A2), writes
the old block back (WrBk P2 A1 20, so memory now holds A1 = 20), and caches
A2 = 40 in the Exclusive state.

The complete trace (initial cache state Invalid; A1 and A2 map to the same
cache block, but A1 ≠ A2; a simulation sketch follows):

step                  P1               P2               Bus                 Memory
P1: Write 10 to A1    Excl. A1 10                       WrMs P1 A1
P1: Read A1           Excl. A1 10
P2: Read A1           Shar. A1                          RdMs P2 A1
                      Shar. A1 10                       WrBk P1 A1 10       A1 10
                      Shar. A1 10      Shar. A1 10      RdDa P2 A1 10       A1 10
P2: Write 20 to A1    Inv.             Excl. A1 20      WrMs P2 A1          A1 10
P2: Write 40 to A2                                      WrMs P2 A2          A1 10
                                       Excl. A2 40      WrBk P2 A1 20       A1 20
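To tie the example together, here is a small, hypothetical simulation that
replays the five steps with one cache block per processor and prints the bus
actions; its structure and names are invented for illustration, but under the
slide's assumptions its output reproduces the trace in the table above:

#include <stdio.h>

enum state { INV, SHA, EXC };
static const char *name[] = { "Inv.", "Shar.", "Excl." };

struct block { enum state st; int addr; int val; };

int mem[3];                  /* mem[1] = A1, mem[2] = A2                     */
struct block cache[2];       /* one block each for P1 (index 0), P2 (index 1) */

static void write_back(int p)
{
    if (cache[p].st == EXC) {                 /* only dirty (Exclusive) data */
        mem[cache[p].addr] = cache[p].val;    /* goes back to memory         */
        printf("  WrBk P%d A%d %d\n", p + 1, cache[p].addr, cache[p].val);
    }
}

static void cpu_write(int p, int addr, int val)
{
    int other = 1 - p;
    printf("P%d: Write %d to A%d\n", p + 1, val, addr);
    if (cache[p].st == EXC && cache[p].addr == addr) {
        cache[p].val = val;                   /* write hit: no bus traffic   */
        return;
    }
    printf("  WrMs P%d A%d\n", p + 1, addr);  /* place write miss on bus     */
    if (cache[other].addr == addr && cache[other].st != INV) {
        write_back(other);                    /* owner writes back...        */
        cache[other].st = INV;                /* ...then invalidates         */
    }
    if (cache[p].st == EXC && cache[p].addr != addr)
        write_back(p);                        /* replace our own dirty block */
    cache[p] = (struct block){ EXC, addr, val };
}

static void cpu_read(int p, int addr)
{
    int other = 1 - p;
    printf("P%d: Read A%d\n", p + 1, addr);
    if (cache[p].addr == addr && cache[p].st != INV)
        return;                               /* read hit                    */
    printf("  RdMs P%d A%d\n", p + 1, addr);  /* place read miss on bus      */
    if (cache[other].addr == addr && cache[other].st == EXC) {
        write_back(other);                    /* dirty copy written back...  */
        cache[other].st = SHA;                /* ...and downgraded to Shared */
    }
    if (cache[p].st == EXC) write_back(p);    /* replace our own dirty block */
    cache[p] = (struct block){ SHA, addr, mem[addr] };
    printf("  RdDa P%d A%d %d\n", p + 1, addr, mem[addr]);
}

static void show(void)
{
    for (int p = 0; p < 2; p++)
        printf("  P%d: %s A%d %d  ", p + 1, name[cache[p].st],
               cache[p].addr, cache[p].val);
    printf("Memory: A1=%d A2=%d\n", mem[1], mem[2]);
}

int main(void)
{
    cache[0] = cache[1] = (struct block){ INV, 0, 0 };   /* initially invalid */
    cpu_write(0, 1, 10); show();              /* step 1 */
    cpu_read (0, 1);     show();              /* step 2 */
    cpu_read (1, 1);     show();              /* step 3 */
    cpu_write(1, 1, 20); show();              /* step 4 */
    cpu_write(1, 2, 40); show();              /* step 5 */
    return 0;
}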

Summary
8.1 Introduction – the big picture
8.3 Centralized Shared Memory Architectures

We've looked at what happens to caches when we have multiple processors or
devices looking at memory.
