
Computer Architecture: Memory Hierarchy Design


Computer Architecture

Chapter 5
Memory Hierarchy Design
Chapter Overview
5.1 Introduction
5.2 The ABCs of Caches
5.3 Reducing Cache Misses
5.4 Reducing Cache Miss Penalty
5.5 Reducing Hit Time
5.6 Main Memory
5.7 Virtual Memory
5.8 Protection and Examples of Virtual Memory
Introduction - The Big Picture: Where are We Now?

The Five Classic Components of a Computer:
  Processor (Control and Datapath), Memory, Input, Output

Topics in this chapter:
  SRAM Memory Technology
  DRAM Memory Technology
  Memory Organization
Introduction - The Big Picture: Where are We Now?

Levels of the Memory Hierarchy (upper levels are smaller, faster, and
costlier per bit; lower levels are larger and slower):

Level          Capacity     Access Time            Cost                     Xfer Unit (staged by)
CPU Registers  100s bytes   1s ns                                           operands, 1-8 bytes (prog./compiler)
Cache          K bytes      4 ns                   1-0.1 cents/bit          blocks, 8-128 bytes (cache controller)
Main Memory    M bytes      100 ns - 300 ns        10^-4 - 10^-5 cents/bit  pages, 512-4K bytes (OS)
Disk           G bytes      10 ms (10,000,000 ns)  10^-5 - 10^-6 cents/bit  files, Mbytes (user/operator)
Tape           infinite     sec - min              10^-8 cents/bit          (archival)
The ABCs of Caches

In this section we will:

Learn lots of definitions about caches - you can't talk about something
until you understand it (this is true in computer science at least!)

Answer some fundamental questions about caches:
  Q1: Where can a block be placed in the upper level? (Block placement)
  Q2: How is a block found if it is in the upper level? (Block identification)
  Q3: Which block should be replaced on a miss? (Block replacement)
  Q4: What happens on a write? (Write strategy)
Cache Memory

The purpose of cache memory is to speed up accesses by storing recently
used data closer to the CPU instead of in main memory. Although cache is
much smaller than main memory, its access time is a fraction of that of
main memory.

Unlike main memory, which is accessed by address, cache is typically
accessed by content; hence, it is often called content-addressable
memory.

Because of this, a single large cache memory isn't always desirable: it
takes longer to search.
Cache
Small amount of fast memory
Sits between normal main memory
and CPU
May be located on CPU chip or
module
Cache/Main Memory Structure
Cache operation – overview
CPU requests contents of memory location
Check cache for this data
If present, get from cache (fast)
If not present, read required block from main
memory to cache
Then deliver from cache to CPU
Cache includes tags to identify which block of
main memory is in each cache slot
Cache Read Operation - Flowchart
Comparison of Cache Sizes

Processor        Type                            Year  L1 cache       L2 cache       L3 cache
IBM 360/85       Mainframe                       1968  16 to 32 KB    —              —
PDP-11/70        Minicomputer                    1975  1 KB           —              —
VAX 11/780       Minicomputer                    1978  16 KB          —              —
IBM 3033         Mainframe                       1978  64 KB          —              —
IBM 3090         Mainframe                       1985  128 to 256 KB  —              —
Intel 80486      PC                              1989  8 KB           —              —
Pentium          PC                              1993  8 KB/8 KB      256 to 512 KB  —
PowerPC 601      PC                              1993  32 KB          —              —
PowerPC 620      PC                              1996  32 KB/32 KB    —              —
PowerPC G4       PC/server                       1999  32 KB/32 KB    256 KB to 1 MB 2 MB
IBM S/390 G4     Mainframe                       1997  32 KB          256 KB         2 MB
IBM S/390 G6     Mainframe                       1999  256 KB         8 MB           —
Pentium 4        PC/server                       2000  8 KB/8 KB      256 KB         —
IBM SP           High-end server/supercomputer   2000  64 KB/32 KB    8 MB           —
CRAY MTA         Supercomputer                   2000  8 KB           2 MB           —
Itanium          PC/server                       2001  16 KB/16 KB    96 KB          4 MB
SGI Origin 2001  High-end server                 2001  32 KB/32 KB    4 MB           —
Itanium 2        PC/server                       2002  32 KB          256 KB         6 MB
IBM POWER5       High-end server                 2003  64 KB          1.9 MB         36 MB
CRAY XD-1        Supercomputer                   2004  64 KB/64 KB    1 MB           —
The ABCs of Caches: Definitions

The Principle of Locality:

Programs access a relatively small portion of the address space at any
instant of time.

Three different types of locality:

Temporal Locality (locality in time): if an item is referenced, it will
tend to be referenced again soon (e.g., loops, reuse).

Spatial Locality (locality in space): if an item is referenced, items
whose addresses are close by tend to be referenced soon (e.g.,
straight-line code, array access).

Sequential Locality: instructions tend to be executed in the sequential
order of the program, except at branch instructions.
A few terms

Inclusion Property
Coherence Property
Access frequency
Access time
Cycle time
Latency
Bandwidth
Capacity
Unit of transfer
The ABCs of Caches: Definitions

Memory Hierarchy: Terminology

Hit: data appears in some block in the upper level (example: Block X)
  Hit Rate: the fraction of memory accesses found in the upper level
  Hit Time: time to access the upper level, which consists of
    upper-level access time + time to determine hit/miss

Miss: data needs to be retrieved from a block in the lower level (Block Y)
  Miss Rate = 1 - (Hit Rate)
  Miss Penalty: time to replace a block in the upper level +
    time to deliver the block to the processor

Consider a memory with three levels:

Average memory access time (assuming a hit at the 3rd level)
  = h1 * t1 + (1 - h1) * [t1 + h2 * t2 + (1 - h2) * (t2 + t3)]
where t1, t2 and t3 are the access times at the three levels

Access frequency of level Mi: fi = (1 - h1)(1 - h2) ... (1 - h(i-1)) * hi

Effective access time = sum over all levels i of (fi * ti)
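As a quick numeric check, the three-level average-access-time formula above can be evaluated directly. The hit rates and access times below are illustrative assumptions, not values from the text:

```python
def amat_three_level(h1, h2, t1, t2, t3):
    """Average memory access time for three levels, assuming the third
    level always hits: on a miss at a level we pay that level's access
    time and continue down, as in the formula above."""
    return h1 * t1 + (1 - h1) * (t1 + h2 * t2 + (1 - h2) * (t2 + t3))

# Illustrative values: 1 ns cache, 10 ns second level, 100 ns main memory.
h1, h2 = 0.95, 0.99
t1, t2, t3 = 1.0, 10.0, 100.0

amat = amat_three_level(h1, h2, t1, t2, t3)

# The expression simplifies to t1 + (1-h1)*t2 + (1-h1)*(1-h2)*t3:
simplified = t1 + (1 - h1) * t2 + (1 - h1) * (1 - h2) * t3
print(amat)   # 1.55 ns with these numbers
```

The simplified form makes the structure visible: each level's access time is weighted by the probability of having to go down that far.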

The ABCs of Caches: Definitions

Cache Measures

Hit rate: fraction of accesses found in that level
  Usually so high that we talk about the miss rate instead

Average memory-access time
  = Hit time + Miss rate x Miss penalty (ns or clocks)

Miss penalty: time to replace a block from the lower level,
including the time to deliver it to the CPU
  access time: time to reach the lower level
    = f(latency to lower level)
  transfer time: time to transfer the block
    = f(bandwidth between upper and lower levels)
Measures

CPU execution time = (CPU clock cycles + Memory stall cycles) * Clock cycle time
  CPU clock cycles include cache hits; the CPU is stalled during misses

Memory stall cycles
  = Number of misses * Miss penalty
  = IC * (Misses / Instruction) * Miss penalty
  = IC * (Memory accesses / Instruction) * Miss rate * Miss penalty

Miss rates and miss penalties are different for reads and writes:

Memory stall cycles
  = IC * (Reads / Instruction) * Read miss rate * Read miss penalty
  + IC * (Writes / Instruction) * Write miss rate * Write miss penalty

Misses / Instruction
  = (Miss rate * Memory accesses) / Instruction count
  = Miss rate * (Memory accesses / Instruction)
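Plugging illustrative numbers into the stall-cycle formulas above (the instruction count, accesses per instruction, miss rate, and penalty are assumptions chosen for the example, not values from the text):

```python
def memory_stall_cycles(ic, accesses_per_instr, miss_rate, miss_penalty):
    """IC * (Memory accesses / Instruction) * Miss rate * Miss penalty."""
    return ic * accesses_per_instr * miss_rate * miss_penalty

def cpu_time(cpu_clock_cycles, stall_cycles, clock_cycle_time):
    """(CPU clock cycles + Memory stall cycles) * Clock cycle time."""
    return (cpu_clock_cycles + stall_cycles) * clock_cycle_time

ic = 1_000_000                                   # instruction count
stalls = memory_stall_cycles(ic, accesses_per_instr=1.5,
                             miss_rate=0.02, miss_penalty=100)
print(stalls)                                    # 3,000,000 stall cycles

# With 2 base cycles per instruction and a 1 ns clock cycle:
print(cpu_time(2 * ic, stalls, 1e-9))            # 0.005 seconds
```

Note how heavily the stall term can dominate: here a 2% miss rate more than doubles the execution time.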
Typical Cache Organization
The ABCs of Caches: Definitions

Simplest Cache: Direct Mapped

Consider a memory of 16 one-byte locations (addresses 0 through F) and a
4-byte direct-mapped cache (cache indexes 0 through 3).

Address<1:0> => cache index

Cache location 0 can be occupied by data from memory location 0, 4, 8, or
C - in general, by any memory location whose 2 LSBs of the address are 0s.

Which one should we place in the cache?
How can we tell which one is in the cache?
Cache Memory

Where can a block be placed in the cache?

Block 12 of main memory is placed in an 8-block cache under the three
placement policies (S.A. mapping = block number modulo number of sets):

Fully associative:      block 12 can go anywhere (block frames 0-7)
Direct mapped:          block 12 can go only into block frame 4 (12 mod 8)
2-way set associative:  block 12 can go anywhere in set 0 (12 mod 4),
                        i.e., block frames 0 or 1
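The three placement rules above can be sketched as one function parameterized by associativity; the function name and interface are ours, chosen for illustration:

```python
def candidate_frames(block_no, n_frames, assoc):
    """Return the cache block frames that block_no may occupy, given the
    associativity (assoc=1: direct mapped; assoc=n_frames: fully
    associative). Set-associative mapping: set = block number mod number
    of sets; the block may go in any way of that set."""
    n_sets = n_frames // assoc
    s = block_no % n_sets
    return [s * assoc + way for way in range(assoc)]

print(candidate_frames(12, 8, 8))   # fully associative: anywhere (0-7)
print(candidate_frames(12, 8, 1))   # direct mapped: only frame 4 (12 mod 8)
print(candidate_frames(12, 8, 2))   # 2-way: set 0 (12 mod 4) -> frames 0 and 1
```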
The ABCs of Caches

How is a block found if it is in the cache?

Each entry in the cache stores words along with a tag on each block.
There is no need to check the index or block offset in the comparison.

Address = | Tag | Index | Byte Offset |
Cache Memory

To take advantage of spatial locality, a cache stores multiple words per
block. The diagram below is a schematic of what a cache looks like.

Block 0 contains multiple words from main memory, identified with the tag
00000000. Block 1 contains words identified with the tag 11110101. The
other two blocks are not valid.
Cache Memory

As an example, suppose a program generates the address 1AA. In 14-bit
binary, this number is: 00000110101010.

The first 7 bits of this address go in the tag field, the next 4 bits go
in the block field, and the final 3 bits indicate the word within the
block.
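The field split above can be reproduced with a few shifts and masks (the helper name is ours):

```python
def split_address(addr, tag_bits=7, block_bits=4, word_bits=3):
    """Split a 14-bit address into (tag, block, word) fields,
    matching the 7/4/3 layout described above."""
    word = addr & ((1 << word_bits) - 1)
    block = (addr >> word_bits) & ((1 << block_bits) - 1)
    tag = (addr >> (word_bits + block_bits)) & ((1 << tag_bits) - 1)
    return tag, block, word

tag, block, word = split_address(0x1AA)
print(f"{tag:07b} | {block:04b} | {word:03b}")   # 0000011 | 0101 | 010
```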
Cache Organizations I: Direct-Mapped Cache

The 32-bit block address divides into a cache tag (bits 31-9, example:
0x50), a cache index (bits 8-4, ex: 0x01), and a byte select (bits 3-0,
ex: 0x00).

Each cache entry holds a valid bit and a cache tag (stored as part of the
cache "state") plus 32 bytes of cache data. With 32 lines of 32 bytes
(Byte 0 through Byte 1023): index 0 holds Bytes 0-31, index 1 holds
Bytes 32-63, and so on, up to index 31 holding Bytes 992-1023.
Direct Mapping: Address Structure

| Tag (s-r): 8 bits | Line or slot (r): 14 bits | Word (w): 2 bits |

24-bit address
2-bit word identifier (4-byte block)
22-bit block identifier
8-bit tag (= 22 - 14)
14-bit slot or line

No two blocks that map to the same line have the same tag field.
Check the contents of the cache by finding the line and checking the tag.
Direct Mapping: Cache Line Table

Cache line   Main memory blocks held
0            0, m, 2m, 3m, ..., 2^s - m
1            1, m+1, 2m+1, ..., 2^s - m + 1
...
m-1          m-1, 2m-1, 3m-1, ..., 2^s - 1
Direct Mapping Cache Organization
Direct Mapping: Pros and Cons

Simple
Inexpensive
Fixed location for a given block
  If a program repeatedly accesses 2 blocks that map to the same line,
  the miss rate is very high
Associative Mapping

A main memory block can load into any line of the cache.
The memory address is interpreted as a tag and a word.
The tag uniquely identifies a block of memory.
Every line's tag is examined for a match.
Cache searching gets expensive.
Fully Associative Cache Organization
Associative Mapping: Address Structure

| Tag: 22 bits | Word: 2 bits |

A 22-bit tag is stored with each 32-bit block of data.
Compare the tag field with the tag entries in the cache to check for a hit.
The least significant 2 bits of the address identify which byte is
required from the 4-byte data block.

e.g.
Address   Tag       Data       Cache line
FFFFFC    FFFFFC    24682468   3FFF
Cache Organizations II: Set-Associative Cache

As in the direct-mapped organization, the 32-bit block address divides
into a cache tag (example: 0x50), a cache index (ex: 0x01, taken mod 16),
and a byte select (ex: 0x00); the valid bit and cache tag are stored as
part of the cache "state".

Here the cache data lines (Byte 0 through Byte 1023) are grouped into
16 sets, Set 0 through Set 15, and the index selects a set rather than a
single line.
Set Associative Mapping

The cache is divided into a number of sets.
Each set contains a number of lines.
A given block maps to any line in a given set
  (e.g., block B can be in any line of set i).

e.g., with 2 lines per set:
  2-way associative mapping
  A given block can be in one of 2 lines, in only one set.

Set Associative Mapping Example

13-bit set number
Block number in main memory is taken modulo 2^13
000000, 00A000, 00B000, 00C000 ... map to the same set
Two Way Set Associative Cache Organization

Set Associative Mapping: Address Structure

| Tag: 9 bits | Set: 13 bits | Word: 2 bits |

Use the set field to determine which cache set to look in, then compare
the tag field to see if we have a hit.

e.g.
Address     Tag    Data       Set number
1FF 7FFC    1FF    12345678   1FFF
001 7FFC    001    11223344   1FFF

Two Way Set Associative Mapping Example
Replacement Algorithms (1): Direct Mapping

No choice: each block maps to only one line, so replace that line.

Replacement Algorithms (2): Associative and Set Associative

Hardware-implemented algorithms (for speed):

Least recently used (LRU)
  e.g., in a 2-way set associative cache: which of the 2 blocks is LRU?
First in first out (FIFO)
  replace the block that has been in the cache longest
Least frequently used (LFU)
  replace the block that has had the fewest hits
Random
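A minimal sketch of LRU for one 2-way set, as in the example above: a single bit per set is enough to track which of the two lines was used least recently. The class and its interface are illustrative, not from the text:

```python
class TwoWaySet:
    """One set of a 2-way set-associative cache with LRU replacement."""

    def __init__(self):
        self.tags = [None, None]
        self.lru = 0                                 # index of the LRU line

    def access(self, tag):
        """Return True on a hit; on a miss, replace the LRU line."""
        if tag in self.tags:
            self.lru = 1 - self.tags.index(tag)      # other line becomes LRU
            return True
        self.tags[self.lru] = tag                    # victim is the LRU line
        self.lru = 1 - self.lru
        return False

s = TwoWaySet()
print([s.access(t) for t in ("A", "B", "A", "C", "B")])
# [False, False, True, False, False] - "B" was the LRU victim when "C" arrived
```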
Write Policy

A cache block must not be overwritten unless main memory is up to date:
multiple CPUs may have individual caches, and I/O may address main
memory directly.

Write through

All writes go to main memory as well as the cache.
Multiple CPUs can monitor main memory traffic to keep their local caches
up to date.
Generates lots of traffic and slows down writes.
Remember bogus write-through caches!

Write back

Updates are initially made in the cache only. An update bit for the cache
slot is set when an update occurs. When a block is replaced, it is
written back to main memory only if its update bit is set.
Other caches can get out of sync, and I/O must access main memory
through the cache.
N.B. 15% of memory references are writes.
Cache Memory: Let's Do An Example

The Memory Addresses We'll Be Using

Here's a number of addresses. We'll be asking for the data at these
addresses and see what happens to the cache when we do so.

The cache:
1. Is direct mapped.
2. Contains 512 bytes.
3. Has 16 sets.
4. Each set can hold 32 bytes, or 1 cache line.

An address therefore splits into a tag (bits 31-9), a 4-bit set index
(bits 8-5), and a 5-bit byte offset (bits 4-0):

Address   Tag (bits 31-9)            Set    Offset   Result
1090      00000000000000000000010    0010   00010    Miss
1440      00000000000000000000010    1101   00000    Miss
5000      xxxxxxxxxxxxxxxxxxxxxxx    xxxx   01000
1470      xxxxxxxxxxxxxxxxxxxxxxx    xxxx   xxxxx

Here's the Cache We'll Be Touching

Initially the cache is empty:

Set address   V   Tag   Data (can hold a 32-byte cache line)
 0 (0000)     N
 1 (0001)     N
 2 (0010)     N
 3 (0011)     N
 4 (0100)     N
 5 (0101)     N
 6 (0110)     N
 7 (0111)     N
 8 (1000)     N
 9 (1001)     N
10 (1010)     N
11 (1011)     N
12 (1100)     N
13 (1101)     N
14 (1110)     N
15 (1111)     N
Cache Memory: Doing Some Cache Action

We want to READ data from address 1090 = 010 | 0010 | 00010
(tag | set | offset).

For reference, here is the tag/set/offset breakdown of every address
used in this example:

Addr.   Tag    Set    Offset
256     0000   1000   00000
512     0001   0000   00000
1024    0010   0000   00000
1090    0010   0010   00010
1099    0010   0010   01011
1440    0010   1101   00000
1470    0010   1101   11110
1600    0011   0010   00000
1620    0011   0010   10100
2048    0100   0000   00000
4096    1000   0000   00000
5000    1001   1100   01000

Set 2 (0010) is invalid, so this read misses. The cache loads the line
containing address 1090: set 2 becomes valid (V = Y) with tag 00000...10,
holding the data from memory locations 1088 - 1119. All other sets
remain invalid (V = N).
Cache Memory: Doing Some Cache Action

We want to READ data from address 1440 = 010 | 1101 | 00000
(tag | set | offset).

Set 13 (1101) is invalid, so this read misses. Set 13 becomes valid with
tag 00000...10, holding the data from memory locations 1440 - 1471.
Set 2 still holds the data from locations 1088 - 1119.
Cache Memory: Doing Some Cache Action

We want to READ data from address 5000 = 1001 | 1100 | 01000
(tag | set | offset).

Set 12 (1100) is invalid, so this read misses. Set 12 becomes valid with
tag 00000...1001, holding the data from memory locations 4992 - 5023.
Sets 2 (locations 1088 - 1119) and 13 (locations 1440 - 1471) are
unchanged.
Cache Memory: Doing Some Cache Action

We want to READ data from address 1470 = 0010 | 1101 | 11110
(tag | set | offset).

Set 13 (1101) is valid and its tag (00000...10) matches, so this read is
a hit: the requested byte (offset 11110 within the line holding
locations 1440 - 1471) is delivered from the cache, and nothing in the
cache changes.
Cache Memory: Doing Some Cache Action

We want to READ data from address 1600 = 0011 | 0010 | 00000
(tag | set | offset).

Set 2 (0010) is valid, but its tag (00000...10, the line holding
locations 1088 - 1119) does not match 0011, so this read misses. The old
line is replaced: set 2 now holds tag 00000...0011 and the data from
memory locations 1600 - 1631.
Cache Memory: Doing Some Cache Action

We want to WRITE data to address 256 = 0000 | 1000 | 00000
(tag | set | offset).

Set 8 (1000) is invalid, so this write misses. The block is allocated:
set 8 becomes valid with tag 00000...0000, holding the data from memory
locations 256 - 287, and the write is performed.
Cache Memory: Doing Some Cache Action

We want to WRITE data to address 1620 = 0011 | 0010 | 10100
(tag | set | offset).

Set 2 (0010) is valid and its tag (00000...0011) matches, so this write
is a hit: the cached line holding locations 1600 - 1631 is updated in
place.
Cache Memory: Doing Some Cache Action

We want to WRITE data to address 1099 = 0010 | 0010 | 01011
(tag | set | offset).

Set 2 (0010) is valid, but its tag (00000...0011, the line holding
locations 1600 - 1631) does not match 0010, so this write misses. The
old line is replaced: set 2 now holds tag 00000...0010 and the data from
memory locations 1088 - 1119, and the write is performed.
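The whole walkthrough can be replayed in a few lines of code. This sketch models only the set-index/tag bookkeeping of the 512-byte, 16-set, 32-byte-line direct-mapped cache (no data array, and it allocates on write misses, as the example does):

```python
LINE_BYTES = 32
NUM_SETS = 16

def simulate(addresses):
    """Replay accesses against the direct-mapped cache of the example:
    set index = bits 8-5 of the address, tag = bits 31-9. On a miss the
    line is (re)loaded, for writes as well as reads."""
    cache = {}                                   # set index -> resident tag
    results = []
    for addr in addresses:
        set_index = (addr // LINE_BYTES) % NUM_SETS
        tag = addr // (LINE_BYTES * NUM_SETS)
        if cache.get(set_index) == tag:
            results.append("hit")
        else:
            cache[set_index] = tag               # allocate / replace the line
            results.append("miss")
    return results

# READs of 1090, 1440, 5000, 1470, 1600, then WRITEs to 256, 1620, 1099:
print(simulate([1090, 1440, 5000, 1470, 1600, 256, 1620, 1099]))
# ['miss', 'miss', 'miss', 'hit', 'miss', 'miss', 'hit', 'miss']
```

The output matches the slide-by-slide results: only the 1470 read and the 1620 write hit, because they land in a set that still holds a line with a matching tag.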
Cache Memory

What happens on a write?

Write through - the information is written to both the block in the
cache and to the block in the lower-level memory.

Write back - the information is written only to the block in the cache.
The modified cache block is written to main memory only when it is
replaced.
  Is the block clean or dirty?

Write through is always combined with write buffers so that the
processor doesn't wait for the lower-level memory.
Cache Memory

Write Buffer for Write Through

  Processor -> Cache
  Processor -> Write Buffer -> DRAM

A write buffer is needed between the cache and memory:
  Processor: writes data into the cache and the write buffer.
  Memory controller: writes the contents of the buffer to memory.

The write buffer is just a FIFO:
  Typical number of entries: 4.
  Must handle bursts of writes.
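A toy model of the FIFO behavior described above, assuming the typical 4-entry buffer (the class and method names are ours, for illustration only):

```python
from collections import deque

class WriteBuffer:
    """FIFO write buffer sitting between the cache and DRAM."""

    def __init__(self, capacity=4):            # typical number of entries: 4
        self.entries = deque()
        self.capacity = capacity

    def cpu_write(self, addr, data):
        """From the CPU's point of view the write completes immediately,
        unless the buffer is full, in which case the processor must
        stall (returns False)."""
        if len(self.entries) == self.capacity:
            return False
        self.entries.append((addr, data))
        return True

    def memory_drain(self):
        """Memory controller retires the oldest buffered write, if any."""
        return self.entries.popleft() if self.entries else None

wb = WriteBuffer()
print([wb.cpu_write(a, 0) for a in (100, 104, 108, 112, 116)])
# [True, True, True, True, False] - a burst of 5 writes overflows 4 entries
print(wb.memory_drain())   # (100, 0): the oldest write reaches memory first
```

This is why the buffer "must handle bursts of writes": the fifth write in the burst stalls until the memory controller drains an entry.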
Cache Memory: Write-Miss Policy

Write Allocate vs. No-Write Allocate

Assume a 16-bit (sub-block) write to memory location 0x0 causes a miss.
Do we allocate space in the cache and possibly read in the block?
  Yes: write allocate (typical of write-back caches)
  No: no-write allocate (typical of write-through caches)

Example:
  WriteMem[100]
  WriteMem[100]
  ReadMem[200]
  WriteMem[200]
  WriteMem[100]

No-write allocate (NWA): four misses and one hit
Write allocate (WA): two misses and three hits
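The miss counts above can be checked by simulation. The block size and the assumption that the cache is large enough to avoid evictions are ours; all that matters for the example is that addresses 100 and 200 fall in different blocks:

```python
BLOCK_BYTES = 32    # assumed block size; 100 and 200 are in different blocks

def count_misses(trace, write_allocate):
    """Count (misses, hits) for a trace of ('R'|'W', address) pairs in a
    cache large enough that no block is ever evicted."""
    resident = set()                   # resident block numbers
    misses = hits = 0
    for op, addr in trace:
        block = addr // BLOCK_BYTES
        if block in resident:
            hits += 1
        else:
            misses += 1
            # Reads always allocate; writes allocate only under write allocate.
            if op == "R" or write_allocate:
                resident.add(block)
    return misses, hits

trace = [("W", 100), ("W", 100), ("R", 200), ("W", 200), ("W", 100)]
print(count_misses(trace, write_allocate=False))   # (4, 1): no-write allocate
print(count_misses(trace, write_allocate=True))    # (2, 3): write allocate
```

Under no-write allocate, the repeated writes to location 100 keep missing because the block is never brought in; write allocate turns all but the first into hits.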
Pentium 4 Cache

80386 - no on-chip cache
80486 - 8 KB, using 16-byte lines and a four-way set-associative
organization
Pentium (all versions) - two on-chip L1 caches (data and instructions)
Pentium III - L3 cache added off chip
Pentium 4:
  L1 caches: 8 KB, 64-byte lines, four-way set associative
  L2 cache: feeds both L1 caches; 256 KB, 128-byte lines, 8-way set
  associative
  L3 cache on chip
Intel Cache Evolution

Problem: External memory is slower than the system bus.
Solution: Add an external cache using faster memory technology.
First appears: 386

Problem: Increased processor speed makes the external bus a bottleneck
for cache access.
Solution: Move the external cache on-chip, operating at the same speed
as the processor.
First appears: 486

Problem: The internal cache is rather small, due to limited space on the
chip.
Solution: Add an external L2 cache using faster technology than main
memory.
First appears: 486

Problem: Contention occurs when both the instruction prefetcher and the
execution unit simultaneously require access to the cache; the
prefetcher is stalled while the execution unit's data access takes place.
Solution: Create separate data and instruction caches.
First appears: Pentium

Problem: Increased processor speed makes the external bus a bottleneck
for L2 cache access.
Solution: Create a separate back-side bus (BSB) that runs at higher
speed than the main (front-side) external bus; the BSB is dedicated to
the L2 cache.
First appears: Pentium Pro
Solution: Move the L2 cache onto the processor chip.
First appears: Pentium II

Problem: Some applications deal with massive databases and must have
rapid access to large amounts of data; the on-chip caches are too small.
Solution: Add an external L3 cache.
First appears: Pentium III
Solution: Move the L3 cache on-chip.
First appears: Pentium 4
Reducing Cache Misses: Classifying Misses - the 3 Cs

Compulsory - the first access to a block cannot be in the cache, so the
block must be brought into the cache. Also called cold-start misses or
first-reference misses. (These are misses in even an infinite cache.)

Capacity - if the cache cannot contain all the blocks needed during
execution of a program, capacity misses will occur due to blocks being
discarded and later retrieved. (Misses in a fully associative cache of
size X.)

Conflict - if the block-placement strategy is set associative or direct
mapped, conflict misses (in addition to compulsory and capacity misses)
will occur because a block can be discarded and later retrieved if too
many blocks map to its set. Also called collision misses or interference
misses. (Misses in an N-way associative cache of size X.)