Computer Architecture: Memory Hierarchy Design
Chapter 5
Memory Hierarchy Design
Chapter Overview
5.1 Introduction
5.2 The ABCs of Caches
5.3 Reducing Cache Misses
5.4 Reducing Cache Miss Penalty
5.5 Reducing Hit Time
5.6 Main Memory
5.7 Virtual Memory
5.8 Protection and Examples of Virtual Memory
Introduction: The Big Picture: Where Are We Now?
Inclusion Property
Coherence Property
Access frequency
Access time
Cycle time
Latency
Bandwidth
Capacity
Unit of transfer
The ABCs of Caches: Definitions
Miss: data needs to be retrieved from a block in the lower level (Block Y)
Miss Rate = 1 - (Hit Rate)
Miss Penalty: Time to replace a block in the upper level +
Time to deliver the block to the processor
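The definitions above are usually combined into one figure of merit, the average memory access time (AMAT = hit time + miss rate × miss penalty). AMAT is not defined on this slide, so treat the sketch below as an illustration under that standard formula; the function names and the numbers (950 hits in 1000 accesses, a 1-cycle hit time, a 100-cycle miss penalty) are hypothetical.

```python
def miss_rate(hits: int, accesses: int) -> float:
    """Miss rate = 1 - hit rate, where hit rate = hits / accesses."""
    return 1.0 - hits / accesses


def amat(hit_time: float, miss_rate: float, miss_penalty: float) -> float:
    """Average memory access time = hit time + miss rate * miss penalty."""
    return hit_time + miss_rate * miss_penalty


# Hypothetical workload: 950 hits out of 1000 accesses,
# 1-cycle hit time and a 100-cycle miss penalty.
mr = miss_rate(950, 1000)
print(round(mr, 3))                # 0.05
print(round(amat(1, mr, 100), 3))  # 6.0 (cycles)
```

Even a 5% miss rate multiplies the effective access time sixfold here, which is why the rest of the chapter attacks miss rate, miss penalty, and hit time separately.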
Cache Measures
[Figure: memory block numbers 0 to 31]
The ABCs of Caches: How Is a Block Found If It Is in the Cache?
Tag | Index
Cache Memory
Take advantage of spatial locality
Store multiple words
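The figure below is a schematic of what a cache looks like.
[Figure: direct-mapped cache with 32-byte blocks, lines 0 to 31 (line 31 holds Byte 992 to Byte 1023). The 32-bit address is split into a Cache Tag in bits 31 to 10 (example: 0x50), a Cache Index in bits 9 to 5 (example: 0x01), and a Byte Select in bits 4 to 0 (example: 0x00). The tag is stored as part of the cache "state".]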
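The address split in the schematic can be sketched with shifts and masks. This assumes the field boundaries read off the figure (byte select in bits 4 to 0, index in bits 9 to 5, tag in bits 31 to 10); the function name `split_address` is hypothetical. It rebuilds an address from the figure's example fields (tag 0x50, index 0x01, byte select 0x00) and splits it again.

```python
from typing import Tuple

BYTE_SELECT_BITS = 5  # 32-byte blocks: bits 4..0
INDEX_BITS = 5        # 32 lines (1 KB / 32 B): bits 9..5


def split_address(addr: int) -> Tuple[int, int, int]:
    """Return (tag, index, byte_select) for a 32-bit address."""
    byte_select = addr & ((1 << BYTE_SELECT_BITS) - 1)
    index = (addr >> BYTE_SELECT_BITS) & ((1 << INDEX_BITS) - 1)
    tag = addr >> (BYTE_SELECT_BITS + INDEX_BITS)  # remaining bits 31..10
    return tag, index, byte_select


# Rebuild the address from the figure's example fields, then split it again.
addr = (0x50 << 10) | (0x01 << 5) | 0x00
print(hex(addr))                              # 0x14020
print([hex(f) for f in split_address(addr)])  # ['0x50', '0x1', '0x0']
```

The index picks the line directly; only the tag has to be compared, which is why it is the part kept in the cache "state".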
Direct Mapping
Address structure (24-bit address):
Tag (s - r): 8 bits | Line or Slot (r): 14 bits | Word (w): 2 bits
2-bit word identifier (4-byte block)
22-bit block identifier
8-bit tag (= 22 - 14)
14-bit slot or line
No two blocks in the same line have the same Tag field.
Check the contents of the cache by finding the line and checking the Tag.
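The lookup procedure above (find the line, then compare the tag) can be sketched as follows, assuming the 8/14/2 field widths from the slide. The address 0x16339C and the helper names `split_direct` and `is_hit` are illustrative choices, not part of the slide.

```python
from typing import List, Optional, Tuple

WORD_BITS = 2   # 4-byte block
LINE_BITS = 14  # 2^14 = 16K lines; the remaining 8 bits form the tag


def split_direct(addr: int) -> Tuple[int, int, int]:
    """Split a 24-bit address into (tag, line, word)."""
    assert 0 <= addr < (1 << 24), "address must fit in 24 bits"
    word = addr & ((1 << WORD_BITS) - 1)
    line = (addr >> WORD_BITS) & ((1 << LINE_BITS) - 1)
    tag = addr >> (WORD_BITS + LINE_BITS)
    return tag, line, word


def is_hit(tags: List[Optional[int]], addr: int) -> bool:
    """Go straight to the line, then compare the one stored tag."""
    tag, line, _ = split_direct(addr)
    return tags[line] == tag


tag, line, word = split_direct(0x16339C)
print(f"tag={tag:02X} line={line:04X} word={word}")  # tag=16 line=0CE7 word=0

tags: List[Optional[int]] = [None] * (1 << LINE_BITS)  # None = empty line
tags[line] = tag
print(is_hit(tags, 0x16339C), is_hit(tags, 0x17339C))  # True False
```

The second address differs only in the tag, so it maps to the same line and misses: this is the "no two blocks in the same line have the same Tag" rule at work.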
Direct Mapping
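Cache Line Table
Cache line   Main memory blocks held
0            0, m, 2m, 3m, …, 2^s - m
1            1, m + 1, 2m + 1, …, 2^s - m + 1
(Block j maps to line j mod m, where m is the number of cache lines and 2^s is the number of main memory blocks.)
Simple
Inexpensive
Fixed location for a given block
If a program repeatedly accesses 2 blocks that map to the same line, the miss rate is very high because the two blocks keep evicting each other.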
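The mapping rule and the thrashing case can be sketched with a trace of block numbers. The cache size (m = 8 lines) and the function name are hypothetical; the code stores whole block numbers per line, which is equivalent to storing the tag (j div m) because the line already fixes j mod m.

```python
from typing import List, Optional


def simulate_direct_mapped(block_trace: List[int], num_lines: int) -> int:
    """Count misses for a direct-mapped cache where block j lives in line j mod m."""
    lines: List[Optional[int]] = [None] * num_lines  # block held by each line
    misses = 0
    for block in block_trace:
        line = block % num_lines
        if lines[line] != block:  # empty line or a different block: miss
            misses += 1
            lines[line] = block   # evict whatever was there
    return misses


m = 8  # hypothetical: 8 cache lines
print(simulate_direct_mapped([0, m] * 5, m))  # 10: blocks 0 and 8 share line 0
print(simulate_direct_mapped([0, 1] * 5, m))  # 2: only the first touches miss
```

Both traces touch just two blocks, yet the one whose blocks collide in a single line misses on every access.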
Associative Mapping
Address structure: Tag: 22 bits | Word: 2 bits
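With associative mapping there is no line field, so the whole 22-bit block identifier becomes the tag and every line's tag must be checked. The sketch below assumes the same 24-bit address and 4-byte blocks as the direct-mapping example; the 4-line cache contents and the names `split_assoc` and `lookup` are hypothetical.

```python
from typing import List, Optional, Tuple

WORD_BITS = 2  # 4-byte blocks; the remaining 22 bits of a 24-bit address form the tag


def split_assoc(addr: int) -> Tuple[int, int]:
    """Split a 24-bit address into (tag, word) for fully associative mapping."""
    return addr >> WORD_BITS, addr & ((1 << WORD_BITS) - 1)


def lookup(line_tags: List[Optional[int]], addr: int) -> Optional[int]:
    """Return the index of the line holding addr's block, or None on a miss.

    A block may occupy any line, so every stored tag is compared. Hardware
    performs these comparisons in parallel; this loop does them one by one.
    """
    tag, _ = split_assoc(addr)
    for i, stored in enumerate(line_tags):
        if stored == tag:
            return i
    return None


tag, word = split_assoc(0x16339C)
print(f"tag={tag:06X} word={word}")   # tag=058CE7 word=0
cache = [None, 0x058CE7, None, None]  # hypothetical 4-line cache contents
print(lookup(cache, 0x16339C))        # 1
```

The flexibility removes conflict between blocks, at the cost of one comparator per line in hardware.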
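[Figure: set-associative cache with 32-byte blocks organized as 16 sets (Set 0 to Set 15; Set 15 holds Byte 992 to Byte 1023). The 32-bit address is split into a Cache Tag in bits 31 to 9 (example: 0x50), a Cache Index in bits 8 to 5 that selects one of the 16 sets (example: 0x01, i.e. the block number mod 16), and a Byte Select in bits 4 to 0 (example: 0x00). The tag is stored as part of the cache "state".]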
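A set-associative lookup uses the index to choose a set and then compares tags only within that set. The sketch below assumes the field layout from the figure (offset bits 4 to 0, index bits 8 to 5, tag bits 31 to 9), plus two assumptions not stated on the slide: 2 ways per set and LRU replacement. The class name and trace addresses are hypothetical.

```python
from collections import OrderedDict
from typing import List

OFFSET_BITS = 5  # bits 4..0
INDEX_BITS = 4   # bits 8..5 -> 16 sets
WAYS = 2         # assumption: 2 blocks per set


class SetAssociativeCache:
    def __init__(self) -> None:
        # One OrderedDict per set: keys are tags, ordered least to most recently used.
        self.sets: List["OrderedDict[int, None]"] = [
            OrderedDict() for _ in range(1 << INDEX_BITS)
        ]

    def access(self, addr: int) -> bool:
        """Return True on a hit; on a miss, load the block and return False."""
        index = (addr >> OFFSET_BITS) & ((1 << INDEX_BITS) - 1)
        tag = addr >> (OFFSET_BITS + INDEX_BITS)
        s = self.sets[index]
        if tag in s:
            s.move_to_end(tag)      # mark as most recently used
            return True
        if len(s) == WAYS:
            s.popitem(last=False)   # evict the least recently used way
        s[tag] = None
        return False


cache = SetAssociativeCache()
trace = [0x020, 0x220, 0x020, 0x420, 0x020, 0x220]  # all map to set 1
print([cache.access(a) for a in trace])
# [False, False, True, False, True, False]
```

Two blocks that would thrash a direct-mapped line (0x020 and 0x220) now coexist in set 1; a third block in the same set forces an LRU eviction.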
Set Associative Mapping
Address   Tag (bits 31-9)            Index (bits 8-5)   Offset (bits 4-0)   Result
1090      00000000000000000000010    0010               00010               Miss
1440      00000000000000000000010    1101               00000               Miss
5000      00000000000000000001001    1100               01000
1470      00000000000000000000010    1101               11110
Cache:
1. Is Direct Mapped
2. Contains 512 bytes.
3. Has 16 sets.
4. Each set can hold 32 bytes or
1 cache line.
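The cache just described (direct mapped, 512 bytes, 16 sets of one 32-byte line) can be replayed in a few lines. This is a minimal sketch assuming the cache starts empty and the four addresses are accessed in the order listed above; `fields` and `run` are hypothetical names. The three valid lines it leaves behind (sets 2, 12, and 13) are the ones marked Y in the cache-state slide that follows.

```python
from typing import List, Optional, Tuple

OFFSET_BITS = 5  # 32-byte lines
SET_BITS = 4     # 16 sets: 512 bytes / 32 bytes per line


def fields(addr: int) -> Tuple[int, int, int]:
    """Return (tag, set, offset) for the 512-byte direct-mapped cache."""
    offset = addr & ((1 << OFFSET_BITS) - 1)
    set_no = (addr >> OFFSET_BITS) & ((1 << SET_BITS) - 1)
    tag = addr >> (OFFSET_BITS + SET_BITS)
    return tag, set_no, offset


def run(trace: List[int]) -> List[bool]:
    """Replay addresses through an initially empty cache; True means hit."""
    stored: List[Optional[int]] = [None] * (1 << SET_BITS)  # None means V = N
    hits = []
    for addr in trace:
        tag, set_no, _ = fields(addr)
        hit = stored[set_no] == tag
        if not hit:
            stored[set_no] = tag  # miss: fetch the whole 32-byte line
        hits.append(hit)
    return hits


trace = [1090, 1440, 5000, 1470]
for addr, hit in zip(trace, run(trace)):
    tag, set_no, offset = fields(addr)
    start = addr - offset
    print(f"{addr}: {tag:04b}|{set_no:04b}|{offset:05b} "
          f"{'hit' if hit else 'miss'} (line {start}-{start + 31})")
# 1090: 0010|0010|00010 miss (line 1088-1119)
# 1440: 0010|1101|00000 miss (line 1440-1471)
# 5000: 1001|1100|01000 miss (line 4992-5023)
# 1470: 0010|1101|11110 hit (line 1440-1471)
```

Address 1470 hits because it lies in the same 32-byte line that the miss on 1440 brought in (spatial locality).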
Here’s the Cache We’ll Be Touching

Address decomposition:
Address   Tag    Set    Offset
256       0000   1000   00000
512       0001   0000   00000
1024      0010   0000   00000
1090      0010   0010   00010
1099      0010   0010   01011
1440      0010   1101   00000
1470      0010   1101   11110
1620      0011   0010   10100
2048      0100   0000   00000
4096      1000   0000   00000
5000      1001   1100   01000

Cache state (each set can hold a 32-byte cache line):
Set         V   Tag            Data
0 (0000)    N
1 (0001)    N
2 (0010)    Y   00000…10       Data from memory loc. 1088 - 1119
3 (0011)    N
4 (0100)    N
5 (0101)    N
6 (0110)    N
7 (0111)    N
8 (1000)    N
9 (1001)    N
10 (1010)   N
11 (1011)   N
12 (1100)   Y   00000…1001     Data from memory loc. 4992 - 5023
13 (1101)   Y   00000…0010     Data from memory loc. 1440 - 1471
14 (1110)   N
15 (1111)   N
Cache Memory: Doing Some Cache Action
We want to READ data from address 1600 = 0011|0010|00000 (Tag 0011, Set 0010, Offset 00000).
Set 2 (0010) is valid but holds tag 00000…10 (data from memory loc. 1088 - 1119), so the tags do not match and the read misses. The line is replaced by the 32-byte block starting at address 1600.
Cache state after the read (each valid set always holds a 32-byte cache line; sets not listed have V = N):
Set         V   Tag            Data
2 (0010)    Y   00000…0011     Data from memory loc. 1600 - 1631
12 (1100)   Y   00000…1001     Data from memory loc. 4992 - 5023
13 (1101)   Y   00000…0010     Data from memory loc. 1440 - 1471
Next, we want to READ data from address 256 = 0000|1000|00000 (Tag 0000, Set 1000, Offset 00000). Set 8 (1000) is not valid (V = N), so this read also misses.
Cache Memory: Doing Some Cache Action
We want to WRITE data to address 1620 = 0011|0010|10100 (Tag 0011, Set 0010, Offset 10100).
Cache state before the write (each valid set always holds a 32-byte cache line; sets not listed have V = N):
Set         V   Tag            Data
2 (0010)    Y   00000…0011     Data from memory loc. 1600 - 1631
8 (1000)    Y   00000…0000     Data from memory loc. 256 - 287
12 (1100)   Y   00000…1001     Data from memory loc. 4992 - 5023
13 (1101)   Y   00000…0010     Data from memory loc. 1440 - 1471
Set 2 holds tag 0011, which matches, so the write hits in the line brought in by the earlier read of address 1600.
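The three operations on these slides (READ 1600, READ 256, WRITE 1620) can be checked against the same 16-set, 32-byte-line direct-mapped cache. This sketch starts from the state with sets 2, 12, and 13 valid; the dictionary representation and the function name `access` are hypothetical, and treating a write miss as a line fill (write-allocate) is an assumption the slides do not state. It does not matter here because the write hits.

```python
from typing import Dict


def access(cache: Dict[int, int], addr: int) -> str:
    """Look up addr in the 16-set direct-mapped cache with 32-byte lines.

    cache maps a set number to the tag stored there; a missing key means V = N.
    On a miss the line is (re)filled; for writes this assumes write-allocate.
    """
    set_no = (addr >> 5) & 0xF  # bits 8..5
    tag = addr >> 9             # bits 31..9
    if cache.get(set_no) == tag:
        return "hit"
    cache[set_no] = tag
    return "miss"


# Valid sets in the starting state: set -> tag.
cache = {0b0010: 0b0010, 0b1100: 0b1001, 0b1101: 0b0010}
for op, addr in [("READ", 1600), ("READ", 256), ("WRITE", 1620)]:
    print(op, addr, access(cache, addr))
# READ 1600 miss
# READ 256 miss
# WRITE 1620 hit
```

After the loop, set 2 holds tag 0011 and set 8 holds tag 0000, matching the state shown before the write.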
[Figure: Processor, Cache, Write Buffer, and DRAM]
Example:
WriteMem[100]
WriteMem[100]
ReadMem[200]
WriteMem[200]
WriteMem[100]
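A common use of this access sequence is to compare write-allocate (a write miss loads the block) with no-write-allocate (a write miss updates memory only). The slide lists only the sequence, so the setup here is an assumption: a fully associative cache that starts empty and is large enough that nothing is evicted. The function name `count` is hypothetical.

```python
from typing import List, Set, Tuple


def count(ops: List[Tuple[str, int]], write_allocate: bool) -> Tuple[int, int]:
    """Return (hits, misses) for a fully associative cache that starts empty.

    Assumption: the cache never fills, so no block is ever evicted.
    """
    cached: Set[int] = set()
    hits = misses = 0
    for kind, addr in ops:
        if addr in cached:
            hits += 1
        else:
            misses += 1
            # Reads always allocate; writes allocate only under write-allocate.
            if kind == "read" or write_allocate:
                cached.add(addr)
    return hits, misses


ops = [("write", 100), ("write", 100), ("read", 200),
       ("write", 200), ("write", 100)]
print(count(ops, write_allocate=False))  # (1, 4)
print(count(ops, write_allocate=True))   # (3, 2)
```

Without allocation, address 100 is never brought in, so all three writes to it miss; only the write to 200 hits, because the read loaded that block.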
Intel cache evolution (processor on which each feature first appears, the problem, and the solution adopted):
386. Problem: External memory is slower than the system bus. Solution: Add an external cache using faster memory technology.
486. Problem: Increased processor speed results in the external bus becoming a bottleneck for cache access. Solution: Move the external cache on-chip, operating at the same speed as the processor.
486. Problem: The internal cache is rather small, due to limited space on chip. Solution: Add an external L2 cache using faster technology than main memory.
Pentium. Problem: Contention occurs when both the Instruction Prefetcher and the Execution Unit simultaneously require access to the cache. In that case, the Prefetcher is stalled while the Execution Unit’s data access takes place. Solution: Create separate data and instruction caches.
Pentium Pro. Problem: Increased processor speed results in the external bus becoming a bottleneck for L2 cache access. Solution: Create a separate back-side bus (BSB) that runs at a higher speed than the main (front-side) external bus. The BSB is dedicated to the L2 cache.
Pentium III. Problem: Some applications deal with massive databases and must have rapid access to large amounts of data. The on-chip caches are too small. Solution: Add an external L3 cache.
Pentium 4. Problem: Same as for the Pentium III. Solution: Move the L3 cache on-chip.
Reducing Cache Misses: Classifying Misses (the 3 Cs)
Compulsory: The first access to a block cannot find it in the cache, so the block must be brought into the cache. Also called cold-start misses or first-reference misses. (Misses in even an infinite cache)
Capacity: If the cache cannot contain all the blocks needed during execution of a program, capacity misses will occur because blocks are discarded and later retrieved. (Misses in a fully associative cache of size X)
Conflict: If the block-placement strategy is set associative or direct mapped, conflict misses (in addition to compulsory and capacity misses) will occur because a block can be discarded and later retrieved if too many blocks map to its set. Also called collision misses or interference misses. (Misses in an N-way associative cache of size X)
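The parenthesized definitions suggest a way to measure the 3 Cs by simulation: compulsory misses are the distinct blocks touched, capacity misses are the extra misses of a fully associative cache of size X, and conflict misses are the extra misses of the real organization over that. The sketch below assumes a trace of block numbers, LRU replacement for the fully associative cache, and a direct-mapped cache (N = 1) as the real organization; the function names and trace are hypothetical. Because LRU is not optimal, this subtraction can report a negative conflict count for some traces.

```python
from collections import OrderedDict
from typing import Dict, List, Optional


def misses_fully_associative(trace: List[int], num_blocks: int) -> int:
    """Misses in a fully associative LRU cache that holds num_blocks blocks."""
    lru: "OrderedDict[int, None]" = OrderedDict()
    misses = 0
    for block in trace:
        if block in lru:
            lru.move_to_end(block)       # refresh recency on a hit
        else:
            misses += 1
            if len(lru) == num_blocks:
                lru.popitem(last=False)  # evict the least recently used block
            lru[block] = None
    return misses


def misses_direct_mapped(trace: List[int], num_blocks: int) -> int:
    """Misses in a direct-mapped cache: block j goes to line j mod num_blocks."""
    lines: List[Optional[int]] = [None] * num_blocks
    misses = 0
    for block in trace:
        line = block % num_blocks
        if lines[line] != block:
            misses += 1
            lines[line] = block
    return misses


def three_cs(trace: List[int], num_blocks: int) -> Dict[str, int]:
    compulsory = len(set(trace))  # first reference to each block
    fully = misses_fully_associative(trace, num_blocks)
    direct = misses_direct_mapped(trace, num_blocks)
    return {"compulsory": compulsory,
            "capacity": fully - compulsory,
            "conflict": direct - fully}


print(three_cs([0, 4, 0, 4, 1, 2, 3, 0], num_blocks=4))
# {'compulsory': 5, 'capacity': 1, 'conflict': 2}
```

The three counts sum to the direct-mapped cache's 8 misses: blocks 0 and 4 collide in line 0 (conflict), and five distinct blocks cannot all fit in four lines (capacity).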