Lec 2: The Memory System
Basic components
Memory hierarchy
Cache memory
Virtual Memory
[Figure: the CPU exchanges data with main memory, and main memory with secondary memory, with control signals accompanying each data-transfer path.]
Main Memory Model
[Figure: main memory modeled as an array of words (8, 16, 32, or 64 bits), each identified by an address starting from 0, accessed through a memory control unit.]
Memory Characteristics
The most important characteristics of a memory:
speed — as fast as possible;
size — as large as possible;
cost — reasonable price.
They are determined by the technology used for implementation.
Analogy: your personal library.
Memory Access Bottleneck
[Figure: the connection between the CPU and memory is the access bottleneck.]
Memory Bandwidth
Memory bandwidth denotes the amount of data that can
be accessed from a memory per second:
M-Bandwidth = (1 / memory cycle time) × (amount of data per access)
Ex. MCT = 100 ns and 4 bytes (a word) per access:
M-Bandwidth = 40 megabytes per second.
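The arithmetic can be checked with a few lines of Python (illustrative, not part of the original slides):

```python
# Memory bandwidth = (1 / memory cycle time) * amount of data per access.
# Values from the example above: MCT = 100 ns, one 4-byte word per access.
cycle_time_s = 100e-9      # memory cycle time in seconds
bytes_per_access = 4       # one word per access

bandwidth = (1 / cycle_time_s) * bytes_per_access
print(f"{bandwidth / 1e6:.0f} MB/s")   # -> 40 MB/s
```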
Memory Banks
Interleaved placement of program and data.
[Figure: four memory banks connected to the CPU; consecutive addresses are spread across the banks (bank 0: 0, 4, 8, 12; bank 1: 1, 5, 9, 13; bank 2: 2, 6, 10, 14; bank 3: 3, 7, 11, 15).]
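A minimal sketch of the low-order interleaving in the figure, assuming four banks (the helper name bank_and_row is illustrative):

```python
# Low-order interleaving: consecutive addresses fall in consecutive banks,
# so sequential program/data accesses can proceed in parallel.
NUM_BANKS = 4

def bank_and_row(address):
    return address % NUM_BANKS, address // NUM_BANKS

for addr in range(8):
    bank, row = bank_and_row(addr)
    print(f"address {addr:2d} -> bank {bank}, row {row}")
```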
Motivation
What do we need?
A memory to store very large programs and to work at a speed
comparable to that of the CPU.
A solution:
To build a composite memory system which combines a small and fast memory with a large and slow memory, and behaves, most of the time, like a large and fast memory.
This two-level principle can be extended into a hierarchy of many levels.
Memory Hierarchy
CPU
Registers
Cache
Main Memory
Secondary Memory of direct access type
Secondary Memory of archive type
Memory Hierarchy

Level                               Access time (example)   Capacity (example)
Registers                           1-10 ns                 16-256
Cache                               10-50 ns                4-512K
Main Memory                         40-500 ns               4-256M
Secondary Memory (direct access)    5-100 ms (for 4KB)      40G/unit
Secondary Memory (archive)          0.5-5 s (for 8KB)       50M/tape

As one goes down the hierarchy, the following occur:
Decreasing cost/bit.
Increasing capacity.
Increasing access time.
Decreasing frequency of access by the CPU.
Mismatch of CPU and MM Speeds
[Figure: CPU vs. main-memory cycle time (ns) on a log scale, 1955-2005; the speed gap is about one order of magnitude, i.e., 10 times.]
Cache Memory
[Figure: the cache sits between the CPU and main memory; addresses flow from the CPU toward memory, and instructions and data flow back, at each interface.]
Zebo’s Cache Memory Model
Personal library for a high-speed reader
[Figure: the cache consists of storage cells plus a memory controller.]
Locality of Reference
Temporal locality: If an item is referenced, it will
tend to be referenced again soon.
Spatial locality: If an item is referenced, items whose addresses are close by will tend to be referenced soon.
This access pattern is referred to as the locality-of-reference principle, and it is an intrinsic feature of the von Neumann architecture:
Sequential instruction storage.
Loops and iterations (e.g., subroutine calls).
Sequential data storage (e.g., array).
Ex. A computer has 8MB MM with 100 ns access time and an 8KB cache with 10 ns access time; BS = 4, Tchecking = 0, and Phit = 0.97. The average access time (AAT) is then 22.9 ns.
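The 22.9 ns figure can be reproduced under one plausible miss model, namely that a miss transfers the whole block (BS words) from MM into the cache word by word; this model is an assumption, but it yields exactly the stated result:

```python
# Average access time (AAT) for the example above.
t_cache, t_mm = 10, 100    # access times in ns
block_size = 4             # BS: words per block
p_hit = 0.97               # hit ratio (Tchecking = 0)

t_miss = block_size * (t_mm + t_cache)         # read from MM + write to cache
aat = p_hit * t_cache + (1 - p_hit) * t_miss
print(f"AAT = {aat:.1f} ns")                   # -> AAT = 22.9 ns
```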
Cache Design
The size and nature of the copied block must be carefully designed, as well as the algorithm that decides which block to remove from the cache when it is full:
Cache block size (line size).
Total cache size.
Mapping function.
Replacement method.
Write policy.
Number of caches:
• Single, two-level, or three-level cache.
• Unified vs. split cache.
Unified caches:
+ Better balance of the load between instruction and data fetches, depending on the dynamics of the program execution.
+ Design and implementation are cheaper.
− Lower performance.
Direct Mapping Cache
Direct mapping - Each block of the main memory
is mapped into a fixed cache slot.
[Figure: main-memory blocks 1 and 2 are each mapped into a fixed slot of the cache (storage cells plus memory controller).]
Direct Mapping Pros & Cons
Simple to implement and therefore inexpensive.
Fixed location for blocks.
If a program repeatedly accesses two blocks that map to the same cache slot, the cache miss rate is very high.
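A minimal Python sketch of a direct-mapped lookup (class and names are illustrative, not from the lecture); it reproduces the conflict problem described above, with blocks 0 and 8 contending for the same slot:

```python
# Direct mapping: each block can live in exactly one slot.
class DirectMappedCache:
    def __init__(self, num_slots):
        self.slots = [None] * num_slots        # each slot holds a tag (or None)

    def access(self, block_number):
        slot = block_number % len(self.slots)  # fixed slot for this block
        tag = block_number // len(self.slots)
        if self.slots[slot] == tag:
            return "hit"
        self.slots[slot] = tag                 # overwrite whatever was there
        return "miss"

cache = DirectMappedCache(num_slots=8)
print([cache.access(b) for b in (0, 8, 0)])    # ['miss', 'miss', 'miss']
```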
Associative Mapping
A main memory block can be loaded into any slot of the cache.
To determine if a block is in the cache, a mechanism is needed
to simultaneously examine every slot’s tag.
[Figure: associative memory example; a 10,000-word main memory and a 100-word cache of ten 10-word slots, where each slot stores a tag identifying which memory block it currently holds (e.g., tag 010 for words 0100-0109).]
Fully Associative Organization
[Figure: fully associative cache organization; the incoming tag is compared against every slot in parallel.]
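For contrast, a sketch of a fully associative lookup (again with illustrative names); because a block can occupy any slot, the reference pattern that thrashed the direct-mapped cache now hits:

```python
# Associative mapping: the tag is compared against every occupied slot
# (done in parallel in hardware, sequentially in this sketch).
class AssociativeCache:
    def __init__(self, num_slots):
        self.num_slots = num_slots
        self.tags = []                         # tags of occupied slots

    def access(self, block_number):
        if block_number in self.tags:          # examine every slot's tag
            return "hit"
        if len(self.tags) == self.num_slots:
            self.tags.pop(0)                   # full: evict (FIFO; see next slide)
        self.tags.append(block_number)
        return "miss"

cache = AssociativeCache(num_slots=8)
print([cache.access(b) for b in (0, 8, 0)])    # ['miss', 'miss', 'hit']
```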
Replacement Algorithms
With direct mapping, no replacement algorithm is needed.
With associative mapping, a replacement algorithm is needed in order to determine which block to replace:
First-in-first-out (FIFO).
Least-recently used (LRU): replace the block that has been in the cache longest with no reference to it.
Least-frequently used (LFU): replace the block that has experienced the fewest references.
Random.
[Figure: each cache slot stores usage info alongside its tag.]
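LRU can be sketched with Python's OrderedDict (an illustration of the policy, not of how hardware implements it; real caches keep per-slot use bits):

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, num_slots):
        self.num_slots = num_slots
        self.tags = OrderedDict()              # oldest entry = least recently used

    def access(self, block_number):
        if block_number in self.tags:
            self.tags.move_to_end(block_number)   # refresh recency on a hit
            return "hit"
        if len(self.tags) == self.num_slots:
            self.tags.popitem(last=False)         # evict least recently used
        self.tags[block_number] = True
        return "miss"

cache = LRUCache(num_slots=2)
print([cache.access(b) for b in (1, 2, 1, 3, 2)])
# ['miss', 'miss', 'hit', 'miss', 'miss']: block 3 evicts 2, the LRU block
```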
Write Policy
The problem:
How to keep cache content and main memory content
consistent without losing too much performance?
Write through:
All write operations are passed to main memory:
If the addressed location is currently in the cache, the
cache is updated so that it is coherent with the main
memory.
For writes, the processor always slows down to main
memory speed.
Since the percentage of writes is small (ca. 15%), this scheme doesn't lead to a large performance reduction.
Write Policy (Cont’d)
Write through with buffered write:
The same as write-through, but instead of slowing the processor down by writing directly to main memory, the write address and data are stored in a high-speed write buffer; the buffer transfers data to main memory while the processor continues its task.
Higher speed, but more complex hardware.
Write back:
Write operations update only the cache memory, which is not kept coherent with main memory. When the slot is replaced from the cache, its content has to be copied back to memory.
Good performance (usually several writes are performed on a cache block before it is replaced), but more complex hardware is needed.
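The two policies can be contrasted in a small sketch (all names illustrative); the dirty flag is the essence of the extra hardware that write-back needs:

```python
# Write-through: every store goes to main memory immediately.
# Write-back: stores only mark the line dirty; memory is updated at eviction.
class Line:
    def __init__(self, tag, data):
        self.tag, self.data, self.dirty = tag, data, False

def write_through(line, value, main_memory):
    line.data = value
    main_memory[line.tag] = value      # MM stays coherent, at MM speed

def write_back(line, value):
    line.data = value
    line.dirty = True                  # MM is temporarily out of date

def evict(line, main_memory):
    if line.dirty:                     # copy back only if modified
        main_memory[line.tag] = line.data

mm = {}
line = Line(tag=3, data=0)
write_back(line, 42)
evict(line, mm)                        # only now does MM see the value
```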
Cache Architecture Examples
Intel Itanium 2 (introduced 2002) has three levels of cache: L1, L2, and L3.
Motivation for Virtual Memory
The physical main memory (RAM) is very limited in space.
It may not be big enough to store all the executing programs at the same time.
Some programs may need more memory than the main memory size, but not all of a program needs to be maintained in the main memory at the same time.
Virtual Memory takes advantage of the fact that at any
given instant of time, an executing program needs only a
fraction of the memory that the whole program occupies.
The basic idea: Load only pieces of each executing
program which are currently needed.
Paging
Divide programs (processes) into equal-sized, small blocks, called pages.
Divide the primary memory into equal-sized, small blocks, called page frames.
Allocate the required number of page frames to a program.
A program does not require contiguous page frames!
The operating system (OS) is responsible for:
Maintaining a list of free frames.
Using a page table to keep track of the mapping
between pages and page frames.
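A sketch of the translation that the page table enables, assuming 1 KB pages and illustrative frame numbers:

```python
# Logical address = (page number, offset); the OS's page table maps each
# page to whatever frame it allocated, so frames need not be contiguous.
PAGE_SIZE = 1024

page_table = {0: 5, 1: 2, 2: 7}        # page -> frame (example values)

def translate(logical_address):
    page, offset = divmod(logical_address, PAGE_SIZE)
    frame = page_table[page]           # a missing entry would be a page fault
    return frame * PAGE_SIZE + offset

print(translate(1030))                 # page 1, offset 6 -> frame 2 -> 2054
```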
Logical and Physical Addresses
[Figure: a page table maps logical page numbers (0-3) to physical page frames.]
Implementation of the page tables:
Main memory — slow, since an extra memory access is needed.
Separate registers — fast but expensive.
Cache.
Virtual Memory
Demand paging
Do not require all pages of a process in memory.
Bring in pages as required.
Page fault
Required page is not in memory.
Operating system must swap in the required page.
May need to swap out a page to make space.
Select page to throw out based on recent history.
Objective of Virtual Memory
To provide the user/programmer with a much bigger memory than the main memory, with the help of the operating system.
Virtual memory >> real memory
[Figure: program (virtual) addresses 0000-5000 are mapped onto MM addresses 0000-3000; pages that do not fit reside in secondary memory.]
Page Fault
When accessing a VM page which is not in the main memory,
a page fault occurs. The page must then be loaded from the
secondary memory into the main memory by the OS.
[Figure: a virtual address consists of a page number and an offset; the page map translates the page number to a page frame in MM, or raises a page fault (interrupt to the OS) if the page is not resident.]
Page Replacement
When a page fault occurs and all page frames are
occupied, one of them must be replaced.
If the replaced page has been modified during the time it
resides in the main memory, the updated version should
be written back to the secondary memory.
Ideally, we would replace the page that will not be accessed for the longest time in the future.
Problem — We don’t know exactly what will happen in
the future.
Solution — We predict the future by studying the access
patterns up till now (“learn from history”).
Replacement Algorithms
FIFO (First In First Out) — replace the page that has been in MM the longest.
LRU (Least Recently Used) — replace the page that has not been accessed for the longest time.
LFU (Least Frequently Used) — replace the page with the smallest number of accesses during the latest time period.
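The policies are easy to compare in simulation; a sketch assuming 3 page frames and an illustrative reference string:

```python
from collections import OrderedDict

def count_faults(references, num_frames, lru=False):
    frames = OrderedDict()             # insertion order doubles as FIFO order
    faults = 0
    for page in references:
        if page in frames:
            if lru:
                frames.move_to_end(page)      # refresh recency for LRU only
            continue
        faults += 1
        if len(frames) == num_frames:
            frames.popitem(last=False)        # evict oldest / least recent
        frames[page] = True
    return faults

refs = [1, 2, 3, 1, 4, 1, 2, 5]
print(count_faults(refs, 3), count_faults(refs, 3, lru=True))   # -> 7 6
```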
Summary
A memory system has to fit very large programs and still
provide fast access.
No single type of memory can provide all the needs of a computer system.
Usually, several different storage mechanisms are organized in a layered hierarchy.
Cache is a hardware solution to speed up memory access, and it is transparent to the programmer.
Virtual memory provides a much larger address space than the
available physical space with the help of the OS (software
solution).
The layered structure works very well, partly due to the locality-of-reference principle.