9 - Computer Memory System Overview / Cache Memory Principles
LEARNING OBJECTIVES
After studying this chapter, you should be able to:
• Present an overview of the main characteristics of computer memory systems
and the use of a memory hierarchy.
• Describe the basic concepts and intent of cache memory.
• Discuss the key elements of cache design.
• Distinguish among direct mapping, associative mapping, and set-associative
mapping.
• Explain the reasons for using multiple levels of cache.
• Understand the performance implications of multiple levels of memory.
Although seemingly simple in concept, computer memory exhibits perhaps the wid-
est range of type, technology, organization, performance, and cost of any feature
of a computer system. No single technology is optimal in satisfying the memory
requirements for a computer system. As a consequence, the typical computer
system is equipped with a hierarchy of memory subsystems, some internal to the
system (directly accessible by the processor) and some external (accessible by the
processor via an I/O module).
This chapter and the next focus on internal memory elements, while Chapter 6
is devoted to external memory. To begin, the first section examines key characteristics
of computer memories. The remainder of the chapter examines an essential element
of all modern computer systems: cache memory.
Table 4.1 Key Characteristics of Computer Memory Systems

Location
• Internal (e.g., processor registers, cache, main memory)
• External (e.g., optical disks, magnetic disks, tapes)

Capacity
• Number of words
• Number of bytes

Unit of Transfer
• Word
• Block

Access Method
• Sequential
• Direct
• Random
• Associative

Performance
• Access time
• Cycle time
• Transfer rate

Physical Type
• Semiconductor
• Magnetic
• Optical
• Magneto-optical

Physical Characteristics
• Volatile/nonvolatile
• Erasable/nonerasable

Organization
• Memory modules
For internal memory, the unit of transfer is equal to the number of electrical
lines into and out of the memory module. This may be equal to the word length,
but is often larger, such as 64, 128, or 256 bits. To clarify this point, consider
three related concepts for internal memory:
• Word: The “natural” unit of organization of memory. The size of a word is typi-
cally equal to the number of bits used to represent an integer and to the instruc-
tion length. Unfortunately, there are many exceptions. For example, the CRAY
C90 (an older model CRAY supercomputer) has a 64-bit word length but uses
a 46-bit integer representation. The Intel x86 architecture has a wide variety of
instruction lengths, expressed as multiples of bytes, and a word size of 32 bits.
• Addressable units: In some systems, the addressable unit is the word. However,
many systems allow addressing at the byte level. In any case, the relationship
between the length in bits A of an address and the number N of addressable
units is 2^A = N (see the sketch after this list).
• Unit of transfer: For main memory, this is the number of bits read out of or
written into memory at a time. The unit of transfer need not equal a word or
an addressable unit. For external memory, data are often transferred in much
larger units than a word, and these are referred to as blocks.
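As a quick illustration of the relationship 2^A = N, here is a minimal sketch (my own illustration, not from the text) that computes the address length needed for a given number of addressable units:

```python
import math

def address_bits(n_units: int) -> int:
    """Smallest A such that 2**A >= n_units (from the relationship 2^A = N)."""
    return math.ceil(math.log2(n_units))

# A byte-addressable memory of 64 KiB needs a 16-bit address...
assert address_bits(64 * 1024) == 16
# ...and, conversely, a 16-bit address can name 2**16 = 65,536 units.
assert 2 ** 16 == 64 * 1024
```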
Another distinction among memory types is the method of accessing units of
data. These include the following:
• Sequential access: Memory is organized into units of data, called records.
Access must be made in a specific linear sequence. Stored addressing informa-
tion is used to separate records and assist in the retrieval process. A shared
read–write mechanism is used, and this must be moved from its current loca-
tion to the desired location, passing and rejecting each intermediate record.
Thus, the time to access an arbitrary record is highly variable. Tape units, dis-
cussed in Chapter 6, are sequential access.
• Direct access: As with sequential access, direct access involves a shared
read–write mechanism. However, individual blocks or records have a unique
address based on physical location. Access is accomplished by direct access
to reach a general vicinity plus sequential searching, counting, or waiting to
reach the final location. Again, access time is variable. Disk units, discussed in
Chapter 6, are direct access.
• Random access: Each addressable location in memory has a unique, physically
wired-in addressing mechanism. The time to access a given location is inde-
pendent of the sequence of prior accesses and is constant. Thus, any location
can be selected at random and directly addressed and accessed. Main memory
and some cache systems are random access.
• Associative: This is a random access type of memory that enables one to make
a comparison of desired bit locations within a word for a specified match, and
to do this for all words simultaneously. Thus, a word is retrieved based on a
portion of its contents rather than its address. As with ordinary random-access
memory, each location has its own addressing mechanism, and retrieval time
is constant independent of location or prior access patterns. Cache memories
may employ associative access (a short sketch follows this list).
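To make the associative idea concrete, the following toy sketch (my own illustration, not from the text) retrieves stored words by matching selected bit positions rather than by address. A real associative memory performs the comparison on all words simultaneously in hardware, whereas this loop checks them one by one:

```python
def associative_match(words, mask, pattern):
    """Return every stored word whose bits selected by `mask` equal `pattern`."""
    return [w for w in words if (w & mask) == (pattern & mask)]

words = [0b10110010, 0b01100110, 0b11100010]
# Match on the low four bits only: retrieves both words ending in 0010.
print([bin(w) for w in associative_match(words, mask=0b1111, pattern=0b0010)])
```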
From a user’s point of view, the two most important characteristics of memory
are capacity and performance. Three performance parameters are used:
• Access time (latency): For random-access memory, this is the time it takes to
perform a read or write operation, that is, the time from the instant that an
address is presented to the memory to the instant that data have been stored
or made available for use. For non-random-access memory, access time is the
time it takes to position the read–write mechanism at the desired location.
• Memory cycle time: This concept is primarily applied to random-access memory
and consists of the access time plus any additional time required before a second
access can commence. This additional time may be required for transients to die
out on signal lines or to regenerate data if they are read destructively. Note that
memory cycle time is concerned with the system bus, not the processor.
• Transfer rate: This is the rate at which data can be transferred into or out of a
memory unit. For random-access memory, it is equal to 1/(cycle time).
For non-random-access memory, the following relationship holds:
Tn = TA + n/R (4.1)
where
Tn = Average time to read or write n bits
TA = Average access time
n = Number of bits
R = Transfer rate, in bits per second (bps)
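A small numeric sketch of Equation (4.1); the device parameters below are illustrative assumptions, not values from the text:

```python
def transfer_time(t_access: float, n_bits: int, rate_bps: float) -> float:
    """Equation (4.1): Tn = TA + n/R, all times in seconds."""
    return t_access + n_bits / rate_bps

# Assumed example: 4 ms average access time, 100 Mbps transfer rate,
# reading one 4096-bit record.
tn = transfer_time(4e-3, 4096, 100e6)
print(f"Tn = {tn * 1e3:.3f} ms")  # ~4.041 ms: access time dominates
```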
A variety of physical types of memory have been employed. The most com-
mon today are semiconductor memory, magnetic surface memory, used for disk and
tape, and optical and magneto-optical.
Figure 4.1 The Memory Hierarchy: inboard memory (registers, cache, main
memory); outboard storage (magnetic disk, CD-ROM, CD-RW, DVD-RW,
DVD-RAM, Blu-ray); off-line storage (magnetic tape)
Example 4.1 Suppose that the processor has access to two levels of memory. Level 1
contains 1000 words and has an access time of 0.01 μs; level 2 contains 100,000 words
and has an access time of 0.1 μs. Assume that if a word to be accessed is in level 1, then
the processor accesses it directly. If it is in level 2, then the word is first transferred to
level 1 and then accessed by the processor. For simplicity, we ignore the time required
for the processor to determine whether the word is in level 1 or level 2. Figure 4.2 shows
the general shape of the curve that covers this situation. The figure shows the average
access time to a two-level memory as a function of the hit ratio H, where H is defined as
the fraction of all memory accesses that are found in the faster memory (e.g., the cache),
T1 is the access time to level 1, and T2 is the access time to level 2.[1] As can be seen, for
high percentages of level 1 access, the average total access time is much closer to that of
level 1 than that of level 2.
In our example, suppose 95% of the memory accesses are found in level 1. Then the
average time to access a word can be expressed as
(0.95)(0.01 μs) + (0.05)(0.01 μs + 0.1 μs) = 0.0095 + 0.0055 = 0.015 μs
The average access time is much closer to 0.01 μs than to 0.1 μs, as desired.
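Example 4.1 generalizes to the formula Ts = H × T1 + (1 − H) × (T1 + T2), where a miss costs T1 + T2 because the word is first transferred to level 1 and then accessed. A minimal sketch:

```python
def avg_access_time(h: float, t1: float, t2: float) -> float:
    """Two-level average access time: hits cost T1; misses cost T1 + T2."""
    return h * t1 + (1 - h) * (t1 + t2)

# Example 4.1: T1 = 0.01 us, T2 = 0.1 us, hit ratio H = 0.95.
print(f"{avg_access_time(0.95, 0.01, 0.1):.4f} us")  # -> 0.0150 us, as above
```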
Accordingly, it is possible to organize data across the hierarchy such that the
percentage of accesses to each successively lower level is substantially less than that
of the level above. Consider the two-level example already presented.
Figure 4.2 Performance of Accesses Involving Only Level 1 (Hit Ratio): average
access time to a two-level memory as a function of the hit ratio H, falling from
T1 + T2 at H = 0 to T1 at H = 1
[1] If the accessed word is found in the faster memory, that is defined as a hit. A miss occurs if the
accessed word is not found in the faster memory.
Let level 2 memory contain all program instructions and data. The current clusters can be
temporarily placed in level 1. From time to time, one of the clusters in level 1 will
have to be swapped back to level 2 to make room for a new cluster coming in to
level 1. On average, however, most references will be to instructions and data con-
tained in level 1.
This principle can be applied across more than two levels of memory, as sug-
gested by the hierarchy shown in Figure 4.1. The fastest, smallest, and most expen-
sive type of memory consists of the registers internal to the processor. Typically, a
processor will contain a few dozen such registers, although some machines contain
hundreds of registers. Main memory is the principal internal memory system of
the computer. Each location in main memory has a unique address. Main memory
is usually extended with a higher-speed, smaller cache. The cache is not usually
visible to the programmer or, indeed, to the processor. It is a device for staging
the movement of data between main memory and processor registers to improve
performance.
The three forms of memory just described are, typically, volatile and employ
semiconductor technology. The use of three levels exploits the fact that semicon-
ductor memory comes in a variety of types, which differ in speed and cost. Data are
stored more permanently on external mass storage devices, of which the most com-
mon are hard disk and removable media, such as removable magnetic disk, tape,
and optical storage. External, nonvolatile memory is also referred to as secondary
memory or auxiliary memory. These are used to store program and data files and
are usually visible to the programmer only in terms of files and records, as opposed
to individual bytes or words. Disk is also used to provide an extension to main mem-
ory known as virtual memory, which is discussed in Chapter 8.
Other forms of memory may be included in the hierarchy. For example, large
IBM mainframes include a form of internal memory known as expanded storage.
This uses a semiconductor technology that is slower and less expensive than that
of main memory. Strictly speaking, this memory does not fit into the hierarchy but
is a side branch: Data can be moved between main memory and expanded storage
but not between expanded storage and external memory. Other forms of secondary
memory include optical and magneto-optical disks. Finally, additional levels can be
effectively added to the hierarchy in software. A portion of main memory can be
used as a buffer to temporarily hold data that are to be read out to disk. Such a tech-
nique, sometimes referred to as a disk cache,[2] improves performance in two ways
(see the sketch after this list):
• Disk writes are clustered. Instead of many small transfers of data, we have a
few large transfers of data. This improves disk performance and minimizes
processor involvement.
• Some data destined for write-out may be referenced by a program before the
next dump to disk. In that case, the data are retrieved rapidly from the soft-
ware cache rather than slowly from the disk.
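The following toy sketch (my own illustration; the book does not examine disk caches further) shows both effects: writes accumulate in a memory buffer and are flushed to the "disk" in one large transfer, and data still in the buffer can be re-read without a disk access:

```python
class DiskCacheSketch:
    """Toy software disk cache: clusters writes, serves re-reads from memory."""
    def __init__(self, disk: dict):
        self.disk = disk      # stands in for the real disk: block -> data
        self.pending = {}     # writes buffered in main memory

    def write(self, block: int, data: str) -> None:
        self.pending[block] = data          # small writes are clustered here

    def read(self, block: int):
        if block in self.pending:           # referenced before the next dump:
            return self.pending[block]      # retrieved rapidly from memory
        return self.disk.get(block)         # otherwise, slow path to disk

    def flush(self) -> None:
        self.disk.update(self.pending)      # one large transfer, not many small
        self.pending.clear()

disk: dict = {}
dc = DiskCacheSketch(disk)
dc.write(0, "a"); dc.write(1, "b")
assert dc.read(1) == "b"                    # hit in the software cache
dc.flush()
assert disk == {0: "a", 1: "b"}
```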
Appendix 4A examines the performance implications of multilevel memory
structures.
[2] Disk cache is generally a purely software technique and is not examined in this book. See [STAL12]
for a discussion.
4.2 CACHE MEMORY PRINCIPLES
Cache memory is designed to combine the memory access time of expensive, high-
speed memory with the large memory size of less expensive, lower-speed
memory. The concept is illustrated in Figure 4.3a. There is a relatively large and slow
main memory together with a smaller, faster cache memory. The cache contains a
copy of portions of main memory. When the processor attempts to read a word of
memory, a check is made to determine if the word is in the cache. If so, the word is
delivered to the processor. If not, a block of main memory, consisting of some fixed
number of words, is read into the cache and then the word is delivered to the pro-
cessor. Because of the phenomenon of locality of reference, when a block of data is
fetched into the cache to satisfy a single memory reference, it is likely that there will
be future references to that same memory location or to other words in the block.
Figure 4.3b depicts the use of multiple levels of cache. The L2 cache is slower
and typically larger than the L1 cache, and the L3 cache is slower and typically
larger than the L2 cache.
Figure 4.4 depicts the structure of a cache/main-memory system. Main mem-
ory consists of up to 2^n addressable words, with each word having a unique n-bit
address. For mapping purposes, this memory is considered to consist of a number
of fixed-length blocks of K words each. That is, there are M = 2^n/K blocks in main
memory. The cache consists of m blocks, called lines.[3] Each line contains K words,
Figure 4.3 Cache and Main Memory: (a) single cache, with word transfer between
processor and cache and block transfer between cache and main memory;
(b) three-level cache organization (L1 fastest, L2 fast, L3 less fast, main memory slow)
[3] In referring to the basic unit of the cache, the term line is used, rather than the term block, for two
reasons: (1) to avoid confusion with a main memory block, which contains the same number of data
words as a cache line; and (2) because a cache line includes not only K words of data, just as a main
memory block, but also tag and control bits.
Figure 4.4 Cache/Main-Memory Structure: (a) cache — lines 0 through m − 1,
each holding a tag and a block of data (block length = K words); (b) main
memory — word addresses 0 through 2^n − 1, divided into blocks of K words,
Block 0 through Block M − 1
plus a tag of a few bits. Each line also includes control bits (not shown), such as a
bit to indicate whether the line has been modified since being loaded into the cache.
The length of a line, not including tag and control bits, is the line size. The line
size may be as small as 32 bits, with each “word” being a single byte; in this case
the line size is 4 bytes. The number of lines is considerably less than the number
of main memory blocks (m ≪ M). At any time, some subset of the blocks of
memory resides in lines in the cache. If a word in a block of memory is read, that
block is transferred to one of the lines of the cache. Because there are more blocks
than lines, an individual line cannot be uniquely and permanently dedicated to a
particular block. Thus, each line includes a tag that identifies which particular block
is currently being stored. The tag is usually a portion of the main memory address,
as described later in this section.
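The block/line arithmetic just described can be made concrete with a short sketch; the parameters below (address length, block size, number of lines) are assumptions for illustration, not values from the text:

```python
import math

n = 24           # address length in bits: 2**n addressable words (assumed)
K = 4            # words per block = words per cache line (assumed)
m = 16 * 1024    # number of cache lines (assumed)

M = 2**n // K                    # number of main-memory blocks: M = 2^n / K
word_field = int(math.log2(K))   # address bits selecting a word within a block
block_field = n - word_field     # address bits identifying the block

print(f"M = {M} blocks vs. m = {m} lines; m << M is {m < M}")
# Because m << M, a line cannot be permanently dedicated to one block, so
# each line must carry a tag identifying which block it currently holds.
```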
Figure 4.5 illustrates the read operation. The processor generates the read
address (RA) of a word to be read. If the word is contained in the cache, it is deliv-
ered to the processor. Otherwise, the block containing that word is loaded into the
cache, and the word is delivered to the processor. Figure 4.5 shows these last two
operations occurring in parallel and reflects the organization shown in Figure 4.6,
which is typical of contemporary cache organizations. In this organization, the cache
connects to the processor via data, control, and address lines. The data and address
lines also attach to data and address buffers, which attach to a system bus from
122 CHAPTER 4 / CACHE MEMORY
Figure 4.5 Cache Read Operation: receive address RA from CPU; if the block
containing RA is in the cache, fetch the RA word and deliver it to the CPU;
otherwise, access main memory for the block containing RA, load the main
memory block into a cache line, and deliver the RA word to the CPU
which main memory is reached. When a cache hit occurs, the data and address buff-
ers are disabled and communication is only between processor and cache, with no
system bus traffic. When a cache miss occurs, the desired address is loaded onto the
system bus and the data are returned through the data buffer to both the cache and
the processor. In other organizations, the cache is physically interposed between
the processor and the main memory for all data, address, and control lines. In this
latter case, for a cache miss, the desired word is first read into the cache and then
transferred from cache to processor.
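Below is a sketch of the read operation of Figure 4.5, using hypothetical data structures (a dict of resident blocks standing in for the cache). The actual organization of Figure 4.6 loads the line and delivers the word in parallel, which this sequential sketch does not capture, and line replacement is ignored:

```python
def read_word(ra: int, cache: dict, main_memory: list, K: int) -> int:
    """Deliver the word at read address RA, loading its block on a miss."""
    block = ra // K                      # main-memory block containing RA
    if block not in cache:               # miss: load block into a cache line
        start = block * K
        cache[block] = main_memory[start:start + K]
    return cache[block][ra % K]          # hit path: deliver word to processor

# Usage: a 16-word main memory with 4-word blocks, initially empty cache.
mem = list(range(16))
cache: dict = {}
assert read_word(6, cache, mem, K=4) == 6   # miss: block 1 is loaded first
assert read_word(7, cache, mem, K=4) == 7   # hit: same block, no memory access
```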
A discussion of the performance parameters related to cache use is contained
in Appendix 4A.
Figure 4.6 Typical Cache Organization: the cache connects to the processor via
data, control, and address lines; the data and address lines also attach to data
and address buffers, which connect to the system bus through which main
memory is reached

4.3 ELEMENTS OF CACHE DESIGN
This section provides an overview of cache design parameters and reports some
typical results. We occasionally refer to the use of caches in high-performance com-
puting (HPC). HPC deals with supercomputers and their software, especially for
scientific applications that involve large amounts of data, vector and matrix com-
putation, and the use of parallel algorithms. Cache design for HPC is quite differ-
ent than for other hardware platforms and applications. Indeed, many researchers
have found that HPC applications perform poorly on computer architectures that
employ caches [BAIL93]. Other researchers have since shown that a cache hierar-
chy can be useful in improving performance if the application software is tuned to
exploit the cache [WANG99, PRES01].[4]
Although there are a large number of cache implementations, there are a few
basic design elements that serve to classify and differentiate cache architectures.
Table 4.2 lists key elements.
Cache Addresses
Almost all nonembedded processors, and many embedded processors, support vir-
tual memory, a concept discussed in Chapter 8. In essence, virtual memory is a facil-
ity that allows programs to address memory from a logical point of view, without
regard to the amount of main memory physically available.
[4] For a general discussion of HPC, see [DOWD98].