
CHAPTER 4 / CACHE MEMORY

LEARNING OBJECTIVES
After studying this chapter, you should be able to:
• Present an overview of the main characteristics of computer memory systems
and the use of a memory hierarchy.
• Describe the basic concepts and intent of cache memory.
• Discuss the key elements of cache design.
• Distinguish among direct mapping, associative mapping, and set-associative
mapping.
• Explain the reasons for using multiple levels of cache.
• Understand the performance implications of multiple levels of memory.

Although seemingly simple in concept, computer memory exhibits perhaps the wid-
est range of type, technology, organization, performance, and cost of any feature
of a computer system. No single technology is optimal in satisfying the memory
requirements for a computer system. As a consequence, the typical computer
system is equipped with a hierarchy of memory subsystems, some internal to the
system (directly accessible by the processor) and some external (accessible by the
processor via an I/O module).
This chapter and the next focus on internal memory elements, while Chapter 6
is devoted to external memory. To begin, the first section examines key characteristics
of computer memories. The remainder of the chapter examines an essential element
of all modern computer systems: cache memory.

4.1 COMPUTER MEMORY SYSTEM OVERVIEW

Characteristics of Memory Systems


The complex subject of computer memory is made more manageable if we classify
memory systems according to their key characteristics. The most important of these
are listed in Table 4.1.
The term location in Table 4.1 refers to whether memory is internal or exter-
nal to the computer. Internal memory is often equated with main memory. But there
are other forms of internal memory. The processor requires its own local memory, in
the form of registers (e.g., see Figure 2.3). Further, as we shall see, the control unit
portion of the processor may also require its own internal memory. We will defer
discussion of these latter two types of internal memory to later chapters. Cache is
another form of internal memory. External memory consists of peripheral storage
devices, such as disk and tape, that are accessible to the processor via I/O controllers.
An obvious characteristic of memory is its capacity. For internal memory, this is
typically expressed in terms of bytes (1 byte = 8 bits) or words. Common word lengths
are 8, 16, and 32 bits. External memory capacity is typically expressed in terms of bytes.
Table 4.1 Key Characteristics of Computer Memory Systems

Location
    Internal (e.g., processor registers, cache, main memory)
    External (e.g., optical disks, magnetic disks, tapes)
Capacity
    Number of words
    Number of bytes
Unit of Transfer
    Word
    Block
Access Method
    Sequential
    Direct
    Random
    Associative
Performance
    Access time
    Cycle time
    Transfer rate
Physical Type
    Semiconductor
    Magnetic
    Optical
    Magneto-optical
Physical Characteristics
    Volatile/nonvolatile
    Erasable/nonerasable
Organization
    Memory modules

A related concept is the unit of transfer. For internal memory, the unit of
transfer is equal to the number of electrical lines into and out of the memory
module. This may be equal to the word length, but is often larger, such as 64,
128, or 256 bits. To clarify this point, consider three related concepts for
internal memory:
• Word: The “natural” unit of organization of memory. The size of a word is typi-
cally equal to the number of bits used to represent an integer and to the instruc-
tion length. Unfortunately, there are many exceptions. For example, the CRAY
C90 (an older model CRAY supercomputer) has a 64-bit word length but uses
a 46-bit integer representation. The Intel x86 architecture has a wide variety of
instruction lengths, expressed as multiples of bytes, and a word size of 32 bits.
• Addressable units: In some systems, the addressable unit is the word. However,
many systems allow addressing at the byte level. In any case, the relationship
between the length in bits A of an address and the number N of addressable
units is 2^A = N (a short numeric check of this relationship follows this list).
• Unit of transfer: For main memory, this is the number of bits read out of or
written into memory at a time. The unit of transfer need not equal a word or
an addressable unit. For external memory, data are often transferred in much
larger units than a word, and these are referred to as blocks.
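
The relationship 2^A = N is easy to check numerically. The following Python
sketch is a minimal illustration; the memory sizes used are invented:

    import math

    def address_bits(n_units: int) -> int:
        # Number of address bits A such that 2**A >= N addressable units.
        return math.ceil(math.log2(n_units))

    # A 64-KB byte-addressable memory needs 16 address bits; the same memory
    # addressed as 4-byte words needs only 14.
    print(address_bits(64 * 1024))        # 16
    print(address_bits(64 * 1024 // 4))   # 14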
Another distinction among memory types is the method of accessing units of
data. These include the following (a toy timing model follows the list):
• Sequential access: Memory is organized into units of data, called records.
Access must be made in a specific linear sequence. Stored addressing informa-
tion is used to separate records and assist in the retrieval process. A shared
read–write mechanism is used, and this must be moved from its current loca-
tion to the desired location, passing and rejecting each intermediate record.
Thus, the time to access an arbitrary record is highly variable. Tape units, dis-
cussed in Chapter 6, are sequential access.
• Direct access: As with sequential access, direct access involves a shared
read–write mechanism. However, individual blocks or records have a unique
address based on physical location. Access is accomplished by direct access
to reach a general vicinity plus sequential searching, counting, or waiting to
reach the final location. Again, access time is variable. Disk units, discussed in
Chapter 6, are direct access.
• Random access: Each addressable location in memory has a unique, physically
wired-in addressing mechanism. The time to access a given location is inde-
pendent of the sequence of prior accesses and is constant. Thus, any location
can be selected at random and directly addressed and accessed. Main memory
and some cache systems are random access.
• Associative: This is a random access type of memory that enables one to make
a comparison of desired bit locations within a word for a specified match, and
to do this for all words simultaneously. Thus, a word is retrieved based on a
portion of its contents rather than its address. As with ordinary random-access
memory, each location has its own addressing mechanism, and retrieval time
is constant independent of location or prior access patterns. Cache memories
may employ associative access.
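
The practical difference among these methods is how access time varies with
location. The Python model below makes the contrast concrete; every timing
parameter is invented for illustration and describes no real device:

    def sequential_access_time(target_record: int, time_per_record: float) -> float:
        # Tape-like: every record before the target must be passed over.
        return target_record * time_per_record

    def direct_access_time(seek_time: float, records_scanned: int,
                           time_per_record: float) -> float:
        # Disk-like: jump to a general vicinity, then search sequentially.
        return seek_time + records_scanned * time_per_record

    def random_access_time(constant_time: float) -> float:
        # RAM-like: constant, independent of location and prior accesses.
        return constant_time

    print(sequential_access_time(5000, 0.001))  # 5.0 s: grows with position
    print(direct_access_time(0.01, 20, 0.001))  # 0.03 s: variable but bounded
    print(random_access_time(1e-7))             # 1e-07 s: constant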
From a user’s point of view, the two most important characteristics of memory
are capacity and performance. Three performance parameters are used:
• Access time (latency): For random-access memory, this is the time it takes to
perform a read or write operation, that is, the time from the instant that an
address is presented to the memory to the instant that data have been stored
or made available for use. For non-random-access memory, access time is the
time it takes to position the read–write mechanism at the desired location.
• Memory cycle time: This concept is primarily applied to random-access memory
and consists of the access time plus any additional time required before a second
access can commence. This additional time may be required for transients to die
out on signal lines or to regenerate data if they are read destructively. Note that
memory cycle time is concerned with the system bus, not the processor.
• Transfer rate: This is the rate at which data can be transferred into or out of a
memory unit. For random-access memory, it is equal to 1/(cycle time).
For non-random-access memory, the following relationship holds:

    Tn = TA + n/R    (4.1)

where

    Tn = Average time to read or write n bits
    TA = Average access time
    n  = Number of bits
    R  = Transfer rate, in bits per second (bps)
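
Equation (4.1) can be applied directly. A minimal sketch, with invented
device parameters:

    def average_rw_time(ta_seconds: float, n_bits: int, rate_bps: float) -> float:
        # Equation (4.1): Tn = TA + n/R for non-random-access memory.
        return ta_seconds + n_bits / rate_bps

    # Hypothetical device: 10 ms average access time, 1 Mbps transfer rate.
    # Reading a 4096-bit block then takes 10 ms + 4.096 ms = 14.096 ms.
    print(average_rw_time(0.010, 4096, 1_000_000))  # 0.014096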
A variety of physical types of memory have been employed. The most com-
mon today are semiconductor memory, magnetic surface memory, used for disk and
tape, and optical and magneto-optical.

Several physical characteristics of data storage are important. In a volatile
memory, information decays naturally or is lost when electrical power is switched
off. In a nonvolatile memory, information once recorded remains without deterio-
ration until deliberately changed; no electrical power is needed to retain informa-
tion. Magnetic-surface memories are nonvolatile. Semiconductor memory (memory
on integrated circuits) may be either volatile or nonvolatile. Nonerasable memory
cannot be altered, except by destroying the storage unit. Semiconductor memory of
this type is known as read-only memory (ROM). Of necessity, a practical noneras-
able memory must also be nonvolatile.
For random-access memory, the organization is a key design issue. In this con-
text, organization refers to the physical arrangement of bits to form words. The
obvious arrangement is not always used, as is explained in Chapter 5.

The Memory Hierarchy


The design constraints on a computer’s memory can be summed up by three ques-
tions: How much? How fast? How expensive?
The question of how much is somewhat open ended. If the capacity is there,
applications will likely be developed to use it. The question of how fast is, in a sense,
easier to answer. To achieve greatest performance, the memory must be able to
keep up with the processor. That is, as the processor is executing instructions, we
would not want it to have to pause waiting for instructions or operands. The final
question must also be considered. For a practical system, the cost of memory must
be reasonable in relationship to other components.
As might be expected, there is a trade-off among the three key characteristics
of memory: capacity, access time, and cost. A variety of technologies are used to
implement memory systems, and across this spectrum of technologies, the following
relationships hold:
• Faster access time, greater cost per bit
• Greater capacity, smaller cost per bit
• Greater capacity, slower access time
The dilemma facing the designer is clear. The designer would like to use mem-
ory technologies that provide for large-capacity memory, both because the capac-
ity is needed and because the cost per bit is low. However, to meet performance
requirements, the designer needs to use expensive, relatively lower-capacity memo-
ries with short access times.
The way out of this dilemma is not to rely on a single memory component or
technology, but to employ a memory hierarchy. A typical hierarchy is illustrated in
Figure 4.1. As one goes down the hierarchy, the following occur:
a. Decreasing cost per bit
b. Increasing capacity
c. Increasing access time
d. Decreasing frequency of access of the memory by the processor
Thus, smaller, more expensive, faster memories are supplemented by larger,
cheaper, slower memories. The key to the success of this organization is item
(d): decreasing frequency of access. We examine this concept in greater detail
when we discuss the cache, later in this chapter, and virtual memory in
Chapter 8. A brief explanation is provided at this point.

[Figure 4.1 The Memory Hierarchy: a pyramid with inboard memory (registers,
cache, main memory) at the top; outboard storage (magnetic disk, CD-ROM,
CD-RW, DVD-RW, DVD-RAM, Blu-Ray) below it; and off-line storage (magnetic
tape) at the base.]
The use of two levels of memory to reduce average access time works in prin-
ciple, but only if conditions (a) through (d) apply. By employing a variety of tech-
nologies, a spectrum of memory systems exists that satisfies conditions (a) through
(c). Fortunately, condition (d) is also generally valid.
The basis for the validity of condition (d) is a principle known as locality of
reference [DENN68]. During the course of execution of a program, memory refer-
ences by the processor, for both instructions and data, tend to cluster. Programs
typically contain a number of iterative loops and subroutines. Once a loop or sub-
routine is entered, there are repeated references to a small set of instructions.
Similarly, operations on tables and arrays involve access to a clustered set of data
words. Over a long period of time, the clusters in use change, but over a short period
of time, the processor is primarily working with fixed clusters of memory references.
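
A contrived Python fragment illustrates the point: the addresses touched by a
simple loop cluster tightly, and over a longer run the active cluster shifts:

    data = list(range(10_000))

    def sum_block(start: int, length: int) -> int:
        total = 0
        for i in range(start, start + length):
            total += data[i]  # repeated references to one small, contiguous
        return total          # cluster of addresses

    print(sum_block(0, 100))     # short term: the program works in one cluster
    print(sum_block(5000, 100))  # later: the cluster in use has moved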

Example 4.1 Suppose that the processor has access to two levels of memory. Level 1
contains 1000 words and has an access time of 0.01 μs; level 2 contains 100,000 words
and has an access time of 0.1 μs. Assume that if a word to be accessed is in level 1, then
the processor accesses it directly. If it is in level 2, then the word is first transferred to
level 1 and then accessed by the processor. For simplicity, we ignore the time required
for the processor to determine whether the word is in level 1 or level 2. Figure 4.2 shows
the general shape of the curve that covers this situation. The figure shows the average
access time to a two-level memory as a function of the hit ratio H, where H is defined as
the fraction of all memory accesses that are found in the faster memory (e.g., the cache),
T1 is the access time to level 1, and T2 is the access time to level 2.1 As can be seen, for
high percentages of level 1 access, the average total access time is much closer to that of
level 1 than that of level 2.
In our example, suppose 95% of the memory accesses are found in level 1. Then the
average time to access a word can be expressed as

(0.95)(0.01 μs) + (0.05)(0.01 μs + 0.1 μs) = 0.0095 + 0.0055 = 0.015 μs

The average access time is much closer to 0.01 μs than to 0.1 μs, as desired.
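
The arithmetic of Example 4.1 generalizes to Ts = H x T1 + (1 - H) x (T1 + T2),
where Ts is the average access time. A one-function Python sketch reproduces
the example's numbers:

    def average_access_time(hit_ratio: float, t1: float, t2: float) -> float:
        # Hits cost T1; misses cost T1 + T2 (transfer to level 1, then access).
        return hit_ratio * t1 + (1 - hit_ratio) * (t1 + t2)

    # Example 4.1: T1 = 0.01 us, T2 = 0.1 us, 95% hit ratio.
    print(average_access_time(0.95, 0.01, 0.1))  # ~0.015 (us)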

[Figure 4.2 Performance of Accesses Involving Only Level 1 (Hit Ratio):
average access time to the two-level memory as a function of the hit ratio H;
the curve falls from T1 + T2 at H = 0 toward T1 as H approaches 1.]

1 If the accessed word is found in the faster memory, that is defined as a
hit. A miss occurs if the accessed word is not found in the faster memory.

Accordingly, it is possible to organize data across the hierarchy such that
the percentage of accesses to each successively lower level is substantially
less than that of the level above. Consider the two-level example already
presented. Let level 2 memory contain all program instructions and data. The
current clusters can be temporarily placed in level 1. From time to time, one
of the clusters in level 1 will have to be swapped back to level 2 to make
room for a new cluster coming in to level 1. On average, however, most
references will be to instructions and data contained in level 1.
This principle can be applied across more than two levels of memory, as sug-
gested by the hierarchy shown in Figure 4.1. The fastest, smallest, and most expen-
sive type of memory consists of the registers internal to the processor. Typically, a
processor will contain a few dozen such registers, although some machines contain
hundreds of registers. Main memory is the principal internal memory system of
the computer. Each location in main memory has a unique address. Main memory
is usually extended with a higher-speed, smaller cache. The cache is not usually
visible to the programmer or, indeed, to the processor. It is a device for staging
the movement of data between main memory and processor registers to improve
performance.
The three forms of memory just described are, typically, volatile and employ
semiconductor technology. The use of three levels exploits the fact that semicon-
ductor memory comes in a variety of types, which differ in speed and cost. Data are
stored more permanently on external mass storage devices, of which the most com-
mon are hard disk and removable media, such as removable magnetic disk, tape,
and optical storage. External, nonvolatile memory is also referred to as secondary
memory or auxiliary memory. These are used to store program and data files and
are usually visible to the programmer only in terms of files and records, as opposed
to individual bytes or words. Disk is also used to provide an extension to main mem-
ory known as virtual memory, which is discussed in Chapter 8.
Other forms of memory may be included in the hierarchy. For example, large
IBM mainframes include a form of internal memory known as expanded storage.
This uses a semiconductor technology that is slower and less expensive than that
of main memory. Strictly speaking, this memory does not fit into the hierarchy but
is a side branch: Data can be moved between main memory and expanded storage
but not between expanded storage and external memory. Other forms of secondary
memory include optical and magneto-optical disks. Finally, additional levels can be
effectively added to the hierarchy in software. A portion of main memory can be
used as a buffer to hold data temporarily that is to be read out to disk. Such a tech-
nique, sometimes referred to as a disk cache,2 improves performance in two ways:
• Disk writes are clustered. Instead of many small transfers of data, we have a
few large transfers of data. This improves disk performance and minimizes
processor involvement.
• Some data destined for write-out may be referenced by a program before the
next dump to disk. In that case, the data are retrieved rapidly from the soft-
ware cache rather than slowly from the disk.
Appendix 4A examines the performance implications of multilevel memory
structures.

2
Disk cache is generally a purely software technique and is not examined in this book. See [STAL12] for
a discussion.

4.2 CACHE MEMORY PRINCIPLES

Cache memory is designed to provide the memory access time of expensive, high-
speed memory combined with the large memory size of less expensive, lower-speed
memory. The concept is illustrated in Figure 4.3a. There is a relatively large and slow
main memory together with a smaller, faster cache memory. The cache contains a
copy of portions of main memory. When the processor attempts to read a word of
memory, a check is made to determine if the word is in the cache. If so, the word is
delivered to the processor. If not, a block of main memory, consisting of some fixed
number of words, is read into the cache and then the word is delivered to the pro-
cessor. Because of the phenomenon of locality of reference, when a block of data is
fetched into the cache to satisfy a single memory reference, it is likely that there will
be future references to that same memory location or to other words in the block.
Figure 4.3b depicts the use of multiple levels of cache. The L2 cache is slower
and typically larger than the L1 cache, and the L3 cache is slower and typically
larger than the L2 cache.

[Figure 4.3 Cache and Main Memory: (a) a single cache, with word transfers
between the CPU and the fast cache and block transfers between the cache and
the slow main memory; (b) a three-level organization, in which the CPU
connects through the L1 (fastest), L2, and L3 caches to main memory.]
[Figure 4.4 Cache/Main Memory Structure: (a) the cache, consisting of m lines
numbered 0 through m - 1, each holding a tag plus a block of K words; (b) main
memory, consisting of 2^n addressable words (addresses 0 through 2^n - 1)
viewed as M = 2^n/K blocks of K words each.]

3 In referring to the basic unit of the cache, the term line is used, rather
than the term block, for two reasons: (1) to avoid confusion with a main
memory block, which contains the same number of data words as a cache line;
and (2) because a cache line includes not only K words of data, just as a main
memory block, but also includes tag and control bits.

Figure 4.4 depicts the structure of a cache/main-memory system. Main memory
consists of up to 2^n addressable words, with each word having a unique n-bit
address. For mapping purposes, this memory is considered to consist of a
number of fixed-length blocks of K words each. That is, there are M = 2^n/K
blocks in main memory. The cache consists of m blocks, called lines.3 Each
line contains K words, plus a tag of a few bits. Each line also includes
control bits (not shown), such as a
bit to indicate whether the line has been modified since being loaded into the cache.
The length of a line, not including tag and control bits, is the line size. The line
size may be as small as 32 bits, with each “word” being a single byte; in this case
the line size is 4 bytes. The number of lines is considerably less than the number
of main memory blocks (m V M). At any time, some subset of the blocks of
memory resides in lines in the cache. If a word in a block of memory is read, that
block is transferred to one of the lines of the cache. Because there are more blocks
than lines, an individual line cannot be uniquely and permanently dedicated to a
particular block. Thus, each line includes a tag that identifies which particular block
is currently being stored. The tag is usually a portion of the main memory address,
as described later in this section.
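
The quantities just defined are related by simple arithmetic. The sketch below
uses invented sizes to make the relationships concrete:

    # Cache/main-memory parameters in the spirit of Figure 4.4 (sizes invented).
    n = 24          # address bits: main memory holds 2**n addressable words
    K = 4           # words per block, and per cache line
    m = 16_384      # number of cache lines

    M = 2**n // K   # number of main memory blocks
    print(M)        # 4194304
    print(m < M)    # True: far fewer lines than blocks (m << M), which is why
                    # each line carries a tag naming the block it currently holds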
Figure 4.5 illustrates the read operation. The processor generates the read
address (RA) of a word to be read. If the word is contained in the cache, it is
delivered to the processor. Otherwise, the block containing that word is loaded
into the cache, and the word is delivered to the processor. Figure 4.5 shows
these last two operations occurring in parallel and reflects the organization
shown in Figure 4.6, which is typical of contemporary cache organizations. In
this organization, the cache connects to the processor via data, control, and
address lines. The data and address lines also attach to data and address
buffers, which attach to a system bus from which main memory is reached. When a
cache hit occurs, the data and address buffers are disabled and communication is
only between processor and cache, with no system bus traffic. When a cache miss
occurs, the desired address is loaded onto the system bus and the data are
returned through the data buffer to both the cache and the processor. In other
organizations, the cache is physically interposed between the processor and the
main memory for all data, address, and control lines. In this latter case, for a
cache miss, the desired word is first read into the cache and then transferred
from cache to processor.

[Figure 4.5 Cache Read Operation: flowchart. Receive address RA from the CPU;
if the block containing RA is in the cache, fetch the RA word and deliver it
to the CPU; otherwise, access main memory for the block containing RA,
allocate a cache line for the block, load the block into the line, and deliver
the RA word to the CPU.]
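
The read flow of Figure 4.5 can be mimicked in a few lines of Python. This is
a simplified software model, not a hardware design; the block size and the
unbounded cache dictionary are assumptions made for illustration:

    BLOCK_SIZE = 4  # words per block and per line (assumed)

    main_memory = {addr: f"word@{addr}" for addr in range(64)}
    cache = {}      # block number -> list of K words (idealized: never full)

    def read(ra: int) -> str:
        # Follow Figure 4.5: deliver the word at read address RA.
        block_no = ra // BLOCK_SIZE
        if block_no not in cache:                     # miss: fetch the whole
            base = block_no * BLOCK_SIZE              # block from main memory
            cache[block_no] = [main_memory[base + i]  # and load it into a line
                               for i in range(BLOCK_SIZE)]
        return cache[block_no][ra % BLOCK_SIZE]       # deliver word to CPU

    print(read(13))  # miss: loads block 3 (words 12..15), returns "word@13"
    print(read(14))  # hit: block 3 already resident

A real cache has a fixed number of lines and must choose a line to evict on a
miss; replacement policies are among the design elements discussed in the next
section.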
A discussion of the performance parameters related to cache use is contained
in Appendix 4A.

[Figure 4.6 Typical Cache Organization: the processor exchanges address,
control, and data signals with the cache; address and data buffers connect the
cache to the system bus, over which main memory is reached.]

4.3 ELEMENTS OF CACHE DESIGN

This section provides an overview of cache design parameters and reports some
typical results. We occasionally refer to the use of caches in high-performance com-
puting (HPC). HPC deals with supercomputers and their software, especially for
scientific applications that involve large amounts of data, vector and matrix com-
putation, and the use of parallel algorithms. Cache design for HPC is quite differ-
ent than for other hardware platforms and applications. Indeed, many researchers
have found that HPC applications perform poorly on computer architectures that
employ caches [BAIL93]. Other researchers have since shown that a cache hierar-
chy can be useful in improving performance if the application software is tuned to
exploit the cache [WANG99, PRES01].4
Although there are a large number of cache implementations, there are a few
basic design elements that serve to classify and differentiate cache architectures.
Table 4.2 lists key elements.

Cache Addresses
Almost all nonembedded processors, and many embedded processors, support vir-
tual memory, a concept discussed in Chapter 8. In essence, virtual memory is a facil-
ity that allows programs to address memory from a logical point of view, without

4
For a general discussion of HPC, see [DOWD98].
