Memory Cache: Computer Architecture and Organization
Performance
Access time: the time it takes the memory to complete a read or write operation.
Memory cycle time: the access time plus any additional time required before the next access can begin.
Transfer rate: the rate at which data can be transferred into or out of the memory.
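For a random-access memory, the transfer rate can be taken as the reciprocal of the cycle time. The short sketch below illustrates this with an assumed cycle time of 200 ns (the value is made up for the example):

```python
# Illustrative only: for random-access memory, transfer rate is often
# approximated as 1 / cycle_time (words transferred per second).
cycle_time_ns = 200                          # assumed example value
cycle_time_s = cycle_time_ns * 1e-9
transfer_rate_words_per_s = 1 / cycle_time_s
print(f"{transfer_rate_words_per_s:,.0f} words/second")  # 5,000,000
```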
Physical Type
Physical Characteristics
Volatile: memory that requires power to retain data; RAM is an example.
Non-volatile: memory that retains its data even when the device is powered off; examples include ROM, HDDs, and SSDs.
4.2 CACHE MEMORY PRINCIPLES
Cache memory is designed to combine the fast access time of expensive, high-speed memory with the large size of less expensive, lower-speed memory. The concept is shown below: a relatively large and slow main memory is paired with a smaller, faster cache memory.
The figure below depicts the use of multiple levels of cache. The L2 cache is slower and typically larger than the L1 cache, and the L3 cache is slower and typically larger than the L2 cache.
The cache contains a copy of portions of main memory. When the processor attempts to
read a word of memory, a check is made to determine if the word is in the cache. If so, the
word is delivered to the processor. If not, a block of main memory, consisting of some fixed
number of words, is read into the cache and then the word is delivered to the processor.
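The read behaviour described above can be sketched in a few lines of Python. This is only an illustrative model, not a hardware description; the dictionary-based cache, the toy main memory, and the block size of 4 words are all assumptions made for the example:

```python
# Minimal sketch of the cache read operation described above.
# 'cache' maps a block number to the list of words in that block.

BLOCK_SIZE = 4  # K words per block (assumed value)

def read_word(address, cache, main_memory):
    block_number = address // BLOCK_SIZE
    offset = address % BLOCK_SIZE
    if block_number in cache:                # cache hit: deliver the word
        return cache[block_number][offset]
    # Cache miss: read the whole block from main memory into the cache,
    # then deliver the requested word to the processor.
    start = block_number * BLOCK_SIZE
    cache[block_number] = main_memory[start:start + BLOCK_SIZE]
    return cache[block_number][offset]

main_memory = list(range(64))   # toy main memory of 64 words
cache = {}
print(read_word(10, cache, main_memory))   # miss: block fetched, returns 10
print(read_word(11, cache, main_memory))   # hit: same block, returns 11
```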
The figure below depicts the structure of a cache/main-memory system. Main memory consists of up to 2^n addressable words, with each word having a unique n-bit address. For mapping purposes, this memory is considered to consist of a number of fixed-length blocks of K words each. The cache consists of m blocks, called lines; each line contains K words.
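As a quick numerical check of these relationships (the values n = 16, K = 4, and m = 128 are assumed for illustration, not taken from the text):

```python
# Sketch of the cache/main-memory structure parameters described above.
n = 16                      # bits per address (assumed)
K = 4                       # words per block (assumed)
m = 128                     # number of cache lines (assumed)

words_in_main_memory = 2 ** n                        # 65,536 addressable words
blocks_in_main_memory = words_in_main_memory // K    # 16,384 blocks
print(words_in_main_memory, blocks_in_main_memory)
# The cache holds only m = 128 of these 16,384 blocks at any one time,
# which is why a mapping function and a replacement policy are needed.
```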
CACHE ADDRESS
Most non-embedded processors, and many embedded processors, support virtual memory. Basically, virtual memory is a facility that allows programs to address memory from a logical point of view, regardless of the amount of main memory physically available. When virtual memory is used, the address field of the machine instruction contains the virtual address. To read from and write to memory, a hardware memory management unit (MMU) translates each virtual address into a physical address in main memory.
When virtual addresses are used, the system designer may choose to place the cache between the processor and the MMU or between the MMU and main memory. A logical cache, also known as a virtual cache, stores data using virtual addresses; the processor accesses the cache directly, without going through the MMU. A physical cache stores data using main memory physical addresses.
CACHE SIZE
The cache designer wants a cache small enough that the average cost per bit is close to that of main memory alone, and large enough that the average access time is close to that of the cache alone. There are also several motivations for minimizing cache size: the larger the cache, the larger the number of gates involved in addressing it. As a result, large caches tend to be slightly slower than small ones, even when built with the same IC technology and placed in the same position on the chip and circuit board. The available chip and board area also limits cache size. Because cache performance is very sensitive to the nature of the workload, it is not possible to arrive at a single "optimum" cache size.
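One way to make the price trade-off concrete is the usual average cost-per-bit calculation for a two-level memory; the costs and sizes below are assumed purely for illustration:

```python
# Average cost per bit of a combined cache + main memory system:
#   Cs = (C1*S1 + C2*S2) / (S1 + S2)
# where C1, C2 are cost per bit and S1, S2 are sizes of cache and main memory.

C1, S1 = 0.01, 32 * 1024           # cache: assumed cost/bit and size (bits)
C2, S2 = 0.0001, 8 * 1024 * 1024   # main memory: assumed cost/bit and size (bits)

Cs = (C1 * S1 + C2 * S2) / (S1 + S2)
print(f"average cost per bit = {Cs:.6f}")
# Because S2 >> S1, the average cost per bit stays close to that of main memory.
```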
MAPPING FUNCTION
Because there are fewer cache lines than main memory blocks, an algorithm is needed for mapping main memory blocks into cache lines; three techniques (direct, associative, and set-associative mapping) can be used. When the cache is full, a replacement policy decides which line to evict; two such policies are described below:
FIFO/LIFO: In FIFO, the blocks that entered the cache first are removed first, regardless of how often or how many times they were accessed before. LIFO behaves the other way around, removing the most recently added block from the cache.
Random: This technique does not use any usage information in determining which block to replace; whenever a replacement is needed, the block to evict is chosen at random.
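Two of the policies above, FIFO and random replacement, can be sketched as follows. This is a simplified software model (block numbers standing in for cache lines, with an assumed capacity of three lines), not a description of any particular hardware:

```python
import random
from collections import deque

# Illustrative FIFO and random replacement for a cache of fixed capacity.

def access_fifo(cache, order, block, capacity):
    """FIFO: on a miss with a full cache, evict the oldest resident block."""
    if block in cache:
        return "hit"
    if len(cache) >= capacity:
        cache.discard(order.popleft())          # evict first-in block
    cache.add(block)
    order.append(block)
    return "miss"

def access_random(cache, block, capacity):
    """Random: on a miss with a full cache, evict a block chosen at random."""
    if block in cache:
        return "hit"
    if len(cache) >= capacity:
        cache.discard(random.choice(list(cache)))
    cache.add(block)
    return "miss"

cache, order = set(), deque()
for b in [1, 2, 3, 1, 4, 1]:
    print(b, access_fifo(cache, order, b, capacity=3))
```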
LINE SIZE
When a block of data is retrieved and placed in the cache, not only the desired word but also some number of adjacent words are retrieved. At first, larger blocks increase the hit ratio because of the principle of locality; however, as the block becomes even bigger, the probability of using the newly fetched information becomes less than the probability of reusing the information that has to be replaced. Two specific effects come into play:
1. Larger blocks reduce the number of blocks that fit into a cache.
2. As a block becomes larger, each additional word is farther from the requested word and therefore less likely to be needed in the near future.
The relationship between block size and hit ratio is complex, depending on the locality characteristics of a particular program, and no definitive optimum value has been found.
NUMBER OF CACHES
MULTILEVEL CACHES
A multilevel cache is one technique for improving cache performance by reducing the miss penalty. The term miss penalty refers to the extra time required to bring data into the cache from main memory whenever there is a miss in the cache.
Typically, most contemporary designs include both an on-chip (internal, level 1 (L1)) cache and an external cache (level 2 (L2)). The reason is that if there is no L2 cache and the processor makes an access request for a memory location not in the L1 cache, the processor must access DRAM or ROM memory across the bus. On the other hand, if an L2 SRAM (static RAM) cache is used, the missing information can frequently be retrieved quickly.
Two features of contemporary cache design for multilevel caches are noteworthy.
1. Many designs use a separate data path, rather than the system bus, for transfers between the L2 cache and the processor, to reduce the burden on the system bus.
2. With the continued shrinkage of processor components, a number of processors now incorporate the L2 cache on the processor chip, improving performance.
Originally, the L3 cache was accessible over the external bus. More recently, most microprocessors have incorporated an on-chip L3 cache.
[Figure: total hit ratio (L1 and L2) for 8-Kbyte and 16-Kbyte L1 caches]
UNIFIED VERSUS SPLIT CACHES
More recently, it has become common to split the cache into two: one dedicated to
instructions and one dedicated to data. These two caches both exist at the same
level, typically as two L1 caches. There are two potential advantages of a unified
cache:
1. For a given cache size, a unified cache has a higher hit rate than split caches
because it balances the load between instruction and data fetches automatically.
2. Only one cache needs to be designed and implemented.
The key advantage of the split cache design is that it eliminates contention for the
cache between the instruction fetch/decode unit and the execution unit.
4.4 PENTIUM 4 CACHE ORGANIZATION
The Pentium 4 is a seventh generation microprocessor created by Intel Corporation and
released in November 2000 following the Intel Pentium III processor.
Level 1 cache: a split cache, 8 KB in size and four-way set associative, meaning that each set consists of four lines in the cache. The line size is 64 bytes.
Level 2 cache: a unified cache, 256 KB in size and eight-way set associative, meaning that each set consists of eight lines in the cache. The line size is 128 bytes.
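Given those parameters, the number of sets in each cache follows directly from size = number of sets x associativity x line size; the short check below assumes that relationship:

```python
# Number of sets = cache size / (associativity * line size)
l1_sets = (8 * 1024) // (4 * 64)       # L1: 8 KB, 4-way, 64-byte lines
l2_sets = (256 * 1024) // (8 * 128)    # L2: 256 KB, 8-way, 128-byte lines
print(l1_sets, l2_sets)                # 32 sets and 256 sets
```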
Fetch/decode unit: fetches program instructions in order from the L2 cache, decodes them into a series of micro-operations, and stores the results in the L1 instruction cache.
Out-of-order execution logic: schedules execution of the micro-operations subject to data dependencies and resource availability; thus micro-operations may be scheduled for execution in a different order than they were fetched from the instruction stream.
Execution units: these units execute micro-operations, fetching required data from the L1 data cache and temporarily storing results in registers.
Memory subsystem: this unit includes the L2 and L3 caches and the system bus, which is used to access main memory when the L1 and L2 caches have a cache miss, and to access the system I/O resources.
APPENDIX 4A: PERFORMANCE CHARACTERISTICS OF TWO-LEVEL MEMORIES
This two-level architecture exploits a property known as locality to provide improved performance compared with a single level of memory. The main memory cache mechanism is part of the computer architecture, implemented in hardware and usually invisible to the operating system. There are two other examples of a two-level memory approach that also exploit locality and that are, at least in part, implemented in the operating system: virtual memory and the disk cache.
The basis for the superior performance of two-level memory is a principle known as locality of reference. This principle states that memory references tend to cluster. Over a long period of time, the clusters in use change, but over a short period of time, the processor works mainly within a fixed set of clusters of memory references.
A distinction is made between two forms of locality:
Spatial locality refers to the tendency of execution to involve a number of clustered memory locations, for example when the processor accesses instructions sequentially; it can be exploited by using larger cache blocks and by incorporating prefetching mechanisms into the cache control logic.
Temporal locality refers to the tendency of the processor to access memory locations that have been used recently; it can be exploited by keeping recently used values in the cache and by using a cache hierarchy.
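As a purely illustrative example of both forms of locality: the sequential sweep over the array below exhibits spatial locality, while the repeated reuse of the running total exhibits temporal locality:

```python
# Spatial locality: elements of 'data' are accessed sequentially, so
# neighbouring memory locations are referenced close together in time.
# Temporal locality: 'total' (and the loop variable) are reused on every
# iteration, so the same locations are referenced again and again.
data = list(range(1000))
total = 0
for value in data:
    total += value
print(total)   # 499500
```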
The locality property can be exploited in the formation of a two-level memory. The upper-level memory (M1) is smaller, faster, and more expensive (per bit) than the lower-level memory (M2). M1 is used as temporary storage for part of the contents of the larger M2. When a memory reference is made, an attempt is first made to access the item in M1.
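The performance benefit of this arrangement can be quantified with the usual two-level average access time expression Ts = H * T1 + (1 - H) * (T1 + T2); the hit ratio and access times below are assumed values chosen only for illustration:

```python
# Average access time of a two-level memory:
#   Ts = H * T1 + (1 - H) * (T1 + T2)
# H  = fraction of accesses found in M1 (hit ratio)
# T1 = access time of M1 (cache), T2 = access time of M2 (main memory)

H, T1, T2 = 0.95, 0.01e-6, 0.1e-6    # assumed: 95% hits, 10 ns, 100 ns
Ts = H * T1 + (1 - H) * (T1 + T2)
print(f"average access time = {Ts * 1e9:.1f} ns")   # 15.0 ns
# With a high hit ratio, the average access time stays close to T1.
```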