Cache Memory
Direct Mapping
If each block from main memory has only one place it can appear in the cache, the cache is said
to be direct mapped. To determine which cache line a main memory block is mapped to,
we can use the formula shown below:
cache line number = (main memory block number) mod (number of cache lines)
Let us assume we have a main memory of size 4 GB (2^32 bytes), with each byte directly addressable by
a 32-bit address. We divide main memory into blocks of 32 bytes (2^5) each. Thus there are
128M (i.e. 2^32/2^5 = 2^27) blocks in main memory.
We have a cache memory of 512 KB (i.e. 2^19 bytes), divided into blocks of 32 bytes (2^5) each. Thus
there are 16K (i.e. 2^19/2^5 = 2^14) blocks, also known as cache slots or cache lines, in cache
memory. It is clear from these numbers that there are more main memory blocks than cache
slots.
Main memory is not physically partitioned in this way, but this is the view of
main memory that the cache sees.
We divide both main memory and cache memory into blocks of the same size, i.e. 32
bytes.
A set of 8K (i.e. 2^27/2^14 = 2^13) main memory blocks is mapped onto a single cache slot. In order
to keep track of which of the 2^13 possible main memory blocks is in each cache slot, a 13-bit
tag field is added to each cache slot, which holds an identifier in the range from 0 to 2^13 - 1.
All the tags are stored in a special tag memory where they can be searched in parallel. Whenever
a new block is stored in the cache, its tag is stored in the corresponding tag memory location.
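The field widths worked out above can be checked with a few lines of C. This is only a sketch: the names `log2_exact`, `dm_fields` and `dm_field_widths` are made up for illustration, and all sizes are assumed to be powers of two.

```c
#include <assert.h>
#include <stdint.h>

/* Number of bits needed to index n items; n must be a power of two. */
static unsigned log2_exact(uint64_t n)
{
    assert(n != 0 && (n & (n - 1)) == 0);   /* power-of-two check */
    unsigned bits = 0;
    while (n > 1) {
        n >>= 1;
        bits++;
    }
    return bits;
}

/* Field widths of a direct-mapped address (hypothetical helper type). */
struct dm_fields {
    unsigned word_bits;   /* selects a byte within a block       */
    unsigned slot_bits;   /* selects a cache slot                */
    unsigned tag_bits;    /* identifies which block is in a slot */
};

static struct dm_fields dm_field_widths(uint64_t memory_bytes,
                                        uint64_t cache_bytes,
                                        uint64_t block_bytes)
{
    struct dm_fields f;
    f.word_bits = log2_exact(block_bytes);
    f.slot_bits = log2_exact(cache_bytes / block_bytes);
    f.tag_bits  = log2_exact(memory_bytes) - f.slot_bits - f.word_bits;
    return f;
}
```

Calling `dm_field_widths(1ULL << 32, 1ULL << 19, 1ULL << 5)` reproduces the example: 5 word bits, 14 slot bits and 13 tag bits.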
When a program is first loaded into Main memory, the Cache is cleared, and so while a program
is executing, a valid bit is needed to indicate whether or not the slot holds a block that belongs to
the program being executed. There is also a dirty bit that keeps track of whether or not a block
has been modified while it is in the cache. A slot that is modified must be written back to the
main memory before the slot is reused for another block. When a program is initially loaded into
memory, the valid bits are all set to 0. The first instruction that is executed in the program will
therefore cause a miss, since none of the program is in the cache at this point. The block that
causes the miss is located in the main memory and is loaded into the cache.
This scheme is called "direct mapping" because each cache slot corresponds to an explicit set of
main memory blocks. For a direct mapped cache, each main memory block can be mapped to
only one slot, but each slot can receive more than one block.
The mapping from main memory blocks to cache slots is performed by partitioning a main
memory address into fields for the tag, the slot, and the word as follows:
The 32-bit main memory address is partitioned into a 13-bit tag field, followed by a 14-bit slot
field, followed by a 5-bit word field. When a reference is made to a main memory address, the
slot field identifies in which of the 2^14 cache slots the block will be found if it is in the cache.
If the valid bit is 1, then the tag field of the referenced address is compared with the tag field of
the cache slot. If the tag fields are the same, then the word is taken from the position in the slot
specified by the word field. If the valid bit is 1 but the tag fields are not the same, then the slot is
written back to main memory if the dirty bit is set, and the corresponding main memory block is
then read into the slot. For a program that has just started execution, the valid bit will be 0, and
so the block is simply written to the slot. The valid bit for the block is then set to 1, and the
program resumes execution.
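The lookup procedure just described can be modelled in C as a rough sketch. The structure `slot_state`, the counter `writebacks` and the function names are assumptions made for this illustration; only the bookkeeping is modelled, not the data bytes themselves.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define WORD_BITS 5u
#define SLOT_BITS 14u
#define NUM_SLOTS (1u << SLOT_BITS)

struct slot_state {           /* hypothetical bookkeeping for one slot */
    bool     valid;
    bool     dirty;
    uint32_t tag;             /* 13-bit tag */
};

static struct slot_state slots[NUM_SLOTS];   /* zero-initialised: all invalid */
static unsigned writebacks;                  /* counts dirty evictions */

static uint32_t addr_word(uint32_t a) { return a & ((1u << WORD_BITS) - 1u); }
static uint32_t addr_slot(uint32_t a) { return (a >> WORD_BITS) & (NUM_SLOTS - 1u); }
static uint32_t addr_tag(uint32_t a)  { return a >> (WORD_BITS + SLOT_BITS); }

/* Returns true on a hit, false on a miss (after loading the block). */
static bool dm_access(uint32_t addr, bool is_write)
{
    struct slot_state *s = &slots[addr_slot(addr)];
    assert(addr_tag(addr) < (1u << 13));     /* tag must fit in 13 bits */
    bool hit = s->valid && s->tag == addr_tag(addr);
    if (!hit) {
        if (s->valid && s->dirty)
            writebacks++;          /* old block would be written back here */
        s->tag = addr_tag(addr);   /* new block would be read in here */
        s->valid = true;
        s->dirty = false;
    }
    if (is_write)
        s->dirty = true;
    return hit;
}
```

For the address 0x12345678 the fields come out as tag 0x246, slot 0x22B3 and word 0x18. A second address that differs only by 2^19 lands in the same slot with a different tag, so it evicts the first block.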
Fully Associative Mapping
If a main memory block can be placed in any of the cache slots, then the cache is said to be
fully associative.
Let us assume we have a main memory of size 4 GB (2^32 bytes), with each byte directly addressable by
a 32-bit address. We divide main memory into blocks of 32 bytes (2^5) each. Thus there are
128M (i.e. 2^32/2^5 = 2^27) blocks in main memory.
We have a cache memory of 512 KB (i.e. 2^19 bytes), divided into blocks of 32 bytes (2^5) each. Thus
there are 16K (i.e. 2^19/2^5 = 2^14) blocks, also known as cache slots or cache lines, in cache
memory. It is clear from these numbers that there are more main memory blocks than cache
slots.
Main memory is not physically partitioned in this way, but this is the view of
main memory that the cache sees.
We divide both main memory and cache memory into blocks of the same size, i.e. 32
bytes.
In fully associative mapping, any one of the 128M (i.e. 2^27) main memory blocks can be mapped
into any cache slot. To keep track of which one of the 2^27 possible blocks is in each
slot, a 27-bit tag field is added to each slot, which holds an identifier in the range from 0 to
2^27 - 1. The tag field is the most significant 27 bits of the 32-bit memory address presented to the
cache.
In an associative mapped cache, each Main memory block can be mapped to any slot. The
mapping from main memory blocks to cache slots is performed by partitioning an address into
fields for the tag and the word (also known as the "byte" field): a 27-bit tag followed by a 5-bit word.
When a reference is made to a Main memory address, the cache hardware intercepts the
reference and searches the cache tag memory to see if the requested block is in the cache. For
each slot, if the valid bit is 1, then the tag field of the referenced address is compared with the tag
field of the slot. All of the tags are searched in parallel, using an associative memory. If any tag
in the cache tag memory matches the tag field of the memory reference, then the word is taken
from the position in the slot specified by the word field. If the referenced word is not found in the
cache, then the main memory block that contains the word is brought into the cache and the
referenced word is then taken from the cache. The tag, valid, and dirty fields are updated, and the
program resumes execution.
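As a rough software model of this search, the sketch below walks the tag memory in a loop. Real hardware compares all tags in parallel, so the loop is only a stand-in. The names `fa_slot`, `fa_find` and `fa_load` are hypothetical, and no replacement policy is modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define WORD_BITS 5u
#define NUM_SLOTS (1u << 14)

struct fa_slot {              /* hypothetical bookkeeping for one slot */
    bool     valid;
    uint32_t tag;             /* upper 27 bits of the address */
};

static struct fa_slot fa_cache[NUM_SLOTS];

static uint32_t fa_tag(uint32_t addr) { return addr >> WORD_BITS; }

/* Returns the index of the slot holding the block, or -1 on a miss. */
static long fa_find(uint32_t addr)
{
    for (uint32_t i = 0; i < NUM_SLOTS; i++) {
        if (fa_cache[i].valid && fa_cache[i].tag == fa_tag(addr))
            return (long)i;
    }
    return -1;
}

/* Loads the block into the first free slot; returns -1 if the cache is full. */
static long fa_load(uint32_t addr)
{
    assert(fa_find(addr) < 0);   /* caller should only load on a miss */
    for (uint32_t i = 0; i < NUM_SLOTS; i++) {
        if (!fa_cache[i].valid) {
            fa_cache[i].valid = true;
            fa_cache[i].tag = fa_tag(addr);
            return (long)i;
        }
    }
    return -1;   /* a replacement policy would choose a victim here */
}
```

Because the tag is the whole block number, two addresses in the same 32-byte block (for example 0x12345678 and 0x1234567F) find the same slot, whichever slot the block happened to be loaded into.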
An associative mapped cache has the advantage that any main memory block can be placed into any
cache line. This means that regardless of how irregular the data and program references are, if a
slot is available for the block, it can be stored in the cache. The price of this flexibility is
the considerable hardware overhead needed for cache bookkeeping.
Although this mapping scheme is powerful enough to satisfy a wide range of memory access
situations, there are two implementation problems that limit performance.
- The process of deciding which slot should be freed when a new block is brought into the
cache can be complex. This process requires a significant amount of hardware and
introduces delays in memory accesses.
- When the cache is searched, the tag field of the referenced address must be compared
with all 2^14 tag fields in the cache.
Set Associative Mapping
The Set Associative mapping scheme combines the simplicity of Direct mapping with the
flexibility of Fully Associative mapping. It is more practical than Fully Associative
mapping because the associative portion is limited to just a few slots that make up a set.
In this mapping mechanism, the cache memory is divided into 's' sets, each consisting of
'n' cache lines. A block from main memory is first mapped onto a specific cache set, and
then it can be placed anywhere within that set. This type of mapping offers a very good
trade-off between implementation cost and efficiency. The set is usually chosen by
cache set number = (main memory block number) mod (number of sets in the cache)
If there are 'n' cache lines in a set, the cache placement is called n-way set associative,
i.e. if there are two blocks or cache lines per set, then it is a 2-way set associative cache,
and if there are four blocks or cache lines per set, then it is a 4-way set associative cache.
Let us assume we have a main memory of size 4 GB (2^32 bytes), with each byte directly
addressable by a 32-bit address. We divide main memory into blocks of 32
bytes (2^5) each. Thus there are 128M (i.e. 2^32/2^5 = 2^27) blocks in main memory.
We have a cache memory of 512 KB (i.e. 2^19 bytes), divided into blocks of 32 bytes (2^5) each.
Thus there are 16K (i.e. 2^19/2^5 = 2^14) blocks, also known as cache slots or cache lines, in
cache memory. It is clear from these numbers that there are more main memory blocks
than cache slots.
Main memory is not physically partitioned in this way, but this is the
view of main memory that the cache sees.
We divide both main memory and cache memory into blocks of the same size,
i.e. 32 bytes.
Let us try 2-way set associative cache mapping, i.e. 2 cache lines per set. We divide the
16K cache lines into sets of 2, and hence there are 8K (2^14/2 = 2^13) sets in the cache
memory.
The number of sets in the cache memory can also be written as a formula, i.e.
number of sets = (number of cache lines) / (cache lines per set) = 2^14 / 2 = 2^13
The format for an address has 13 bits in the set field, which identifies the set in which the
addressed word will be found if it is in the cache. There are five bits for the word field as
before, and there is a 14-bit tag field; together these make up the 32 bits of the
address: a 14-bit tag, followed by a 13-bit set field, followed by a 5-bit word field.
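The 14/13/5 split and the search within a set can be sketched in C as follows. The names `sa_line` and `sa_access` are assumptions for this illustration, and when a set is full the sketch simply overwrites line 0 as a placeholder for a real replacement policy.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define WORD_BITS 5u
#define SET_BITS  13u
#define NUM_SETS  (1u << SET_BITS)
#define WAYS      2u

struct sa_line { bool valid; uint32_t tag; };   /* hypothetical layout */

static struct sa_line sa_cache[NUM_SETS][WAYS];

static uint32_t sa_word(uint32_t a) { return a & ((1u << WORD_BITS) - 1u); }
static uint32_t sa_set(uint32_t a)  { return (a >> WORD_BITS) & (NUM_SETS - 1u); }
static uint32_t sa_tag(uint32_t a)  { return a >> (WORD_BITS + SET_BITS); }

/* Returns true on a hit. On a miss the block goes into a free line of its
   set, or into line 0 if the set is full (placeholder replacement policy). */
static bool sa_access(uint32_t addr)
{
    struct sa_line *set = sa_cache[sa_set(addr)];
    unsigned victim = 0;
    bool found_free = false;
    assert(sa_tag(addr) < (1u << 14));          /* tag must fit in 14 bits */
    for (unsigned w = 0; w < WAYS; w++) {
        if (set[w].valid && set[w].tag == sa_tag(addr))
            return true;
        if (!set[w].valid && !found_free) {
            victim = w;
            found_free = true;
        }
    }
    set[victim].valid = true;
    set[victim].tag = sa_tag(addr);
    return false;
}
```

Two addresses that are 2^19 bytes apart fall into the same set with different tags. In the direct mapped cache they would compete for one slot, whereas here both can stay resident in the two lines of the set.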
As far as the mapping functions are concerned, the book did an okay job describing the details
and differences of each. I, however, would like to describe them with an emphasis on how we
would model them using code.
Remember that direct mapping assigns each memory block to a specific line in the cache. If a
line is already taken up by a memory block when a new block needs to be loaded, the old block
is trashed. The figure below shows how multiple blocks are mapped to the same line in the
cache. This line is the only line that each of these blocks can be sent to. In the case of this figure,
there are 8 bits in the block identification portion of the memory address.
The address for this example is broken down something like the following:
| Tag: (s-r) bits | Line: r bits | Word: w bits |
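Once the block is stored in the line of the cache, the tag is copied to the tag location of the line.
The address is broken into three parts: the (s-r) MSB bits represent the tag to be stored in a line of
the cache corresponding to the block stored in the line; the r bits in the middle identify which line
the block is always stored in; and the w LSB bits identify each word within the block. This
means that:
- The address length is (s + w) bits.
- The number of addressable words is 2^(s+w).
- The block size (which equals the line size) is 2^w words.
- The number of blocks in main memory is 2^s.
- The number of lines in the cache is 2^r.
- The tag is (s - r) bits long.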
Direct mapping is simple and inexpensive to implement, but if a program repeatedly accesses 2 blocks
that map to the same line, the cache begins to thrash back and forth, reloading the line over
and over again, so the miss rate becomes very high.
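To see the thrashing effect in numbers, here is a small sketch that replays a list of addresses through a tiny direct-mapped cache and counts the misses. The sizes (w = 2, r = 4) and the names `dm_reset` and `dm_count_misses` are assumptions chosen to keep the example short.

```c
#include <assert.h>
#include <stdint.h>

#define W_BITS 2u                 /* assumed: 4 words per block */
#define R_BITS 4u                 /* assumed: 16 cache lines    */
#define LINES  (1u << R_BITS)

static int line_tag[LINES];       /* tag held by each line, -1 when empty */

static void dm_reset(void)
{
    for (unsigned i = 0; i < LINES; i++)
        line_tag[i] = -1;
}

/* Replays n addresses and returns how many of them miss. */
static unsigned dm_count_misses(const uint32_t *addrs, unsigned n)
{
    unsigned misses = 0;
    assert(addrs != 0 || n == 0);
    for (unsigned i = 0; i < n; i++) {
        uint32_t line = (addrs[i] >> W_BITS) & (LINES - 1u);
        int tag = (int)(addrs[i] >> (W_BITS + R_BITS));
        if (line_tag[line] != tag) {   /* miss: the old block is replaced */
            line_tag[line] = tag;
            misses++;
        }
    }
    return misses;
}
```

With these sizes, addresses 0x00 and 0x40 both map to line 0 with different tags, so alternating between them misses on every access. Alternating between 0x00 and 0x44 (line 0 and line 1) misses only on the first touch of each block.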
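In fully associative mapping, any block can go into any line of the cache. This means that the
word id bits are still used to identify which word in the block is needed, but the tag becomes all
of the remaining bits.
The address is broken into two parts: a tag used to identify which block is stored in which line of
the cache (s bits) and a fixed number of LSB bits identifying the word within the block (w bits).
This means that:
- The address length is (s + w) bits.
- The number of addressable words is 2^(s+w).
- The block size (which equals the line size) is 2^w words.
- The number of blocks in main memory is 2^s.
- The number of lines in the cache is not determined by the address format.
- The tag is s bits long.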
This is the one that you really need to pay attention to because this is the one for the homework.
Set associative mapping addresses the problem of possible thrashing in the direct mapping method.
It does this by saying that instead of having exactly one line that a block can map to in the cache,
we will group a few lines together, creating a set. Then a block in memory can map to any one of the
lines of a specific set. There is still only one set that the block can map to.
Note that blocks 0, 256, 512, 768, etc. can only be mapped to one set. Within the set, however,
they can be mapped associatively to one of two lines.
The memory address is broken down in a similar way to direct mapping except that there is a
slightly different number of bits for the tag (s-r) and the set identification (r). It should look
something like the following:
Now, if you have a 24-bit address with a block size of 4 words (2-bit word id) and
1K lines in a cache (10-bit line id), the partitioning of the address for the cache would look like this:
| Tag (s-r): 12 bits | Line (r): 10 bits | Word (w): 2 bits |
If we took the exact same system, but converted it to 2-way set associative mapping (2-way
meaning we have 2 lines per set), we'd get the following:
| Tag (s-r): 13 bits | Set (r): 9 bits | Word (w): 2 bits |
Notice that by making the number of sets equal to half the number of lines (i.e., 2 lines per set),
one less bit is needed to identify the set within the cache. This bit is moved to the tag so that the
tag can be used to identify the block within the set.
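The two partitionings of the 24-bit address can be reproduced with one small helper. This is a sketch only: `addr_fields` and `split_address` are hypothetical names, and the "index" field stands for the line number under direct mapping or the set number under set associative mapping.

```c
#include <assert.h>
#include <stdint.h>

struct addr_fields { uint32_t tag, index, word; };   /* hypothetical type */

/* Splits addr (addr_bits wide) into tag | index | word.
   index is the line number for direct mapping or the set number for
   set associative mapping. */
static struct addr_fields split_address(uint32_t addr, unsigned addr_bits,
                                        unsigned index_bits, unsigned word_bits)
{
    struct addr_fields f;
    assert(index_bits + word_bits < addr_bits && addr_bits <= 32);
    f.word  = addr & ((1u << word_bits) - 1u);
    f.index = (addr >> word_bits) & ((1u << index_bits) - 1u);
    f.tag   = addr >> (word_bits + index_bits);
    if (addr_bits < 32)   /* drop any bits above the address width */
        f.tag &= (1u << (addr_bits - word_bits - index_bits)) - 1u;
    return f;
}
```

For the address 0xABCDEF, `split_address(0xABCDEF, 24, 10, 2)` gives tag 0xABC, line 0x37B, word 3. Switching to 9 index bits for the 2-way case gives tag 0x1579, set 0x17B, word 3: the top bit of the old line number has moved into the tag.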
Your assignment is to simulate a 4K cache using C. The memory of this system
is divided into 8-word blocks, which means that the 4K cache has 4K/8 = 512 lines. I've given
you two function declarations in C. In addition, I've given you two arrays, one representing chars
stored in main memory and one representing the lines of the cache. Each line is made up of a
structure called "cache_line". This structure contains all of the information stored in a single line
of the cache, including the tag and the eight words (one block) contained within a single line.
An array is made up of these lines called "cache" and it contains 512 lines.
cache_line cache[512];
Next, a memory has been created called "memory". It contains 64K bytes.
char memory[65536];
There will also be two global variables, int number_of_requests and int number_of_hits. The
purpose of these variables will be discussed later.
Using this information, you should be able to see that there are 16 bits in a memory address (2^16
= 65,536), 3 bits of which identify a word (char) within a block, 9 bits of which identify the line
that a block should be stored in, and the remaining 16-3-9=4 bits of which are stored as the tag.
Each of these arrays is global, so your routines will simply modify them.
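As a starting point for thinking about the bit fields, here is a rough sketch of how a 16-bit address could be taken apart and how a block could be copied into a line. Everything named here is an assumption: `line_sketch` and its field names only mirror the description of "cache_line" (a tag plus eight words), and `word_of`, `line_of`, `tag_of` and `fill_line` are hypothetical helpers, not the functions required by the assignment.

```c
#include <assert.h>
#include <string.h>

#define WORD_BITS  3u                      /* 8 words per block */
#define LINE_BITS  9u                      /* 512 lines         */
#define BLOCK_SIZE (1u << WORD_BITS)

/* Assumed layout: a tag plus one block of eight chars. */
typedef struct {
    int  tag;
    char block[BLOCK_SIZE];
} line_sketch;

static unsigned word_of(unsigned addr) { return addr & (BLOCK_SIZE - 1u); }
static unsigned line_of(unsigned addr) { return (addr >> WORD_BITS) & ((1u << LINE_BITS) - 1u); }
static int tag_of(unsigned addr)
{
    assert(addr <= 0xFFFFu);               /* addresses are 16 bits wide */
    return (int)(addr >> (WORD_BITS + LINE_BITS));
}

/* Copies the whole block containing addr from mem into line. */
static void fill_line(line_sketch *line, const char *mem, unsigned addr)
{
    unsigned start = addr & ~(BLOCK_SIZE - 1u);   /* first address of the block */
    memcpy(line->block, mem + start, BLOCK_SIZE);
    line->tag = tag_of(addr);
}
```

For the address 0xBEEF this gives word 7, line 0x1DD and tag 0xB, and the block that would be copied starts at 0xBEE8.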
You will be creating two functions with the following prototypes. (Be sure to use the exact
prototypes shown below.)
The data that ends up in cache[] from memory[] won't affect the returned values of either
function, but I will be examining it to see if the correct blocks of data were moved into the
cache[] array.
Well, have fun! Don't hesitate to read the book. There's a great deal of useful information in there
including examples and descriptions of the different mapping functions.