Module VI -- MEMORY
Prepared By
Asst. Prof. Sherin Thomas
ECE Dept.
MBITS, Nellimattam
Cache Memory
• For good performance, the time needed to access the necessary
information should be reduced.
• Since the speed of the main memory unit is limited, the solution is to
use a fast cache memory, which essentially makes the main memory
appear faster to the processor.
Principle of Cache Memory
Locality of reference:
Many instructions in localized areas of the program are executed repeatedly during some
time period, and the remainder of the program is accessed relatively infrequently.
This is referred to as locality of reference.
It manifests itself in two ways: temporal and spatial.
Temporal
Temporal locality means that if a data location is referenced, it will tend to be referenced again soon.
Therefore, when the processor loads or stores data that is not in the cache, the data is copied
from main memory into the cache. Subsequent requests for that data hit in the cache.
E.g. if you recently brought a book to your table to look at, you will probably need to look at it
again soon.
Spatial
The spatial aspect means that when the processor accesses a piece of data, it is also likely to
access data in nearby memory locations.
Therefore, when the cache fetches one word from memory, it may also fetch several adjacent
words. This group of words is called a cache block.
E.g. If an element in an array is used, other elements in the same array are also likely to be used,
creating spatial locality.
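To see both kinds of locality in one place, here is a small sketch (illustrative code, not from the slides): the variable total is reused on every iteration (temporal locality), while the elements of data sit at consecutive addresses (spatial locality).

# Illustrative sketch of locality of reference (hypothetical example).
data = list(range(1000))    # elements stored at consecutive addresses

total = 0
for x in data:              # consecutive elements accessed -> spatial locality
    total += x              # 'total' reused every iteration -> temporal locality
print(total)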
Size of Cache
The cache is usually built out of SRAM on the same chip as the processor.
The cache speed is comparable to the processor speed.
The cache is relatively small, ranging from kilobytes to a few megabytes.
A direct mapped cache has one block in each set, so it is organized into S = B sets. To understand
the mapping of memory addresses onto cache blocks, imagine main memory as being mapped
into b-word blocks, just as the cache is.
• The cache has eight sets, each of which contains a one-word block.
• Block 0 of main memory maps to set 0, block 1 to set 1, …, block 7 to set 7.
• Because there are no more sets in the cache, block 8 of main memory maps to set 0 again, and so on.
• For example, addresses 0x00000004, 0x00000024, …, 0xFFFFFFE4 all map to set 1.
Cache fields for the main memory address
1. Byte Offset
The two least significant bits of the 32-bit address are called the byte offset, because they indicate
the byte within the word.
Since memory is byte addressable and each word is 4 bytes, a word address is obtained by multiplying the word number by 4, which is equivalent to a left shift by 2 bits.
Hence, for word-aligned accesses, these 2 LSBs of the address are always 00.
2. Set Bits
The next three bits are called the set bits, because they indicate the set to which the address maps.
In general, the number of set bits = log2S
E.g., if the cache has 8 sets, then the number of set bits = log28 = 3.
3. Tag Bits
The remaining 27 tag bits indicate the memory address of the data stored in a given cache set.
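As a quick sketch (illustrative code, not part of the original slides), the function below splits a 32-bit address into these three fields for the direct mapped cache described above (8 sets, one-word blocks):

# Hypothetical sketch: split a 32-bit address into tag, set, and byte offset
# for a direct mapped cache with 8 sets and one-word (4-byte) blocks.
def split_address(addr):
    byte_offset = addr & 0x3          # 2 LSBs: byte within the word
    set_index   = (addr >> 2) & 0x7   # next 3 bits: log2(8) = 3 set bits
    tag         = addr >> 5           # remaining 27 bits
    return tag, set_index, byte_offset

# 0x00000024 maps to set 1, matching the example addresses shown earlier.
print(split_address(0x00000024))      # (1, 1, 0)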
Hardware Implementation of the Direct Mapped Cache
Each set contains one line consisting of 32 bits of data, 27 tag bits, and 1 valid bit.
2. Multi-way Set Associative Cache
• Each memory address still maps to a specific set, but it can map to any one of the N blocks in
the set.
•Hence, a direct mapped cache is another name for a one-way set associative cache. N is also
called the degree of associativity of the cache.
The figure given below shows a cache with two blocks per set.
In this case, memory blocks 0, 64, 128, …, 4032 map into cache set 0,
and they can occupy either of the two block positions within this set.
•Figure shows the hardware for a C = 8-word, N = 2-way set associative cache. The cache now
has only S = 4 sets rather than 8.
Hardware Implementation of Multi-way Set Associative Cache
1. Set Bits
Only log24 = 2 set bits are used to select the set, rather than the 3 used in the direct mapped cache.
2. Tag Bits
The tag increases from 27 to 28 bits.
Each set contains two ways; the degree of associativity is two.
Advantages: lower miss rates.
Disadvantages: slower and more expensive to build.
Two-way set associative cache contents
3. Fully Associative Cache
A fully associative cache contains a single set with B ways, where B is the number of blocks.
A fully associative cache is another name for a B-way set associative cache with one set.
Upon a data request, for a cache with eight blocks, eight tag comparisons (not shown) must be
made, because the data could be in any block. Similarly, an 8:1 multiplexer chooses the proper data if a hit occurs.
Fully associative caches tend to have the fewest conflict misses for a given cache capacity, but
they require more hardware for additional tag comparisons.
They are best suited to relatively small caches because of the large number of comparators.
Direct Mapped Cache with Two Sets and a Four-Word Block Size
Figure shows the hardware for a C = 8-word direct mapped cache with a b = 4-word block size.
The cache now has only B = C/b = 2 blocks.
A direct mapped cache has one block in each set, so this cache is organized as two sets. Thus,
only log22 = 1 bit is used to select the set.
We have 2 sets, with 1 block per set, so there are 2 blocks in total (B = 2).
Each block holds 4 words (the block size b = 4 is given).
Block offset bits = log2b = log24 = 2.
Set bits = log2S = log22 = 1.
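Generalizing, the field widths follow directly from the cache parameters. The sketch below (illustrative code, with parameter names of my choosing) computes them for any number of sets S and block size b, assuming 32-bit addresses and 4-byte words:

from math import log2

# Hypothetical sketch: compute address field widths for a cache with
# S sets and b-word blocks, assuming 32-bit addresses and 4-byte words.
def field_widths(S, b):
    byte_offset_bits  = 2                    # byte within a 4-byte word
    block_offset_bits = int(log2(b))         # word within the block
    set_bits          = int(log2(S))         # which set the address maps to
    tag_bits = 32 - byte_offset_bits - block_offset_bits - set_bits
    return tag_bits, set_bits, block_offset_bits

print(field_widths(8, 1))  # direct mapped, one-word blocks -> (27, 3, 0)
print(field_widths(2, 4))  # two sets, four-word blocks     -> (27, 1, 2)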
Replacement Algorithms
In associative and set-associative caches, when a new block is to be brought into the cache and all
the positions that it may occupy are full, the cache controller must decide which of the old blocks to
overwrite. There are three types of replacement algorithms.
1. LRU
There is a high probability that blocks that have been referenced recently will be
referenced again soon.
Therefore, when a block is to be overwritten, it is sensible to overwrite the one that has
gone the longest time without being referenced.
This block is called the least recently used (LRU) block, and the technique is called the LRU
replacement algorithm.
2. Random
Upon replacement, the new block replaces a random block within the least recently used
group. Such a policy is called pseudo-LRU and is good enough in practice.
Example: Assume LRU replacement, a block size of one word, and an initially empty cache.
3. FIFO
FIFO removes the oldest block from a full set when a new block must be brought in.
It is generally not as effective as the LRU algorithm in choosing the best block to remove.
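As an illustrative sketch (hypothetical code, not from the slides), the snippet below simulates LRU and FIFO replacement for one 2-way set and shows how the two policies can produce different miss counts on the same reference stream:

from collections import OrderedDict, deque

# Hypothetical sketch: miss counts for LRU vs. FIFO in one N-way set.
def simulate(policy, ways, refs):
    blocks = OrderedDict()   # LRU: insertion order tracks recency
    fifo   = deque()         # FIFO: arrival order of resident blocks
    held   = set()
    misses = 0
    for b in refs:
        if policy == "LRU":
            if b in blocks:
                blocks.move_to_end(b)           # refresh recency on a hit
            else:
                misses += 1
                if len(blocks) == ways:
                    blocks.popitem(last=False)  # evict least recently used
                blocks[b] = True
        else:  # FIFO
            if b not in held:
                misses += 1
                if len(held) == ways:
                    held.discard(fifo.popleft())  # evict the oldest arrival
                held.add(b)
                fifo.append(b)
    return misses

refs = [0, 4, 0, 8, 0, 4]          # block numbers mapping to the same set
print(simulate("LRU", 2, refs))    # 4 misses
print(simulate("FIFO", 2, refs))   # 5 misses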
Write Policy
A shared-memory multiprocessor is easy to program. Each variable in a program has a
unique address location in the memory, which can be accessed by any processor. Each
processor has its own cache. Therefore, it is necessary to deal with the possibility that
copies of shared data may reside in several caches.
When any processor writes to a shared variable in its own cache, all other caches that
contain a copy of that variable will then have the old, incorrect value. They must be
informed of the change so that they can either update their copy to the new value or
invalidate it.
Two approaches for updating the cache content:
1. Write-through Protocol
• With a write-through policy, a processor's write updates the main memory as well as its own
cache; the other caches that hold a copy of the block must either update it with the new value or
invalidate their copy.
2. Write-back Protocol
•Initially, the memory is the owner of all blocks, and the memory retains ownership of any block
that is read by a processor to place a copy in its cache.
•If some processor wants to write to a block in its cache, it must first become the exclusive owner
of this block.
•To do so, all copies in other caches must first be invalidated with a broadcast request.
•The new owner of the block may then modify the contents
•When another processor wishes to read a block that has been modified, the request for the block
must be forwarded to the current owner.
• The data are then sent to the requesting processor by the current owner. The data are also sent to
the appropriate memory module, which reacquires ownership and updates the contents of the block
in the memory.
•The cache of the processor that was the previous owner retains a copy of the block.
•Hence, the block is now shared with copies in two caches and the memory.
•Subsequent requests from other processors to read the same block are serviced by the memory
module containing the block.
• When another processor wishes to write to a block that has been modified, the current owner
sends the data to the requesting processor, transfers ownership of the block to it, and invalidates
its own cached copy. Since the block is being modified by the new owner, the contents of the block
in the memory are not updated.
The next request for the same block is serviced by the new owner.
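To make the ownership transitions concrete, here is a minimal sketch (hypothetical code with simplified names; real coherence hardware tracks this state per cache line) of the write-back ownership protocol described above:

# Hypothetical sketch of the write-back (ownership) protocol for one block.
class Block:
    def __init__(self):
        self.owner  = "memory"   # memory initially owns every block
        self.copies = set()      # caches currently holding a valid copy

    def read(self, cpu):
        if self.owner != "memory":
            # Forward the request to the current owner; the previous owner
            # keeps a copy, and memory reacquires ownership and is updated.
            self.copies.add(self.owner)
            self.owner = "memory"
        self.copies.add(cpu)     # block is now shared

    def write(self, cpu):
        # All other copies must be invalidated before cpu may modify the block.
        self.copies = {cpu}
        self.owner  = cpu        # cpu becomes the exclusive owner

b = Block()
b.read("P1"); b.read("P2")        # shared: copies in P1, P2, and memory
b.write("P1")                     # P1 owns the block; P2's copy invalidated
b.read("P2")                      # forwarded to P1; memory updated; shared again
print(b.owner, sorted(b.copies))  # memory ['P1', 'P2']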
VIRTUAL MEMORY
•The objective of adding a hard disk to the memory hierarchy is to inexpensively give the illusion
of a very large memory while still providing the speed of faster memory for most accesses.
•A computer with only 128 MB of DRAM, for example, could effectively provide 2 GB of
memory using the hard disk. This larger 2-GB memory is called virtual memory, and the smaller
128-MB main memory is called physical memory.
• Programs can access data anywhere in virtual memory, so they must use virtual addresses that
specify the location in virtual memory. The physical memory holds a subset of the most recently
accessed virtual memory.
• Virtual memory systems use different terminologies for the same caching principles.
The table below summarizes the analogous terms.
Cache term - Virtual memory term
Block - Page
Block size - Page size
Block offset - Page offset
Miss - Page fault
Tag - Virtual page number
The process of determining the physical address from the virtual address is called address translation.
Virtual memory is divided into virtual pages, typically 4 KB in size. Physical memory is likewise
divided into physical pages of the same size. The rectangles in the figure indicate pages: some
virtual pages are present in physical memory, and some are located on the disk. If the processor
attempts to access a virtual address that is not in physical memory, a page fault occurs, and the
operating system loads the page from the hard disk into physical memory.
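Using the example numbers given earlier (2 GB of virtual memory, 128 MB of physical memory, 4 KB pages), a short sketch of the resulting address-field widths (illustrative code, not from the slides):

from math import log2

# Hypothetical sketch: page counts and address-field widths for the
# 2 GB virtual / 128 MB physical / 4 KB page example above.
page_size = 4 * 2**10              # 4 KB pages
virt      = 2 * 2**30              # 2 GB of virtual memory
phys      = 128 * 2**20            # 128 MB of physical memory

offset_bits = int(log2(page_size))           # page offset: 12 bits
vpn_bits    = int(log2(virt // page_size))   # virtual page number bits
ppn_bits    = int(log2(phys // page_size))   # physical page number bits

print(offset_bits, vpn_bits, ppn_bits)       # 12 19 15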
MEMORY MANAGEMENT
Figure shows the organization
that implements virtual memory.
If the data are not in the main memory, the MMU causes
the operating system to bring the data into the memory
from the disk.
The DMA scheme is used to perform the data transfer
between the disk and the main memory.
Paging & Segmentation
Let us consider the 32-bit virtual address 0xFC51908B.
• The 12 LSBs tell us the position within the 4 KB page; this is the page offset, here 0x08B.
• The remaining MSBs tell us the page number; this is the virtual page number (VPN), here 0xFC519.
• The VPN is used as an index into the page table: we find the entry with index 0xFC519.
• That entry tells us where the page was placed in actual physical memory, as a physical page
number (PPN, or frame number). Let us say that PPN is 0x00152.
• Putting the PPN together with the page offset gives the physical address:
PPN + page offset = physical address
0x00152 + 0x08B = 0x0015208B
• The result is a 32-bit physical address used to access physical memory. The page offset is
present in both the virtual and the physical address; it is not changed during translation. Only the
VPN is changed, to the PPN (frame number).
Figure: Address translation using the page table.
The processor typically uses a dedicated register, called the page table register, to store the base address of the
page table in physical memory.
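A small sketch of this translation (illustrative code; the single-entry page table is hypothetical) using the numbers from the example above:

# Hypothetical sketch: translate a virtual address with a 4 KB page size.
PAGE_OFFSET_BITS = 12
page_table = {0xFC519: 0x00152}   # VPN -> PPN, as in the example above

def translate(vaddr):
    vpn    = vaddr >> PAGE_OFFSET_BITS               # virtual page number
    offset = vaddr & ((1 << PAGE_OFFSET_BITS) - 1)   # unchanged by translation
    ppn    = page_table[vpn]                         # page table lookup
    return (ppn << PAGE_OFFSET_BITS) | offset

print(hex(translate(0xFC51908B)))   # 0x15208b, i.e. 0x0015208B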
To avoid accessing the page table in main memory on every reference, a small cache, usually
called the Translation Lookaside Buffer (TLB), is incorporated into the MMU.
The operation of the TLB with respect to the page table in the main memory is essentially the
same as the operation of cache memory.
•Given a virtual address, the MMU looks in the TLB for the referenced page.
•A TLB is organized as a fully associative cache and typically holds 16 to 512 entries.
•Each TLB entry holds a virtual page number and its corresponding physical page number.
•The TLB is accessed using the virtual page number. If the TLB hits, it returns the corresponding
physical page number. Otherwise, the processor must read the page table in physical memory.
• If the page table entry for this page is found in the TLB, the physical address is obtained
immediately.
•If there is a miss in the TLB, then the required entry is obtained from the page table in the
main memory and the TLB is updated.
• When a program generates an access request to a page that is not in the main memory, a
page fault is said to have occurred, and the page is loaded from secondary storage into physical
memory.
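Putting the pieces together, here is a minimal sketch (hypothetical code; a real TLB is a small fully associative hardware cache, not a dictionary) of the TLB-hit, TLB-miss, and page-fault flow described above:

# Hypothetical sketch of the lookup flow: TLB hit -> page table -> page fault.
tlb        = {}                    # small cache of VPN -> PPN translations
page_table = {0xFC519: 0x00152}   # entries for pages present in memory

def lookup(vpn):
    if vpn in tlb:                 # TLB hit: translation available at once
        return tlb[vpn]
    if vpn in page_table:          # TLB miss: read the page table, update TLB
        tlb[vpn] = page_table[vpn]
        return tlb[vpn]
    # Page fault: the OS must load the page from disk and update the page table.
    raise RuntimeError("page fault: OS must load the page from disk")

print(hex(lookup(0xFC519)))   # first lookup misses in the TLB; prints 0x152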