
Computer Organization

Module VI -- MEMORY
Prepared By
Asst. Prof. Sherin Thomas
ECE Dept.
MBITS, Nellimattam
Cache Memory
• For good performance, the time needed to access the necessary information should be reduced.
• Since the speed of the main memory unit is limited, the solution is to use a fast cache memory, which essentially makes the main memory appear faster to the processor.
Principle of Cache Memory
Locality of reference:

The effectiveness of the cache mechanism is based on a property of computer programs called locality of reference.
Many instructions in localized areas of the program are executed repeatedly during some time period, and the remainder of the program is accessed relatively infrequently.
This is referred to as locality of reference.
It manifests itself in two ways: temporal and spatial.
Temporal
Temporal locality means that if a data location is referenced, it will tend to be referenced again soon.
Therefore, when the processor loads or stores data that is not in the cache, the data is copied
from main memory into the cache. Subsequent requests for that data hit in the cache.
E.g. if you recently brought a book to your table to look at, you will probably need to look at it
again soon.
Spatial
The spatial aspect means that when the processor accesses a piece of data, it is also likely to
access data in nearby memory locations.
Therefore, when the cache fetches one word from memory, it may also fetch several adjacent
words. This group of words is called a cache block.

E.g. If an element in an array is used, other elements in the same array are also likely to be used,
creating spatial locality.
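As a small illustration (this example is not from the slides), the C loop below exhibits both kinds of locality: the array elements are accessed at consecutive addresses (spatial), while sum and i are reused on every iteration (temporal).

#include <stdio.h>

int main(void) {
    int a[100], sum = 0;
    for (int i = 0; i < 100; i++)  /* consecutive addresses: spatial locality */
        a[i] = i;
    for (int i = 0; i < 100; i++)  /* sum and i reused each pass: temporal locality */
        sum += a[i];
    printf("sum = %d\n", sum);
    return 0;
}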
Size of Cache
The cache is usually built out of SRAM on the same chip as the processor.
The cache speed is comparable to the processor speed.
The cache is relatively small, ranging from kilobytes to a few megabytes.
A cache holds commonly used memory data.
The number of data words that it can hold is called the capacity, C.
The number of words in a cache block, b, is called the block size.
A cache of capacity C contains B = C/b blocks, where B is the number of blocks.
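As a quick worked example with illustrative numbers: a cache with a capacity of C = 8 words and a block size of b = 2 words contains B = C/b = 8/2 = 4 blocks.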
Cache Organizations
•A cache is organized into S sets, each of which holds one or more blocks of data.
•Mapping
The relationship between the address of data in main memory and the location of that data in
the cache is called the mapping.
•Each memory address maps to exactly one set in the cache.
•Some of the address bits are used to determine which cache set contains the data. If the set
contains more than one block, the data may be kept in any of the blocks in the set.
•Caches are categorized based on the number of blocks in a set.
•Types of Mapping
•Direct Mapped Cache
•N-way set associative cache
•Fully associative cache
1. Direct Mapped Cache
A direct mapped cache has one block in each set, so it is organized into S = B sets. To understand the mapping of memory addresses onto cache blocks, imagine main memory as being mapped into b-word blocks, just as the cache is.
•The cache has eight sets, each of which contains a one-word block.
Block 0 of main memory maps to set 0, block 1 to set 1, ….. block 7 to set 7.
•There are no more sets in the cache, so block 8 of main memory maps to set 0 again, and so on.
Addresses 0x00000004, 0x00000024, . . . , 0xFFFFFFE4 map to set 1;
addresses 0x00000010, . . . , 0xFFFFFFF0 map to set 4, and so forth.
Cache fields for the main memory address
1. Byte Offset

The two least significant bits of the 32-bit address are called the byte offset, because they indicate the byte within the word.
Converting a word address to a byte address multiplies it by 4, which is equivalent to a left shift by 2 bits; hence these 2 LSBs are 00 for word-aligned addresses.
2. Set Bits

The next three bits are called the set bits, because they indicate the set to which the address maps.
In general, the number of set bits = log₂S.
For example, if the cache has 8 sets, then the number of set bits = log₂8 = 3.

3. Tag Bits

The remaining 27 tag bits indicate the memory address of the data stored in a given cache set.
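The field boundaries can be checked with a short C sketch. This is a minimal illustration assuming the organization above: 32-bit byte addresses, 4-byte words, and S = 8 one-word sets (the masks and shift amounts follow from the 2 byte-offset bits and 3 set bits).

#include <stdio.h>

int main(void) {
    unsigned addr = 0x00000024;        /* maps to set 1, as in the text    */
    unsigned byte_offset = addr & 0x3; /* bits [1:0]: byte within the word */
    unsigned set = (addr >> 2) & 0x7;  /* bits [4:2]: log2(8) = 3 set bits */
    unsigned tag = addr >> 5;          /* remaining 27 bits                */
    printf("tag = 0x%X, set = %u, byte offset = %u\n", tag, set, byte_offset);
    return 0;
}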
Hardware Implementation of the Direct Mapped Cache
Each set contains one line consisting of 32 bits of data, 27 tag bits, and 1 valid bit.
The valid bit indicates whether the set contains meaningful data.
If the valid bit = 0, the data is meaningless.

Direct mapped cache with 8 sets
Disadvantage of Direct Mapping
•When two recently accessed addresses map to the same cache block, a conflict occurs, and the most recently accessed address discards the previous one from the block.
•Direct mapped caches have only one block in each set, so two addresses that map to the same set
always cause a conflict.
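For example, in the eight-set cache above, addresses 0x00000004 and 0x00000024 both map to set 1; a program that alternated between the two would evict the other's data on every access and miss every time.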
2. Multi-way Set Associative Cache
•An N-way set associative cache reduces conflicts by providing N blocks in each set where data
mapping to that set might be found.

• Each memory address still maps to a specific set, but it can map to any one of the N blocks in
the set.

•Hence, a direct mapped cache is another name for a one-way set associative cache. N is also
called the degree of associativity of the cache.
The figure below shows a cache with two blocks per set. In this case, memory blocks 0, 64, 128, …, 4032 map into cache set 0, and they can occupy either of the two block positions within this set.
•The figure also shows the hardware for a C = 8-word, N = 2-way set associative cache. The cache now has only S = 4 sets rather than 8.
Hardware Implementation of Multi-way Set Associative Cache
1. Set Bits: Only log₂4 = 2 set bits are used to select the set, rather than the 3 used in the direct mapped cache.
2. Tag Bits: The tag increases from 27 to 28 bits.
Each set contains two ways (a degree of associativity of two).
Advantages: lower miss rates.
Disadvantages: slower and more expensive to build.
Two-way set associative cache contents
3. Fully Associative Cache

A fully associative cache contains a single set with B ways, where B is the number of blocks.

A memory address can map to a block in any of these ways.

A fully associative cache is another name for a B-way set associative cache with one set.
For the eight-block cache shown, eight tag comparisons (not shown) must be made upon a data request, because the data could be in any block. Similarly, an 8:1 multiplexer chooses the proper data if a hit occurs.

Fully associative caches tend to have the fewest conflict misses for a given cache capacity, but
they require more hardware for additional tag comparisons.

They are best suited to relatively small caches because of the large number of comparators.
Direct Mapped Cache with Two Sets and a Four-Word Block Size

Figure shows the hardware for a C = 8-word direct mapped cache with a b = 4-word block size.
The cache now has only B = C/b = 2 blocks.
A direct mapped cache has one block in each set, so this cache is organized as two sets. Thus, only log₂2 = 1 bit is used to select the set.
We have 2 sets and 1 block per set (B), so 2 blocks in total.
Each block has 4 words (the given block size b).
Block offset bits = log₂b = log₂4 = 2.
Set bits = log₂S = log₂2 = 1.

•A multiplexer is now needed to select the word within the block. The multiplexer is controlled by the log₂4 = 2 block offset bits of the address.
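As a hypothetical worked example for this organization: the byte address 0x34 = 0b0011 0100 splits into byte offset 00 (bits 1:0), block offset 01 (bits 3:2), set bit 1 (bit 4), and the remaining 27 tag bits.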
REPLACEMENT ALGORITHM

In associative and set associative caches, when a new block is to be brought into the cache and all the positions that it may occupy are full, the cache controller must decide which of the old blocks to overwrite. There are three types of replacement algorithms:
a) Least Recently Used (LRU)
b) Random
c) First In First Out (FIFO)
1. Least Recently Used (LRU)

There is a high probability that blocks that have been referenced recently will be referenced again soon.

Therefore, when a block is to be overwritten, it is sensible to overwrite the one that has
gone the longest time without being referenced.

This block is called least recently used block and the technique is called the LRU
replacement algorithm.

Tracking the exact usage order becomes expensive as associativity grows, so upon replacement the new block often simply replaces a random block within the least recently used group. Such a policy is called pseudo-LRU and is good enough in practice.
Assume LRU replacement, a block size of one word, and an initially empty cache.

lw $t0, 0x04($0)    # 0x04 = 0000 0100
lw $t1, 0x24($0)    # 0x24 = 0010 0100
lw $t2, 0x54($0)    # 0x54 = 0101 0100

The first two instructions load data from memory addresses 0x4 and 0x24 into set 1 of the cache, shown in Figure (a). U = 0 indicates that data in way 0 was the least recently used.
The next memory access, to address 0x54, also maps to set 1 and replaces the least recently used data in way 0, as shown in Figure (b).
The use bit, U, is set to 1 to indicate that data in way 1 was the least recently used.
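The example can be reproduced with a small C sketch of a 2-way set associative cache with one use bit per set. This is a minimal software model, not the hardware itself, and the type and function names are illustrative; it assumes the organization in the figures (8 words, 2 ways, 4 sets, 2 set bits).

#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>

typedef struct {
    bool     valid[2];   /* valid bit per way           */
    uint32_t tag[2];     /* tag per way                 */
    int      u;          /* use bit U: which way is LRU */
} set_t;

/* Access one word; on a miss the least recently used way is filled. */
static bool access_word(set_t *sets, uint32_t addr) {
    uint32_t set = (addr >> 2) & 0x3;   /* 2 set bits             */
    uint32_t tag = addr >> 4;           /* remaining 28 tag bits  */
    set_t *s = &sets[set];
    for (int w = 0; w < 2; w++)
        if (s->valid[w] && s->tag[w] == tag) {
            s->u = 1 - w;               /* the other way becomes LRU */
            return true;                /* hit  */
        }
    int victim = s->u;                  /* replace the LRU way    */
    s->valid[victim] = true;
    s->tag[victim]   = tag;
    s->u = 1 - victim;
    return false;                       /* miss */
}

int main(void) {
    set_t sets[4] = {0};
    uint32_t trace[] = {0x04, 0x24, 0x54};  /* the three lw addresses */
    for (int i = 0; i < 3; i++) {
        bool hit = access_word(sets, trace[i]);
        printf("0x%02X -> %s, U(set 1) = %d\n",
               (unsigned)trace[i], hit ? "hit" : "miss", sets[1].u);
    }
    return 0;
}

All three accesses miss; after the second access U = 0 (way 0 is LRU) as in Figure (a), and after the third U = 1 as in Figure (b).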
2. Random
The simplest algorithm, in which the block to be overwritten is chosen randomly.
3. FIFO
FIFO removes the oldest block from a full set when a new block must be brought in.
It is generally not as effective as the LRU algorithm in choosing the best block to remove.

Write Policy
A shared-memory multiprocessor is easy to program. Each variable in a program has a
unique address location in the memory, which can be accessed by any processor. Each
processor has its own cache. Therefore, it is necessary to deal with the possibility that
copies of shared data may reside in several caches.
When any processor writes to a shared variable in its own cache, all other caches that
contain a copy of that variable will then have the old, incorrect value. They must be
informed of the change so that they can either update their copy to the new value or
invalidate it.
Two approaches for updating the cache content:
Caches are classified as either write-through or write-back.


Write-Through Protocol
A write-through protocol can be implemented in one of two ways.

1. One version is based on updating the values in other caches.


2. While the second relies on invalidating the copies in other caches.
1. Updating the values in Caches
•When a processor writes a new value to a block of data in its cache, the new value is also written
into the memory module containing the block being modified. Since copies of this block may
exist in other caches, these copies must be updated to reflect the change caused by the Write
operation.
•The simplest way of doing this is to broadcast the written data to the caches of all processors in
the system. As each processor receives the broadcast data, it updates the contents of the affected
cache block if this block is present in its cache.
2. Invalidation of copies
•When a processor writes a new value into its cache, this value is also sent to the appropriate
location in memory, and all copies in other caches are invalidated.
•Again, broadcasting can be used to send the invalidation requests throughout the system.
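A toy model of the two write-through variants in C might look as follows (the cache layout and names are invented for this sketch; real systems broadcast on a shared bus). Memory is always updated, and every other cache holding a copy either updates it or invalidates it.

#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>

#define NPROC  2
#define NLINES 4

typedef struct { bool valid; uint32_t addr, data; } line_t;
static line_t cache[NPROC][NLINES];   /* one toy cache per processor */
static uint32_t memory[16];

static void write_through(int self, uint32_t addr, uint32_t value, bool update) {
    int i = addr % NLINES;
    cache[self][i] = (line_t){ true, addr, value };  /* writer's own copy     */
    memory[addr] = value;                            /* memory always updated */
    for (int p = 0; p < NPROC; p++) {                /* broadcast the write   */
        if (p == self) continue;
        line_t *l = &cache[p][i];
        if (l->valid && l->addr == addr) {
            if (update) l->data  = value;            /* variant 1: update copy   */
            else        l->valid = false;            /* variant 2: invalidate it */
        }
    }
}

int main(void) {
    cache[1][1] = (line_t){ true, 5, 10 };  /* processor 1 caches word 5        */
    write_through(0, 5, 42, false);         /* processor 0 writes, invalidating */
    printf("mem[5] = %u, P1 copy valid = %d\n",
           (unsigned)memory[5], (int)cache[1][1].valid);
    return 0;
}

With update = true the other caches would keep a fresh copy instead of dropping it; either way, main memory always holds the new value.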

2. Write-back Protocol
•Initially, the memory is the owner of all blocks, and the memory retains ownership of any block
that is read by a processor to place a copy in its cache.
•If some processor wants to write to a block in its cache, it must first become the exclusive owner
of this block.
•To do so, all copies in other caches must first be invalidated with a broadcast request.
•The new owner of the block may then modify the contents
•When another processor wishes to read a block that has been modified, the request for the block
must be forwarded to the current owner.
•The data are then sent to the requesting processor by the current owner. The data are also sent to the appropriate memory module, which reacquires ownership and updates the contents of the block in the memory.
•The cache of the processor that was the previous owner retains a copy of the block.
•Hence, the block is now shared with copies in two caches and the memory.
•Subsequent requests from other processors to read the same block are serviced by the memory
module containing the block.
•When another processor wishes to write to a block that has been modified, the current owner sends the data to the requesting processor.
•It also transfers ownership of the block to the requesting processor and invalidates its cached copy. Since the block is being modified by the new owner, the contents of the block in the memory are not updated.
•The next request for the same block is serviced by the new owner.
VIRTUAL MEMORY

•The objective of adding a hard disk to the memory hierarchy is to inexpensively give the illusion
of a very large memory while still providing the speed of faster memory for most accesses.
•A computer with only 128 MB of DRAM, for example, could effectively provide 2 GB of
memory using the hard disk. This larger 2-GB memory is called virtual memory, and the smaller
128-MB main memory is called physical memory.
•Programs can access data anywhere in virtual memory, so they must use virtual addresses that specify the location in virtual memory. The physical memory holds a subset of the most recently accessed virtual memory.
•Virtual memory systems use different terminologies for the same caching principles. Table summarizes the analogous terms.
•The process of determining the physical address from the virtual address is called address translation.

Virtual memory is divided into virtual pages, typically 4 KB in size. Physical memory is likewise divided into physical pages of the same size.
The rectangles in the figure indicate pages. Some virtual pages are present in physical memory, and some are located on the disk.
If the processor attempts to access a virtual address that is not in physical memory, a page fault occurs, and the operating system loads the page from the hard disk into physical memory.
MEMORY MANAGEMENT
Figure shows the organization that implements virtual memory.
A special hardware unit, called the Memory Management Unit (MMU), translates virtual addresses into physical addresses.
When the desired data (or instructions) are in the main memory, these data are fetched as described in our presentation of the cache mechanism.
If the data are not in the main memory, the MMU causes the operating system to bring the data into the memory from the disk.
The DMA scheme is used to perform the data transfer between the disk and the main memory.
Paging & Segmentation

Pages & Segments

The largest segment supported on any processor ranges from 2¹⁶ bytes up to 2³² bytes; the smallest segment is 1 byte.
Page Table
Information about the main memory location of each page
is kept in a page table.
This information includes the main memory address where
the page is stored and the current status of the page [PPN &
V bit].
The page table is indexed with the VPN.
e.g. Entry 5 specifies that virtual page 5 maps to physical page 1.
The starting address of the page table is kept in a page table base register [PTBR].
Adding the VPN to the contents of the PTBR gives the address of the corresponding entry in the page table.
The contents of this location give the starting address of the page if that page currently resides in the main memory.
The V bit indicates the validity of the page, that is, whether the page is actually loaded in the main memory.
Entry 6 is invalid, so virtual page 6 is located on disk.
The Translation of a Virtual Address to a Physical Address
The processor needs the physical address to access the actual memory.
The program generates a virtual address, which consists of a VPN and a page offset.
The VPN (Virtual Page Number) tells us which page; the page offset tells us where in the page we are. Each page is 4 KB in size.
Let us consider the address 0xFC51908B.
The MSB bits tell us the page number [VPN]: here the upper 20 bits, 0xFC519. The LSB 12 bits tell us where in the 4 KB page we are [page offset]: here 0x08B.
We use this VPN as an index into the page table; in this case the index is 0xFC519. The entry with that index tells us where that page was put in actual physical memory. Let us say that PPN is 0x00152.
This yields a 32-bit physical address used to access physical memory. The page offset is present in both the virtual and the physical address; it is not changed during translation, but the VPN is changed to the PPN [frame number].
Putting the PPN together with the page offset forms the physical address:
PPN + Page Offset = Physical Address
0x00152 + 0x08B → 0x0015208B

Address translation using the page table
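The worked example can be checked with a few lines of C. This is a minimal sketch assuming 4 KB pages (12 offset bits) and hard-coding the PPN from the example in place of a real page table lookup.

#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint32_t va  = 0xFC51908B;
    uint32_t vpn = va >> 12;           /* upper 20 bits: 0xFC519           */
    uint32_t off = va & 0xFFF;         /* lower 12 bits: 0x08B             */
    uint32_t ppn = 0x00152;            /* stands in for page_table[vpn]    */
    uint32_t pa  = (ppn << 12) | off;  /* VPN replaced by PPN, offset kept */
    printf("VA 0x%08X -> VPN 0x%05X + offset 0x%03X -> PA 0x%08X\n",
           (unsigned)va, (unsigned)vpn, (unsigned)off, (unsigned)pa);
    return 0;
}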
Page Table Base Register
The processor typically uses a dedicated register, called the page table base register, to store the base address of the page table in physical memory.

The starting address of the page table is kept in a page table base register. By adding the virtual page number to the contents of this register, the address of the corresponding entry in the page table is obtained.
The contents of this location give the starting address of the page if that page currently resides in the main memory. Each entry in the page table also includes some valid bits that describe the status of the page while it is in the main memory.
Translation Lookaside Buffer
Since the MMU is normally implemented as part of the processor chip (along with the primary cache), it is impossible to include a complete page table on this chip. Therefore, the page table is kept in the main memory.
However, a copy of a small portion of the page table can be accommodated within the MMU.
This portion consists of the page table entries that correspond to the most recently accessed
pages.

A small cache, usually called the Translation Lookaside Buffer (TLB), is incorporated into the MMU for this purpose.

The operation of the TLB with respect to the page table in the main memory is essentially the
same as the operation of cache memory.
•Given a virtual address, the MMU looks in the TLB for the referenced page.
•A TLB is organized as a fully associative cache and typically holds 16 to 512 entries.
•Each TLB entry holds a virtual page number and its corresponding physical page number.
•The TLB is accessed using the virtual page number. If the TLB hits, it returns the corresponding
physical page number. Otherwise, the processor must read the page table in physical memory.
• If the page table entry for this page is found in the TLB, the physical address is obtained
immediately.
•If there is a miss in the TLB, then the required entry is obtained from the page table in the
main memory and the TLB is updated.
•When a program generates an access request to a page that is not in the main memory, a page fault is said to have occurred, and the data is loaded from secondary storage into physical memory.
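As a closing sketch, a fully associative TLB lookup can be modelled in C roughly as below. This is a software stand-in (the entry count and names are illustrative): real hardware compares all tags in parallel rather than in a loop.

#include <stdint.h>
#include <stdbool.h>
#include <stdio.h>

#define TLB_ENTRIES 16   /* typical TLBs hold 16 to 512 entries */

typedef struct { bool valid; uint32_t vpn, ppn; } tlb_entry_t;
static tlb_entry_t tlb[TLB_ENTRIES];

/* Returns true and sets *ppn on a hit; on a miss the processor must
   read the page table in main memory and then update the TLB. */
static bool tlb_lookup(uint32_t vpn, uint32_t *ppn) {
    for (int i = 0; i < TLB_ENTRIES; i++)        /* sequential loop here;   */
        if (tlb[i].valid && tlb[i].vpn == vpn) { /* parallel comparators in */
            *ppn = tlb[i].ppn;                   /* real hardware           */
            return true;                         /* TLB hit                 */
        }
    return false;                                /* TLB miss                */
}

int main(void) {
    uint32_t ppn;
    tlb[0] = (tlb_entry_t){ true, 0xFC519, 0x00152 };  /* seed with the earlier mapping */
    if (tlb_lookup(0xFC519, &ppn))
        printf("TLB hit: PPN = 0x%05X\n", (unsigned)ppn);
    return 0;
}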
