Module 3F
Computer Architecture and Organization [CSE2003]
Faculty, SCSE, VIT-Bhopal University.
Contents
UNIT – III
Memory System
- Memory systems hierarchy; main memory organization; types of main memory: SRAM, DRAM and their characteristics and performance; latency; cycle time; bandwidth; memory interleaving
- Cache memory: address mapping, line size, replacement policies, coherence
- Virtual memory: paging (single-level and multi-level), page table mapping, TLB
- Reliability of memory systems: error detecting and error correcting systems.
Memory
-The maximum size of the memory that can be used in any computer is determined by the
addressing scheme. For example, a computer that generates 16-bit addresses is capable of
addressing up to 2^16 = 64K (kilo) memory locations
The main memory is functionally organized as a number of locations
Each location stores a fixed number of bits.
The term word length of a memory indicates the number of bits in each location.
The total capacity of a memory is the number of locations multiplied by the word length.
Each location in memory is identified by a unique address as shown in Fig.
Two different memories with the same capacity may have different organizations, as
illustrated by Fig.2.
Both have the same capacity of 4 kilobytes but differ in their internal organization.
Fig.1.Main memory locations
Fig:2.Memory capacity and organization
Memory Addressability
The number of bits in the memory address determines the maximum number of
memory addresses possible for the CPU.
If a CPU has n bits in its address, its memory can have a maximum of 2^n
(2 to the power of n) locations.
This is known as the CPU’s memory addressability.
It gives theoretical maximum memory capacity for a CPU.
Table: Popular CPUs and their memory addressability
Memory
Q.1. A computer has a main memory with 1024 locations, each of 32 bits.
Calculate the total memory capacity.
Q.2. A CPU has a 12-bit address for memory addressing: (a) What is the
memory addressability of the CPU? (b) If the memory has a total capacity
of 16 KB, what is the word length of the memory?
Memory
Sol.1.
Word length = 32 bits = 4 bytes;
No. of locations = 1024 = 1 kilo = 1K;
Memory capacity = 1K * 4 bytes = 4 KB
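These capacity and addressability calculations can be checked with a short Python sketch (the function names are illustrative, not from the slides):

```python
def memory_capacity_bytes(num_locations, word_length_bits):
    """Total capacity = number of locations x word length (in bytes)."""
    return num_locations * word_length_bits // 8

def addressability(address_bits):
    """Maximum number of addressable locations for an n-bit address: 2^n."""
    return 2 ** address_bits

# Q.1: 1024 locations of 32 bits each -> 4096 bytes = 4 KB
print(memory_capacity_bytes(1024, 32))

# Q.2(a): a 12-bit address can reach 2^12 = 4096 locations
print(addressability(12))

# Q.2(b): 16 KB total over 4096 locations -> 4-byte (32-bit) words
print(16 * 1024 // addressability(12))
```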
Memory characteristics
• Volatility
— Does the memory retain data in the absence of electrical power?
• Retention period
— Ranges from tiny fractions of a second (volatile DRAM) to many years (CDs,
DVDs)
• Erasable
— Can the memory be rewritten? If so, how fast? How many erase cycles can occur?
• Power consumption
Types
Semiconductor
— RAM (volatile or non-volatile)
• Magnetic Surface Memory
— Disk & Tape
• Optical
— CD & DVD
• Others
— Magneto-optical
— Bubble
— Hologram
Classification of Memory Systems
-A memory unit is called a random-access memory (RAM) if the access time to any
location is the same, independent of the location’s address.
-This distinguishes such memory units from serial, or partly serial, access storage devices
such as magnetic and optical disks.
-Access time of the latter devices depends on the address or position of the data.
Memory
-The processor uses the address lines to specify the memory location involved in a data
transfer operation, and uses the data lines to transfer the data.
-At the same time, the control lines carry the command indicating a Read or
a Write operation and whether a byte or a word is to be transferred.
-The control lines also provide the necessary timing information and are used by the
memory to indicate when it has completed the requested operation.
Memory access time — a useful measure of the speed of memory units: the time that
elapses between the initiation of an operation to transfer a word of data and the completion
of that operation.
Memory cycle time — the minimum time delay required between the initiation of two
successive memory operations, for example, the time between two successive Read
operations.
-The cycle time is usually slightly longer than the access time, depending on the
implementation details of the memory unit.
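Bandwidth ties these timing figures together; one common approximation is one word transferred per cycle time. A minimal sketch (the 4-byte word and 10 ns cycle time are made-up example numbers, not values from the slides):

```python
def bandwidth_bytes_per_sec(word_bytes, cycle_time_ns):
    """Sustained bandwidth for back-to-back accesses: one word per cycle time."""
    return word_bytes / (cycle_time_ns * 1e-9)

# e.g. 4-byte words with a 10 ns cycle time -> 400 MB/s (hypothetical figures)
print(bandwidth_bytes_per_sec(4, 10))
```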
Memory
-Machines whose instructions generate 32-bit addresses can utilize a memory
that contains up to 2^32 = 4G (giga) locations.
-Machines with 64-bit addresses can access up to 2^64 = 16E (exa) locations.
-The number of locations represents the size of the address space of the
computer.
Fig.: Connection of the memory to the processor.
Internal Organization of Memory Chips
16 × 8 organization: an example of a very small memory circuit consisting of 16 words of 8
bits each.
Memory organization
-Memory cells are usually organized in the form of an array, in which each cell is capable of
storing one bit of information.
-Each row of cells constitutes a memory word, and all cells of a row are connected to a
common line referred to as the word line, which is driven by the address decoder on the
chip.
-The cells in each column are connected to a Sense/Write circuit by two bit lines, and the
Sense/Write circuits are connected to the data input/output lines of the chip.
-During a Read operation, these circuits sense, or read, the information stored in the cells
selected by a word line and place this information on the output data lines.
-During a Write operation, the Sense/Write circuits receive input data and store them in the
cells of the selected word.
-The memory circuit in Figure stores 128 bits and requires 14 external connections
for address, data, and control lines. It also needs two lines for power supply and ground
connections.
Memory
Consider now a slightly larger memory circuit, one that has 1K (1024) memory cells.
-This circuit can be organized as a 128 × 8 memory, requiring a total of 19
external connections.
-Commercially available memory chips contain a much larger number of
memory cells.
- For example, a 1G-bit chip may have a 256M × 4 organization, in which
case a 28-bit address is needed and 4 bits are transferred to or from the
chip.
What About a 256 × 16 Memory?
• How many external connections are required in total?
Sol. 8 address lines + 16 data lines + 2 control lines = 26 connections for address,
data, and control, plus two lines for power supply and ground.
How many 128*8 RAM chips are required to provide a memory capacity of 2048*8?
How many 128*8 RAM chips are required to provide a memory capacity of 2048*16?
Q.1.A computer employs RAM chips of 256*8 and ROM chips of
1024*8.The computer system needs 2K bytes of RAM, 4K bytes
of ROM. How many RAM and ROM chips are needed.
Example 2
a. How many 128 x 8 RAM chips are needed to provide a memory capacity of
2048 bytes?
b. How many lines of the address bus must be used to access 2048 bytes of
memory? How many of these lines will be common to all chips?
c. How many lines must be decoded for chip select? Specify the size of the
decoders.
a. The number of memory chips needed = 2^11 / 2^7 = 16 memory chips.
b. To access 2048 bytes, 11 bits for addressing are needed.
Each chip has a capacity of 2^7 bytes, so 7 address bits must be common to all
chips.
c. With 16 memory chips, 4 chip-selection bits are needed.
The 4 selection bits are decoded to 16 chip-select lines, so a 4 × 16 decoder is required.
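The chip-count and decoder arithmetic from Example 2 can be sketched as follows (helper names are illustrative; `address_bits` assumes power-of-two sizes):

```python
def chips_needed(total_locations, chip_locations):
    """Number of chips = total capacity / per-chip capacity."""
    return total_locations // chip_locations

def address_bits(locations):
    """Bits needed to address `locations` cells (assumes a power of two)."""
    return locations.bit_length() - 1

total, chip = 2048, 128
n = chips_needed(total, chip)        # 2^11 / 2^7 = 16 chips
total_addr = address_bits(total)     # 11 address lines in total
common = address_bits(chip)          # 7 lines common to all chips
select = total_addr - common         # 4 lines -> a 4-to-16 decoder
print(n, total_addr, common, select)
```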
Q.2. Consider a fully associative mapped cache with block size 4 KB. The size of main
memory is 16 GB. Find the number of bits in tag.
Numerical
Question 3
A block-set associative cache memory consists of 128 blocks divided into four-block
sets (i.e. 4-way set associative). The main memory consists of 16,384 blocks and each
block contains 256 eight-bit words.
1. How many bits are required for addressing the main memory?
2. How many bits are needed to represent the TAG, SET and WORD fields?
Solution:
Given-
Number of blocks in cache memory = 128
Number of blocks in each set of cache = 4
Main memory size = 16384 blocks
Block size = 256 bytes
1 word = 8 bits = 1 byte
Main Memory Size:
Size of main memory = 16384 blocks = 16384 x 256 bytes = 2^22 bytes
Thus, number of bits required to address main memory = 22 bits.
Number of bits in block offset: block size = 256 bytes = 2^8 bytes, so 8 bits.
Number of bits in set number: number of sets = number of lines in cache / set size
= 128 / 4 = 32 = 2^5, so 5 bits.
Number of bits in tag
= Number of bits in physical address – (number of bits in set number + number of
bits in block offset) = 22 bits – (5 bits + 8 bits) = 9 bits.
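The same tag/set/offset breakdown can be computed for any set-associative configuration; a minimal sketch (the function name is illustrative, and all sizes are assumed to be powers of two):

```python
import math

def set_assoc_fields(mm_size_bytes, block_size_bytes, num_cache_blocks, ways):
    """Return (tag, set, offset) bit counts for a set-associative cache."""
    pa_bits = int(math.log2(mm_size_bytes))        # physical address bits
    offset_bits = int(math.log2(block_size_bytes)) # block offset bits
    num_sets = num_cache_blocks // ways
    set_bits = int(math.log2(num_sets))            # set index bits
    tag_bits = pa_bits - set_bits - offset_bits
    return tag_bits, set_bits, offset_bits

# Question 3: 16384 blocks x 256 B main memory, 128-block cache, 4-way
print(set_assoc_fields(16384 * 256, 256, 128, 4))   # (9, 5, 8)

# Practice problem 4: 4 GB memory, 32 B blocks, 512-line cache, 4-way
print(set_assoc_fields(4 * 2**30, 32, 512, 4))      # (20, 7, 5)
```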
Practice problem 4.
A 4-way set associative cache memory unit with a capacity of 16 KB is built using a
block size of 8 words. The word length is 32 bits. The size of the physical address
space is 4 GB. The number of bits for the TAG field is___
Sol. Given-
Set size = 4 lines
Cache memory size = 16 KB
Block size = 8 words
1 word = 32 bits = 4 bytes
Main memory size = 4 GB
1. Number of bits in physical address: main memory size = 4 GB = 2^32 bytes, so 32 bits.
2. Number of bits in block offset: block size = 8 words = 8 x 4 bytes = 32 bytes = 2^5, so 5 bits.
3. Number of lines in cache = cache size / line size = 16 KB / 32 bytes = 512 = 2^9 lines.
4. Number of sets in cache = number of lines in cache / set size = 512 lines / 4 lines = 128 = 2^7, so 7 bits.
5. Number of bits in tag
= Number of bits in physical address – (number of bits in set number + number of bits in
block offset)
= 32 bits – (7 bits + 5 bits)
= 32 bits – 12 bits
= 20 bits
Thus, number of bits in tag = 20 bits.
Problem 5. Consider a direct mapped cache with 8 cache blocks (0-7). If the memory block
requests are in the order-
3, 5, 2, 8, 0, 6, 3, 9, 16, 20, 17, 25, 18, 30, 24, 2, 63, 5, 82, 17, 24
Which of the following memory blocks will not be in the cache at the end of the
sequence?
1. 3
2. 18
3. 20
4. 30
Also, calculate the hit ratio and miss ratio.
Sol. We have,
There are 8 blocks in cache memory numbered from 0 to 7.
In direct mapping, a particular block of main memory is mapped to a particular
line of cache memory.
The line number is given by-
Cache line number = Block address % Number of lines in cache
For the given sequence-
Requests for memory blocks are generated one by one.
The line number of the block is calculated using the above relation.
Then, the block is placed in that particular line.
If already there exists another block in that line, then it is replaced.
Out of the given options, only block-18
is not present in the cache at the end of the sequence.
Option (B) is correct.
Hit ratio = 3 / 21
Miss ratio = 18 / 21
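The direct-mapped sequence above can be replayed with a few lines of Python (a sketch; names are illustrative):

```python
def simulate_direct_mapped(requests, num_lines):
    """Replay a block-request sequence on a direct-mapped cache."""
    cache = [None] * num_lines
    hits = 0
    for block in requests:
        line = block % num_lines       # cache line number = block % number of lines
        if cache[line] == block:
            hits += 1
        else:
            cache[line] = block        # replace whatever occupied the line
    return cache, hits

seq = [3, 5, 2, 8, 0, 6, 3, 9, 16, 20, 17, 25, 18, 30, 24, 2, 63, 5, 82, 17, 24]
cache, hits = simulate_direct_mapped(seq, 8)
print(sorted(cache))                 # block 18 is absent at the end
print(hits, len(seq) - hits)         # 3 hits, 18 misses
```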
How is a block found if present in cache?
• Caches include a TAG associated with each cache block.
– The TAG of every cache block where the block being requested may be present
needs to be compared with the TAG field of the MM address.
– All the possible tags are compared in parallel, as speed is important.
• Mapping Algorithms?
– Direct mapping requires a single comparison.
– Associative mapping requires a full associative search over all the TAGs
corresponding to all cache blocks.
– Set associative mapping requires a limited associative search over the TAGs of
only the selected set.
Replacement Algorithms
Which block should be replaced on a cache miss?
• With fully associative or set associative mapping, there can be several blocks
to choose from for replacement when a miss occurs.
• Two primary strategies are used:
a) Random: The candidate block is selected randomly for replacement. This simple
strategy tends to spread allocation uniformly.
b) Least Recently Used (LRU): The block replaced is the one that has not been used
for the longest period of time.
• Makes use of a corollary of temporal locality:
“If recently used blocks are likely to be used again, then the best candidate for replacement
is the least recently used block”
LRU
To implement the LRU algorithm, the cache controller must track the LRU
block as the computation proceeds.
• Example: Consider a 4-way set associative cache.
– For tracking the LRU block within a set, we use a 2-bit counter with every block.
– When hit occurs:
• Counter of the referenced block is reset to 0.
• Counters with values originally lower than the referenced one are incremented by 1,
and all others remain unchanged.
– When miss occurs:
• If the set is not full, the counter associated with the new block loaded is set to 0, and
all other counters are incremented by 1.
• If the set is full, the block with counter value 3 is removed, the new block put in its
place, and the counter set to 0. The other three counters are incremented by 1.
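The 2-bit-counter scheme above can be sketched as follows (a minimal illustration for a single set; function and variable names are hypothetical):

```python
def lru_access(set_blocks, counters, block, ways=4):
    """2-bit-counter LRU within one 4-way set, per the rules above (sketch)."""
    if block in set_blocks:                       # hit
        i = set_blocks.index(block)
        ref = counters[i]
        for j in range(len(counters)):
            if counters[j] < ref:                 # lower counters increment by 1
                counters[j] += 1
        counters[i] = 0                           # referenced block's counter -> 0
        return "hit"
    if len(set_blocks) < ways:                    # miss, set not full
        for j in range(len(counters)):
            counters[j] += 1
        set_blocks.append(block)
        counters.append(0)                        # new block's counter -> 0
    else:                                         # miss, set full
        victim = counters.index(ways - 1)         # evict the block with counter 3
        for j in range(len(counters)):
            if j != victim:
                counters[j] += 1
        set_blocks[victim] = block
        counters[victim] = 0
    return "miss"

blocks, ctrs = [], []
for b in [10, 20, 30, 40, 10, 50]:
    lru_access(blocks, ctrs, b)
print(blocks, ctrs)    # block 20 (least recently used) was evicted by 50
```

Note that the counter values of occupied blocks stay distinct after every access, as the slides claim.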
Replacement Algorithms
-In a direct-mapped cache, the position of each block is predetermined by its address;
hence, the replacement strategy is trivial.
-In associative and set-associative caches there exists some flexibility. When a new block is
to be brought into the cache and all the positions that it may occupy are full, the cache
controller must decide which of the old blocks to overwrite.
-This is an important issue, because the decision can be a strong determining factor in
system performance.
- The objective is to keep blocks in the cache that are likely to be referenced in the near
future
- The property of locality of reference in programs gives a clue to a reasonable
strategy.
-Because program execution usually stays in localized areas for reasonable periods
of time, there is a high probability that the blocks that have been referenced recently will
be referenced again soon.
- Therefore, when a block is to be overwritten, it is sensible to overwrite the one that has
gone the longest time without being referenced.
-This block is called the least recently used (LRU) block, and the technique is called the
LRU replacement algorithm.
Replacement Algorithms
-To use the LRU algorithm, the cache controller must track references to all blocks as
computation proceeds.
-Suppose it is required to track the LRU block of a four-block set in a set-associative cache.
-A 2-bit counter can be used for each block.
- When a hit occurs, the counter of the block that is referenced is set to 0.
- Counters with values originally lower than the referenced one are incremented by one,
and all others remain unchanged.
- When a miss occurs and the set is not full, the counter associated with the new block
loaded from the main memory is set to 0, and the values of all other counters are
increased by one.
-When a miss occurs and the set is full, the block with the counter value 3 is removed, the
new block is put in its place, and its counter is set to 0.
-The other three block counters are incremented by one. It can be easily verified that the
counter values of occupied blocks are always distinct.
Q.6.Consider a 4-way set associative mapping with 16 cache blocks. The memory block
requests are in the order-
0, 255, 1, 4, 3, 8, 133, 159, 216, 129, 63, 8, 48, 32, 73, 92, 155
If LRU replacement policy is used, which cache block will not be present in the cache?
1. 3
2. 8
3. 129
4. 216
Also, calculate the hit ratio and miss ratio.
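For checking an answer like Q.6, the LRU behaviour can be simulated using an `OrderedDict` as a per-set LRU queue (a sketch; this models true LRU directly, which for a 4-way set gives the same replacement decisions as the 2-bit counters above):

```python
from collections import OrderedDict

def simulate_set_assoc_lru(requests, num_blocks, ways):
    """Replay a request sequence on a set-associative cache with LRU replacement."""
    num_sets = num_blocks // ways
    sets = [OrderedDict() for _ in range(num_sets)]   # keys kept in LRU->MRU order
    hits = 0
    for block in requests:
        s = sets[block % num_sets]                    # set index = block % sets
        if block in s:
            hits += 1
            s.move_to_end(block)                      # mark most recently used
        else:
            if len(s) == ways:
                s.popitem(last=False)                 # evict least recently used
            s[block] = True
    return sets, hits

seq = [0, 255, 1, 4, 3, 8, 133, 159, 216, 129, 63, 8, 48, 32, 73, 92, 155]
sets, hits = simulate_set_assoc_lru(seq, 16, 4)
cached = sorted(b for s in sets for b in s)
print(216 in cached, hits)    # block 216 was evicted; only one hit (the second 8)
```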
Elements of Cache Design
• Addresses (logical or physical)
• Size
• Mapping Function (direct, associative, set associative)
• Replacement Algorithm (LRU, LFU, FIFO, random)
• Write Policy (write through, write back, write once)
• Line Size
• Number of Caches (how many levels, unified or split)
Other Replacement Algorithms
• First in first out (FIFO)
— replace the block that has been in the cache longest
— implemented as a circular queue
• Least frequently used (LFU)
— replace the block which has had the fewest hits
• Random
— almost as good as the other choices
• LRU is often favoured because of its ease of hardware implementation
Cache write strategies
Cache coherence: in a single-CPU system, two copies of the same data (one in the cache
and the other in MM) can result in inconsistent data.
-The contents of the cache and MM can be altered by more than one device: e.g., the CPU
may write to the cache while DMA and I/O devices write to MM.
Cache designs can be classified based on the write and memory update
strategy being used.
1. Write Through / Store Through
2. Write Back / Copy Back
(a) Write Through Strategy
• Information is written to both the cache
block and the main memory block.
• Features:
– Easier to implement
– Read misses do not result in writes to the
lower level (i.e. MM).
– The lower level (i.e. MM) has the most
updated version of the data – important for
I/O operations and multiprocessor systems.
– A write buffer is often used to reduce CPU
write stall time while data is written to main
memory.
-Overhead of accessing both cache and main memory for updating.
(b) Write Back Strategy
• Information is written only to the cache block.
• A modified cache block is written to MM only when it is replaced.
• Features:
– Writes occur at the speed of cache memory.
– Multiple writes to a cache block requires only one write to MM.
– Uses less memory bandwidth, makes it attractive to multi-processors.
• Write-back cache blocks can be clean or dirty.
– A status bit called dirty bit or modified bit is associated with each cache block,
which indicates whether the block was modified in the cache (0: clean, 1: dirty).
– If the status is clean, the block is not written back to MM while being replaced.
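The dirty-bit mechanism can be sketched as a toy model (class and field names are made up for illustration, not part of any real cache API):

```python
class WriteBackLine:
    """One write-back cache line with a dirty (modified) bit, per the scheme above."""
    def __init__(self):
        self.tag = None
        self.data = None
        self.dirty = False        # 0: clean, 1: dirty

    def write(self, tag, data):
        self.tag, self.data = tag, data
        self.dirty = True         # modified only in the cache, not in MM

    def replace(self, new_tag, new_data, main_memory):
        if self.dirty and self.tag is not None:
            main_memory[self.tag] = self.data   # write back to MM only if dirty
        self.tag, self.data, self.dirty = new_tag, new_data, False

mm = {}
line = WriteBackLine()
line.write(0x10, "A")         # multiple writes touch only the cache...
line.write(0x10, "B")
line.replace(0x20, "C", mm)   # ...and a single write to MM happens at replacement
print(mm)                     # only the final value "B" reaches main memory
```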
Write Back Strategy
Virtual memory
Almost all modern processors support virtual memory.
• Virtual memory allows a program to treat its memory space as a single contiguous block
that may be considerably larger than main memory.
• A memory management unit (MMU) takes care of the mapping between virtual and physical
addresses.
-A logical (virtual) cache stores virtual addresses rather than physical addresses
• Processor addresses cache directly without going through MMU
• Obvious advantage is that addresses do not have to be translated by the MMU
• A not-so-obvious disadvantage is that all processes have the same virtual address space
— The same virtual address in two processes usually refers to different physical addresses
— So either flush cache with every context switch or add extra bits
Logical and physical cache
Examples: most Cray machines, early PCs, nearly all embedded systems.
Fig.: Address translation. The virtual address is split into a virtual page number (VPN) and
a page offset. A per-process page table base register locates the page table, and the VPN
acts as the table index. The selected page table entry supplies the physical frame number
(PFN), which is combined with the page offset to form the physical address. If the entry's
valid bit is 0, the page is not in memory (page fault).
Address translation in hardware
- The most significant bits of the virtual address (VA) give the virtual page number (VPN).
- The page table maps the VPN to a physical frame number (PFN).
- The physical address (PA) is obtained from the PFN and the offset within the page.
- The MMU stores the (physical) address of the start of the page table, not all of its entries.
- The MMU "walks" the page table to get the relevant page table entry (PTE).
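A single-level translation as described above might be sketched like this (the 4 KB page size and the VPN-to-PFN mapping are illustrative assumptions):

```python
PAGE_OFFSET_BITS = 12                    # assume 4 KB pages (illustrative)

def translate(va, page_table):
    """Single-level VA -> PA translation (sketch of the steps above)."""
    vpn = va >> PAGE_OFFSET_BITS         # most significant bits give the VPN
    offset = va & ((1 << PAGE_OFFSET_BITS) - 1)
    pfn = page_table.get(vpn)            # page table maps VPN -> PFN
    if pfn is None:
        raise RuntimeError("page fault") # no valid entry: page not in memory
    return (pfn << PAGE_OFFSET_BITS) | offset

pt = {0x5: 0x3}                          # hypothetical mapping: VPN 5 -> PFN 3
print(hex(translate(0x5ABC, pt)))        # 0x3abc
```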
TLB
-In principle, then, every virtual memory reference can cause two physical memory
accesses: one to fetch the appropriate page table entry, and one to fetch the
desired data.
-Thus, a straightforward virtual memory scheme would have the effect
of doubling the memory access time.
-To overcome this problem, most virtual memory schemes make use of a special cache for
page table entries, usually called a translation lookaside buffer (TLB).
- This cache functions in the same way as a memory cache and contains those page table
entries that have been most recently used.
What happens on memory access?
• The CPU requests code or data at a virtual address.
• The MMU must translate the VA to a PA:
– first, access memory to read the page table entry;
– translate the VA to a PA;
– then, access memory to fetch the code/data.
• Paging therefore adds overhead to every memory access.
• Solution? A cache for VA-to-PA mappings:
the Translation Lookaside Buffer (TLB)
Q.2. Consider a paging scheme with a TLB. Assume no page fault occurs. It takes
20 ns to search the TLB and 100 ns to access the physical memory. If the TLB hit ratio is
80%, find the effective memory access time (in ns) for (1) a single-level and (2) a two-level
page table.
Sol.1. Given-
I. Number of levels of page table = 1
TLB access time = 20 ns
Main memory access time = 100 ns
TLB Hit ratio = 80% = 0.8
II.TLB Miss ratio
= 1 – TLB Hit ratio
= 1 – 0.8
= 0.2
Calculating Effective Access Time-
Effective access time = hit ratio x (TLB time + memory time)
+ miss ratio x (TLB time + (levels + 1) x memory time)
Substituting values in the above formula, we get-
Effective Access Time
= 0.8 x { 20 ns + 100 ns } + 0.2 x { 20 ns + (1+1) x 100 ns }
= 0.8 x 120 ns + 0.2 x 220 ns
= 96 ns + 44 ns
= 140 ns
Thus, effective memory access time = 140 ns.
Sol.2.
Given-
Number of levels of page table = 2
TLB access time = 20 ns
Main memory access time = 100 ns
TLB Hit ratio = 80% = 0.8
Calculating TLB Miss Ratio-
TLB Miss ratio
= 1 – TLB Hit ratio
= 1 – 0.8
= 0.2
Calculating Effective Access Time-
Substituting values in the above formula, we get-
Effective Access Time
= 0.8 x { 20 ns + 100 ns } + 0.2 x { 20 ns + (2+1) x 100 ns }
= 0.8 x 120 ns + 0.2 x 320 ns
= 96 ns + 64 ns
= 160 ns
Thus, effective memory access time = 160 ns.
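Both solutions use the same formula, which can be captured in one helper (a sketch; it assumes one memory access per page-table level plus one for the data, as in the working above):

```python
def effective_access_time(hit_ratio, tlb_ns, mem_ns, levels):
    """EAT = h*(t_tlb + t_mem) + (1-h)*(t_tlb + (levels + 1)*t_mem)."""
    hit_time = tlb_ns + mem_ns                      # TLB hit: one memory access
    miss_time = tlb_ns + (levels + 1) * mem_ns      # miss: walk table, then access
    return hit_ratio * hit_time + (1 - hit_ratio) * miss_time

print(effective_access_time(0.8, 20, 100, 1))  # ~140 ns (single-level)
print(effective_access_time(0.8, 20, 100, 2))  # ~160 ns (two-level)
```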
Paging
Q.1. Consider a machine with 64 MB physical memory and a 32 bit virtual
address space. If the page size is 4 KB, what is the approximate size of
the page table?
Find No. of pages-
No. of frames
No. of entries in page table
Size of page table.
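Under the common assumption of roughly 2 bytes per page-table entry (enough for the 14-bit frame number plus control bits; the PTE size is an assumption, not given in the question), the quantities asked for work out as:

```python
mm_bytes   = 64 * 2**20          # 64 MB physical memory
va_bits    = 32                  # 32-bit virtual address space
page_bytes = 4 * 2**10           # 4 KB pages

num_pages   = 2**va_bits // page_bytes   # virtual pages = 2^20
num_frames  = mm_bytes // page_bytes     # physical frames = 2^14
num_entries = num_pages                  # one PTE per virtual page
pte_bytes   = 2                          # assumed PTE size (~2 bytes)
table_bytes = num_entries * pte_bytes    # approximate page table size

print(num_pages, num_frames, num_entries, table_bytes // 2**20)  # ... 2 (MB)
```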
Memory Interleaving
-Interleaved memory implements the concept of accessing more than one word in a single
memory access cycle.
-Memory can be partitioned into N separate modules; then N accesses can be done
simultaneously.
-Dividing main memory into multiple modules compensates for the relatively slow speed
of DRAM.
-It increases bandwidth by allowing simultaneous access to more than one chunk of
memory, and improves performance because the processor can transfer more data to and
from memory.
-Two common methods are 2-way and 4-way interleaving.
-In 2-way interleaving, 2 memory blocks are accessed at the same time for read and write
operations; in 4-way interleaving, 4 memory blocks are accessed at the same time.
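Low-order interleaving can be sketched as a simple address split (a sketch; the module count is a parameter, and the function name is illustrative):

```python
def interleave(address, num_modules):
    """Low-order interleaving: consecutive addresses fall in different modules."""
    module = address % num_modules   # module selected by the low-order bits
    offset = address // num_modules  # word position inside that module
    return module, offset

# With 4-way interleaving, addresses 0..3 land in modules 0..3, so four
# consecutive words can be accessed simultaneously.
print([interleave(a, 4) for a in range(8)])
```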