Module 3F


Computer Architecture

and Organization
[CSE2003]

Faculty, SCSE, VIT-Bhopal University.

Contents
UNIT – III
• Memory System
• Memory systems hierarchy – Main memory organization – Types of main memory:
SRAM, DRAM and their characteristics and performance – latency – cycle time –
bandwidth – memory interleaving
• Cache memory: Address mapping – line size – replacement policies – coherence
• Virtual memory: Paging (single level and multi level) – Page table mapping – TLB
• Reliability of memory systems: Error detecting and error correcting systems.
Memory
-The maximum size of the memory that can be used in any computer is determined by the
addressing scheme. For example, a computer that generates 16-bit addresses is capable of
addressing up to 2^16 = 64K (kilo) memory locations
The main memory is functionally organized as a number of locations
 Each location stores a fixed number of bits.
 The term word length of a memory indicates the number of bits in each location.
 The total capacity of a memory is the number of locations multiplied by the word length.
 Each location in memory is identified by a unique address as shown in Fig.
 Two different memories with the same capacity may have different organization as
illustrated by Fig.2.
 Both have same capacity of 4 kilo bytes but they differ in their internal organizations.
Fig. 1: Main memory locations
Fig. 2: Memory capacity and organization
Memory Addressability
 The number of bits in the memory address determines the maximum number of
memory addresses possible for the CPU.
 Suppose a CPU has n bits in the address, its memory can have a maximum of 2^n
(2 to the power of n) locations.
 This is known as the CPU’s memory addressability.
 It gives theoretical maximum memory capacity for a CPU.
 Table: Popular CPU and their memory addressability
Memory

Q.1.A computer has a main memory with 1024 locations of each 32-bits.
Calculate the total memory capacity:

Q.2.A CPU has a 12 bit address for memory addressing: (a) What is the
memory addressability of the CPU ? (b) If the memory has a total capacity
of 16 KB, what is the word length of the memory?
Memory
Q.A computer has a main memory with 1024 locations of each 32-bits.
Calculate the total memory capacity:
 Sol1.
 Word length = 32 bits = 4 bytes;
 No. of locations = 1024 = 1 kilo = 1K;
 Memory capacity = 1K * 4 bytes = 4 KB

 Sol2. No. of address bits = 12;
 Memory addressability = 2^12 = 4K locations;
 Memory capacity = 16 KB;
 Word length = memory capacity / no. of locations = 16 KB / 4K = 4 bytes.
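The same arithmetic can be checked with a short Python sketch (a check only; the values are the ones used in the solutions above):

```python
# Q1: 1024 locations, 32 bits (4 bytes) per location
capacity_bytes = 1024 * (32 // 8)
print(capacity_bytes, "bytes =", capacity_bytes // 1024, "KB")      # 4096 bytes = 4 KB

# Q2: 12 address bits, total capacity 16 KB
locations = 2 ** 12                       # memory addressability = 4K locations
word_length = (16 * 1024) // locations    # bytes per location
print(locations, "locations, word length =", word_length, "bytes")  # 4096 locations, 4 bytes
```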
Types of Main Memory

 1.Physical characteristics of memory


 2.Crucial metrics of memory system
 3.Classification based on volatility
 4.Classification based on access-
 5.Classification of ROM
 6.Classification of RAM
Physical characteristics

• Volatility
— Does the memory retain data in the absence of electrical power?
• Decay
— Data retention ranges from tiny fractions of a second (volatile DRAM) to many years (CDs, DVDs)
• Erasable
— Can the memory be rewritten? If so, how fast? How many erase cycles can occur?
• Power consumption
Types

 Semiconductor
— RAM (volatile or non-volatile)
 • Magnetic Surface Memory
— Disk & Tape
 • Optical
— CD & DVD
 • Others
— Magneto-optical
— Bubble
— Hologram
Classification of Memory Systems

a) Volatile versus Non-volatile:


– A volatile memory system is one where the stored data is lost when the
power is switched off.
• Examples: CMOS static memory, CMOS dynamic memory.
• Dynamic memory in addition requires periodic refreshing.
– A non-volatile memory system is one where the stored data is retained
even when the power is switched off.
• Examples: Read-only memory, Magnetic disk, CDROM/DVD, Flash memory,
Resistive memory.
b)Random-access versus Direct/Sequential access:
-A memory is said to be random-access when the read/write time is
independent of the memory location being accessed.
• Examples: CMOS memory (RAM and ROM).
– A memory is said to be sequential access when the stored data can
only be accessed sequentially in a particular order.
• Examples: Magnetic tape, Punched paper tape.
– A memory is said to be direct or semi-random access when part of the
access is sequential and part is random.
• Example: Magnetic disk.
• We can directly go to a track after which access will be sequential.
c) Read-only versus Random-access:
– Read-only Memory (ROM) is one where data, once stored, is permanent
or semi-permanent.
• Data written (programmed) during manufacture or in the laboratory.
• Examples: ROM, PROM, EPROM, EEPROM.
– Random Access Memory (RAM) is one where data access time is the
same independent of the location (address).
• Used in main / cache memory systems.
• Example: Static RAM (SRAM) – data once written are retained as long as
power is on.
• Example: Dynamic RAM (DRAM) – requires periodic refreshing even
when power is on (data stored as charge on tiny capacitors).
Some important questions?

– How to make the memory system work faster?


– How to increase the data transfer rate between CPU and memory?
– How to address the ever increasing storage needs of applications?
• Some possible solutions:
– Cache Memory: to increase the effective speed of the memory system.
– Virtual Memory: to increase the effective size of the memory system.
Cache memory
Virtual Memory
Cache and Virtual Memory
- The processor of a computer can usually process instructions and data
faster than they can be fetched from the main memory.
-Hence, the memory access time is the bottleneck in the system.
-One way to reduce the memory access time is to use a cache memory.
-This is a small, fast memory inserted between the larger, slower main
memory and the processor.
-It holds the currently active portions of a program and their data.
-Virtual memory is another important concept related to memory
organization.
-With this technique, only the active portions of a program are stored in the
main memory, and the remainder is stored on the much larger secondary
storage device.
How a Memory Chip Looks Like?
Access Time, Latency and Bandwidth

• Terminologies used to measure speed of the memory system.


a) Memory Access Time: Time between initiation of an operation (Read or
Write) and completion of that operation.
b) Latency: Initial delay from the initiation of an operation to the time the
first data is available.
c) Bandwidth: Maximum speed of data transfer in bytes per second.
• In modern memory organizations, every read request reads a block of words
into some high-speed registers (LATENCY), from where data are supplied to
the processor one by one (ACCESS TIME).
Latency
-During block transfers, memory latency is the amount of time it takes to transfer the first
word of a block.
The time required to transfer a complete block depends also on the rate at which successive
words can be transferred and on the size of the
block.
-The time between successive words of a block is much shorter than the time needed
to transfer the first word.
-For instance, in the timing diagram in Figure, the access cycle begins with the assertion of
the RAS signal.
-The first word of data is transferred five clock cycles later.
-Thus, the latency is five clock cycles.
- If the clock rate is 500 MHz, then the latency is 10 ns.
- The remaining three words are transferred in consecutive clock cycles, at
the rate of one word every 2 ns.
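The numbers in this timing example can be reproduced with a small sketch. The block size (4 words) and the word size (4 bytes) are assumptions for the bandwidth figure; they are not stated on the slide:

```python
clock_rate_hz = 500e6                 # 500 MHz clock
cycle_time_s  = 1 / clock_rate_hz     # 2 ns per cycle

latency_cycles = 5                    # first word appears 5 cycles after RAS
latency_s = latency_cycles * cycle_time_s              # 10 ns

words_per_block = 4                   # assumed block size
block_time_s = latency_s + (words_per_block - 1) * cycle_time_s   # 10 + 3*2 = 16 ns

bytes_per_word = 4                    # assumed word size
bandwidth_Bps = (words_per_block * bytes_per_word) / block_time_s

print(f"Latency            : {latency_s * 1e9:.0f} ns")
print(f"Block transfer time: {block_time_s * 1e9:.0f} ns")
print(f"Effective bandwidth: {bandwidth_Bps / 1e9:.2f} GB/s")
```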
Latency
Memory
Crucial Metrics
Memory access time -A useful measure of the speed of memory units is the time that
elapses between the initiation of an operation to transfer a word of data and the completion
of that operation.
Memory cycle time, which is the minimum time delay required between the initiation of two
successive memory operations, for example, the time between two successive Read
operations.
-The cycle time is usually slightly longer than the access time, depending on the
implementation details of the memory unit.

-A memory unit is called a random-access memory (RAM) if the access time to any
location is the same, independent of the location’s address.
-This distinguishes such memory units from serial, or partly serial, access storage devices
such as magnetic and optical disks.
-Access time of the latter devices depends on the address or position of the data.
Memory
-The processor uses the address lines to specify the memory location involved in a data
transfer operation, and uses the data lines to transfer the data.
-At the same time, the control lines carry the command indicating a Read or
a Write operation and whether a byte or a word is to be transferred.
-The control lines also provide the necessary timing information and are used by the
memory to indicate when it has completed the requested operation.

Memory
-The maximum size of the memory that can be used in any computer is
determined by the addressing scheme.
-For example, a computer that generates 16-bit addresses is capable of
addressing up to 2^16 = 64K (kilo) memory locations.
-Machines whose instructions generate 32-bit addresses can utilize a memory
that contains up to 2^32 = 4G (giga) locations.
-Machines with 64-bit addresses can access up to 2^64 = 16E (exa) locations.
-The number of locations represents the size of the address space of the
computer
Connection of the memory to the processor.
Internal Organization of Memory Chips
16 × 8 organization: an example of a very small memory circuit consisting of 16 words of 8
bits each.
Memory organization
-Memory cells are usually organized in the form of an array, in which each cell is capable of
storing one bit of information.
-Each row of cells constitutes a memory word, and all cells of a row are connected to a
common line referred to as the word line, which is driven by the address decoder on the
chip.
-The cells in each column are connected to a Sense/Write circuit by two bit lines, and the
Sense/Write circuits are connected to the data input/output lines of the chip.
-During a Read operation, these circuits sense, or read, the information stored in the cells
selected by a word line and place this information on the output data lines.
-During a Write operation, the Sense/Write circuits receive input data and store them in the
cells of the selected word.
-The memory circuit in Figure stores 128 bits and requires 14 external connections
for address, data, and control lines. It also needs two lines for power supply and ground
connections.
Memory
-Consider now a slightly larger memory circuit, one that has 1K (1024) memory cells.
-This circuit can be organized as a 128 × 8 memory, requiring a total of 19 external
connections.
-Commercially available memory chips contain a much larger number of
memory cells.
- For example, a 1G-bit chip may have a 256M × 4 organization, in which
case a 28-bit address is needed and 4 bits are transferred to or from the
chip.
What About a 256 × 16 Memory?
• How many external connections are required in total?
Sol.
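A sketch of the pin-count arithmetic, assuming two control lines (R/W-bar and chip select) and two power/ground pins, the same counting convention as the 128 × 8 example in these notes:

```python
import math

def external_connections(locations, word_bits, control=2, power=2):
    address_lines = math.ceil(math.log2(locations))
    return address_lines + word_bits + control + power

# 256 x 16: 8 address + 16 data + 2 control + 2 power = 28 connections
print(external_connections(256, 16))   # 28
# Cross-check against the 128 x 8 case quoted above (19 connections)
print(external_connections(128, 8))    # 19
```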
How many 128*8 RAM chips are required to provide a memory capacity of 2048*8?
How many 128*8 RAM chips are required to provide a memory capacity of 2048*16?
Q.1.A computer employs RAM chips of 256*8 and ROM chips of
1024*8.The computer system needs 2K bytes of RAM, 4K bytes
of ROM. How many RAM and ROM chips are needed.
Example 2
How many 128 x 8 RAM chips are needed to provide a memory capacity of
2048 bytes?
b. How many lines of the address bus must be used to access 2048 bytes of
memory? How many of these lines will be common to all chips?
c. How many lines must be decoded for chip select? Specify the size of the
decoders.
a. The number of memory chips needed = 2^11 / 2^7 = 16 memory chips.
b. To access 2048 bytes, 11 address bits are needed.
Each chip has a capacity of 2^7 bytes, so 7 address bits must be common to all chips.
c. With 16 memory chips, 4 chip-select bits are needed.
The 4 selection bits are decoded to 16 select lines, so a 4 × 16 decoder is needed.
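The same chip-count arithmetic in a short sketch (the 2048 × 16 case uses the corrected figure from the earlier question):

```python
import math

def chips_needed(total_words, total_bits_per_word, chip_words, chip_bits):
    rows = total_words // chip_words           # chips stacked to cover the address space
    cols = total_bits_per_word // chip_bits    # chips side by side to cover the word width
    return rows * cols

print(chips_needed(2048, 8, 128, 8))    # 16 chips of 128 x 8 for a 2048 x 8 memory
print(chips_needed(2048, 16, 128, 8))   # 32 chips of 128 x 8 for a 2048 x 16 memory

address_bits = int(math.log2(2048))     # 11 bits to address 2048 bytes
common_bits  = int(math.log2(128))      # 7 bits go to every chip
select_bits  = address_bits - common_bits
print(address_bits, common_bits, select_bits)   # 11 7 4 -> a 4-to-16 decoder for chip select
```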


Practice example
A computer employs RAM chips of 256 x 8 and ROM chips of 1024 x 8. The computer
system needs 2K bytes of RAM, 4K bytes of ROM, and four interface units, each with four
registers. A memory-mapped I/O configuration is used. The two highest-order bits of the
address bus are assigned 00 for RAM and 10 for interface registers.
a. How many RAM and ROM chips are needed?
b. Draw a memory-address map for the system.
c. Give the address range in hexadecimal for RAM, ROM and interface.
Solution
Introduction

• Broadly two types of semiconductor memory systems:


a) Static Random Access Memory (SRAM)
b) Dynamic Random Access Memory (DRAM)

• Vary in terms of speed, density, volatility properties, and cost.


– Present-day main memory systems are built using DRAM.
– Cache memory systems are built using SRAM.
Static Random Access Memory (SRAM)
• SRAM consists of circuits which can store the data as long as
power is applied.
• It is a type of semiconductor memory that uses bistable latching
circuitry (flip-flop) to store each bit.
• SRAM memory arrays can be arranged in rows and columns of
memory cells.
– Called word line and bit line.
SRAM technology

– Can be built using 4 or 6 MOS transistors.


– Modern SRAM chips in the market use 6-transistor
implementations for CMOS compatibility.
– Widely used in small-scale systems like microcontrollers and
embedded systems.
– Also used to implement cache memories in computer systems.
A 1-bit SRAM Cell
• Two inverters are cross connected to form
a latch.
• The latch is connected to two bit lines with
transistors T1 and T2.
• Transistors behave like switches that can
be opened (OFF) or closed (ON) under the
control of the word line.
• To retain the state of the latch, the word
line is kept grounded, which turns the
transistors off.
READ Operation in SRAM
• To read the content of the cell, the
word line is activated (= 1) to make
the transistors T1 and T2 on.
• The value stored in latch is available
on bit line b and its complement on
b’.
• Sense/write circuits connected to
the bit lines monitor the states of b
and b’.
WRITE Operation in SRAM
• To write 1: The bit line b is set with 1 and bit
line b’ is set with 0. Then the word line is
activated and the data is written to the latch.

• To write 0: The bit line b is set with 0 and bit


line b’ is set with 1. Then the word line is
activated and the data is written to the latch.

• The required signals (either 1 or 0) are


generated by the sense/write circuit.
6-Transistor Static Memory cell
6T SRAM-In State 0
• In state 0 the voltage at X is low
and the voltage at Y is high.
• When the voltage at X is low,
transistors (T4 & T5) are on while
(T3 & T6) are off.
• When word line is activated, T1
and T2 are turned on and the bit
lines b will have 0 and b’ will have 1.
6T SRAM-In State 1
• In state 1 the voltage at X is high and the
voltage at Y is low.
• When the voltage at X is high,
transistors (T3 & T6) are on while
(T4 & T5) are off.
• When word line is activated, T1 and
T2 are turned on and the bit lines b
will have 1 and b’ will have 0.
Features of SRAM
• Low power consumption.
– A major advantage of CMOS SRAMs is their very low power
consumption, because current flows in the cell only when the cell is being
accessed.
• Simplicity – refresh circuitry is not needed.
– Volatile :: continuous power supply is required.
• Fast operation.
– Access time is very fast; fast memories (cache) are built using SRAM.
• High cost.
– 6 transistors per cell.
• Limited capacity.
– Not economical to manufacture high-capacity SRAM chips.
SRAM-Characteristics
-Continuous power is needed for the cell to retain its state.
-If power is interrupted, the cell’s contents are lost.
-When power is restored, the latch settles into a stable state, but not necessarily the same
state the cell was in before the interruption.
-Hence, SRAMs are said to be volatile memories because their contents are lost when
power is interrupted.
-A major advantage of CMOS SRAMs is their very low power consumption, because
current flows in the cell only when the cell is being accessed.
-Otherwise, T1, T2, and one transistor in each inverter are turned off, ensuring that there is
no continuous electrical path between Vsupply and ground.
-Static RAMs can be accessed very quickly. Access times on the order of a few
nanoseconds are found in commercially available chips.
-SRAMs are used in applications where speed is of critical concern.
Dynamic Random Access Memory (DRAM)
• Dynamic RAM does not retain its state even
if the power supply is on.
– Data stored in the form of charge stored
on a capacitor.
• Requires periodic refresh.
– The charge stored cannot be retained
over long time (due to leakage).
• Less expensive than SRAM.
– Requires less hardware (one transistor
and one capacitor per cell).
• Address lines are multiplexed.
READ Operation in DRAM
• The transistor of the particular cell is
turned on by activating the word line.
• A sense amplifier connected to bit
line senses the charge stored in the
capacitor.
• If the charge is above threshold, the
bit line is maintained at high voltage,
which represents logic 1.
• If the charge is below threshold, the
bit line is grounded, which represents
logic 0.
WRITE Operation in DRAM
• The transistor of the particular cell is
turned on by activating the word line.
• Depending on the value to be written (0
or 1), an appropriate voltage is applied
to the bit line.
• The capacitor gets charged to the
required voltage state.
• Refreshing of the capacitor requires
periodic READ-WRITE cycles (every few
msec).
Dynamic RAMs
-Static RAMs are fast, but their cells require several transistors.
-Less expensive and higher density RAMs can be implemented with simpler cells.
- But, these simpler cells do not retain their state for a long period, unless they are
accessed frequently for Read or Write operations.
- Memories that use such cells are called dynamic RAMs (DRAMs).
-Information is stored in a dynamic memory cell in the form of a charge on a capacitor,
but this charge can be maintained for only tens of milliseconds.
-Since the cell is required to store information for a much longer time, its contents must be
periodically refreshed by restoring the capacitor charge to its full value.
-This occurs when the contents of the cell are read or when new information is written into it.
Operation
-To store information in this cell, transistor T is turned on and an appropriate voltage is
applied to the bit line. This causes a known amount of charge to be
stored in the capacitor.
-After the transistor is turned off, the charge remains stored in the capacitor, but not
for long. The capacitor begins to discharge. This is because the transistor continues to
conduct a tiny amount of current, measured in picoamperes, after it is turned off.
Static Memories
-Memories that consist of circuits capable of retaining their state as long as power is applied
are known as static memories.
-Two inverters are cross-connected to form a latch.
The latch is connected to two bit lines by transistors T1 and T2. These transistors act as
switches that can be opened or closed under control of the word line.

Fig: A static RAM cell.


Operation
-When the word line is at ground level, the transistors are turned off and the latch retains its
state.
-For example, if the logic value at point X is 1 and at point Y is 0, this state is maintained as
long as the signal on the word line is at ground level. Assume that this state represents the
value 1.
Read Operation
In order to read the state of the SRAM cell, the word line is activated to close switches
T1 and T2. If the cell is in state 1, the signal on bit line b is high and the signal on bit line
b(bar) is low. The opposite is true if the cell is in state 0. Thus, b and b(bar) are always
complements of each other.
-The Sense/Write circuit at the end of the two bit lines monitors their state and sets the
corresponding output accordingly.
Write Operation
During a Write operation, the Sense/Write circuit drives bit lines b and b(bar), instead of
sensing their state. It places the appropriate value on bit line b and its complement on b(bar)
and activates the word line. This forces the cell into the corresponding state, which the cell
retains when the word line is deactivated.
Consider a system having three levels of memory: a cache memory, a semiconductor main
memory, and a magnetic disk secondary memory. The access times of the memories are
10 ns, 50 ns, and 1 μs, respectively. The cache hit ratio is 80% and the main memory hit
ratio is 85%. Compute the average access time for this system.
Quick Review of Memory Technology
• Static RAM:
– Very fast but expensive memory technology (requires 6 transistors / bit).
– Packing density is limited.
• Dynamic RAM:
– Significantly slower than SRAM, but much less expensive (1 transistor /
bit).
– Requires periodic refreshing.
• Flash memory:
– Non-volatile memory technology that uses floating-gate MOS transistors.
– Slower than DRAM, but higher packing density, and lower cost per bit.
Magnetic disk:
– Provides large amount of storage, with very low cost per bit.
– Much slower than DRAM, and also flash memory.
– Requires mechanical moving parts, and uses magnetic recording
technology.
Cache-Performance Considerations
-Two key factors in the commercial success of a computer are performance
and cost; the best possible performance for a given cost is the objective.
-Performance depends on how fast machine instructions can be brought into
the processor and how fast they can be executed
-The main purpose of this hierarchy is to create a memory that the processor
sees as having a short access time and a large capacity.
-When a cache is used, the processor is able to access instructions and
data more quickly when the data from the referenced memory locations are
in the cache.
-Therefore, the extent to which caches improve performance is dependent
on how frequently the requested instructions and data are found in the
cache.
Hit Rate and Miss Penalty
Hit-A successful access to data in a cache.
Hit rate: The number of hits stated as a fraction of all attempted accesses is
called the hit rate
-Miss rate is the number of misses stated as a fraction of attempted
accesses.
-High hit rates well over 0.9 are essential for high-performance computers.
-Performance is adversely affected by the actions that need to be taken when a miss
occurs.
-A performance penalty is incurred because of the extra time needed to bring a block
of data from a slower unit in the memory hierarchy to a faster unit.
-During that period, the processor is stalled waiting for instructions or data.
-We refer to the total access time seen by the processor when a miss occurs
as the miss penalty.
-Consider a system with only one level of cache.
In this case, the miss penalty consists almost entirely of the time to access a block of data in
the main memory.
-Let h be the hit rate, M the miss penalty, and C the time to access information in the cache.
Thus, the average access time experienced by the processor is t_avg = hC + (1 − h)M.
Hit Ratio / Hit Rate:
– The hit ratio H is defined as the probability that a logical address generated
by the CPU refers to information stored in M1.
– We can determine H experimentally as follows:
• A set of representative programs is executed or simulated.
• The number of references to M1 and M2, denoted by N1 and N2 respectively, are
recorded.
H = N1 / (N1 + N2)

The quantity (1 – H) is called the miss ratio.

tA1 = Access time of memory M1
tA2 = Access time of memory M2
Then the average time required by the CPU to access a word in memory can be
expressed as:
tA = H · tA1 + (1 – H) · t_miss
where t_miss is the time required to handle the miss, called the miss penalty.
Efficiency:
– Let r = tA2 / tA1 denote the access time ratio of the two levels of memory.
– We define the access efficiency as e = tA1 / tA, which is the factor by
which tA differs from its minimum possible value.
Performance of Memory Hierarchy
• We first consider a 2-level hierarchy consisting of two levels of memory, say,
M1 and M2.
Q.1.Consider a system with 2 level cache. Access times of Level 1 cache, Level 2 cache
and main memory are 1 ns, 10 ns, and 500 ns, respectively. The hit rates of Level 1 and
Level 2 caches are 0.8 and 0.9, respectively. What is the average access time of the system
ignoring the search time within the cache?
Ans. T=12.6ns
Access time for hierarchical access
Practice questions
Q.2.Consider a system having three levels of memory, a L1 cache memory, a L2 cache
memory and semiconductor main memory, if the access times of the memories are 10 ns,
50 ns, and 1 us, respectively. The L1 cache hit ratio is 80% and L2 cache hit ratio is 90%.
Compute the average access time for this system.
Solution
Explanation: First, the system will look in cache 1. If it is not found in cache 1, then cache 2
and then further in main memory (if not in cache 2 also).
The average access time would take into consideration success in cache 1, failure in cache
1 but success in cache 2, failure in both the caches and success in main memory.

Where H1 = Hit rate of level-1 cache = 0.8
T1 = Access time for level-1 cache = 10 ns
H2 = Hit rate of level-2 cache = 0.9
T2 = Access time for level-2 cache = 50 ns
Hm = Hit rate of main memory = 1
Tm = Access time for main memory = 1 μs = 1000 ns
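A minimal sketch of the weighted-average formula described above (look in L1; on a miss look in L2; on a further miss go to main memory), applied to Q1 and Q2. Tm for Q2 is taken as 1000 ns per the question statement:

```python
def avg_access_time(h1, t1, h2, t2, tm):
    # Simple weighted formula matching the explanation on the slide
    return h1 * t1 + (1 - h1) * h2 * t2 + (1 - h1) * (1 - h2) * tm

# Q1: T1 = 1 ns, T2 = 10 ns, Tm = 500 ns, H1 = 0.8, H2 = 0.9
print(avg_access_time(0.8, 1, 0.9, 10, 500))    # 12.6 ns

# Q2: T1 = 10 ns, T2 = 50 ns, Tm = 1000 ns, H1 = 0.8, H2 = 0.9
print(avg_access_time(0.8, 10, 0.9, 50, 1000))  # 37.0 ns
```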
Memory Hierarchy design
Introduction
• Programmers want unlimited amount of memory with very low latency.
• Fast memory technology is more expensive per bit than slower memory.
– SRAM is more expensive than DRAM, DRAM is more expensive than disk.
• Possible solution?
– Organize the memory system in several levels, called memory hierarchy.
– Exploit temporal and spatial locality on computer programs.
– Try to keep the commonly accessed segments of program / data in the faster
memories.
– Results in faster access times on the average.
Memory Hierarchy
• The memory system is organized in several levels, using progressively
faster technologies as we move towards the processor.
– The entire addressable memory space is available in the largest (but
slowest) memory (typically, magnetic disk or flash storage).
– We incrementally add smaller (but faster) memories, each containing a
subset of the data stored in the memory below it.
– We proceed in steps towards the processor.
Typical hierarchy (starting with closest to the processor):
1. Processor registers
2. Level-1 cache (typically divided into separate instruction and
data cache)
3. Level-2 cache
4. Level-3 cache
5. Main memory
6. Secondary memory (magnetic disk / flash drive)
• As we move away from the processor:
– Size increases
– Cost decreases
– Speed decreases
Memory Hierarchy
-All of these different types of memory units
are employed effectively in a computer
system.
Memory Hierarchy
-Ideal memory would be fast, large, and inexpensive.
-A very fast memory can be implemented using static RAM(SRAM) chips.
-But, these chips are not suitable for implementing large memories,
because their basic cells are larger and consume more power than dynamic RAM cells.
-A solution is provided by using secondary storage, mainly magnetic disks, to provide the
required memory space.
-Disks are available at a reasonable cost, and they are used extensively in computer
systems.
-However, they are much slower than semiconductor memory units.
-In summary, a very large amount of cost-effective storage can be provided by magnetic
disks, and a large and considerably faster, yet affordable, main memory can be built with
dynamic RAM technology.
-This leaves the more expensive and much faster static RAM technology to be used in
smaller units where speed is of the essence, such as in cache memories.
Memory Hierarchy
Memory Hierarchy
Comparison

Level           Typical Access Time   Typical Capacity   Other Features
Register        300-500 ps            500-1000 B         On-chip
Level-1 cache   1-2 ns                16-64 KB           On-chip
Level-2 cache   5-20 ns               256 KB - 2 MB      On-chip
Level-3 cache   20-50 ns              1-32 MB            On or off chip
Main memory     50-100 ns             1-16 GB
Magnetic disk   5-50 ms               100 GB - 16 TB
Locality of Reference
• Programs tend to reuse data and instructions they have used recently.
– Rule of thumb: 90% of the total execution time of a program is spent in
only 10% of the code (also called 90/10 rule).
– Reason: nested loops in a program, few procedures calling each other
repeatedly, arrays of data items being accessed sequentially, etc.
• Basic idea to exploit this rule:
– Based on a program’s recent past, we can predict with a reasonable
accuracy what instructions and data will be accessed in the near future.
An analogy with books:
Most frequently accessed: desk
Slightly less frequently used: shelf
Rarely used: cabinet
Temporal locality: the same set of books tends to be read again and again before an exam.
Cache Memory
-The memory control circuitry is designed to take advantage of the property of locality of
reference.
-Temporal locality suggests that whenever an information item, instruction or data, is first
needed, this item should be brought into the cache, because it is likely to be needed again
soon.
- Spatial locality suggests that instead of fetching just one item from the main memory to
the cache, it is useful to fetch several items that are located at adjacent addresses as
well.
- Cache block refers to a set of contiguous address locations of some size
-Usually, the cache memory can store a reasonable number of blocks at any given time, but
this number is small compared to the total number of blocks in the main memory.
-The correspondence between the main memory blocks and those in the cache is specified
by a mapping function.
-When the cache is full and a memory word (instruction or data) that is not in the cache is
referenced, the cache control hardware must decide which block should be removed to
create space for the new block that contains the referenced word.
-The collection of rules for making this decision constitutes the cache’s replacement
algorithm.
Locality of reference
The 90/10 rule has two dimensions:
a) Temporal Locality (locality in time)
• If an item is referenced in memory, it will tend to be referenced again soon.
b) Spatial locality (locality in space)
• If an item is referenced in memory, nearby items will tend to be
referenced soon.
(a) Temporal Locality
• Recently executed instructions are likely to be executed again very soon.
(b) Spatial Locality
• Instructions residing close to a recently executing instruction are likely to
be executed soon.
Cache Memory
-The cache is a small and very fast memory, interposed between the processor and the
main memory.
-Its purpose is to make the main memory appear to the processor to be much
faster than it actually is.
-The effectiveness of this approach is based on a property of computer programs called
locality of reference.
-Analysis of programs shows that most of their execution time is spent in routines in which
many instructions are executed repeatedly.
-These instructions may constitute a simple loop, nested loops, or a few procedures that
repeatedly call each other.
-This behaviour manifests itself in two ways: temporal and spatial.
-The first means that a recently executed instruction is likely to be executed again very
soon.
-The spatial aspect means that instructions close to a recently executed instruction are also
likely to be executed soon.
Use of cache memory
Cache Memory
Cache memory is logically divided into blocks or lines, where every block
(line) typically contains 8 to 256 bytes.
• When the CPU wants to access a word in memory, a special hardware first
checks whether it is present in cache memory.
– If so (called cache hit), the word is directly accessed from the cache memory.
– If not, the block containing the requested word is brought from main memory to
cache.
– For writes, sometimes the CPU can also directly write to main memory.
• Objective is to keep the commonly used blocks in the cache memory.
– Will result in significantly improved performance due to the property of locality
of reference.
Elements of Cache Design
• Cache Size: It is the amount of main memory data that cache
can hold.
• Block Size: It is the no. of words(bytes) grouped together into
a single unit called a block.
• Mapping Function:(Block placement phenomenon): Direct,
Associative and Set associative.
• Replacement Algorithms:
• Write Policy:
Elements of Cache Design
• Addresses (logical or physical)
• Size
• Mapping Function (direct, associative, set associative)
• Replacement Algorithm (LRU, LFU, FIFO, random)
• Write Policy (write through, write back, write once)
• Line Size
• Number of Caches (how many levels, unified or split)
Where can a block be placed in the cache?
• This is determined by some mapping algorithms.
– Specifies which main memory blocks can reside in which cache memory blocks.
– At any given time, only a small subset of the main memory blocks can be held in cache
memory.
• Three common block mapping techniques are used:
a) Direct Mapping
b) Associative Mapping
c) (N-way) Set Associative Mapping

Example1: A 2-level memory hierarchy
• Consider a 2-level cache memory / main memory hierarchy.
– The cache memory consists of 256 blocks (lines) of 32 words
each.
• Total cache size is 8192 (8K) words.
– Main memory is addressable by a 24-bit address.
• Total size of the main memory is 2^24 = 16 M words.
• Number of 32-word blocks in main memory = 16 M / 32 = 512K
(a) Direct Mapping
• Each main memory block can be placed in only one block in the cache.
• The mapping function is:
Cache Block = (Main Memory Block) % (Number of cache blocks)
• For the example,
Cache Block = (Main Memory Block) % 256
• Some example mappings:
.
Direct mapping
Block replacement algorithm is trivial, as there is no choice.
• More than one MM block is mapped onto the same cache block.
– May lead to contention even if the cache is not full.
– New block will replace the old block.
– May lead to poor performance if both the blocks are frequently used.
• The MM address is divided into three fields: TAG, BLOCK and WORD.
– When a new block is loaded into the cache, the 8-bit BLOCK field
determines the cache block where it is to be stored.
– The high-order 11 bits are stored in a TAG register associated with the
cache block.
– When accessing a memory word, the corresponding TAG fields are
compared.
• Match implies HIT.
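A sketch of the TAG/BLOCK/WORD split for Example 1 (24-bit word address, 256 cache blocks of 32 words each); the sample address is arbitrary and only serves to illustrate the field extraction:

```python
WORD_BITS  = 5    # 32 words per block
BLOCK_BITS = 8    # 256 cache blocks
TAG_BITS   = 24 - BLOCK_BITS - WORD_BITS    # 11 bits

def split_address(addr):
    word  = addr & ((1 << WORD_BITS) - 1)
    block = (addr >> WORD_BITS) & ((1 << BLOCK_BITS) - 1)
    tag   = addr >> (WORD_BITS + BLOCK_BITS)
    return tag, block, word

addr = 0x12345                         # arbitrary example address
tag, block, word = split_address(addr)
mm_block = addr >> WORD_BITS           # main-memory block number
print(tag, block, word)
print(mm_block % 256 == block)         # the BLOCK field is MM block mod 256 -> True
```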
Mapping Functions
Direct Mapping
Example 2:Consider a cache consisting of 128 blocks of 16 words each, for a total of
2048 (2K) words, and assume that the main memory is addressable by a 16-bit
address. The main memory has 64K words, which we will view as 4K blocks of 16
words each.
Mapping Functions
Direct Mapping
Example 2:Consider a cache consisting of 128 blocks of 16 words each, for a total of
2048 (2K) words, and assume that the main memory is addressable by a 16-bit
address. How address distribution will be done?
Mapping Functions
Direct Mapping
Example 2:Consider a cache consisting of 128 blocks of 16 words each, for a total of
2048 (2K) words, and assume that the main memory is addressable by a 16-bit
address. The main memory has 64K words, which we will view as 4K blocks of 16
words each.
-The simplest way to determine cache locations in which to store memory blocks is
the direct-mapping technique.
- In this technique, block j of the main memory maps onto block j modulo 128 of the cache.
-Thus, whenever one of the main memory blocks 0, 128, 256, . . . is loaded into the cache, it
is stored in cache block 0. Blocks 1, 129, 257, . . . are stored in cache block 1, and so on. --
Since more than one memory block is mapped onto a given cache block position,
contention may arise for that position even when the cache is not full
-The memory address can be divided into three fields, as shown in figure.
Direct mapping
Direct- Mapping Cache Organization
Direct mapping
-The low-order 4 bits select one of 16 words in a block.
-When a new block enters the cache, the 7-bit cache block field determines the cache
position in which this block must be stored. The high-order 5 bits of the memory address of
the block are stored in 5 tag bits associated with its location in the cache.
-The tag bits identify which of the 32 main memory blocks mapped into this cache position is
currently resident in the cache.
-As execution proceeds, the 7-bit cache block field of each address generated by
the processor points to a particular block location in the cache.
-The high-order 5 bits of the address are compared with the tag bits associated with that
cache location.
-If they match, then the desired word is in that block of the cache. If there is no match, then
the block containing the required word must first be read from the main memory and loaded
into the cache.
-The direct-mapping technique is easy to implement, but it is not very flexible.
Direct Mapping pros & cons
• Simple
• Inexpensive
• Fixed location for given block
- If a program accesses 2 blocks that map to the same line repeatedly, cache misses are
very high.
A computer has a main memory of 64k*16, and a cache of 1K
words. The cache uses direct mapping with a block size of 4
words.
(a) how many blocks the cache can accommodate?
(b) how many bits are there in tag, index and word fields of
address format?
(c) how many bits are there in each word/line of cache?
A computer has a main memory of 64k*16, and a cache of 1K
words. The cache uses direct mapping with a block size of 4
words.
a) how many blocks the cache can accommodate?
(b) how many bits are there in tag, index and word fields of
address format?
(c) how many bits are there in each word/line of cache?
(Sol. Main memory has 64K = 64 x 1024 = 2^6 x 2^10 = 2^16 words
Cache memory has 1K = 1024 = 2^10 words
(a) Number of blocks in cache = 1024 / 4 = 256 blocks
(b) Tag = 16 - 8 - 2 = 6 bits
Block (index) = 8 bits
Word = 2 bits
(c) Each word/line of cache contains: Data + tag + valid = 16 + 6 + 1 = 23 bits)
Associative Mapping
-It is more flexible mapping method, in which a main memory block can be placed into any
cache block position.
-The tag bits of an address received from the processor are compared to the tag bits of
each block of the cache to see if the desired block is present.
-This is called the associative-mapping technique. It gives complete freedom in
choosing the cache location in which to place the memory block, resulting in a more
efficient use of the space in the cache.
-When a new block is brought into the cache, it replaces (ejects) an existing block only if the
cache is full.
-In this case, we need an algorithm to select the block to be replaced.
-Many replacement algorithms are possible.
-The complexity of an associative cache is higher than that of a direct-mapped cache,
because of the need to search all 128 tag patterns to determine whether a given block is in
the cache.
-To avoid a long delay, the tags must be searched in parallel. A search of this kind is called
an associative search.
Mapping Functions
Direct Mapping
Example 2:Consider a cache consisting of 128 blocks of 16 words each, for a total of
2048 (2K) words, and assume that the main memory is addressable by a 16-bit
address. How address distribution will be done?
(b) Associative Mapping
• Here, a MM block can potentially reside in any cache block position.
• The memory address is divided into two fields: TAG and WORD.
– When a block is loaded into the cache from MM, the higher order 19 bits of
the address are stored into the TAG register corresponding to the cache
block.
– When accessing memory, the 19-bit TAG field of the address is compared
with all the TAG registers corresponding to all the cache blocks.
• Requires associative memory for storing the TAG values.
– High cost / lack of scalability.
• Because of complete freedom in block positioning, a wide range of
replacement algorithms is possible.
Example1: A 2-level memory hierarchy
• Consider a 2-level cache memory / main memory hierarchy.
– The cache memory consists of 256 blocks (lines) of 32 words
each.
• Total cache size is 8192 (8K) words.
– Main memory is addressable by a 24-bit address.
• Total size of the main memory is 2^24 = 16 M words.
• Number of 32-word blocks in main memory = 16 M / 32 = 512K
Associative mapping
Example 2:Consider a cache consisting of 128 blocks of 16 words each, for a total of
2048 (2K) words, and assume that the main memory is addressable by a 16-bit
address. The main memory has 64K words, which we will view as 4K blocks of 16
words each.
Associative-mapped cache
A main memory address consists of a 22-bit tag and a 2-bit byte number.
c) N-way Set Associative Mapping
• A group of N consecutive blocks in the cache is called a set.
• This algorithm is a balance of direct mapping and associative mapping.
– Like direct mapping, a MM block is mapped to a set.
Set Number = (MM Block Number) % (Number of Sets in Cache)
– The block can be placed anywhere within the set (there are N choices)
• The value of N is a design parameter:
– N = 1 :: same as direct mapping.
– N = number of cache blocks :: same as associative mapping.
– Typical values of N used in practice are: 2, 4 or 8.
Example1: A 2-level memory hierarchy
• Consider a 2-level cache memory / main memory hierarchy.
– The cache memory consists of 256 blocks (lines) of 32 words
each.
• Total cache size is 8192 (8K) words.
– Main memory is addressable by a 24-bit address.
• Total size of the main memory is 2^24 = 16 M words.
• Number of 32-word blocks in main memory = 16 M / 32 = 512K
Illustration for N = 4:
– Number of sets in cache memory = 64.
– Memory blocks are mapped to a set using modulo-64 operation.
– Example: MM blocks 0, 64, 128, etc. all map to set 0, where they can occupy any
of the four available positions.
• MM address is divided into three fields: TAG, SET and WORD.
– The TAG field of the address must be associatively compared to the TAG fields of
the 4 blocks of the selected set.
– Thus, instead of requiring a single large associative memory, we need a number of
very small associative memories, only one of which is used at a time.
Set-Associative Mapping
-Another approach is to use a combination of the direct- and associative-mapping
techniques.
-The blocks of the cache are grouped into sets, and the mapping allows a block of the
main memory to reside in any block of a specific set.
-Hence, the contention problem of the direct method is eased by having a few choices for
block placement.
-At the same time, the hardware cost is reduced by decreasing the size of the associative
search.
-An example of this set-associative-mapping technique is shown in Figure for a cache with
two blocks per set.
-In this case, memory blocks 0, 64, 128, . . . , 4032 map into cache set 0, and they
can occupy either of the two block positions within this set.
-Having 64 sets means that the 6-bit set field of the address determines which set of the
cache might contain the desired block.
-The tag field of the address must then be associatively compared to the tags of the
two blocks of the set to check if the desired block is present. This two-way associative
search is simple to implement.
Example 2:Consider a cache consisting of 128 blocks of 16 words each, for a total of
2048 (2K) words, and assume that the main memory is addressable by a 16-bit
address. The main memory has 64K words, which we will view as 4K blocks of 16
words each.
Consider 2 way set associative mapping.
Set-Associative Mapping
Practice Question
1.Consider a direct mapped cache of size 16 KB with block size 256 bytes. The size of main
memory is 128 KB. Find-
1. Number of bits in tag
2. Tag directory size

Q.2. Consider a fully associative mapped cache with block size 4 KB. The size of main
memory is 16 GB. Find the number of bits in tag.
Numerical
Question 3
A block-set-associative cache memory consists of 128 blocks divided into four-block
sets. The main memory consists of 16,384 blocks and each block contains 256 eight-bit
words.
1. How many bits are required for addressing the main memory?
2. How many bits are needed to represent the TAG, SET and WORD fields?
Solution:
Given-
 Number of blocks in cache memory = 128
 Number of blocks in each set of cache = 4
 Main memory size = 16384 blocks
 Block size = 256 bytes
 1 word = 8 bits = 1 byte
 Main Memory Size-
We have-
 Size of main memory
 = 16384 blocks
 = 16384 x 256 bytes
 = 2^22bytes
 Thus, Number of bits required to address main memory = 22 bits
Number of Bits in Block Offset: block size = 256 bytes = 2^8 bytes, so 8 bits
Number of Bits in Set Number: number of sets = number of lines in cache / set size = 128 / 4 = 32 = 2^5, so 5 bits
Number of Bits in Tag-
Number of bits in tag
= Number of bits in physical address – (Number of bits in set number + Number of
bits in block offset) = 22 bits – (5 bits + 8 bits) = 9 bits
Practice problem 4.
A 4-way set associative cache memory unit with a capacity of 16 KB is built using a
block size of 8 words. The word length is 32 bits. The size of the physical address
space is 4 GB. The number of bits for the TAG field is___
Sol. Given-
 Set size = 4 lines
 Cache memory size = 16 KB
 Block size = 8 words
 1 word = 32 bits = 4 bytes
 Main memory size = 4 GB
1. Number of Bits in Physical Address: main memory size = 4 GB = 2^32 bytes, so 32 bits
2. Number of Bits in Block Offset:
Block size = 8 words = 8 x 4 bytes = 32 bytes = 2^5 bytes, so 5 bits
3. Number of lines in cache = Cache size / Line size = 16 KB / 32 bytes = 512 lines = 2^9
4. Number of sets in cache = Number of lines in cache / Set size = 512 lines / 4 lines = 128 sets = 2^7, so 7 bits
5. Number of bits in tag
= Number of bits in physical address – (Number of bits in set number + Number of bits in
block offset)
= 32 bits – (7 bits + 5 bits)
= 32 bits – 12 bits
= 20 bits
Thus, number of bits in tag = 20 bits
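The field-width arithmetic used in Problems 3 and 4 can be wrapped in a small helper for checking answers (a sketch; the function name and arguments are illustrative):

```python
import math

def set_assoc_fields(addr_bits, cache_bytes, block_bytes, ways):
    offset   = int(math.log2(block_bytes))
    sets     = (cache_bytes // block_bytes) // ways
    set_bits = int(math.log2(sets))
    tag      = addr_bits - set_bits - offset
    return tag, set_bits, offset

# Problem 4: 32-bit physical address, 16 KB cache, 32-byte blocks, 4-way
print(set_assoc_fields(32, 16 * 1024, 32, 4))    # (20, 7, 5)

# Problem 3: 22-bit address, 128 blocks of 256 bytes, 4-way
print(set_assoc_fields(22, 128 * 256, 256, 4))   # (9, 5, 8)
```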
Problem 5. Consider a direct mapped cache with 8 cache blocks (0-7). If the memory block
requests are in the order-
3, 5, 2, 8, 0, 6, 3, 9, 16, 20, 17, 25, 18, 30, 24, 2, 63, 5, 82, 17, 24
Which of the following memory blocks will not be in the cache at the end of the
sequence?
1. 3
2. 18
3. 20
4. 30
Also, calculate the hit ratio and miss ratio.
Sol. We have,
 There are 8 blocks in cache memory numbered from 0 to 7.
 In direct mapping, a particular block of main memory is mapped to a particular
line of cache memory.
 The line number is given by-
Cache line number = Block address % Number of lines in cache
For the given sequence-
 Requests for memory blocks are generated one by one.
 The line number of the block is calculated using the above relation.
 Then, the block is placed in that particular line.
 If already there exists another block in that line, then it is replaced.
Out of the given options, only block 18
is not present in the cache at the end of the sequence.
 Option (B) is correct.
 Hit ratio = 3 / 21
 Miss ratio = 18 / 21
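The sequence can be replayed with a few lines of Python to confirm the final contents and the hit/miss counts (a check only, not part of the original problem):

```python
requests = [3, 5, 2, 8, 0, 6, 3, 9, 16, 20, 17, 25, 18, 30, 24, 2, 63, 5, 82, 17, 24]
NUM_LINES = 8

cache = [None] * NUM_LINES
hits = 0
for block in requests:
    line = block % NUM_LINES          # direct mapping: line = block mod number of lines
    if cache[line] == block:
        hits += 1
    else:
        cache[line] = block           # replace whatever was in that line

misses = len(requests) - hits
print("final contents:", cache)                   # block 18 is not present at the end
print("hit ratio :", hits, "/", len(requests))    # 3 / 21
print("miss ratio:", misses, "/", len(requests))  # 18 / 21
```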
How is a block found if present in cache?
• Caches include a TAG associated with each cache block.
– The TAG of every cache block where the block being requested may be present
needs to be compared with the TAG field of the MM address.
– All the possible tags are compared in parallel, as speed is important.
• Mapping Algorithms?
– Direct mapping requires a single comparison.
– Associative mapping requires a full associative search over all the TAGs
corresponding to all cache blocks.
– Set associative mapping requires a limited associated search over the TAGs of
only the selected set.
Replacement Algorithms
Which block should be replaced on a cache miss?
• With fully associative or set associative mapping, there can be several blocks
to choose from for replacement when a miss occurs.
• Two primary strategies are used:
a) Random: The candidate block is selected randomly for replacement. This simple
strategy tends to spread allocation uniformly.
b) Least Recently Used (LRU): The block replaced is the one that has not been used
for the longest period of time.
• Makes use of a corollary of temporal locality:
“If recently used blocks are likely to be used again, then the best candidate for replacement
is the least recently used block”
LRU
To implement the LRU algorithm, the cache controller must track the LRU
block as the computation proceeds.
• Example: Consider a 4-way set associative cache.
– For tracking the LRU block within a set, we use a 2-bit counter with every block.
– When hit occurs:
• Counter of the referenced block is reset to 0.
• Counters with values originally lower than the referenced one are incremented by 1,
and all others remain unchanged.
– When miss occurs:
• If the set is not full, the counter associated with the new block loaded is set to 0, and
all other counters are incremented by 1.
• If the set is full, the block with counter value 3 is removed, the new block put in its
place, and the counter set to 0. The other three counters are incremented by 1.
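A minimal sketch of the 2-bit-counter LRU scheme described above, for a single 4-way set (counter 0 = most recently used, 3 = least recently used); the class name is illustrative:

```python
class LruSet:
    def __init__(self, ways=4):
        self.blocks  = [None] * ways      # block number stored in each way (None = empty)
        self.counter = [0] * ways         # 2-bit LRU counter per way

    def access(self, block):
        """Return True on a hit, False on a miss (loading the block)."""
        if block in self.blocks:                               # hit
            i = self.blocks.index(block)
            ref = self.counter[i]
            for j in range(len(self.blocks)):
                if self.blocks[j] is not None and self.counter[j] < ref:
                    self.counter[j] += 1                       # only lower counters move up
            self.counter[i] = 0
            return True
        # miss
        if None in self.blocks:                                # set not full: use an empty way
            i = self.blocks.index(None)
        else:                                                  # set full: evict counter == 3
            i = self.counter.index(3)
        for j in range(len(self.blocks)):
            if self.blocks[j] is not None and j != i:
                self.counter[j] += 1                           # all other occupied ways age
        self.blocks[i] = block
        self.counter[i] = 0
        return False
```

With this update rule, the counters of occupied ways always stay distinct and never exceed 3, which is what makes a 2-bit counter per block sufficient.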
Replacement Algorithms
-In a direct-mapped cache, the position of each block is predetermined by its address;
hence, the replacement strategy is trivial.
-In associative and set-associative caches there exists some flexibility. When a new block is
to be brought into the cache and all the positions that it may occupy are full, the cache
controller must decide which of the old blocks to overwrite.
-This is an important issue, because the decision can be a strong determining factor in
system performance.
- The objective is to keep blocks in the cache that are likely to be referenced in the near
future
- The property of locality of reference in programs gives a clue to a reasonable
strategy.
-Because program execution usually stays in localized areas for reasonable periods
of time, there is a high probability that the blocks that have been referenced recently will
be referenced again soon.
- Therefore, when a block is to be overwritten, it is sensible to overwrite the one that has
gone the longest time without being referenced.
-This block is called the least recently used (LRU) block, and the technique is called the
LRU replacement algorithm.
Replacement Algorithms
-- Therefore, when a block is to be overwritten, it is sensible to overwrite the one that has
gone the longest time without being referenced.
-This block is called the least recently used (LRU) block, and the technique is called the
LRU replacement algorithm.
-To use the LRU algorithm, the cache controller must track references to all blocks as
computation proceeds.
-Suppose it is required to track the LRU block of a four-block set in a set-associative cache.
-A 2-bit counter can be used for each block.
- When a hit occurs, the counter of the block that is referenced is set to 0.
- Counters with values originally lower than the referenced one are incremented by one,
and all others remain unchanged.
- When a miss occurs and the set is not full, the counter associated with the new block
loaded from the main memory is set to 0, and the values of all other counters are
increased by one.
-When a miss occurs and the set is full, the block with the counter value 3 is removed, the
new block is put in its place, and its counter is set to 0.
-The other three block counters are incremented by one. It can be easily verified that the
counter values of occupied blocks are always distinct.
Replacement Algorithms-LRU
-- -To use the LRU algorithm, the cache controller must track references to all blocks as
computation proceeds.
-Suppose it is required to track the LRU block of a four-block set in a set-associative cache.
-A 2-bit counter can be used for each block.
- When a hit occurs, the counter of the block that is referenced is set to 0.
- Counters with values originally lower than the referenced one are incremented by one,
and all others remain unchanged.
- When a miss occurs and the set is not full, the counter associated with the new block
loaded from the main memory is set to 0, and the values of all other counters are
increased by one.
-When a miss occurs and the set is full, the block with the counter value 3 is removed, the
new block is put in its place, and its counter is set to 0.
-The other three block counters are incremented by one. It can be easily verified that the
counter values of occupied blocks are always distinct.
Q.6.Consider a 4-way set associative mapping with 16 cache blocks. The memory block
requests are in the order-
0, 255, 1, 4, 3, 8, 133, 159, 216, 129, 63, 8, 48, 32, 73, 92, 155
If LRU replacement policy is used, which cache block will not be present in the cache?
3
8
129
216
Also, calculate the hit ratio and miss ratio.
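Q.6 can be checked by replaying the sequence with the LruSet sketch from the LRU discussion above (16 blocks, 4-way, so 4 sets and a block maps to set = block mod 4):

```python
requests = [0, 255, 1, 4, 3, 8, 133, 159, 216, 129, 63, 8, 48, 32, 73, 92, 155]
NUM_SETS = 4

sets = [LruSet(ways=4) for _ in range(NUM_SETS)]    # LruSet as defined earlier
hits = sum(sets[block % NUM_SETS].access(block) for block in requests)

print("hits:", hits, "of", len(requests))
for i, s in enumerate(sets):
    print("set", i, ":", s.blocks)
# For this sequence, block 216 gets evicted from set 0 under LRU,
# so it is the block no longer present in the cache.
```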
Elements of Cache Design
• Addresses (logical or physical)
• Size
• Mapping Function (direct, associative, set associative)
• Replacement Algorithm (LRU, LFU, FIFO, random)
• Write Policy (write through, write back, write once)
• Line Size
• Number of Caches (how many levels, unified or split)
Other algorithms
Replacement algorithm
First in first out (FIFO)
— replace block that has been in cache longest
— Implemented as circular queue
• Least frequently used
— replace block which has had fewest hits
• Random
— Almost as good as other choices
• LRU is often favoured because of ease of hardware implementation
Cache write strategies
Cache coherence – In a single-CPU system, two copies of the same data (one in the cache and the
other in MM) can result in inconsistent data.
-Contents of the cache and MM can be altered by more than one device, e.g. the CPU may write to
the cache while DMA and I/O devices write to MM.
Cache designs can be classified based on the write and memory update
strategy being used.
1. Write Through / Store Through
2. Write Back / Copy Back
(a) Write Through Strategy
• Information is written to both the cache
block and the main memory block.
• Features:
– Easier to implement
– Read misses do not result in writes to the
lower level (i.e. MM).
– The lower level (i.e. MM) has the most
updated version of the data – important for
I/O operations and multiprocessor systems.
– A write buffer is often used to reduce CPU
write stall time while data is written to main
memory.
-Overhead of accessing both cache and main memory for updating.
(b) Write Back Strategy
• Information is written only to the cache block.
• A modified cache block is written to MM only when it is replaced.
• Features:
– Writes occur at the speed of cache memory.
– Multiple writes to a cache block requires only one write to MM.
– Uses less memory bandwidth, makes it attractive to multi-processors.
• Write-back cache blocks can be clean or dirty.
– A status bit called dirty bit or modified bit is associated with each cache block,
which indicates whether the block was modified in the cache (0: clean, 1: dirty).
– If the status is clean, the block is not written back to MM while being replaced.
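A minimal sketch contrasting the two write policies just described (single-level cache, one entry per address for clarity; this is an illustration, not a full cache model):

```python
class Cache:
    def __init__(self, memory, policy="write-back"):
        self.memory = memory          # backing main memory (a dict: addr -> value)
        self.data = {}                # cached copies
        self.dirty = set()            # addresses modified in cache but not yet in MM
        self.policy = policy

    def write(self, addr, value):
        self.data[addr] = value
        if self.policy == "write-through":
            self.memory[addr] = value       # MM always holds the latest value
        else:                               # write-back
            self.dirty.add(addr)            # mark dirty; MM updated only on eviction

    def evict(self, addr):
        if addr in self.dirty:              # dirty blocks are written back
            self.memory[addr] = self.data[addr]
            self.dirty.discard(addr)
        self.data.pop(addr, None)           # clean blocks are simply dropped
```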
Write Back Strategy
Virtual memory
Almost all modern processors support virtual memory
• Virtual memory allows a program to treat its memory space as a single contiguous block that
may be considerably larger than main memory
• A memory management unit takes care of the mapping between virtual and physical
Addresses.
-A logical (virtual) cache stores virtual addresses rather than physical addresses
• Processor addresses cache directly without going through MMU
• Obvious advantage is that addresses do not have to be translated by the MMU
• A not-so-obvious disadvantage is that all processes have the same virtual address space
— The same virtual address in two processes usually refers to different physical addresses
— So either flush cache with every context switch or add extra bits
Logical and physical cache

A logical cache, also known as a virtual


cache, stores data using virtual
addresses.
-The processor accesses the cache
directly, without going through
the MMU.
-A physical cache stores data using main
memory physical addresses.
A System with Physical Memory Only

 Examples:
 most Cray machines
 early PCs
 nearly all embedded systems

The CPU's load or store addresses are used directly to access memory.
[Figure: the CPU issues physical addresses 0 to N-1 directly to memory]
The Problem
 Physical memory is of limited size (cost)
 What if you need more?
 Should the programmer be concerned about the size of code/
data blocks fitting physical memory?
 Should the programmer manage data movement from disk to
physical memory?
 Should the programmer ensure two processes do not use the
same physical memory?

 Also, ISA can have an address space greater than the


physical memory size
 E.g., a 64-bit address space with byte addressability
 What if you do not have enough physical memory?
Difficulties of Direct Physical Addressing
 Programmer needs to manage the physical memory space
 Inconvenient & hard
 Harder when there are multiple processes
 Difficult to support code and data relocation
 Difficult to support multiple processes
 Protection and isolation between multiple processes
 Sharing of the physical memory space
 Difficult to support data/code sharing across processes
Virtual Memory
 Idea: Give the programmer the illusion of a large address space while having a small physical memory
 So that the programmer does not worry about managing physical memory
 The programmer can assume he/she has an “infinite” amount of physical memory
 Hardware and software cooperatively and automatically manage the physical memory space to provide the illusion
 The illusion is maintained for each independent process
Virtual memory
-In most modern computer systems, the physical main memory is not as large as the
address space of the processor. For example, a processor that issues 32-bit addresses has
an addressable space of 4G bytes.
-The size of the main memory in a typical computer with a 32-bit processor may range from
1G to 4G bytes.
-If a program does not completely fit into the main memory, the parts of it not currently being
executed are stored on a secondary storage device, typically a magnetic disk.
-As these parts are needed for execution, they must first be brought into the main memory,
possibly replacing other parts that are already in the memory.
-These actions are performed automatically by the operating system, using a scheme
known as virtual memory.
-Application programmers need not be aware of the limitations imposed by the available
main memory.
-They prepare programs using the entire address space of the processor
Basic Mechanism
 Indirection (in addressing)
 The address generated by each instruction in a program is a “virtual address”
 i.e., it is not the physical address used to address main memory
 called a “linear address” in x86
 An “address translation” mechanism maps this address to a “physical address”
 called a “real address” in x86
 The address translation mechanism can be implemented in hardware and software together
Virtual memory organization.
-The Memory Management Unit (MMU) keeps track of which parts of the virtual address space are in
the physical memory.
-When the desired data or instructions are in the main memory, the MMU translates the virtual
address into the corresponding physical address.
Paging
Both unequal fixed-size and variable-size partitions are inefficient in the use of memory.
-Suppose, however, that memory is partitioned into equal fixed-size chunks that are relatively small,
and that each process is also divided into small fixed-size chunks of the same size.
-Then the chunks of a program, known as pages, can be assigned to available chunks of memory,
known as frames, or page frames.
Paging
Allocation of free frames
Logical and Physical Addresses
Virtual Memory
-The binary addresses that the processor issues for either instructions or data are called
virtual or logical addresses.
-A logical address is expressed as a location relative to the beginning of the program.
Instructions in the program contain only logical addresses.
-A physical address is an actual location in main memory. When the processor executes
a process, it automatically converts from logical to physical address by adding
the current starting location of the process, called its base address, to each logical
address.
-These addresses are translated into physical addresses by a combination of hardware and
software actions.
- If a virtual address refers to a part of the program or data space that is currently in the
physical memory, then the contents of the appropriate location in the main memory are
accessed immediately.
- Otherwise, the contents of the referenced address must be brought into a suitable
location in the memory before they can be used.
-Assume that all programs and data are composed of fixed-length units called pages, each
of which consists of a block of words that occupy contiguous locations in the main memory.
-Pages commonly range from 2K to 16K bytes in length.
-They constitute the basic unit of information that is transferred between the main memory
and the disk whenever the MMU determines that a transfer is required.
Address Translation
-Each virtual address generated by the processor, whether it is for an instruction fetch or an
operand load/store operation, is interpreted as a virtual page number (high-order bits)
followed by an offset (low-order bits) that specifies the location of a particular byte (or word)
within a page.
-Information about the main memory location of each page is kept in a page table.
-This information includes the main memory address where the page is stored and the
current status of the page.
-An area in the main memory that can hold one page is called a page frame.
-The starting address of the page table is kept in a page table base register.
-By adding the virtual page number to the contents of this register, the address of the
corresponding entry in the page table is obtained.
-The contents of this location give the starting address of the page if that page
currently resides in the main memory.
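A minimal sketch (Python) of how the MMU could locate the page table entry just described; the 4 KB page size and 4-byte entry size are assumptions for illustration, not values from the slides.

```python
PAGE_OFFSET_BITS = 12     # assumed 4 KB pages
PTE_SIZE_BYTES = 4        # assumed 4-byte page table entries

def pte_address(virtual_address, page_table_base_register):
    """Main-memory address of the page table entry for this virtual address."""
    vpn = virtual_address >> PAGE_OFFSET_BITS               # virtual page number (high-order bits)
    return page_table_base_register + vpn * PTE_SIZE_BYTES  # index into the page table

# The entry found at this address gives the starting address of the page frame,
# if the page currently resides in the main memory.
print(hex(pte_address(0x00402ABC, page_table_base_register=0x00100000)))  # -> 0x101008
```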
Address Translation
A System with Virtual Memory (Page based)
 (Figure: the CPU issues virtual addresses 0 to P-1; an OS-managed page table maps each one either to a physical memory address in the range 0 to N-1 or to a location on disk)
 Address Translation: the hardware converts virtual addresses into physical addresses via an OS-managed lookup table (the page table)
Virtual Pages, Physical Frames
 The virtual address space is divided into pages
 The physical address space is divided into frames
 A virtual page is mapped to
 a physical frame, if the page is in physical memory
 a location on disk, otherwise
 If an accessed virtual page is not in memory but on disk
 the virtual memory system brings the page into a physical frame and adjusts the mapping; this is called demand paging (see the sketch below)
 The page table is the table that stores the mapping of virtual pages to physical frames
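A minimal sketch (Python; the page table, free-frame list, and "disk" are hypothetical dictionaries) of the demand-paging step described above: an access to a page that is only on disk brings it into a free physical frame and adjusts the mapping.

```python
page_table = {0: 7}            # virtual page -> physical frame (page 0 is resident)
disk = {1: "page-1 contents"}  # virtual pages currently only on disk
memory_frames = {7: "page-0 contents"}
free_frames = [3, 5]

def access(vpn):
    if vpn not in page_table:              # page fault: the page is on disk
        frame = free_frames.pop()          # (a real OS may first have to evict a page)
        memory_frames[frame] = disk[vpn]   # bring the page into the frame
        page_table[vpn] = frame            # adjust the mapping
    return memory_frames[page_table[vpn]]

print(access(0))   # hit: already in memory
print(access(1))   # demand-paged in from disk
```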
Address Translation (III)
 Parameters
 P = 2^p = page size (bytes)
 N = 2^n = virtual-address limit
 M = 2^m = physical-address limit
 Virtual address: bits n-1 ... p hold the virtual page number, bits p-1 ... 0 hold the page offset
 Address translation replaces the virtual page number with a physical frame number
 Physical address: bits m-1 ... p hold the physical frame number, bits p-1 ... 0 hold the page offset
 Page offset bits do not change as a result of translation
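A minimal sketch (Python) of the bit fields above, using an assumed p = 12 (4 KB pages) and a hypothetical frame number: the low p bits are the page offset and pass through translation unchanged.

```python
p = 12                       # page size P = 2^p = 4 KB (assumed)

def split(virtual_address):
    offset = virtual_address & ((1 << p) - 1)   # low p bits: page offset
    vpn = virtual_address >> p                  # remaining high bits: virtual page number
    return vpn, offset

def combine(pfn, offset):
    return (pfn << p) | offset                  # offset bits are copied unchanged

vpn, offset = split(0x12345678)
print(hex(vpn), hex(offset))                    # -> 0x12345 0x678
print(hex(combine(0x00ABC, offset)))            # -> 0xabc678 (hypothetical frame 0xABC)
```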
Address Translation (IV)
 Separate (set of) page table(s) per process
 The VPN forms the index into the page table (points to a page table entry)
 A Page Table Entry (PTE) provides information about the page
 (Figure: a per-process page table base register points to the page table; the VPN of the virtual address acts as the table index; the selected PTE holds valid and access bits plus the physical frame number (PFN); if valid = 0 the page is not in memory and a page fault occurs; otherwise the PFN is combined with the page offset to form the physical address)
Address translation in hardware
• The most significant bits of the VA (virtual address) give the VPN (virtual page number)
• The page table maps the VPN to a PFN (physical frame number)
• The PA is obtained from the PFN and the offset within the page
• The MMU stores the (physical) address of the start of the page table, not all its entries
• It “walks” the page table to get the relevant PTE
TLB
-In principle, then, every virtual memory reference can cause two physical memory
accesses: one to fetch the appropriate page table entry, and one to fetch the
desired data.
-Thus, a straightforward virtual memory scheme would have the effect
of doubling the memory access time.
-To overcome this problem, most virtual memory schemes make use of a special cache for
page table entries, usually called a translation lookaside buffer (TLB).
- This cache functions in the same way as a memory cache and contains those page table
entries that have been most recently used.
What happens on memory access?
• The CPU requests code or data at a virtual address
• The MMU must translate the VA to a PA
– First, access memory to read the page table entry
– Translate the VA to a PA
– Then, access memory to fetch the code/data
• Paging adds overhead to memory access
• Solution? A cache for VA-PA mappings
Translation Lookaside Buffer (TLB)
• A cache of recent VA (virtual address) to PA (physical address) mappings
• To translate a VA to a PA, the MMU first looks up the TLB
• On a TLB hit, the PA can be used directly
• On a TLB miss, the MMU performs additional memory accesses to “walk” the page table
• TLB misses are expensive (multiple memory accesses) – locality of reference helps to keep the hit rate high
• TLB entries may become invalid on a context switch and change of page tables
Operation of Paging and Translation Lookaside Buffer (TLB)
Translation Lookaside Buffer
-The page table information is used by the MMU for every read and write access.
-Ideally, the page table should be situated within the MMU.
-Unfortunately, the page table may be rather large.
-Since the MMU is normally implemented as part of the processor chip, it is impossible to
include the complete table within the MMU.
-Instead, a copy of only a small portion of the table is accommodated within the MMU, and
the complete table is kept in the main memory.
-The portion maintained within the MMU consists of the entries corresponding to the most
recently accessed pages.
-They are stored in a small table, usually called the Translation Lookaside Buffer (TLB).
-The TLB functions as a cache for the page table in the main memory.
-Each entry in the TLB includes a copy of the information in the corresponding entry in the
page table.
-In addition, it includes the virtual address of the page, which is needed to search the TLB
for a particular page.
-Address translation proceeds as follows. Given a virtual address, the MMU looks in
the TLB for the referenced page.
- If the page table entry for this page is found in the TLB, the physical address is obtained
immediately. If there is a miss in the TLB, then the required entry is obtained from the page
table in the main memory and the TLB is updated.
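A minimal sketch (Python; the TLB and page table are plain dictionaries, a large simplification of the real hardware) of the lookup order just described: check the TLB first, and on a miss fetch the required entry from the page table in main memory and update the TLB.

```python
tlb = {}                               # virtual page number -> physical frame number
page_table = {0x12345: 0x00ABC}        # full mapping, kept in main memory (hypothetical)
PAGE_OFFSET_BITS = 12                  # assumed 4 KB pages

def translate(virtual_address):
    vpn = virtual_address >> PAGE_OFFSET_BITS
    offset = virtual_address & ((1 << PAGE_OFFSET_BITS) - 1)
    if vpn in tlb:                     # TLB hit: no page-table access needed
        pfn = tlb[vpn]
    else:                              # TLB miss: read the page table, then update the TLB
        pfn = page_table[vpn]          # (a real MMU raises a page fault if the page is absent)
        tlb[vpn] = pfn
    return (pfn << PAGE_OFFSET_BITS) | offset

print(hex(translate(0x12345678)))      # miss: entry fetched and filled into the TLB
print(hex(translate(0x12345ABC)))      # hit: same page, different offset
```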
Single level Page table
Size of a single-level page table (e.g., for a 32-bit virtual address space with 4 KB pages):
-Size of an entry = 20 bits = 2.5 bytes
-Number of entries = 2^20 = 1 million
-Total = 2^20 x 2.5 B = 2.5 MB
-For 200 processes (running instances of programs),
*we would spend 500 MB on saving page tables (not acceptable)
Insight: Most of the virtual address space is empty
-Most programs do not require that much memory
-They may require only about 100 MB to 200 MB most of the time
Single level page table
Two level page table
We have two levels of page tables
-Primary and secondary page tables
-Not all the entries of the primary page table point to valid secondary page tables.
-Each secondary page table = 1024 entries x 2.5 B = 2.5 KB, and maps 4 MB of virtual memory
(1024 pages x 4 KB per page).
Insight: Allocate only as many secondary page tables as required.
-We do not need many secondary page tables, due to spatial locality in programs.
-Example: If a program uses 100 MB of virtual memory, it needs 100 MB / 4 MB = 25
secondary page tables, so we need a total of 25 x 2.5 KB = 62.5 KB of space for
saving secondary page tables (see the sketch below).
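A minimal sketch (Python) of the space comparison above; it assumes a 32-bit virtual address space, 4 KB pages, 2.5-byte entries, and 1024-entry tables at each level (the primary table itself then also costs 2.5 KB).

```python
ENTRY_BYTES = 2.5
ENTRIES_PER_TABLE = 1024                       # 10 bits of the VPN per level (assumed)
PAGE_BYTES = 4 * 1024
VIRTUAL_PAGES = 2**32 // PAGE_BYTES            # 2^20 pages in a 32-bit address space

# Single-level table: one entry for every virtual page
single_level_bytes = VIRTUAL_PAGES * ENTRY_BYTES            # 2.5 MB

# Two-level: one primary table plus one secondary table per 4 MB actually used
used_virtual_memory = 100 * 1024 * 1024                     # a program using 100 MB
secondary_tables = used_virtual_memory // (ENTRIES_PER_TABLE * PAGE_BYTES)   # 25
two_level_bytes = (1 + secondary_tables) * ENTRIES_PER_TABLE * ENTRY_BYTES

print(single_level_bytes / 2**20, "MB")   # -> 2.5 MB
print(two_level_bytes / 1024, "KB")       # -> 65.0 KB (62.5 KB secondary + 2.5 KB primary)
```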
Numericals
Q.1. Consider a single-level paging scheme with a TLB. Assume no page fault occurs. It
takes 20 ns to search the TLB and 100 ns to access the physical memory. If the TLB hit ratio is
80%, the effective memory access time is _______ ns.

Q.2. Consider a two-level paging scheme with a TLB. Assume no page fault occurs. It takes
20 ns to search the TLB and 100 ns to access the physical memory. If the TLB hit ratio is 80%,
the effective memory access time is _______ ns.
Sol.1. Given-
Number of levels of page table = 1
TLB access time = 20 ns
Main memory access time = 100 ns
TLB hit ratio = 80% = 0.8
TLB miss ratio = 1 – TLB hit ratio = 1 – 0.8 = 0.2
Calculating Effective Access Time-
Effective Access Time
= Hit ratio x { TLB time + memory time } + Miss ratio x { TLB time + (levels + 1) x memory time }
= 0.8 x { 20 ns + 100 ns } + 0.2 x { 20 ns + (1+1) x 100 ns }
= 0.8 x 120 ns + 0.2 x 220 ns
= 96 ns + 44 ns
= 140 ns
Thus, the effective memory access time = 140 ns.
Sol.2. Given-
Number of levels of page table = 2
TLB access time = 20 ns
Main memory access time = 100 ns
TLB hit ratio = 80% = 0.8
TLB miss ratio = 1 – TLB hit ratio = 1 – 0.8 = 0.2
Calculating Effective Access Time-
Effective Access Time
= 0.8 x { 20 ns + 100 ns } + 0.2 x { 20 ns + (2+1) x 100 ns }
= 0.8 x 120 ns + 0.2 x 320 ns
= 96 ns + 64 ns
= 160 ns
Thus, the effective memory access time = 160 ns.
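The two solutions use the same formula; a minimal sketch (Python) of that calculation, where `levels` is the number of page table levels that must be read on a TLB miss.

```python
def effective_access_time(tlb_ns, mem_ns, hit_ratio, levels):
    """EAT = hit x (TLB + mem) + miss x (TLB + (levels + 1) x mem)."""
    miss_ratio = 1 - hit_ratio
    hit_time = tlb_ns + mem_ns                      # translation found in the TLB
    miss_time = tlb_ns + (levels + 1) * mem_ns      # walk the page table, then fetch the data
    return hit_ratio * hit_time + miss_ratio * miss_time

print(effective_access_time(20, 100, 0.8, levels=1))   # -> 140.0 ns (Q.1)
print(effective_access_time(20, 100, 0.8, levels=2))   # -> 160.0 ns (Q.2)
```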
Paging
Q.1. Consider a machine with 64 MB physical memory and a 32-bit virtual
address space. If the page size is 4 KB, what is the approximate size of
the page table?
Find:
-No. of pages
-No. of frames
-No. of entries in the page table
-Size of the page table (see the sketch below)
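A minimal sketch (Python) of how these quantities can be worked out; it assumes each page table entry holds only a frame number, rounded up to whole bytes, so the result is approximate.

```python
import math

virtual_bits = 32
physical_bytes = 64 * 1024 * 1024            # 64 MB physical memory
page_bytes = 4 * 1024                        # 4 KB pages

num_pages = 2**virtual_bits // page_bytes    # 2^20 virtual pages
num_frames = physical_bytes // page_bytes    # 2^14 physical frames
entries = num_pages                          # one entry per virtual page

frame_bits = int(math.log2(num_frames))      # 14 bits needed to name a frame
entry_bytes = math.ceil(frame_bits / 8)      # assumed: entry ~ frame number only -> 2 bytes
table_bytes = entries * entry_bytes

print(num_pages, num_frames, entries)        # -> 1048576 16384 1048576
print(table_bytes / 2**20, "MB")             # -> ~2.0 MB page table
```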
Memory Interleaving
-Interleaved memory implements the concept of accessing more than one word in a single memory
access cycle.
-Memory is partitioned into N separate modules, so N accesses can be performed simultaneously.
-Dividing main memory into multiple modules compensates for the relatively slow speed of DRAM.
-It increases bandwidth by allowing simultaneous access to more than one chunk of memory, and
improves performance: the processor can transfer more data to and from memory in the same time.
-Common configurations are 2-way interleaving and 4-way interleaving.
-In 2-way interleaving, 2 memory modules can be accessed at the same time for read and write
operations.
-In 4-way interleaving, 4 memory modules can be accessed at the same time (see the sketch below).
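A minimal sketch (Python) of low-order interleaving, one common way of spreading addresses across modules: the module number is the address modulo the number of modules, so consecutive words land in different modules and can be accessed at the same time.

```python
def module_of(address, num_modules):
    """Low-order interleaving: consecutive word addresses hit different modules."""
    return address % num_modules, address // num_modules   # (module, location within module)

# 4-way interleaving: words 0..7 are spread round-robin over modules 0..3,
# so a burst of consecutive accesses can proceed in parallel.
for addr in range(8):
    module, offset = module_of(addr, 4)
    print(f"word {addr} -> module {module}, location {offset}")
```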
Memory Interleaving