Memory Organization and Architecture
Unit 3: Memory Organization
Prepared by:
Prof. Khushbu Chauhan
Computer Engg. Dept.
MPSTME, NMIMS
Outline
• Introduction: internal memory, memory characteristics and the memory hierarchy
• Cache memory: elements of cache design
• Address mapping and translation: direct mapping
• Address mapping and translation: associative mapping
• Address mapping and translation: set-associative mapping
• Performance characteristics of two-level memory
• Semiconductor main memory: types of RAM, DRAM and SRAM
• Chip logic
• Memory module organization
• High-speed memories: associative memory
• High-speed memories: interleaved memory
Introduction: Computer Memory
• Unit of transfer: the number of bits read out of or written into memory at a time
• Internal
– Usually governed by data bus width
• External
– Usually a block which is much larger than a word
• Addressable unit
– Smallest location which can be uniquely addressed
– Word internally
– Cluster on external disks
Access Methods
• Sequential
– Start at the beginning and read through in order
– Access time depends on location of data and previous location
– e.g. tape
• Direct
– Individual blocks have unique address
– Access is by jumping to vicinity plus sequential search
– Access time depends on location and previous location
– e.g. disk
• Random
– Individual addresses identify locations exactly
– Access time is independent of location or previous access
– e.g. RAM
• Associative
– Data is located by a comparison with contents of a portion of the
store
– Access time is independent of location or previous access
– e.g. cache
Performance
• The performance of the memory system is determined using three parameters:
I. Access time: In case of random access memory, it is the time taken by the memory to
complete a read/write operation from the instant that an address is sent to the memory.
On the other hand, for nonrandom access memory, access time is the time it takes
to position the read-write mechanism at the desired location.
II. Memory cycle time: This term is used only with random access memory and it is
defined as access time plus additional time required before a second access can commence.
III. Transfer rate: It is defined as the rate at which data can be transferred into or out
of a memory unit (a worked example follows).
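• A worked sketch (not from the slides; the timing figures and hit ratio are assumed for illustration) of how these parameters combine in a two-level memory:

/* Sketch: average access time of a two-level (cache + main) memory.
   t_cache, t_main and the hit ratio h are illustrative assumptions. */
#include <stdio.h>

int main(void) {
    double t_cache = 10.0;   /* cache access time, ns (assumed)       */
    double t_main  = 100.0;  /* main memory access time, ns (assumed) */
    double h       = 0.95;   /* hit ratio (assumed)                   */

    /* On a hit only the cache is accessed; on a miss the cache is
       checked and then main memory is accessed as well. */
    double t_avg = h * t_cache + (1.0 - h) * (t_cache + t_main);
    printf("average access time = %.1f ns\n", t_avg);   /* 15.0 ns */
    return 0;
}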
Physical Types
• Semiconductor
– RAM
• Magnetic
– Disk & Tape
• Optical
– CD & DVD
• Others
– Bubble
– Hologram
Physical Characteristics
• Two most common physical types used today are semiconductor memory and
magnetic surface memory.
• Physical characteristics:
• Very fast memory system can be achieved if SRAM chips are used.
• These chips are expensive and hence it is impracticable to build a large main
memory using SRAM chips.
• The only alternative is to use DRAM chips for large main memories.
• Processor fetches the code and data from the main memory to execute the
program.
• The DRAMs which form the main memory are slower devices. Fortunately, because of
locality of reference, a program tends to work with only small sections of code and data at a particular time.
• In the memory system small section of SRAM is added along with main memory,
referred to as cache memory.
• The program (code) and data that work at a particular time is usually accessed from
the cache memory.
• This is accomplished by loading the active part of the code and data from main memory
into the cache memory.
• The cache controller looks after the swapping between main memory and cache
memory with the help of DMA controller.
• The cache memory just discussed is called secondary cache.
• Recent processors have the built-in cache memory called primary cache.
• DRAMs along with cache allow main memories in the range of tens of
megabytes to be implemented at a reasonable cost, size and better speed
performance.
• But the size of memory is still small compared to the demands of large
programs with voluminous data.
Cache Memory
• Cache memory is intended to combine the fast access time of expensive, high-speed
memory with the large memory size of less expensive, lower-speed memory.
• The cache contains a copy of portions of main memory. When the processor attempts to read a
word of memory, a check is made to determine if the word is in the cache. If so, the word is
delivered to the processor. If not, a block of main memory, consisting of some fixed number of
words, is read into the cache and then the word is delivered to the processor. Because of the
phenomenon of locality of reference, when a block of data is fetched into the cache to satisfy a
single memory reference, it is likely that there will be future references to that same memory
location or to other words in the block.
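• The read flow just described can be sketched in C. This is a minimal toy model, not the slides' design: the cache here is direct-mapped and all sizes are assumed for illustration.

/* Toy model of the cache read flow: check the cache; on a hit deliver
   the word, on a miss fetch the whole block first. Sizes are assumed. */
#include <stdio.h>
#include <string.h>

#define LINES 4                     /* cache lines (assumed)     */
#define BLOCK 4                     /* words per block (assumed) */
#define MEM_WORDS 64                /* toy main memory size      */

static int memory[MEM_WORDS];
static int cache[LINES][BLOCK];
static int tag[LINES], valid[LINES];

int read_word(int addr) {
    int word  = addr % BLOCK;       /* word within the block */
    int block = addr / BLOCK;       /* main memory block     */
    int line  = block % LINES;      /* cache line            */
    int t     = block / LINES;      /* tag                   */
    if (!(valid[line] && tag[line] == t)) {          /* miss  */
        memcpy(cache[line], &memory[block * BLOCK],  /* fetch the whole block */
               BLOCK * sizeof(int));
        tag[line] = t;
        valid[line] = 1;
    }
    return cache[line][word];       /* deliver the word to the processor */
}

int main(void) {
    for (int i = 0; i < MEM_WORDS; i++) memory[i] = i * 10;
    /* 17 misses and loads its block; 18 then hits in the same block. */
    printf("%d %d\n", read_word(17), read_word(18));  /* 170 180 */
    return 0;
}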
• The larger the cache, the larger the number of gates involved in
addressing the cache. The result is that large caches tend to be slightly
slower than small ones—even when built with the same integrated
circuit technology and put in the same place on chip and circuit board.
• The available chip and board area also limits cache size. Because the
performance of the cache is very sensitive to the nature of the
workload, it is impossible to arrive at a single “optimum” cache size.
Cache Sizes of Some Processors
Mapping Function
• Because there are fewer cache lines than main memory blocks, an algorithm is needed for mapping main
memory blocks into cache lines. Further, a means is needed for determining which main memory block
currently occupies a cache line. The choice of the mapping function dictates how the cache is organized.
• Example: a cache of 64 KBytes
• 24-bit address
– (2^24 = 16M bytes of addressable memory; the derived quantities are checked in the sketch below)
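• A quick check of these numbers (a sketch; the 4-byte block size is taken from the direct-mapping example that follows):

/* Derived quantities for the 64-KByte cache / 24-bit address example. */
#include <stdio.h>

int main(void) {
    long mem_bytes   = 1L << 24;   /* 2^24 = 16M addressable bytes */
    long cache_bytes = 64L * 1024; /* 64-KByte cache               */
    long block_bytes = 4;          /* 4-byte blocks/lines          */

    printf("main memory blocks: %ld\n", mem_bytes / block_bytes);   /* 4M  */
    printf("cache lines:        %ld\n", cache_bytes / block_bytes); /* 16K */
    printf("blocks per line:    %ld\n", mem_bytes / cache_bytes);   /* 256 */
    return 0;
}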
Direct Mapping
• Each main memory address can be viewed as s+w bits: the least significant w bits identify a
word within a block, and the s most significant bits are split into a cache line field of r bits
and a tag of s-r bits (the most significant portion)
Direct Mapping
• Step-01
• Every multiplexer reads the line number from the generated physical address through its
select lines, in parallel.
• To read a line number of L bits, every multiplexer must have L select lines.
• Step-02
• After reading the line number, each multiplexer selects the corresponding line in the cache
memory through its parallel input lines.
• The total number of input lines of every multiplexer = the total number of lines present in the
cache memory.
• Step-03
• Each multiplexer then selects the tag bit of that line and passes it to the comparator on its
output line.
• The total number of output lines of each multiplexer = 1.
Direct Mapping Address Structure
• 24-bit address
• 2-bit word identifier (4-byte block)
• 22-bit block identifier
– 8-bit tag (= 22-14)
– 14-bit slot or line
• No two blocks that map into the same line have the same tag field
• Check contents of cache by finding the line and checking the tag (a sketch of the field split follows)
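• A minimal sketch of the 8/14/2 field split in C (the sample address is arbitrary):

/* Splitting a 24-bit address into tag (8), line (14) and word (2)
   fields, matching the direct-mapping address structure above. */
#include <stdio.h>

int main(void) {
    unsigned addr = 0xFF0004;              /* arbitrary 24-bit address */
    unsigned word = addr & 0x3;            /* bits 1..0   */
    unsigned line = (addr >> 2) & 0x3FFF;  /* bits 15..2  */
    unsigned tag  = addr >> 16;            /* bits 23..16 */
    printf("tag=%02X line=%04X word=%X\n", tag, line, word);
    /* 0xFF0004 -> tag FF, line 0001, word 0: consistent with the example
       where addresses 000000, 010000, ..., FF0000 carry tags 00..FF. */
    return 0;
}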
Direct Mapping from Main Memory to Cache
Direct Mapping Cache Line Table
Direct Mapping Cache Organization
Direct Mapping Example
Note that no two blocks that map into the same line
number have the same tag number. Thus, blocks with
starting addresses 000000, 010000, …, FF0000 have
tag numbers 00, 01, …, FF, respectively
Direct Mapping pros & cons
• Simple
• Inexpensive
• Fixed location for given block
– If a program accesses 2 blocks that map to the same line repeatedly,
cache misses are very high
Associative Mapping
• A main memory block can load into any line of the cache
• The memory address is interpreted as a tag and a word field (see the sketch below)
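• A minimal sketch of a fully associative lookup, in which the tag must be compared against every line (in hardware this happens simultaneously; the sizes and values here are illustrative assumptions):

/* Fully associative lookup: the 22-bit tag (24-bit address, 2-bit word
   field) is compared against every cache line. Sizes are assumed. */
#include <stdio.h>

#define LINES 16384

static unsigned tags[LINES];
static int valid[LINES];

int assoc_lookup(unsigned addr) {      /* returns line number, -1 on miss */
    unsigned tag = addr >> 2;          /* drop the 2-bit word field */
    for (int i = 0; i < LINES; i++)    /* hardware checks all lines at once */
        if (valid[i] && tags[i] == tag)
            return i;
    return -1;
}

int main(void) {
    tags[5] = 0x058CE6;                /* pretend this block is cached */
    valid[5] = 1;
    printf("%d\n", assoc_lookup(0x163398));  /* 0x163398 >> 2 = 0x058CE6 -> 5 */
    return 0;
}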
Replacement Algorithms
• Once the cache has been filled, when a new block is brought into the cache,
one of the existing blocks must be replaced. For direct mapping, there is
only one possible line for any particular block, and no choice is possible. For
the associative and set-associative techniques, a replacement algorithm is
needed.
• To achieve high speed, such an algorithm must be implemented in
hardware. A number of algorithms have been tried. We mention four of the
most common. Probably the most effective is:
• Least recently used (LRU): Replace that block in the set that has been in the cache longest with no
reference to it. For two-way set associative, this is easily implemented. Each line includes a USE
bit. When a line is referenced, its USE bit is set to 1 and the USE bit of the other line in that set is
set to 0.
• When a block is to be read into the set, the line whose USE bit is 0 is used. Because we are
assuming that more recently used memory locations are more likely to be referenced, LRU should
give the best hit ratio. LRU is also relatively easy to implement for a fully associative cache. The
cache mechanism maintains a separate list of indexes to all the lines in the cache. When a line is
referenced, it moves to the front of the list. For replacement, the line at the back of the list is
used. Because of its simplicity of implementation, LRU is the most popular replacement algorithm.
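• The USE-bit scheme for a two-way set can be sketched in a few lines (a minimal model of one set only):

/* LRU for one two-way set-associative set via USE bits, as described
   above: a reference sets the line's USE bit and clears the other's;
   the victim is the line whose USE bit is 0. */
#include <stdio.h>

static int use_bit[2];           /* one USE bit per line in the set */

void reference(int line) {
    use_bit[line] = 1;
    use_bit[1 - line] = 0;
}

int victim(void) {               /* line to replace on a miss */
    return use_bit[0] == 0 ? 0 : 1;
}

int main(void) {
    reference(0);
    reference(1);                          /* line 1 most recently used  */
    printf("replace line %d\n", victim()); /* -> 0 (least recently used) */
    return 0;
}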
• First-in-first-out (FIFO): Replace that block in the set that has been in
the cache longest. FIFO is easily implemented as a round-robin or
circular buffer technique. Still another possibility is least frequently used
(LFU): Replace that block in the set that has experienced the fewest
references. LFU could be implemented by associating a counter with
each line. A technique not based on usage (i.e., not LRU, LFU, FIFO, or
some variant) is to pick a line at random from among the candidate
lines. Simulation studies have shown that random replacement provides
only slightly inferior performance to an algorithm based on usage.
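• FIFO's round-robin (circular buffer) implementation and random selection can both be sketched briefly (the set size of 4 is an assumption):

/* FIFO as a circular pointer per set, plus random replacement. */
#include <stdio.h>
#include <stdlib.h>

#define WAYS 4                       /* lines per set (assumed) */

int fifo_victim(void) {
    static int next = 0;             /* round-robin pointer */
    int v = next;
    next = (next + 1) % WAYS;        /* advance circularly  */
    return v;
}

int random_victim(void) {
    return rand() % WAYS;            /* any candidate line  */
}

int main(void) {
    printf("%d %d %d\n", fifo_victim(), fifo_victim(), fifo_victim()); /* 0 1 2 */
    printf("%d\n", random_victim());
    return 0;
}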
Write Policy
• When a block that is resident in the cache is to be replaced, there are two
cases to consider. If the old block in the cache has not been altered, then it
may be overwritten with a new block without first writing out the old block.
• If at least one write operation has been performed on a word in that line of
the cache, then main memory must be updated by writing the line of cache
out to the block of memory before bringing in the new block. A variety of
write policies, with performance and economic trade-offs, is possible. There
are two problems to contend with. First, more than one device may have access to main
memory. Second, in a multiprocessor each processor has its own local cache, so a word
altered in one cache could invalidate the copies held in the other caches.
Write through
• Using this technique, all write operations are made to main memory as well as to the cache, ensuring
that main memory is always valid. Any other processor-cache module can monitor traffic to main
memory to maintain consistency within its own cache. The main disadvantage of this technique is
that it generates substantial memory traffic and may create a bottleneck.
• Multiple CPUs can monitor main memory traffic to keep local (to CPU) cache up to date
• Lots of traffic (a sketch contrasting write-through with write-back follows)
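• A minimal sketch contrasting write-through with the write-back alternative described above (one cache line and its memory block, modelled as plain arrays):

/* Write-through: every write also updates main memory.
   Write-back: only the cache is updated and the line is marked dirty;
   memory is updated when the line is replaced. */
#include <stdio.h>

static int main_memory[4];       /* the memory block behind the line */
static int line[4];              /* one cache line (4 words)         */
static int dirty;                /* set when the line is modified    */

void write_through(int word, int value) {
    line[word] = value;
    main_memory[word] = value;   /* memory is always valid */
}

void write_back(int word, int value) {
    line[word] = value;          /* only the cache is updated... */
    dirty = 1;                   /* ...and the line marked dirty */
}

void evict(void) {               /* on replacement */
    if (dirty) {                 /* write the line out only if modified */
        for (int i = 0; i < 4; i++) main_memory[i] = line[i];
        dirty = 0;
    }
}

int main(void) {
    write_back(2, 99);
    printf("before evict: mem[2]=%d\n", main_memory[2]); /* 0: memory stale */
    evict();
    printf("after evict:  mem[2]=%d\n", main_memory[2]); /* 99 */
    return 0;
}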
Line Size
• Larger blocks
– Reduce the number of blocks that fit in the cache
– Data may be overwritten shortly after being fetched
– Each additional word is less local, so less likely to be needed
Multilevel Caches
• On-chip (L1) caches are now commonly backed by L2 and, in turn, L3 caches
• Reached by bus access, or now on chip…
Virtual Memory
• In most modern computers, the physical main memory is not as large as the address space spanned by an
address issued by the processor.
• Here, the virtual memory technique is used to extend the apparent size of the physical memory.
• It uses secondary storage such as disks, to extend the apparent size of the physical memory.
• When a program does not completely fit into the main memory, it is divided into segments.
• The segments which are currently being executed are kept in the main memory and remaining segments are
stored in the secondary storage devices, such as a magnetic disk.
• If an executing program needs a segment which is not currently in the main memory, the required
segment is copied from the secondary storage device.
• When new segment of a program is to be copied into a main memory, it must replace another segment
already in the memory.
• In modern computers, the operating system moves program and data automatically between the main memory and
secondary storage.
• Techniques that automatically swap program and data blocks between main memory and secondary storage devices are
called virtual-memory techniques.
• The address that processor issues to access either instruction or data are called virtual or logical address.
• If a virtual address refers to a part of the program or data space that is currently in the main memory, then the contents of
the appropriate location in the main memory are accessed immediately.
• On the other hand, if the referenced address is not in the main memory, its contents must be brought into a suitable location in
the main memory before they can be used.
• The addresses a program may use to refer to memory are distinguished from the addresses the memory system uses to identify physical storage
sites, and program-generated addresses are translated automatically to the corresponding machine addresses.
• The size of virtual storage is limited by the addressing scheme of the computer system and by the amount of secondary memory available, not by the
actual number of main memory locations.
• It is a technique that is implemented using both hardware and software. It maps memory addresses used by a program, called virtual addresses, into
physical addresses in computer memory.
• All memory references within a process are logical addresses that are dynamically translated into physical addresses at run time. This means that a
process can be swapped in and out of the main memory such that it occupies different places in the main memory at different times during the course of
execution.
• A process may be broken into a number of pieces and these pieces need not be contiguously located in the main memory during execution. The
combination of dynamic run-time address translation and the use of a page or segment table permits this.
• If these characteristics are present, then it is not necessary that all the pages or segments be present in the main memory during execution. This means
that the required pages are loaded into memory whenever required.
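• The dynamic virtual-to-physical translation described above can be sketched with a simple page table (the page size, table contents and fault handling here are illustrative assumptions):

/* Virtual-to-physical address translation through a page table. */
#include <stdio.h>

#define PAGE_SIZE 4096           /* 4-KByte pages (assumed) */
#define NUM_PAGES 8

/* frame_of[p] = physical frame holding virtual page p; -1 = not resident */
static int frame_of[NUM_PAGES] = {5, 2, -1, 7, -1, 0, 1, 3};

long translate(long vaddr) {     /* returns -1 on a page fault */
    long page   = vaddr / PAGE_SIZE;   /* virtual page number  */
    long offset = vaddr % PAGE_SIZE;   /* offset within page   */
    if (frame_of[page] < 0)
        return -1;               /* page fault: bring the page in from disk */
    return (long)frame_of[page] * PAGE_SIZE + offset;
}

int main(void) {
    printf("%ld\n", translate(4100)); /* page 1, offset 4 -> frame 2 -> 8196 */
    printf("%ld\n", translate(8200)); /* page 2 not resident -> -1 (fault)   */
    return 0;
}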
Semiconductor Main Memory
• The basic element of a semiconductor memory is the memory
cell.
• Main memory consists of DRAMs supported with SRAM cache.
• These are semiconductor memories.
• The semiconductor memories are classified as shown in the figure below
Semiconductor Memory Types
DRAM
• RAM technology is divided into two types: dynamic and static.
• A Dynamic RAM (DRAM) is made with cells that store data as charge on
capacitors. The presence or absence of charge in a capacitor is interpreted
as a binary 1 or 0. Because capacitors have a natural tendency to discharge,
dynamic RAMs require periodic charge refreshing to maintain data storage.
• The term dynamic refers to this tendency of the stored charge to leak away,
even with power continuously applied
DRAM
• The figure shows a typical DRAM structure for an
individual cell that stores one bit. The address
line is activated when the bit value from this
cell is to be read or written. The transistor acts
as a switch that is closed (allowing current to
flow) if a voltage is applied to the address line
and open (no current flows) if no voltage is
present on the address line.
DRAM
• For the write operation, a voltage signal is applied to the bit line; a high voltage represents 1, and a low
voltage represents 0. A signal is then applied to the address line, allowing a charge to be transferred to the
capacitor.
• For the read operation, when the address line is selected, the transistor turns on and the charge stored on
the capacitor is fed out onto a bit line and to a sense amplifier. The sense amplifier compares the capacitor
voltage to a reference value and determines if the cell contains a logic 1 or a logic 0. The readout from the
cell discharges the capacitor, which must be restored to complete the operation.
• Although the DRAM cell is used to store a single bit (0 or 1), it is essentially an analog device. The
capacitor can store any charge value within a range; a threshold value determines whether the charge is
interpreted as 1 or 0.
SRAM
• A static RAM (SRAM) is a digital device that uses the same logic
elements used in the processor; binary values are stored using
traditional flip-flop logic-gate configurations.
• Four transistors (T1, T2, T3, T4) are cross connected in an arrangement that produces
a stable logic state.
• In logic state 1, point C1 is high and point C2 is
low; in this state, T1 and T4 are off and T2 and T3 are on. In logic
state 0, point C1 is low and point C2 is high; in this state, T1 and T4 are
on and T2 and T3 are off.
• Both states are stable as long as the direct
current (dc) voltage is applied, and unlike the DRAM, no refresh is needed
to retain data.
SRAM versus DRAM
• Both static and dynamic RAMs are volatile; that is, power must be continuously supplied to the
memory to preserve the bit values. A dynamic memory cell is simpler and smaller than a static
memory cell. Thus, a DRAM is more dense (smaller cells = more cells per unit area) and less
expensive than a corresponding SRAM.
• On the other hand, a DRAM requires the supporting refresh circuitry. For larger memories, the fixed
cost of the refresh circuitry is more than compensated for by the smaller variable cost of DRAM cells.
Thus, DRAMs tend to be favored for large memory requirements.
• A final point is that SRAMs are somewhat faster than DRAMs. Because of these relative
characteristics, SRAM is used for cache memory (both on and off chip), and DRAM is used for main
memory.
ROM
• ROM (Read Only Memory): It is a read-only memory; we cannot write data into this memory.
• It is non-volatile memory, i.e. it can hold data even if power is turned off. Generally, ROM is used to store
the binary codes for the sequence of instructions you want the computer to carry out, and data such as
look-up tables.
• This is because this type of information does not change. It is important to note that although we give the
name RAM to static and dynamic read/write memory devices, that does not mean that ROMs are not
random-access devices.
• In fact, most ROMs are accessed randomly with unique addresses. There are four types of ROM: masked
ROM, PROM, EPROM and EEPROM.
Chip Logic
• For semiconductor memories, one of the key design issues is the number of bits of
data that may be read/written at a time. At one extreme is an organization in which the
physical arrangement of cells in the array is the same as the logical arrangement of
words in memory. The array is organized into W words of B bits each.
• For example, a 16-Mbit chip could be organized as 1M words of 16 bits each. At the other
extreme is the so-called 1-bit-per-chip organization, in which data are read/written one
bit at a time. We will illustrate memory chip organization with a DRAM; ROM
organization is similar, though simpler
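• The trade-off between the two organizations shows up in the number of words to address; a quick sketch of the arithmetic (the helper bits() simply computes a base-2 logarithm):

/* The same 16-Mbit chip under the two organizations described above:
   W words of B bits each, with W * B = 16M bits in both cases. */
#include <stdio.h>

int bits(long w) {               /* address bits needed for w words */
    int n = 0;
    while ((1L << n) < w) n++;
    return n;
}

int main(void) {
    long w1 = 1L << 20, b1 = 16; /* 1M words of 16 bits */
    long w2 = 1L << 24, b2 = 1;  /* 16M words of 1 bit  */
    printf("1M x 16: %ld bits, %d address bits\n", w1 * b1, bits(w1)); /* 20 */
    printf("16M x 1: %ld bits, %d address bits\n", w2 * b2, bits(w2)); /* 24 */
    return 0;
}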
Typical 16-Mbit DRAM (4M * 4)
• The figure shows a typical organization of a 16-Mbit DRAM, in which 4M words of 4 bits each are read or written at a time.

High Speed Memories: Associative Memory
• Most memory systems use names or numbers (addresses) to identify the location of the named or numbered object within a memory space.
• For example, an account number may be searched in a file to determine the holder's name and account
status. To search an object, the number of accesses to memory depends on the location of the object and
the efficiency of the search algorithm.
• The time required to find an object stored in memory can be reduced considerably if objects can be selected
by their content rather than by their addresses.
• A memory unit accessed by content is called an associative memory or content addressable memory
(CAM). This type of memory is accessed simultaneously and in parallel on the basis of data content rather
than by a specific address or location.
• An argument register holds the word to be searched, and a key register provides a mask for choosing a
particular field or key in the argument word. The words that match the word stored in the argument
register set the corresponding bits in a match register; reading is then done on the
words whose corresponding bits in the match register have been set.
• Only those bits in the argument register having 1's in their corresponding positions of
the key register are compared.
• For example, suppose the argument register A and the key register K have bit configurations
in which only the three rightmost bits of K are 1's; then only the three rightmost bits of A
are compared with the memory words (as in the sketch below).
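• A minimal sketch of the match operation: word i matches when it agrees with the argument A in every bit position where the key K holds a 1 (the register and word values below are illustrative):

/* Associative (content-addressable) match: word i matches when
   ((word[i] XOR A) AND K) == 0. Hardware does this in parallel. */
#include <stdio.h>

#define WORDS 4

int main(void) {
    unsigned A = 0x5;            /* argument register: ...101         */
    unsigned K = 0x7;            /* key: compare 3 rightmost bits     */
    unsigned word[WORDS] = {0x15, 0x0D, 0x25, 0x02};
    unsigned match = 0;          /* match register, one bit per word  */

    for (int i = 0; i < WORDS; i++)
        if (((word[i] ^ A) & K) == 0)
            match |= 1u << i;

    printf("match register = %X\n", match); /* 7: words 0, 1 and 2 match */
    return 0;
}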
High Speed Memories: Interleaved Memory
• Main memory is composed of a collection of DRAM memory chips. A number of chips can
be grouped together to form a memory bank. It is possible to organize the memory banks
in a way known as interleaved memory. Each bank is independently able to service a
memory read or write request, so that a system with K banks can service K requests
simultaneously, increasing memory read or write rates by a factor of K.
• If consecutive words of memory are stored in different banks, then the transfer of a block
of memory is speeded up.
256-KByte Memory Organization
• A 256-KByte memory module is built from several memory chips organized into
memory banks; the address lines select the location to access, and the data lines
carry the data (a sketch of the module arithmetic follows this list). The building
blocks are:
• Memory chips
• Address lines
• Data lines
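• A quick sketch of the module arithmetic (the 256K x 1-bit chip organization is an assumption; a different chip organization would change both counts):

/* Chips and address lines for a 256-KByte module built from
   256K x 1-bit chips: one chip per bit of an 8-bit word. */
#include <stdio.h>

int main(void) {
    long module_bytes = 256L * 1024;   /* 256-KByte module          */
    long chip_words   = 256L * 1024;   /* 256K 1-bit words per chip */

    long chips = (module_bytes * 8) / chip_words;  /* 8 chips */
    int addr_lines = 0;
    while ((1L << addr_lines) < module_bytes) addr_lines++;

    printf("chips needed:  %ld\n", chips);     /* 8  */
    printf("address lines: %d\n", addr_lines); /* 18 */
    return 0;
}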
Why do we use Memory Interleaving?
• Whenever the processor requests data from the main memory, a block (chunk) of data is
transferred to the cache and then to the processor.
• So whenever a cache miss occurs, the data has to be fetched from the main memory. But
main memory is relatively slower than the cache, so to improve the access time of the
main memory, interleaving is used.
• With interleaving we can access several modules (e.g. all four modules of a 4-way
interleaved memory) at the same time, thus achieving parallelism. It is a technique for
compensating for the relatively slow speed of DRAM (Dynamic RAM).
• In this technique, the main memory is divided into memory banks which can be accessed
individually, without any dependence on the others.
• For example: if we have 4 memory banks (4-way interleaved memory), each containing 256 bytes, then the
block-oriented scheme (no interleaving) assigns virtual addresses 0 to 255 to the first bank and 256 to 511 to the
second bank. But in interleaved memory, virtual address 0 is in the first bank, 1 in the second memory bank, 2 in
the third bank and 3 in the fourth, and then 4 in the first memory bank again.
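• The low-order interleaving in this example maps directly onto modulo arithmetic (a minimal sketch with the same 4 banks):

/* 4-way low-order interleaving: consecutive addresses fall in
   consecutive banks, as in the example above. */
#include <stdio.h>

#define BANKS 4

int main(void) {
    for (long addr = 0; addr < 6; addr++) {
        int  bank   = addr % BANKS;   /* which bank holds this address */
        long offset = addr / BANKS;   /* location within that bank     */
        printf("address %ld -> bank %d, offset %ld\n", addr, bank, offset);
    }
    return 0;
}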
• Hence, the CPU can access alternate sections immediately, without waiting: there are multiple
memory banks that take turns supplying the data.
• Memory interleaving is thus a technique for increasing memory speed; it makes the memory
system more efficient and faster.