Module-4-23CS302.ppt

The document discusses Direct Memory Access (DMA), a method allowing data transfer between I/O devices and main memory without CPU intervention, using a DMA controller that manages the transfer process. It explains the operation modes of DMA, including cycle stealing and block transfer, and describes bus arbitration methods for managing access to the memory bus. Additionally, it covers cache memory concepts, including locality of reference, cache hit/miss scenarios, and various mapping functions to optimize memory access performance.

Direct Memory Access

⚫ Direct Memory Access (DMA):
⚫ A special control unit may be provided to transfer a block of data directly between an I/O device and the main memory, without continuous intervention by the processor.
⚫ The control unit that performs these transfers is part of the I/O device's interface circuit. This control unit is called a DMA controller.
⚫ The DMA controller performs functions that would normally be carried out by the processor:
⚫ For each word, it provides the memory address and all the control signals.
⚫ To transfer a block of data, it increments the memory addresses and keeps track of the number of transfers.
Direct Memory Access (contd..)
⚫ However, the operation of the DMA controller must be under the
control of a program executed by the processor. That is, the
processor must initiate the DMA transfer.
⚫ To initiate the DMA transfer, the processor informs the DMA controller of:
⚫ Starting address,
⚫ Number of words in the block,
⚫ Direction of transfer (I/O device to the memory, or memory to the I/O device).
⚫ Once the DMA controller completes the DMA transfer, it informs the processor by raising an interrupt signal.
Registers in DMA interface
The DMA interface contains three registers:
- Status and control register: bit 31 = IRQ (interrupt request), bit 30 = IE (interrupt enable), bit 1 = R/W (transfer direction), bit 0 = Done.
- Starting address register.
- Word count register.
Direct Memory Access

[Figure: a processor and main memory connected to a system bus, together with a disk/DMA controller serving two disks, a DMA controller serving a network interface, and a printer and keyboard on the bus.]
DMA Controller

• A hardwired controller called the DMA controller can enable direct data transfer between an I/O device (e.g. disk) and memory without CPU intervention.
– No need to execute instructions to carry out the data transfer.
– Maximum data transfer speed is determined by the rate at which memory read and write operations can be carried out.
– Much faster than programmed I/O.
⚫ The processor and DMA controllers have to use the bus in an interleaved fashion to access the memory.
⚫ DMA devices are given higher priority than the processor to access the bus.
⚫ Among different DMA devices, high priority is given to high-speed peripherals such as a disk or a graphics display device.
⚫ The processor originates most memory access cycles on the bus.
⚫ The DMA controller can be said to "steal" memory access cycles from the processor. This interweaving technique is called "cycle stealing".
⚫ An alternative approach is to give a DMA controller exclusive capability to initiate transfers on the bus, and hence exclusive access to the main memory. This is known as block or burst mode.
• DMA transfer can take place in two modes:
• a) DMA cycle stealing
• The DMA controller requests the bus for a few (1 or 2) cycles at a time,
• preferably when the CPU is not using the memory.
• The DMA controller is said to steal cycles from the CPU without the CPU knowing it.
• b) DMA block transfer
• The DMA controller transfers the whole block of data without interruption.
Bus arbitration
⚫ The device that is allowed to initiate transfers on the bus at any
given time is called the bus master.
⚫ When the current bus master relinquishes its status as the bus
master, another device can acquire this status.
⚫ The process by which the next device to become the bus master is selected
and bus mastership is transferred to it is called bus arbitration.
⚫ Centralized arbitration:
⚫ A single bus arbiter performs the arbitration.
⚫ Distributed arbitration:
⚫ All devices participate in the selection of the next bus master.
Centralized Bus Arbitration
• The bus arbiter may be the processor or a separate unit connected to the bus.

• A DMA controller requests control of the bus by asserting the Bus Request (BR) line.

• In response, the processor activates the Bus-Grant1 (BG1) line, indicating that the controller may use the bus when it is free.
• The BG1 signal is connected to all DMA controllers in a daisy-chain fashion.

• When the BBSY signal is 0, the bus is busy. When BBSY becomes 1, the DMA controller that asserted BR can acquire control of the bus.
Centralized Bus Arbitration

[Figure: the processor and two DMA controllers share the bus. Common BR (Bus Request) and BBSY (Bus Busy) lines connect all devices, and the bus-grant signal is daisy-chained from the processor through DMA controller 1 (BG1) to DMA controller 2 (BG2).]
Distributed arbitration
⚫ All devices waiting to use the bus share the responsibility of carrying out the arbitration process.
⚫ Each device is assigned a 4-bit ID number.
⚫ All the devices are connected using 5 lines: 4 arbitration lines to transmit the ID, and one line for the Start-Arbitration signal.
⚫ To request the bus, a device:
⚫ Asserts the Start-Arbitration signal.
⚫ Places its 4-bit ID number on the arbitration lines.
⚫ The pattern that appears on the arbitration lines is the logical OR of all the 4-bit device IDs placed on the arbitration lines.
Distributed arbitration (contd..)
• Device A has the ID 5 and wants to request the bus:
- Transmits the pattern 0101 on the arbitration lines.
• Device B has the ID 6 and wants to request the bus:
- Transmits the pattern 0110 on the arbitration lines.
• The pattern that appears on the arbitration lines is the logical OR of the two patterns:
- Pattern 0111 appears on the arbitration lines.

Arbitration process:
• Each device compares the pattern that appears on the arbitration lines to its own ID, starting with the MSB.
• If it detects a difference, it transmits 0s on the arbitration lines for that and all lower bit positions.
• Device A compares its own pattern 0101 to the pattern 0111 on the lines.
• It detects a difference at bit position 1; as a result, it transmits the pattern 0100 on the arbitration lines. Device B, with the higher ID, wins the arbitration.
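This self-selecting scheme can be modeled in a few lines of Python. The sketch below is an illustration (not part of the original slides); it treats the arbitration lines as a wired-OR of whatever each device is still driving:

```python
def arbitrate(ids, width=4):
    """Simulate distributed bus arbitration over open-collector lines.

    Each device drives its ID onto the lines (wired-OR). Scanning from
    the MSB down, a device that sees a 1 on a line where its own ID has
    a 0 drops out by clearing that and all lower bit positions it drives.
    The winner is the device whose full ID survives on the lines.
    """
    driving = {dev: dev for dev in ids}      # bits each device still drives
    for bit in range(width - 1, -1, -1):
        lines = 0
        for pattern in driving.values():     # wired-OR of all drivers
            lines |= pattern
        mask = 1 << bit
        for dev in driving:
            if (lines & mask) and not (dev & mask):
                # Sees a 1 where its own ID has 0: retire this and all
                # lower bit positions from the lines.
                driving[dev] &= ~(2 * mask - 1)
    winners = [dev for dev, pattern in driving.items() if pattern == dev]
    return winners[0]
```

With devices A (ID 5) and B (ID 6) contending, device A ends up driving 0100 and device B wins, matching the trace above. Note the highest ID always wins, so fairness must be handled separately (e.g. by rotating priorities).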
The Memory System

Fundamental Concepts

Some basic concepts
Memory access time:
It is a useful measure of the speed of the memory unit. It is the time that elapses between the initiation of an operation and the completion of that operation (for example, the time between READ and MFC).

Memory cycle time:
It is an important measure of the memory system. It is the minimum time delay required between the initiation of two successive memory operations (for example, the time between two successive READ operations). The cycle time is usually slightly longer than the access time.

Several techniques to increase the effective size and speed of the memory:
▪ Cache memory (to increase the effective speed).
▪ Virtual memory (to increase the effective size).
Cache Memories
◼ The processor is much faster than the main memory.
▪ As a result, the processor has to spend much of its time waiting while instructions and data are being fetched from the main memory.
◼ Cache memory is an architectural arrangement which makes the main memory appear faster to the processor than it really is.
◼ Cache memory is based on the property of computer programs known as "locality of reference".
Locality of Reference
◼ Analysis of programs indicates that many instructions in localized areas of a program are executed repeatedly during some period of time, while the others are accessed relatively less frequently.
▪ These instructions may be the ones in a loop, nested loop, or a few procedures calling each other repeatedly.
▪ This is called "locality of reference".
◼ Temporal locality of reference:
▪ A recently executed instruction is likely to be executed again very soon.
◼ Spatial locality of reference:
▪ Instructions with addresses close to a recently executed instruction are likely to be executed soon.
Cache memories

[Figure: the processor connects to the cache, which in turn connects to the main memory.]
● When the processor issues a Read request, a block of words is transferred from the main memory to the cache, one word at a time.
● Subsequent references to the data in this block of words are found in the cache.
● At any given time, only some blocks in the main memory are held in the cache. Which blocks in the main memory are in the cache is determined by a "mapping function".
● When the cache is full and a block of words needs to be transferred from the main memory, some block of words in the cache must be replaced. This is determined by a "replacement algorithm".
Cache hit
● The existence of a cache is transparent to the processor. The processor issues Read and Write requests in the same manner.
● If the data is in the cache, it is called a Read or Write hit.
● Read hit:
▪ The data is obtained from the cache.

● Write hit:
▪ The cache has a replica of the contents of the main memory.
▪ The contents of the cache and the main memory may be updated simultaneously. This is the write-through protocol.
▪ Alternatively, only the contents of the cache are updated; the contents of the main memory are updated when this block is replaced. This is the write-back or copy-back protocol.
Cache miss
● If the data is not present in the cache, then a Read miss or Write miss occurs.
● Read miss:
▪ The block of words containing the requested word is transferred from the memory.
▪ After the block is transferred, the desired word is forwarded to the processor.
▪ The desired word may also be forwarded to the processor as soon as it is transferred, without waiting for the entire block to be transferred. This is called load-through or early restart.

● Write miss:
▪ If the write-through protocol is used, the contents of the main memory are updated directly.
▪ If the write-back protocol is used, the block containing the addressed word is first brought into the cache. The desired word is then overwritten with the new information.
Mapping functions

◼ Mapping functions determine how memory blocks are placed in the cache.
◼ A simple processor example:
▪ Cache consisting of 128 blocks of 16 words each.
▪ Total size of cache is 2048 (2K) words.
▪ Main memory is addressable by a 16-bit address.
▪ Main memory has 64K words.
▪ Main memory has 4K blocks of 16 words each.
◼ Three mapping functions:
▪ Direct mapping
▪ Associative mapping
▪ Set-associative mapping.
Direct mapping

[Figure: cache blocks 0-127, each with a tag, alongside main memory blocks 0-4095; memory blocks 0, 128, 256, ... all map to cache block 0, and so on.]

• Block j of the main memory maps to block j modulo 128 of the cache. Thus memory block 0 maps to cache block 0, and memory block 129 maps to cache block 1.
• More than one memory block is mapped onto the same position in the cache.
• This may lead to contention for cache blocks even if the cache is not full.
• The contention is resolved by allowing the new block to replace the old block, leading to a trivial replacement algorithm.
• The 16-bit memory address is divided into three fields: Tag (5 bits), Block (7 bits), Word (4 bits).
- The low-order 4 bits determine one of the 16 words in a block.
- When a new block is brought into the cache, the next 7 bits determine which cache block it is placed in.
- The high-order 5 bits are stored as the tag of the cache block, identifying which of the 32 memory blocks that map to this cache position is currently resident.
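The three-field address split for this example can be sketched directly in Python (an illustration of the arithmetic, not code from the slides):

```python
def split_direct(addr):
    """Split a 16-bit address for the example direct-mapped cache:
    128 blocks of 16 words each -> 4-bit word, 7-bit block, 5-bit tag."""
    word = addr & 0xF             # low-order 4 bits: word within the block
    block = (addr >> 4) & 0x7F    # next 7 bits: cache block index
    tag = (addr >> 11) & 0x1F     # high-order 5 bits: tag
    return tag, block, word
```

For instance, the first word of memory block 129 lives at address 129 × 16 = 2064, which splits into tag 1, cache block 1, word 0, matching the statement that block 129 maps to cache block 1.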
Associative mapping

[Figure: any main memory block 0-4095 can be placed in any of the 128 cache blocks, each of which stores a 12-bit tag.]

• A main memory block can be placed into any cache position.
• The 16-bit memory address is divided into two fields: Tag (12 bits), Word (4 bits).
- The low-order 4 bits identify the word within a block.
- The high-order 12 bits, or tag bits, identify a memory block when it is resident in the cache.
• Flexible, and uses cache space efficiently.
• Replacement algorithms can be used to replace an existing block in the cache when the cache is full.
• Cost is higher than for a direct-mapped cache because of the need to search all 128 tags to determine whether a given block is in the cache.
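A fully associative lookup has to compare the address tag against every resident tag (done in parallel in hardware; sequentially in this illustrative sketch, which is not from the slides):

```python
def assoc_lookup(cache_tags, addr):
    """Associative lookup for the example cache: 12-bit tag, 4-bit word.
    cache_tags maps cache-block index -> stored tag (None if empty)."""
    tag = addr >> 4               # high-order 12 bits of the 16-bit address
    word = addr & 0xF             # low-order 4 bits: word within the block
    for index, stored in cache_tags.items():
        if stored == tag:         # every resident tag must be compared
            return index, word    # hit: block found at this cache position
    return None                   # miss: block is not resident
```

In hardware, the 128 comparators work simultaneously, which is exactly the cost the slide refers to.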
Performance considerations

◼ A key design objective of a computer system is to achieve the best possible performance at the lowest possible cost.
▪ The price/performance ratio is a common measure of success.
◼ The performance of a processor depends on:
▪ How fast machine instructions can be brought into the processor for execution.
▪ How fast the instructions can be executed.
Interleaving
◼ Divides the memory system into a number of memory modules. Each module has its own address buffer register (ABR) and data buffer register (DBR).
◼ Arranges addressing so that successive words in the address space are placed in different modules.
◼ When requests for memory access involve consecutive addresses, the accesses will be to different modules.
◼ Since parallel access to these modules is possible, the average rate of fetching words from the main memory can be increased.
Methods of address layouts

[Figure: two module-addressing schemes, each with per-module ABR and DBR. Left: the high-order k bits select the module and the low-order m bits give the address within the module. Right: the low-order k bits select the module and the high-order m bits give the address within the module.]

Consecutive words in a module:
◼ Consecutive words are placed in a module.
◼ The high-order k bits of a memory address determine the module.
◼ The low-order m bits of a memory address determine the word within a module.
◼ When a block of words is transferred from main memory to cache, only one module is busy at a time.

Consecutive words in consecutive modules:
• Consecutive words are located in consecutive modules.
• Consecutive addresses can be accessed in consecutive modules.
• While transferring a block of data, several memory modules can be kept busy at the same time.
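The two layouts differ only in which address bits select the module. A small sketch of both decompositions (illustrative, assuming 2^k modules of 2^m words each):

```python
def module_map(addr, k, m, low_order=True):
    """Decompose an address into (module, address-within-module) for a
    memory of 2**k modules, each holding 2**m words.

    low_order=True : interleaved layout -- consecutive words land in
                     consecutive modules (module = low-order k bits).
    low_order=False: consecutive words stay within one module
                     (module = high-order k bits).
    """
    if low_order:
        return addr % (1 << k), addr >> k
    return addr >> m, addr % (1 << m)
```

With k = 2 (four modules), addresses 0, 1, 2, 3 map to modules 0, 1, 2, 3 under the interleaved layout, so a block transfer can keep all four modules busy at once.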
What happens on a write?
◼ Cache write strategies:
◼ 1. Write Through / Store Through
◼ 2. Write Back / Copy Back

1. Write Through / Store Through
◼ Information is written to both the cache
block and the main memory block.
◼ Features: – Easier to implement
◼ – Read misses do not result in writes to the
lower level (i.e. MM).
◼ – The lower level (i.e. MM) has the most
updated version of the data
◼ – important for I/O operations and
multiprocessor systems
◼ – A write buffer is often used to reduce CPU
write stall time while data is written to main
memory.
2. Write Back Strategy
◼ Information is written only to the cache block.
◼ A modified cache block is written to MM only when it is replaced.
◼ Features:
◼ – Writes occur at the speed of the cache memory.
◼ – Multiple writes to a cache block require only one write to MM.
◼ Write-back cache blocks can be clean or dirty.
◼ – A status bit called the dirty bit or modified bit is associated with each cache block, which indicates whether the block was modified in the cache (0: clean, 1: dirty).
◼ – If the status is clean, the block is not written back to MM when it is replaced.
Hit Rate and Miss Penalty

◼ Hit rate: the fraction of memory accesses that are satisfied by the cache.
◼ Miss penalty: the extra time needed to bring a block into the cache when a miss occurs.
◼ The hit rate can be improved by increasing the block size, while keeping the cache size constant.
◼ Block sizes that are neither very small nor very large give the best results.
◼ The miss penalty can be reduced if the load-through approach is used when loading new blocks into the cache.
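Hit rate and miss penalty combine into the average memory access time, t_avg = h·C + (1 − h)·M, where h is the hit rate, C the cache access time, and M the miss penalty. A tiny illustration (the formula is standard; the numbers below are made up):

```python
def avg_access_time(hit_rate, cache_time, miss_penalty):
    """Average memory access time: hits cost cache_time cycles, misses
    cost miss_penalty cycles (access main memory and load the block)."""
    return hit_rate * cache_time + (1 - hit_rate) * miss_penalty

# Illustrative numbers: 95% hit rate, 1-cycle cache access,
# 17-cycle miss penalty -> 0.95*1 + 0.05*17 = 1.8 cycles on average.
t = avg_access_time(0.95, 1, 17)
```

Even a small miss rate dominates the average when the miss penalty is large, which is why the block-size and load-through optimizations above matter.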
Other Performance Enhancements
Write buffer
◼ Write-through:
● Each write operation involves writing to the main memory.
● If the processor has to wait for the write operation to be
complete, it slows down the processor.
● Processor does not depend on the results of the write
operation.
● Write buffer can be included for temporary storage of write
requests.
● Processor places each write request into the buffer and
continues execution.
● If a subsequent Read request references data which is still in
the write buffer, then this data is referenced in the write buffer.
◼ Write-back:
● Block is written back to the main memory when it is replaced.
● If the processor waits for this write to complete, before reading
the new block, it is slowed down.
● Fast write buffer can hold the block to be written, and the new
block can be read first.
Other Performance Enhancements (Contd.,)
Prefetching
● Normally, new data are brought into the processor only when they are first needed.
● The processor then has to wait until the data transfer is complete.
● Prefetching brings the data into the cache before they are actually needed, that is, before a Read miss occurs.
● Prefetching can be accomplished through software by including a special prefetch instruction in the machine language of the processor.
▪ Inclusion of prefetch instructions increases the length of the programs.
● Prefetching can also be accomplished using hardware:
▪ Circuitry that attempts to discover patterns in memory references and then prefetches according to this pattern.
Arithmetic
Multiplication
Multiplication of unsigned numbers

The product of two n-bit numbers is at most a 2n-bit number.

Unsigned multiplication can be viewed as addition of shifted versions of the multiplicand.
Multiplication of unsigned numbers (contd..)
We add the partial products at each stage. The rules to implement multiplication are:
- If the ith bit of the multiplier is 1, shift the multiplicand and add the shifted multiplicand to the current value of the partial product.
- Hand over the partial product to the next stage.
- The value of the partial product at the start stage is 0.
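The rules above translate directly into a shift-and-add loop (a minimal sketch, not code from the slides):

```python
def multiply_unsigned(multiplicand, multiplier, n=4):
    """Shift-and-add multiplication of two n-bit unsigned numbers.
    The partial product accumulates shifted copies of the multiplicand."""
    partial = 0                            # partial product starts at 0
    for i in range(n):
        if (multiplier >> i) & 1:          # ith multiplier bit is 1?
            partial += multiplicand << i   # add the shifted multiplicand
    return partial                         # at most a 2n-bit result
```

For example, 13 × 11 sums 13, 13·2, and 13·8 (multiplier bits 0, 1, and 3 of 1011) to give 143.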
Combinatorial array multiplier

[Figure: a 4x4 array multiplier. The multiplicand bits m3..m0 enter at the top, the multiplier bits q0..q3 gate the partial products PP0..PP3 row by row, and the product bits p7, p6, ..., p0 emerge along the right edge and the bottom row.]

The multiplicand is shifted by displacing it through the array of adders.


Multiplication of unsigned numbers
Typical multiplication cell

[Figure: each cell is a full adder (FA). It receives a bit of the incoming partial product PPi, the AND of the jth multiplicand bit with the ith multiplier bit, and a carry-in; it produces a bit of the outgoing partial product PP(i+1) and a carry-out.]


Combinatorial array multiplier (contd..)
• Combinatorial array multipliers are:
– Extremely inefficient in their use of gates.
– They have a high gate count for multiplying numbers of practical size, such as 32-bit or 64-bit numbers.
– They perform only one function, namely, the unsigned integer product.

• Gate efficiency can be improved by using a mixture of combinatorial array techniques and sequential techniques.
Sequential multiplication
• Recall the rule for generating partial products:
– If the ith bit of the multiplier is 1, add the appropriately shifted multiplicand to the current partial product.
– The multiplicand is shifted left when added to the partial product.

• However, adding a left-shifted multiplicand to an unshifted partial product is equivalent to adding an unshifted multiplicand to a right-shifted partial product.
Sequential Circuit Multiplier

[Figure: register A (initially 0) and the multiplier register Q form a shift-right pair with carry flip-flop C. An n-bit adder adds the multiplicand M (selected through a MUX under Add/Noadd control) to A; a control sequencer examines q0 to decide add or no-add before each right shift.]
Sequential multiplication (contd..)

M = 1101 (13), Q = 1011 (11)

             C  A     Q
Initially    0  0000  1011
Add M        0  1101  1011   First cycle
Shift        0  0110  1101
Add M        1  0011  1101   Second cycle
Shift        0  1001  1110
No add       0  1001  1110   Third cycle
Shift        0  0100  1111
Add M        1  0001  1111   Fourth cycle
Shift        0  1000  1111

Product (in A:Q) = 10001111 (143)
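The cycle-by-cycle behavior of this circuit can be modeled at the register level (an illustrative sketch of the C, A, Q datapath above):

```python
def sequential_multiply(m, q, n=4):
    """Register-level model of the sequential shift-and-add multiplier:
    n cycles of (add M to A if q0 = 1, then shift C:A:Q right one bit)."""
    a, c = 0, 0
    mask = (1 << n) - 1
    for _ in range(n):
        if q & 1:                           # q0 decides add / no-add
            total = a + m
            c, a = total >> n, total & mask  # carry-out goes into C
        # shift C, A, Q right one position; C is cleared by the shift
        q = ((a & 1) << (n - 1)) | (q >> 1)
        a = (c << (n - 1)) | (a >> 1)
        c = 0
    return (a << n) | q                     # 2n-bit product sits in A:Q
```

Running it with M = 13 and Q = 11 reproduces the final A:Q = 10001111 (143) from the trace.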
Signed Multiplication

• Considering 2's-complement signed operands, what happens to (-13) x (+11) if we follow the same method as unsigned multiplication?

          1 0 0 1 1   (-13)
        x 0 1 0 1 1   (+11)
  -------------------
  1 1 1 1 1 1 0 0 1 1
  1 1 1 1 1 0 0 1 1
  0 0 0 0 0 0 0 0
  1 1 1 0 0 1 1
  0 0 0 0 0 0
  -------------------
  1 1 0 1 1 1 0 0 0 1   (-143)

Sign extension of the negative multiplicand (applied to each partial product above) makes the result correct.


Signed Multiplication - Booth Algorithm
• A technique that works equally well for both negative and positive multipliers.
• In the Booth scheme, -1 times the shifted multiplicand is selected when moving from 0 to 1, and +1 times the shifted multiplicand is selected when moving from 1 to 0, as the multiplier is scanned from right to left.

 0  0  1  0  1  1  0  0  1  1  1  0  1  0  1  1  0  0
 0 +1 -1 +1  0 -1  0 +1  0  0 -1 +1 -1 +1  0 -1  0  0

Booth recoding of a multiplier.


Booth Algorithm

    0 1 1 0 1   (+13)          0  1  1  0  1
  x 1 1 0 1 0   (-6)           0 -1 +1 -1  0
  -------------------
  0 0 0 0 0 0 0 0 0 0
  1 1 1 1 1 0 0 1 1
  0 0 0 0 1 1 0 1
  1 1 1 0 0 1 1
  0 0 0 0 0 0
  -------------------
  1 1 1 0 1 1 0 0 1 0   (-78)

Booth multiplication with a negative multiplier.

Booth Algorithm

Multiplier           Version of multiplicand
Bit i   Bit i-1      selected by bit i

  0       0               0 x M
  0       1              +1 x M
  1       0              -1 x M
  1       1               0 x M

Booth multiplier recoding table.
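Putting the recoding table to work, a complete Booth multiplication can be sketched as below (an illustration under the assumption of n-bit two's-complement operands; not code from the slides):

```python
def booth_multiply(m, q, n=5):
    """Booth multiplication of two n-bit two's-complement numbers.
    Each recoded digit (0, +1, -1) selects 0, +M or -M, shifted left
    by the digit's position, exactly as in the recoding table."""
    def to_signed(x):
        return x - (1 << n) if x & (1 << (n - 1)) else x

    product = 0
    prev = 0                                # implied bit right of the LSB
    for i in range(n):
        bit = (q >> i) & 1
        digit = prev - bit                  # Booth recoding: b[i-1] - b[i]
        product += (digit * to_signed(m)) << i
        prev = bit
    return product
```

With M = 01101 (+13) and Q = 11010 (−6), the recoded digits 0, −1, +1, −1, 0 give −26 + 52 − 104 = −78, matching the worked example.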


FAST MULTIPLICATION
FAST MULTIPLICATION-Example
Integer Division
Manual Division

   21                  10101
13)274           1101)100010010
   26                  1101
   --                  -----
   14                  10000
   13                   1101
   --                  -----
    1                   1110
                        1101
                        ----
                           1

Longhand division examples.


Longhand Division Steps
• Position the divisor appropriately with respect to the dividend and perform a subtraction.
• If the remainder is zero or positive, a quotient bit of 1 is determined, the remainder is extended by another bit of the dividend, the divisor is repositioned, and another subtraction is performed.
• If the remainder is negative, a quotient bit of 0 is determined, the dividend is restored by adding back the divisor, the divisor is repositioned, and another subtraction is performed.
Circuit Arrangement

[Figure 6.21. Circuit arrangement for binary division: an (n+1)-bit register A and the dividend register Q form a shift-left pair. An (n+1)-bit adder adds or subtracts the divisor M to/from A under a control sequencer, which also sets the quotient bit q0.]

Restoring Division
• Shift A and Q left one binary position
• Subtract M from A, and place the answer
back in A
• If the sign of A is 1, set q0 to 0 and add M back
to A (restore A); otherwise, set q0 to 1
• Repeat these steps n times
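These steps map directly onto a short loop (a register-level sketch, not code from the slides):

```python
def restoring_divide(dividend, divisor, n=4):
    """Restoring division of an n-bit dividend by an n-bit divisor.
    A:Q is shifted left each cycle; after the trial subtraction a
    negative A is restored by adding the divisor back."""
    a, q = 0, dividend
    for _ in range(n):
        # shift A and Q left one binary position
        a = (a << 1) | ((q >> (n - 1)) & 1)
        q = (q << 1) & ((1 << n) - 1)
        a -= divisor                 # trial subtraction: A <- A - M
        if a < 0:
            a += divisor             # sign of A is 1: restore A, q0 = 0
        else:
            q |= 1                   # sign of A is 0: set q0 = 1
    return q, a                      # quotient in Q, remainder in A
```

Dividing 1000 (8) by 11 (3) yields quotient 0010 and remainder 0010, matching the worked example that follows.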
Examples (1000 divided by 11)

              A          Q
Initially   0 0 0 0 0  1 0 0 0
Shift       0 0 0 0 1  0 0 0 _
Subtract    1 1 1 1 0            First cycle
Set q0      1 1 1 1 0  0 0 0 0
Restore     0 0 0 0 1

Shift       0 0 0 1 0  0 0 0 _
Subtract    1 1 1 1 1            Second cycle
Set q0      1 1 1 1 1  0 0 0 0
Restore     0 0 0 1 0

Shift       0 0 1 0 0  0 0 0 _
Subtract    0 0 0 0 1            Third cycle
Set q0      0 0 0 0 1  0 0 0 1

Shift       0 0 0 1 0  0 0 1 _
Subtract    1 1 1 1 1            Fourth cycle
Set q0      1 1 1 1 1  0 0 1 0
Restore     0 0 0 1 0

Remainder = 0 0 0 1 0, Quotient = 0 0 1 0

Figure 6.22. A restoring-division example.


Nonrestoring Division
• Avoid the need for restoring A after an
unsuccessful subtraction.
• Any idea?
• Step 1: (Repeat n times)
 If the sign of A is 0, shift A and Q left one bit position and
subtract M from A; otherwise, shift A and Q left and add M
to A.
 Now, if the sign of A is 0, set q0 to 1; otherwise, set q0 to 0.
• Step 2: If the sign of A is 1, add M to A to obtain the correct remainder.
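The two steps translate into the following sketch (illustrative, not code from the slides); note there is no restore inside the loop, only a single final correction:

```python
def nonrestoring_divide(dividend, divisor, n=4):
    """Nonrestoring division: instead of restoring A after a failed
    subtraction, the next cycle adds the divisor after the shift."""
    a, q = 0, dividend
    for _ in range(n):
        sign_negative = a < 0
        # shift A and Q left one bit position
        a = (a << 1) | ((q >> (n - 1)) & 1)
        q = (q << 1) & ((1 << n) - 1)
        a = a + divisor if sign_negative else a - divisor
        if a >= 0:
            q |= 1                   # sign of A is 0: q0 = 1 (else q0 = 0)
    if a < 0:
        a += divisor                 # Step 2: final remainder correction
    return q, a
```

On 1000 ÷ 11 it produces the same quotient 0010 and remainder 00010 as the restoring version, but with only one correction add instead of one restore per failed cycle.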
Examples (1000 divided by 11)

              A          Q
Initially   0 0 0 0 0  1 0 0 0
Shift       0 0 0 0 1  0 0 0 _
Subtract    1 1 1 1 0            First cycle
Set q0      1 1 1 1 0  0 0 0 0

Shift       1 1 1 0 0  0 0 0 _
Add         1 1 1 1 1            Second cycle
Set q0      1 1 1 1 1  0 0 0 0

Shift       1 1 1 1 0  0 0 0 _
Add         0 0 0 0 1            Third cycle
Set q0      0 0 0 0 1  0 0 0 1

Shift       0 0 0 1 0  0 0 1 _
Subtract    1 1 1 1 1            Fourth cycle
Set q0      1 1 1 1 1  0 0 1 0

Restore remainder:
Add         0 0 0 1 1
Remainder   0 0 0 1 0, Quotient = 0 0 1 0

A nonrestoring-division example.
IEEE standard for floating point numbers

Example
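The slides' worked example is not reproduced here, but the single-precision layout defined by the IEEE 754 standard (1 sign bit, 8 exponent bits with bias 127, 23 fraction bits) can be illustrated with a short sketch:

```python
import struct

def decode_ieee_single(x):
    """Decompose a float into IEEE 754 single-precision fields:
    1 sign bit, 8 exponent bits (bias 127), 23 fraction bits."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    # For normalized numbers:
    # value = (-1)**sign * 1.fraction * 2**(exponent - 127)
    return sign, exponent, fraction
```

For example, 1.0 is stored with exponent field 127 (true exponent 0) and an all-zero fraction, since the leading 1 of the significand is implicit.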