Unit 5
The main memory is much larger but significantly slower than cache memories.
In a computer with a processor clock of 2 GHz or higher, the access time for the
main memory can be as much as 100 times longer than the access time for the
L1 cache.
Disk devices provide a very large amount of inexpensive memory, and they are
widely used as secondary storage in computer systems.
They are very slow compared to the main memory. They represent the bottom
level in the memory hierarchy.
CACHE MEMORIES
The speed of the main memory is very low in comparison with the speed of
modern processors.
For good performance, the processor cannot spend much of its time waiting for memory accesses to be completed.
An efficient solution is to use a fast cache memory which essentially makes
the main memory appear to the processor to be faster than it really is.
The effectiveness of the cache mechanism is based on a property of computer programs called locality of reference.
Locality of Reference:
Many instructions in localized areas of the program are executed repeatedly during some time period, and the remainder of the program is accessed relatively infrequently.
It manifests itself in two ways. They are:
Temporal (a recently executed instruction is likely to be executed again very soon).
Spatial (instructions in close proximity to a recently executed instruction are also likely to be executed soon).
When a Read request is received from the processor, the contents of a block of
memory words containing the location specified are transferred into the cache memory.
The correspondence between main memory block and the block in cache memory is
specified by a mapping function.
The cache control hardware decides which block should be removed to create space for the new block that contains the referenced word.
The collection of rules for making this decision is called the replacement algorithm.
The cache control circuit determines whether the requested word currently exists
in the cache.
If it exists, the Read/Write operation takes place on the appropriate cache location. For a write operation, two techniques are used:
Write-through protocol
Write-back protocol
Write-through protocol:
Here the cache location and the main memory locations are updated
simultaneously.
Write-back protocol:
This technique is to update only the cache location and to mark it as updated with an associated flag bit, often called the dirty or modified bit. The main memory location is updated later, when the block is removed from the cache.
When a read miss occurs, the block of words that contains the requested word is copied from the main memory into the cache.
Load through:
After the entire block is loaded into the cache, the particular word requested is forwarded to the processor. Alternatively, the requested word may be sent to the processor as soon as it is read from the main memory; this approach, called load-through (or early restart), reduces the processor's waiting period at the cost of more complex circuitry.
If Write back protocol is used then block containing the addressed word is first
brought into the cache and then the desired word in the cache is over-written
with the new information.
Mapping Function:
To discuss possible methods for specifying where memory blocks are placed in the cache, we use a specific small example: a cache consisting of 128 blocks of 16 words each, and a main memory addressed by a 16-bit address.
The main memory has 64K words, which we will view as 4K blocks of 16 words each.
Direct Mapping:
It is the simplest technique, in which block j of the main memory maps onto block j modulo 128 of the cache. Thus, whenever one of the main memory blocks 0, 128, 256, … is loaded into the cache, it is stored in cache block 0.
Blocks 1, 129, 257, … are stored in cache block 1, and so on.
Contention may arise when more than one memory block is mapped onto a given cache block position.
The contention is resolved by allowing the new blocks to overwrite the currently
resident block.
Placement of block in the cache is determined from memory address.
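The modulo rule above can be sketched in code. This is an illustrative sketch (not from the source) using the example cache of 128 blocks:

```python
# Illustrative sketch of direct mapping for the example cache
# (128 blocks of 16 words each; main memory has 4K blocks).

def direct_map(block_number, cache_blocks=128):
    """Main memory block j maps onto cache block j modulo 128."""
    return block_number % cache_blocks

# Blocks 0, 128, 256, ... contend for cache block 0;
# blocks 1, 129, 257, ... contend for cache block 1.
print(direct_map(256))   # cache block 0
print(direct_map(129))   # cache block 1
```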
Merit:
It is easy to implement.
Demerit:
It is not very flexible; contention may arise for a cache block position even when the cache is not full.
Associative Mapping:
In this method, the main memory block can be placed into any cache block
position.
12 tag bits are required to identify a memory block when it is resident in the
cache.
The tag bits of an address received from the processor are compared to the tag bits
of each block of the cache to see if the desired block is present. This is called
associative mapping.
Merit:
It gives complete freedom in choosing the cache location for a memory block, resulting in more efficient use of the cache space.
Demerit:
The cost is high, because all 128 tag patterns must be searched in parallel (an associative search).
Set-Associative Mapping:
It is a combination of the direct and associative techniques: the blocks of the cache are grouped into sets, and a main memory block may reside in any block position of one specific set.
In this case, the cache has two blocks per set, so the memory blocks 0, 64, 128, …, 4032 map into cache set 0, and they can occupy either of the two block positions within this set.
The 6-bit tag field of the address is compared to the tags of the two blocks of the set to check if the desired block is present.
A cache that contains one block per set is equivalent to direct mapping.
A cache that has k blocks per set is called a k-way set-associative cache.
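Under the same example figures (16-word blocks, 64 sets of two blocks), the way an address splits into tag, set, and word fields can be sketched as follows. This is an illustrative sketch, not from the source:

```python
# Illustrative sketch: splitting a 16-bit address for the example
# 2-way set-associative cache (16 words/block, 64 sets, 6-bit tag).

def split_address(addr, words_per_block=16, num_sets=64):
    word = addr % words_per_block        # low-order 4 bits
    block = addr // words_per_block      # main memory block number
    set_index = block % num_sets         # middle 6 bits
    tag = block // num_sets              # high-order 6 bits
    return tag, set_index, word

# Memory blocks 0 and 64 both map into set 0, with different tags:
print(split_address(0))        # (0, 0, 0)
print(split_address(64 * 16))  # (1, 0, 0)
```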
Each block contains a control bit called a valid bit, which indicates whether the block contains valid data.
The dirty bit indicates whether the block has been modified during its cache residency.
Valid bit = 0 when power is initially applied to the system.
Valid bit = 1 when the block is loaded from the main memory for the first time. If a main memory block is updated by a source that bypasses the cache, and the block already exists in the cache, the valid bit is cleared to 0.
The possibility that the processor and a DMA device may use different copies of the same data is called the cache coherence problem.
Merit:
The contention problem of direct mapping is eased by having a few choices for block placement.
At the same time, the hardware cost is reduced by decreasing the size of the associative search.
Replacement Algorithm:
When a new block is to be brought into the cache and all the positions that it may occupy are full, the cache controller must decide which of the old blocks to replace.
The property of locality of reference suggests that blocks referenced recently are likely to be referenced again soon. Therefore, when a block is to be overwritten, it is sensible to overwrite the one that has gone the longest time without being referenced. This block is called the Least Recently Used (LRU) block, and the technique is called the LRU replacement algorithm.
Eg:
Consider a set-associative cache with four blocks per set; a 2-bit counter can be used for each block.
When a hit occurs, the counter of the referenced block is set to 0; counters with values originally lower than the referenced one are incremented by 1, and all others remain unchanged.
When a miss occurs and the set is not full, the counter associated with the new block loaded from the main memory is set to 0, and the values of all other counters are incremented by 1.
When a miss occurs and the set is full, the block with counter value 3 is removed, the new block is put in its place, and its counter is set to 0; the other three counters are incremented by 1.
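The 2-bit counter bookkeeping described above can be sketched as follows. This is an illustrative sketch, not from the source:

```python
# Illustrative sketch of 2-bit LRU counters for a 4-block set.

def lru_hit(counters, i):
    """Hit on block i: its counter -> 0; counters lower than its old
    value are incremented by 1; all others are unchanged."""
    ref = counters[i]
    return [0 if j == i else (c + 1 if c < ref else c)
            for j, c in enumerate(counters)]

def lru_miss_full(counters):
    """Miss with a full set: the block whose counter is 3 is replaced;
    the new block's counter is 0 and the other three are incremented."""
    victim = counters.index(3)
    return victim, [0 if j == victim else c + 1
                    for j, c in enumerate(counters)]

print(lru_hit([2, 0, 1, 3], 2))     # [2, 1, 0, 3]
print(lru_miss_full([2, 0, 1, 3]))  # (3, [3, 1, 2, 0])
```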
The performance of the LRU algorithm can be improved by introducing a small amount of randomness in deciding which block is to be overwritten.
Several other replacement algorithms are also used in practice. A reasonable rule would
be to remove the oldest block from a full set when a new block must be brought in.
However, because such algorithms do not take the cache access history into account, they are generally not as effective as the LRU algorithm.
The simplest algorithm is to randomly choose the block to be overwritten. Interestingly
enough, this simple algorithm has been found to be quite effective in practice.
PERFORMANCE CONSIDERATIONS (Improving cache performance):
Two key factors in the commercial success of a computer are performance and cost; the objective is the best possible performance at the lowest cost.
Interleaving:
If the main memory of a computer is structured as a collection of physically separate
modules, each with its own address buffer register(ABR) and data buffer
register(DBR), memory access operations may proceed in more than one module at
the same time.
Thus, the aggregate rate of transmission of words to and from the main memory system
can be increased.
How individual addresses are distributed over the modules is critical. There are two methods of address layout.
In the first case, the memory address generated by the processor is decoded. The high
order k bits name one of n modules, and the low-order m bits name a particular word
in that module.
With this layout, when consecutive locations are accessed, as happens when a block of data is transferred to a cache, only one module is involved. At the same time, devices with direct memory access (DMA) ability may be accessing information in other memory modules.
The second and more effective way to address the modules is called memory
interleaving.
The low order k bits of the memory address select a module, and the high order m bits
name a location within the module.
In this way, consecutive addresses are located in successive modules. Thus, any
component of the system that generates request for access to consecutive memory
locations can keep several modules busy at any one time.
This results in faster access to a block of data and higher average utilization of the
memory system as a whole.
Interleaving is used within SDRAM chips to improve the speed of accessing
successive words of data.
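The difference between the two address layouts can be sketched as follows. This is an illustrative sketch assuming four modules; the module size is an assumed value:

```python
# Illustrative sketch: which module services each address under the two
# layouts, assuming 4 modules of 1024 words each.

def module_high_order(addr, words_per_module=1024):
    # High-order bits select the module: consecutive addresses
    # all fall in one module.
    return addr // words_per_module

def module_interleaved(addr, num_modules=4):
    # Low-order bits select the module: consecutive addresses
    # fall in successive modules, so several modules stay busy.
    return addr % num_modules

print([module_high_order(a) for a in range(4)])   # [0, 0, 0, 0]
print([module_interleaved(a) for a in range(4)])  # [0, 1, 2, 3]
```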
Hit rate and miss penalty:
The effective implementation of the memory hierarchy is the success rate in accessing
information at various levels of the hierarchy.
A successful access to data in a cache is called a hit. The number of hits stated as a fraction of all attempted accesses is called the hit rate, and the miss rate is the number of misses stated as a fraction of attempted accesses.
Ideally, the entire memory hierarchy would appear to the processor as a single memory
unit that has the access time of a cache on the processor chip and the size of a
magnetic disk.
How close to this ideal the actual performance is depends largely on the hit rate at the different levels of the hierarchy. High hit rates, well over 0.9, are essential for high-performance computers.
Performance is adversely affected by misses. The extra time needed to bring the desired information into the cache is called the miss penalty. This penalty may cause the processor to stall.
In general, the miss penalty is the time needed to bring a block of data from a slower unit in the memory hierarchy to a faster unit. The miss penalty is reduced if efficient mechanisms for transferring data between the units are implemented.
How can the hit rate be improved? An obvious possibility is to make the cache larger,
but with increased cost. Another possibility is to increase the block size while
keeping the total cache size constant to take advantage of spatial locality.
If all items in a larger block are needed in a computation, then it is better to load these
items into the cache as a consequence of a single miss, rather than loading several
smaller blocks as a result of several misses. The efficiency of parallel access to
blocks in an interleaved memory is the basic reason for this advantage.
Since the performance of a computer is affected positively by an increased hit rate and negatively by an increased miss penalty, block sizes that are neither very small nor very large give the best results.
In practice, block sizes in the range of 16 to 128 bytes have been the most popular
choices.
Note that the miss penalty can be reduced if the load-through approach is used when loading new blocks into the cache.
Caches on the processor chip:
Space on the processor chip is needed for many other functions, which limits the size of the cache that can be accommodated. Using separate caches for instructions and data allows both to be accessed at the same time, which leads to increased parallelism, and hence, better performance.
Disadvantage: the increased parallelism comes at the expense of more complex circuitry.
In high performance processors two levels of caches are normally used. The L1
cache is on the processor chip. The L2 cache, which is much larger, may be
implemented externally.
If both L1 and L2 caches are used, the L1 cache should be designed to allow very fast access. A practical way to speed up access to the cache is to access more than one word simultaneously and let the processor use them one at a time. The average access time experienced by the processor in a system with two levels of caches is
t_ave = h1C1 + (1 - h1)h2C2 + (1 - h1)(1 - h2)M
Where h1 is the hit rate in the L1 cache
h2 is the hit rate in the L2 cache
C1 is the time to access information in the L1 cache
C2 is the time to access information in the L2 cache
M is the time to access information in the main memory.
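The formula can be evaluated numerically; the values below are illustrative assumptions, not figures from the source:

```python
# Illustrative evaluation of the two-level cache access-time formula.

def t_ave(h1, h2, c1, c2, m):
    return h1 * c1 + (1 - h1) * h2 * c2 + (1 - h1) * (1 - h2) * m

# Assumed values: h1 = 0.95, h2 = 0.90, C1 = 1 cycle,
# C2 = 10 cycles, M = 100 cycles.
print(t_ave(0.95, 0.90, 1, 10, 100))  # approximately 1.9 cycles
```

Most accesses hit in L1, so even a large main-memory access time contributes little to the average.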
Other enhancements:
Several other possibilities exist for enhancing performance.
Write Buffer:
When the write-through protocol is used, each write operation results in writing a new value into the main memory. The processor need not wait for each such write to complete; to improve performance, a write buffer can be included to hold the write requests.
The processor places each write request into this buffer and continues execution of the next instruction.
Prefetching:
Normally, new data are brought into the cache when they are first needed: a read miss occurs, and the desired data are loaded from the main memory, while the processor pauses until the data arrive. To avoid stalling the processor, data can be prefetched into the cache before they are needed. The simplest way to do this is through software.
A special prefetch instruction may be provided in the instruction set of the processor. Executing this instruction causes the addressed data to be loaded into the cache, as in the case of a read miss.
Example: Intel's Pentium 4 processor has facilities for prefetching information
into its caches using both software and hardware approaches. There are special
prefetch instructions that can be included in programs to bring a block of data into
a desired level of cache.
Lockup-free cache:
A cache that can support multiple outstanding misses is called lockup-free. Since only one miss can be serviced at a time, it must include circuitry that keeps track of all outstanding misses. This may be done with special registers that hold the pertinent information about these misses.
VIRTUAL MEMORY:
In most modern computer systems, the physical main memory is not as large
as the address space spanned by an address issued by the processor. For
example, a processor that issues 32-bit addresses has an addressable space of 4G bytes.
Techniques that automatically move program and data blocks into the physical main memory when they are required for execution are called virtual-memory techniques.
The binary address that the processor issues, either for an instruction or for data, is called the virtual (or logical) address. If the data are in the main memory, the MMU translates the address and the data are accessed immediately.
If the data are not in the main memory, the MMU causes the operating system to bring the data into the main memory from the disk.
Pages commonly range from 2K to 16K bytes in length.
The virtual-memory mechanism bridges the size and speed gaps between the main memory and secondary storage, and it is implemented largely in software.
Each virtual address generated by the processor contains a virtual page number (high-order bits) and an offset (low-order bits).
The virtual page number selects the page, and the offset specifies the location of a particular byte (or word) within that page.
Page Table:
It contains the information about the main memory address where the page is
stored & the current status of the page.
Page Frame:
An area in the main memory that holds one page is called the page frame.
Virtual Page Number + Page Table Base register Gives the address of the
corresponding entry in the page table.ie)it gives the starting address of the page if that
page currently resides in memory.
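The translation path described above can be sketched as follows. This is an illustrative sketch with an assumed 4K-byte page size; the names are hypothetical:

```python
# Illustrative sketch of virtual-to-physical address translation,
# assuming 4K-byte pages (12-bit offset).

PAGE_SIZE = 4096

def translate(virtual_addr, page_table):
    page_number = virtual_addr // PAGE_SIZE   # high-order bits
    offset = virtual_addr % PAGE_SIZE         # low-order bits
    frame = page_table[page_number]           # missing entry = page fault
    return frame * PAGE_SIZE + offset

# Virtual page 2 resides in page frame 5:
print(translate(2 * PAGE_SIZE + 7, {2: 5}))  # 20487 (= 5*4096 + 7)
```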
Control Bits in Page Table:
The control bits specify the status of the page while it is in the main memory.
Function:
The control bit indicates the validity of the page, i.e., whether the page is actually loaded in the main memory.
The Page table information is used by MMU for every read & write access.
The page table is placed in the main memory, but a copy of a small portion of it is maintained within the MMU.
This small portion, or small cache, is called the Translation Lookaside Buffer (TLB). It consists of the page table entries that correspond to the most recently accessed pages.
If the page table entry for this page is found in TLB, the physical address is
obtained immediately.
If there is a miss in the TLB, the required entry is obtained from the page table in the main memory and the TLB is updated. The TLB holds the entries for the most recently accessed pages and also contains the virtual address of each entry.
When the operating system changes the contents of the page table, the control bit in the corresponding TLB entry is cleared to invalidate it; the entry must be reloaded from the page table before translation can proceed.
When it detects a page fault, the MMU asks the operating system to generate an
interrupt.
The operating system suspends the execution of the task that caused the page fault and begins execution of another task whose pages are in the main memory, because a long delay occurs while the requested page is transferred from the disk.
The bus used to connect the processor, memory, and I/O devices consists of three sets of lines used to carry:
Address
Data
Control signals.
Each I/O device is assigned a unique set of addresses.
When the processor places a particular address on the address lines, the device that recognizes this address responds to the command issued on the control lines.
The processor requests either a read or a write operation, and the requested data are transferred over the data lines.
Address Decoder:
It enables the device to recognize its address when the address appears on address lines.
Data register: it contains the data being transferred to or from the device.
Status register: it contains information relevant to the operation of the I/O device.
Both the data and status registers are connected to the data bus and assigned unique addresses.
The address decoder, the data and status registers, and the control circuitry required to coordinate I/O transfers constitute the device's interface circuit.
For an input device, the SIN status flag is used: SIN = 1 when a character is entered at the keyboard, and SIN = 0 once the character is read by the processor.
For an output device, the SOUT status flag is used in the same way to indicate whether the device is ready to receive a character.
Hence, by checking the SIN flag, the software can ensure that it is always reading valid
data.
This is often accomplished in a program loop that repeatedly reads the status register and checks the state of SIN.
Registers in keyboard and display interfaces:
KEN: keyboard enable.
DEN: display enable.
EXPLANATION:
This program reads a line of characters from the keyboard and stores it in a memory buffer starting at location LINE.
Then it calls the subroutine PROCESS to process the input line.
As each character is read, it is echoed back to the display.
Register R0 is used as a pointer to the buffer; it is updated using the autoincrement addressing mode so that successive characters are stored in successive memory locations.
Each character is checked to see if it is the carriage return (CR) character, which has the ASCII code 0D (hex).
If it is, a line feed character (LF) is sent to move the cursor one line down on the display, and the subroutine PROCESS is called. Otherwise, the program loops back to wait for another character from the keyboard.
PROGRAM CONTROLLED I/O
Here the processor repeatedly checks a status flag to achieve the required
synchronization between Processor & I/O device.(ie) the processor polls the device.
Programmed input/output is a method for controlling input/output operations that is included in most computers.
It is particularly useful in small low- speed systems where hardware costs must be
minimized.
It requires that all input/output operations be executed under the direct control of the CPU, i.e., every data-transfer operation involving an input/output device requires the execution of an instruction by the CPU.
Typically, the data transfer is between CPU registers and a buffer register connected to
the input/ output device.
The input/ output device does not have direct access to main memory.
A data transfer from input/ output device to main memory requires the execution of
several instructions by the CPU, including an input instruction to transfer a word from
the input/ output device to the CPU and a store instruction to transfer a word from CPU
to main memory.
1 or 2 additional instructions may be needed for address computation and data word
counting.
Direct memory access is a technique used for high-speed I/O devices. It involves having
the device interface transfer data directly to or from the memory, without continuous
involvement by the processor.
Input/ Output Addressing:
In systems with programmed input/output, the I/O devices, the main memory, and the CPU normally communicate via a common shared bus.
The address lines of that bus which are used to select main- memory locations can also
be used to select input/ output devices.
Each junction between the system bus and an input/output device is called an input/output port and is assigned a unique address.
The input/output port includes a data buffer register, thus making it little different from a main memory location with respect to the CPU.
Memory-mapped Input/Output:
The approach taken in some machines, such as the Motorola 68000 microprocessor series, is to assign part of the main memory address space to input/output ports. This is called memory-mapped input/output.
A memory-reference instruction that causes data to be fetched from or stored at address X automatically becomes an input/output instruction if X is made the address of an I/O port.
The memory load and store instructions are used to transfer a word of data to or from an I/O port; no special I/O instructions are needed.
The control lines READ and WRITE, which are activated by the CPU on decoding a memory-reference instruction, are used to initiate either a memory access cycle or an I/O transfer.
I/O-mapped I/O:
In the I/O-mapped I/O technique, the memory and I/O address spaces are kept separate: a memory-reference instruction that activates the READ M or WRITE M control line does not affect the I/O devices.
Separate I/O instructions are required to activate the READ IO and WRITE IO lines, which cause a word to be transferred between the addressed I/O port and the CPU.
An IO address and a main- memory location may have the same address. This scheme is
used, for example, in the Intel 8085 and 8086 microprocessor series.
IO Instructions:
Programmed I/O can be implemented with as few as two I/O instructions.
For example, the Intel 8085 has two main I/O instructions. The instruction IN X causes a word to be transferred from I/O port X to the 8085's accumulator register.
The instruction OUT X transfers a word from the accumulator to I/O port X.
When an IN or OUT instruction is encountered by the CPU, the addressed I/O port is expected to be ready to respond to the instruction. This means that the I/O device must transfer data to or from the data bus within a specified period.
To prevent loss of information or an indefinitely long IO instruction execution time, it is
thus desirable that the CPU know the IO device status, so that a data transfer is carried
out when the device is in the known ready state.
In a programmed I/O system, the CPU is usually programmed to test the I/O device status before initiating the data transfer.
Status can be specified by a single bit of information that the IO device can make
available on a continuous basis e.g., by setting a flip- flop connected to the data lines at
some IO port.
The determination of IO device status by the CPU requires the following steps.
o Step1:Read the status information.
o Step2: Test the status to determine if the device is ready to begin data transfer.
o Step3:If not ready, return to step 1; otherwise proceed with the data transfer.
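The three steps can be sketched as a polling loop. This is an illustrative sketch; the simulated device and the names used are assumptions, not part of the source:

```python
# Illustrative sketch of the status-polling loop (steps 1-3 above),
# with a simulated device that becomes ready on the third status read.

READY = 1

def wait_and_transfer(read_status, read_data):
    while read_status() != READY:   # steps 1 and 2: read and test status
        pass                        # step 3: not ready, so poll again
    return read_data()              # device ready: perform the transfer

status_values = iter([0, 0, READY])
char = wait_and_transfer(lambda: next(status_values), lambda: 0x41)
print(hex(char))  # 0x41
```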
Example: Intel 8085 program to read one word from an Input/ Output device.
WAIT:
IN 1
CPI READY
JNZ WAIT
IN 2
To initiate a DMA transfer, the processor provides the DMA controller with:
Starting address
Word count
Direction of transfer.
When a block of data is transferred, the DMA controller increments the memory address for successive words and keeps track of the number of words transferred; it informs the processor when the entire block has been transferred.
When the Done flag = 1, the controller has completed transferring a block of data and is ready to receive another command.
Bit 30 is the Interrupt-enable flag, IE.
IE = 1 causes the controller to raise an interrupt (interrupt enabled) after it has completed transferring the block of data.
IRQ=1, it indicates that the controller has requested an interrupt.
A DMA controller connects a high-speed network to the computer bus. The disk controller, which controls two disks, also has DMA capability, and it provides two DMA channels.
To start a DMA transfer of a block of data from the main memory to one of the disks, the program writes the address and word count information into the registers of the corresponding channel of the disk controller. It also provides the disk controller with information to identify the data for future retrieval.
The DMA controller proceeds independently to implement the specified operation.
When the DMA transfer is completed, this fact is recorded in the status and control register of the DMA channel, i.e., Done bit = IRQ = IE = 1.
Cycle Stealing:
Requests by DMA devices for using the bus have higher priority than processor requests. Since the DMA controller "steals" memory cycles from the processor in this way, the interleaving technique is called cycle stealing.
Top priority is given to high-speed peripherals such as a disk or a high-speed network interface.
Burst Mode:
The DMA controller may be given exclusive access to the main memory to transfer a block of data without interruption. This is known as burst (or block) mode.
Bus Arbitration:
Bus Master:
The device that is allowed to initiate data transfers on the bus at any given time is called the bus master. When the current master relinquishes control of the bus, another device can acquire this status.
Bus arbitration is the process by which the next device to become the bus master is selected and bus mastership is transferred to it.
Types:
There are 2 approaches to bus arbitration. They are,
Centralized Arbitration:
The bus arbiter may be the processor or a separate unit connected to the bus.
Here the processor is normally the bus master, and it may grant bus mastership to one of the DMA controllers.
A DMA controller indicates that it needs to become the bus master by activating the Bus Request line, BR.
The timing diagram shows the sequence of events for the devices connected to the processor.
DMA controller 2 requests and acquires bus mastership and later releases the bus. During its tenure as bus master, it may perform one or more data transfer operations, depending on whether it is operating in cycle stealing or block mode.
After it releases the bus, the processor resumes bus mastership.
Distributed Arbitration:
It means that all devices waiting to use the bus have equal responsibility in carrying out
the arbitration process, without using a central arbiter.
Each device is assigned a 4-bit identification number, and the bus is granted to the competing device with the highest ID number.
The drivers are of the open-collector type. Hence, if the input to one driver is equal to 1 and the input to another driver connected to the same bus line is equal to 0, the bus line is in the low-voltage state.
Eg:
Assume that two devices, A and B, with ID numbers 5 (0101) and 6 (0110), are requesting the bus; the code that appears on the arbitration lines is the OR of the two patterns, 0111.
Each devices compares the pattern on the arbitration line to its own ID starting from
MSB.
If it detects a difference at any bit position, it disables its drivers at that bit position and for all lower-order bits. It does this by placing 0 at the input of these drivers.
In our example, A detects a difference on line ARB1; hence, it disables its drivers on lines ARB1 and ARB0.
This causes the pattern on the arbitration lines to change to 0110, which means that B has won the contention.
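The outcome of this wired-OR contention can be sketched as follows. This is an illustrative sketch; it models only the end result, not the bit-by-bit disabling of drivers:

```python
# Illustrative sketch of distributed arbitration over open-collector
# lines: the bus carries the OR of all 4-bit IDs, and the device with
# the highest ID number wins.

def arbitrate(ids):
    bus = 0
    for device_id in ids:
        bus |= device_id        # wired-OR of the competing ID patterns
    winner = max(ids)           # the highest ID survives the comparison
    return bus, winner

bus, winner = arbitrate([0b0101, 0b0110])   # devices A (5) and B (6)
print(bin(bus), winner)  # 0b111 6
```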
INTERRUPTS
With program-controlled I/O, the program enters a wait loop in which it repeatedly tests the device status. During this period, the processor is not performing any useful computation.
There are many situations where other tasks can be performed while waiting for an I/O
device to become ready. To allow this to happen, we can arrange for the I/O device to
alert the processor when it becomes ready.
It can do so by sending a hardware signal called an interrupt to the processor over one of the bus control lines, called an interrupt-request line.
On receiving this signal, the processor suspends the program it is executing and services the device; in this way, the processor can perform useful computation instead of idling during the waiting period.
Consider a task that requires some computations to be performed and the results to be
printed on a line printer.
This is followed by more computations and output, and so on.
Let the program consist of two routines, COMPUTE and PRINT.
Assume that COMPUTE produces a set of n lines of output, to be printed by the PRINT
routine.
The processor first completes the execution of instruction i. Then it loads the PC (Program Counter) with the address of the first instruction of the ISR (Interrupt Service Routine). After the execution of the ISR, the processor has to come back to instruction i + 1.
Therefore, when an interrupt occurs, the current contents of the PC, which point to instruction i + 1, are put in temporary storage in a known location.
A return-from-interrupt instruction at the end of the ISR reloads the PC from that temporary storage location, causing execution to resume at instruction i + 1. When the processor is handling the interrupt, it must inform the device that its request has been recognized so that the device removes its interrupt-request signal.
This may be accomplished by a special control signal called the interrupt-acknowledge signal.
The task of saving and restoring the information can be done automatically by the
processor.
The processor saves only the contents of program counter & status register (ie) it
saves only the minimal amount of information to maintain the integrity of the program
execution.
Saving registers also increases the delay between the time an interrupt request is received and the start of execution of the ISR. This delay is called the interrupt latency.
Generally, a long interrupt latency is unacceptable.
The concept of interrupts is used in Operating System and in Control Applications,
where processing of certain routines must be accurately timed relative to external events.
This application is also called as real-time processing.
Interrupt Hardware:
A single interrupt-request line may be used to serve n devices. All devices are connected to the line via switches to ground.
To request an interrupt, a device closes its associated switch; the voltage on the INTR line then drops to 0.
If all the interrupt-request signals (INTR1 to INTRn) are inactive, all switches are open and the voltage on the INTR line is equal to Vdd. This is the inactive state of the line.
When a device requests an interrupts by closing its switch, the voltage on the line drops
to 0, causing the interrupt request signal, INTR, received by the processor to go to 1.
Since the closing of one or more switches will cause the line voltage to drop to 0, the value of INTR is the logical OR of the requests from individual devices. That is,
INTR = INTR1 + INTR2 + … + INTRn
An equivalent circuit for an open-drain bus can be used to implement a common interrupt-request line.
The signal on the common line is active in the low-voltage state.
Open-collector (bipolar circuit) or open-drain (MOS circuit) gates are used to drive the INTR line.
The output of an open-collector (or open-drain) gate is equivalent to a switch to ground that is open when the gate's input is in the 0 state and closed when the gate's input is in the 1 state.
Resistor R is called a pull-up resistor because it pulls the line voltage up to the high-voltage state when the switches are open.
Enabling and Disabling Interrupts:
The arrival of an interrupt request from an external device causes the processor to
suspend the execution of one program & start the execution of another.
Because interrupts can arrive at any time, they may alter the sequence of events from that envisaged by the programmer.
Hence, the interruption of program execution must be carefully controlled.
A fundamental facility found in all computers is the ability to enable and disable such
interruptions as desired.
There are many situations in which the processor should ignore interrupt requests. For
example, in the case of Compute-Print program, an interrupt request from the printer
should be accepted only if there are output lines to be printed.
After printing the last line of a set of n lines, interrupts should be disabled until another
set becomes available for printing.
In another case, it may be necessary to guarantee that a particular sequence of
instructions is executed to the end without interruption because the interrupt-service
routine may change some of the data used by the instructions in question.
A simple way is to provide machine instructions, such as Interrupt-enable and Interrupt-disable, that perform these functions.
An interrupt-request line may be designed so that the processor responds only to the leading edge of the signal; such a line is said to be edge-triggered.
The following are the typical scenario.
The device raises an interrupt request.
The processor interrupts the program currently being executed.
Interrupts are disabled by changing the control bits in the PS (Processor Status register).
The device is informed that its request has been recognized & in response, it deactivates the
INTR signal.
The action requested by the interrupt is performed by the interrupt service routine.
Interrupts are enabled & execution of the interrupted program is resumed.
The ways in which these problems are resolved vary from one computer to another, and the approach taken is an important consideration in determining the computer's suitability for a given application.
Polling Scheme:
When a request is received over the common interrupt request line, additional
information is needed to identify the particular device that activated the line.
If two devices have activated the interrupt request line, the ISR for the selected device
(first device) will be completed & then the second request can be serviced.
The information needed to determine whether a device is requesting an interrupt is
available in its status register. When a device raises an interrupt request, it sets to 1 one
of the bits in its status register, which we will call the IRQ bit.
The simplest way to identify the interrupting device is to have the ISR poll all the I/O devices connected to the bus. The first device encountered with its IRQ bit set is the device to be serviced.
KIRQ and DIRQ are the interrupt request bits for the keyboard and the display,
respectively.
IRQ (Interrupt Request): when a device raises an interrupt request, the IRQ bit in its status register is set to 1.
Merit:
It is easy to implement.
Demerit:
Time is spent interrogating the IRQ bits of devices that may not be requesting any service.
Vectored Interrupt:
To reduce the time involved in the polling process, a device requesting an interrupt may
identify itself directly to the processor.
Then the processor can immediately start executing the corresponding interrupt service
routine.
The term vectored interrupts refers to all interrupt handling schemes based on this
approach.
A device requesting an interrupt can identify itself by sending a special code to the
processor over the bus.
This enables the processor to identify individual devices even if they share a single
interrupt request line.
The code supplied by the device indicates the starting address of the ISR for that device.
The code length ranges from 4 to 8 bits.
The location pointed to by the interrupting device is used to store the starting address of the ISR.
The processor reads this address, called the interrupt vector, and loads it into the PC. The interrupt vector may also include a new value for the Processor Status Register. When the processor is ready to receive the interrupt-vector code, it activates the interrupt-acknowledge line, INTA.
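A minimal sketch of vectored dispatch, assuming a small device code indexing a table of ISR addresses; the table size, device codes, and function names are all made up for illustration:

```c
#define NUM_VECTORS 16         /* assumed 4-bit device code */

typedef void (*isr_t)(void);

/* Table of ISR starting addresses (the interrupt vectors). */
static isr_t vector_table[NUM_VECTORS];

static int last_serviced = -1; /* records which ISR ran (for demo) */

static void kbd_isr(void)  { last_serviced = 3; }
static void disp_isr(void) { last_serviced = 5; }

/* On an interrupt, the device sends its code over the bus; the
   processor fetches the vector and "loads the PC" by calling it. */
void dispatch_interrupt(unsigned device_code) {
    if (device_code < NUM_VECTORS && vector_table[device_code])
        vector_table[device_code]();
}
```

Because each device supplies its own code, no polling loop is needed before the correct ISR starts.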
Privileged Exception:
Changing the priority of the processor is a privileged operation, so a user program cannot accidentally or intentionally change the priority and evade interrupts; an attempt to do so in user mode causes a privilege exception.
In a daisy-chain arrangement, the device that is electrically closest to the processor has the highest priority.
KEN -> keyboard interrupt-enable bit.
DEN -> display interrupt-enable bit.
KIRQ / DIRQ -> set when the keyboard / display unit is requesting an interrupt.
There are two mechanisms for controlling interrupt requests.
At the device end, an interrupt-enable bit in a control register determines whether the device is allowed to generate an interrupt request.
At the processor end, either an interrupt-enable bit in the PS (Processor Status register) or a priority structure determines whether a given interrupt request will be accepted.
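The two-level control described above can be sketched as a simple predicate. The bit positions are assumptions for this sketch, not a real register layout:

```c
/* Illustrative bit positions; real hardware layouts differ. */
#define DEV_IEN 0x01   /* interrupt-enable bit in device control reg */
#define PS_IEN  0x01   /* interrupt-enable bit in processor status   */

/* A request is accepted only if the device is allowed to raise it
   AND the processor is willing to accept it. */
int interrupt_accepted(unsigned dev_ctrl, unsigned ps, int irq) {
    return irq && (dev_ctrl & DEV_IEN) && (ps & PS_IEN);
}
```

Either side can therefore block an interrupt independently of the other.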
Initiating the Interrupt Process:
Read the input character from the keyboard input data register. This causes the interface circuit to remove its interrupt request.
Store the character in the memory location pointed to by PNTR and increment PNTR.
When the end of the line is reached, disable keyboard interrupts and inform program main.
Return from interrupt.
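The steps above can be sketched as a simulated keyboard ISR. PNTR and KEN follow the text; the buffer size, the EOL flag, and the function shape are assumptions of this sketch:

```c
static char buffer[80];
static char *PNTR = buffer;   /* pointer to next free location      */
static int  KEN   = 1;        /* keyboard interrupt-enable bit      */
static int  EOL   = 0;        /* end-of-line flag for program main  */

/* Simulated ISR: `data_reg` stands in for the keyboard input data
   register; reading it would clear the interface's request. */
void keyboard_isr(char data_reg) {
    *PNTR++ = data_reg;       /* store character, increment PNTR    */
    if (data_reg == '\n') {   /* end of line reached                */
        KEN = 0;              /* disable keyboard interrupts        */
        EOL = 1;              /* inform program main                */
    }
}
```

Program main can then test EOL to learn that a complete line is available.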
Exceptions:
An interrupt is an event that causes the execution of one program to be suspended and
the execution of another program to begin.
The term exception is used to refer to any event that causes an interruption.
Kinds of exception:
System software has a program called a debugger, which helps to find errors in a program.
The debugger uses exceptions to provide two important facilities. They are:
Trace
Breakpoint
Trace Mode:
When the processor is in trace mode, an exception occurs after the execution of every instruction, using the debugging program as the exception-service routine.
The debugging program examines the contents of registers, memory locations, etc. On return from the debugging program, the next instruction in the program being debugged is executed.
The trace exception is disabled during the execution of the debugging program.
Breakpoint:
Here the program being debugged is interrupted only at specific points selected by the
user.
An instruction called the trap (or software interrupt) is usually provided for this purpose.
While debugging, the user may wish to interrupt program execution after some instruction i.
When the program is executed and reaches that point, the debugger examines the memory and register contents.
Privileged Exception:
To protect the OS of a computer from being corrupted by user programs, certain instructions can be executed only when the processor is in supervisor mode. These are called privileged instructions.
When the processor is in user mode, it will not execute these instructions; when it is in supervisor mode, it will execute them. An attempt to execute a privileged instruction in user mode raises a privilege exception.
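A minimal model of this mode check, with illustrative names:

```c
enum mode   { USER, SUPERVISOR };
enum result { EXECUTED, PRIVILEGE_EXCEPTION };

/* A privileged instruction executes only in supervisor mode; in
   user mode the attempt raises a privilege exception instead. */
enum result execute_privileged(enum mode m) {
    return (m == SUPERVISOR) ? EXECUTED : PRIVILEGE_EXCEPTION;
}
```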
Use of interrupts in operating systems:
The operating system is responsible for coordinating all activities within a computer.
It makes extensive use of interrupts to perform I/O operations and communicate with and
control the execution of user programs.
The interrupt mechanism enables the OS to assign priorities, switch from one user program to another, implement security and protection features, and coordinate I/O activities.
The OS incorporates the interrupt-service routines for all devices connected to a
computer.
Application programs do not perform I/O operations themselves. When an application program needs an input or an output operation, it points to the data to be transferred and asks the OS to perform the operation.
The OS suspends the execution of that program temporarily and performs the requested
I/O operation.
When the operation is completed, the OS transfers control back to the application
program. The OS and the application program pass control back and forth using software
interrupts.
In a computer that has both a supervisor and a user mode, the processor switches between the two modes as control passes between the OS and application programs.
INPUT-OUTPUT DEVICES:
Input-output devices are the means by which a computer communicates with the outside world. A primary function of IO devices is to act as data transducers, i.e., to convert information from one physical representation to another. Unlike processors, IO devices do not alter the information content or meaning of the data on which they act.
Since data is transferred and processed within a computer system in the form of digital electrical signals, IO devices transform other forms of information to or from digital electrical signals.
The table below lists some typical IO devices and the information medium they employ. Note that many of these devices use electromechanical technologies; hence their speed of operation is slow compared with processor and main-memory speeds.
An IO device can be controlled directly by the CPU, but it is often under the immediate control of a special-purpose processor or control unit which directs the flow of information between the device and main memory.
IO PROCESSORS:
In systems with programmed IO, peripheral devices are controlled by the CPU. The DMA concept extends to IO devices limited control over data transfers to and from main memory.
IO instructions:
In a computer system with IOPs, the CPU does not normally execute IO data-transfer instructions.
Such instructions are contained in IO programs stored in main memory and are fetched and executed by the IOPs.
The IO instructions executed by the IOP are primarily associated with data transfer.
A typical IOP instruction has the form:
Read(Write) block of n words from(to) device X to(from) memory region Y.
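That instruction form can be sketched as a small structure and a simulated executor. The field names, buffer sizes, and function names here are assumptions, not the format of any real IOP:

```c
/* "Read (Write) block of n words from (to) device X to (from)
   memory region Y", captured as a structure. */
enum iop_op { IOP_READ, IOP_WRITE };

struct iop_instr {
    enum iop_op op;   /* read from or write to the device */
    int device;       /* device X                         */
    int mem_addr;     /* start of memory region Y         */
    int count;        /* n, the number of words           */
};

/* Simulated device buffers and main memory. */
static int device_buf[2][8];
static int memory[32];

void iop_execute(const struct iop_instr *in) {
    for (int i = 0; i < in->count; i++) {
        if (in->op == IOP_READ)   /* device -> memory */
            memory[in->mem_addr + i] = device_buf[in->device][i];
        else                      /* memory -> device */
            device_buf[in->device][i] = memory[in->mem_addr + i];
    }
}
```

The CPU is not involved in the word-by-word transfer; the IOP performs it via its direct access to memory.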
The IOP is provided with direct access to main memory (DMA) and so can control the system bus when the CPU does not need it. An IOP may also have a fairly general instruction set to facilitate the calculation of complex addresses, IO device priorities, etc.
A third category of IO instructions includes those executed by specific IO devices; these control device-specific functions and are transmitted as data to the IO devices.
The CPU supervises IO operations by means of a small set of privileged IO instructions with the format of figure (a) above. The address field specifies a base register B and a displacement D, which together identify both the IO device to be used and the IOP to which it is attached.
There are three major instructions of this type.
1. START IO
2. HALT IO
3. TEST IO
START IO is used to initiate an IO operation. It provides the IOP it names with the main-memory address of the IO program to be executed. Each channel command word (CCW) in that program causes the number of bytes specified in its data count field to be transferred between main memory and the device during the data transfer.
The flags field of the CCW is used to modify or extend the operation specified by its opcode. One flag specifies command chaining, which means that another CCW is to be fetched and executed after the current CCW.
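A sketch of a CCW and of command chaining, assuming only the fields named in the text; the field widths, flag encoding, and structure layout are illustrative, not those of any real channel:

```c
#define FLAG_CHAIN 0x01   /* command chaining: fetch the next CCW */

struct ccw {
    unsigned opcode;      /* operation to perform                 */
    unsigned data_addr;   /* memory area for the transfer         */
    unsigned flags;       /* modifies/extends the opcode          */
    unsigned count;       /* data count: bytes to transfer        */
};

/* Execute a channel program: each CCW transfers `count` bytes
   (modeled here as a running total), and execution chains to the
   next CCW while the chain flag is set. Returns total bytes. */
unsigned run_channel_program(const struct ccw *prog) {
    unsigned total = 0;
    for (;;) {
        total += prog->count;
        if (!(prog->flags & FLAG_CHAIN))
            break;        /* no chaining: the program ends here */
        prog++;           /* chain to the next CCW in sequence  */
    }
    return total;
}
```

Command chaining lets one START IO launch a whole sequence of transfers without further CPU intervention.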
IOP Organization:
The IOP and CPU share access to a common main memory M via the system bus.
M contains separate programs for execution by the CPU and the IOP; it also contains a communication region IOCR used for passing information between the two processors.
The channel address word (CAW) is stored at memory location 72, which is part of the CPU-IOP communication region.
The CAW contains the absolute 24-bit starting address of the IO program to be executed by the IOP, which communicates with its devices over an IO bus.
This asynchronous IO bus contains over 30 unidirectional lines, including two 8-bit data buses.
If the device status is valid, the IOP outputs a command byte indicating the IO operation to be performed; when the operation completes or an error occurs, an interrupt is generated.
The CPU can maintain direct control over the IO operation by periodically executing TEST
IO.
This causes the IOP to construct a channel status word CSW, which it stores at memory
location 64.
The CPU can then fetch the CSW and examine it. This type of programmed IO is an inefficient way for the CPU to monitor IO operations, so the IOPs are also provided with the ability to interrupt the CPU. In response to an IO interrupt, the CPU stores the current PSW as the old PSW at memory location 56 and fetches a new PSW from location 120.
The new PSW contains a program counter field pointing to the interrupt-service routine to be used.
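The PSW swap described above can be sketched as follows; locations 56 and 120 come from the text, while reducing the PSW to a single word (and memory to a small word array) is a simplification of this sketch:

```c
#define OLD_PSW_LOC  56   /* where the interrupted PSW is saved */
#define NEW_PSW_LOC 120   /* where the replacement PSW resides  */

static unsigned memory[128];  /* simulated low main memory */

/* On an IO interrupt: save the current PSW as the old PSW and
   return the new PSW, which now controls execution. */
unsigned io_interrupt(unsigned current_psw) {
    memory[OLD_PSW_LOC] = current_psw;
    return memory[NEW_PSW_LOC];
}
```

Returning from the ISR would reverse the swap by reloading the old PSW from location 56.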
If it is desired to save the CPU general registers, explicit instructions for this purpose must be included in the interrupt-service routine.