MCS-012

Structure

1.0 Introduction
1.1 Objectives
1.2 The von Neumann Architecture
1.3 Instruction Execution: An Example
1.4 Instruction Cycle
    1.4.1 Interrupts
    1.4.2 Interrupts and Instruction Cycle
1.5 History of Computers
    1.5.1 The Beginning
    1.5.2 First Generation Computers
    1.5.3 Second Generation Computers
    1.5.4 Third Generation Computers
    1.5.5 Later Generations
1.6 Summary
1.7 Solutions/Answers
1.0 INTRODUCTION
The importance of Information Technology (IT) is well recognised: with the growing trends in information technology, IT has become essential for the survival of businesses. The computer is the main component of an Information Technology network. Today, computer technology has permeated every sphere of modern life. From railway reservations to medical diagnosis, from TV programmes to satellite launching, from matchmaking to catching criminals, everywhere we witness the elegance, sophistication and efficiency that are possible only with the help of computers.
In this unit, you will be introduced to one of the important computer system structures: the von Neumann architecture. In addition, you will be introduced to a simple model of instruction execution; this model will be enhanced in the later blocks of this course. More details on these terms can be obtained from the further readings. We have also discussed the main developments during the various periods of computer history. Finally, we discuss the basic components of microprocessors and their uses.
1.1 OBJECTIVES

1.2 THE VON NEUMANN ARCHITECTURE
The von Neumann architecture was the first major proposed structure for a general-purpose computer. However, before describing the main components of the von Neumann architecture, let us first define the term computer, as this will help us discuss the von Neumann architecture in logical detail.
A computer is defined in the Oxford dictionary as "an automatic electronic apparatus for making calculations or controlling operations that are expressible in numerical or logical terms".
The definition clearly categorises the computer as an electronic apparatus, although the first computers were mechanical and electro-mechanical devices. The definition also points towards the two major areas of computer application, viz., data processing and computer-assisted control/operations. Another important aspect of the definition is the fact that the computer can perform only those operations/calculations which can be expressed in logical or numerical terms.
Some of the basic questions that arise from the above definition are:
How are the data processing and control operations performed by an electronic device
like the computer?
Well, electronic components are used for creating basic logic circuits that are used to
perform calculations. These components are further discussed in the later units.
However, for the present discussion, it would be sufficient to say that there must be a
certain unit that will perform the task of data processing and control.
What is the basic function performed by a computer? The basic function performed by
a computer is the execution of the program. A program is a sequence of instructions,
which operates on data, to perform certain tasks such as finding a prime number. The
computer controls the execution of the program.
What is data in computers? In modern digital computers, data is represented in binary form using two symbols, 0 and 1. These are called binary digits or bits. But the data we deal with consists of numeric data and characters such as the decimal digits 0 to 9, the alphabets A to Z, arithmetic operators (e.g. +, -), relational operators (e.g. =, >), and many other special characters (e.g. ;, @, {, ]). Therefore, there has to be a mechanism for data representation. Typically, eight bits are used to represent a character. This allows up to 2^8 = 256 different items to be represented uniquely. This collection of eight bits is called a byte; thus, one byte is used to represent one character internally. Most computers use two bytes or four bytes to represent numbers (positive and negative) internally. The data also includes operational data such as integers, decimal numbers, etc. We will discuss more about data representation in the next unit.
Thus, the prime task of a computer is to perform instruction execution. The key
questions, which can be asked in this respect, are: (a) how are the instructions
supplied to the computer? and (b) how are the instructions interpreted and executed?
Let us answer the second question first. All computers have a unit that performs the arithmetic and logical functions. This unit is referred to as the Arithmetic Logic Unit (ALU). But how does the computer determine which operation is to be performed by the ALU, or in other words, who interprets the operation that is to be performed by the ALU?
This interpretation is done by the Control Unit of the computer. The control unit
accepts the binary form of instruction and interprets the instruction to generate control
signals. These control signals then direct the ALU to perform a specified arithmetic or
logic function on the data. Therefore, by changing the control signal the desired
function can be performed on data. Or conversely, the operations that need to be
performed on the data can be obtained by providing a set of control signals. Thus, for
a new operation one only needs to change the set of control signals.
The unit that interprets a code (a machine instruction) to generate respective control
signals is termed as Control Unit (CU). A program now consists of a sequence of
codes. Each code is, in effect, an instruction, for the computer. The hardware
interprets each of these instructions and generates respective control signals such that
the desired operation is performed on the data.
The Arithmetic Logic Unit (ALU) and the Control Unit (CU) together are termed the Central Processing Unit (CPU). The CPU is the most important component of a computer's hardware.
All these arithmetic and logical Operations are performed in the CPU in special
storage areas called registers. The size of the register is one of the important
considerations in determining the processing capabilities of the CPU. Register size
refers to the amount of information that can be held in a register at a time for
processing. The larger the register size, the faster may be the speed of processing.
But how can the instructions and data be put into the computer? The instructions and data are supplied to the computer by the external environment; this implies that input devices are needed. The main responsibility of input devices is to put the data in the form of signals that can be recognised by the system. Similarly, we need another component which will report the results in a proper format. This component is called the output device. These components together are referred to as input/output (I/O) devices.
In addition, to transfer information, the computer system internally needs system interconnections. At present we will not discuss input/output devices and system interconnections in detail, except to note that the most common input/output devices are the keyboard, monitor and printer, and the most common interconnection structure is the bus structure. These concepts are detailed in the later blocks.
Input devices can bring in instructions or data only sequentially; however, a program may not execute sequentially, as jump, looping and decision-making instructions are normally encountered in programming. In addition, more than one data element may be required at a time. Therefore, a temporary storage area is needed in the computer to store instructions and data temporarily. This component is referred to as memory.
The memory unit stores all the information in a group of memory cells such as a
group of 8 binary digits (that is a byte) or 16 bits or 32 bits etc. These groups of
memory cells or bits are called memory locations. Each memory location has a unique
address and can be addressed independently. The contents of the desired memory
locations are provided to the CPU by referring to the address of the memory location.
The amount of information that can be held in the main memory is known as the memory capacity. The capacity of the main memory is measured in megabytes (MB) or gigabytes (GB). One kilobyte stands for 2^10 bytes, which is 1024 bytes (or approximately 1000 bytes). A megabyte stands for 2^20 bytes, which is a little over one million bytes; a gigabyte is 2^30 bytes.
Let us now define the key features of von Neumann Architecture:
The control unit (CU) interprets each of these instructions and generates respective
control signals.
The Arithmetic Logic Unit (ALU) performs the arithmetic and logical operations in special storage areas called registers, as per the instructions of the control unit. The size of the register is one of the important considerations in determining the processing capabilities of the CPU. Register size refers to the amount of information that can be held in a register at a time for processing. The larger the register size, the faster may be the speed of processing.
An Input/ Output system involving I/O devices allows data input and reporting
of the results in proper form and format. For transfer of information a computer
system internally needs the system interconnections. One such interconnection
structure is BUS interconnection.
Main Memory is needed in a computer to store instructions and the data at the
time of Program execution. Memory to CPU is an important data transfer path.
The amount of information, which can be transferred between CPU and
memory, depends on the size of BUS connecting the two.
It was pointed out by von Neumann that the same memory can be used for storing data and instructions. In such a case, data can be treated as data on which processing can be performed, while instructions can be treated as data used for the generation of control signals.
The von Neumann machine uses stored program concept, i.e., the program
and data are stored in the same memory unit for execution. The computers prior
to this idea used to store programs and data on separate memories. Entering and
modifying these programs was very difficult as they were entered manually by
setting switches, plugging, and unplugging.
A von Neumann machine has only a single path between the main memory and
control unit (CU). This feature/constraint is referred to as von Neumann bottleneck.
Several other architectures have been suggested for modern computers. You can know
about non von Neumann architectures in further readings.
Check Your Progress 1

1) State True or False:

   A von Neumann machine has only one path between the main memory and the control unit. (T/F)
1.3 INSTRUCTION EXECUTION: AN EXAMPLE
After discussing about the basic structure of the computer, let us now try to answer the
basic question: How does the Computer execute a Program? Let us explain this with
the help of an example from higher level language domain.
Problem: Write a program to add two numbers.
A sample C program (assuming two fixed values for the numbers, a = 5 and b = 2):

1. #include <stdio.h>
2. main ()
3. {
4.     int a = 5, b = 2, c;
5.     c = a + b;
6.     printf ("%d", c);
7. }
The program at line 4 declares three variables, a, b and c, which correspond to three memory locations. At line 5, a and b are added and the result is stored in c; at line 6 the value of c is printed.
But, how will these instructions be executed by CPU?
First you need to compile this program to convert it into machine language. But what will the machine instructions look like?
Let us assume a hypothetical machine whose instructions and data are 16 binary digits (bits) in size. Each instruction of the machine consists of two components: (a) an operation code that specifies the operation to be performed by the instruction, and (b) the address of the operand in memory on which the given operation is to be performed.

Let us further assume that the operation code is six bits in size; therefore, the remaining 10 bits are for the address of the operand. The memory word size is also assumed to be 16 bits. Figure 2 shows the instruction and data formats for this machine. However, to simplify our discussion, let us present the operation codes using mnemonics like LOAD, ADD and STORE, with decimal values for operand addresses and signed decimal values for data.
The instruction execution is performed in the CPU registers. But before we define the
process of instruction execution let us first give details on Registers, the temporary
storage location in CPU for program execution. Let us define the minimum set of
registers required for von Neumann machines:
Accumulator Register (AC): This register is used to store data temporarily for
computation by ALU. AC is considered to contain one of the operands. The result of
computation by ALU is also stored back to AC. It implies that the operand value is
over-written by the result.
Memory Address Register (MAR): It specifies the address of memory location from
which data or instruction is to be accessed (read operation) or to which the data is to
be stored (write operation). Refer to figure 3.
Memory Buffer Register (MBR): It is a register, which contains the data to be written
in the memory (write operation) or it receives the data from the memory (read
operation).
Program Counter (PC): It keeps track of the instruction that is to be executed next,
that is, after the execution of an on-going instruction.
Instruction Register (IR): Here the instructions are loaded prior to execution.
Comments on figure 3 are as follows:

All representations are in decimal (in actual machines the representations are in binary).

The number of memory locations = 16.

Size of each memory location = 16 bits = 2 bytes (compare this with contemporary machine word sizes of 16, 32 or 64 bits).

Thus, the size of this sample memory = 16 words (compare it with actual memory sizes of 128 MB, 256 MB, 512 MB, or more).

In the diagram MAR is pointing to location 10.

The last operation performed was a read of memory location 10, whose content is 65; thus, the contents of MBR are also 65.
Now let us define the operation codes required for this machine, so that we can translate the high-level language instructions into assembly/machine instructions.

Operation Code | Meaning
LOAD           | Load the content of the given memory location into the accumulator
ADD            | Add the content of the given memory location to the accumulator
STORE          | Store the content of the accumulator into the given memory location

The sample machine instructions for the assumed system for line 5, that is c = a + b, of the program would be:

LOAD A
ADD B
STORE C
Please note that a simple one line statement in C program has been translated to
three machine instructions as above. Please also note that these translated instructions
are machine dependent.
Now, how will these instructions execute?
Let us assume that the above machine instructions are stored in three consecutive memory locations 1, 2 and 3, and that the PC contains the value 1, which is the address of the first of these instructions (please refer to figure 4(a)).
Figure 4:
Step 2:
The IR has the instruction LOAD A, which is decoded as Load the content
of address A in the accumulator register.
The address of operand that is 13, that is A, is transferred to MAR register.
The content of memory location (specified by MAR that is location 13)
is transferred to MBR.
The content of MBR is transferred to Accumulator Register.
Thus, the accumulator register is loaded with the content of location A, which is 5.
Now the execution of instruction 1 is complete, and the next instruction, that is 2 (indicated by PC), is fetched and PC is incremented to 3. This instruction is ADD B, which instructs the CPU to add the contents of memory location B to the accumulator. On execution of this instruction the accumulator will contain the sum of its earlier value, that is A, and the value stored in memory location B.
On execution of the instruction at memory location 3, PC becomes 4; the accumulator
results are stored in location C, that is 15, and IR still contains the third instruction.
This state is shown in figure 4 (C).
Please note that the execution of the instructions in the above example is quite simple, requiring only data transfer and data processing operations in each instruction. Also, each of these instructions requires one memory reference during its execution.
The simple example shown above, however, has several problems/limitations.
1.4 INSTRUCTION CYCLE
We have discussed the instruction execution in the previous section, now let us
discuss more about various types of instruction execution.
What are the various types of operations that may be required by computer for
execution of instruction? The following are the possible steps:
S.No. | Step to be performed                          | How is it done                                                                                                            | Who does it
1.    | Calculate the address of the next instruction | Based on the size of the instruction and the memory organisation                                                          |
2.    | Fetch the instruction                         | A memory read operation is done; the size of the instruction is important here. In addition, the PC is incremented to point to the next instruction in sequence. |
3.    | Decode the instruction                        |                                                                                                                           | CU
4.    | Calculate the address of the operand          |                                                                                                                           |
5.    | Fetch the operand                             | Memory read operation                                                                                                     |
6.    | Execute the instruction                       |                                                                                                                           | ALU/CU
7.    | Store the result                              | Memory write operation                                                                                                    |
Thus, in general, the execution cycle for a particular instruction may involve more
than one stage and memory references. In addition, an instruction may ask for an I/O
operation. Considering the steps above, let us work out a more detailed view of
instruction cycle. Figure 5 gives a diagram of an instruction cycle.
Please note that in the preceding diagram some steps may be bypassed while some
may be visited more than once. The instruction cycle shown in figure 5 consists of
following states/stages:
First the address of the next instruction is calculated, based on the size of
instruction and memory organisation. For example, if in a computer an
instruction is of 16 bits and if memory is organized as 16-bits words, then the
address of the next instruction is evaluated by adding one in the address of
the current instruction. In case, the memory is organized as bytes, which can
be addressed individually, then we need to add two in the current instruction
address to get the address of the next instruction to be executed in sequence.
Now, the next instruction is fetched from a memory location to the CPU
registers such as Instruction register.
The next state decodes the instruction to determine the type of operation
desired and the operands to be used.
In case the operands need to be fetched from memory or via Input devices, then
the address of the memory location or Input device is calculated.
Next, the operand is fetched (or operands are fetched one by one) from the
memory or read from the Input devices.
Finally, the results are written back to memory or Output devices, wherever
desired by first calculating the address of the operand and then transferring the
values to desired destination.
Please note that multiple operands and multiple results are allowed in many
computers. An example of such a case may be an instruction ADD A, B. This
instruction requires operand A and B to be fetched.
In certain machines a single instruction can trigger an operation to be performed on an
array of numbers or a string of characters. Such an operation involves repeated fetch
for the operands without fetching the instruction again, that is, the instruction cycle
loops at operand fetch.
Thus, a Program is executed as per the instruction cycle of figure 5. But what happens
when you want the program to terminate in between? At what point of time is an
interruption to a program execution allowed? To answer these questions, let us discuss
the process used in computer that is called interrupt handling.
1.4.1 Interrupts
An interrupt is an exceptional event that causes the CPU to temporarily transfer control from the currently executing program to a different program which services the exceptional event. It is like asking a question in a class: when you raise your hand, the teacher who is explaining some point may respond to your request only after completing his/her point. Similarly, an interrupt is acknowledged by the CPU only when it has completed the currently executing instruction. An interrupt may be generated by a number of sources, which may be either internal or external to the CPU.
Figure 6 gives a list of some common interrupts and the events that cause them.

Interrupt Condition         | Occurrence of Event
Overflow                    | The number exceeds the maximum allowed.
Division by Zero            | An attempt to divide by zero.
Illegal Instruction         | An attempt to execute an illegal/privileged instruction.
Memory Protection Violation | Trying to reference a memory location other than those allowed for that program.
Interrupts are a useful mechanism for improving the efficiency of processing. How? This is due to the fact that almost all external devices are slower than the processor; therefore, in a typical system without interrupts, the processor would have to continually test whether an input value has arrived or a printout has been completed, wasting a lot of CPU time. With the interrupt facility, the CPU is freed from the task of testing the status of input/output devices and can do useful processing during this time, thus increasing the processing efficiency.
How does the CPU know that an interrupt has occurred?
There needs to be a line, a register or a status word in the CPU that can be raised on the occurrence of an interrupt condition.
Once a CPU knows that an interrupt has occurred, then what?

First the condition is checked to determine why the interrupt has occurred; this includes identifying not only the device but also why that device has raised the interrupt. The interrupt is then processed as follows:

The CPU must find out the source of the interrupt, as this will determine which interrupt service routine is to be executed.

The CPU then acquires the address of the interrupt service routine, which is stored (in general) in memory.

What happens to the program the CPU was executing before the interrupt? This program needs to be suspended until the CPU has executed the interrupt service routine. Do we need to do something for this program? Well, the context of this program has to be saved. We will discuss this a bit later.

Finally, the CPU executes the interrupt service routine till the completion of the routine. A RETURN statement marks the end of this routine. After that, control is passed back to the interrupted program.
1.4.2 Interrupts and Instruction Cycle
The natural point for allowing an interruption is when the CPU has completed the previous instruction and is about to start a new instruction. Figure 7 shows the instruction execution cycle with the interrupt cycle, where the interrupt condition is acknowledged. Please note that an interrupt service routine is itself a program: after acknowledging the interrupt, the next instruction executed through the instruction cycle is the first instruction of the interrupt servicing routine.
In the interrupt cycle, the responsibility of the CPU/processor is to check whether any interrupt has occurred, by checking for the presence of the interrupt signal. In case no interrupt needs service, the processor proceeds to the next instruction of the current program. In case an interrupt needs servicing, then the interrupt is processed as described in the previous subsection.
Check Your Progress 2

1) State True or False: (T/F)

   ii) MAR and MBR are both needed to fetch data/instructions from the memory.
   iii) A clock may generate an interrupt.
   iv) Context switching is not desired before interrupt processing.
   v) In case multiple interrupts occur at the same time, only one of the interrupts will be acknowledged and the rest will be lost.

2) What is an interrupt?
   .....................................................................................................................................
   .....................................................................................................................................
1.5 HISTORY OF COMPUTERS
Let us now discuss the history of computers, because this gives basic information about the technological development trends in computing in the past and its projections for the future. If we want to know about computers completely, we must look at the history of computers and into the details of the various technological and intellectual breakthroughs. These are essential to give us a feel of how much work and effort has gone into bringing the computer to its present shape. Our effort in this section will be to describe the conceptual breakthroughs of the past.
The ancestors of the modern-age computer were mechanical and electromechanical devices. This ancestry can be traced as far back as the 17th century, when the first machine capable of performing the four basic mathematical operations, viz. addition, subtraction, division and multiplication, appeared. In the subsequent subsections we present a very brief account of these mechanical computers.
1.5.1 The Beginning
Blaise Pascal made the very first attempt towards automatic computing. He invented a device consisting of many gears and chains, which could perform repeated additions and subtractions. This device was called the Pascaline. Later, many attempts were made in this direction.
Charles Babbage, the grandfather of the modern computer, had designed two
computers:
The Difference Engine: It was based on the mathematical principle of finite differences and was used to evaluate formulas on large numbers. It could also be used for evaluating polynomial and trigonometric functions.
The Analytical Engine by Babbage: It was a general-purpose computing device which could be used for performing any mathematical operation automatically.
The next notable attempts towards computers were electromechanical. Zuse used electromechanical relays that could be either opened or closed automatically. Thus began the use of binary digits, rather than decimal numbers, in computers.
Harvard Mark-I and the Bug
The next significant effort towards devising an electromechanical computer was made at Harvard University, jointly sponsored by IBM and the Department of the US Navy. Howard Aiken of Harvard University developed a system called the Mark I in 1944. The Mark I was a decimal machine, that is, its computations were performed using decimal digits.
Some of you must have heard the term bug, mainly used to indicate errors in computer programs. This term was coined when, one day, a program on the Mark-I did not run properly because a moth had short-circuited the computer. Since then, the moth, or bug, has been linked with errors or problems in computer programming. Thus, the process of eliminating errors in a program is known as debugging.
1.5.2 First Generation Computers
It is indeed ironic that scientific inventions of great significance have often been linked with supporting a very sad and undesirable aspect of civilisation, that is, fighting wars. Nuclear energy would not have been developed as fast if colossal efforts had not been spent towards devising nuclear bombs. Similarly, the first truly general-purpose computer was also designed to meet a requirement of World War II. The ENIAC (Electronic Numerical Integrator And Calculator) was designed in 1945 at the University of Pennsylvania to calculate figures for the thousands of gunnery tables required by the US army for accuracy in artillery fire. The ENIAC ushered in the era of what is known as first generation computers. It could perform 5000 additions or 500 multiplications per minute. It was, however, a monstrous installation: it used about 18,000 vacuum tubes, weighed 30 tons, occupied a number of rooms, needed a great amount of electricity and emitted excessive heat.
The main features of the ENIAC are summarised above. Several distinctive technology trends were encountered during the era of first generation computers.
1.5.3 Second Generation Computers
Silicon brought the advent of the second generation computers. A two-state device called the transistor was made from silicon. The transistor was cheaper, smaller and dissipated less heat than the vacuum tube, but could be utilised in a similar way. A transistor is called a solid-state device, as it is not built from the wires, metal, glass capsule and vacuum that were used in vacuum tubes. The transistor was invented in 1947 and launched the electronic revolution of the 1950s.
1.5.4 Third Generation Computers
The basic hardware technology of the third generation was the Integrated Circuit (IC). But what are integrated circuits? Let us first define the term discrete component. A single self-contained transistor is called a discrete component. Discrete components such as transistors, capacitors and resistors were manufactured separately and soldered onto circuit boards to create electronic components or computer cards. All these cards/components were then put together to make a computer. Since a computer can contain tens of thousands of these transistors, the entire mechanism was cumbersome. The basic idea of the integrated circuit was to create electronic components, and later the whole CPU, on a single integrated chip. This was made possible by the era of microelectronics (small electronics) with the invention of Integrated Circuits (ICs).
In integrated circuit technology, components such as transistors, resistors and conductors are fabricated on a semiconductor material such as silicon; thus, a desired circuit can be fabricated on a tiny piece of silicon. Since the size of these components is very small, hundreds or even thousands of transistors can be fabricated on a single wafer of silicon. These fabricated transistors are connected through a process of metallisation, creating logic circuits on the chip.
Reliability: Integrated circuit interconnections are much more reliable than soldered connections. In addition, densely packed integrated circuits need fewer inter-chip connections; thus, the computers are more reliable. In fact, reliability suffers at the two extremes: when chips use a very low level of integration, or an extremely high level of integration close to the maximum limits.
Low cost: The cost of a chip has remained almost constant while the chip
density (number of gates per chip) is ever increasing. It implies that the cost of
computer logic and memory circuitry has been reducing rapidly.
Greater Operating Speed: The higher the density, the closer the logic or memory elements, which implies shorter electrical paths and hence higher operating speed.
The third generation computers mainly used SSI chips. One of the key concepts brought forward during this time was the concept of a family of compatible computers. IBM started this concept with its System/360 family.
A family of computers consists of several models. Each model is assigned a model number; for example, the IBM System/360 family had Models 30, 40, 50, 65 and 75. Memory capacity, processing speed and cost increase as we go up the ladder. However, a lower model is compatible with a higher model; that is, a program written on a lower model can be executed on a higher model without any change. Only the execution time is reduced as we move towards a higher model, and a higher model also has a larger number of instructions. The biggest advantage of this family system was flexibility in the selection of a model.
For example, if you had a limited budget and modest processing requirements, you could start with a relatively moderate model. As your business grows and your processing requirements increase, you can upgrade your computer to subsequent models depending on your need. Since you stay within the same computer family, you do not sacrifice the investment in already developed software, as it can still be used on the newer machines.
Let us summarise the main characteristics of a computer family. These are:
S.No. | Feature                        | Characteristic within the family
1.    | Instruction set                | Similar instructions. Normally, the instruction set of a lower-end member is a subset of that of a higher-end member. A program written on a lower-end member can be executed on a higher-end member, but a program written on a higher-end member may or may not get executed on lower-end members.
2.    | Operating system               |
3.    | Speed of instruction execution | Increases
4.    | Memory size                    | Increases
5.    | Cost                           | Increases
But how was the family concept implemented? Well, there were three main features of implementation.
The major developments that took place in the third generation can be summarised as follows.
Introduction to Digital
Circuits
1.5.5 Later Generations
One of the major milestones in IC technology was very large scale integration (VLSI), in which thousands of transistors can be integrated on a single chip. The main impact of VLSI was that it became possible to produce a complete CPU, a main memory, or another similar device on a single IC chip. This implied that mass production of CPUs, memories, etc. could be done at a very low cost. VLSI-based computer architecture is sometimes referred to as the fourth generation of computers.
The fourth generation is also coupled with parallel computer architectures. These computers had shared or distributed memory and specialised hardware units for floating point computation. In this era, multiprocessing operating systems, compilers, and special languages and tools were developed for parallel processing and distributed computing. The VAX 9000, CRAY X-MP and IBM/3090 were some of the systems developed during this era.
Fifth generation computers are also available presently. These computers mainly emphasise massively parallel processing, and they use high-density packaging and optical technologies. Discussion of such technologies is beyond the scope of this course.
However, let us discuss some of the important breakthroughs of VLSI technologies in this subsection:
Semiconductor Memories
Initially, IC technology was used for constructing processors, but it was soon realised that the same technology could be used for the construction of memory. The first memory chip was constructed in 1970 and could hold 256 bits. The cost of this first chip was high. The cost of semiconductor memory has gone down gradually, and presently IC RAMs are quite cheap. Although the cost has gone down, the memory capacity per chip has increased; at present we have reached 1 Gbit on a single memory chip. Many new RAM technologies are available presently. We will give more details on these technologies later in Block 2.
Microprocessors
Keeping pace with electronics, as more and more components were fabricated on a single chip, fewer chips were needed to construct a single processor. In 1971, Intel achieved the breakthrough of putting all the components of a processor on a single chip. The single-chip processor is known as a microprocessor. The Intel 4004 was the first microprocessor; it was a primitive microprocessor designed for a specific application. The Intel 8080, which came in 1974, was the first general-purpose microprocessor, meant for general-purpose programming. It was an 8-bit microprocessor. Motorola is another manufacturer in this area. At present, 32- and 64-bit general-purpose microprocessors are already in the market. Let us look into the development of two of the most important series of microprocessors.
S.No.  Processor    Year                 Memory size  Bus width        Comment
1.     4004         1971                 640 bytes    4 bits           First microprocessor; designed for a
                                                                      specific application.
2.     8080         1974                 64 KB        8 bits           First general-purpose microprocessor;
                                                                      used in the development of the first
                                                                      personal computer.
3.     8086         1978                 1 MB         16 bits          Supported an instruction cache (queue);
                                                                      the first powerful machine of the series.
4.     80386        1985-1988            4 GB         32 bits          First 32-bit processor (various
                                                                      versions); supports multitasking.
5.     80486        1989-1991            4 GB         32 bits          Uses powerful cache technology;
                                                                      supports pipelined instruction
                                                                      execution; contains a built-in math
                                                                      coprocessor for floating point
                                                                      instructions.
6.     Pentium      1993-1995            64 GB        32 and 64 bits   Uses superscalar techniques, that is,
                                                                      execution of multiple instructions in
                                                                      parallel.
7.     Pentium II   1997                 64 GB        64 bits
8.     Pentium III  1999                 64 GB        64 bits          Supports 3D graphics software.
9.     Pentium IV   2000                 64 GB        64 bits
10.    Itanium      2001                 64 GB        64 bits          Supports massively parallel computing
                                                                      architecture.
11.    Xeon         2001                 64 GB        64 bits          Hyper-threading.
Hyper-threading:
Non-threaded program instructions are executed in a single order at a time, till program completion. Suppose a program has four tasks, namely A, B, C and D, and assume that each task consists of 10 instructions, including a few I/O instructions. A simple sequential execution would require the sequence A, B, C, D.
In a threaded system, these tasks of a single process/program can be executed in parallel, provided there is no data dependency. Since there is only one processor, in a threaded system these tasks are executed as interleaved threads, for example 2 instructions of A, 3 instructions of B, 1 instruction of C, 4 instructions of D, 2 instructions of C, and so on till the completion of the threads.
Hyper-threading allows two threads, A and B, to execute at the same time. How? Some of the more important parts of the CPU are duplicated, so there exist two executing threads in the CPU at the exact same time. Please note that both these sections of the CPU work on the same memory space (as the threads belong to the same program). Eventually, dual CPUs will allow the computer to execute two threads of two separate programs at the same time.
Thus, hyper-threading technology allows a single microprocessor to appear as two separate threaded processors to the operating system and the application programs that use it.
Hyper-threading requires software that has multiple threads in order to optimise the speed of execution. A threaded program executes faster on a hyper-threaded machine. However, it should be noted that not all programs can be threaded.
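The interleaving just described can be sketched as a toy round-robin scheduler in Python. This is only an illustrative simulation (the function name, task contents and quantum are invented for the example), not a model of real hyper-threading hardware:

```python
# Toy illustration of interleaved (threaded) execution on one processor:
# tasks A-D each hold a list of "instructions"; a round-robin scheduler
# runs `quantum` instructions of one task before switching to the next.
from collections import deque

def run_interleaved(tasks, quantum=2):
    queue = deque(tasks.items())
    trace = []                      # order in which instructions execute
    while queue:
        name, instrs = queue.popleft()
        chunk, rest = instrs[:quantum], instrs[quantum:]
        trace.extend(f"{name}{i}" for i in chunk)
        if rest:                    # task not finished: back of the queue
            queue.append((name, rest))
    return trace

tasks = {name: list(range(1, 4)) for name in "ABCD"}   # 3 instructions each
print(run_interleaved(tasks))
```

Instructions of A, B, C and D come out interleaved rather than A running to completion before B starts, which is the behaviour described above for a threaded system.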
The other architecture that has gained popularity over the last decade is the PowerPC family. These machines are based on reduced instruction set computer (RISC) technologies, which are finding application because of the simplicity of their instructions. You will learn more about RISC in Block 3 of this course. IBM made an alliance with Motorola and with Apple, which had used Motorola 68000 chips in its Macintosh computers, to create the PowerPC architecture. Some of the processors in this family are:
S.No.  Processor  Year  Bus Width
1.     601        1993  32 bits
2.     603/603e   1994  32 bits
3.     604/604e   1997  64 bits
4.     G3         1997  64 bits
5.     G4         1999  64 bits
6.     G6         2003  64 bits
VLSI technology is still evolving; more and more powerful microprocessors and more storage space are now being put on a single chip. One question which we have still not answered is: is there any classification of computers? Well, for quite some time computers have been classified under the following categories:
Micro-controllers
Micro-computers
Engineering workstations
Mini computers
Mainframes
Super computers
Network computers.
great heights. In fact, it has been stated that the embedded computing power available even in a car today is much more than what was available in the systems on the first lunar mission.
Microcomputers
A microcomputer's CPU is a microprocessor. Microcomputers are typically used as single-user computers, although present-day microcomputers are very powerful. They support highly interactive environments, especially graphical user interfaces such as Windows. These computers are popular for home and business applications. The microcomputer originated in the late 1970s. The first microcomputers were built around 8-bit microprocessor chips. What do we mean by an 8-bit chip? It means that the chip can retrieve instructions/data from storage and manipulate and process 8 bits of data at a time; in other words, the chip has a built-in 8-bit data transfer path.
An improvement on the 8-bit chip technology was seen in the early 1980s, when a series of 16-bit chips, namely the 8086 and 8088, were introduced by Intel Corporation, each one an advancement over the other.
The 8088 was an 8/16-bit chip, i.e., an 8-bit path is used to move data between the chip and primary storage (the external path), but processing is done within the chip using a 16-bit path (the internal path). The 8086 was a 16/16-bit chip, i.e., the internal and external paths were both 16 bits wide. Both these chips could support a primary memory of storage capacity 1 Megabyte (MB).
Similar to Intel's chip series is another popular chip series from Motorola. The first 16-bit microprocessor of this series was the MC 68000. It was a 16/32-bit chip and could support up to 16 MB of primary storage. An advancement over the 16/32-bit chips were the 32/32-bit chips. Some of the popular 32-bit chips were Intel's 80486 and the MC 68020.
Most of the popular microcomputers were developed around Intel's chips, while most of the minis and super-minis were built around Motorola's 68000 series chips. With the advancement of display and VLSI technology, microcomputers became available in very small sizes; examples are laptops, notebook computers, etc. Most of these are the size of a small notebook but have a capacity equivalent to that of an older mainframe.
Workstations
Workstations are used for engineering applications such as CAD/CAM, or for any other type of application that requires moderate computing power and relatively high-quality graphics capabilities. Workstations generally come with a high-resolution graphics screen, a large RAM, network support, a graphical user interface, and a mass storage device. Some special types of workstations come without a disk; these are called diskless terminals/workstations. Workstations are typically linked together to form a network. The most common operating systems for workstations are UNIX, Windows 2003 Server and Solaris.
Please note that in networking, a workstation means any computer connected to a local area network, whether it is actually a workstation or a personal computer.
Workstations may act as clients to server computers. A server is a computer that is optimised to provide services to other connected computers through a network. Servers usually have powerful processors, huge memory and large secondary storage space.
Minicomputers
The term minicomputer originated in the 1960s, when it was realised that many computing tasks do not require an expensive contemporary mainframe computer but can be solved by a small, inexpensive computer.
Minicomputers support a multi-user environment, with the CPU time being shared among multiple users. The main emphasis in such computers is on processing power rather than on interaction. Most present-day minicomputers have a proprietary CPU and operating system. Some common examples of minicomputers are the IBM AS/400 and the Digital VAX. The major use of a minicomputer is in data processing applications pertaining to departments/companies.
Mainframes
Mainframe computers are generally 32-bit machines or higher. They are suited to big organisations that manage high-volume applications. A few of the popular mainframe series were those of DEC, IBM, HP, ICL, etc. Mainframes are also used as central host computers in distributed systems. Libraries of application programs developed for mainframe computers are much larger than those for micro or minicomputers because of their evolution over several decades as families of computers. All these factors and many more make mainframe computers indispensable even with the popularity of microcomputers.
Supercomputers
At the upper end of the state of the art in mainframe machines are the supercomputers. These are amongst the fastest machines in terms of processing speed, and they use multiprocessing techniques, where a number of processors are used to solve a problem. A number of manufacturers dominate the supercomputer market: CRAY, IBM (the 3090 with vector facility), NEC, Fujitsu, and C-DAC with PARAM are some of them. Lately, a range of parallel computing products, which are multiprocessors sharing common buses, have been in use in combination with mainframe supercomputers. Supercomputers are reaching speeds well over 25000 million arithmetic operations per second. India has also announced its indigenous supercomputer. Supercomputers support solutions to number-crunching problems and are mainly used for weather forecasting, computational fluid dynamics, remote sensing, image processing, biomedical applications, etc. In India, we have one such mainframe supercomputer system, the CRAY X-MP/14, which is at present being used by the Meteorological Department.
Let us discuss the PARAM supercomputer in more detail.
PARAM is a high-performance, scalable, industry-standard computer. It has evolved from the concept of distributed scalable computers supporting massively parallel processing in a cluster of networked computers. PARAM's main advantage is its scalability: PARAM systems can be constructed to perform tera-scale floating point operations per second. It is a cost-effective computer and supports a number of application software packages.
PARAM is made using standard available components. It supports Sun's UltraSPARC series servers and the Solaris operating system. It is based on open environments and standard protocols, and it can execute any standard application available for the Sun Solaris system.
Some of the applications that have been designed to run in parallel computational mode on PARAM include numerical weather forecasting, seismic data processing, molecular modelling, finite element analysis and quantum chemistry.
It also supports many languages and software development platforms, such as: the Solaris 2.5.1 operating system on I/O and server nodes; FORTRAN 77, FORTRAN 90, C and C++ language compilers; tools for parallel program debugging, visualisation and parallel libraries; the Distributed Computing Environment; data warehousing tools, etc.
1.6 SUMMARY
1.7 SOLUTIONS / ANSWERS
1. i) False
ii) True
iii) True
iv) False
v) False, they may be acknowledged as per priority.
2.
3.
A machine that can be used for a variety of applications and is not modelled only for specific applications. von Neumann machines are general-purpose machines, since they can be programmed for any general application, while microprocessor-based control systems are not general-purpose machines, as they are specifically modelled as control systems.
2.
3.
Low cost
Increased operating speed
Reduction in size of the computers
Reduction in power and cooling requirements
Greater reliability
The concept of a family of computers was floated by the IBM 360 series, in which the features and cost increase from lower-end members to higher-end members.
Data Representation

2.0 Introduction
2.1 Objectives
2.2 Data Representation
2.3 Number Systems: A Look Back
2.4 Decimal Representation in Computers
2.5 Alphanumeric Representation
2.6 Data Representation for Computation
    2.6.1
    2.6.2
    2.6.3
    2.6.4
2.7 Summary
2.8 Solutions/Answers
2.0 INTRODUCTION
In the previous unit, you were introduced to the basic configuration of the computer system, its components and its working. The concept of instructions and their execution was also explained. In this unit, we will describe various types of binary notations that are used in contemporary computers for the storage and processing of data. Instructions and their execution will be discussed in detail in the later blocks.
The computer system is based on the binary system; therefore, we will be devoting this complete unit to the concepts of binary data representation in the computer system. This unit will re-introduce you to the number system concepts. The number systems defined in this unit include the binary, octal and hexadecimal notations. In addition, details of various number representations such as floating-point representation, BCD representation and character-based representations are described in this unit. Finally, error detection and correction codes are described.
2.1 OBJECTIVES
2.2 DATA REPRESENTATION
2.3 NUMBER SYSTEMS: A LOOK BACK
= 240 + 2
= (242)10
Conversion of a Decimal Number to a Binary Number: For converting a decimal number to a binary number, the integer and fractional parts are handled separately. Let us explain this with the help of an example:
Example 1: Convert the decimal number 43.125 to a binary number.
Solution:
Integer part = 43
Fraction = 0.125

Integer part:

Number  Quotient on division by 2  Remainder on division by 2
43      21                         1
21      10                         1
10      05                         0
05      02                         1
02      01                         0
01      00                         1

You will get the integer part of the number if you read the remainders from the bottom to the top: (43)10 = (101011)2.
Fraction part:

Fraction  On multiplication by 2  Integer part
0.125     0.250                   0
0.250     0.500                   0
0.500     1.000                   1

You will get the fractional part of the number if you read the integer parts of the products from the top to the bottom: (0.125)10 = (0.001)2. Thus, (43.125)10 = (101011.001)2.
One easy direct method of decimal-to-binary conversion for the integer part is to first write the place values as:

2^6  2^5  2^4  2^3  2^2  2^1  2^0
64   32   16   8    4    2    1

Step 1: Take the integer part, e.g. 43, and find the next lower or equal binary place value; in this example it is 32. Place a 1 at 32.
Step 2: Subtract the place value from the number; in this case subtract 32 from 43, which gives 11.
Step 3: Repeat the two steps above till you get 0 at Step 2.
Step 4: On getting a 0, put a 0 at all the remaining place values.

For 43:
43 - 32 = 11   (1 at 32)
11 - 8  = 3    (1 at 8)
3 - 2   = 1    (1 at 2)
1 - 1   = 0    (1 at 1)

Thus 43 is 101011 in binary: 1s at 32, 8, 2 and 1, and 0s at 16 and 4.
You can extend this logic to the fractional part also, but in reverse order. Try this method with several numbers; it is fast, and you will soon be accustomed to it and able to do the whole operation in a single pass.
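The two procedures above (repeated division for the integer part, repeated multiplication for the fraction) can be sketched in Python. The helper below is illustrative, and the `frac_bits` parameter simply caps the number of fractional digits produced:

```python
def decimal_to_binary(number, frac_bits=8):
    """Convert a non-negative decimal number to a binary string."""
    integer, fraction = int(number), number - int(number)
    int_bits = ""
    while integer > 0:
        int_bits = str(integer % 2) + int_bits  # remainders, read bottom-up
        integer //= 2
    int_bits = int_bits or "0"
    frac_out = ""
    for _ in range(frac_bits):
        if fraction == 0:
            break
        fraction *= 2
        frac_out += str(int(fraction))          # integer parts, read top-down
        fraction -= int(fraction)
    return int_bits + ("." + frac_out if frac_out else "")

print(decimal_to_binary(43.125))  # 101011.001, as in Example 1
```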
Conversion of Binary to Octal and Hexadecimal: The rules for these conversions are straightforward. For converting binary to octal, the binary number is divided into groups of three bits, which are then combined by place value to generate the equivalent octal number. For example, the binary number 1101011.00101 can be converted to octal as:
001 101 011 . 001 010
  1   5   3 .   1   2

(Please note that the number is unchanged even though we have added 0s to complete the grouping. Also note the style of grouping before and after the binary point: we group three digits from right to left before the point, and from left to right after the point.)
Thus, the octal number equivalent to the binary number 1101011.00101 is (153.12)8.
Similarly, the hexadecimal conversion can be made by grouping four binary digits and finding the equivalent hexadecimal digit for each group:

0110 1011 . 0010 1000
   6    B .    2    8     (11 in hexadecimal is B)

Thus, the equivalent hexadecimal number is (6B.28)H.
Conversely, we can conclude that a hexadecimal digit can be broken down into a string of binary of 4 places, and an octal digit into a string of binary of 3 place values. Figure 1 gives the binary equivalents of octal and hexadecimal numbers.
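The grouping rules can be sketched in Python. The helper below (an illustrative name, not from the text) pads the integer part on the left and the fractional part on the right, then maps each group to its digit; `group=3` gives octal and `group=4` gives hexadecimal:

```python
def group_convert(binary, group):
    """Convert a binary string such as '1101011.00101' by grouping bits."""
    digits = "0123456789ABCDEF"
    int_part, _, frac_part = binary.partition(".")
    # Pad to a multiple of `group`: zeros on the left before the point,
    # zeros on the right after it (neither changes the value).
    int_part = int_part.zfill(-(-len(int_part) // group) * group)
    frac_part = frac_part.ljust(-(-len(frac_part) // group) * group, "0")
    out = "".join(digits[int(int_part[i:i + group], 2)]
                  for i in range(0, len(int_part), group))
    if frac_part:
        out += "." + "".join(digits[int(frac_part[i:i + group], 2)]
                             for i in range(0, len(frac_part), group))
    return out

print(group_convert("1101011.00101", 3))  # 153.12
print(group_convert("1101011.00101", 4))  # 6B.28
```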
Octal Number  Binary        Hexadecimal Number  Binary
0             000           0                   0000
1             001           1                   0001
2             010           2                   0010
3             011           3                   0011
4             100           4                   0100
5             101           5                   0101
6             110           6                   0110
7             111           7                   0111
                            8                   1000
                            9                   1001
                            10                  1010
                            11                  1011
                            12                  1100
                            13                  1101
                            14                  1110
                            15                  1111

Figure 1: Binary-coded octal and hexadecimal numbers
1) i) 1100.1101
ii) 10101010
.....................................................................................................................................
.....................................................................................................................................
.....................................................................................................................................
.....................................................................................................................................
2) i) 23
ii) 49.25
iii) 892
.....................................................................................................................................
.....................................................................................................................................
.....................................................................................................................................
.....................................................................................................................................
3)
2.4 DECIMAL REPRESENTATION IN COMPUTERS
The binary number system is the most natural one for the computer because of the two stable states of its components. But, unfortunately, this is not a very natural system for us, as we work with the decimal number system. So, how does the computer perform arithmetic? One solution that is followed in most computers is to convert all input values to binary; the computer then performs the arithmetic operations and finally converts the results back to decimal numbers so that we can interpret them easily. Is there any alternative to this scheme? Yes, there exists an alternative way of performing computation in decimal form, but it requires that the decimal numbers be coded suitably before performing these computations. Normally, the decimal digits are coded in 7-8 bits as alphanumeric characters, but for the purpose of arithmetic calculations each decimal digit is treated as a four-bit binary code.
As we know, 2 binary bits can represent 2^2 = 4 different combinations, 3 bits can represent 2^3 = 8 combinations, and similarly 4 bits can represent 2^4 = 16 combinations. To represent the decimal digits in binary form we require 10 combinations, so we need a 4-bit code. One simple representation is to use the first ten binary combinations to represent the ten decimal digits; these are popularly known as Binary Coded Decimals (BCD). Figure 2 shows the binary coded decimal numbers.
Decimal  Binary Coded Decimal
0        0000
1        0001
2        0010
3        0011
4        0100
5        0101
6        0110
7        0111
8        1000
9        1001
10       0001 0000
11       0001 0001
12       0001 0010
13       0001 0011
..
20       0010 0000
..
30       0011 0000
..
43.125   0100 0011.0001 0010 0101

Figure 2: Binary Coded Decimal numbers
Compare the equivalent BCD with the equivalent binary value; both are different.
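The difference is easy to see in a short Python sketch (the helper name is invented for the example): BCD encodes each decimal digit in its own 4-bit group, which is not the same as the pure binary value of the number:

```python
def to_bcd(decimal_string):
    """Encode each decimal digit as its own 4-bit group (BCD)."""
    return " ".join(format(int(d), "04b") for d in decimal_string)

print(to_bcd("12"))     # 0001 0010  (BCD for 12)
print(format(12, "b"))  # 1100       (pure binary for 12)
```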
2.5 ALPHANUMERIC REPRESENTATION
But what about alphabets and special characters like +, -, *, etc.? How do we represent these in a computer? A set containing the alphabets (in both cases), the decimal digits (10 in number) and special characters (roughly 10-15 in number) consists of at least 70-80 elements.
ASCII
One standard code that allows language encoding and is popularly used is ASCII (American Standard Code for Information Interchange). This code uses 7 bits.
Punctuation characters
Lowercase characters
There are several different variants of EBCDIC. Most of these differ in the
punctuation coding. More details on EBCDIC codes can be obtained from further
reading and web pages on EBCDIC.
Comparison of ASCII and EBCDIC
latest client-server software. Unicode is required by modern standards such as XML, Java, JavaScript, CORBA 3.0, etc. It is supported by many operating systems and almost all modern web browsers. Unicode includes the character set of Devanagari. The emergence of the Unicode Standard, and the availability of tools supporting it, is among the most significant recent global software technology trends.
ISCII is an eight-bit code that contains the standard ASCII values up to 127; from 128 to 225 it contains the characters required in the ten Brahmi-based Indian scripts. It is defined in the IS 13194:1991 BIS standard. It supports the INSCRIPT keyboard, which provides a logical arrangement of vowels and consonants based on the phonetic properties and usage frequencies of the letters of the Brahmi scripts, thus allowing the use of an existing English keyboard for Indian-language input. Any software that uses ISCII codes can be used in any Indian script, enhancing its commercial viability. It also allows transliteration between different Indian scripts through a change of display mode.
2.6 DATA REPRESENTATION FOR COMPUTATION
As discussed earlier, binary codes exist for any basic representation. Binary codes can be formulated for any set of discrete elements, e.g. colours, the spectrum, the musical notes, chessboard positions, etc. In addition, these binary codes are also used to formulate instructions, which are an advanced form of data representation. We will discuss instructions in more detail in the later blocks. But the basic question which remains to be answered is:
How are these codes actually used to represent data for scientific calculations?
The computer is a discrete digital device and stores information in flip-flops (see Units 3 and 4 of this block for more details), which are two-state devices, in binary form. The basic requirements of computational data representation in binary form are:
Representation of sign
Representation of magnitude
If the number is fractional, the binary or decimal point, and
Exponent
The solution to sign representation is easy: since the sign can be either positive or negative, one bit can be used to represent it. By default it should be the leftmost bit (in most machines it is the Most Significant Bit).
Thus, a number of n bits can be represented as an (n+1)-bit number, where the (n+1)th bit is the sign bit and the rest of the n bits represent its magnitude (please refer to Figure 3).
Figure 3: An (n + 1)-bit number
The decimal point can be represented by a position between the flip-flops (storage cells in the computer). But how can one determine this position? To simplify the representation, two methods were suggested: (1) fixed point representation, where the binary or decimal point is assumed to be either at the beginning or at the end of a number; and (2) floating point representation, where a second register is used to keep the value of an exponent that determines the position of the binary or decimal point in the number.
But before discussing these two representations, let us first discuss the term complement of a number. Complements may be used to represent negative numbers in digital computers.
Complement: There are two types of complements for a number of base (also called radix) r. These are called the r's complement and the (r-1)'s complement. For example, for decimal numbers the base is 10; therefore the complements will be the 10's complement and the (10-1) = 9's complement. For binary numbers we talk about the 2's and 1's complements. But how does one obtain complements, and what do these complements mean? Let us discuss these issues with the help of the following example:
Example 2: Find the 9's complement and 10's complement of the decimal number 256.
Solution:
9's complement: The 9's complement is obtained by subtracting each digit of the number from 9 (the highest digit value). Let us assume that we want to represent a maximum of a four-decimal-digit number range. The 9's complement can be used for BCD numbers.
9's complement of 0256:

  9 9 9 9
- 0 2 5 6
---------
  9 7 4 3

Thus, the 9's complement of 0256 is 9743.
Similarly, for obtaining the 1's complement of a binary number, we have to subtract each binary digit of the number from the digit 1.
10's complement: Adding 1 to the 9's complement produces the 10's complement:
10's complement of 0256 = 9743 + 1 = 9744
Please note that on adding the number and its 9's complement we get 9999 (the maximum possible number that can be represented in the four-decimal-digit number range), while on adding the number and its 10's complement we get 10000 (the number just beyond the range, which cannot be represented in the four-digit representation).
Example 3: Find the 1's and 2's complements of 1010 using only a four-digit representation.
Solution:
1's complement: The 1's complement of 1010 is obtained by subtracting each digit from 1:

  1 1 1 1
- 1 0 1 0
---------
  0 1 0 1

The number is 1010 and its 1's complement is 0101. Please note that wherever the number has a digit 1, the complement contains 0 for that digit, and vice versa. In other words, to obtain the 1's complement of a binary number we only have to change all the 1s of the number to 0s and all the 0s to 1s. This can be done by complementing each bit of the binary number.
2's complement: Adding 1 to the 1's complement produces the 2's complement:

The number is          1010
The 1's complement is  0101
The 2's complement is  0101 + 1 = 0110

The 2's complement can also be obtained directly: do not complement the least significant zeros till the first 1 is encountered; this 1 is also not complemented; after this 1, all the remaining bits on the left are complemented. For the number 1010, the rightmost 0 and the first 1 are copied unchanged (10), and the remaining bits on the left (10) are complemented to 01, giving the 2's complement 0110, the same result as above.
2.6.1 Fixed Point Representation
Fixed-point numbers in binary use a sign bit. A positive number has a sign bit 0, while a negative number has a sign bit 1. In fixed-point numbers we assume that the position of the binary point is at the end, that is, after the least significant bit. This implies that all the represented numbers are integers. A negative number can be represented in one of the following ways:
Signed magnitude representation

Decimal Number  Sign Bit  Magnitude (7 bits)
+6              0         000 0110
-6              1         000 0110

Signed 1's complement representation

Decimal Number  Sign Bit  Magnitude / 1's complement for negative number (7 bits)
+6              0         000 0110
-6              1         111 1001

For a negative number, take the 1's complement of all the bits (including the sign bit) of the positive number.

Signed 2's complement representation

Decimal Number  Sign Bit  Magnitude / 2's complement for negative number (7 bits)
+6              0         000 0110
-6              1         111 1010

For a negative number, take the 2's complement of all the bits (including the sign bit) of the positive number.
Arithmetic addition
The complexity of arithmetic addition depends on the representation that has been followed. Let us discuss this with the help of the following example.
Example 4: Add 25 and -30 in binary using 8-bit registers, using signed magnitude, signed 1's complement and signed 2's complement representations.
Solution: In signed magnitude representation:

Number  Sign Bit  Magnitude
+25     0         001 1001
-25     1         001 1001
+30     0         001 1110
-30     1         001 1110
To do arithmetic addition with one negative number in this representation, we have to check the magnitudes of the numbers. The number having the smaller magnitude is then subtracted from the bigger number, and the sign of the bigger number is selected for the result. The implementation of such a scheme in digital hardware requires a long sequence of control decisions as well as circuits that can add, compare and subtract numbers. Is there a better alternative than this scheme? Let us first try the signed 2's complement.
In signed 2's complement representation:

Number  Sign Bit  Magnitude
+25     0         001 1001
-25     1         110 0111
+30     0         001 1110
-30     1         110 0010
Addition of two positive numbers:
  +25    0 001 1001
  +30    0 001 1110
  +55    0 011 0111
Perform simple binary addition; there is no carry into the sign bit and no carry out of the sign bit.

Addition of a smaller positive and a larger negative number:
  +25    0 001 1001
  -30    1 110 0010
  -05    1 111 1011
The result is negative and is in 2's complement form. Taking the 2's complement of the result gives the positive value 0 000 0101 = +05, so the result is -05.

Addition of a larger positive and a smaller negative number:
  -25    1 110 0111
  +30    0 001 1110
  +05    0 000 0101
A carry out of the sign bit is generated and is discarded.

Addition of two negative numbers:
  -25    1 110 0111
  -30    1 110 0010
  -55    1 100 1001
The carry out of the sign bit is discarded. The result is negative; its 2's complement, 0 011 0111 = +55, confirms that the result is -55.
Please note how easy it is to add two numbers using signed 2's complement. This procedure requires only one control decision and only one circuit for adding the two numbers. But it puts an additional condition: the negative numbers should be stored in signed 2's complement notation in the registers. This can be achieved by complementing the positive number bit by bit and then incrementing the result by 1 to get the signed 2's complement.
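The single-circuit behaviour of signed 2's complement addition can be mimicked in Python (an illustrative sketch, assuming 8-bit registers as in Example 4): add, keep only the low 8 bits (which discards any carry out of the sign bit), then reinterpret the sign bit:

```python
def add_signed_8bit(a, b):
    """Add two integers as 8-bit signed 2's complement values."""
    result = (a + b) & 0xFF      # keep 8 bits: carry out of sign is discarded
    if result & 0x80:            # sign bit set: the value is negative
        result -= 256            # reinterpret the bit pattern as negative
    return result

print(add_signed_8bit(25, -30))   # -5
print(add_signed_8bit(-25, 30))   # 5
print(add_signed_8bit(-25, -30))  # -55
```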
Signed 1's complement representation

In signed 1's complement, -25 is 1 110 0110 and -30 is 1 110 0001. If a carry is generated out of the sign bit, it is added back to the least significant bit of the sum (the end-around carry) and then discarded.

Addition of two positive numbers:
  +25    0 001 1001
  +30    0 001 1110
  +55    0 011 0111

Addition of a smaller positive and a larger negative number:
  +25    0 001 1001
  -30    1 110 0001
  -05    1 111 1010
The result is negative, in 1's complement form; its 1's complement is 0 000 0101 = +05, so the result is -05.

Addition of a larger positive and a smaller negative number:
  -25    1 110 0110
  +30    0 001 1110
  +05    0 000 0101
A carry is generated out of the sign bit; it is added to the sum and then discarded: 0 000 0100 + 1 = 0 000 0101.

Addition of two negative numbers:
  -25    1 110 0110
  -30    1 110 0001
  -55    1 100 1000
The end-around carry is added to the sum and discarded. The result, in 1's complement form, is 1 100 1000; its 1's complement is 0 011 0111 = +55, so the result is -55.
In signed magnitude and signed 1's complement there are two representations of zero:

       Signed magnitude   Signed 1's complement
+0     0 000 0000         0 000 0000
-0     1 000 0000         1 111 1111

But in signed 2's complement there is just one zero, and there is no positive or negative zero:
+0 in 2's complement notation: 0 000 0000
Taking the 2's complement of 0 000 0000 produces 1 0000 0000; the carry out of the sign bit is discarded, giving 0 000 0000 again.
The range of numbers that can be represented in 8 bits is:
Signed magnitude: -(2^7 - 1) to (2^7 - 1) = -(128 - 1) to (128 - 1) = -127 to +127
Signed 1's complement: -127 to +127
But for signed 2's complement we can represent -128 to +127. The -128 is represented in signed 2's complement notation as 1000 0000.
Arithmetic Subtraction: The subtraction can be easily done using the 2s
complement by taking the 2s complement of the value that is to be subtracted
(inclusive of sign bit) and then adding the two numbers.
Signed 2s complement provides a very simple way of adding and subtracting two numbers. Thus, many computers (including the IBM PC) adopt signed 2s complement notation. The reason signed 2s complement is preferred over signed 1s complement is that it has only one representation for zero.
Overflow: An overflow is said to have occurred when the sum of two n-digit numbers occupies n + 1 digits. This definition is valid for both binary as well as decimal digits. Consider the addition of +65 and +75 in signed 2s complement (one sign bit plus seven magnitude bits):

 Decimal   Sign bit   2s complement magnitude
 +65       0          100 0001
 +75       0          100 1011
 +140      1          000 1100

The expected result is +140, but the binary sum is a negative number equal to -116, which obviously is a wrong result. This has occurred because of overflow. Similarly, for two negative numbers:

 Decimal   Sign bit   2s complement magnitude
 -65       1          011 1111
 -75       1          011 0101
 -140      0          111 0100   (the carry out of the sign bit is discarded)

Here the sum reads as +116, which again is wrong. Thus, overflow has occurred, i.e. the arithmetic results so calculated have exceeded the capacity of the representation, and the calculated results are erroneous. Overflow can be detected by comparing the carry into the sign bit with the carry out of the sign bit: if the two differ, overflow has occurred.
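The carry-based overflow check described above can be sketched as follows. This is an illustrative Python fragment, assuming 8-bit registers as in the examples.

```python
# Sketch: detect 2's complement overflow by comparing the carry into
# the sign bit with the carry out of the sign bit (assumed 8-bit width).

BITS = 8

def add_with_overflow(a, b):
    mask = (1 << BITS) - 1
    ua, ub = a & mask, b & mask
    total = ua + ub
    carry_out = (total >> BITS) & 1
    # Carry into the sign bit: add everything except the sign bits.
    low_mask = (1 << (BITS - 1)) - 1
    carry_in = ((ua & low_mask) + (ub & low_mask)) >> (BITS - 1)
    result = total & mask
    signed = result - (1 << BITS) if result >> (BITS - 1) else result
    return signed, carry_in != carry_out

print(add_with_overflow(65, 75))    # (-116, True): overflow, as in the text
print(add_with_overflow(-65, -75))  # (116, True): overflow again
print(add_with_overflow(25, 30))    # (55, False): no overflow
```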
2.6.2 Decimal Fixed Point Representation

The purpose of this representation is to keep the number in decimal equivalent form rather than in binary as above. A decimal digit is represented as a combination of four bits; thus, a four-digit decimal number will require 16 bits for the decimal digits and 1 additional bit for the sign. Normally, to keep the convention of one decimal digit to 4 bits, the sign is also assigned a 4-bit code. This code can be a bit combination which has not been used to represent a decimal digit, e.g. 1100 may represent plus and 1101 may represent minus.
For example, the decimal number -2156 can be represented as:

 1101   0010  0001  0101  0110
 Sign    2     1     5     6
Although this scheme wastes a considerable amount of storage space, it does not require conversion of a decimal number to binary. Thus, it can be used at places where the amount of computer arithmetic is less than the amount of input/output of data, e.g. calculators or business data processing situations. The arithmetic in decimal can also be performed as in binary, except that instead of signed 1s complement, signed 9s complement is used, and instead of signed 2s complement, signed 10s complement is used. More details on decimal arithmetic are available in the further readings.
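The sign-plus-BCD scheme can be sketched as follows. This is a hypothetical helper, not part of the text; the 1100/plus and 1101/minus codes are the example convention given above.

```python
# Sketch of the sign-plus-BCD encoding described above, using the
# text's example convention: 1100 for plus, 1101 for minus.

PLUS, MINUS = '1100', '1101'
DIGIT = {str(d): format(d, '04b') for d in range(10)}   # '0' -> '0000', ...

def to_bcd(number):
    """Return the sign code followed by one 4-bit group per digit."""
    sign = MINUS if number < 0 else PLUS
    return ' '.join([sign] + [DIGIT[d] for d in str(abs(number))])

print(to_bcd(-2156))   # 1101 0010 0001 0101 0110
print(to_bcd(23))      # 1100 0010 0011
```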
Check Your Progress 2

1) Write the BCD equivalent for the three numbers given below:
   i)   23
   ii)  49.25
   iii) 892

2) Find the 1s and 2s complement of the following 8-bit numbers:
   i)   10100010
   ii)  00000000
   iii) 11001100

3) Add +50 and -5 in signed 2s complement notation.
2.6.3 Floating Point Representation

Floating-point number representation consists of two parts. The first part of the number is a signed fixed-point number, which is termed the mantissa, and the second part specifies the decimal or binary point position and is termed the exponent. The mantissa can be an integer or a fraction. Please note that the position of the decimal or binary point is assumed; it is not a physical point, so wherever we represent a point it is only an assumed position.
Example 1: A decimal number +12.34 in a typical floating point notation can be represented in either of the following two forms:

 +0.1234 x 10^+2   (mantissa +0.1234, exponent +2)
 +1234  x 10^-2    (mantissa +1234, exponent -2)

This number in either of the above forms (if represented in BCD) requires 17 bits for the mantissa (1 for sign and 4 for each decimal digit as BCD) and 9 bits for the exponent (1 for sign and 4 for each decimal digit as BCD). Please note that the exponent indicates the correct decimal location. In the first case the exponent +2 indicates that the actual position of the decimal point is two places to the right of the assumed position, while in the second case the exponent -2 indicates that the actual position of the point is two places to the left of the assumed position. The assumed position of the point is normally the same throughout a computer, resulting in a consistent computational environment.
Floating-point numbers are often represented in normalised form. A floating point number whose mantissa does not contain zero as the most significant digit of the number is considered to be in normalised form. For example, a BCD mantissa +370, which is 0 0011 0111 0000, is in normalised form, because those leading zeros are part of the BCD digit 3 and not of a zero digit. On the other hand, the binary number 0 01100 is not in normalised form. The normalised form of this number is:

 0        1100
 Sign     Normalised Mantissa
A zero cannot be normalised, as all the digits of the mantissa in this case have to be zero. Arithmetic operations involving floating point numbers are more complex in nature, take a longer time for execution and require complex hardware. Yet floating-point representation is a must, as it is useful in scientific calculations. Real numbers are normally represented as floating point numbers.
The following figure shows a format of a 32-bit floating-point number:

 Bit:   0      1 ... 8      9 ... 31
        Sign   Exponent     Significand = 23 bits

Since the mantissa of a normalised binary number starts with a most significant bit that cannot be zero, it has to be 1. Thus, it is not necessary to store this first bit, and it is assumed implicitly for the number. Therefore, a 23-bit significand field can represent a 23 + 1 = 24 bit mantissa in this representation.
In floating point numbers, the basic trade-off is between the range of the numbers and the accuracy, also called the precision, of the numbers. If we increase the exponent bits in the 32-bit format, the range can be increased; however, the accuracy of numbers will go down, as the size of the mantissa becomes smaller. Let us take an example which will clarify the term precision. Suppose we have a one-bit binary mantissa; then we can represent only 0.10 and 0.11 in normalised form as in the example above (having an implicit 1). Values such as 0.101, 0.1011 and so on cannot be represented as complete numbers. Either they have to be approximated or truncated, and will be represented as either 0.10 or 0.11. Thus, a truncation or round-off error is created. The higher the number of bits in the mantissa, the better the precision.

In floating point numbers, increasing both precision and range needs more bits. This can be achieved by using double precision numbers. A double precision format is normally of 64 bits.

The Institute of Electrical and Electronics Engineers (IEEE), a society which has created many standards regarding various aspects of computers, has created IEEE standard 754 for floating-point representation and arithmetic. The basic objective of developing this standard was to facilitate the portability of programs from one computer to another. This standard has resulted in the development of standard numerical capabilities in various microprocessors. This representation is shown in figure 6.
Single precision (32 bits):

 Bit:   0    1 ... 8         9 ... 31
        S    Exponent (E)    Significand (N)

Double precision (64 bits):

 Bit:   0    1 ... 11        12 ... 63
        S    Exponent (E)    Significand (N)

Figure 7 gives the floating-point numbers specified by the IEEE Standard 754.
Single Precision Numbers (32 bits)

 Exponent (E)   Significand (N)   Value / Comments
 255            Not equal to 0    Do not represent a number (NaN)
 255            0                 -infinity or +infinity, depending on
                                  the sign bit
 0 < E < 255    Any               (1.N) x 2^(E-127)
                                  For example, if S is zero (a positive
                                  number), N = 101 (rest 20 zeros) and
                                  E = 207, then the number is
                                  +(1.101) x 2^(207-127) = +1.101 x 2^80
 0              Not equal to 0    (0.N) x 2^(-126), a denormalised number
 0              0                 -0 or +0, depending on the sign bit

Double Precision Numbers (64 bits)

 Exponent (E)    Significand (N)   Value / Comments
 2047            Not equal to 0    Do not represent a number (NaN)
 2047            0                 -infinity or +infinity, depending on
                                   the sign bit
 0 < E < 2047    Any               (1.N) x 2^(E-1023)
 0               Not equal to 0    (0.N) x 2^(-1022), a denormalised number
 0               0                 -0 or +0, depending on the sign bit

Figure 7: Floating point numbers as per IEEE standard 754
Please note that IEEE standard 754 specifies a plus zero and a minus zero, and a plus infinity and a minus infinity. Floating point arithmetic is more involved than fixed point arithmetic. For floating point addition and subtraction we have to follow these steps: check for zero operands, align the exponents (shifting the significand of the smaller number), add or subtract the significands, and finally normalise the result.
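The single precision rules tabulated in figure 7 can be sketched as a small decoder. This is an illustration, not part of the text; the bias of 127, the implicit leading 1 and the denormalised case follow the table above.

```python
# Sketch: decode a 32-bit pattern by the IEEE 754 single precision
# rules above (bias 127, implicit leading 1, denormals when E = 0,
# infinities and NaNs when E = 255).

def decode_single(bits):
    s = (bits >> 31) & 1
    e = (bits >> 23) & 0xFF
    n = bits & 0x7FFFFF
    sign = -1.0 if s else 1.0
    if e == 255:
        return float('nan') if n else sign * float('inf')
    if e == 0:
        return sign * (n / 2**23) * 2.0**(-126)       # (0.N) x 2^-126
    return sign * (1 + n / 2**23) * 2.0**(e - 127)    # (1.N) x 2^(E-127)

print(decode_single(0x3F800000))   # 1.0
print(decode_single(0xC0000000))   # -2.0
print(decode_single(0x7F800000))   # inf
```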
2.6.4 Error Detection and Correction Codes

Before we wind up data representation in the context of today's computers, we must discuss the codes which help in the recognition and correction of errors. Computers are electronic machines; therefore, there is a possibility of errors during data transmission. Such errors may result from disturbances in the transmission media or the external environment. But what is an error in a binary bit? An erroneous bit changes from 0 to 1 or from 1 to 0. One of the simplest error detection codes is called the parity bit.
Parity bit: A parity bit is an error detection bit added to binary data such that it makes the total number of 1s in the data either odd or even. For example, to the seven-bit data 0110101 an 8th bit, the parity bit, may be added. If the added bit is an even parity bit, then its value should be zero, as four 1s already exist in the 7-bit number. If we are adding an odd parity bit, then it will be 1, since we already have four 1 bits in the number and on adding the 8th (parity) bit as 1 we make the total number of 1s (which now includes the parity bit) 5, an odd number.

Similarly, for the data 0010101:
 Parity bit for even parity is 1
 Parity bit for odd parity is 0
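The parity computation above can be sketched as follows (a minimal illustration, not part of the text):

```python
# Sketch: compute the even and odd parity bits for a 7-bit data string,
# matching the two examples above.

def parity_bits(data):
    ones = data.count('1')
    even = ones % 2          # 1 exactly when another 1 is needed for an even count
    return str(even), str(1 - even)

print(parity_bits('0110101'))  # ('0', '1'): even parity 0, odd parity 1
print(parity_bits('0010101'))  # ('1', '0'): even parity 1, odd parity 0
```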
But how does the parity bit detect an error? We will discuss this issue in general as an error detection and correction system (refer to figure 8).
The Objective: Data should be transmitted between a source-destination pair reliably, indicating an error, or even correcting it, if possible.

The Process:

- An error detection function is applied on the data available at the source end; an error detection code is generated.
- The data and the error detection or correction code are stored together at the source.
- On receiving a data transmission request, the stored data, along with the stored error detection or correction code, are transmitted to the unit requesting the data (the destination).
- On receiving the data and the error detection/correction code from the source, the destination once again applies the same error detection/correction function as was applied at the source on the data received (but not on the error detection/correction code received from the source), and generates the destination error detection/correction code.
- The source and destination error codes are compared to flag or correct an error, as the case may be.
The parity bit is only an error detection code. The concept of an error detection and correction code has been developed using more than one parity bit. One such code is the Hamming error-correcting code.
Hamming Error-Correcting Code: Richard Hamming at Bell Laboratories devised this code. We will just introduce this code with the help of an example for 4-bit data.

Let us assume a four-bit number b4, b3, b2, b1. In order to build a simple error detection code that detects an error in one bit only, we may just add an odd parity bit. However, if we want to find which bit is in error, then we may have to use parity bits for various combinations of these 4 bits, such that a bit error can be identified uniquely. For example, we may create four parity sets as:

 Bits covered       Source Parity   Destination Parity
 b1, b2, b3         P1              D1
 b2, b3, b4         P2              D2
 b3, b4, b1         P3              D3
 b1, b2, b3, b4     P4              D4
Now, a very interesting phenomenon can be noticed in the above parity pairs. Suppose data bit b1 is in error on transmission; then it will cause a change in destination parities D1, D3 and D4.

 ERROR IN (one bit only)   Destination parities affected
 b1                        D1, D3, D4
 b2                        D1, D2, D4
 b3                        D1, D2, D3, D4
 b4                        D2, D3, D4

Thus, by simply comparing the parity bits of the source and destination, we can identify which of the four bits is in error. This bit can then be complemented to remove the error. Please note that even a source parity bit can be in error on transmission; however, under the assumption that only one bit (whether data or parity) is in error, this will be detected, as only one destination parity will differ.
What should be the length of the error detection code that detects an error in one bit? Before answering this question, we have to look into the comparison logic of error detection. The error detection is done by comparing the two i-bit error detection and correction codes fed to the comparison logic bit by bit (refer to figure 8). Let us have comparison logic which produces a zero if the compared bits are the same, or else produces a one.

Therefore, if similar position bits are the same, then we get a zero at that bit position; but if they are different, that is, this bit position may point to some error, then this particular bit position will be marked as one. This way a matching word is constructed. This matching word is i bits long and therefore can represent 2^i values or combinations.

For example, a 4-bit matching word can represent 2^4 = 16 values, which range from 0 to 15 as:

 0000, 0001, 0010, 0011, 0100, 0101, 0110, 0111,
 1000, 1001, 1010, 1011, 1100, 1101, 1110, 1111

The value 0000 (or 0) represents no error, while the other 2^i - 1 values (for 4 bits, 2^4 - 1 = 15, that is, from 1 to 15) represent an error condition. Each of these 2^i - 1 (or 15 for 4 bits) values can be used to represent an error of a particular bit. Since the error can occur during the transmission of the N-bit data plus the i-bit error correction code, we need at least N + i error values to represent them. Therefore, the number of error correction bits should be found from the following equation:

 2^i - 1 >= N + i

For an 8-bit word (N = 8):

 2^i - 1 >= 8 + i

At i = 3: LHS = 2^3 - 1 = 7; RHS = 8 + 3 = 11, so the condition is not satisfied.
At i = 4: LHS = 2^4 - 1 = 15; RHS = 8 + 4 = 12, and the condition is satisfied.

Therefore, for an eight-bit word, we need at least a four-bit error correction code for detecting and correcting errors in a single bit during transmission.

Similarly, for a 16-bit word we need i = 5:

 2^5 - 1 = 31 and 16 + i = 16 + 5 = 21

So for a 16-bit word we need five error correcting bits.
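The inequality 2^i - 1 >= N + i can be solved by a simple search (an illustrative sketch, not part of the text):

```python
# Sketch: find the minimum number of check bits i satisfying
# 2^i - 1 >= N + i, as derived above.

def min_check_bits(n_data_bits):
    i = 1
    while (1 << i) - 1 < n_data_bits + i:
        i += 1
    return i

print(min_check_bits(8))    # 4, as computed for the 8-bit word
print(min_check_bits(16))   # 5, as computed for the 16-bit word
```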
Let us explain this with the help of an example. Assume the 4-bit data 1010, that is, b4 = 1, b3 = 0, b2 = 1, b1 = 0. Using odd parity, the logic is shown in the following table:

Source:

 b4   b3   b2   b1   P1            P2            P3            P4
                     (b1,b2,b3)    (b2,b3,b4)    (b3,b4,b1)    (b1,b2,b3,b4)
 1    0    1    0    0             1             0             1

Destination (data received correctly):

 b4   b3   b2   b1   D1            D2            D3            D4
                     (b1,b2,b3)    (b2,b3,b4)    (b3,b4,b1)    (b1,b2,b3,b4)
 1    0    1    0    0             1             0             1

Since every Di matches the corresponding Pi, no error is flagged.
Now let us take the case when the data received is correct, but on receipt one of the parity bits, say P4, becomes 0. Please note that in this case, since the data is 1010, the destination parity bits will be D1 = 0, D2 = 1, D3 = 0, D4 = 1. Thus, the P1-D1, P2-D2 and P3-D3 pairs will be the same, but the P4-D4 pair differs. This does not belong to any of the combinations in figure 9. Thus we conclude that the P4 received is wrong.

Please note that all the above cases will fail in case the error is in more than one bit. Let us see by extending the above example.
Normally, a Single Error Correction (SEC) code is used in semiconductor memories for the correction of single-bit errors; however, it is supplemented with an added feature for the detection of errors in two bits. This is called a SEC-DED (Single Error Correction - Double Error Detection) code. This code requires an additional check bit in comparison to the SEC code. We will only illustrate the working principle of the SEC-DED code with the help of an example for a 4-bit data word. Basically, the SEC-DED code guards against the miscorrection that double-bit errors would cause in a plain SEC code.
Case 4: Error in two bits of the received data 1010. The destination parities D1 (b1, b2, b3), D2 (b2, b3, b4), D3 (b3, b4, b1) and D4 (b1, b2, b3, b4) are recomputed as before, together with the overall parity D5.
A D5-P5 mismatch indicates that there is a double-bit error, so do not try to correct the error; instead, ask the sender to send the data again. Hence the name single error correction but double error detection: this code corrects single-bit errors but only detects errors in two bits.
Check Your Progress 3

1) Represent the following binary numbers in IEEE 754 single precision floating point format:
   i)  1010.0001
   ii) -0.0000111

2) Find the even and odd parity bits for the following 7-bit data:
   i)   0101010
   ii)  0000000
   iii) 1111111
   iv)  1000100

3) Find the length of the SEC code and the SEC-DED code for a 16-bit word data transfer.
2.7 SUMMARY

2.8 SOLUTIONS/ANSWERS
Check Your Progress 1

1. (i) 1100.1101

 Bit:      1     1     0     0   .   1      1      0      1
 Weight:   2^3   2^2   2^1   2^0     2^-1   2^-2   2^-3   2^-4

 Thus, Integer = (1 x 2^3 + 1 x 2^2 + 0 x 2^1 + 0 x 2^0) = (2^3 + 2^2) = (8 + 4) = 12
 Fraction = (1 x 2^-1 + 1 x 2^-2 + 0 x 2^-3 + 1 x 2^-4) = 0.5 + 0.25 + 0.0625 = 0.8125

 So 1100.1101 is 12.8125 in decimal.
 ii) 10101010

 1 x 2^7 + 0 x 2^6 + 1 x 2^5 + 0 x 2^4 + 1 x 2^3 + 0 x 2^2 + 1 x 2^1 + 0 x 2^0
 = 128 + 32 + 8 + 2 = 170

 iii) 110001.01

 Integer part: 110001 = 32 + 16 + 1 = 49
 Fraction part: .01 = 1/4 = 0.25

 So 110001.01 is 49.25 in decimal.
2. (i) 23

 Decimal to hexadecimal:

 16 ) 23 ( 1
     -16
       7

 So 23 decimal is 17 in hexadecimal.

 Binary to hexadecimal: 23 is 1 0111 in binary; grouping into 4-bit groups gives
 0001 0111 = 17 in hex.

 (ii) 49.25, or 110001.010 in binary

 Decimal to hexadecimal. Integer part = 49:

 16 ) 49 ( 3
     -48
       1

 Integer part = 31 in hex.

 Fraction part: 0.25 x 16 = 4.0, so the fraction part is .4 in hex.

 Thus 49.25 decimal is 31.4 in hexadecimal.

 Binary to hexadecimal: 0011 0001 . 0100 = 31.4 in hex.
 (iii) 892

 16 ) 892 ( 55
     -880
       12 = C

 16 ) 55 ( 3
     -48
       7

 So 892 decimal is 37C in hexadecimal.

Check Your Progress 2

1. BCD equivalents:
 i)   23    = 0010 0011
 ii)  49.25 = 0100 1001 . 0010 0101
 iii) 892   = 1000 1001 0010
2. 1s and 2s complements:

 Number           1s complement   2s complement
 (i)   10100010   01011101        01011110
 (ii)  00000000   11111111        00000000
 (iii) 11001100   00110011        00110100
3. +50 is 0 0110010
   +5 is  0 0000101
   -5 is  1 1111011 (2s complement)

 Therefore, add +50 and -5:

     0 0110010
   + 1 1111011
   -----------
   1 0 0101101
   ^
   carry out (discard the carry)

 The result is 0 0101101 = +45.

 Carry into the sign bit = 1
 Carry out of the sign bit = 1
 The two are equal; therefore, no overflow.
 (ii) +45 is 0 0101101
      +65 is 0 1000001
      -65 is 1 0111111 (2s complement)

 Add +45 and -65:

   0 0101101
 + 1 0111111
 -----------
   1 1101100

 No carry into the sign bit and no carry out of the sign bit; therefore, no overflow.

 +20 is 0 0010100; therefore -20 is 1 1101100, so the result is -20.
 (iii) +75 is 0 1001011
       +85 is 0 1010101

   0 1001011
 + 0 1010101
 -----------
   1 0100000

 Carry into the sign bit = 1
 Carry out of the sign bit = 0
 The two differ; therefore, overflow has occurred and the result is wrong.
Check Your Progress 3

1. i) 1010.0001 = 1.0100001 x 2^3

 So, the single precision number is:
 Significand = 010 0001 0000 0000 0000 0000
 Exponent = 3 + 127 = 130 = 1000 0010
 Sign = 0

 So the number is: 0 1000 0010 010 0001 0000 0000 0000 0000

 ii) -0.0000111 = -1.11 x 2^-5

 Significand = 110 0000 0000 0000 0000 0000
 Exponent = 127 - 5 = 122 = 0111 1010
 Sign = 1

 So the number is: 1 0111 1010 110 0000 0000 0000 0000 0000
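These two answers can be cross-checked with Python's struct module, which packs IEEE 754 single precision values (a verification sketch, not part of the text):

```python
# Sketch: pack a value as IEEE 754 single precision and show its bits.

import struct

def single_bits(x):
    return format(struct.unpack('>I', struct.pack('>f', x))[0], '032b')

# i)  1010.0001 binary = 10.0625 decimal = 1.0100001 x 2^3
print(single_bits(10.0625))
# 01000001001000010000000000000000

# ii) -0.0000111 binary = -0.0546875 decimal = -1.11 x 2^-5
print(single_bits(-0.0546875))
# 10111101011000000000000000000000
```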
2.

 Data      Even parity bit   Odd parity bit
 0101010   1                 0
 0000000   0                 1
 1111111   1                 0
 1000100   0                 1

3.
 N = 16, i = ?

 So the equation is: 2^i - 1 >= 16 + i

 At i = 4: 2^4 - 1 = 15 >= 16 + 4 = 20? Not true.
 At i = 5: 2^5 - 1 = 31 >= 16 + 5 = 21? True; the condition is satisfied.

Although this condition will also be true for i > 5, we want only the minimum essential correction bits, which is 5.

For the SEC-DED code we require an additional bit as overall parity. Therefore, the SEC-DED code will be 6 bits long.
UNIT 3  PRINCIPLES OF LOGIC CIRCUITS I

Structure                                         Page Nos.
3.0  Introduction                                   60
3.1  Objectives                                     60
3.2  Logic Gates                                    60
3.3  Logic Circuits                                 62
3.4  Combinational Circuits                         63
3.5                                                 72
3.6  Examples of Combinational Circuits             73
     (Adders, Decoders, Multiplexer, Encoder,
      Programmable Logic Array, Read Only Memory (ROM))
3.7  Summary                                        82
3.8  Solutions/Answers                              82
INTRODUCTION

In the previous units, we discussed the basic configuration of a computer system (the von Neumann architecture), data representation and a simple instruction execution paradigm. But how does a computer actually perform computations? Now we will attempt to find the answer to this basic query. In this unit, you will be exposed to some of the basic components that form the most essential parts of a computer. You will come across terms like logic gates, binary adders, logic circuits, combinational circuits, etc. These circuits are the backbone of any computer system, and knowing them is quite essential. The characteristics of integrated digital circuits are also discussed in this unit.
3.1 OBJECTIVES

After going through this unit, you should be able to:

 define and describe some of the useful circuits of a computer system, such as multiplexers, decoders, ROM, etc.
3.2 LOGIC GATES
A logic gate is an electronic circuit which produces a typical output signal depending
on its input signal. The output signal of a gate is a simple Boolean operation of its
input signal. Gates are the basic logic elements that produce signals of binary 1 or 0.
We can represent any Boolean function in the form of gates.
In general, we can represent each gate through a distinct graphic symbol, and its operation can be given by means of an algebraic expression. To represent the input-output relationship of binary variables in each gate, truth tables are used. The notations and truth tables for the different logic gates are given in Figure 3.1.
Principles of Logic
Circuits I
The truth tables of NAND and NOR can be made from NOT (A AND B) and NOT (A OR B) respectively. Exclusive OR (XOR) is a special gate whose output is one only if the two inputs are not equal. The inverse of exclusive OR, called the XNOR gate, can act as a comparator: it produces a 1 output only if the two inputs are equal.

Digital circuits often use only one or two types of gates, for simplicity in fabrication. Therefore, one must think in terms of a functionally complete set of gates. What does a functionally complete set imply? A set of gates by which any Boolean function can be implemented is called a functionally complete set. The functionally complete sets are: [AND, NOT], [NOR], [NAND], [OR, NOT].
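The gates of figure 3.1 can be sketched as Boolean functions (an illustrative fragment, not part of the text):

```python
# Sketch: the basic gates as functions on bits 0/1, reproducing the
# truth tables that figure 3.1 describes.

def AND(a, b):  return a & b
def OR(a, b):   return a | b
def NOT(a):     return 1 - a
def NAND(a, b): return NOT(AND(a, b))
def NOR(a, b):  return NOT(OR(a, b))
def XOR(a, b):  return a ^ b            # 1 only when the inputs differ
def XNOR(a, b): return NOT(XOR(a, b))   # 1 only when the inputs are equal

for a in (0, 1):
    for b in (0, 1):
        print(a, b, AND(a, b), OR(a, b), NAND(a, b), NOR(a, b), XOR(a, b))
```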
3.3 LOGIC CIRCUITS

A Boolean function can be implemented as a logic circuit using the basic gates: AND, OR and NOT. Consider, for example, the Boolean function F(A, B, C) = A B + C. The relationship between this function and its binary variables A, B, C can be represented in a truth table as shown in figure 3.2(a), and figure 3.2(b) shows the corresponding logic circuit.

Thus, in a logic circuit, the variables appearing on the left hand side of the Boolean expression are the inputs to the circuit, and the variable function on the right hand side of the expression is taken as the output.

There is one important point to note here: there is only one way to represent a Boolean expression in a truth table, but it can be realised by a variety of logic circuits. How? [Try to find the answer.]
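The truth-table construction just described can be sketched for any Boolean function. The exact function of figure 3.2 is not assumed here; F(A, B, C) = A.B' + C below is used only as an example:

```python
# Sketch: enumerate the truth table of an arbitrary n-input Boolean
# function, as described in the text.

from itertools import product

def truth_table(f, n):
    """Return [(input_bits, output), ...] for all 2^n input rows."""
    return [(bits, f(*bits)) for bits in product((0, 1), repeat=n)]

F = lambda a, b, c: (a & (1 - b)) | c      # example function A.B' + C
for row, out in truth_table(F, 3):
    print(*row, '->', out)
```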
Check Your Progress 1
1)
What are logic gates, and which gates are called universal gates?
.....................................................................................................................................
.....................................................................................................................................
2)
.
.
.
3)
4)
.....................................................................................................................................
.....................................................................................................................................
5)
3.4 COMBINATIONAL CIRCUITS

The basic design issue related to combinational circuits is the minimisation of the number of gates. The normal circuit constraints for combinational circuit design are:

3.4.1 Canonical and Standard Forms

If a product term of an SOP expression contains every variable of that function, either in true or complement form, then it is defined as a Minterm or Standard Product. This minterm will be true only for one combination of input values of the variables. For example, in the SOP expression

 F(A, B, C) = A.B.C + A'.B'.C' + A.B

we have three product terms, namely A.B.C, A'.B'.C' and A.B. But only the first two of them qualify to be minterms, as the third does not contain the variable C or its complement.
The minterms and maxterms for three variables a, b, c are given below (a prime denotes the complement):

 a b c   Minterm    Representation   Maxterm        Representation
 0 0 0   a'b'c'     m0               a + b + c      M0
 0 0 1   a'b'c      m1               a + b + c'     M1
 0 1 0   a'b c'     m2               a + b' + c     M2
 0 1 1   a'b c      m3               a + b' + c'    M3
 1 0 0   a b'c'     m4               a' + b + c     M4
 1 0 1   a b'c      m5               a' + b + c'    M5
 1 1 0   a b c'     m6               a' + b' + c    M6
 1 1 1   a b c      m7               a' + b' + c'   M7
We can represent any Boolean function algebraically, directly in minterm and maxterm form, from the truth table. For minterms, consider each combination of variables that produces a 1 output in the function and then take the OR of all those terms. For example, the function F in figure 3.5 is represented in minterm form by ORing the terms where the output F is 1, i.e. a'b'c, a'bc', a'bc, abc' and abc.

 a   b   c   F   Minterm
 0   0   0   0   m0
 0   0   1   1   m1
 0   1   0   1   m2
 0   1   1   1   m3
 1   0   0   0   m4
 1   0   1   0   m5
 1   1   0   1   m6
 1   1   1   1   m7

Thus,

 F(a, b, c) = a'b'c + a'bc' + a'bc + abc' + abc
            = m1 + m2 + m3 + m6 + m7
            = Sigma (1, 2, 3, 6, 7)
 F'(a, b, c) = a'b'c' + ab'c' + ab'c

If we take the complement of F', we get the function F in maxterm form:

 F(a, b, c) = (F')' = (a'b'c' + ab'c' + ab'c)'
            = (a'b'c')' . (ab'c')' . (ab'c)'
            = (a + b + c)(a' + b + c)(a' + b + c')   [De Morgan's law]
            = M0 . M4 . M5
            = Pi (0, 4, 5)

The product symbol Pi stands for the ANDing of the maxterms.

Here, you will appreciate the fact that the terms which were missing in the minterm form are present in the maxterm form. Thus, if either form is known, the other form can be derived directly.

A Boolean function expressed as a sum of minterms or a product of maxterms has the property that each and every literal of the function is present in each and every term, in either normal or complemented form.
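Reading the Sigma and Pi index lists off a truth table, as done for F above, can be sketched as (an illustration, not part of the text):

```python
# Sketch: extract the minterm (sum) and maxterm (product) index lists
# of a Boolean function directly from its truth table.

from itertools import product

def canonical_indices(f, n):
    minterms, maxterms = [], []
    for i, bits in enumerate(product((0, 1), repeat=n)):
        (minterms if f(*bits) else maxterms).append(i)
    return minterms, maxterms

# F from figure 3.5: outputs 0,1,1,1,0,0,1,1 down the table
F = lambda a, b, c: [0, 1, 1, 1, 0, 0, 1, 1][(a << 2) | (b << 1) | c]
print(canonical_indices(F, 3))   # ([1, 2, 3, 6, 7], [0, 4, 5])
```

The two lists are complementary over 0..7, which is exactly the "missing minterms become maxterms" observation above.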
3.4.2 Minimisation of Gates

The commonly used methods of simplification are:

- Algebraic simplification
- Karnaugh maps
- The Quine-McCluskey method

Algebraic Simplification
Figure 3.6: Two logic diagrams for the same Boolean expression, (a) the original sum-of-products form and (b) its simplified form.

The logic diagram of the simplified expression is drawn in figure 3.6(b) using NOT, OR and AND gates (the same operation can be performed by using a single XOR gate). Thus, the number of gates is reduced to 5 (2 inverters, 2 AND gates and 1 OR gate) instead of 7 (3 inverters, 3 AND gates and 1 OR gate).
An algebraic function can appear in many different forms. Although a process of simplification exists, it is cumbersome because there are no rules for deciding which manipulation to apply next. The Karnaugh map is a simple, direct approach to the simplification of logic expressions.

Karnaugh Maps
Please note:

1) Decimal equivalents of the columns are given to help in understanding where the position of the respective set lies; they are not the values filled into the squares. A square can contain a one or nothing.
2) The 00, 01, 11, etc. written on the top imply the values of the respective variables.
3) Wherever the value of a variable is 0, it is said to represent its complement form.
4) The value of only one variable changes when we move from one row to the next row, or from one column to the next column.
Step 2: The next step in the Karnaugh map is to map the truth table into the map. The mapping is done by putting a 1 in the respective square belonging to each 1 value in the truth table. This mapped map is used to arrive at the simplified Boolean expression, which can then be used for drawing the optimal logic circuit. Step 2 will become clearer in the example.

Step 3:
Now, let us see how to use K-map simplification for finding the Boolean function for the case whose truth table is given in figure 3.8(a); figure 3.8(b) shows the K-map for it.

Example:

 F = Sigma (0, 1, 2, 6, 8, 9, 10)

Let us see which groups can be considered as adjacent in the Karnaugh map here. The groups are:
1) Four corners (squares 0, 2, 8, 10):

 (A'B'C'D' + A'B'CD') + (AB'C'D' + AB'CD')
 = A'B'D'(C' + C) + AB'D'(C' + C)        [as C + C' = 1]
 = A'B'D' + AB'D'
 = B'D'(A' + A)
 = B'D'

2) Squares 0, 1, 8, 9:

 (A'B'C'D' + A'B'C'D) + (AB'C'D' + AB'C'D)
 = A'B'C'(D' + D) + AB'C'(D' + D)
 = A'B'C' + AB'C'
 = B'C'

3) Squares 2, 6:

 A'B'CD' + A'BCD' = A'CD'(B' + B) = A'CD'

Thus, F = B'D' + B'C' + A'CD'
[Note: This expression can be directly obtained from the K-map after making the quads and doublets. Try to find how.]

The expressions so obtained through K-maps are in the sum of products form, i.e. each is expressed as a sum of products of the variables. Such an expression can also be expressed in product of sums form, but for this a special method is required [already discussed in the last section].

Let us see how we can modify K-map simplification to obtain the POS form. Suppose in the previous example, instead of combining the adjacent 1 squares we combine the adjacent 0 squares; then we will obtain the inverse function, and on taking the complement of this function we will get the POS form.

Another important aspect of this simple method of digital circuit design is the don't care conditions. These conditions further simplify the algebraic function. They imply that it does not matter whether the output produced is 0 or 1 for the specific input. Such conditions can occur when the number of input combinations is more than needed. For example, in calculations through BCD, where 4 bits are used to represent a decimal digit, we can represent 2^4 = 16 digits; but since we have only 10 decimal digits, 6 of those input combination values do not matter and are candidates for don't care conditions.

For practice you can do the exercises from the references [1], [2], [3] given in the block introduction.

What will happen if we have more than 4-6 variables? As the number of variables increases, K-maps become more and more cumbersome, as the number of possible combinations of inputs keeps on increasing.
Quine-McCluskey Method

A tabular method, known as the Quine-McCluskey method, was suggested to deal with the increasing number of variables. This method is suitable for programming and hence provides a tool for automating design in the form of minimising Boolean expressions.

The basic principle behind the Quine-McCluskey method is to remove the terms which are redundant and can be obtained from other terms.
To understand the Quine-McCluskey method, let us see the following example. Given F(A, B, C, D, E):
Step I:
Term/var
ABCDE
ABC D E
A B C DE
A BCD E
A B CD E
A B C DE
AB C DE
A B C DE
Checked/Unchecked
Form the pairs which differ in only one variable; also put a check mark against the terms selected, and find the resultant terms as follows:
Step II :
AB C D E
AB C D E
ABCE
A B CD E
AB CD E
BCDE
AB C DE
A BC DE
ACDE
AB C DE
A B C DE
BCDE
In the new terms, again find all the terms which differ in only one variable and put a check mark against those terms, i.e.
BC DE
B C DE
BCE
Step III :
ABCDE ABC D E
ABCE
A B C DE A B C D E
A B CD E
A B C DE
ABCDE
A CD E
B CE
A BC D
Thus all columns have mark X. Thus the final expression is:
F (A,B,C,D,E) =
A B C E + A C DE + B CE
Step I :
Step II: Check all the terms that differ in only one variable and then combine the pairs by removing the variable that differs in those terms. Thus a new table is formed. This process is repeated, if necessary, on the new table as well, until no further matches are left in the table.
Step III: Finally, a two-dimensional table is formed: all terms which were not eliminated in the tables form the rows, and all the original terms form the columns. At each intersection of a row and a column where the row term is a subset of the column term, a X is placed.
a)
b)
Step IV :
a)
b)
c)
1) Simplify the following Boolean expressions:
(i) A B C + A B C
(ii) (A + B) . (A + B)
2) Simplify the following functions:
(i) F = ((A . B) + B)
(ii) F = ((A . B) . (A B))
.....................................................................................................................................
3) Simplify the following Boolean functions in SOP and POS forms by means of K-maps. Also draw the logic diagram.
F (A, B, C, D) = Σ (0, 2, 8, 9, 10, 11, 14, 15)
.....................................................................................................................................
3.5
The digital circuits we use nowadays are constructed with NAND or NOR gates instead of AND, OR and NOT gates. NAND and NOR gates are called universal gates, as any digital system can be implemented with these gates alone. To prove this point we need only show that the basic gates (AND, OR and NOT) can be implemented with either NAND gates only or NOR gates only. This is shown in figure 3.9 below:
Figure 3.9 : Basic Logic Operations with NAND and NOR gates
Any Boolean expression can be implemented with NAND gates, by expressing the
function in sum of product form.
Example: Consider the function F (A, B, C) = Σ (1, 2, 3, 4, 5, 7). First bring it into SOP form. From the K-map shown in figure 3.10(a) we find
F (A, B, C) = C + AB' + A'B
= ( C' . (AB')' . (A'B)' )'
which is a two-level NAND-NAND realisation.
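To see that the NAND-only realisation computes the same function, here is a small Python sketch (illustrative only, not from the unit) that builds NOT, AND and OR from a single nand primitive and evaluates F:

```python
def nand(a, b):
    """The NAND primitive on bits 0/1."""
    return 1 - (a & b)

def not_(a):
    return nand(a, a)                       # NOT from NAND

def and_(a, b):
    return nand(nand(a, b), nand(a, b))     # AND = NAND followed by NOT

def or_(a, b):
    return nand(nand(a, a), nand(b, b))     # OR by De Morgan's law

def F(a, b, c):
    """F = C + A.B' + A'.B, built only from NAND-derived gates."""
    return or_(c, or_(and_(a, not_(b)), and_(not_(a), b)))
```

Evaluating F over all eight input combinations reproduces exactly the minterm list Σ (1, 2, 3, 4, 5, 7).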
Figure 3.10: K-map and NAND implementation of F (A, B, C) = Σ (1, 2, 3, 4, 5, 7)
Similarly, any Boolean expression can be implemented with only NOR gates by expressing it in POS form. Let us take the same example, F (A, B, C) = Σ (1, 2, 3, 4, 5, 7).
As discussed in section 3.4.1, the above function F can be represented in POS form as
F (A, B, C) = Π (0, 6)
= (A + B + C) . (A' + B' + C)
= ( (A + B + C)' + (A' + B' + C)' )'
which is a two-level NOR-NOR realisation.
Figure 3.11: Logic circuit for function F (A, B, C) = (1,2,3,4,5,7) using NOR gates
Having discussed design principles, let us now look at some important combinational circuits. We will not go into the details of their design in this unit.
3.6 EXAMPLES OF COMBINATIONAL CIRCUITS
3.6.1 Adders
Adders play one of the most important roles in binary arithmetic. In fact, fixed-point addition is often used as a simple measure of a processor's speed. Addition and subtraction circuits can be used as the basis for implementing multiplication and division (we are not giving details of these; you can find them in the Suggested Readings). Thus, considerable effort has gone into the design of high-speed addition and subtraction circuits; it has been considered an important task since the time of Babbage. Number codes also add to the complexity of arithmetic circuits. The 2's complement notation is one of the most widely used codes for fixed-point binary numbers because of the ease of performing addition and subtraction with it.
A combinational circuit which performs addition of two bits is called a half adder,
while the combinational circuit which performs arithmetic addition of three bits (the
third bit is the previous carry bit) is called a full adder.
In a half adder the inputs are the two bits to be added, and the outputs are their sum bit S and carry bit C.
(c) Logic Diagram
Figure 3.12: Half Adder implementation
Let us now take the full adder. Here another variable, the carry from the previous bit addition, is added; let us call it p. The truth table and K-maps for the full adder are shown in figure 3.13.
K-Maps for S
Truth table
K-Maps for C
a)
x y' p + x y p = x p (y' + y) = x p
b)
x y p' + x y p = x y (p' + p) = x y
c)
x' y p + x y p = y p (x' + x) = y p
Thus, C = x p + x y + y p
In the case of the K-map for S there are no adjacencies. Therefore,
S = x' y' p + x' y p' + x y' p' + x y p
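The full adder equations above can be checked exhaustively with a small Python sketch (the function name is ours, for illustration):

```python
def full_adder(x, y, p):
    """Add bits x, y and carry-in p; return (sum, carry-out)."""
    s = x ^ y ^ p                       # S = x'y'p + x'yp' + xy'p' + xyp
    c = (x & y) | (x & p) | (y & p)     # C = xy + xp + yp
    return s, c
```

For every input combination, the pair (S, C) read as the two-bit number C S equals the arithmetic sum x + y + p, which is exactly what the truth table of figure 3.13 states.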
Till now we have discussed the addition of single bits only, but what happens when we actually add two numbers? A number in a computer can be 4 bytes, i.e., 32 bits long, or even more. Even for these cases the basic unit is the full adder. Let us see, for example, how we can construct an adder that adds two 4-bit numbers. Let us assume the numbers are x3 x2 x1 x0 and y3 y2 y1 y0, where xi and yi (i = 0 to 3) each represent a bit. The 4-bit adder is shown in figure 3.14.
The overall sum is represented by S3 S2 S1 S0 and the overall carry is C3 from the 4th bit adder. The main feature of this adder is that the carry of each lower bit is fed to the next higher bit addition stage, which implies that the addition of the next higher bit has to wait for the previous stage to complete. This is called a ripple carry adder. The ripple carry becomes time consuming when we go for the addition of, say, 32 bits: the most significant (32nd) bit has to wait until the addition of the first 31 bits is complete. Therefore, a high-speed adder was developed that generates the input carry bit of any stage directly from the inputs to the previous stages. These are called carry lookahead adders. In such an adder the carry for the various stages can be generated directly by logic expressions such as:
C0 = x0 y0
C1 = x1 y1 + (x1 + y1) C0
The complexity of the lookahead carry logic increases for the higher bits, but in turn it produces the sum in a very short time. Since carry lookahead becomes increasingly complicated with an increasing number of bits, carry lookahead adders are normally implemented for chunks of 4 to 8 bits, and the carry is rippled from one chunk's lookahead circuit to the next.
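Both schemes can be sketched in Python (illustrative only; numbers are represented as bit lists, least significant bit first, and the function names are ours):

```python
def ripple_add4(x, y, c_in=0):
    """4-bit ripple carry adder: the carry propagates stage by stage."""
    s, c = [], c_in
    for xi, yi in zip(x, y):
        s.append(xi ^ yi ^ c)
        c = (xi & yi) | (xi & c) | (yi & c)
    return s, c                      # sum bits S0..S3 and the final carry

def lookahead_carries(x, y, c0=0):
    """Carry lookahead: C(i+1) = Gi + Pi.Ci with Gi = xi.yi, Pi = xi + yi."""
    carries = [c0]
    for xi, yi in zip(x, y):
        g, p = xi & yi, xi | yi
        carries.append(g | (p & carries[-1]))
    return carries[1:]
```

In hardware the lookahead expressions are expanded so every carry depends only on the primary inputs; here the loop merely shows that they produce the same carries the ripple adder computes sequentially.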
Adder-Subtractor
The control input x controls the operation: if x = 0 the circuit behaves as an adder, and if x = 1 it behaves as a subtractor. The operation is summarised as:
a) When x = 0, C0 = 0, the output of each XOR gate is the same as the corresponding input Bi (i = 0 to 3). Thus Ai and Bi are added through the full adders, giving sum Si and carry Ci.
b) When x = 1, the output of each XOR gate is the complement of the corresponding input Bi (i = 0 to 3), to which carry C0 = 1 is added. Thus the circuit finds A plus the 2's complement of B, which is equal to A - B.
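The two cases can be captured in a short Python sketch (illustrative; numbers are bit lists, least significant bit first):

```python
def add_sub4(a, b, x):
    """4-bit adder-subtractor: x = 0 adds, x = 1 computes A minus B."""
    s, c = [], x                  # the control x is also the initial carry C0
    for ai, bi in zip(a, b):
        bi ^= x                   # the XOR gates complement B when x = 1
        s.append(ai ^ bi ^ c)
        c = (ai & bi) | (ai & c) | (bi & c)
    return s, c
```

With x = 1 the final carry of 1 indicates that no borrow occurred, as expected for 2's complement subtraction.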
3.6.2 Decoders
A decoder converts one type of coded information to another form. A decoder has n inputs, an enable line (a sort of selection line) and 2^n output lines. Let us see an example of a 3 × 8 decoder, which decodes 3 bits of information: only one output line gets the value 1, or in other words, out of 2^3 = 8 lines only one output line is selected. Thus, depending on the selected output line, the information in the 3 bits can be recognised or decoded.
While constructing the logic diagram, make sure that wherever a value appears as 0 in the input and 1 in the output of the truth table, that input is fed in complemented form. For example, the first 4 entries of the truth table contain 0 in the I0 position; hence the I0 value is passed through a NOT gate and fed to AND gates 'a', 'b', 'c' and 'd', which implies that these gates can be activated/selected only if I0 is 0. If the I0 value is 1 then none of the top 4 AND gates can be activated. Similar logic applies to I1. Please note that the output lines are labelled 000, 010, 111, etc. The output value of only one of the lines will be 1; the label indicates that, for those I0 I1 I2 input values, the labelled line will be selected for output. The enable line is a convenient resource for combining two 3 × 8 decoders to make one 4 × 16 decoder.
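The decoder's behaviour can be sketched in Python (illustrative only; following the text's labelling, I0 is taken as the most significant bit, which is our assumption):

```python
def decode3to8(i0, i1, i2, enable=1):
    """3 x 8 decoder: exactly one of 8 output lines is 1 when enabled."""
    if not enable:
        return [0] * 8                       # disabled: no line selected
    n = (i0 << 2) | (i1 << 1) | i2           # I0 taken as the MSB
    return [1 if k == n else 0 for k in range(8)]
```

Two such decoders with complementary enable inputs, driven by a fourth address bit, behave together as a 4 × 16 decoder.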
3.6.3 Multiplexer
A multiplexer is one of the basic building units of a computer system; in principle it allows several input lines to share a common line. It connects multiple input lines to a single output line. At a specific time one of the input lines is selected, and the selected input is passed on to the output line. The diagram of a 4 × 1 multiplexer (MUX) is given in figure 3.16.
(c) Logic diagram
Figure 3.16: 4 × 1 Multiplexer
But how does the multiplexer know which line to select? This is controlled by the select lines, which provide the communication among the various components of a computer. Now let us see how the multiplexer, also known as a MUX, works. For simplicity we take the example of a 4 × 1 MUX, i.e., 4 input lines connected to 1 output line. For the sake of consistency we will call an input line I, the output line O, a control line a selection line S, and the enable line E.
Please notice the way S0 and S1 are connected in the circuit. To AND gate 'a', S0 and S1 are input in complemented form; that means gate 'a' will output I0 when both selection lines have the value 0, i.e., S0 = 0 and S1 = 0, so that S0' = 1 and S1' = 1. Hence the first entry in the truth table. Please note that at S0 = 0 and S1 = 0, AND gates 'b', 'c' and 'd' yield 0 output, and when all these outputs pass through OR gate 'e' they yield I0 as the output for this case. That is, for S0 = 0 and S1 = 0 the output becomes I0, which in other words can be said as: for S0 = 0 and S1 = 0, input line I0 is selected by the MUX. Similarly, the other entries in the truth table correspond to the logic of the diagram. Therefore, by having two control lines we could have a 4 × 1 MUX. To have an 8 × 1 MUX we must have 3 control lines; in general, with n control lines we can make a 2^n × 1 MUX. Another parameter that is important in MUX design is the number of inputs to an AND gate. This is determined by the fan-in of the gate, which normally supports a maximum of 8 inputs.
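The gate-level behaviour described above can be sketched in Python (illustrative only; the function name is ours):

```python
def mux4to1(inputs, s1, s0):
    """4 x 1 MUX: AND gate k passes inputs[k] only when (s1, s0) encode k."""
    out = 0
    for k in range(4):
        match = int(s1 == (k >> 1) & 1 and s0 == k & 1)
        out |= inputs[k] & match      # OR gate 'e' combines the AND outputs
    return out
```

With s1 = 0 and s0 = 0 only gate 0 "fires", so the output equals I0, exactly as in the first truth-table entry.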
Where are these devices used in the computer? Multiplexers are used in digital circuits for data and control signal routing.
We have seen a concept in which one out of several input lines can be selected. Can we have the reverse, i.e., one input line whose data is transmitted to one of 2^n possible output lines, where n is the number of selection lines? This operation is called demultiplexing.
3.6.4 Encoders
An encoder performs the reverse function of a decoder. An encoder has 2^n input lines and n output lines. Let us see the 8 × 3 encoder, which encodes 8 input lines and produces a 3-bit output corresponding to the binary number of the active input. This type of encoder is also called an octal-to-binary encoder. The truth table of the encoder is shown in figure 3.17.
Active input    O2 O1 O0
I0              0  0  0
I1              0  0  1
I2              0  1  0
I3              0  1  1
I4              1  0  0
I5              1  0  1
I6              1  1  0
I7              1  1  1
From the encoder table it is evident that at any given time only one input is assumed to have the value 1; this is a major limitation of the encoder. What happens when two inputs are active together? Since the output is then not defined, an ambiguity exists. To avoid this ambiguity, the encoder circuit is given input priority so that only one input is encoded. The input with the higher subscript can be given the higher priority. For example, if both I2 and I6 are 1 at the same time, the output will be 110, because I6 has higher priority than I2.
The encoder can be implemented with 3 OR gates whose inputs can be determined from the truth table. The outputs can be expressed as:
O0 = I1 + I3 + I5 + I7
O1 = I2 + I3 + I6 + I7
O2 = I4 + I5 + I6 + I7
You can draw the K-maps to verify the above functions and draw the related combinational circuit.
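Both the plain OR-gate encoder and its priority variant can be sketched in Python (illustrative only; function names are ours):

```python
def encoder8to3(inputs):
    """Octal-to-binary encoder (assumes exactly one input is 1)."""
    o0 = inputs[1] | inputs[3] | inputs[5] | inputs[7]
    o1 = inputs[2] | inputs[3] | inputs[6] | inputs[7]
    o2 = inputs[4] | inputs[5] | inputs[6] | inputs[7]
    return o2, o1, o0

def priority_encoder8to3(inputs):
    """The highest-subscript active input wins, removing the ambiguity."""
    for k in range(7, -1, -1):
        if inputs[k]:
            return (k >> 2) & 1, (k >> 1) & 1, k & 1
    return 0, 0, 0
```

With I2 and I6 both active, the priority version outputs 110, as described above.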
3.6.5 Programmable Logic Array (PLA)
Till now individual gates have been treated as the basic building blocks from which various logic functions are derived. We have also learned strategies for minimising the number of gates. With the advancement of technology, however, integrated circuits provide increasing levels of integration: small-scale integration (SSI) puts one to ten gates on a single chip. Gate-level designs are constructed at the gate level only, but if a design is to be built from such SSI chips the design considerations change, as a number of chips may be needed for one logic circuit. With MSI and VLSI we can put many more gates on a chip and can also make the gate interconnections on the chip. This integration and connection brings the advantages of decreased cost and size and increased speed. The basic drawback of such VLSI and MSI chips, however, is that for each logic function the layout of gates and interconnections must be designed afresh, and the cost of such custom design is quite high. Thus came the concept of the Programmable Logic Array, a general-purpose chip that can be readily adapted to any specific purpose.
The PLA is designed for the SOP form of Boolean functions and consists of a regular arrangement of NOT, AND and OR gates on a chip. Each input to the chip is passed through a NOT gate, so that the input and its complement are both available to each AND gate. The output of each AND gate is made available to each OR gate, and the output of each OR gate is a chip output. By making the appropriate connections, any logic function can be implemented in a Programmable Logic Array.
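The AND-plane / OR-plane structure can be sketched in Python (illustrative only; we represent each AND term as a list of (variable index, complemented?) pairs and each output as the list of AND terms it ORs together):

```python
def make_pla(and_terms, or_connections):
    """PLA sketch: a programmable AND plane feeding a programmable OR plane."""
    def evaluate(inputs):
        # each AND term is the product of its connected literals
        products = [all(inputs[i] != comp for i, comp in term)
                    for term in and_terms]
        # each OR gate sums the AND terms wired to it
        return [int(any(products[t] for t in outs)) for outs in or_connections]
    return evaluate
```

"Programming" the PLA amounts to choosing which crosspoints (literal-to-AND and AND-to-OR connections) are made; here those choices are the two argument lists.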
Figure 3.18(a) shows a PLA with 3 inputs and 2 outputs. Please note the connectivity points; any of these points can be connected if desired. Figure 3.18(b) shows an implementation of a logic function:
3.6.6 Read-Only Memory (ROM)
Figure 3.19 shows the block diagram of a ROM. It consists of k input address lines and n output data lines. An m × n ROM is an array of binary cells organised into m (m = 2^k) words of n bits each. The ROM has no data inputs, because the write operation is not defined for a ROM. A ROM is classified as a combinational circuit and is constructed internally from a decoder and a set of OR gates.
In general, an m × n ROM (where m = 2^k and k is the number of address lines) has an internal k × 2^k decoder and n OR gates. Each OR gate has 2^k inputs, which are connected to the outputs of the decoder.
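This decoder-plus-OR-gate structure can be sketched in Python (illustrative only; the contents are given word by word as bit lists):

```python
def make_rom(contents, k):
    """m x n ROM model (m = 2**k words): a k-to-2**k decoder drives n OR gates."""
    assert len(contents) == 2 ** k
    n = len(contents[0])
    def read(address):
        select = [int(a == address) for a in range(2 ** k)]   # decoder outputs
        # the OR gate for each data bit collects the selected word's cells
        return [max(select[a] & contents[a][bit] for a in range(2 ** k))
                for bit in range(n)]
    return read
```

Only the connections between decoder lines and OR gates (the stored 1s) distinguish one ROM from another; there is no write path, as stated above.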
1)
2)
Map the function having 4 variables in a K-map and draw the truth table. The function is
F (A, B, C, D) = Σ (2, 6, 10, 14).
..
..
..
3)
Find the optimal logic expression for the above function. Draw the resultant logic diagram.
..
.
.
4)
5)
3.7 SUMMARY
This unit provides you with information regarding the building blocks of a computer system. The key elements of the design of combinational circuits such as adders are discussed in this unit. With the advent of PLAs the design of circuits is changing, and the scenario is moving towards microprocessors. With this developing scenario in the forefront, and the expectation of Ultra-Large-Scale Integration (ULSI) in view, the time is not far off when the design of logic circuits will be confined to single microchip components. You can refer to the latest trends in design and development, including VHDL (a hardware description language), in the further readings.
3.8 SOLUTIONS/ANSWERS
1.
Logic gates produce outputs based on their input values. NAND and NOR are universal gates, as they can be used to construct any other logic gate.
2.
F = ( (A' + B)' + (A + B)' )'
= (A' + B) . (A + B)
= (A' + B) . A + (A' + B) . B
= A'.A + A.B + A'.B + B.B
= 0 + A.B + A'.B + B
= 0 + B (A + A') + B
= 0 + B + B = B
3.
4.
5.
1 (i): F = (A' B C' + A B' C')
A B C F
0 0 0 0
0 0 1 0
0 1 0 1
0 1 1 0
1 0 0 1
1 0 1 0
1 1 0 0
1 1 1 0
F = (A + B) . (A' + B')
A B F
0 0 0
0 1 1
1 0 1
1 1 0
(i)
F = (A.B)' + B
= A' + B' + B
= A' + 1          (B + B' is always 1)
= 1
(ii)
F = (A'.B')' . (A.B)
= (A + B) . (A.B)
= A.A.B + A.B.B
= A.B + A.B
= A.B
F = A'B + B'C + A'D
F' = (A'B)' . (B'C)' . (A'D)'
= (A + B') . (B + C') . (A + D')
Applying De Morgan's law again restores F = A'B + B'C + A'D.
K-Map
Truth table
3.
4.
5.
PRINCIPLES OF LOGIC CIRCUITS II
4.0 Introduction
4.1 Objectives
4.2 Sequential Circuits: The Definition
4.3 Flip-Flops
4.3.1 Basic Flip-Flops
4.3.2 Excitation Tables
4.3.3 Master-Slave Flip-Flops
4.3.4 Edge-Triggered Flip-Flops
4.4
4.5
4.5.1 Registers
4.5.2 Counters: Asynchronous Counters
4.5.3 Synchronous Counters
4.5.4 RAM
4.6
4.7
4.8
4.0 INTRODUCTION
By now you are aware of the basic configuration of computer systems, how data is represented in computer systems, logic gates and combinational circuits. In this unit you will learn how the computations are performed inside the system. You will come across terms like flip-flops, registers, counters and sequential circuits. Here you will also learn how to make circuits using combinational and sequential components. These circuit designs will help you in performing the practicals of the MCSL-017 lab course.
4.1 OBJECTIVES
define some of the useful circuits of a computer system, like registers, counters, etc.; and
4.2 SEQUENTIAL CIRCUITS: THE DEFINITION
Sequential circuits, unlike combinational circuits, are time dependent. Sequential circuits are broadly classified depending upon the times at which their internal state is observed and changes. The two broad classifications of sequential circuits are:
Synchronous
Asynchronous
Synchronous circuits use flip-flops, and their state can change only at discrete intervals (doesn't that seem a good choice for discrete digital devices such as computers?). An asynchronous sequential circuit may be regarded as a combinational circuit with a feedback path. Since the propagation delays from output to input are small, such circuits may tend to become unstable at times; thus, complex asynchronous circuits are difficult to design.
The synchronisation in a sequential circuit is achieved by a clock pulse generator, which gives a continuous train of clock pulses. Figure 4.2 shows the form of a clock pulse.
A clock pulse can be in one of two states, 0 or 1: the disabled or the active state. The storage elements can change their state only when a clock pulse occurs. Sequential circuits that have clock pulses as inputs to their flip-flops are called clocked sequential circuits.
4.3 FLIP-FLOPS
Let us see flip-flops in detail. A flip-flop is a binary cell which stores 1 bit of information; it is itself a sequential circuit. By now we know that a flip-flop can change its state when a clock pulse occurs, but exactly when? Generally, a flip-flop changes its state on the clock transition from 0 to 1 (the rising edge) or from 1 to 0 (the falling edge), and not while the clock level is 1. If a storage element changes its state whenever the clock level is 1, it is called a latch. In simple words, a flip-flop is edge-triggered and a latch is level-triggered.
4.3.1 Basic Flip-Flops
Let us first see a basic latch. A latch (or flip-flop) can be constructed using two NOR or two NAND gates. Figure 4.3(a) shows the logic diagram of the S-R latch using NOR gates. The latch has two inputs, S and R, for set and reset respectively. When the output is Q = 1 and Q' = 0, the latch is said to be in the set state; when Q = 0 and Q' = 1, it is in the reset state. Normally the outputs Q and Q' are complements of each other. When both inputs are equal to 1 at the same time, an undefined state results, as both outputs become equal to 0.
Figure 4.3(b) shows the truth table for the S-R latch. Let us examine the latch more closely.
i)
ii)
iii)
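The cross-coupled NOR behaviour can be sketched in Python (illustrative only; we iterate the two NOR gates a few times so the feedback settles):

```python
def sr_latch_step(q, qbar, s, r):
    """Settle a cross-coupled NOR S-R latch (S = R = 1 is the forbidden input)."""
    for _ in range(4):                  # iterate until the feedback stabilises
        q_new    = 1 - (r | qbar)       # NOR(R, Q')
        qbar_new = 1 - (s | q)          # NOR(S, Q)
        q, qbar = q_new, qbar_new
    return q, qbar
```

With S = R = 1 the sketch settles at Q = Q' = 0, the undefined state mentioned above; with S = R = 0 the previous state is simply held.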
Let us now try to construct the most common flip-flops from this basic latch.
R-S flip-flop: The graphic symbol of the S-R flip-flop is shown in figure 4.4. It has three inputs: S (set), R (reset) and C (clock). Q(t+1) is the next state of the flip-flop after the occurrence of a clock pulse; Q(t) is the present state, that is, the present Q value (set = 1, reset = 0).
In figure 4.4(a), the arrowhead symbol in front of the clock input C indicates that the flip-flop responds to the leading edge (0 to 1) of the input clock signal.
Operation of the R-S flip-flop can be summarised as:
1) If there is no clock signal, i.e., C = 0, the output cannot change irrespective of the R and S values.
2) When the clock signal changes from 0 to 1 and S = 1, R = 0, the output becomes Q = 1 and Q' = 0 (set).
3) If R = 1, S = 0 and the clock signal C changes from 0 to 1, the output becomes Q = 0 and Q' = 1 (reset).
4) During the positive clock transition, if both S and R become 1 the output is not defined, as it may become 0 or 1 depending upon the internal timing delays occurring in the circuit.
D Flip-Flop
The D flip-flop is also referred to as a delay flip-flop, because it delays the 0 or 1 applied to its input by a single clock pulse.
J-K flip-flop
The J-K flip-flop is also a modification of the S-R flip-flop; it has two inputs like S and R, but all possible input combinations are valid in the J-K flip-flop.
Figure 4.6 shows the implementation of the J-K flip-flop. The inputs J and K behave exactly like inputs S and R to set and reset the flip-flop, respectively. When J and K are both 1, the flip-flop output is complemented on the clock transition. [Try this as an exercise.]
T flip-flop
The T (toggle) flip-flop is obtained from the J-K flip-flop by joining the inputs J and K together. The implementation of the T flip-flop is shown in figure 4.7. When T = 0, the clock pulse transition does not change the state; when T = 1, the clock pulse transition complements the state of the flip-flop.
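The behaviour of both flip-flops can be captured by their characteristic equations, sketched here in Python (illustrative; names are ours):

```python
def jk_next(q, j, k):
    """Characteristic equation of the J-K flip-flop: Q(t+1) = J.Q' + K'.Q."""
    return (j & (1 - q)) | ((1 - k) & q)

def t_next(q, t):
    """T flip-flop = J-K with J and K tied together: toggles when T = 1."""
    return jk_next(q, t, t)
```

Setting J = K = 1 in jk_next complements the state, which is exactly the toggle case exploited by the T flip-flop.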
b) Logic Diagram
4.3.2 Excitation Tables
The characteristic tables of flip-flops give the next state when the inputs and the present state are known; these tables are useful for the analysis of sequential circuits. During the design process, however, we know the required transition from present state to next state and wish to find the required flip-flop inputs. Thus we need a table that lists the required input values for a given change of state. Such a table is called an excitation table. Figure 4.8 shows the excitation tables for all the flip-flops.
[Figure 4.8: excitation tables for the four flip-flops, each listing the present state Q(t), the next state Q(t+1) and the required input values.]
Q(t) and Q(t+1) indicate the present and next state of a flip-flop, respectively. The symbol X in the tables means a don't care condition, i.e., it does not matter whether the input is 0 or 1. Let us discuss more deeply how these excitation tables are formed. For this we take the example of the J-K flip-flop.
1) The state transition from present state 0 to next state 0 (figure 4.8(a)) can be achieved when
(a) J = 0, K = 0 (no change), or
(b) J = 0, K = 1 (reset).
In either case J = 0, while K can be 0 or 1; hence K is represented as a don't care.
2) The state transition from present state 0 to next state 1 can be achieved when
(a) J = 1, K = 0 (set), or
(b) J = 1, K = 1 (complement).
Here also, in either case J = 1 but K can be 0 or 1, so again K is represented as a don't care.
3) Similarly, the state transition from present state 1 to next state 0 can be achieved when
(a) J = 0, K = 1 (reset), or
(b) J = 1, K = 1 (complement).
This indicates that in either case K = 1, but J can be either 0 or 1, thus a don't care.
4) The state transition from present state 1 to next state 1 can be achieved when
(a) J = 0, K = 0 (no change), or
(b) J = 1, K = 0 (set).
So K = 0, while J is a don't care.
Summarising, the J-K excitation table is:
Q(t)  Q(t+1)  J  K
0     0       0  X
0     1       1  X
1     0       X  1
1     1       X  0
Similarly, the excitation tables for the rest of the flip-flops can be derived (try this as an exercise).
Check Your Progress 1
1.
2.
3.
4.3.3 Master-Slave Flip-Flop
The master-slave flip-flop consists of two flip-flops: one is the master flip-flop and the other is called the slave flip-flop. Figure 4.9 shows the implementation of a master-slave flip-flop using J-K flip-flops.
i) When inputs are applied at J-K and the clock pulse becomes 1, only the master is activated, resulting in the intermediate output Y going to state 0 or 1 depending on the input and the previous state. Remember that during this time the slave maintains its previous state.
ii) When the clock pulse is 0, the master flip-flop is disabled but the slave becomes active, and its outputs Q and Q' become equal to Y and Y' respectively. Why? The possible combinations of the values of Y and Y' are either Y = 1, Y' = 0 or Y = 0, Y' = 1. Thus the slave flip-flop sees one of the following input combinations:
(a) J = 1, K = 0 (set flip-flop), or
(b) J = 0, K = 1 (clear flip-flop).
As the clock pulse becomes 0, the master becomes inactive and the slave acquires the same state as the master, as explained in (a) and (b) above.
4.3.4 Edge-Triggered Flip-Flops
[Figure: clock waveform showing that the output cannot change except at the positive (0 to 1) or negative (1 to 0) transition of the clock.]
(a) Positive edge-triggered D flip-flop
The effective positive clock transition includes a minimum time, called the setup time, for which the D input must be held at a constant value before the clock transition occurs. Similarly, there is a minimum time, called the hold time, for which the D input must not change after the positive transition of the pulse.
Check Your Progress 2
1.
2.
4.4
1) Draw the state table or state diagram from the problem statement (if the state diagram is available, draw the state table also).
2) Assign binary codes to the states.
3) From the state table, derive the input equations in simplified form, i.e., the Boolean functions that describe the signals at the inputs of the flip-flops.
4) From the state table, derive the output equations in simplified form.
5) Draw the logic diagram with the required flip-flops and combinational circuits.
Let us take an example to illustrate the above procedure. Suppose we want to design a 2-bit binary counter using D flip-flops. The circuit goes through the repeated binary states 00, 01, 10 and 11 when the external input X = 1 is applied; the state of the circuit does not change when X = 0. The state table and state diagram for this are shown in figure 4.12.
But how do we make this state diagram? Note the number of flip-flops: 2 in our example, as we are designing a 2-bit counter. The various states of the two bits, 00, 01, 10 and 11, are shown in circles. The arrows indicate the transitions on an input value X. For example, when the counter is in state 00 and the input value X = 0 occurs, the counter remains in state 00; hence the loop back on X = 0. However, on encountering X = 1 the counter moves to state 01. Likewise, similar transitions occur in all other states. For making the state table, remember the excitation table of the D flip-flop given in figure 4.8(c).
The present state of the two flip-flops and the next states of the flip-flops are put into the table along with each input value. For example, if the present state of the flip-flops is 01 and the input value is 1, the counter will move to state 10. Notice these values in the fourth row of the state table (figure 4.12(a)), which we can write as:
A = 0, B = 1, X = 1  giving  A(next) = 1, B(next) = 0
This implies that flip-flop A has moved from the clear state to the set state. As we are making the counter using D flip-flops, the question is: what input value DA of flip-flop A allows this transition, that is, makes Q(t) = 0 to Q(t+1) = 1 possible for flip-flop A? On checking the excitation table for the D flip-flop, we find that the value of the D input of flip-flop A (called DA in this example) must be 1. Similarly, flip-flop B has the transition Q(t) = 1 to Q(t+1) = 0, so DB must be 0. Hence notice the values of the flip-flop inputs DA and DB in that row.
a) State Table
The next step is the simplification of the flip-flop input equations, which is done using K-maps as shown in figure 4.13. But why do we make K-maps for DA and DB, which happen to be flip-flop input values? Note that in sequential circuit design we are designing the combinational logic that controls the state transitions of the flip-flops. Each input to a flip-flop is one output of this combinational logic, while the present state of the flip-flops and any other input values form the input values to this combinational logic.
DA = A B' + A X' + A' B X
DB = B' X + B X'
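These input equations can be checked by simulation. The following Python sketch (illustrative; the function name is ours) computes the next state directly from DA and DB and reproduces the counting sequence:

```python
def counter_step(a, b, x):
    """Next state (A, B) of the 2-bit counter from the derived D inputs."""
    da = (a & (1 - b)) | (a & (1 - x)) | ((1 - a) & b & x)   # DA = AB' + AX' + A'BX
    db = ((1 - b) & x) | (b & (1 - x))                       # DB = B'X + BX'
    return da, db
```

With X = 1 the state cycles 00, 01, 10, 11 and back to 00; with X = 0 the state is held, exactly as the state diagram specifies.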
Note: Similarly, sequential circuits using any number of flip-flops can be designed with state diagrams and combinational circuit design methods.
4.5
Let us now discuss some useful examples of sequential circuits, like registers and counters.
4.5.1 Registers
A register is a group of flip-flops that stores binary information, together with gates that control when and how information is transferred into the register. An n-bit register has n flip-flops and stores n bits of binary information. Two basic types of registers are parallel registers and shift registers.
A parallel register is one of the simplest registers, consisting of a set of flip-flops that can be read or written simultaneously. Figure 4.15 shows a 4-bit register with parallel input-output. The signal lines I0 to I3 are the inputs to the flip-flops; they may be the outputs of other arithmetic circuits, like multipliers, so that data from different sources can be loaded into the register. It has one additional line, called the clear line, which can clear the register completely. This register is called a parallel register because all the bits of the register can be loaded in a single clock pulse.
A shift register is used for shifting data to the left or right. A shift register operates in serial input-output mode, i.e., data is entered into the register one bit at a time from one end and can be read from the other end one bit at a time. Figure 4.16 shows a 4-bit right-shift register built from D flip-flops.
Please note that in this register a shift enable signal is used instead of the clock pulse directly. Why? Because it is not necessary that we want the register to perform a shift on every clock pulse.
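One shift-enable pulse of the right-shift register can be sketched in Python (illustrative; the register is a bit list and the function name is ours):

```python
def shift_right(register, serial_in):
    """One shift pulse on a right-shift register of D flip-flops."""
    # the serial input enters at the left; each flip-flop copies its neighbour
    return [serial_in] + register[:-1]
```

Feeding one bit per pulse at the serial input and reading the rightmost bit gives the serial input-output behaviour described above.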
A register that shifts data in only one direction is called a uni-directional shift register, and a register that can shift data in both directions is called a bi-directional shift register. A shift register can also be constructed for bi-directional shifting with parallel input-output. Such a general shift register structure may offer parallel data transfer to or from the register, along with the added facility of left or right shift. This structure requires additional control lines for indicating whether parallel or serial input-output is desired and whether a left or right shift is required. A general symbolic diagram of this register is shown in figure 4.17.
There are 3 main control lines shown in the above figure. If parallel load enable is active, the parallel input-output operation is done; otherwise serial input-output takes place. The shift select line selects a right or left shift: the value 0 performs a right shift and the value 1 a left shift. The shift enable signal indicates when the shift should start.
4.5.2 Counters: Asynchronous Counters
The J and K inputs of all the flip-flops are kept high, i.e., at logic 1, so each flip-flop is complemented whenever it is clocked (refer to the characteristic table of the J-K flip-flop in figure 4.6(c)). Please note that the clock pulse is given only to the first flip-flop; from the second flip-flop onwards, the output of the previous flip-flop is fed as the clock signal. This implies that these flip-flops are complemented only when the previous flip-flop's output makes its triggering transition, having held the value 1. Thus the effect of the complement ripples through these flip-flops.
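The rippling effect can be sketched in Python (illustrative only; we model each toggle stage as being clocked by its neighbour's 1 to 0 transition, an assumption about the edge direction consistent with an up counter):

```python
def ripple_count(state, pulses):
    """3-bit ripple counter of T-style (J=K=1) flip-flops."""
    o0, o1, o2 = state
    for _ in range(pulses):
        o0 ^= 1                   # the first flip-flop toggles on every pulse
        if o0 == 0:               # its 1 -> 0 transition clocks the next stage
            o1 ^= 1
            if o1 == 0:           # which in turn may clock the third stage
                o2 ^= 1
    return o0, o1, o2
```

Reading the state as the binary number O2 O1 O0, each clock pulse increments the count modulo 8.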
4.5.3 Synchronous Counters
The major disadvantage of the ripple counter is the delay in changing its value. How? To understand this, take an instant when the state of the ripple counter is 0111. The next state will be 1000, which means a change in the state of all the flip-flops. But will this occur simultaneously in a ripple counter? No: first O0 will change, then O1, then O2 and lastly O3. The delay is proportional to the length of the counter. Therefore, to avoid this disadvantage of ripple counters, synchronous counters are used, in which all flip-flops change their state at the same time. Figure 4.19 shows a 3-bit synchronous counter.
You can understand the working of this counter by analysing the sequence of states (O2, O1, O0) given in figure 4.20.
O2 O1 O0
0  0  0
0  0  1
0  1  0
0  1  1
1  0  0
1  0  1
1  1  0
1  1  1
0  0  0
ii) The second flip-flop is complemented on a clock pulse only when the first flip-flop is in the set state.
iii) The third flip-flop is fed by an AND gate connected to the outputs of the first and second flip-flops. It is complemented only when both the first and second flip-flops are in the set state.
4.5.4 RAM
Here we confine our discussion to RAM as an example of a sequential circuit. A memory unit is a collection of storage cells, or flip-flops, along with the associated circuits required to transfer information into and out of the device. The access time and cycle time are constant and independent of the location accessed; hence the name random access memory.
RAMs are organised (logically) as words of fixed length. The memory communicates with other devices through data input and output lines, address selection lines, and control lines that specify the direction of transfer.
Now let us try to understand how data is stored in memory. The internal construction of a RAM of m words and n bits per word consists of m × n binary cells and the associated circuits for selecting individual words. Figure 4.21 shows the logic diagram and block diagram of a binary cell.
The input is fed to AND gate 'a' in complemented form. The read operation is indicated by a 1 on the read/write signal; therefore, during the read operation only AND gate 'c' becomes active. If the cell has been selected, the output becomes equal to the state of the flip-flop, i.e., the data value stored in the flip-flop is read. In the write operation, gates 'a' and 'b' become active and they set or clear the J-K flip-flop depending upon the input value: if the input is 0 the flip-flop goes to the clear state, and if the input is 1 it goes to the set state. In effect, the input data is reflected in the state of the flip-flop; thus we say that the input data has been stored in the flip-flop, or binary cell.
Figure 4.22 is the extension of this binary cell to an IC RAM circuit, where a 2 × 4 decoder is used to select one of the four words (for 4 words we need 2 address lines). Please note that each decoder output is connected to a 4-bit word, and the read/write signal is given to each binary cell. Once the decoder selects the word, the read/write input determines the operation. The output is derived using an OR gate, since all the non-selected cells produce a zero output. When the memory select input to the decoder is 0, none of the words is selected and the contents of the cells remain unchanged irrespective of the read/write input.
Figure 4.22: 4 × 4 RAM
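The behaviour of such a decoder-selected RAM can be sketched in a few lines of Python. This is a hypothetical behavioural model only, not the gate-level circuit: a 2-to-4 decoder selects one word, the read/write signal (1 = read) decides the operation, and non-selected cells contribute zero to the output.

```python
# Behavioural sketch of a 4 x 4 RAM with a 2-to-4 decoder word select.
class RAM4x4:
    def __init__(self):
        self.cells = [[0] * 4 for _ in range(4)]   # 4 words of 4 bits

    def access(self, address, read_write, data_in=None, memory_select=1):
        if not memory_select:           # decoder disabled: no word selected
            return [0, 0, 0, 0]         # non-selected cells output zero
        word = self.cells[address]      # decoder output selects this word
        if read_write == 1:             # read: output = state of flip-flops
            return word
        word[:] = data_in               # write: input stored in flip-flops
        return [0, 0, 0, 0]

ram = RAM4x4()
ram.access(2, 0, data_in=[1, 0, 1, 1])   # write 1011 into word 2
ram.access(2, 1)                          # read word 2 back -> [1, 0, 1, 1]
```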
After discussing so much about combinational circuits and sequential circuits, let us
discuss in the next section an example having a combination of both circuits.
4.6
Let us design a synchronous BCD counter. A BCD counter follows a sequence of ten states and returns to 0 after the count of 9. These counters are also called decade counters. This type of counter is useful in display applications in which BCD is required for conversion to a decimal readout. Figure 4.23 shows the characteristic table for this counter.
Present State | Next State | Flip-Flop Inputs (JA KA JB KB JC KC JD KD)
[NOTE: Remember the excitation table for the JK flip-flop given in Figure 4.8.]
The decade counter requires four flip-flops, i.e. A, B, C and D. The next state of each flip-flop is given in the table. JA and KA indicate the inputs corresponding to flip-flop A.
From this, the flip-flop input equations are simplified using K-maps as shown in Figure 4.24. The unused minterms from 1010 through 1111 are taken as don't care conditions.
JA = BCD    KA = D
JB = CD     KB = CD
JC = A′D    KC = D
JD = 1      KD = 1
The logic circuit can be made with four JK flip-flops and three AND gates.
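Since the design is fully specified by the input equations, you can check it with a short behavioural simulation. The sketch below (not the gate-level circuit) applies the JK next-state rule Q⁺ = JQ′ + K′Q to the equations JA = BCD, KA = D, JB = KB = CD, JC = A′D, KC = D, JD = KD = 1:

```python
# Behavioural check of the synchronous BCD (decade) counter design.
def jk(q, j, k):
    """JK flip-flop next-state rule: Q+ = J*Q' + K'*Q."""
    return (j & (q ^ 1)) | ((k ^ 1) & q)

def next_state(a, b, c, d):          # A is the MSB, D the LSB
    ja, ka = b & c & d, d
    jb = kb = c & d
    jc, kc = (a ^ 1) & d, d
    jd = kd = 1
    return jk(a, ja, ka), jk(b, jb, kb), jk(c, jc, kc), jk(d, jd, kd)

state = (0, 0, 0, 0)
counts = []
for _ in range(10):
    counts.append(state[0] * 8 + state[1] * 4 + state[2] * 2 + state[3])
    state = next_state(*state)
# counts is [0, 1, 2, ..., 9] and the state wraps back to 0000 after 9
```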
2)
3)
4.7
SUMMARY
As told to you earlier, this unit provides you information regarding sequential circuits, which are the foundation of digital design. Flip-flops, the basic storage units in sequential circuits, are derived from latches. A sequential circuit can be formed using combinational circuits (discussed in the last unit) and flip-flops. The behaviour of a sequential circuit can be analysed using tables and state diagrams.
Registers, counters etc. are structured sequential blocks. This unit has outlined the
construction of registers, counters, RAM etc. Lastly, we discussed how a circuit can
be designed using both sequential & combinational circuits. For more details, the
students can refer to further reading.
4.8
SOLUTIONS / ANSWERS
1)
2)
A flip-flop is the basic storage element for synchronous sequential circuits, whereas latches are bistable devices whose state normally depends upon asynchronous inputs; they are not suitable for use in synchronous sequential circuits using a single clock.
3)
An excitation table indicates what the inputs must be when the present and next states are known, whereas a characteristic table indicates just the opposite of this, i.e. the inputs are known and the next state has to be found.
1)
2)
1)
The main difference is the time at which the counter flip-flops change their states. In a synchronous counter, all the flip-flops that need to change do so simultaneously. In an asynchronous counter, the complement, if it is to be done, may ripple through a series of flip-flops.
2)
Yes, but this circuit will generate a sequence of states in which only one bit changes at a time, i.e. 0000, 1000, 1100, 1110, 1111, 0111, 0011, 0001.
3)
Yes. We require three flip-flops (2³ = 8 ≥ 7 states) for the sequence 0, 1, 2, 3, 4, 5 and 6.
Present State | Next State | JA KA JB KB JC KC
The unused state (111) is a don't care condition. Make the suitable K-maps. The following are the flip-flop input values:
JA = BC
KA = B
JB = C
KB = A + C
JC = A′ + B′
KC = 1
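These equations can also be verified with a quick simulation. The sketch below takes A as the MSB, C as the LSB, and JC as A′ + B′, and applies the JK next-state rule Q⁺ = JQ′ + K′Q:

```python
# Behavioural check of the mod-7 counter (sequence 0 to 6).
def jk(q, j, k):
    return (j & (q ^ 1)) | ((k ^ 1) & q)   # Q+ = J*Q' + K'*Q

def step(a, b, c):                          # A is the MSB, C the LSB
    ja, ka = b & c, b
    jb, kb = c, a | c
    jc, kc = (a ^ 1) | (b ^ 1), 1
    return jk(a, ja, ka), jk(b, jb, kb), jk(c, jc, kc)

state = (0, 0, 0)
seq = []
for _ in range(7):
    seq.append(state[0] * 4 + state[1] * 2 + state[2])
    state = step(*state)
# seq is [0, 1, 2, 3, 4, 5, 6] and the state wraps back to 000 after 6
```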
Page Nos.
1.0 Introduction 5
1.1 Objectives 5
1.2 The Memory Hierarchy 5
1.3 RAM, ROM, DRAM, Flash Memory 7
1.4 Secondary Memory and Characteristics 13
    1.4.1
    1.4.2
    1.4.3
1.5 21
1.6 26
    Cache Memory
    Cache Organisation
    Memory Interleaving
    Associative Memory
1.7 Virtual Memory 34
1.8 The Memory System of Micro-Computer 36
    1.8.1
    1.8.2
1.9 Summary 39
1.10 Solutions/Answers 39
1.0 INTRODUCTION
In the previous Block, we touched upon the basic foundations of computers, which include concepts of the von Neumann machine, instruction execution, digital data representation and logic circuits. In this Block we will define some of the most important component units of a computer: the memory unit and the input/output units. In this unit we will discuss the various components of the memory system of a computer system. Computer memory is organised into a hierarchy to minimise cost without compromising the overall speed of access. The memory hierarchy includes cache memory, main memory and other secondary storage technologies. In this Unit, we will discuss the main memory, the secondary memory and high-speed memories such as cache memory, and the memory system of the microcomputer.
1.1 OBJECTIVES
After going through this Unit, you will be able to:
Basic Computer
Organisation
The storage devices along with the algorithm or information on how to control and
manage these storage devices constitute the memory system of a computer.
A memory system is a very simple system, yet it exhibits a wide range of technology
and types. The basic objective of a computer system is to increase the speed of
computation. Likewise the basic objective of a memory system is to provide fast,
uninterrupted access by the processor to the memory such that the processor can
operate at the speed it is expected to work.
But does this kind of technology, where there is no speed gap between processor and memory, exist? The answer is yes, it does. Unfortunately, as the access time (the time taken by the CPU to access a location in memory) becomes smaller, the cost per bit of memory becomes higher. In addition, these memories normally require a power supply for as long as the information needs to be stored. Both these things are not very convenient; on the other hand, memories with smaller cost have very high access times, which results in slower operation of the CPU. Thus, the cost versus access time anomaly
has led to a hierarchy of memories where we supplement fast memories with larger,
cheaper, slower memories. These memory units may have very different physical and
operational characteristics; therefore, the memory system is very diverse in type, cost,
organisation, technology and performance. This memory hierarchy will work only if
the frequency of access to the slower memories is significantly less than the faster
memories. The memory hierarchy system consists of all storage devices employed in a
computer system from the slow but high capacity auxiliary memory to a relatively
faster main memory, to an even smaller and faster cache memory accessible to the
high speed registers and processing logic. Figure 1 illustrates the components of a
typical memory system.
Figure 1: The Memory System (CPU (registers) → cache memory → main memory → I/O interfaces → magnetic disks and magnetic tapes)
A typical storage hierarchy is shown in Figure 1 above. Although Figure 1 shows the
block diagram, it includes the storage hierarchy:
Register
Cache memory
Main memory
Secondary Storage and
Mass Storage.
As we move up the hierarchy, we encounter storage elements that have faster access times and a higher cost per bit stored; moving down the hierarchy gives slower access times and a lower cost per bit. Thus, cache memory generally has the fastest access time, the smallest
storage capacity, and the highest cost per bit stored. The primary memory (main
memory) falls next in the storage hierarchy list. On-line, direct-access secondary
storage devices such as magnetic hard disks make up the level of hierarchy just below
the main memory. Off-line, direct-access and sequential access secondary storage
devices such as magnetic tape, floppy disk, zip disk, WORM disk, etc. fall next in the
storage hierarchy. Mass storage devices, often referred to as archival storage, are at
the bottom of the storage hierarchy. They are cost-effective for the storage of very
large quantities of data when fast access time is not necessary.
The quantum of data that is transferred between two consecutive memory layers at a time also increases as we go from the higher to the lower levels. For example, in a main memory to cache transfer, one or a few memory words are accessed at a time, whereas in a hard disk to main memory transfer, a block of about 1 Megabyte is transferred in a single access. You will learn more about this in the later sections of the unit.
Let us now discuss various forms of memories in the memory hierarchy in more
details.
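The condition that slower memories be accessed far less often than faster ones can be made concrete with a small calculation. The figures below (10 ns for the fast level, 100 ns for the slow level) are assumed for illustration only; they are not from the text.

```python
# Effective access time of a two-level hierarchy: the higher the
# fraction of accesses served by the fast level, the closer the
# hierarchy performs to the fast memory alone.
def effective_access_time(t_fast, t_slow, hit_ratio):
    return hit_ratio * t_fast + (1 - hit_ratio) * t_slow

# assumed figures: 10 ns fast memory, 100 ns slow memory
for h in (0.5, 0.9, 0.99):
    print(h, effective_access_time(10, 100, h))   # roughly 55, 19, 10.9 ns
```

Note how at a 99% hit ratio the effective time is close to that of the fast memory alone, which is why the hierarchy works.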
The construction shown in Figure 2(a) is made up of one JK flip-flop and 3 AND
gates. The two inputs to the system are one input bit and read/write signal. Input is fed
in complemented form to AND gate a. The read/write signal has a value of 1 if it is a
read operation. Therefore, during the read operation the AND gate c has the
read/write input as 1. Since AND gates a and b have 0 on the read/write input, if the chip is selected, i.e. this cell is currently being selected, then the output will become equal to the state of the flip-flop. In other words, the data value stored in the flip-flop has been read.
In a write operation, only gates a and b get a read/write value of 1, and they set or clear the JK flip-flop depending on the data input value. If the data input is 0, the flip-flop will go to the clear state, and if the data input is 1, the flip-flop will go to the set state. In effect, the input data is reflected in the state of the flip-flop. Thus, we say that the input data has been stored in the flip-flop or binary cell.
A 32 × 4 RAM means that this RAM has 32 words, 5 address lines (2⁵ = 32), and a 4-bit data word size. Please note that we can represent a RAM as 2^A × D, where A is the number of address lines and D is the number of data lines. Figure 2(b) is the extension of the binary cell to an integrated 32 × 4 RAM circuit, where a 5 × 32 decoder is used. The 4-bit data inputs come through an input buffer and the 4-bit data output is stored in the output buffer.
A chip select (CS′) control signal is used as a memory enable input. When CS′ = 0, that is CS = 1, it enables the entire chip for a read or write operation. An R/W signal selects a read or a write operation. The word that is selected determines the overall output. Since every word is accessed through a logic path of equal length, every word can be accessed in equal time, hence the name RAM.
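The relationship between address lines, words and total capacity can be sketched as a small calculation (A address lines address 2^A words of D bits each):

```python
import math

# A address lines address 2**A words of D bits each.
def address_lines(words):
    """Number of address lines needed for the given number of words."""
    return int(math.ceil(math.log2(words)))

def ram_bits(words, word_size):
    """Total storage of a words x word_size RAM, in bits."""
    return words * word_size

print(address_lines(32))   # 5 address lines for a 32 x 4 RAM
print(ram_bits(32, 4))     # 128 bits of storage in total
```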
DRAM (Dynamic Random Access Memory)
RAM technology is divided into two technologies: dynamic and static. A dynamic
RAM (DRAM) is made with cells that store data as charge on capacitors. The
presence or absence of charge on capacitor is interpreted as binary 1 or 0. Because
capacitors have a natural tendency to discharge, dynamic RAM requires periodic
charge refreshing to maintain data storage. The term dynamic refers to this tendency
of the stored charge to leak away, even with power continuously applied.
Figure 3(a) is a typical DRAM structure for an individual cell that stores one bit. The address
line is activated when the bit value from this cell is to be read or written. The transistor acts as
a switch that is closed (allowing current to flow) if a voltage is applied to the address line
and open (no current flows) if no voltage is present on the address line.
Basic Computer
Organisation
For the write operation (please refer to Figure 3 (a), a voltage signal is applied to the
bit line; a high voltage represents 1, and a low voltage represents 0. A signal is then
applied to the address line, allowing a charge to be transferred to the capacitor.
For the read operation, when the address line is selected, the transistor turns on and
the charge stored on the capacitor is fed out onto a bit line and to the sense amplifier.
The sense amplifier compares the capacitor voltage to a reference value and
determines if the cell contains logic 1 or logic 0. The read out from the cell
discharges the capacitor, which must be restored to complete the operation.
Although the DRAM cell is used to store a single bit (0 or 1), it is essentially an
analog device. The capacitor can store any charge value within a range; a threshold
value determines whether the charge is interpreted as 1 or 0.
Row address selection: the address values A0 to A10 (11 address lines only) are applied first and, for the rows, stored in the row address buffer through the decoder. This is done through a change in the external signal RAS (Row Address Strobe), because this signal is high at the rising edge of the clock.
The column address buffer is then loaded with the column address values, which are also applied through the A0 to A10 lines only. Please note that at this point these lines should contain the values for the column. CAS (Column Address Strobe) causes the column address to be loaded with these values.
Each column is of 4 bits, that is, it requires 4-bit data lines from the input/output buffer. On a memory write operation the data-in bit lines are activated, while on read the sense lines are activated.
This chip therefore requires 11 address lines (instead of 22), 4 data in and out lines, and other control lines.
As there are 11 row address lines and 11 column address lines and each column is of 4 bits, the size of the chip is 2¹¹ × 2¹¹ × 4 = 2048 × 2048 × 4 = 16 megabits. On increasing the address lines from 11 to 12 we have 2¹² × 2¹² × 4 = 64 megabits, an increase by a factor of 4. Thus, possible sizes of such chips may be 16K, 256K, 1M, 4M, 16M, and so on.
Refreshing of the chip is done periodically using a refresh counter. One simple
technique of refreshing may be to disable read-write for some time and refresh
all the rows one by one.
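The chip-size arithmetic above can be reproduced in a couple of lines:

```python
# Capacity of a DRAM chip with n multiplexed row/column address lines
# and 4 bits per column: 2**n rows x 2**n columns x 4 bits.
def dram_bits(address_lines):
    rows = cols = 2 ** address_lines
    return rows * cols * 4

print(dram_bits(11))   # 16,777,216 bits = 16 megabits
print(dram_bits(12))   # 67,108,864 bits = 64 megabits, a factor of 4 more
```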
n address lines → 2ⁿ × m ROM → m output bits

Input    Output
I1  I2 | O1  O2
 0   0 |  0   1
 0   1 |  1   0
 1   0 |  1   1
 1   1 |  0   0
Figure 4: ROM
A ROM is characterised by the number of words (2ⁿ) and the number of bits (m) per word. For example, a 32 × 8 ROM, which can be written as 2⁵ × 8, consists of 32 words of 8 bits each, which means there are 8 output lines and 32 distinct words stored in the unit. There are only 5 input lines because 32 = 2⁵, and with 5 binary variables we can specify 32 addresses.
A ROM is basically a combinational circuit and can be constructed as shown in Figure 4(c). On applying the input I1 = 0, I2 = 0, the 00 line of the decoder is selected and we will get O1 = 0 and O2 = 1; on applying I1 = 0 and I2 = 1 we will get O1 = 1 and O2 = 0. The same logic can be used for constructing larger ROMs.
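Functionally, a ROM behaves like a fixed lookup table: the decoder selects one stored word, whose bits drive the outputs. The sketch below models the small ROM of Figure 4; the first two rows are as given in the text, and the contents of the last two addresses are assumed for illustration.

```python
# A ROM as a fixed lookup table: address in, stored word out.
ROM = {
    (0, 0): (0, 1),   # from the text
    (0, 1): (1, 0),   # from the text
    (1, 0): (1, 1),   # assumed contents
    (1, 1): (0, 0),   # assumed contents
}

def read_rom(i1, i2):
    """Decoder selects the word at (i1, i2); its bits drive O1, O2."""
    return ROM[(i1, i2)]

print(read_rom(0, 0))   # (0, 1)
print(read_rom(0, 1))   # (1, 0)
```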
ROMs are memories on which it is not possible to write data when they are on-line to the computer; they can only be read. This is the reason why they are called read-only memories (ROM). Since ROM chips are non-volatile, the data stored inside a ROM are not lost when the power supply is switched off, unlike the case of a volatile RAM chip. ROMs are also known as permanent stores.
The ROMs can be used for storing micro-programs, system programs and subroutines.
ROMs are non-volatile in nature and need not be loaded in a secondary storage
device. ROMs are fabricated in large numbers in a way where there is no room for
even a single error. But, this is an inflexible process and requires mass production.
Therefore, a new kind of ROM, called PROM, was designed; it is also non-volatile and can be written only once, hence the name Programmable ROM (PROM). The supplier or the customer can perform the writing process on a PROM electrically. Special equipment is needed to perform this writing operation. Therefore, PROMs are more flexible and convenient than ROMs.
The ROMs / PROMs can be written just once, but in both the cases whatever is
written once cannot be changed. But what about a case where you read mostly but
write only very few times? This led to the concepts of read mostly memories and the
best example of these are EPROMs (Erasable PROMs) and EEPROMs (Electrically
Erasable PROMs).
The EPROMs can be read and written electrically. But the write operation is not simple: it requires erasure of all the storage cells by exposing the chip to ultraviolet light, thus bringing them to the same initial state. Once all the cells have been brought
to same initial state, then the EPROM can be written electrically. EEPROMs are
becoming increasingly popular, as they do not require prior erasure of previous
contents. However, in EEPROMs the writing time is considerably higher than the reading time. The biggest advantage of the EEPROM is that it is a non-volatile memory that can be updated easily, while the disadvantages are the high cost, that at present they are not completely non-volatile, and that the write operation takes considerable time. But all these disadvantages are disappearing with growth in technology. In general, ROMs are made of cheaper and slower technology than RAMs.
Flash Memory
This memory is another form of semiconductor memory, which was first introduced in the mid-1980s. These memories can be reprogrammed at high speed, hence the name flash. This is a type of non-volatile, electronic random access memory. Basically, this memory falls in between EPROM and EEPROM. In flash memory the entire memory can be erased in a few seconds by using electric erasing technology. Flash memory is used in many I/O and storage devices. Flash memory is also used to store data and programming algorithms in cell phones, digital cameras and MP3 music players.
Flash memory serves as a hard drive for consumer devices. Music, phone lists, applications, operating systems and other data are generally stored on flash chips. Unlike computer main memory, the data are not erased when the device is turned off.
There are two basic kinds of flash memory:
Code Storage Flash made by Intel, AMD, Atmel, etc. It stores programming
algorithms and is largely found in cell phones.
Data Storage Flash made by San Disk, Toshiba, etc. It stores data and comes in
digital cameras and MP3 players.
Memory Type | Category | Erasure | Write Mechanism | Volatility
Random-access memory (RAM) | Read-write memory | Electrically, byte level | Electrically | Volatile
Read-only memory (ROM) | Read-only memory | Not possible | Masks | Non-volatile
Programmable ROM (PROM) | Read-only memory | Not possible | Electrically | Non-volatile
Erasable PROM (EPROM) | Read-mostly memory | UV light, chip level | Electrically | Non-volatile
Electrically Erasable PROM (EEPROM) | Read-mostly memory | Electrically, byte level | Electrically | Non-volatile
Flash memory | Read-mostly memory | Electrically, block level | Electrically | Non-volatile

1.4 SECONDARY MEMORY AND CHARACTERISTICS
It is desirable that the operating speed of the primary storage of a computer system be as fast as possible, because most of the data transfer to and from the processing unit is via the main memory. For this reason, storage devices with fast access times, such as semiconductor memories, are generally used for the design of primary storage. These high-speed storage devices are expensive and hence the cost per bit of storage is also high for primary storage. But the primary memory has the following limitations:
a)
b)
Volatile: The primary storage is volatile and the data stored in it is lost when the
electric power is turned off. However, the computer systems need to store data on
a permanent basis for several days, months or even several years.
The result is that an additional memory called secondary storage is used with most of
the computer systems. Some popular memories are described in this section.
1.4.1
This is one of the components of today's personal computer, having a capacity of the order of several Gigabytes and above. A magnetic disk has to be mounted on a disk drive before it can be used for reading or writing of information. A disk drive contains all the mechanical, electrical and electronic components for holding one or more disks and for reading or writing information on them. That is, it contains the central shaft on which the disks are mounted, the access arms, the read/write heads and the motors to rotate the disks and to move the access arm assembly. Nowadays, the disk drive assembly is packed in a very small casing although having very high capacity. Now let us learn what a magnetic disk is.
Magnetic Disk
Data are recorded on and later retrieved from the disk via a conducting coil named the
head; in many systems there are two heads, a read head and a write head.
During a read or write operation, the head is stationary while the platter rotates
beneath it.
The write mechanism is based on the fact that electricity flowing through a coil
produces a magnetic field. Pulses are sent to the write head, and magnetic patterns are
recorded on the surface below, with different patterns for positive and negative
currents. The write head itself is made of easily magnetisable material and is in the
shape of a rectangular doughnut with a gap along one side and a few turns of
conducting wire along the opposite side (Figure 6). An electric current in the wire
induces a magnetic field across the gap, which in turn magnetizes a small area of the
recording medium. Reversing the direction of the current reverses the direction of the
magnetization on the recording medium.
The traditional read mechanism is based on the fact that a magnetic field moving
relative to a coil produces an electrical current in the coil. When the surface of the
disk passes under the head, it generates a current of the same polarity as the one
already recorded. The structure of the head for reading is in this case essentially the
same as for writing and therefore the same head can be used for both. Such single
heads are used in floppy disk systems and in older rigid disk systems.
Data Organization and Formatting
The head is a relatively small device capable of reading from or writing to a portion of
the platter rotating beneath it. This gives rise to the organization of data on the platter
in a concentric set of rings, called tracks; each track is of the same width as the head.
There are thousands of tracks per surface.
Figure 7 depicts this data layout. Adjacent tracks are separated by gaps. This prevents, or at least minimises, errors due to misalignment of the head.

Figure 7: Layout of Magnetic Disk

Data are transferred to and from the disk in sectors. To identify the sector position, normally there may be a
starting point of a track and a starting and end point of each sector. But the question is
how is a sector of a track recognised? A disk is formatted to record control data on it, such that some extra data are stored on it for identification purposes. This control data is accessible only to the disk drive and not to the user. Please note that in Figure 7, as we move away from the centre of the disk, the physical size of a track increases. Does it mean we store more data on the outside tracks? No. A disk rotates at a constant angular velocity, but as we move away from the centre the linear velocity is more than the linear velocity nearer the centre. Thus, the density of storage of information decreases as we move away from the centre of the disk. This results in a larger physical sector size. Thus, all the sectors on the disk store the same amount of data.
An example of disk formatting is shown in Figure 8. In this case, each track contains 30 fixed-length sectors of 600 bytes each. Each sector holds 512 bytes of data plus control information useful to the disk controller. The ID field is a unique identifier or
address used to locate a particular sector. The SYNC byte is a special bit pattern that
delimits the beginning of the field. The track number identifies a track on a surface.
The head number identifies a head, because this disk has multiple surfaces. The ID
and data fields each contain an error-detecting code.
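The overhead implied by the Figure 8 format can be worked out directly from its numbers:

```python
# Format of the Figure 8 example: 30 sectors per track, 600 bytes per
# sector, of which 512 bytes are user data; the rest is control information.
sectors_per_track = 30
sector_bytes, data_bytes = 600, 512

track_capacity = sectors_per_track * sector_bytes   # 18,000 bytes per track
track_data = sectors_per_track * data_bytes         # 15,360 bytes of user data
overhead = track_capacity - track_data              # 2,640 bytes of control info
print(track_capacity, track_data, overhead)
```

So under this format roughly 15% of each track is given over to IDs, SYNC bytes, gaps and error-detecting codes.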
Physical Characteristics
Figure 9 lists the major characteristics that differentiate among the various types of magnetic disks. First, the head may be either fixed or movable with respect to the radial direction of the platter. In a fixed-head disk, there is one read-write head per
track. All of the heads are mounted on a rigid arm that extends across all tracks; such
systems are rare today. In a movable-head disk, there is only one read-write head.
Again, the head is mounted on an arm. Because the head must be able to be positioned
above any track, the arm can be extended or retracted for this purpose.
Head motion: Fixed head (one per track); Movable head (one per surface)
Platters: Single platter; Multiple platter
Disk portability: Non-removable disk; Removable disk
Head mechanism: Fixed gap; Aerodynamic gap (Winchester)
Sides: Single sided; Double sided
Figure 9: Physical characteristics of Disk Systems
The disk itself is mounted in a disk drive, which consists of the arm, a shaft that rotates the disk, and the electronics needed for the input and output of binary data. A non-removable disk is permanently mounted in the disk drive; the hard disk in a personal computer is a non-removable disk. A removable disk can be removed and replaced with another disk. The advantage of the latter type is that unlimited amounts of data are available with a limited number of disk systems. ZIP cartridge disks are examples of removable disks. Figure 10 shows other components of the disks.
Figure 10: Components of a disk drive (platter, surface, read/write head, spindle, head arm)
The head mechanism provides a classification of disks into three types. Traditionally,
the read-write head has been positioned at a fixed distance above the platter, allowing
an air gap. At the other extreme is a head mechanism that actually comes into physical
contact with the medium during a read or write operation. This mechanism is used
with the floppy disk, which is a small, flexible platter and the least expensive type of
disk.
To understand the third type of disk, we need to comment on the relationship between
data density and the distance of head from the surface. The head generates or senses
an electromagnetic field of sufficient magnitude to write and read properly. The
narrower the head is, the closer it must be to the platter surface to function. A
narrower head means narrower tracks and therefore greater data density, which is
desirable. However, the closer the head is to the disk, the greater are the risks of errors
from impurities or imperfections.
To push the technology further, the Winchester disk was developed. Winchester heads are used in sealed drive assemblies that are almost free of contaminants. They are designed to operate closer to the disk's surface than conventional rigid disk heads, thus allowing greater data density. The head is actually an aerodynamic foil that rests lightly on the platter's surface when the disk is motionless. The air pressure generated by a spinning disk is enough to make the foil rise above the surface. The resulting non-contact system can be engineered to use narrower heads that operate closer to the platter's surface than conventional rigid disk heads.
Disks operate in a semi-random mode of operation and are normally referenced block-wise. The data access time on a disk consists of two main components:
Seek time: the time to position the head on a specific track. On fixed-head disks it is the time taken by the electronic circuit to select the required head, while on movable-head disks it is the time required to move the head to a particular track.
Latency time: the time required for a sector to reach below the read/write head. On average it is half of the time taken for a rotation of the disk.
In addition to the seek and latency times, the time taken to transfer a (read/write) block of words can be considered, but normally it is too small in comparison to the latency and seek times, and in general the disk access time is considered to be the sum of the seek time and the latency time. Since the access time of disks is large, it is advisable to read a sizeable portion of data in a single go, and that is why disks are referenced block-wise. In fact, you will find that in most computer systems, input/output involving the disk is given a very high priority. The basic reason for such priority is the latency time: once the block which is to be read passes below the read/write head, it may take time of the order of milliseconds for the block to come around again, in turn delaying the input/output and lowering the performance of the system.
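The access time model above can be sketched numerically. The seek time and rotational speed used below (8 ms and 7200 rpm) are assumed figures for illustration, not values from the text:

```python
# Average disk access time = seek time + rotational latency,
# where the average latency is half a revolution.
def avg_access_ms(seek_ms, rpm):
    latency_ms = 0.5 * 60_000 / rpm    # half a rotation, in milliseconds
    return seek_ms + latency_ms

# assumed: 8 ms average seek, 7200 rpm spindle
print(avg_access_ms(8.0, 7200))   # about 12.17 ms per random access
```

Note that both components are of the order of milliseconds, millions of times slower than a main-memory access, which is exactly why disk transfers are done a block at a time.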
1.4.2 Optical Memories
In 1983, one of the most successful consumer products of all time was introduced: the compact disk (CD) digital audio system. The CD was a non-erasable disk that could store more than 60 minutes of audio information on one side. The huge commercial success of the CD enabled the development of the low-cost optical-disk storage technology that has revolutionised computer data storage. A variety of optical-disk systems has been introduced. We briefly review each of these.
Compact Disk ROM (CD-ROM)
Both the audio CD and the CD-ROM (compact disk read-only memory) share a
similar technology. The main difference is that CD-ROM players are more rugged and
have error correction devices to ensure that data are properly transferred from disk to
computer. Both types of disk are made the same way. The disk is formed from a resin,
such as polycarbonate. Digitally recorded information (either music or computer data)
is imprinted as a series of microscopic pits on the surface of the polycarbonate. The
pitted surface is then coated with a highly reflective surface, usually aluminium. This
shiny surface is protected against dust and scratches by a topcoat of clear acrylic.
Finally, a label can be silk-screened onto the acrylic.
Sync: The sync field identifies the beginning of a block. It consists of a byte of all 0s, 10 bytes of all 1s, and a byte of all 0s.
Header: The header contains the block address and the mode byte. Mode 0
specifies a blank data field; mode 1 specifies the use of an error-correcting code
and 2048 bytes of data; mode 2 specifies 2336 bytes of user data with no error
correcting code.
But what are the Min (minute), Sec (second) and Sector fields in the Header field?

Figure 12: Block Format and Disk Layout on CD-ROM

The sectors of a CD-ROM are not organised like the sectors of hard disks (please refer to Figure 12(b)); rather, they are all equal-length segments. If we rotate the CD drive at
constant speed, the linear velocity of the disk surface movement will be higher at the outer side than at the centre portions. To offset this linear speed gap, either we store less data on the outer sectors or we reduce the speed of rotation while reading the outer tracks. The CD follows the latter approach; that is, instead of rotating at constant velocity, the disk is rotated at variable velocity. The speed of rotation of the disk reduces as we move away from the centre, such that the sectors can be read in constant time. This method of reading is called Constant Linear Velocity (CLV).
2.
The optical disk together with the information stored on it can be mass replicated inexpensively, unlike a magnetic disk. The database on a magnetic disk has to be reproduced by copying data from one disk to a second disk, using two disk drives.
3.
The optical disk is removable, allowing the disk itself to be used for archival
storage. Most magnetic disks are non-removable. The information on nonremovable magnetic disks must first be copied on tape before the disk drive / disk
can be used to store new information.
On the other hand, the optical disk has a disadvantage: its access time is much longer than that of a magnetic disk drive (as it employs CLV), as much as half a second.
The CD-RW optical disk can be repeatedly written and overwritten, as with a
magnetic disk. Although a number of approaches have been tried, the only pure
optical approach that has proved attractive is called phase change. The phase change
disk uses a material that has two significantly different reflectivities in two different
phase states. There is an amorphous state, in which the molecules exhibit a random
orientation and which reflects light poorly; and a crystalline state, which has a smooth
surface that reflects light well. A beam of laser light can change the material from one
phase to the other. The primary disadvantage of phase change optical disks is that the
material eventually and permanently loses its desirable properties. Current materials
can be used for between 500,000 and 1,000,000 erase cycles.
The CD-RW has the obvious advantage over CD-ROM and CD-R that it can be
rewritten and thus used as a true secondary storage. As such, it competes with
magnetic disk. A key advantage of the optical disk is that the engineering tolerances
for optical disks are much less severe than for high-capacity magnetic disks. Thus,
they exhibit higher reliability and longer life.
Digital Versatile Disk (DVD)
With the capacious digital versatile disk (DVD), the electronics industry has at last found an acceptable replacement for the videotape used in videocassette recorders (VCRs) and, more importantly for this discussion, for the CD-ROM in personal computers and servers. The DVD has taken video into the digital age. It delivers movies with impressive picture quality, and it can be randomly accessed like audio CDs, which DVD machines can also play. Vast volumes of data can be crammed onto the disk, several times as much as on a CD-ROM. With the DVD's huge storage capacity and vivid quality, PC games will become more realistic and educational software will incorporate more video.
1.4.3 Charge-Coupled Devices (CCDs)
CCDs are used for storing information. They have arrays of cells that can hold charge packets of electrons. A word is represented by a set of charge packets, the presence of each charge packet representing the bit-value 1. The charge packets do not remain
stationary and the cells pass the charge to the neighbouring cells with the next clock
pulse. Therefore, cells are organized in tracks with a circuitry for writing the data at
the beginning and a circuitry for reading the data at the end. Logically the tracks (one
for each bit position) may be conceived as loops since the read circuitry passes the
information back to the write circuit, which then re-creates the bit values in the track
unless new data is written to the circuit.
These devices come under the category of semi-random operation since the devices
must wait till the data has reached the circuit for detection of charge packets. The
access time to these devices is not very high. At present this technology is used only
in specific applications and commercial products are not available.
Magnetic Bubble Memories
1.
T/F
2.
b) The disadvantage of DRAM over static RAM is the need to refresh the
capacitor charge every few milliseconds.
T/F
T/F
3.
4.
5.
6. Describe the internal architecture of a DRAM that stores 4K bytes chip size and
uses a square register array. How many address lines will be needed? Suppose the
same configuration exists for an old RAM, then how many address lines will be
needed?
7. How many RAM chips of size 256K × 1 bit are required to build 1M Byte memory?
Figure 13: RAID Levels. The figure illustrates the data layout at each RAID level: RAID 0 stripes the data (Strip 0 to Strip 15) across the disks; RAID 1 mirrors each strip on a second set of disks; RAID 2 and RAID 3 use parallel access, storing redundant bit-level codes f0(b), f1(b), f2(b) or a single Parity(b) alongside the data bits b0 to b3; RAID 4 keeps block-interleaved parity (Parity (0-3) to Parity (12-15)) on a dedicated disk; RAID 5 distributes the block-interleaved parity across all the disks.
The term RAID was coined by researchers at the University of California, Berkeley. In their paper, RAID stood for Redundant Array of Inexpensive Disks. Later, the term Independent was adopted instead of Inexpensive to signify the performance and reliability gains.
RAID has been proposed at various levels, which are basically aimed to cater for the
widening gap between the processor and on-line secondary storage technology.
The basic strategy used in RAID is to replace the large capacity disk drive with
multiple smaller capacity disks. The data on these disks is distributed to allow
simultaneous access, thus improving the overall input/output performance. It also
allows an easy way of incrementing the capacity of the disk. Please note that one of
the main features of the design is to compensate for the increase in probability of
failure of multiple disks through the use of parity information. The seven levels of
RAID are given in Figure 13 shown above. Please note that levels 2 and 4 are not
commercially offered.
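The parity technique mentioned above can be illustrated with a bytewise XOR sketch (the sample blocks are made up for the example):

```python
# Illustrative sketch of RAID block-interleaved parity (not from the text).
# The parity block is the bytewise XOR of the data blocks; XOR-ing the
# parity with the surviving blocks regenerates any single lost block.

def xor_blocks(blocks):
    out = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            out[i] ^= byte
    return bytes(out)

data = [b"\x0f\xf0", b"\xaa\x55", b"\x00\xff"]   # three data blocks
parity = xor_blocks(data)                         # stored on the parity disk

# suppose the disk holding data[1] fails; rebuild it from parity + survivors
rebuilt = xor_blocks([parity, data[0], data[2]])
assert rebuilt == data[1]
```

The same XOR property underlies both RAID 4 (dedicated parity disk) and RAID 5 (parity distributed across the disks).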
RAID technologies involve two important performance considerations:
Comparison of RAID Levels:

RAID Level | Category           | I/O Request Rate (Read/Write) | Data Transfer Rate (Read/Write) | Typical Application
0          | Striping           | Large strips: Excellent       | Small strips: Excellent         | Applications requiring high performance for non-critical data
1          | Mirroring          | Good/Fair                     | Fair/Fair                       | System drives; critical files
2          | Parallel access    | Poor                          | Excellent                       | Commercially not useful
3          | Parallel access    | Poor                          | Excellent                       | Large I/O request size applications, such as imaging, CAD
4          | Independent access | Excellent/Fair                | Fair/Poor                       | Commercially not useful
5          | Independent access | Excellent/Fair                | Fair/Poor                       | High request rate, read-intensive, data lookup
6          | Independent access | Excellent/Poor                | Fair/Poor                       | Applications requiring extremely high availability
Why are high-speed memories needed? Is the main memory not a high-speed memory? The answer to the second question is definitely no, but why so? For this, we have to go to the fundamentals of semiconductor technology, which is beyond the scope of this Unit. Then, if the memories are slower, how slow are they? On an average, it has been found that the operating speed of main memories lags considerably behind that of processors.
In addition, each instruction requires several memory accesses (this may range from 2 to 7, or even more sometimes). Even if an instruction requires only 2 memory accesses, almost 80% of the instruction execution time is spent by the processor waiting for memory access.
The question is what can be done to increase this processor-memory interface
bandwidth? There are four possible answers to the question. These are:
a)
Decrease the memory access time; use a faster but expensive technology for main
memory.
b)
Access more words in a single memory access cycle. That is, instead of accessing
one word from the memory in a memory access cycle, access more words.
c)
Insert a high-speed memory termed as Cache between the main memory and
processor.
d) Use memory interleaving, that is, divide the main memory into modules that can be accessed in parallel.
Hardware researchers are taking care of the first point. Let us discuss some high speed
memories that are in existence at present.
1.6.1
Cache Memory
Cache memory is an extremely fast, small memory between CPU and main memory
whose access time is closer to the processing speed of the CPU. It acts as a high-speed
buffer between CPU and main memory and is used to temporarily store currently
active data and instructions during processing. Since the cache memory is faster than
main memory, the processing speed is increased by making data and instructions
needed in present processing available in the cache.
The obvious question that arises is how the system can know in advance which data
and instruction are needed in present processing so as to make it available beforehand
in the cache. The answer to this question comes from a principle known as locality of
reference. According to this principle, during the course of execution of most
programs, memory references by the processor, for both instructions and data, tend to
cluster. That is, if an instruction is executed, there is a likelihood of the nearby
instruction being executed soon. Locality of reference is true not only for reference to
program instruction but also for references to data. As shown in Figure 14, the cache
memory acts as a small, high-speed buffer between the processor and main memory.
1.
Cache Size: Cache memory is very expensive as compared to the main memory
and hence its size is normally kept very small. It has been found through
statistical studies that reasonably small caches can have a significant impact on
processor performance. As a typical example of cache size, a system having 1
GB of main memory may have about 1 MB of cache memory. Many of today's
personal computers have 64KB, 128KB, 256KB, 512KB, or 1 MB of cache
memory.
2.
Block Size: Block size refers to the unit of data (few memory words) exchanged
between cache and main memory. As the block size increases from very small to
larger size, the hit ratio (fraction of times that referenced instruction/data is found
in cache) will at first increase because of the principle of locality since more and
more useful words are brought into the cache. However, the hit ratio will begin to
decrease as the block size further increases because the probability of using the
newly fetched words becomes less than the probability of reusing the words that
must be moved out of the cache to make room for the new block. Based on this
fact, the block size is suitably chosen to maximise the hit ratio.
3.
Replacement Policy: When a new block is to be fetched into the cache, another
may have to be replaced to make room for the new block. The replacement policy
decides which block to replace in such a situation. Obviously, it will be best to
replace a block that is least likely to be needed again in the near future.
4.
Write Policy: If the contents of a block in the cache are altered, then it is
necessary to write it back to main memory before replacing it. The write policy
decides when the altered words of a block are written back to main memory. At
one extreme, an updated word of a block is written to the main memory as soon
as such updates occur in the block. At the other extreme, all updated words of the
block are written to the main memory only when the block is replaced from the
cache. The latter policy minimises overheads of memory write operations but
temporarily leaves main memory in an inconsistent (obsolete) state.
1.6.2
Cache Organisation
Cache memories are found in almost all modern computers. They are very useful for
increasing the speed of access of information from memory. Let us look into their
organisation in more detail in this section.
The fundamental idea of cache organisation is that, by keeping the most frequently accessed instructions and data in the fast cache memory, the average memory access time will approach the access time of the cache.
The basic operation of the cache is as follows. When the CPU needs to access
memory, the cache is examined. If the word addressed by the CPU is not found in the
cache, the main memory is accessed to read the word. A block of words is then
transferred from main memory to cache memory.
The performance of cache memory is frequently measured in terms of a quantity
called hit ratio. When the CPU refers to the main memory and finds the word in
cache, it is said to produce a hit. If the word is not found in cache, it is in the main
memory and it counts as a miss. The ratio of the number of hits divided by the total
CPU references to memory is the hit ratio.
The average memory access time of a computer system can be improved considerably
by use of a cache. For example, if memory read cycle takes 100 ns and a cache read
cycle takes 20 ns, then for four continuous references, the first one brings the main
memory contents to cache and the next three from cache.
Time taken with cache = (100 + 20) + (20 × 3)
                        (first read: miss)  (next three reads: hits)
                      = 120 + 60 = 180 ns

Time taken without cache = 100 × 4 = 400 ns
Thus, the closer together the references are, the better the performance of the cache.
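The timing arithmetic above can be captured in a small sketch (100 ns and 20 ns are the read-cycle times from the example; a miss is assumed to cost a main-memory read followed by a cache read, as in the text):

```python
T_MAIN, T_CACHE = 100, 20   # ns, the read-cycle times from the example

def total_time(hits):
    # hits: sequence of True (cache hit) / False (miss); a miss costs a
    # main-memory read plus the cache read that follows it
    return sum(T_CACHE if h else T_MAIN + T_CACHE for h in hits)

with_cache = total_time([False, True, True, True])   # only the first access misses
without_cache = 4 * T_MAIN

assert with_cache == 180 and without_cache == 400
```

Changing the hit pattern shows how quickly the advantage erodes when references are scattered rather than clustered.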
The basic characteristic of cache memory is its fast access time. Therefore, very little
or no time must be wasted when searching for words in the cache. The transformation
of data from main memory to cache memory is referred to as a mapping process. The
mapping procedure for the cache organization is of three types:
1. Associative mapping
2. Direct mapping
3. Set-associative mapping
Figure 15: Cache Memory (the CPU connected to a cache and to a main memory of 32K × 12, that is, 32K words of 12 bits each)

Size of main memory address (given a word size of 12 bits): 32K words = 2^15 words, so 15 bits are needed for the address.
Block size of cache = 2 main memory words.
For every word stored in cache, there is a duplicate copy in the main memory. The CPU communicates with both memories. It first sends a 15-bit (32K = 2^5 × 2^10 = 2^15) address to the cache. If there is a hit, the CPU uses the relevant 12-bit data from the 24-bit cache data. If there is a miss, the CPU reads the block containing the relevant word
from the main memory. So the key here is that a cache must store the address and data
portions of the main memory to ascertain whether the given information is available in
the cache or not. However, let us assume the block size as 1 memory word for the
following discussions.
Associative Mapping
The most flexible and fastest cache organization uses an associative memory which is
shown in Figure 16. The associative memory stores both the address and data of the
memory word. This permits any location in cache to store any word from the main
memory. The address value of 15 bits is shown as a five-digit octal number and its corresponding 12-bit word is shown as a four-digit octal number. A CPU address of
15 bits is placed in the argument register and the associative memory is searched for a
matching address. If the address is found, the corresponding 12 bits data is read and
sent to the CPU. If no matches are found, the main memory is accessed for the word.
The address-data pair is then transferred to the associative cache memory. This
address checking is done simultaneously for the complete cache in an associative way.
Figure 16: Associative Mapping Cache (all numbers in octal)

  Address   Data
  01001     3450
  03767     7613
  23245     1234
  24250     2205

(The CPU address is placed in the argument register and compared against the Address column.)
Direct Mapping
In the general case, there are 2k words in cache memory and 2n words in the main
memory. The n-bits memory address is divided into two fields: k bits for the index
field and (n - k) bits for the tag field.
The direct mapping cache organization uses the n-bit address to access the main
memory and k-bit index to access the cache. The internal organization of the words in
the cache memory is as shown in Figure 17. Each word in cache consists of the data
word and its associated tag. When a new word is first brought into the cache, the tag
bits are stored alongside the data bits. When the CPU generates a memory request, the
index field is used for the address to access the cache.
The tag field of the CPU address is compared with the tag in the word read from the
cache. If the two tags match, there is a hit and the desired data word is in cache. If
there is no match, there is a miss and the required word is read from the main
memory.
Let us consider a numerical example shown in Figure 18. The word at address zero is
at present stored in the cache (index = 000, tag = 00, data = 1456). Suppose that the
CPU wants to access the word at address 02000. The index address is 000, so it is
used to access the cache. The two tags are then compared. The cache tag is 00 but the
address tag is 02, which does not produce a match. Therefore, the main memory is
accessed and the data word 4254 is transferred to the CPU. The cache word at index
address 000 is then replaced with a tag of 02 and data of 4254.
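The miss handling in this numerical example can be sketched in code; the 9-bit index / 6-bit tag split is inferred from the three- and two-octal-digit fields shown in the figures, and the dictionary used as the cache is purely illustrative:

```python
# Direct-mapping sketch: a 15-bit address split into a 6-bit tag
# and a 9-bit index (inferred from the octal field widths in the example).
INDEX_BITS = 9

def split(addr):
    index = addr & ((1 << INDEX_BITS) - 1)   # low-order bits select the line
    tag = addr >> INDEX_BITS                 # remaining bits form the tag
    return tag, index

cache = {0o000: (0o00, 0o1456)}              # index -> (tag, data), per Figure 18

tag, index = split(0o02000)                  # CPU accesses address 02000 (octal)
stored_tag, _ = cache[index]
assert (tag, index) == (0o02, 0o000)
assert stored_tag != tag                     # tags differ: a miss

cache[index] = (tag, 0o4254)                 # word 4254 fetched from main memory
```

After the miss, the line at index 000 holds tag 02 and data 4254, exactly as the example describes.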
Set-Associative Mapping

Figure 19: Two-Way Set-Associative Mapping Cache (all numbers in octal)

  Index   Tag   Data   Tag   Data
  000     01    3450   02    5670
  777     02    6710   00    2340

Write Policy: The data in cache and main memory can be written by processors or input/output devices. The main problems associated with writing to cache memories are:
a)
The contents of cache and main memory can be altered by more than one device.
For example, CPU can write to caches and input/output module can directly write
to the main memory. This can result in inconsistencies in the values of the cache
and main memory.
b)
In the case of multiple CPUs with different caches, a word altered in one cache automatically invalidates the corresponding word in the other caches.
(a) Write through: Write the data in cache as well as main memory. The other
CPU-cache combinations have to watch the traffic to the main memory and make suitable amendments to the contents of their caches. The disadvantage of this
technique is that a bottleneck is created due to large number of accesses to the
main memory by various CPUs.
(b) Write back (write block): In this method updates are made only in the cache, setting a bit called the Update bit. Only those blocks whose update bit is set are written back to the main memory. But here all the accesses to the main memory, whether from other CPUs or input/output modules, need to go through the cache, resulting in complex circuitry.
(c) Instruction Cache: An instruction cache is one which is employed for accessing
only the instructions and nothing else. The advantage of such a cache is that as
the instructions do not change we need not write the instruction cache back to
memory, unlike data storage cache.
1.6.3
Memory Interleaving
In this method, the main memory is divided into n equal-size modules and the CPU has a separate Memory Address Register and Memory Base Register for each memory module. In addition, the CPU has n instruction registers and a memory access system.
When a program is loaded into the main memory, its successive instructions are stored
in successive memory modules. For example if n=4 and the four memory modules are
M1, M2, M3, and M4 then 1st instruction will be stored in M1, 2nd in M2, 3rd in M3, 4th
in M4, 5th in M1, 6th in M2 and so on. Now during the execution of the program, when
the processor issues a memory fetch command, the memory access system creates n
consecutive memory addresses and places them in the Memory Address Register in
the right order. A memory read command reads all the n memory modules
simultaneously, retrieves the n consecutive instructions, and loads them into the n
instruction registers. Thus each fetch for a new instruction results in the loading of n
consecutive instructions in the n instruction registers of the CPU.
Since the instructions are normally executed in the sequence in which they are written,
the availability of N successive instructions in the CPU avoids memory access after
each instruction execution, and the total execution time speeds up. Obviously, the
fetched successive instructions are not useful when a branch instruction is encountered
during the course of execution. This is because they require the new set of n
successive instructions, overwriting the previously stored instructions, which were
loaded, but some of which were not executed. The method is quite effective in
minimising the memory-processor speed mismatch because branch instructions do not
occur frequently in a program.
Figure 20 illustrates the memory interleaving architecture. The Figure shows a 4- way
(n=4) interleaved memory system.
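The round-robin placement of instructions described above can be sketched as follows (a minimal illustration, not code from the text):

```python
N = 4   # four-way interleaving, as in the example

def module_for(instr_no):
    # instruction numbers start at 1; M1 holds the 1st, M2 the 2nd, ...,
    # wrapping back to M1 after M4
    return "M%d" % ((instr_no - 1) % N + 1)

placement = [module_for(i) for i in range(1, 7)]
assert placement == ["M1", "M2", "M3", "M4", "M1", "M2"]
```

One fetch cycle can then read all four modules in parallel, loading four consecutive instructions at once.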
To illustrate with a numerical example, suppose that the argument register A and the key register K have the bit configurations shown below. Only the three leftmost bits of A are compared with the memory words, because K has 1s in these positions.

  A        101 111100
  K        111 000000

  Word 1   100 111100   no match
  Word 2   101 000001   match

Word 2 matches the unmasked argument field because the three leftmost bits of the argument and the word are equal.
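The masked comparison can be expressed directly with bitwise operations (bit patterns taken from the example above):

```python
A = 0b101111100   # argument register
K = 0b111000000   # key register: compare only the positions where K has 1s

def matches(word):
    # mask both the argument and the word with K, then compare
    return (word & K) == (A & K)

word1, word2 = 0b100111100, 0b101000001
assert not matches(word1)   # leftmost bits 100 != 101
assert matches(word2)       # leftmost bits 101 == 101
```

In a real associative memory this masked comparison happens in parallel for every stored word.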
Check Your Progress 2
1.
What is a RAID? What are the techniques used by RAID for enhancing
reliability?
2.
a)
Interleaved memories are best suited for small loops and large sequential
code.
T/F
b)
3.
c)
High-speed memories are needed to bridge the gap of speed between I/O
device and memory.
T/F
d)
T/F
e)
T/F
How can the Cache memory and interleaved memory mechanisms be used to
improve the overall processing speed of a Computer system?
4.
Assume a Computer having 64 word RAM (assume 1 word = 16 bits) and cache
memory of 8 blocks (block size = 32 bits). Where can we find Main Memory
Location 25 in cache if (a) Associative Mapping (b) Direct mapping and (c) 2
way set associative (2 blocks per set) mapping is used.
5.
6.
1.7
VIRTUAL MEMORY
In a memory hierarchy system, programs and data are first stored in auxiliary or
secondary memory. The program and its related data are brought into the main
memory for execution. What if the size of the memory required for the program is more than the size of the main memory? Virtual memory is a concept used in some large computer
systems that permit the user to construct programs as though a large memory space
were available, equal to the totality of secondary memory. Each address generated by
the CPU goes through an address mapping from the so-called virtual address to a
physical address in the main memory. Virtual memory is used to give programmers
the illusion that they have a very large memory at their disposal, even though the
computer actually has a relatively small main memory. A Virtual memory system
provides a mechanism for translating program-generated addresses into correct main
memory locations. This is done dynamically, while programs are being executed in
the CPU. The translation or mapping is handled automatically by the hardware by
means of a mapping table.
Address Space and Memory Space
An address used by a programmer will be called a virtual address, and the set of such
addresses the address space. An address in the main memory is called a physical
address. The set of such locations is called the memory space. Thus, the address space
is the set of addresses generated by programs as they reference instructions and data;
the memory space consists of the actual main memory locations directly addressable
for processing.
In our example, the address field of an instruction code will consist of 20 bits but
physical memory addresses must be specified with only 16-bits. Thus CPU will
reference instructions and data with a 20 bits address, but the information at this
address must be taken from physical memory because access to auxiliary storage for
individual words will be prohibitively long. A mapping table is then needed, as shown
in Figure 23, to map a virtual address of 20 bits to a physical address of 16 bits. The
mapping is a dynamic operation, which means that every address is translated
immediately as a word is referenced by CPU.
Figure 23: Memory table for mapping a virtual address. (The 20-bit virtual address is held in the Virtual Address Register; the Memory Mapping Table translates it, via the memory table buffer register, into the 16-bit Main Memory Address Register, and the addressed word is transferred between Main Memory and the main memory buffer register.)
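The dynamic address translation can be sketched as below; the 1K-word page size and the sample mapping-table entry are illustrative assumptions, not values from the text:

```python
PAGE_BITS = 10                     # assume 1K-word pages (illustrative)
page_table = {0x3F6: 0x21}         # hypothetical: virtual page -> physical frame

def translate(vaddr):
    page = vaddr >> PAGE_BITS                # upper bits of the 20-bit address
    offset = vaddr & ((1 << PAGE_BITS) - 1)  # position within the page
    frame = page_table[page]                 # a KeyError here would be a page fault
    return (frame << PAGE_BITS) | offset     # 16-bit physical address

paddr = translate((0x3F6 << PAGE_BITS) | 5)
assert paddr == (0x21 << PAGE_BITS) | 5
```

The offset passes through unchanged; only the page number is replaced by a frame number, which is what lets a 20-bit address space live in a 16-bit physical memory one page at a time.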
1.8 THE MEMORY SYSTEM OF THE MICROCOMPUTER
Till now we have discussed various memory components. But, how is the memory
organised in the physical computer? Let us discuss various kinds of memory
technologies used in personal computer.
1.8.1 SIMM and DIMM Memories
From the early days of semiconductor memory until the early 1990s, memory was manufactured, bought and installed as a single chip. Chip density went from 1K bits to 1M bits and beyond, but each chip was a separate unit. Early PCs often had empty
sockets into which additional memory chips could be plugged, if and when the
purchaser needed them. At present, a different arrangement is often used called SIMM
or DIMM.
A group of chips, typically 8 to 16, is mounted on a tiny printed circuit board and sold
as a unit. This unit is called a SIMM or DIMM depending on whether it has a row of
connectors on one side or both sides of the board.
A typical SIMM configuration might have 8 chips with 32 megabits (4MB) each on
the SIMM. The entire module then holds 32MB. Many computers have room for four
modules, giving a total capacity of 128MB when using 32MB SIMMs. The first
SIMMs had 30 connectors and delivered 8 bits at a time. The other connectors were
used for addressing and control. A later SIMM had 72 connectors and delivered 32 bits at a
time. For a machine like Pentium, which expected 64-bits at once, 72-connectors
SIMMs were paired, each one delivering half the bits needed.
A DIMM is capable of delivering 64 data bits at once. Typical DIMM capacities are
64MB and up. Each DIMM has 84 gold-plated connectors on each side, for a total of
168 connectors. SIMM and DIMM are shown in Figure 24 (a) and (b) respectively.
How they are put on a motherboard is shown in Figure 24 (c).
SIMM
DIMM
1.8.2
The basic building block of the main memory remains the DRAM chip, as it has for
decades. Until recently, there had been no significant changes in DRAM architecture
since the early 1970s. The traditional DRAM chip is constrained both by its internal
architecture and by its interface to the processor's memory bus. The two schemes that
currently dominate the market are SDRAM and RDRAM. A third one, that is Cache
RAM, is also very popular.
SDRAM (Synchronous DRAM)
One of the most widely used forms of DRAM is the synchronous DRAM (SDRAM).
Unlike the traditional DRAM, which is asynchronous, the SDRAM exchanges data
with the processor synchronized to an external clock signal and running at the full
speed of the processor /memory bus without imposing wait states.
In a typical DRAM, the processor presents addresses and control levels to the
memory, indicating that a set of data at a particular location in memory should be
either read from or written into the DRAM. After a delay known as the access time, the DRAM either writes or reads the data. During this delay, the DRAM performs
various internal functions, such as activating the high capacitance of the row and
column lines, sensing the data and routing the data out through the output buffers. The
processor must simply wait through this delay, slowing system performance.
With synchronous access, the DRAM moves data in and out under control of the
system clock. The processor or other master issues the instruction and address
information, which is latched on to by the DRAM. The DRAM then responds after a
set number of clock cycles. Meanwhile, the master can safely do other tasks while the
SDRAM is processing the request.
The SDRAM employs a burst mode to eliminate the address setup time. In burst
mode, a series of data bits can be clocked out rapidly after the first bit has been
accessed. The mode is useful when all the bits to be accessed are in sequence and in
the same row of the array as the initial access. In addition, the SDRAM has a
multiple-bank internal architecture that improves opportunities for on-chip
parallelism.
The mode register and associated control logic is another key feature differentiating
SDRAMs from conventional DRAMs. It provides a mechanism to customize the
SDRAM to suit specific system needs. The mode register specifies the burst length,
which is the number of separate units of data synchronously fed onto the bus. The
register also allows the programmer to adjust the latency between receipt of a read
request and the beginning of data transfer.
The SDRAM performs best when it is transferring large blocks of data serially, such
as for applications like word processing, spreadsheets, and multimedia.
RDRAM (Rambus DRAM)
RDRAM, developed by Rambus, has been adopted by Intel for its Pentium and
Itanium processors. It has become the main competitor to SDRAM. RDRAM chips
are vertical packages, with all pins on one side. The chip exchanges data with the
processor over 28 wires no more than 12 centimeters long. The bus can address up to 320 RDRAM chips and is rated at 1.6 GBps.
The special RDRAM bus delivers address and control information using an
asynchronous block-oriented protocol. After an initial 480 ns access time, this
produces the 1.6 GBps data rate. The speed of RDRAM is due to its high speed Bus.
Rather than being controlled by the explicit RAS, CAS, R/W, and CE signals used in conventional DRAMs, an RDRAM gets a memory request over the high-speed bus. This
request contains the desired address, the type of operation and the number of bytes in
the operation.
CDRAM (Cache DRAM)
1.
Difference between
a) SDRAM and RDRAM b) SIMM and DIMM
2.
1.9
SUMMARY
In this unit, we have discussed the details of the memory system of the computer.
First we discussed the concept and the need of the memory hierarchy. Memory
hierarchy is essential in computers as it provides an optimised low-cost memory
system. The unit also covers details on the basic characteristics of RAMs and different
kinds of ROMs. These details include the logic diagrams of RAMs and ROMs giving
basic functioning through various control signals. We have also discussed the latest
secondary storage technologies such as CD-ROM, DVD-ROM, CD-R, CD-RW etc.
giving details about their data formats and access mechanisms.
The importance of high-speed memories such as cache memory, interleaved memory
and associative memories are also described in detail. The high-speed memory,
although small, provides a very good overall speed of the system due to locality of
reference. Several other concepts, such as the memory system of the microcomputer, which consists of memory modules such as SIMM and DIMM and memory types such as SDRAM and RDRAM, are also explained in a simple way. The unit also contains details on Virtual Memory. For more details on the memory system you can go through further units.
2.
a)
b)
c)
a)
b)
c)
True
True
False
Ultraviolet light, electrically
Dynamic
Sequential
3.
a)
b)
c)
d)
e)
RAM
Volatile Memory
Faster access time
Higher cost per bit storage
Random access
Less storage capacity
Non volatile
Slower than RAM
Lower than RAM
Sequential access
Higher storage capacity
4.
5.
6.
4K bytes is actually 4 × 1024 = 4096 bytes, and the DRAM holds 4096 eight-bit words. Each word can be thought of as being stored in an 8-bit register, and there are 4096 registers connected to a common data bus internal to the chip. Since 4096 = 64^2, the registers are arranged in a 64 × 64 array, that is, there are 64 = 2^6 rows and 64 = 2^6 columns. This requires a 6-to-64 decoder to decode six address inputs for the row select and a second 6-to-64 decoder to decode six other address
inputs for the column select. Using the structure shown in Figure 3(b), it requires only a 6-bit address input.
While in the case of an old RAM, the chip requires 12 address lines (please refer to Figure 2(b)), since 4096 = 2^12 and there are 4096 different addresses.
7. 1M byte = 2^23 bits, and each chip stores 256K × 1 = 2^18 bits. Therefore, the number of chips required = 2^23 / 2^18 = 2^5 = 32.
1)
A disk array, known as a RAID system, is a mass storage device that uses a set of hard disks and hard disk drives with a controller mounted in a single box. All the disks of a disk array form a single large storage unit. The RAID systems were developed to provide large secondary storage capacity with enhanced performance and enhanced reliability. The performance is based upon the data transfer rate, which is much higher than that of an individual disk. The reliability can be achieved by two techniques: mirroring (the system makes exact copies of files on two hard disks) and striping (a file is partitioned into smaller parts and different parts of the file are stored on different disks).
2.
a)
b)
c)
d)
e)
3.
The cache memory is a very fast, small memory placed between the CPU and main
memory, whose access time is close to the processing speed of the CPU. It acts
as a high-speed buffer between the CPU and main memory and is used to
temporarily store the data and instructions needed during current processing. In
memory interleaving, the main memory is divided into n equal-size modules.
When a program is loaded into the main memory, its successive instructions lie
in successive modules; while one instruction is being executed, the next can be
fetched from another module. This avoids waiting on memory after each
instruction execution and speeds up the total time.
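The idea of interleaving can be sketched as follows (a simplified model; low-order interleaving across four modules is assumed):

```python
# n-way memory interleaving: consecutive addresses map to successive
# modules, so sequential instruction fetches land in different modules
# and can be overlapped.

N_MODULES = 4  # main memory divided into 4 equal-size modules

def module_of(address):
    """Low-order interleaving: module number = address mod n."""
    return address % N_MODULES

# Eight sequential addresses visit all four modules in turn.
print([module_of(a) for a in range(8)])  # [0, 1, 2, 3, 0, 1, 2, 3]
```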
4.
a) True
b) True
c) False
d) True
e) False
5.
Memory Address = 25
Block Address = 12 and Block offset = 1
Tag = 3; Index (Set Number) = 0 and Block offset = 1
The Tag is used here to check whether a given address is in a specified set. This
cache has 2 blocks per set, thus, the name two way set associative cache. The
total number of sets here is 8 / 2 = 4.
For Associative mapping the Block address is checked directly in all location of
cache memory.
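The worked example above can be reproduced with a small sketch (the function name is ours; the cache parameters are those of the example: 2 words per block, 8 blocks, two-way set associative):

```python
# Two-way set associative mapping: the block address is split into a
# tag and a set number (index), and the word address contributes a
# block offset.

BLOCK_SIZE = 2             # words per block
N_BLOCKS = 8               # blocks in the cache
WAYS = 2                   # two-way set associative
N_SETS = N_BLOCKS // WAYS  # 8 / 2 = 4 sets

def map_address(addr):
    offset = addr % BLOCK_SIZE     # block offset
    block = addr // BLOCK_SIZE     # block address
    set_no = block % N_SETS        # index (set number)
    tag = block // N_SETS          # tag, compared within the set
    return tag, set_no, offset

print(map_address(25))  # (3, 0, 1): Tag = 3, Set number = 0, offset = 1
```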
6.
representation. In a set there are 4 blocks, so the block field needs 2 bits. Each
block has 64 words, so the block offset field has 6 bits.
The Index Field is of 4 bits.
The Block offset is of 6 bits.
Check Your Progress 3
1.
2.
The virtual address space is 1 G words = 2^30 words, giving a 30-bit virtual address,
which will be translated to a physical memory address of 26 bits (64 Mega words = 2^26).
Page No.
Introduction
Objectives
Input / Output Devices or External or Peripheral Devices
The Input Output Interface
The Device Controllers and its Structure
2.5 Device Drivers
2.6 Input Output Techniques
2.6.1
2.6.2
2.6.3
2.6.4
2.7
2.8
2.9
2.10
2.0 INTRODUCTION
In the previous unit, we discussed the memory system of a computer, which contains
primary memory, secondary memory and high-speed memory, along with their
technologies, as well as the memory system of micro-computers, i.e., their chips and
types of memory. Another important component, in addition to the memory system,
is the input/output system. In this unit we will discuss input/output controllers,
device drivers, the structure of the I/O interface and the I/O techniques. We will also
discuss the Input/Output processors, which were quite common in mainframe
computers.
2.1 OBJECTIVES
At the end of this unit you should be able to:
[Figure: Structure of a microcomputer. The CPU (CU, ALU, registers and cache) and main memory (RAM and ROM) sit on the local bus internal to the motherboard. Buses such as EISA, PCI and SCSI connect the peripherals: video processor with VRAM and display device, keyboard, mouse, FDD, primary HDD, CD-ROM, additional RAM/ROM, printer, LAN/Ethernet network and modem.]
The microcomputer has a single microprocessor, a number of RAM and ROM chips
and a number of interface units that communicate with various external devices
through the I/O bus.
The Input/Output subsystem of a computer, referred to as I/O, provides an efficient
mode of communication between the central system and the outside environment.
External devices that are under the direct control of the computer are said to be
connected on-line. These devices are designed to read information into or out of the
memory unit upon command from the CPU and are considered to be part of the
computer system. Input/Output devices attached to the computer are also called
peripherals. We can broadly classify peripherals or external devices into 3 categories:
Human readable: suitable for communicating with the computer user, e.g., video
display terminals (VDTs) and printers.
The purpose of the communication link is to resolve the differences that exist
between the central computer and each peripheral. The major differences are:
The processor enquires from the I/O interface about the status of the attached
device. The status can be busy, ready or out of order.
2.
3.
If the device is operational and ready to transmit, the processor requests the
transfer of data by means of a command, which is a binary signal, to the I/O
interface.
4.
The I/O interface obtains a unit of data (e.g., 8 or 16 bits) from the external
device.
5.
1.
2.
Data that are exchanged between the processor and the I/O interface are sent over
the data bus.
3.
Status: As peripherals are slow, it is important to know the status of the I/O
interface. The status signals from the I/O interface are BUSY, READY or an error
condition.
Address recognition: just as each word of memory has an address, so does each I/O
device. Thus, an I/O interface must recognise one unique address for each
peripheral it controls.
4.
Each I/O device is linked through a hardware interface called I/O Port.
The communication between I/O controller and Memory is through bus only in
case of Direct Memory Access (DMA), whereas the path passes through the CPU
for such communication in case of non-DMA.
[Figure: Device controllers and I/O ports. The CPU and memory are connected by the system bus. With DMA, a multi-port device controller transfers data to memory directly over the system bus (the DMA path); without DMA the path passes through the CPU. Single-port and multi-port device controllers connect the I/O devices to the system through I/O ports.]
Using device controllers for connecting I/O devices to a computer system instead of
connecting them directly to the system bus has the following advantages:
A device controller can be shared among multiple I/O devices allowing many I/O
devices to be connected to the system.
I/O devices can be easily upgraded or changed without any change in the
computer system.
I/O devices of manufacturers other than the computer manufacturer can be easily
plugged into the computer system. This provides more flexibility to the users in
buying I/O devices of their choice.
There is a need for I/O logic that can interpret and execute the dialogue
between the processor and the I/O interface. Therefore, there need to be control lines
between the processor and the I/O interface.
The data line connecting I/O interface to the system bus must exist. These lines
serve the purpose of data transfer.
Data registers may act as buffer between processor and I/O interface.
The I/O interface contains logic specific to the interface with each device that it
controls.
Figure 3 above is a typical diagram of an I/O interface which in addition to all the
registers as defined above has status/control registers which are used to pass on the
status information or the control information.
Device name      Description
/dev/console     system console
/dev/tty01       user terminal 1
/dev/tty02       user terminal 2
/dev/lp          line printer
/dev/dsk/f03h    1.44 MB floppy drive
In MS-DOS, device drivers are installed and loaded dynamically, i.e., they are loaded
into memory when the computer is started or re-booted and accessed by the operating
system as required. The technique has the advantage that it makes addition of a new
driver much simpler, so that it could be done by relatively unskilled users. The
additional merit is that only those drivers which are actually required need to be
loaded into the main memory. The device drivers to be loaded are defined in a special
file called CONFIG.SYS, which must reside in the root directory. This file is
automatically read by MS-DOS at start-up of the system, and its contents acted upon.
A list of some device names is shown below:
Device name      Description
con:             keyboard/screen
com1:            serial port 1
com2:            serial port 2
lpt1:            printer port 1
A:               first disk drive
C:               hard disk drive
In the Windows system, device drivers are implemented as dynamic link libraries
(DLLs). This technique has the advantage that DLLs contain shareable code, which
means that only one copy of the code needs to be loaded into memory. Secondly, a
driver for a new device can be implemented by a software or hardware vendor without
the need to modify or affect the Windows code; and lastly, a range of optional drivers
can be made available and configured for particular devices.
In the Windows system, the idea of Plug and Play device installation is used when
adding a new device such as a CD drive. The objective is to make this process largely
automatic: the device is attached and the driver software loaded. Thereafter, the
installation is automatic, and the settings are chosen to suit the host computer
configuration.
Check Your Progress 1
1.
2.
T/F
T/F
T/F
T/F
T/F
T/F
T/F
T/F
3.
Technique                       Uses interrupts?    Data transfer path
Programmed input/output         No                  Through CPU
Interrupt driven input/output   Yes                 Through CPU
Direct memory access (DMA)      Yes                 Direct to memory
In programmed I/O, the I/O operations are completely controlled by the processor.
The processor executes a program that initiates, directs and terminates an I/O
operation. It requires little special I/O hardware, but is quite time-consuming for the
processor, since the processor has to wait for slower I/O operations to complete.
With interrupt-driven I/O, when the interface determines that the device is ready for
data transfer, it generates an interrupt request to the computer. Upon detecting the
external interrupt signal, the processor stops the task it is processing, branches to a
service program to process the I/O transfer, and then returns to the task it was
originally performing. This reduces the time the processor spends waiting.
With both programmed and interrupt-driven I/O, the processor is responsible for
extracting data from the main memory for output and storing data in the main memory
during input. What about having an alternative where I/O device may directly store
data or retrieve data from memory? This alternative is known as direct memory access
(DMA). In this mode, the I/O interface and main memory exchange data directly,
without the involvement of processor.
[Figure 5: Three techniques for input of a block of data. (a) Programmed I/O: the CPU issues a read command to the I/O interface, reads its status and, if not ready, tries again (or issues an error condition); when ready, it reads a word from the I/O interface and writes it into memory, repeating until the block is completed, then proceeds to the next instruction. (b) Interrupt-driven I/O: the CPU issues the read command and does other processing until the I/O interface interrupts; it then checks the status, reads the word and writes it into memory, repeating until the block is completed. (c) DMA: the CPU issues a read-block command to the DMA interface and does other processing; the DMA interface interrupts the CPU only when the whole transfer is complete, after which the CPU proceeds to the next instruction.]
With the programmed I/O method, the responsibility of the processor is to constantly
check the status of the I/O device to determine whether it is free or has finished
inputting the data. Thus, this method is very time-consuming: the processor wastes a
lot of time checking and verifying the status of the I/O device. Figure 5(a) gives an
example of the use of programmed I/O to read a block of data from a peripheral
device into memory.
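The busy-wait loop of Figure 5(a) can be sketched as a small simulation (the device model and names are invented for illustration):

```python
# Programmed I/O: the CPU polls the status register and transfers one
# word at a time; every "BUSY" poll is a wasted processor cycle.

import random

class Device:
    def __init__(self, data):
        self.data = list(data)
    def status(self):                     # device is only sometimes ready
        return "READY" if random.random() < 0.3 else "BUSY"
    def read_word(self):
        return self.data.pop(0)

def programmed_io_read(dev, n_words):
    memory = []
    for _ in range(n_words):
        while dev.status() != "READY":    # CPU busy-waits here
            pass
        memory.append(dev.read_word())    # read word, write into memory
    return memory

print(programmed_io_read(Device([1, 2, 3]), 3))  # [1, 2, 3]
```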
I/O Commands
There are four types of I/O commands that an I/O interface may receive when it is
addressed by a processor:
Control: These commands are device specific and are used to provide specific
instructions to the device, e.g. a magnetic tape requiring rewinding and moving
forward by a block.
Test: This command checks the status such as if a device is ready or not or is in
error condition.
Read: This command is used for input of data from the input device.
Write: This command is used for output of data to the output device.
I/O Instructions:
An I/O instruction is stored in the memory of the computer and is fetched and
executed by the processor producing an I/O-related command for the I/O interface.
With programmed I/O, there is a close correspondence between the I/O-related
instructions and the I/O commands that the processor issues to an I/O interface to
execute the instructions.
In systems with programmed I/O, the I/O interface, the main memory and the
processors normally share the system bus. Thus, each I/O interface should interpret
the address lines to determine if the command is for itself. There are two methods for
doing so. These are called memory-mapped I/O and isolated I/O.
With memory-mapped I/O, there is a single address space for memory locations and
I/O devices. The processor treats the status and data registers of I/O interface as
memory locations and uses the same machine instructions to access both memory and
I/O devices. For a memory-mapped I/O only a single read and a single write line are
needed for memory or I/O interface read or write operations. These lines are activated
by the processor for either memory access or I/O device access. Figure 6 shows the
memory-mapped I/O system structure.
Figure 6: Structure of Memory Mapped I/O
With isolated I/O, there are separate control lines for both memory and I/O device
read or write operations. Thus a memory reference instruction does not affect an I/O
device. In isolated I/O, the I/O devices and memory are addressed separately; hence
separate input/output instructions are needed which cause data transfer between
addressed I/O interface and processor. Figure 7 shows the structure of isolated I/O.
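The memory-mapped case can be sketched with a toy single-address-space model (illustrative only; the base address and dictionaries are invented):

```python
# Memory-mapped I/O: one address space serves both memory locations and
# device registers, so the same read/write operations reach either.
# Here everything at or above IO_BASE is routed to a device register.

IO_BASE = 0xFF00
memory = {}       # ordinary memory locations
device_regs = {}  # registers of the I/O interface

def write(addr, value):
    if addr >= IO_BASE:
        device_regs[addr - IO_BASE] = value  # lands in a device register
    else:
        memory[addr] = value

def read(addr):
    if addr >= IO_BASE:
        return device_regs.get(addr - IO_BASE, 0)
    return memory.get(addr, 0)

write(0x0010, 42)   # an ordinary store
write(0xFF01, 1)    # the same operation, but it addresses a device
print(read(0x0010), device_regs[1])  # 42 1
```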
2.6.3 Interrupt Processing
The occurrence of an interrupt triggers a number of events, both in the processor
hardware and in software. Figure 8 shows the sequence.
When an I/O device completes an I/O operation, the following sequence of hardware
events occurs:
1.
2.
3.
The processor tests for the interrupts and sends an acknowledgement signal to the
device that issued the interrupt.
4.
The minimum information required to be stored for the task currently being
executed, before the CPU starts executing the interrupt routine (using its
registers), is:
(a) the status of the processor, which is contained in the register called the program
status word (PSW), and
(b) the location of the next instruction to be executed of the currently executing
program, which is contained in the program counter (PC).
5.
The processor now loads the PC with the entry location of the interrupt-handling
program that will respond to this interrupting condition. Once the PC has been
loaded, the processor proceeds to the next instruction cycle, which begins with an
instruction fetch. Because the instruction fetch is determined by the contents of the
PC, the result is that control is transferred to the interrupt-handler program. The
execution results in the following operations:
6.
The PC and PSW relating to the interrupted program have already been saved on
the system stack. In addition, the contents of the processor registers used by the
called Interrupt Servicing Routine also need to be saved on the stack, because
these registers may be modified by the interrupt handler. Figure 9(a) shows a
simple example. Here a user program is interrupted after the instruction at
location N. The contents of all of the registers plus the address of the next
instruction (N+1) are pushed on to the stack.
7.
The interrupt handler next processes the interrupt. This includes determining of
the event that caused the interrupt and also the status information relating to the
I/O operation.
8.
When interrupt processing is complete, the saved register values are retrieved
from the stack and restored to the registers, which are shown in Figure 9(b).
9.
The final step is to restore the values of PSW and PC from the stack. As a result,
the instruction to be executed will be from the previously interrupted program.
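The save/restore sequence of steps 6 to 9 can be sketched as follows (a simplified model; the register names and the handler's behaviour are invented):

```python
# Interrupt handling: push the PSW, PC (= N+1) and working registers
# on the stack, run the handler (which may clobber registers), then
# pop everything back so the interrupted program resumes unchanged.

stack = []

def take_interrupt(cpu, handler_entry):
    stack.append((cpu["PSW"], cpu["PC"], dict(cpu["regs"])))  # save state
    cpu["PC"] = handler_entry       # control passes to the handler
    cpu["regs"]["R0"] = 999         # the handler uses (modifies) registers
    psw, pc, regs = stack.pop()     # restore the saved values
    cpu["PSW"], cpu["PC"], cpu["regs"] = psw, pc, regs

cpu = {"PSW": "user", "PC": 101, "regs": {"R0": 7}}  # PC holds N+1
take_interrupt(cpu, handler_entry=0x2000)
print(cpu["PC"], cpu["regs"]["R0"])  # 101 7, state fully restored
```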
Figure 9: Interrupt Handling
1) How does the processor determine which device issued the interrupt?
2)
If multiple interrupts have occurred, how does the processor decide which one to
be processed first?
To solve these problems, four general categories of techniques are in common use:
Daisy chain: This scheme provides a hardware poll. With this technique, an
interrupt acknowledge line is chained through the various interrupting devices. All I/O
interfaces share a common interrupt request line. When the processor senses an
interrupt, it sends out an interrupt acknowledgement. This signal passes through
all the I/O devices until it gets to the requesting device. The first device which
made the interrupt request senses the signal and responds by putting a word,
normally the address of its interrupt-servicing program or a unique identifier, on
the data lines. This word is also referred to as the interrupt vector. This address
or identifier in turn is used for selecting the appropriate interrupt-servicing
program. Daisy chaining has an in-built priority scheme, which is determined by
the sequence of devices on the interrupt acknowledge line.
Bus arbitration: In this scheme, the I/O interface first needs to gain control of the
bus, and only after that can it request an interrupt. Since only one of the interfaces
can control the bus at a time, only one request can be made at a time. The
interrupt request is acknowledged by the CPU, in response to which the I/O
interface places the interrupt vector on the data lines. An interrupt vector
normally contains the address of the interrupt-serving program.
An example of interrupt vectors can be found in personal computers, where there are
several IRQs (interrupt requests), each for a specific type of interrupt.
DMA Function
Although the CPU intervention in DMA is minimised, the DMA interface must still
use the path between the interfaces, that is, the system bus. Thus, DMA involves an
additional interface on the system bus. A technique called cycle stealing allows the
DMA interface to transfer one data word at a time, after which it must return control
of the bus to the processor. The processor merely delays its operation for one memory
cycle to allow the direct memory access transfer to steal one memory cycle. When an
I/O is requested, the processor issues a command to the DMA interface by sending to
the DMA interface the following information (Figure 10):
Which operation (read or write) is to be performed, using the read or write control
lines.
The address of the I/O device which is to be used, communicated on the data lines.
The starting location in memory where the information will be read from or written
to, communicated on the data lines and stored by the DMA interface in its address
register.
The number of words to be read or written is communicated on the data lines and
is stored in the data count register.
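The command parameters above can be sketched in a small simulation (the register names and data values are invented for illustration):

```python
# DMA block transfer: the processor supplies the operation, the device,
# the starting memory address and the word count; the DMA interface
# then moves the whole block itself and interrupts on completion.

memory = [0] * 16
device_data = iter([10, 20, 30, 40])    # words arriving from the device

def dma_read_block(start_addr, count):
    addr, remaining = start_addr, count     # address and data count registers
    while remaining:
        memory[addr] = next(device_data)    # one word per stolen bus cycle
        addr += 1
        remaining -= 1
    return "interrupt"                      # completion signal to the CPU

print(dma_read_block(4, 4), memory[4:8])    # interrupt [10, 20, 30, 40]
```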
The DMA interface transfers the entire block of data, one word at a time, directly to or
from memory, without going through the processor. When the transfer is complete,
the DMA interface sends an interrupt signal to the processor. Thus, in DMA the
processor involvement can be restricted to the beginning and end of the transfer,
as shown in the figure above. But the question is: when should the DMA take
control of the bus?
For this we will recall the phenomenon of execution of an instruction by the
processor. Figure 11 below shows the five cycles for an instruction execution. The
Figure also shows the five points where a DMA request can be responded to and a
point where the interrupt request can be responded to. Please note that an interrupt
request is acknowledged only at one point of an instruction cycle, and that is at the
interrupt cycle.
The DMA mechanism can be configured in a variety of ways. One possibility is
shown below in Figure 12(a), in which all interfaces share the same system bus. The
DMA interface acts as a supportive processor and exchanges data between memory
and the I/O interface through the DMA interface in a programmed-I/O-like manner.
But this spoils the basic advantage of DMA, since extra bus cycles are consumed
transferring information between memory and the DMA interface, and between the
DMA interface and the I/O interface.
Figure 12: DMA Configuration
The configuration of Figure 12(b) offers advantages over the one shown above. In
these systems a path is provided between the I/O interface and the DMA interface
which does not include the system bus. The DMA logic may become part of an I/O
interface and can control one or more I/O interfaces. In an extended concept, an I/O
bus can be connected to this DMA interface. Such a configuration (shown in Figure
12(c)) is quite flexible and can be extended very easily. In both these configurations,
the added advantage is that the data between the I/O interface and the DMA interface
is transferred off the system bus, thus eliminating the disadvantage we witnessed in
the first configuration.
Which of the I/O techniques does not require an Interrupt Signal? Is this
technique useful in Multiprogramming Operating Systems? Give reason.
2.
What are the techniques of identifying the device that has caused the Interrupt?
3.
T/F
b) I/O mapped I/O scheme requires no additional lines from CPU to I/O device
except for the system bus. T/F
c) Most of the I/O processors have their own memory, while a DMA module
does not have its own memory except for a register or a simple buffer area. T/F
d) The advantage of interrupt-driven I/O over programmed I/O is that in the
former the interrupt mechanism frees I/O devices quickly. T/F
2.
Addition of an I/O controller or I/O interface: The CPU uses programmed I/O
without interrupts. The CPU is separated from the details of external I/O interfaces.
3.
Continued use of I/O controllers, but with interrupts: The CPU need not spend
time waiting for an I/O operation to be performed, increasing efficiency.
4.
Direct access of the I/O interface to the memory via DMA: CPU involvement is
reduced to the beginning and end of the DMA operation.
5.
The CPU directs the I/O processors to execute an I/O program in memory. The
I/O processor fetches and executes these instructions without CPU intervention.
This allows the CPU to specify a sequence of I/O activities and to be interrupted
only when the entire sequence has been performed. With this architecture, a large
set of I/O devices can be controlled, with minimum CPU involvement.
With the last two steps (4 and 5), a major change occurs with the introduction of the
concept of an I/O interface capable of executing a program. For step 5, the I/O
interface is often referred to as an I/O channel or I/O processor.
Characteristics of I/O Channels
The I/O channel represents an extension of the DMA concept. An I/O channel has the
ability to execute I/O instructions, which gives complete control over the I/O
operation. With such devices, the CPU does not execute I/O instructions. Such
instructions are stored in the main memory to be executed by a special-purpose
processor in the I/O channel itself. Thus, the CPU initiates an I/O transfer by
instructing the I/O channel to execute a program in memory. Two types of I/O
channels are commonly used which can be seen in Figure 13 (a and b).
A selector channel controls multiple high-speed devices and, at any one time, is
dedicated to the transfer of data with one of those devices. Each device is handled by a
controller or I/O interface. Thus the I/O channel serves in place of the CPU in
controlling these I/O controllers.
A multiplexer channel can handle I/O with multiple devices at the same time. If the
devices are slow then byte multiplexer is used. Let us explain this with an example. If
we have three slow devices which need to send individual bytes as:
X1 X2 X3 X4 X5
Y1 Y2 Y3 Y4 Y5
Z1 Z2 Z3 Z4 Z5
Then on a byte multiplexer channel they may send the bytes as X1 Y1 Z1 X2 Y2
Z2 X3 Y3 Z3 ... For high-speed devices, blocks of data from several devices are
interleaved; such a channel is called a block multiplexer.
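The byte interleaving above can be sketched as a round-robin merge of the three device streams (the function name is ours):

```python
# A byte multiplexer channel takes one byte from each slow device in
# turn, producing X1 Y1 Z1 X2 Y2 Z2 ...

from itertools import chain, zip_longest

def byte_multiplex(*streams):
    """Interleave one byte from each device per round."""
    return [b for b in chain.from_iterable(zip_longest(*streams)) if b]

x = ["X1", "X2", "X3"]
y = ["Y1", "Y2", "Y3"]
z = ["Z1", "Z2", "Z3"]
print(byte_multiplex(x, y, z))
# ['X1', 'Y1', 'Z1', 'X2', 'Y2', 'Z2', 'X3', 'Y3', 'Z3']
```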
The data are transferred from I/O interface to peripheral (for write) or from
peripheral to I/O interface (for read).
The connection between an I/O interface in a computer system and external devices
can be either point-to-point or multipoint. A point-to-point interface provides a
dedicated line between the I/O interface and the external device. For example
keyboard, printer and external modems are point-to-point links. The most common
serial interfaces are RS-232C and EIA-232.
A multipoint external interface is used to support external mass storage devices (such
as disk and tape drives) and multimedia devices (such as CD-ROM, video and audio
devices).
Two important examples of external interfaces are FireWire and InfiniBand.
2.
2.9 SUMMARY
This unit is devoted to the I/O system of a computer. In this unit we have discussed
the identification of the I/O interface and described I/O techniques such as
programmed I/O, interrupt-driven I/O and direct memory access. These techniques
are useful for increasing the efficiency of the input-output transfer process. The
concepts of device drivers for various operating systems and of device controllers are
also discussed in this unit. We have also described the input/output processor, the
external communication interfaces such as serial and parallel interfaces, and
interrupt processing. The I/O processors are the most powerful I/O interfaces, as
they can execute complete I/O instructions. You can always refer to the further
readings for detailed design.
2.
(a) False (b) True (c) True (d) True (e) True (f) False (g) True (h) False
3.
The technique of programmed I/O does not require an interrupt. It is very inefficient
in a multiprogramming environment, as the processor is kept busy waiting for the I/O
to complete, while this time could have been used for instruction execution of other
programs.
2.
Multiple Interrupt Lines: Having separate line for a device, thus direct
recognition.
Software Poll: A software-driven roll call to find out from each device whether it
has made an interrupt request.
Daisy Chain: A hardware-driven 'pass the buck' type of signal that moves
through the devices connected serially. On receiving the signal in its turn, a
device that has raised an interrupt supplies its address.
Bus Arbitration: In this scheme, the I/O interface requests for control of the
Bus. This is a common process when I/O processors are used.
3.
DMA is an I/O technique that limits CPU intervention to the beginning and end of
a time-consuming I/O. One common place where DMA is used is when I/O is
required from a hard disk, since one single I/O request requires a block of data
transfer, which on average may take a few milliseconds. Thus, DMA frees the
CPU to do other useful tasks while the I/O is going on.
4.
a) False
b) False
c) True
d) False
The I/O channels were popular in older mainframes, which included many I/O
devices and I/O requests from many users. The I/O channel takes control of all
I/O instructions from the main processor and controls the I/O requests. It is
mainly needed in situations having many I/O devices, which may be shared
among multiple users.
2.
The external interfaces are the standard interfaces that are used to connect third
party or other external devices. The standardization in this area is a must.
Introduction
Objectives
Secondary Storage Systems
Hard Drives
3.3.1
3.3.2
3.3.3
3.3.4
3.4 Removable Drives
3.5
3.6 Summary
3.7 Solutions/Answers

3.0 INTRODUCTION
In the previous units of this block, we discussed the primary memory system,
high-speed memories, the memory system of the microcomputer, and the input/output
interfaces and techniques for a computer. In this unit we will discuss the secondary
storage devices, such as magnetic tapes, magnetic disks and optical disks, also known
as backing storage devices. The main purpose of such a device is that it provides a
means of retaining information on a permanent basis. The main discussion covers
the characteristics of hard drives, formatting, drive cache, interfaces, etc. A detailed
discussion of storage devices is presented in this unit. Storage technologies have
progressed from very small storage devices to huge gigabyte memories. Let us also
discuss some of the technological achievements that made such a technology possible.
3.1
OBJECTIVES
Storage is the collection of places where long-term information is kept. At the end of
the unit you will be able to:
describe the characteristics of the different secondary storage drives, i.e., their
drive speed, access time, rotation speed, density, etc.;
describe the low-level and high-level formatting of a blank disk and also the use
of disk partitioning;
distinguish among the various types of drives, i.e., hard drives, optical drives,
removable drives and cartridge drives; and
Secondary Storage
Techniques
As discussed in Block 2 Unit 1, there are several limitations of primary memory such
as limited capacity, that is, it is not sufficient to store a very large volume of data; and
volatility, that is, when the power is turned off the data stored is lost. Thus, the
secondary storage system must offer large storage capacities, low cost per bit and
medium access times. Magnetic media have been used for such purposes for a long
time. Current magnetic data storage devices take the form of floppy disks and hard
disks and are used as secondary storage devices. But audio and video media, either in
compressed form or uncompressed form, require higher storage capacity than the
other media forms and the storage cost for such media is significantly higher.
Optical storage devices offer a higher storage density at a lower cost. The CD-ROM
can be used as an optical storage device. Many software companies offer both
operating system and application software on CD-ROM today. This technology has
been a main catalyst for the development of multimedia in computing, and it is used
in multimedia external devices such as video recorders and digital recorders (Digital
Audio Tape) which can be used in multimedia systems.
Removable disks and tape cartridges are other forms of secondary storage devices;
they are used for back-up purposes, having higher storage density and higher transfer
rates.
3.3
HARD DRIVES
Disks are normally mounted on a disk drive that consists of an arm and a shaft
along with the electronic circuitry for reading and writing data. The disk rotates along
with the shaft. A non-removable disk is permanently mounted on the disk drive. One
of the most important examples of a non-removable disk is the hard disk of a PC. The
disk is a platter coated with magnetic particles. Early drives were large. Later on,
smaller hard (rigid) disk drives were developed with fixed and removable packs. Each
pack held about 30 MB of data, and the drive became known as the Winchester drive.
The storage capacity of today's Winchester disks is usually of the order of a few tens
of Megabytes to a few Gigabytes. Most Winchester drives have the following
common features:
the disk and read/write heads are enclosed in a sealed airtight unit;
the disk(s) spin at a high speed, one such speed being 7200 revolutions per
minute;
the data on a disk surface (platter) are arranged in a series of concentric rings.
Each ring, called a track, is subdivided into a number of sectors, each sector
holding a specific number of data elements called bytes or characters.
The smallest unit that can be written to or read from the disk is a sector. The
storage capacity of the disk can be determined as the product of the number of
tracks, the number of sectors per track, the bytes per sector and the number of
read/write heads.
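The capacity rule above amounts to a simple product (the figures below are made up for illustration):

```python
# Disk capacity = heads x tracks x sectors per track x bytes per sector.

def disk_capacity(heads, tracks, sectors, bytes_per_sector=512):
    return heads * tracks * sectors * bytes_per_sector

# e.g. 16 heads, 1024 tracks per surface, 63 sectors per track:
cap = disk_capacity(16, 1024, 63)
print(cap, "bytes =", cap // (1024 * 1024), "MB")  # 528482304 bytes = 504 MB
```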
Bandwidth: The bandwidth can be measured in bytes per second. The sustained
bandwidth is the average data rate during a large transfer, i.e., the number of
bytes divided by the transfer time. The effective bandwidth is the overall data
rate provided by the drive. The disk drive bandwidth ranges from less than 0.25
megabytes per second to more than 30 megabytes per second.
Access latency: A disk access simply moves the arm to the selected cylinder and
waits for the rotational latency, which may take less than 36 ms. The latency
depends upon the rotation speed of the disk, which may be anywhere from 300
RPM to 7200 RPM. The average latency of a disk system is equal to half the time
taken by the disk to rotate once. Hence, the average latency of a disk system
whose rotation speed is 7200 RPM will be 0.5 / 7200 minutes, which is about 4.17 ms.
Rotation Speed: This refers to the speed of rotation of the disk. Most hard disks
rotate at 7200 RPM (Revolution per Minute). To increase data transfer rates, higher
rotation speeds, or multiple read/write heads arranged in parallel or disk arrays are
required.
Access Time: The access time is the time elapsed between a request for a
read or write operation and the moment the data are made available, or written, at the
requested location. Normally it is measured for a read operation. The access time
depends on the physical characteristics of the device and the access mode used for it.
The disk access time has two major components:
Seek Time: The seek time is the time taken by the disk arm to move the heads to the
cylinder containing the desired sector.
Latency Time: The latency time is the additional time spent waiting for the disk to
rotate the desired sector under the disk head.
The sum of the average seek time and the average latency time is known as the average access time.
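These relationships can be sketched in Python; the 9 ms seek time used in the example is an assumed figure, while the 7200 RPM case reproduces the latency worked out above:

```python
# Average rotational latency is the time for half a revolution.
def avg_latency_ms(rpm):
    return 0.5 * 60000.0 / rpm   # 60000 ms per minute, half a turn

# Average access time = average seek time + average rotational latency.
def avg_access_time_ms(seek_ms, rpm):
    return seek_ms + avg_latency_ms(rpm)

print(round(avg_latency_ms(7200), 2))           # ~4.17 ms, as in the text
print(round(avg_access_time_ms(9.0, 7200), 2))  # with an assumed 9 ms seek
```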
Limitation of FAT16: The DOS designers decided to use clusters with at least four
sectors in them (thus a cluster size of at least 2KB) for all FAT16 hard disks. That size
suffices for any hard disk with less than 128MB total capacity. The largest logical
disk drives that DOS can handle comfortably have capacities up to 2GB. For such a
large volume, the cluster size is 32KB. This means that even if a file contains only a
single byte of data, writing it to the disk uses one entire 32KB region of the disk,
making that area unavailable for any other file's data storage.
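The waste described here is easy to quantify. A minimal Python sketch (the file sizes below are made-up examples):

```python
# Files are allocated in whole clusters, so every file wastes the unused
# tail of its last cluster (internal fragmentation).
def clusters_needed(file_size, cluster_size):
    return -(-file_size // cluster_size)   # ceiling division

def wasted_bytes(file_size, cluster_size):
    return clusters_needed(file_size, cluster_size) * cluster_size - file_size

print(wasted_bytes(1, 32 * 1024))          # a 1-byte file on a 2GB volume
print(wasted_bytes(33 * 1024, 32 * 1024))  # a 33KB file needs two clusters
```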
The most recent solution to these large-disk problems was introduced by Microsoft in
its OSR2 release of Windows 95 and was named FAT32. The cluster entry for
FAT32 uses 32-bit numbers. The minimum size for a FAT32 volume is 512MB.
Microsoft has reserved the top four bits of every cluster number in a FAT32 file
allocation table. That means there are only 28 bits for the cluster number, so the
maximum number of clusters possible is 268,435,456 (2^28).
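A small sketch of this arithmetic; the 32KB cluster size used for the capacity figure is an assumption for illustration:

```python
# FAT32 reserves the top 4 of the 32 bits, leaving 28 bits of cluster number.
max_clusters = 2 ** 28
print(max_clusters)                          # 268,435,456 clusters

# With an assumed 32KB cluster, the addressable space would be:
cluster_size = 32 * 1024
print(max_clusters * cluster_size / 2**40)   # in TB (TiB)
```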
In the UNIX system, the information related to all these fields is stored in an Inode
table on the disk. For each file, there is an inode entry in the table. Each entry is made
up of 64 bytes and contains the relevant details for that file. These details are:
a)
b)
c)
d)
e)
f)
g)
h)
i)
3.3.4 Hard Drive Interface: IDE, SCSI, EIDE, Ultra DMA and
ATA/66
Secondary storage devices need a controller to act as an intermediary between the
device and the rest of the computer system. On some computers, the controller is an
integral part of the computer's main motherboard. On others, the controller is an
expansion board that connects to the system bus by plugging into one of the
computer's expansion slots. So that devices manufactured by independent
vendors can be used with computers from different manufacturers, it is important that the
controllers follow some drive interfacing standard. The following are the commonly used
drive interface standards:
1000MB and rotation speed 7200RPM. The controller is embedded on the disk
drive itself. It is an interface between the disk controller and an adapter located
on the motherboard. It has a good access time of 20ms and data transfer rates of
about 1Mbps under ideal conditions. Drives are reasonably cheap. The latest
version of the IDE specification enables four IDE channels, each one capable
of supporting two IDE devices.
The SCSI-1 calls for a cable with 8 data wires plus one for parity.
The SCSI-2 enables the use of multiple cables to support 16- or even 32-bit data
transfers in parallel.
The SCSI-3 enables the use of multiple cables to support 32- or even 64-bit data
transfers in parallel.
With fast SCSI, it is possible to transfer 40MB of data per second on a single
SCSI cable.
The principle behind the EIDE interface is the same as in the IDE interface, but
this drive has capacities ranging from 10.2GB to 20.5GB. The rotation speed is
7200RPM. Its features include a 9.5ms access time, a 2MB buffer and support for
the Ultra ATA/66 interface for high-speed data throughput and greater data
integrity.
Modern EIDE interfaces enable much faster communication. The speed increases
due to improvements in the protocol that describes how the clock cycles will be
used to address devices and transfer data. The modern EIDE hard drives are Ultra
DMA and ATA/66.
Ultra DMA or ATA/33 (AT Attachment): The ATA standard is the formal
specification for how IDE and EIDE interfaces are supposed to work with hard
drives. The ATA33 enables up to 33.3 million bytes of data to be transferred each
second, hence the name ATA33.
1. The seek time of a disk is 30ms. It rotates at the rate of 30 rotations per second.
Each track has 300 sectors. What is the access time of the disk?
2. Calculate the number of entries required in the FAT table using the following
parameters for an MS-DOS system:
Disk capacity: 30MB
Block size: 512 bytes
Blocks/cluster: 4
3.
3.4
REMOVABLE DRIVES
A disk drive with removable disks is called a removable drive. A removable disk can
be replaced by another similar disk on the same or a different computer, thus providing
enormous data storage that is not limited by the size of the disk. Examples of
removable disks are floppy disks, CD-ROMs, DVD-ROMs, etc.
A floppy is about 0.64 mm thick and is available in diameters of 5.25 inch and 3.5 inch.
The data are organized in the form of tracks and sectors. The tracks are numbered
sequentially inwards, with the outermost being 0. The purpose of the index hole is that
when it comes under a photosensor, the system comes to know that the read/write
head is now positioned on the first sector of the current track. The write-protect notch
is used to protect the floppy against accidental deletion of recorded data.
The data in a sector are stored as a series of bits. Once the required sector is found, the
average data transfer rate in bytes per second can be computed by the formula:
Average data transfer rate = (sectors per track × bytes per sector) × rotations per second
Typical values for IBM/PC compatibles are given in the following table:
Size    Capacity   Tracks   Sectors
5.25"   360KB      40       9
5.25"   1.2MB      80       15
3.5"    720KB      40       18
3.5"    1.44MB     80       18
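Applying the transfer-rate formula above to the 1.44MB entry, and assuming the usual 300 RPM rotation speed of a 3.5-inch drive (the table itself does not list rotation speeds):

```python
# Average transfer rate = bytes per track x rotations per second.
def transfer_rate_bps(sectors_per_track, bytes_per_sector, rpm):
    return sectors_per_track * bytes_per_sector * (rpm / 60.0)

# 1.44MB floppy geometry: 18 sectors of 512 bytes per track, 300 RPM.
rate = transfer_rate_bps(18, 512, 300)
print(rate)          # bytes per second
print(rate / 1024)   # the same rate in KB per second
```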
1.
CD-ROM (Compact Disk Read Only Memory): This technology has evolved
out of the entertainment electronics market, where cassette tapes and long-playing
records have been replaced by CDs. The term CD used for audio records stands
for Compact Disk. The disks used for data storage in digital computers are
known as CD-ROMs; a CD-ROM is 12 cm (about 4.7 inches) in diameter and can store around 650MB.
Information on a CD-ROM is written by creating pits on the disk surface with
a laser beam. As the disk rotates, the laser beam traces out a continuous spiral.
The focused beam creates a circular pit of around 0.8-micrometre diameter
wherever a 1 is to be written, and no pit (the flat area is also called a land) where a 0 is to be written.
Figure 5 shows the CD-ROM and DVD-ROM.
these are removable disks, thus they are suitable for archival storage.
they are very slow as compared to hard disks; the normal transfer rate is 300 KB/s for
double-speed drives and 600 KB/s for quadruple-speed drives.
2.
DVD-ROM (Digital Versatile Disk Read Only Memory): DVD-ROM uses the
same principle as a CD-ROM for reading and writing. However, a smaller-wavelength
laser beam is used. The total capacity of a single-sided DVD-ROM is 8.5GB. In a
double-sided DVD-ROM, two such disks are stuck back to back, which allows
recording on both sides. This requires the disk to be reversed to read the reverse
side. With both sides recorded, each side storing 8.5GB, the total
capacity is 17GB.
In both CD-ROMs and DVD-ROMs, the density of data stored is constant throughout
the spiral track. In order to obtain a constant readout rate, the disk must rotate faster
near the centre and slower at the outer tracks to maintain a constant linear velocity
(CLV) between the head and the CD-ROM/DVD-ROM platter. Thus CLV disks are
rotated at variable speed. Compare this with the mechanism of constant angular
velocity (CAV), in which the disk is rotated at a constant speed; in CAV the density
of information storage on the outside sectors is low.
The main advantage of CAV is that individual blocks of data can be accessed
in a semi-random mode: the head can be moved from its current location to the
desired track, and one then waits for the specific sector to spin under it.
The main disadvantage of a CAV disk is that a lot of storage space is wasted, since the
longer outer tracks store only as much data as the shorter innermost
track. Because of this disadvantage, the CAV method is not recommended for use on
CD-ROMs and DVD-ROMs.
Comparison of CD-ROM and DVD-ROM

Characteristics                      CD-ROM    DVD-ROM
Pit length (micron)                  0.834     0.4
Track pitch (micron)                 1.6       0.74
Laser beam wavelength (nanometer)    780       635
Capacity, 1 layer/1 side             650MB     4.7GB
Capacity, 2 layers/1 side            No        8.5GB
Capacity, 1 layer/2 sides            No        9.4GB
Capacity, 2 layers/2 sides           No        17GB
Speed 1x                             150KB/s   1.38MB/s
2.
3.5
Quarter Inch Cartridge Tapes (QIC Standard): These tape cartridges record
information serially in a track with one head. When the end of the tape is reached, the
tape is rewound and data is recorded on the next track. There are 9 to 30 tracks. Data
bits are serial on a track, and blocks of around 6000 bytes are written, followed by an
error-correction code to enable correction of the data on reading if any error occurs. The
density of data is around 16000 bits per inch in modern tapes. The tapes store around
500 MB. The cartridge size is 5.25 inch, just like a floppy, and it is mounted in a slot
provided on the front panel of a computer. The tape read/write speed is around 120
inch/second and data are transferred at the rate of 240KB/s.
Tape drive      Capacity (MB)   Transfer rate (KB/s)   Read/write speed   Main application
QIC DEC TZK10   525             240                    120                Backup, archiving
QIC DEC TK50    95              62.5                   75                 -do-
QIC TS/1000     1000            300                    66                 -do-
DAT DEC TLZ06   4000            366                    1GB/Hour           -do-
3.5.3 CD-R vs CD-RW
A CD-R disc looks like a CD. Although all pressed CDs are silver, CD-R discs are
gold or silver on their label side and a deep green or cyan on the recordable side. The
silver/cyan CD-Rs were created because the green dye used in the original CD-R does
not reflect the shorter-wavelength red lasers used in new DVD drives. The cyan dye
used in the CD-R allows complete compatibility with DVD drives. The CD-R disc
has four layers instead of the three of a CD. At the lowest laser power level, the light
suffices to detect the presence or absence of pits or marks on the recording surface,
to read the disc. At the higher level, it can actually burn marks into the surface.
CD-RW is a relatively new technology, but it has been gaining market share quite
rapidly. The drives cost a little more than CD-R drives because they can be used to play
audio CDs and CD-ROMs as well as to play and record CD-RW discs. A CD-RW
disc contains two more layers than a CD-R. The difference is that the recordable layer
is made of a special material, an alloy of several metals.
Iomega Corporation has announced a CD-RW drive, the Iomega 48x24x48 USB 2.0
external CD-RW drive. This drive features buffer-underrun protection, which lets
users record safely, even while multitasking. It offers plug-and-play capability with
Microsoft Windows and Mac OS operating systems, and its digital audio extraction
(DAE) rate of 48x allows users to rip or burn a 60-min CD in under 3 min, while the
maximum drive speed is attainable only with Hi-Speed USB 2.0 connections.
Magnetic tapes are used nowadays in computers for the following purposes:
Memory type    Capacity   Technology                          Access time   Permanence     Access mode   Purpose in computer system                              Cost/byte (units)
Hard disk      50 GB      Magnetic surfaces on hard disks     10 msec       Non-volatile   Direct        Large data files; program overflow from main memory     1/100
Floppy disk    10 MB      Magnetic surfaces on floppy disks   500 msec      Non-volatile   Direct        Data entry; as input unit                               1/1000
Main memory    50 MB      Integrated circuits                 20 nsec       Volatile       Random        Program and data                                        -
Cache memory   0.5 MB     High-speed integrated circuits      2 nsec        Volatile       Direct        Instructions and data to be used immediately            10
CD-ROM         650 MB     Laser disk                          500 msec      Non-volatile   Direct        Large text, pictures and audio; software distribution   1/10000
DVD-ROM        8.5 GB     Laser disk                          500 msec      Non-volatile   Direct        Video files                                             1/100000
Magnetic tape  5 GB       Long magnetic tape                  25 sec        Non-volatile   Sequential    Historical files; backup for disk                       1/1000
Digital Audio Tape (DAT): The most appropriate tape for backing up data from a
disk today is the Digital Audio Tape (DAT). It uses a 4mm tape enclosed in a cartridge. It
uses a helical-scan, read-after-write recording technique, which provides reliable data
recording. The head spins at a high speed while the tape moves; very high recording
densities are obtained. It uses a recording format called Digital Data Storage (DDS),
which provides three levels of error-correcting code to ensure excellent data integrity.
The capacity is up to 4GB with a data transfer speed of 366KB/sec. This tape uses a
SCSI interface.
Check Your Progress 3
1.
2. State whether the following are True or False:
(a) T/F
(b) T/F
(c) T/F
(d) T/F
(e) T/F

3.6 SUMMARY

3.7 SOLUTIONS/ANSWERS
1. Access time is seek time plus latency time. Seek time is the time taken by the
read/write head to get to the right track. Latency time is the time taken by the read/write
head to position itself at the right sector. Here a track has 300 sectors, so
on average, to reach the right sector, 150 sectors must pass under the read/write
head. The time taken for this will be 150 / (30 × 300) seconds ≈ 17 ms.
So the access time will be 30 + 17 = 47 ms.
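The arithmetic of this answer can be checked with a short Python sketch:

```python
# Re-checking the worked answer: seek time 30 ms, 30 rotations per second,
# 300 sectors per track.
seek_ms = 30.0
rotations_per_sec = 30
sectors_per_track = 300

# On average the head waits for half a track (150 sectors) to pass under it.
latency_ms = 150 / (rotations_per_sec * sectors_per_track) * 1000
access_ms = seek_ms + latency_ms
print(round(latency_ms, 1))  # ~16.7 ms, i.e., about 17 ms
print(round(access_ms))      # ~47 ms, as in the answer
```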
2.
3.
SCSI is a port, or rather an I/O bus, that is used for interfacing many devices such as
disk drives, printers, etc. to a computer. SCSI interfaces provide data transmission
rates of up to 80 Mbits per second. It is also an ANSI standard. EISA (Extended
Industry Standard Architecture) is used for connecting peripherals such as a mouse.
ATA (Advanced Technology Attachment) is a disk drive standard that integrates the
controller on the disk drive itself. IDE (Integrated Drive Electronics) is an
interface for mass storage devices that integrates the controller into the disk or
CD-ROM drive.
1.
A CD-ROM is a non-erasable disk used for storing computer data. The standard
uses a 12 cm disk that can hold more than 650 MB.
A DVD-ROM is used for providing a digitized, compressed representation of video
as well as large volumes of digital data. Both 8 and 12 cm diameters are used,
with a double-sided capacity of up to 17GB.
2.
A CD-R is similar to a CD-ROM but the user can write to the disk only once. A
CD-RW is also similar to a CD-ROM but the user can erase and rewrite to the
disk multiple times.
2.
(a) False (b) False (c) True (d) False (e) False.
Page No.
4.0 Introduction 80
4.1 Objectives 81
4.2 Keyboard 81
4.3 Mouse 85
4.4 Video Cards 87
    4.4.1 Resolution
    4.4.2 Colour Depth
    4.4.3 Video Memory
    4.4.4 Refresh Rates
    4.4.5 Graphic Accelerators and 3-D Accelerators
    4.4.6 Video Card Interfaces
4.5 Monitors 92
    4.5.1 - 4.5.7
4.6 95
4.7 96
4.8 96
4.9 97
    Classification of Printers
    Print Resolutions
    Print Speed
    Print Quality
    Colour Management
4.10 Modems 99
4.11 Scanners 100
    4.11.1 Resolution
    4.11.2 Dynamic Range/Colour Depth
    4.11.3 Size and Speed
    4.11.4 Scanning Tips
4.13 Summary 104
4.14 Solutions/Answers 104
References
4.0 INTRODUCTION
In the previous units you have been exposed to Input/Output interfaces, control
techniques, etc. This unit covers Input/Output devices and the technologies related to
them. The basic aspects covered include:
4.1
OBJECTIVES
I/O Technology
4.2
KEYBOARD
The keyboard is the main input device for your computer. It is a fast and accurate
device. The multiple character keys allow you to send data to your computer as a
stream of characters in a serial manner. The keyboard is one device which can be used
in public spaces or offices where privacy is not ensured. The keyboard is efficient in
jobs like data entry. The keyboard is one device which is likely to stay in use for years to
come, probably even after powerful voice-based input devices have been developed.
The precursor of the keyboard was the mechanical typewriter, hence it has inherited
many of the properties of the typewriter.
The Keys
A full-size keyboard has a distance of 19mm (0.75in) between the centres of the keycaps
(keys). The keycaps have a top of about 0.5in (12.5mm), which is shaped as a
sort of dish to help you place your finger. Most designs have the keys curved in a
concave cylindrical shape on the top.
QWERTY-based keyboards
Besides the standard alphabet keys having the QWERTY arrangement, a computer
keyboard also consists of the control keys (Alt, Del, Ctrl, etc.), the function keys (F1,
F2, etc.), the numerical keypad, etc.
PC 83-key and AT 84-key Keyboards
The PC 83-key was the earliest keyboard offered by IBM with its first Personal
Computers (PC). This had 83 keys. Later, IBM added one more key with its PC AT
computer keyboards to make it an 84-key keyboard. The special feature of these
keyboards was that they had function keys in two columns on the left side of the
keyboard.
101-key Enhanced Keyboard
With its newer range of PCs, IBM introduced the 101-key Enhanced/Advanced
keyboard. This keyboard is the basis of modern QWERTY keyboards.
It has the function keys aligned in a separate row at the top, to
correspond to the function keys shown by many software packages on the monitor. However,
it has also been criticised at times for having a small Enter key and for the function
keys being at the top.
Windows 104-key keyboard
This is an enhancement of the 101-key keyboard, with special keys for Windows
functions and popup menus. Individual vendors sometimes make changes to the basic
keyboard design, for example by having a larger Enter key.
Dvorak-Dealey keyboard
This was one keyboard layout designed to be a challenger to the QWERTY layout.
It was designed by August Dvorak and William Dealey after much scientific
research in 1936. This layout tries to make typing faster. The basic strategy it
incorporates is called hand alternation. Hand alternation implies that if you press one
key with the left hand, the next key is likely to be pressed by the right hand, thus
speeding up typing (assuming you type with both hands).
However, the Dvorak has not been able to compete with QWERTY and almost all
systems now come with QWERTY 101-key or 104-key based keyboards. Still, there
may be a possibility of designing new keyboards for specific areas, say, for Indian
scripts.
well but have the drawback that they follow an indirect approach though they have a
longer life than contact-based keyboards. These keyboards were introduced by IBM.
Contact-Based Keyboards
Contact-based keyboards use switches directly. Though they have a comparatively
shorter life, they are the most preferred kind nowadays due to their lower cost. Three
such kinds of keyboards have been used in PCs:
1.
Mechanical Switches: These keyboards use traditional switches with the metal
contacts directly touching each other. Springs and other parts are used to control
positioning of the keycaps and give the right feel. Overall, this design is not
suited to PC keyboards.
2.
Rubber Dome: In rubber dome keyboards, both contact and positioning are
controlled by a puckered sheet of elastomer, which is a stretchy, rubber-like
synthetic material. This sheet is moulded to have a dimple or dome in each
keycap. The dome houses a tab of carbon or other conductive material which
serves as a contact. When a key is pressed, the dome presses down to touch
another contact and complete the circuit. The elastomer then pushes the key back.
This is the most popular PC keyboard design since the domes are inexpensive
and proper design can give the keyboards an excellent feel.
3.
Membrane: These are similar to rubber domes except that they use thin plastic
sheets (membranes) with conductive traces on them. The contacts are in the form
of dimples which are pressed together when a key is pressed. This design is often
used in calculators and printer keypads due to its low cost and trouble-free
life. However, since its contacts require only a slight travel to actuate, it makes
for a poor computer keyboard.
Scan Codes
A scan code is the code generated by a microprocessor in the keyboard when a key is
pressed, and it is unique to the key struck. When this code is received by the computer,
an interrupt is issued and the scan code table in the BIOS is looked up to find out which
keys have been pressed and in what combination. Special memory locations called
status bytes tell the status of the locking and toggle keys, e.g., Caps Lock. Each
keypress generates two different scan codes: one on pushing the key down, called the Make
code, and another on its popping back, called the Break code. This two-code technique allows
the computer to tell when a key is held pressed down, e.g., the ALT key while
pressing another key, as in CTRL-ALT-DEL.
There are three standards for scan codes: Mode1 (83-key keyboard PC, PC-XT),
Mode2 (84-key AT keyboard), Mode3 (101-key keyboard onwards). In Mode1 Make
and Break codes are both single bytes but different for the same key. In Mode2 and
Mode3, Make code is a single byte and Break code is two bytes (byte F0(Hex) + the
make code).
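The Mode2/Mode3 break-code rule can be sketched in a few lines of Python; the make code used in the example is the one for the 'A' key from the scan-code discussion:

```python
# In Mode2 and Mode3, the break code is the byte F0 followed by the
# one-byte make code, as described in the text.
def break_code(make_code):
    """Return the Mode2/Mode3 break-code byte sequence for a make code."""
    return [0xF0, make_code]

# 'A' has make code 1C in Mode2; releasing the key sends F0 1C.
print([hex(b) for b in break_code(0x1C)])
```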
Interfacing
The keyboard uses a special I/O port that is like a serial port but does not explicitly
follow the RS-232 serial port standard. Instead of multiple data and handshaking
signals as in RS-232, the keyboard uses only two signals, through which it manages a
bi-directional interface with its own set of commands.
Using its elaborate handshaking mechanism, the keyboard and the PC send commands
and data to each other. The USB keyboards work differently by using the USB
coding and protocol.
Key          KeyNo.   Mode1 Make   Mode1 Break   Mode2/Mode3 Make   Mode2/Mode3 Break
A            31       1E           9E            1C                 F0 1C
0            11       0B           8B            45                 F0 45
Enter        43       1C           9C            5A                 F0 5A
Left Shift   44       2A           AA            12                 F0 12
F1           112      3B           BB            07                 F0 07
Connections
5-pin DIN connector: This is the connector of the conventional keyboard having 5
pins (2 IN, 2 OUT and one ground pin), used for synchronization and transfer.
PS/2 connector (PS/2 keyboards): These were introduced with IBM's PS/2
computers and hence are called PS/2 connectors. They have 6 pins, but in fact their
wiring is simply a rearrangement of the 5-pin DIN connector. This connector is
smaller in size and quite popular nowadays. Due to the similar wiring, a 5-pin DIN
can easily be connected to a PS/2 connector via a simple adapter.
Ergonomic Keyboards
Ergonomics is the study of the environment, conditions and efficiency of workers.
Ergonomics suggests that the keyboard was not designed with human beings in mind.
Indeed, continuous typing can be hazardous to health and can lead to pain or
ailments such as Carpal Tunnel Syndrome.
For normal typing on a keyboard, you have to place your hands apart, bending them at
the wrists and hold this position for a long time. You also have to bend your wrist
vertically especially if you elevate your keyboard using the little feet behind the
keyboards. This stresses the wrist ligaments and squeezes the nerves running into the
hand through the Carpal tunnel, through the wrist bones.
To reduce the stress, keyboards called ergonomic keyboards have been designed.
These split the keyboard into two and angle the two halves so as to keep the wrists
straight. To reduce vertical stress, many keyboards also provide extended wrist rests.
For those who indulge in heavy, regular typing, it is recommended that they use more
ergonomics based keyboards and follow ergonomic advice in all aspects of their
workplace.
4.3
MOUSE
The idea of the Mouse was developed by Douglas C. Engelbart of the Stanford Research
Institute, and the first Mouse was developed by the Xerox Corporation. The Mouse itself is a
device which gives you a pointer on the screen and a method of selecting commands
through buttons on the top. A single button is usually sufficient (as in the Mouse supplied
with Apple Macintosh machines), but Mice come with up to 3 buttons.
Types of Mice
Mice can be classified on the basis of the numbers of buttons, position sensing
technology or the type of Interface:
Sensing Technology
The Mice can be Mechanical or Optical.
Mechanical Mice have a ball made from a rough, rubbery material, the rotation of
which drives sensors that are perpendicular to each other. Thus, the motion of the
ball along the two axes is detected and reflected as the motion of the pointer on the
screen.
Optical Mice can detect movement without any moving parts like a ball. The typical
optical Mouse used to have a pair of LEDs (Light Emitting Diodes) and photodetectors
on each axis, and its own Mousepad on which it slides. However, due to
the maintenance needs of the Mousepad, this was not very successful. Recently,
optical Mice have made a comeback since they can now operate without a Mousepad.
Interface
The Mouse is usually a serial device connected to a serial port (RS-232), but these
connections can themselves take various forms:
Serial Mouse
Mice that use the standard serial port are called serial Mice. Since serial ports 1 and 4
(COM1 and COM4 under DOS, /dev/ttyS0 and /dev/ttyS3 under Unix/GNU-Linux
systems) share one interrupt, and ports 2 and 3 (COM2 and COM3, or /dev/ttyS1 and
/dev/ttyS2) share another, one should be careful not to attach the mouse so that it shares
an interrupt with another device in operation, like a modem.
Bus Mouse
These Mice have a dedicated Mouse card and port to connect to. Recently, USB
mouse has become popular.
Proprietary
Mouse ports specific to some PCs e.g., IBMs PS/2 and some Compaq computers.
Mouse Protocols
The mouse protocol is the digital code to which the signal from the mouse gets
converted. There are four major protocols: Microsoft, Mouse Systems
Corporation(MSC), Logitech and IBM. Most mice available do support at least the
Microsoft protocol or its emulation.
Resolution versus Accuracy
The resolution of a mouse is given in CPI (Counts Per Inch), i.e., the number of signals per
inch of travel. A higher resolution means the pointer moves faster on the screen, but it also
means that it will be more difficult to control accurately.
Check Your Progress 1
1.
2.
Why is keyboard touch important? What kind of touch would you prefer and
which kind of keyboard will give that touch?
..................................................................................................................................
..................................................................................................................................
..................................................................................................................................
..................................................................................................................................
..................................................................................................................................
3.
4.
a) 2A1E9EAA
c) 121CF01CF012
b) 1CF01C
d) 1CF01C5AF05A
Figure 3: Raster Display
Basic Computer
Organisation
The more dots there are, i.e., the higher the resolution of the image, the sharper
the picture is. The richness of the image is also dependent on the number of colours
(or grey levels for a monochrome display) displayed by the system. The higher the
number of colours, the more information is required for each dot. Hence, the
amount of memory (framebuffer) required by a system is directly dependent on the
resolution and colour depth required.
4.4.1 Resolution
Resolution is the parameter that defines the possible sharpness or clarity of a video
image. Resolution is defined as the number of pixels that make up an image. These
pixels are then spread across the width and height of the monitor. Resolution is
independent of the physical characteristics of the monitor. The image is generated
without considering the ultimate screen it is to be displayed upon. Hence, the unit of
resolution is the number of pixels, not the number of pixels per inch. For example, a
standard VGA native graphic display mode has a resolution of 640 pixels horizontally
by 480 pixels vertically. Higher resolutions mean the image can be sharper because it
contains more pixels.
The actual on-screen sharpness is given in dots per inch, and this depends on both the
resolution and the size of the image. For the same resolution, an image will be
sharper on a smaller screen; i.e., an image which may look sharp on a 15" monitor
may be a little jagged on a 17" display.
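This can be sketched numerically; the calculation below treats the quoted tube size as the image diagonal, although in practice the viewable diagonal is a little smaller:

```python
import math

# Approximate on-screen density: pixels along the diagonal divided by
# the diagonal size in inches.
def dpi(h_pixels, v_pixels, diagonal_inches):
    diag_pixels = math.hypot(h_pixels, v_pixels)
    return diag_pixels / diagonal_inches

print(round(dpi(640, 480, 15)))  # 640 x 480 on a 15" diagonal
print(round(dpi(640, 480, 17)))  # the same image is less dense on 17"
```

The 15" screen shows the higher dots-per-inch figure, matching the claim that the same resolution looks sharper on a smaller screen.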
S.No.   Colour Mode    Depth (bits/pixel)
1       Monochrome     1
2       16-Colours     4
3       256-Colours    8
4       High Colour    16
5       True Colour    24

For a 640 × 480 display with 16 colours: 307,200 pixels × 4 bits = 1,228,800 bits
= 153,600 bytes ≈ 153 KB.
Minimum Video RAM required and available = 256 KB.
Therefore, 16-colour VGA needs at least 153,600 bytes of memory, but memory is
only available in powers of 2; hence, the next highest available memory size is 256 KB.
What is a good resolution? Actually, it depends on your hardware: it is the
maximum your hardware can allow you. However, one odd-looking resolution which
has become popular is 1152 × 864 pixels. Can you judge why this should be so?
(Hint: think of this resolution at 8-bit colour depth.)
If you can't wait any longer, here is the answer: 1152 × 864 is nearly one million
pixels. At 8-bit colour depth this means nearly 8 million bits, or about 1 MB. This is the
highest resolution you can get in 1 MB of video memory at 8-bit colour depth, and it still
leaves you square pixels (in the ratio 4:3) to allow easy programming.
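The framebuffer arithmetic above can be checked with a small Python sketch:

```python
# Framebuffer size in bytes = pixels x bits per pixel / 8.
def framebuffer_bytes(width, height, bits_per_pixel):
    return width * height * bits_per_pixel // 8

print(framebuffer_bytes(640, 480, 4))    # 16-colour VGA: 153,600 bytes
print(framebuffer_bytes(1152, 864, 8))   # 995,328 bytes -- just under 1 MB
print(framebuffer_bytes(1152, 864, 8) <= 1024 * 1024)
```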
The above calculations hold good only for two-dimensional display systems. 3-D
systems require much more memory because of techniques such as
Double Buffering and Z-Buffering.
The first major graphic accelerators were made by the S3 Corporation. Modern
graphic accelerators have internal registers at least 64 bits wide to work on at least 2
pixels at a time. They can use standard Dynamic RAM (DRAM) or the more
expensive but faster dual-ported Video RAM (VRAM). They support at least the
standard resolutions up to 1024 × 768 pixels. They often use RAMDACs for colour
support, giving full 24-bit or 32-bit colour support. A RAMDAC (Random Access
Memory Digital-to-Analog Converter) converts the digital pixel values into the
analogue signals that drive the monitor.
Pipelined Memory: The use of pipelining eliminates wait states, allowing faster
operation.
Separate Address and Data Lines.
High speed through a special 2X mode that allows running the AGP at 133 MHz
instead of the default 66 MHz.
Through AGP, the video board has a direct connection to the microprocessor as a
dedicated high speed interface for video. The system uses DMA (Direct Memory
Access) to move data between main memory and framebuffer. The accelerator chip
uses the main memory for execution of high level functions like those used in 3-D
rendering.
UMA
UMA stands for Unified Memory Architecture. It is an architecture which reduces the
cost of PC construction. In this, a part of the main memory is actually used as
framebuffer. Hence, it eliminates the use of a bus for video processing. Therefore, it
is less costly. Though it is not supposed to perform as well as AGP etc., in some
cases it may give a better performance than the bus-based systems. It is the interface
used nowadays in low-cost motherboards.
4.5 MONITORS
A Monitor is the television-like box connected to your computer, giving you a
vision into the mind of your PC. It shows what your computer is thinking. It has a
display, which is technically defined as the image-producing device, i.e., the screen
one sees, and circuitry that converts the signals from your computer (or similar
devices) into the proper form for display.
Monitors are, or were, just like television sets, except that television sets have a tuner or demodulator circuit to convert the signals. However, monitors have now branched beyond television: they have greater sharpness and colour purity and operate at higher frequencies.
Generally, when you go to purchase a monitor, you see the following specifications: the maximum resolution, the horizontal and vertical frequencies supported, the tube size and the connectors to the monitor. There are many vendors in the market like Samsung, LG, Sony etc. Home users generally go in for monitors of size 17", 15" or 14". Monitors are also available as the traditional curved screens, flat screens or LCDs. The technology behind monitors and the above specifications are discussed ahead.
1. The Phosphor coating: This affects the colour and the persistence (the period the effect of a single hit on a dot lasts).

2. The Cathode (Electron Gun): The sharpness of the image depends on the good functioning of this gun.

3. Shadow Mask/Aperture Grill: This determines the resolution of the screen in colour monitors.

4.
Horizontal Frequency: The time to scan one line connecting the right edge to the left edge of the screen horizontally is called the Horizontal cycle, and the reciprocal of the Horizontal cycle is called the Horizontal Frequency. The unit is kHz (KiloHertz).
Vertical Frequency: Like a fluorescent lamp, the screen has to repeat the same image many times per second to display an image to the user. The frequency of this repetition is called the Vertical Frequency or Refresh Rate.
If the resolution generated by the video card and the monitor resolution are properly matched, you get a good quality display. However, the actual resolution achieved is a physical quality of the monitor. In colour systems, the resolution is limited by Convergence (do the beams of the 3 colours converge exactly on the same dot?) and by the Dot Pitch. In monochrome monitors, the resolution is limited only by the highest frequency signals the monitor can handle.
4.5.5 DPI
DPI (Dots Per Inch) is a measure of the actual sharpness of the on-screen image. It depends on both the resolution and the size of the image. Practical experience shows that, at the same resolution, a smaller screen has a sharper image than a larger screen, because more dots per inch are needed to display the same number of pixels. A 15-inch monitor is about 12 inches wide; a 10-inch monitor is about 8 inches wide. To display a VGA image (640 × 480), the 15-inch monitor needs about 53 DPI and the 10-inch monitor 80 DPI.
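The DPI figures above come from a one-line calculation; a small sketch (the screen widths are the approximate values used in the text):

```python
def dpi(pixels, inches):
    """Dots per inch along one axis of the displayed image."""
    return pixels / inches

# A 640-pixel-wide VGA image on a 12-inch-wide tube versus an 8-inch-wide tube:
print(round(dpi(640, 12)))  # 53
print(dpi(640, 8))          # 80.0
```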
4.5.6 Interlacing
Interlacing is a technique in which, instead of scanning the image one line at a time, alternate lines are scanned on each pass. This doubles the apparent frame rate for the same amount of signal input, so interlacing is used to keep the bandwidth (amount of signal) down. Presently, only the 8514/A display adapters use interlacing. Since interlaced displays have been reported to flicker more, with better technology available most monitors are now non-interlaced.
4.5.7 Bandwidth
Bandwidth is the amount of signal the monitor can handle, and it is rated in MegaHertz. It is the most commonly quoted specification of a monitor. The bandwidth should be enough to address each pixel plus the synchronizing signals.
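As a rough illustration of how such a figure can be estimated, the sketch below multiplies the pixels per frame by the refresh rate; the 1.3 overhead factor for synchronizing signals and the 75 Hz refresh rate are illustrative assumptions, not figures from the text:

```python
def bandwidth_mhz(h_pixels, v_pixels, refresh_hz, overhead=1.3):
    """Approximate bandwidth: pixels per second, padded for sync signals."""
    return h_pixels * v_pixels * refresh_hz * overhead / 1e6

# A 1024 x 768 display at an assumed 75 Hz refresh rate:
print(round(bandwidth_mhz(1024, 768, 75)))  # 77
```

So a monitor for this mode would need a bandwidth of well over 70 MHz under these assumptions.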
Check Your Progress 2
1.
2.
3. What is the difference between Shadow Mask and Dot Pitch for Trinitron and non-Trinitron monitors?
..................................................................................................................................
..................................................................................................................................
..................................................................................................................................
4. How much Video-RAM would you require for a high-colour (16-bit) Colour Depth at 1024 × 768 resolution? What would be the size of the corresponding single memory chip you would get from the market?
a) 900KB, 1MB
c) 12.6MB, 16MB
5.
2. Backlit LCDs
3. Edgelit LCDs

LCD Technology
The technology behind LCDs is called Nematic Technology, because the molecules of the liquid crystals used are nematic, i.e., rod-shaped. This liquid is sandwiched between two thin plastic membranes. These crystals have the special property that they can change the polarity and the bend of the light, and this can be controlled by grooves in the plastic and by applying electric current.
Passive Matrix
In a passive matrix arrangement, the LCD panel has a grid of horizontal and vertical conductors, and each pixel is located at an intersection. When a current is received by the pixel, it becomes dark. This is the technology which is more commonly used.
Active Matrix
This is also called TFT (Thin Film Transistor) technology. Here there is a transistor at every pixel acting as a relay: it receives a small current and amplifies it to activate the pixel. Since the controlling current is smaller, it can travel faster, and hence response times are much faster. However, TFTs are much more difficult to fabricate and are costlier.
The Sound card (often built directly into motherboards nowadays) is a board that has a digital-to-analog sound converter, amplifier and other circuitry to play sound and to connect the PC to various audio sources.
A sound card may support the following functions:
1.
2.
3.
4. May use various input sources (Microphone, CD, etc.) and mixer circuits to play these sounds together.
5.
4.9 PRINTERS
Printers are devices that put ink on paper in a controlled manner to produce readable text or photographic images. Printers have gone through a large transition in technology, and they are still available in a wide range of technologies and prices, from Dot-Matrix printers to Inkjet printers to Laser Printers.
4.9.1 Classification of Printers
a) Impact: Impact printers print by the impact of hammers on the ribbon (e.g., Dot-Matrix Printers), whereas non-impact printers use other means (e.g., Inkjet, Laser).
b)
c) Output: The quantity of output processed at a time: Serial, Line or Page Printers.
Actually, there are many specifications one has to keep in mind while purchasing a printer. Some of these are compatibility with other hardware, in-built memory, maximum supported memory, actual technology, printer resolution (Colour, B/W), PostScript support, output type, printer speed, media capacity, and the weight, height and width of the printer.
Let us discuss some of these parameters that characterize printers:
4.9.2 Print Resolution
Print Resolution is the detail that a printer can give, determined by how many dots the printer can put per inch of paper. Thus, the unit of resolution is Dots Per Inch (DPI). This applies to both impact and non-impact printers, though the actual quality will depend on the technology of the printer.
The required resolution to a great extent determines the quality of the output and the time taken to print it: there is a tradeoff between quality and time. Lower resolution means faster printing and lower quality; higher resolution means slower printing of a higher quality. There are three ready-made resolution modes: draft, near letter quality (NLQ) and letter quality. Draft gives the lowest resolution print and letter quality the highest. In Inkjet and Laser Printers, the highest mode is often called best quality print.
4.9.4 Print Quality
Print quality depends on various factors but ultimately the quality depends on the
design of the printer and its mechanical construction.
Dot-Matrix/InkJet Printers

Three main issues determine the quality of characters produced by Dot-Matrix/InkJet printers: the number of dots in the matrix of each character, the size of the dots, and the addressability of the printer. A denser matrix and smaller dots make better characters. Addressability is the accuracy with which a dot can be produced (e.g., an addressability of 1/120 inch means the printer can put a dot within 1/120 inch of the required position). The minimum dot matrix used by general dot matrix printers is 9 × 9 dots; 18-pin and 24-pin printers use 12 × 24 to 24 × 24 matrices. Inkjets may even give up to 72 × 120 dots. Quality of output also depends on the paper used: if the ink of an Inkjet printer gets absorbed by the paper, it spreads and spoils the resolution.
Laser Printer
Laser Printers are page printers. For print quality, they face the same addressability issues as DMP/InkJet printers. However, some additional techniques can be used here for better quality.
One of these is REt (Resolution Enhancement technology), introduced by Hewlett-Packard. It prints better at the same resolution by changing the size of the dots at character edges and diagonal lines, reducing jagged edges.
A very important requirement for Laser Printers to print at high quality is memory. Memory increases as the square of the resolution, i.e., the dot density in dpi. Therefore, if 3.5 MB is required for a 600 dpi page, approximately 14 MB is required for 1200 dpi. You need even more memory for colour.
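The square-law growth of page memory can be sketched as follows (the function name is ours; the 3.5 MB at 600 dpi baseline is the figure from the text):

```python
def page_memory_mb(dpi, base_dpi=600, base_mb=3.5):
    """Page-image memory grows as the square of the dot density (dpi)."""
    return base_mb * (dpi / base_dpi) ** 2

print(page_memory_mb(1200))  # 14.0, the figure quoted in the text
print(page_memory_mb(300))   # 0.875
```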
For efficient text printing, the Laser printer stores the page image as ASCII characters
and fonts and prints them with low memory usage. At higher resolutions, the quality
of print toner also becomes important since the resolution is limited by the size of
toner particles.
Colours can be printed in two ways:

Applying colours one upon another. This is done using inks which are somewhat transparent, as modern inks are.

Applying dots of different colours so close to one another that the human eye cannot distinguish the difference. This is the theory behind Dithering.
3 or 4 colour Printing?
For good printing, printers do not use RBY; instead they use CMYK (Cyan instead of Blue, Magenta instead of Red, Yellow, and a separate Black). A separate Black is required since the black produced by mixing the 3 colours (called Composite Black) is often not satisfactory.
What is Dithering?

CMYK gives only 8 colours (C, M, Y, K, Violet = C + M, Orange = M + Y, Green = C + Y, and the colour of the paper itself!). What about other colours? For these, the technique of Dithering is used. In dithering, each pixel, instead of being a single colour dot, is a small matrix of a number of different colour dots. Such pixels are called Super-pixels. The number of dots of a given colour in a Super-pixel decides the intensity of that colour. The problem with dithering is that it reduces the resolution of the image, since more dots are now taken by a single pixel.
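A minimal sketch of the super-pixel idea, using a 2 × 2 Bayer threshold matrix (an illustrative choice; real printers use larger matrices and other patterns):

```python
# A 2 x 2 super-pixel can show 5 intensity levels of one ink (0 to 4 dots on).
# Classic ordered-dither thresholds (a Bayer matrix):
BAYER_2X2 = [[0, 2],
             [3, 1]]

def superpixel(intensity):
    """Map an intensity in 0..4 to a 2x2 pattern of ink dots (1 = dot printed)."""
    return [[1 if BAYER_2X2[r][c] < intensity else 0 for c in range(2)]
            for r in range(2)]

print(superpixel(0))  # [[0, 0], [0, 0]] -- paper colour
print(superpixel(2))  # [[1, 0], [0, 1]] -- half tone
print(superpixel(4))  # [[1, 1], [1, 1]] -- solid ink
```

Note how each intensity level costs a 2 × 2 block of dots, which is exactly the loss of resolution the text describes.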
Monitors versus Printers

Monitor screens and printers use different colour technologies: the monitor uses RGB and the printer CMYK. So, how does one know that the colour that is seen is going to be printed? This is where the printer driver becomes very important, and where many computer models and graphics-oriented machines score. For long, a claim to fame of the Apple Macintosh machines has been the very good correspondence between print and screen colours.
4.10 MODEMS
A Modem is one device that most computer users who have surfed the Internet are aware of. A modem is required because, though most telecommunications have become digital, most telephone connections at the user end are still the analog POTS (Plain Old Telephone Service). The computer, however, is a digital device, and hence another device is needed to convert the digital signals to analog signals and vice-versa. Such a device is the Modem.
Modem stands for Modulator/Demodulator. Modulation is the process that puts digital information on to the analog circuit by modifying a constant wave (signal) called the Carrier. This is what happens when you press a button to connect to the Internet or to a web site. Demodulation is the reverse process, which derives the digital signal from the modulated wave. This is what happens when you receive data from a website, which then gets displayed by your browser.
Discussion of modulation techniques is out of scope here (you can refer to your course
on Computer Networks).
Modems are available as the following types:
1. Internal Modems: Internal Modems plug into expansion slots in your PC. They are cheap and efficient. However, Internal Modems are bus-specific and hence may not fit universally.
2.
3.
4. PC-Card Modems: PC-Card Modems are used with the PCMCIA slots found in notebooks. They are like external modems which fit into an internal slot. Thus, they give the advantages of both external and internal modems, but are more expensive.
Modems come according to CCITT/ITU standards, e.g., V.32, V.32bis, V.42 etc.
Modem Language

Modems understand a set of instructions called the Hayes Command Set or the AT Command Set. These commands are used to communicate with the modem. Sometimes, when you are in trouble setting up your modem, it is useful to know some basic commands: e.g., ATDT 17776 will dial the number 17776 on a Tone phone, and ATDP 17776 will dial the number 17776 if it is a Pulse phone.
4.11 SCANNERS
A Scanner is a device that allows you to capture drawings or photographs or text from
tangible sources (paper, slides etc.) into electronic form. Scanners work by detecting
differences in brightness of reflections from an image or object using light sensors.
These light sensors are arranged in an array across the whole width that is scannable.
This packing determines the resolution and details that can be scanned.
Scanners come in various types: Drum Scanners, Flatbed Scanners, Hand-held Scanners and Video Scanners. Drum Scanners use a rotating drum to scan loose paper sheets. Flatbed scanners have movable sensors to scan images placed on a flat glass tray; these are the most expensive kind. Hand-held Scanners are the cheapest and most portable.
They are useful for many applications but are small in size and need good hand control for high-quality scanning. Video Scanners use video technology and video cameras instead of scanning technology. Potentially they can give high resolutions, though scanners in the economical range give poor resolutions.
Figure 7: Scanners (a Flatbed Scanner and a Hand-held Scanner)
When you buy a scanner, there are many factors to look at: compatibility of the scanner with your computer, the technology (depth, resolution), the media types supported for scanning, how media can be loaded, media sizes supported, interfaces supported, physical dimensions, and the style and ease of use of the scanner.
One exciting application of scanners is Optical Character Recognition (OCR). OCR software tries to recognise characters from their shapes and write out the scanned text as a text file. Though this technology is steadily improving, it is still not completely reliable, especially w.r.t. Indian scripts. However, it can be very useful to digitize ancient texts written in Indian scripts.
Scanning technology is also ever-present nowadays in Bar-Code readers and MICR (Magnetic Ink Character Recognition) cheques. This technology is very useful for automating data capture at the source of origin, thereby avoiding problems like inaccuracies in data entry.
4.11.1 Resolution
Optical Resolution
Optical resolution or hardware resolution is the mechanical limit on resolution of the
Scanner. For scanning, the sensor has to advance after each line it scans. The
smallness of this advancement step gives the resolution of the Scanner. Typically,
Scanners may be available with mechanical resolutions of 300, 600, 1200 or 2400 dpi.
Some special scanners even scan at 10,000 dpi.
Interpolated Resolution
Each scanner is accompanied by software. This software can increase the apparent resolution of the scan by a technique called Interpolation, by which additional dots are interpolated (added) between existing dots. This gives a higher resolution and a smoother picture, but without adding any new information. The added dots will, however, lead to larger file sizes.
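The idea can be sketched for a single scanned line; averaging midpoints is one simple interpolation scheme (real scanner software may use more elaborate ones):

```python
def interpolate_line(dots):
    """Double the apparent resolution by averaging midpoints between dots."""
    out = []
    for a, b in zip(dots, dots[1:]):
        out.append(a)
        out.append((a + b) / 2)   # the added dot carries no new information
    out.append(dots[-1])
    return out

print(interpolate_line([10, 20, 40]))  # [10, 15.0, 20, 30.0, 40]
```

Notice that the output is longer (a larger file) but every added value is computed from its neighbours, not measured.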
Do not scan at a higher resolution than required. This saves both time and disk space.

Usually, it is not useful to scan at more than the optical resolution, since it adds no new information. Interpolation can be done later with image-processing software.

For images to be seen only on the computer monitor, you may only need to scan so that the image size in pixels is the same as the display resolution. That is, scan resolution = the desired image size in pixels divided by the size of the original in inches. This may be surprisingly small.
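The rule of thumb above can be sketched as follows (the 6-inch original and the 600-pixel target are invented for illustration):

```python
def scan_dpi_for_screen(target_pixels, original_inches):
    """DPI needed so the scanned image comes out `target_pixels` wide on screen."""
    return target_pixels / original_inches

# Scanning a 6-inch-wide photo to fill 600 pixels on the monitor:
print(scan_dpi_for_screen(600, 6))  # 100.0 -- well below typical optical limits
```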
The SMPS (Switched Mode Power Supply) is the unit of your PC to which the electric supply from the mains is attached, and which supplies DC to the internal circuits. It is more efficient, less expensive and more complex than linear supplies.
An SMPS works in the following way: the electric supply received is sent to a component called a triac, which shifts it from 50 Hz to a much higher frequency (almost 20,000 Hz). At the same time, using a technique called Pulse Width Modulation, the pulse width is varied according to the needs of the computer circuit: shorter pulses give a lower output voltage. A transformer then reduces the voltage back to the correct levels, and rectifiers and filters generate pure DC.
An SMPS has two main advantages: it generates less heat, since it wastes less power, and it uses less expensive transformers and circuits, since it operates at higher frequencies.
The power requirement of a PC depends on the motherboard and the peripherals in your computer. Still, in modern PCs, your requirement may not be more than 150-200 Watts.
Check Your Progress 3
1.
2. Explain the term Resolution and how it applies to Monitors, Cameras, Printers, Scanners etc.
...................................................................................................................................
...................................................................................................................................
..................................................................................................................................
..................................................................................................................................
3.
4. Compare Laptops using passive matrix and TFT technology. Which are cheaper in price?
...................................................................................................................................
...................................................................................................................................
...................................................................................................................................
...................................................................................................................................
...................................................................................................................................
5.
6.
7.
b) 4 MB
c) 8MB
d) 16MB
b) WAVE
c) MP3
d) OGG
4.13 SUMMARY
In this unit, we discussed various Input/Output devices. Among input devices we covered the Keyboard, Mouse and Scanner: the various types of keyboards, keyboard layouts (QWERTY, Dvorak) and technologies; the various types of mice and their operation; and the different types of scanners, the underlying technology and its use in applications like OCR.

The output devices discussed are the Monitor, LCD and Printer, along with the technologies and specifications behind them and colour management. Video cards, which control the display on monitors from the CPU, were discussed together with their characteristics like depth, resolution and memory. The Modem, a communication device and thereby an I/O device, was also covered, as was the power supply, especially the SMPS, which is the input of electric power for the computing unit.
1. The main merit of the Dvorak-Dealey keyboard is its scientific design using hand alternation. However, since it came much later than QWERTY, it did not become popular, as QWERTY was already well established.
2. Keyboard touch gives you a feedback mechanism: it tells you when you have pressed a key far enough and involuntarily allows faster typing. The preferred touch is an individual choice, but the best feedback is provided by an over-center feel with a 'click' sound. The most suitable touch is given by Rubber Dome keyboards. (Refer to the text for details.)
3. Besides the standard precautions while attaching hardware, one has to take precautions regarding interrupt conflicts for serial devices, since serial ports share their interrupts. (Refer to the text for details.)
1. A true-colour system has a depth of 24 bits per pixel. This means that 8 bits each are assigned to R, G and B, i.e., there are 8 Colour Planes. Hence, in Figure 4, replace 'n' by 8 to draw the new figure.
2. Framebuffer is another name for the Display Memory. It is like a time-slice of what you see on your monitor. Discuss how the framebuffer is handled differently in early display systems, PCI, AGP and UMA. (Refer to the text for details.)
3. Shadow Mask: Trinitron uses an Aperture Grill instead of a Shadow Mask, for the same purpose. Dot Pitch: Similarly, instead of Dot Pitch, there is Slot Pitch. Explain the terms Shadow Mask, Aperture Grill, Dot Pitch and Slot Pitch (refer to the text).
4. Ans. (b) 1024 × 768 × 2 Bytes = 1.6 MB. RAM is/was available as 1 MB, 4 MB, 16 MB etc.
Ans. (a) Total screen size = 12 × 9 = 108 square inches; image size = 1024 × 768 = 786432 pixels; divide 108 by 786432.
5.
1. In a digital camera, photos are stored in digital format. Instead of film, these cameras use semiconductor devices called image sensors. There are many other differences regarding quality, resolution etc.
2. Resolution is a generic term for the parameter that defines the possible sharpness or clarity of something, i.e., how clearly that thing can be resolved. This applies especially to images. See in what different ways it is used for Monitors, Cameras, Printers, Scanners and even Mice.
3. The text tells about physical mixing, optical mixing, and the RGB and CMYK schemes. The technique of dithering is used for rich colour quality. Colours also differ between monitors and printers, and maintaining similarity between the two is also an important issue.
4. Compare Laptops made using passive matrix and TFT technology. Which are cheaper in price?

In a passive matrix arrangement, the LCD has a grid of horizontal and vertical conductors, and each pixel is located at an intersection. When a current is received by the pixel, it becomes dark. In Active Matrix, also called TFT (Thin Film Transistor) technology, each pixel is active, working as a relay; hence it needs less power and gives a better quality display. Passive matrix LCDs are cheaper, but TFT LCDs are now also economically available. (Find out the latest from the market.)
5.
6.
7. Ans. All of them.
References:
1) http://whatis.techtarget.com/
2) http://www.epanorama.net/links/pc/index.htm
3) http://www.howstuffworks.com/
4)
5)
Instruction Set Architecture

1.0 Introduction
1.1 Objectives
1.2 Instruction Set Characteristics
1.3 Instruction Set Design Considerations
    1.3.1
    1.3.2
    1.3.3
1.4 Addressing Schemes
    1.4.1 Immediate Addressing
    1.4.2 Direct Addressing
    1.4.3 Indirect Addressing
    1.4.4 Register Addressing
    1.4.5 Register Indirect Addressing
    1.4.6 Indexed Addressing Scheme
    1.4.7 Base Register Addressing
    1.4.8 Relative Addressing Scheme
    1.4.9 Stack Addressing
1.5
    Instruction Length
    Allocation of Bits Among Opcode and Operand
    Variable Length of Instructions
1.6
1.7
1.8
1.0 INTRODUCTION
The Instruction Set Architecture (ISA) is the part of the processor that is visible to the programmer or compiler designer: the parts of a processor design that need to be understood in order to write assembly language, such as the machine language instructions and registers. Parts of the architecture that are left to the implementation are not part of the ISA. The ISA serves as the boundary between software and hardware.
The term instruction will be used often in this unit. What is an instruction? What are its components? What are the different types of instructions? What are the various addressing schemes and their importance? This unit is an attempt to answer these questions. In addition, the unit also discusses the design issues relating to instruction format. We have presented here (very briefly) the instruction set of the MIPS (Microprocessor without Interlocked Pipeline Stages) processor as an example.
The instruction sets of other related microprocessors can be studied from the further readings. We will also discuss the complete instruction set of the 8086 microprocessor in Unit 1, Block 4 of this course.
1.1 OBJECTIVES
After going through this unit you should be able to:
The Central
Processing Unit
The common goal of computer designers is to build hardware that implements the machine's instructions for the CPU. From the programmer's point of view, the user must understand machine or assembly language for low-level programming. Moreover, the user must be aware of the register set, the instruction types and the function that each instruction performs.

This unit covers both viewpoints. However, our prime focus is the programmer's viewpoint, with the design of the instruction set. Now, let us define instructions, the parts of an instruction and so on.
What is an Instruction Set?

An instruction set is the collection of machine language instructions that a particular processor understands and executes. In other words, a set of assembly language mnemonics represents the machine code of a particular computer. Therefore, if we define all the instructions of a computer, we can say we have defined the instruction set. It should be noted here that the instructions available in a computer are machine dependent; that is, different processors have different instruction sets. However, a newer processor of a family may have an instruction set that is a compatible extension of the instruction set of an older processor of that family. Instructions can take different formats. The instruction format involves:

the type of instruction;
the length and position of operation codes in an instruction; and
the number and length of operand addresses.
Thus, each instruction consists of several fields. The most common fields found in
instruction formats are:
Opcode: (What operation to perform?)
Figure 1: A hypothetical instruction format of 32 bits: an Opcode field in bits 0-5, an Addressing Mode field in bits 6-7, and an Operand Address field in bits 8-31.
In case of an immediate operand, the maximum size of the unsigned operand would be 2^24. In case it is the address of an operand in memory, then the maximum physical memory size supported by this machine is 2^24 bytes = 16 MB.
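These limits follow directly from the 24-bit operand field of the hypothetical format; a small sketch (byte-addressable memory is assumed, as in the 16 MB figure above):

```python
def field_limits(operand_bits):
    """Largest unsigned immediate value and addressable memory for a field width."""
    max_immediate = 2 ** operand_bits - 1    # values 0 .. 2^24 - 1 for 24 bits
    addressable_bytes = 2 ** operand_bits    # one address per byte assumed
    return max_immediate, addressable_bytes

imm, mem = field_limits(24)
print(imm)                   # 16777215
print(mem // (1024 * 1024))  # 16 (MB of addressable memory)
```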
For this machine there may be two more possible addressing modes in addition to the immediate and direct modes. However, let us not discuss addressing modes right now; they will be discussed in detail in section 1.4 of this unit.
The opcode field of an instruction is a group of bits that define various processor
operations such as LOAD, STORE, ADD, and SHIFT to be performed on some data
stored in registers or memory.
The operand address field can contain data, or can refer to data (i.e., hold the address of data), or can contain labels, which may be the address of an instruction you want to execute next; such labels are commonly used in subroutine call instructions. An operand address can be:
1.
2.
3. The opcode field of an instruction specifies the address field of the operand on which data processing is to be performed.
4. The operands placed in processor registers are fetched faster than operands placed in memory.
5.
An instruction set architecture typically specifies:

A set of data types (e.g., integers, long integers, doubles, character strings etc.).
A set of operations on those data types.
A set of instruction formats, including issues like the number of addresses and instruction length.
A set of techniques for addressing data in memory or in registers.
The number of registers which can be referenced by an instruction and how they are used.
We will discuss the above concepts in more detail in the subsequent sections.
Figure: Common operand data types. Data may be Addresses; Numbers: Fixed Point (Signed or Unsigned), Floating Point (Single or Double Precision), or Binary Coded Decimal; Characters (ASCII, EBCDIC etc.); or Logical Data (0 or 1 values only).
Numbers: All machine languages include numeric data types. Numeric data
usually use one of three representations:
Logical data: Each word or byte is treated as a single unit of data. When an n-bit
data unit is considered as consisting of n 1-bit items of data with each item
having the value 0 or 1, then they are viewed as logical data. Such bit-oriented
data can be used to store an array of Boolean or binary data variables where each
variable can take on only the values 1 (true) and 0 (false). One simple application
of such a data may be the cases where we manipulate bits of a data item. For
example, in floating-point addition we need to shift mantissa bits.
Instructions can be broadly classified as Data Transfer Instructions, Data Processing Instructions, Program Control Instructions, and Miscellaneous/Privileged Instructions. Some common data transfer operations are:

Operation Name   Mnemonic   Description
Load             LD         Loads the contents from memory to a register.
Store            ST         Stores information from a register to a memory location.
Move             MOV        Data transfer from one register to another or between CPU registers and memory.
Exchange         XCH
Clear            CLEAR
Set              SET
Push             PUSH
Pop              POP
1. Logical shifts: LOGICAL SHIFT LEFT and LOGICAL SHIFT RIGHT insert zeros at the end bit position while the other bits of the word are shifted left or right respectively. The end bit position is the leftmost bit for a shift right and the rightmost bit position for a shift left. The bit shifted out is lost.
2. Arithmetic shifts: An arithmetic shift is similar to a logical shift, except that the sign bit remains unchanged. On an arithmetic shift right, the sign bit is replicated into the bit position to its right. On an arithmetic shift left, a logical shift left is performed on all bits but the sign bit, which is retained. The arithmetic left shift and the logical left shift, when performed on numbers represented in two's complement notation, cause multiplication by 2 when there is no overflow. An arithmetic shift right corresponds to a division by 2, provided there is no underflow.
3. Circular shifts: ROTATE LEFT and ROTATE RIGHT. Bits shifted out at one end of the word are not lost as in a logical shift, but are circulated back into the other end.
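The three kinds of shift described above can be sketched on an 8-bit word (the 8-bit width and the sample value are illustrative):

```python
WIDTH = 8
MASK = (1 << WIDTH) - 1          # keep results within an 8-bit word

def logical_shift_left(x):
    return (x << 1) & MASK       # 0 enters at the right; the leftmost bit is lost

def logical_shift_right(x):
    return (x & MASK) >> 1       # 0 enters at the left; the rightmost bit is lost

def arithmetic_shift_right(x):
    sign = x & 0x80              # the sign bit is retained and replicated rightwards
    return ((x & MASK) >> 1) | sign

def rotate_left(x):
    x &= MASK                    # the bit shifted out re-enters at the other end
    return ((x << 1) | (x >> (WIDTH - 1))) & MASK

n = 0b10010110
print(bin(logical_shift_left(n)))      # 0b101100
print(bin(arithmetic_shift_right(n)))  # 0b11001011
print(bin(rotate_left(n)))             # 0b101101
```

Note how the arithmetic shift right preserves the sign bit, which is what makes it a division by 2 for two's complement numbers.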
Figure: A sample program illustrating branch instructions. JUMP 1001 is an unconditional branch back to the start of the loop; BRZ 1007 is a conditional branch taken when the value read is zero. The program area holds:

0FFF: MBR 0
1000: X 2001
1001: READ X
1002: BRZ 1007
1003: ADD MBR
1004: TRAS MBR
1005: INC X
1006: JUMP 1001
1007: ...

and the data area holds the values 10, 20, 30 and 0 at locations 2001, 2002, 2003 and 2004 respectively.
The SKIP instruction is a zero-address instruction and skips the next instruction
to be executed in sequence. In other words, it increments the value of PC by one
instruction length. The SKIP can also be conditional. For example, the instruction
ISZ skips the next instruction only if the result of the most recent operation is
zero.
CALL and RETN are used for CALLing subprograms and RETurning from them. Assume that a memory stack has been built such that the stack pointer points to a non-empty top-of-stack location, and that the stack expands towards the zero address.
CALL:
CALL X : Procedure call to the function/procedure named X.
The CALL instruction causes the following to happen:
1. Decrement the stack pointer, so that we will not overwrite the last thing put on the stack: (SP ← SP − 1)
Figure: Stack states for the procedure call, showing stack locations 500 to 505. In the initial state SP points to the current top of the stack. On the procedure call, SP is decremented and the return address (PC = 102, the instruction just after the CALL) is stored at the new top. On return, that address is popped back into PC and SP is restored to its original position.
2. The contents of the PC, which points to the NEXT instruction (the one just after the CALL), are pushed onto the stack: M[SP] ← PC. We must remember where we were in the calling program, i.e. where we came from, so that we can get back there again.

3. Jump to X: the address of the start of the subprogram is put in the PC register (this is all a jump does), and we go off to the subprogram:

PC ← X
RETN: Return from procedure.
The RETN instruction causes the following to happen:

1. Pops the stack to yield an address/label; if correctly used, the top of the stack will contain the address of the next instruction after the CALL from which we are returning; it is this instruction with which we want to resume in the calling program.

2. Jumps to the popped address, i.e., puts the address into the PC register.
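The CALL/RETN mechanics can be modelled in a few lines; the stack addresses, the return address 102 and the subprogram address 500 are illustrative assumptions echoing the figure:

```python
# Sketch of CALL/RETN with a memory stack that expands towards address zero.
memory = {}   # sparse model of main memory
SP = 506      # stack pointer: current stack top
PC = 102      # already points at the instruction just AFTER the CALL

def call(x):
    """CALL X: save the return address on the stack, then jump to X."""
    global SP, PC
    SP = SP - 1       # 1. SP <- SP - 1 (make room; do not overwrite the top)
    memory[SP] = PC   # 2. M[SP] <- PC (push the return address)
    PC = x            # 3. PC <- X (jump to the subprogram)

def retn():
    """RETN: pop the return address back into PC."""
    global SP, PC
    PC = memory[SP]   # jump to the popped address
    SP = SP + 1       # shrink the stack

call(500)
print(PC, SP, memory[SP])  # 500 505 102
retn()
print(PC, SP)              # 102 506
```

Because the return address lives on a stack, nested and recursive calls fall out of the same two operations.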
1.3.3

As far as operations and types of operands are concerned, we have already discussed these in the previous subsection. In this section let us look at some of the architectures that are common in contemporary computers. But before we discuss the architectures, let us look at some basic instruction set characteristics:

Operand storage in the CPU: where are the operands kept, other than in memory?

Number of explicitly named operands: how many operands are named in an instruction?

Operand location: can any ALU instruction operand be located in memory, or must all operands be kept internally in the CPU registers?

Operations: what operations are provided in the ISA?

Type and size of operands: what is the type and size of each operand, and how is it specified?
Evaluation Stack Architecture: A stack is a data structure that implements a Last-In-First-Out (LIFO) access policy. You add an entry to the stack with PUSH(value) and remove an entry from the stack with POP(). ALU instructions have no explicit operands; PUSH and POP have one. Examples of such computers are the Burroughs B5500/6500, HP 3000/70, etc.
On a stack machine "C = A + B" might be implemented as:
PUSH A
PUSH B
ADD
POP C
Stack Architecture: Pros and Cons

Pros:
Small instructions (they do not need many bits to specify the operation).
The compiler is easy to write.

Cons:
Lots of memory accesses are required, since everything that is not on the stack is in memory. Thus, the machine performance is poor.
Consider evaluating A = B*C + D*E on each architecture:

Stack Machine
Program    Comments
PUSH B
PUSH C
MULT       Multiply (B*C) and store result on stack top
PUSH D
PUSH E
MULT       Multiply (D*E) and store result on stack top
ADD        Add the top two values on the stack
POP A      Store the value in A

Accumulator Machine
Program    Comments
LOAD B     Load B in AC
MULT C     Multiply AC with C; result in AC
STORE T    Store B*C into temporary T
LOAD D     Load D in AC
MULT E     Multiply AC with E; result in AC
ADD T      AC = B*C + D*E
STORE A    Store result in A
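The stack-machine program above can be checked with a tiny interpreter; the variable values are arbitrary assumptions used only to verify that the program computes A = B*C + D*E:

```python
# Minimal stack-machine model: PUSH/POP move data between memory and the
# stack; ADD/MULT take their operands implicitly from the stack top.
mem = {'B': 2, 'C': 3, 'D': 4, 'E': 5}
stack = []

def push(name): stack.append(mem[name])   # PUSH: memory -> stack top
def pop(name):  mem[name] = stack.pop()   # POP: stack top -> memory
def mult():     stack.append(stack.pop() * stack.pop())
def add():      stack.append(stack.pop() + stack.pop())

push('B'); push('C'); mult()   # stack holds [B*C]
push('D'); push('E'); mult()   # stack holds [B*C, D*E]
add()                          # stack holds [B*C + D*E]
pop('A')                       # store the result in A
print(mem['A'])                # 26
```

Note that no instruction names an ALU operand explicitly: the stack discipline supplies them.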
Registers can be used to store variables; this reduces memory traffic and speeds up execution. It also improves code density, as register names are shorter than memory addresses.

Instructions must include bits to specify which register to operate on, hence a larger instruction size than in accumulator-type machines.

Memory access can be minimized (registers can hold many intermediate values).

Implementation is complicated, as the compiler writer has to attempt to maximize register usage.
While most early machines used stack or accumulator architectures, in the last 15 years all CPUs made have been GPR processors. The three major reasons are: registers are faster than memory; the more data that can be kept internally in the CPU, the faster the program will run; and registers are easier for a compiler to use. But while CPUs with GPRs were clearly better than the previous stack- and accumulator-based CPUs, they were still lacking in several areas: instructions were of varying length, from 1 byte to 6-8 bytes, which causes problems with the pre-fetching and pipelining of instructions; and ALU instructions could have operands that were memory locations, so the slower memory access slowed down the whole instruction.
Thus in the early 1980s the idea of RISC was introduced. RISC stands for Reduced
Instruction Set Computer. Unlike CISC, this ISA uses fewer instructions with simple
constructs so they can be executed much faster within the CPU without having to use
memory as often. The first RISC CPU, the MIPS R2000, has 32 GPRs. MIPS is a
load/store architecture, which means that only load and store instructions access
memory. All other computational instructions operate only on values stored in
registers.
2.
3.
MOVE
DIV
STORE
XOR
BRN
COMPARE
TRAP
(i)
(ii)
(iii)
(iv)
forward where only a required portion is kept in memory while the rest of the program and data reside in secondary storage. The data or program portion stored on secondary storage is brought into memory whenever needed, and the portion of memory that is not needed is returned to secondary storage. Thus, a program bigger than the actual physical memory can be executed on that machine. This is called virtual memory. Virtual memory is discussed in greater detail as part of the operating systems course.
They are longer than physical addresses, as the total addressed memory in virtual memory is more than the actual physical memory.

If a virtual-addressed operand is not in memory, then the operating system brings that operand into memory.

The symbols D, D1, D2, ..., etc. refer to the actual operands to be used by instructions for their execution.
Most of the machines employ a set of addressing modes. In this unit, we will describe
some very common addressing modes employed in most of the machines. A specific
addressing mode example, however, is given in Unit 1 of Block 4.
The following tree shows the common addressing modes:

Addressing Modes
    Immediate
    Memory Reference
        Memory Direct
        Memory Indirect
        Indexed Addressing
        Displacement Addressing
            Base Addressing
            Relative Addressing
        Stack Addressing
    Register Reference
        Register
        Register Indirect
Addressing mode     Possible use
Immediate           For moving constants and initialization of variables
Direct              Used for global variables and less often for local variables
Register            Frequently used for storing local variables of procedures
Register indirect   For holding pointers to structures in programming languages such as C
Indexed             To access members of an array
Stack               For pushing or popping the parameters of procedures
Base addressing     Employed to relocate programs in memory, especially in multiprogramming systems
Auto-indexed        Accessing iterative local variables such as arrays
Displacement        Used for local variables
[Figure 8: Immediate Addressing. The instruction LOAD (I) 7 consists of an opcode, an addressing-mode field (immediate) and the operand value itself (7); the operand is taken directly from the instruction.]
[Figure: Direct Addressing. The instruction LOAD D 500 carries the address 500 in its operand field; memory location 500 contains the operand value 7 (0111), which is fetched directly.]
This scheme provides a limited address space: if the address field has n bits, then the memory space would contain 2^n memory words or locations. For example, for the example machine of Figure 1, the directly addressed memory space would be 2^10 locations.

The effective address in this scheme is defined as the address of the operand, that is,

EA = A and
D = (EA)
The second statement implies that the data is stored in the memory location
specified by effective address.
In this addressing scheme only one memory reference is required to fetch the
operand.
[Figure: Indirect Addressing. The operand field of the instruction holds the address 500; memory location 500 contains the address 50A, and location 50A contains the actual operand value 7 (0111).]
In this addressing scheme the effective address EA and the contents of the operand field A are related as:

EA = (A) (the content of location 500, that is, 50A above)
D = (EA) (the content of location 50A, that is, 7)
The drawback of this scheme is that it requires two memory references to fetch
the actual operand. The first memory reference is to fetch the actual address of
the operand from the memory and the second to fetch the actual operand using
that address.
In this scheme the word length determines the size of the addressable space, as the actual address is stored in a word. For example, a memory having a word size of 32 bits can have 2^32 indirect addresses.
Register access is faster than memory access, and hence register addressing results in faster instruction execution. However, registers obtain operands only from memory; therefore, the operands that should be kept in registers must be selected carefully and efficiently. For example, if an operand is moved into a register, processed only once and then returned to memory, then no saving occurs. However, if an operand is used repeatedly after being brought into a register, then we have saved a few memory references. Thus, the task of using registers efficiently amounts to finding which operand values should be kept in registers so that memory references are minimised. Normally, this task is done by the compiler of a high-level language while translating the program to machine language. As a rule of thumb, the most frequently used local variables are kept in registers.
The size of a register address is smaller than a memory address, which reduces the instruction size. For example, for a machine having 32 general-purpose registers, only 5 bits are needed to address a register.
EA = (R) and
D = (EA)

The addressing capability of the register indirect addressing scheme is determined by the size of the register.
Example: assume the following register and memory contents (all values in hexadecimal): PC = 2532H, index register R = 2752H and base register B = 2260H; memory location 27A8H contains 10H, location 22A6H contains 25H and location 2568H contains 70H.

Addressing mode   Formula                    EA                   Data value
Indexed           EA = A + (R); D = (EA)     56 + 2752 = 27A8H    10H
Base              EA = A + (B); D = (EA)     46 + 2260 = 22A6H    25H
Relative          EA = (PC) + A; D = (EA)    2532 + 36 = 2568H    70H
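The three formulae can be collected into a small sketch; the register and memory contents repeat the worked example, while the function names are my own labels:

```python
# Effective-address calculation for the worked example above.
PC, R, B = 0x2532, 0x2752, 0x2260   # program counter, index and base registers
memory = {0x27A8: 0x10, 0x22A6: 0x25, 0x2568: 0x70}

def indexed(a):  return a + R       # EA = A + (R)
def based(a):    return a + B       # EA = A + (B)
def relative(a): return PC + a      # EA = (PC) + A

for mode, a in ((indexed, 0x56), (based, 0x46), (relative, 0x36)):
    ea = mode(a)
    print(f"{mode.__name__:8s} EA = {ea:04X}H  D = {memory[ea]:02X}H")
```

All three are additions of an instruction field to a register; only the choice of register differs.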
What are the numbers of memory references required to get the data for the following addressing schemes?

(i) Immediate addressing
(ii) Direct addressing
(iii) Indirect addressing
(iv) Register indirect addressing
(v) Stack addressing
2.
3.
(i)
(ii)
(iii) Indirect addressing requires fewer memory accesses than direct addressing.
(iv) In stack addressing, the operand is explicitly specified.
However, although a 32-bit instruction occupies double the space of a 16-bit instruction (a 16-bit instruction can be fetched at double the rate), it cannot be doubly useful.
Factors which must be considered when deciding on the instruction length:

Memory size
Memory organization
Memory transfer
Number of addressing modes: the more explicit addressing modes there are, the more bits are needed for mode selection. However, some machines have implicit modes of addressing.
Granularity: as far as memory references are concerned, granularity implies whether an address references a byte or a word at a time. This is more relevant for machines which have 16-bit, 32-bit and larger words. Byte addressing, although better for character manipulation, requires more bits in an address. For example, if a memory of 4K words (1 word = 16 bits) is to be addressed directly, then it requires:

Word addressing:
4K words = 2^12 words
Therefore, 12 bits are required for word addressing.

Byte addressing:
2^12 words = 2^13 bytes
Therefore, 13 bits are required for byte addressing.
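The same calculation in a few lines of Python (a sketch for the 4K-word, 16-bit example above):

```python
# Word-vs-byte address-width calculation for a 4K x 16-bit memory.
from math import log2

words = 4 * 1024        # 4K words
bits_per_word = 16      # so each word is 2 bytes

word_address_bits = int(log2(words))                       # 2^12 words
byte_address_bits = int(log2(words * bits_per_word // 8))  # 2^13 bytes
print(word_address_bits, byte_address_bits)                # 12 13
```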
Let us consider the instruction format of a MIPS computer. MIPS is an acronym for Microprocessor without Interlocked Pipeline Stages. It is a microprocessor architecture developed by MIPS Computer Systems Inc. The MIPS CPU family was one of the most successful and flexible CPU designs throughout the 1990s. The MIPS CPU has a five-stage pipeline that executes multiple instructions at the same time. Pipelining is a new term here, so let us briefly introduce it: the five-stage pipeline defines five steps of instruction execution that may be performed in an overlapped fashion. The following diagram elaborates this concept:
[Figure 15: Pipeline. The instruction execution stages of Instructions 1, 2 and 3 overlap: while Instruction 1 is in its second stage, Instruction 2 is in its first stage, and so on.]
All the stages are independent and distinct, that is, the second stage execution of
Instruction 1 should not hinder Instruction 2.
The overall efficiency of the system becomes better.
The early MIPS architectures had 32-bit instructions and later versions have 64-bit
implementations.
The first commercial MIPS CPU model, the R2000, whose instruction format is
discussed below, has thirty-two 32-bit registers and its instructions are 32 bits long.
op       rs       rt       rd       shamt    funct
6 bits   5 bits   5 bits   5 bits   5 bits   6 bits

(Note that the funct field is 6 bits wide, so the six fields together occupy exactly 32 bits.)
All MIPS instructions are of the same length, requiring different kinds of instruction
formats for different types of instructions.
Instruction Format
All MIPS instructions are of the same size and are 32 bits long. MIPS designers chose
to keep all instructions of the same length, thereby requiring different kinds of
instruction formats for different kinds of instructions. For example, R-type (register)
or R-format is used for arithmetic instructions (Figure 16). A second type of
instruction format is called I-type or I-format and is used by the data transfer instructions. The instruction format of I-type instructions is given below:
op       rs       rt       address
6 bits   5 bits   5 bits   16 bits
The 16-bit address field means a load word instruction can load any word within a region of ±2^15 bytes of the address in the base register rs. In a load word instruction, the rt field specifies the destination register, which receives the result of the load.
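As a sketch of how the fields pack into a 32-bit word: the field widths follow the formats above, while the opcode, funct and register numbers in the examples are standard MIPS encodings used here for illustration.

```python
# Packing the R-format and I-format fields into a 32-bit word.
def encode_r(op, rs, rt, rd, shamt, funct):
    # R-format: op(6) rs(5) rt(5) rd(5) shamt(5) funct(6) = 32 bits
    return (op << 26) | (rs << 21) | (rt << 16) | (rd << 11) | (shamt << 6) | funct

def encode_i(op, rs, rt, address):
    # I-format: op(6) rs(5) rt(5) address(16) = 32 bits
    return (op << 26) | (rs << 21) | (rt << 16) | (address & 0xFFFF)

# add $t2, $t0, $t1 -> rs=8 ($t0), rt=9 ($t1), rd=10 ($t2), funct=0x20
print(hex(encode_r(0, 8, 9, 10, 0, 0x20)))  # 0x1095020

# lw $t0, 4($s0) -> op=0x23, base rs=16 ($s0), rt=8 ($t0), offset 4
print(hex(encode_i(0x23, 16, 8, 4)))        # 0x8e080004
```

Because every instruction is exactly 32 bits, the decoder can always find the op field in the top 6 bits and then select the layout for the remaining fields.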
MIPS Addressing Modes
MIPS uses various addressing modes:
1.
2.
(i)
(ii)
(iii)
(iv)
(v)
(vi)
1.7 SUMMARY

In this unit, we have explained various concepts relating to instructions. We have discussed the significance of the instruction set, the various elements of an instruction, instruction set design issues, different types of ISAs, various types of instructions, the operations performed by instructions, and various addressing schemes. We have also provided the instruction format of the MIPS machine. Block 4 Unit 1 contains a detailed instruction set of the 8086 machine. You can refer to the further readings for the instruction sets of various machines.
True
True
3.
4.
5.
False
True
False
(a) - (iii)
(b) - (i)
(c) - (ii)
3.
(i) True.
(ii) False.
(iii) False.
(iv) False
True.
False.
False.
False.
False.
True.
2.0 INTRODUCTION
The main task performed by the CPU is the execution of instructions. In the previous
unit, we have discussed about the instruction set of computer system. But, one thing,
which remained unanswered is: how these instructions will be executed by the CPU?
The above question can be broken down into two simpler questions. These are:
What are the steps required for the execution of an instruction? How are these steps
performed by the CPU?
The answer to the first question lies in the fact that each instruction execution consists of several steps; together they constitute an instruction cycle. A micro-operation is the smallest operation performed by the CPU. These operations, put together, execute an instruction.
For answering the second question, we must have an understanding of the basic
structure of a computer. As discussed earlier, the CPU consists of an Arithmetic
Logic Unit, the control unit and operational registers. We will be discussing the
register organisation in this unit, whereas the arithmetic-logic unit and control unit
organisation are discussed in subsequent units.
In this unit we will first discuss the basic CPU structure and the register organisation
in general. This is followed by a discussion on micro-operations and their
implementation. The discussion on micro-operations will gradually lead us towards
the discussion of a very simple ALU structure. The detail of ALU structure is the
topic of the next unit.
2.1 OBJECTIVES
After going through this unit, you should be able to:
2. An arithmetic and logic unit (ALU) for performing data manipulation, and

3. A control unit that coordinates and controls the various operations and initiates the appropriate sequence of micro-operations for each task.
Computer instructions are normally stored in consecutive memory locations and are
executed in sequence one by one. The control unit allows reading of an instruction
from a specific address in memory and executes it with the help of ALU and
Register.
Instruction Execution and Registers

The basic process of instruction execution is:

1. The instruction is fetched from memory into the CPU.

2. It is decoded by the control unit and converted into a set of lower-level control signals, which cause the functions specified by that instruction to be executed.

3. After the completion of execution of the current instruction, the next instruction fetched is the next instruction in sequence.
This process is repeated for every instruction except for program control instructions,
like branch, jump or exception instructions. In this case the next instruction to be
fetched from memory is taken from the part of memory specified by the instruction,
rather than being the next instruction in sequence.
But why do we need Registers?
If tcpu is the cycle time of the CPU, that is, the time taken by the CPU to execute a well-defined micro-operation using registers, and tmem is the memory cycle time, that is, the speed at which memory can be accessed by the CPU, then (tmem/tcpu) is in the range of 2 to 10; that is, the CPU is 2 to 10 times faster than memory. Thus, CPU registers are the fastest temporary storage areas, and instructions whose operands are stored in the fast CPU registers can be executed rapidly in comparison to instructions whose operands are in the main memory of a computer. Each instruction must designate the registers it will address. Thus, a machine requires a number of registers.
But how do the registers help in instruction execution? We will discuss this with the
help of Figure 1.
Step 1:
The first step of instruction execution is to fetch the instruction that is to be executed.
To do so we require:
In Step 2:

Get the data of memory location B into the data register (DR) using the memory address register (MAR), by issuing a memory read operation.

This data may be stored in a general-purpose register, if so needed; let us say R2.

Now, the ALU will perform the addition of R1 and R2 under the command of the control unit, and the result will be put back in R1. The status of the ALU
Similarly, the other instructions are fetched and executed using ALU and
register under the control of the Control Unit.
Thus, for describing instruction execution, we must describe the register layout, the micro-operations, the ALU design and finally the control unit organization. We will discuss registers and micro-operations in this unit; the ALU and the Control Unit are described in Unit 3 and Unit 4 of this Block.
All von Neumann machines have a program counter (PC) (also called an instruction counter, IC), which is a register that contains the address of the next instruction to be executed.

Most computers use a special register to hold the instruction currently being executed; it is called the instruction register (IR).

There are a number of general-purpose registers. With these three kinds of registers, a computer would be able to execute programs.
Other types of registers:
A few factors to consider when choosing the number of registers in a CPU are:

The CPU can access registers faster than it can access main memory.

For addressing a register, depending on the number of addressable registers, only a few address bits are needed in an instruction. These address bits are definitely far fewer than a memory address. For example, for addressing 256 registers you need just 8 bits, whereas the common memory size of 1 MB requires 20 address bits, a difference of 60%.

Compilers tend to use a small number of registers, because large numbers of registers are very difficult to use effectively. A generally good number of registers is 32 in a general machine.

Registers are more expensive than memory, but far fewer in number.
From a user's point of view, the register set can be classified into two basic categories.

Programmer-Visible Registers: These registers can be used by machine or assembly language programmers to minimize references to main memory.

Status and Control Registers: These registers cannot be used by the programmers but are used to control the CPU or the execution of a program.
Different vendors have used some of these registers interchangeably; therefore, you should not stick to these definitions rigidly. Yet this categorization will help in a better understanding of the register sets of a machine. Therefore, let us discuss more about these categories.
For control of various operations several registers are used. These registers cannot be
used in data manipulation; however, the content of some of these registers can be
used by the programmer. One of the control registers for a von-Neumann machine is
the Program Counter (PC).
Almost all the CPUs, as discussed earlier, have a status register, a part of which may
be programmer visible. A register which may be formed by condition codes is called
condition code register. Some of the commonly used flags or condition codes in such
a register may be:
Flag             Comments
Sign flag        Set if the result of the last arithmetic operation was negative.
Zero flag        Set if the result of the last arithmetic operation was zero.
Carry flag       Set if a carry results from the addition of the highest-order bits, or a borrow is taken on subtraction of the highest-order bits.
Equal flag       Set if a compare operation finds the two operands equal.
Overflow flag    Set if an arithmetic operation produces a result that is out of range.
Interrupt flag   Used for enabling or disabling interrupts.
Supervisor flag  Set when the CPU is executing in supervisor (privileged) mode.
These flags are set by the CPU hardware while performing an operation. For example, an addition operation may set the overflow flag, or a division by 0 may set the overflow flag, etc. These codes may be tested by a program for a typical conditional branch operation. The condition codes are collected in one or more registers. RISC machines have several sets of condition code bits; in these machines an instruction specifies the set of condition codes that is to be used. Independent sets of condition codes enable parallelism within the instruction execution unit.
The flag register is often known as Program Status Word (PSW). It contains
condition code plus other status information. There can be several other status and
control registers such as interrupt vector register in the machines using vectored
interrupt, stack pointer if a stack is used to implement subroutine calls, etc.
2. A machine has 20 general-purpose registers. How many bits will be needed for
register address of this machine?
..............................................................................................................................
..............................................................................................................................
..............................................................................................................................
..............................................................................................................................
..............................................................................................................................
..............................................................................................................................
3. Can we store status and control information in the memory?
..............................................................................................................................
..............................................................................................................................
..............................................................................................................................
Let us now look into an example register set of MIPS processor.
By number: $0, $1, $2, ..., $31
By name: ..., $gp, $sp, $fp, $ra

Not all of these are general-purpose registers. The following table describes how each register is treated, and the actions you can take with each register.

Name     Register number   Description                  Specify in expression
ZERO     0                 Always holds the value zero  $zero
AT       1                 Reserved for the assembler   $at
V0-V1    2-3               Result (value) registers     $v0 - $v1
A0-A3    4-7               Argument registers           $a0 - $a3
T0-T7    8-15              Temporary registers          $t0 - $t7
S0-S7    16-23             Saved registers              $s0 - $s7
T8-T9    24-25             Temporary registers          $t8 - $t9
K0-K1    26-27             Reserved for the OS kernel   $k0 - $k1
GP       28                Global pointer               $gp
SP       29                Stack pointer                $sp
FP       30                Frame pointer                $fp
RA       31                Return address               $ra
You will also study another 8086 based register organization in Block 4 of this
course. So, all the computers have a number of registers. But, how exactly is the
instruction execution related to registers? To explore this concept, let us first discuss
the concept of Micro-operations.
Move data from memory location sum to register R1 (LOAD R1, sum)
Add an immediate operand to register (R1) and store the results in R1
(ADD R1, 7)
Store data from register R1 to memory location sum (STORE sum, R1).
Thus, several machine instructions may be needed (this will vary from machine to
machine) to execute a simple C statement. But, how will each of these machine
statements be executed with the help of micro-operations? Let us try to elaborate the
execution steps:
Thus, we may have to execute the instruction in several steps. For the subsequent
discussion, for simplicity, let us assume that each micro-operation can be completed
in one clock period, although some micro-operations require memory read/write that
may take more time.
Let us first discuss the type of micro-operations. The most common micro-operations
performed in a digital computer can be classified into four categories:
1) Register transfer micro-operations
2) Arithmetic micro-operations
3) Logic micro-operations
4) Shift micro-operations
For a register transfer micro-operation there must be a path for data transfer from
the output of the source register to the input of destination register.
In addition, the destination register should have a parallel load capability, as we
expect the register transfer to occur in a predetermined control condition. We
will discuss more about the control unit in Unit 4 of this block.
A common path for connecting various registers is through a common internal
data bus of the processor. In general the size of this data bus should be equal to
the number of bits in a general register.
2. The individual bits within a register are numbered from 0 (rightmost bit) to n-1 (leftmost bit), as shown in Figure 2(b). Common ways of drawing the block diagram of a computer register are shown below. The name of the 16-bit register is IR (Instruction Register), which is partitioned into two subfields in Figure 2(d). Bits 0 through 7 are assigned the symbol L (for low byte) and bits 8 through 15 are assigned the symbol H (for high byte). The symbol IR(L) refers to the low-order byte and IR(H) to the high-order byte.
[Figure 2: Register Formats. (a) A register R0; (b) its individual bits; (c) the numbering of the bits 15 ... 0 of a 16-bit register R1; (d) the 16-bit register IR partitioned into the subfields IR(H) (bits 8-15) and IR(L) (bits 0-7).]
3. The statement

R2 ← R1

denotes a transfer of all bits from the source register R1 to the destination register R2 during one clock pulse, where the destination register has parallel load capability. The contents of register R1 remain unchanged after the register transfer micro-operation. More than one transfer can be shown using a comma operator.
4.
If the transfer is to occur only under a predetermined control condition, then this
condition can be specified as a control function. For example, if P is a control
function then P is a Boolean variable that can have a value of 0 or 1. It is
terminated by a colon (:) and placed in front of the actual transfer statement. The
operation specified in the statement takes place only when P = 1. Consider the
statements:
If (P = 1) then (R2 ← R1)

or,

P: R2 ← R1
Where P is a control function that can be either 0 or 1.
5. All micro-operations written on a single line are to be executed at the same time, provided the statements or a group of statements to be implemented together are free of conflict. A conflict occurs if two different contents are being transferred to a single register at the same time. For example, the statement X: R1 ← R2, R1 ← R3 represents a conflict, because both R2 and R3 are trying to transfer their contents to R1 at the same time.
6.
[Figure 3: The Register Transfer Time. A control circuit gated by the clock asserts the Load input of the destination register R2, whose data inputs are driven by the bits of the source register R1. The timing diagram shows Load becoming active after the clock edge at time t, with the transfer occurring at the next positive clock transition, t+1.]
It is assumed that the control variable is synchronized with the same clock as the one applied to the register. The control function T is activated by the rising edge of the clock pulse at time t. Even though the control variable T becomes active just after time t, the actual transfer does not occur until the register is triggered by the next positive transition of the clock, at time t+1. At time t+1 the Load input is active, and the data at the inputs of R2 (the outputs of R1) are loaded into R2 in parallel. The transfer occurs with every clock pulse transition while T remains active.
Bus and Memory Transfers
A digital computer has many registers, and rather than connecting wires between all
registers to transfer information between them, a common bus is used. Bus is a path
40
(consists of a group of wires) one for each bit of a register, over which information is
transferred, from any of several sources to any of several destinations.
Memory Read

The memory read operation transfers the content of the memory word selected by the address in MAR to the data register:

Read: DR ← [MAR]

Memory Write

The memory write operation transfers the content of a data register to a memory word M selected by the address. Assume that the data of register R1 is to be written to the memory at the address provided in MAR. The write operation can be stated as:

Write: [MAR] ← R1

Please note that this means the location pointed to by MAR will be written, not MAR itself.
[Figure: Memory transfer. MAR supplies the address to memory; a Read control signal transfers the selected word into DR, and a Write control signal stores the contents of DR into the addressed word.]
that generates the arithmetic sum of two binary numbers of any length and is constructed with full-adder circuits connected in cascade. An n-bit binary adder requires n full-adders. The add micro-operation, in an accumulator machine, can be performed as:

AC ← AC + DR
Subtraction is most often implemented in machines through complement and add operations. It is specified as:

R3 ← R1 - R2
R3 ← R1 + (2's complement of R2)
R3 ← R1 + (1's complement of R2 + 1)
R3 ← R1 + R2' + 1 (where R2' is the bitwise 1's complement of R2)

Adding 1 to the 1's complement produces the 2's complement. Adding the contents of R1 to the 2's complement of R2 is equivalent to subtracting the contents of R2 from R1 and storing the result in R3. We will describe the basic circuits required for these micro-operations in the next unit.
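A quick numeric check of the complement-and-add scheme; the 8-bit register width is an assumption chosen only for the illustration:

```python
# Subtraction by complement-and-add on an assumed 8-bit register width.
N = 8
MASK = (1 << N) - 1

def subtract(r1, r2):
    # R3 <- R1 + (1's complement of R2) + 1, i.e. R1 - R2 modulo 2^N
    ones_complement = r2 ^ MASK        # bitwise complement of R2
    return (r1 + ones_complement + 1) & MASK

print(subtract(9, 5))   # 4
print(subtract(5, 9))   # 252, i.e. -4 in 8-bit 2's complement
```

The second result shows why the scheme works for signed numbers: the wrap-around value is exactly the 2's complement representation of the negative difference.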
The increment micro-operation adds one to a number in a register. This operation is designated as:

R1 ← R1 + 1

It can be implemented in hardware by using a binary up-counter.

The decrement micro-operation subtracts one from a number in a register. This operation is designated as:

R1 ← R1 - 1

It can be implemented using a binary down-counter.
What about the multiply and division operations? Are not they micro-operations? In
most of the older computers multiply and divisions were implemented using
add/subtract and shift micro-operations. If a digital system has implemented division
and multiplication by means of combinational circuits, then we can call these as the
micro-operations for that system.
    10010011
AND 01010101
  = 00010001
Some of the common logic micro-operations are AND, OR, NOT or Complement,
Exclusive OR, NOR, and NAND. In many computers only four: AND, OR, XOR
(exclusive OR) and complement micro-operations are implemented.
Let us now discuss how these four micro-operations can be used in implementing
some of the important applications of manipulation of bits of a word, such as,
changing some bit values or deleting a group of bits. We are assuming that the result
We will play a trick with the manipulations we are performing. Let us select 1010 as the 4-bit data for register R1, and 1100 as the data for register R2. Why? Because the bit combinations of R2 and R1 represent the truth table entries 00, 01, 10 and 11 (read from right to left and bottom to top). Thus, the result of a logical operation on them indicates which logic micro-operation is needed for that data manipulation. The following table gives details of some of these operations:
Operation name         Micro-operation   Result for R1 = 1010, R2 = 1100   What is the operation?
Selective set          R1 ← R1 ∨ R2      1110                              Sets those bits in register R1 for which the corresponding R2 bit is 1.
Selective clear        R1 ← R1 ∧ R2'     0010                              Clears those bits in register R1 for which the corresponding R2 bits are 1.
Selective complement   R1 ← R1 ⊕ R2      0110                              Complements those bits in register R1 for which the corresponding R2 bits are 1.
Mask                   R1 ← R1 ∧ R2      1000                              Clears those bits in register R1 for which the corresponding R2 bits are 0.
Insert                 mask, then OR                                       Inserts a new value into a selected group of bits: the unwanted bits are first masked out, then the new value is ORed in.
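With the sample values R1 = 1010 and R2 = 1100, these manipulations reduce to the four basic logic micro-operations. A minimal Python sketch (the 4-bit width is an assumption for illustration):

```python
WIDTH = 4
MASK = (1 << WIDTH) - 1

def selective_set(r1, r2):         # R1 <- R1 OR R2
    return r1 | r2

def selective_complement(r1, r2):  # R1 <- R1 XOR R2
    return r1 ^ r2

def selective_clear(r1, r2):       # R1 <- R1 AND (NOT R2)
    return r1 & (~r2 & MASK)

def mask_op(r1, r2):               # R1 <- R1 AND R2
    return r1 & r2

r1, r2 = 0b1010, 0b1100
print(f"{selective_set(r1, r2):04b}")         # 1110
print(f"{selective_complement(r1, r2):04b}")  # 0110
print(f"{selective_clear(r1, r2):04b}")       # 0010
print(f"{mask_op(r1, r2):04b}")               # 1000
```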
The Central
Processing Unit
Clear: performing R1 ← R1 ⊕ R2 when both registers hold the same data clears the register to all 0s.

A shift micro-operation moves the bits of a register to the left or to the right. Shift micro-operations are of three types: logical, arithmetic and circular.
In a logical shift, the bit entering through the serial input of the leftmost or rightmost flip-flop (for a right or left shift operation respectively) is a 0.
If we connect the serial output of a shift register to its serial input then we encounter a
circular shift. In circular shift left or circular shift right information is not lost, but is
circulated.
In an arithmetic shift a signed binary number is shifted to the left or to the right. An arithmetic shift-left causes the number to be multiplied by 2, while an arithmetic shift-right causes a division by 2. Since division or multiplication by 2 should not change the sign of the number, an arithmetic shift must leave the sign bit unchanged. We have already discussed shift operations in Unit 1.
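The three kinds of shift can be sketched in Python on an assumed 4-bit register; note how the arithmetic shift-right preserves the sign bit, while the circular shifts lose no information:

```python
WIDTH = 4
MASK = (1 << WIDTH) - 1

def shl_logical(x):      # 0 enters at the right, the MSB is lost
    return (x << 1) & MASK

def shr_logical(x):      # 0 enters at the left, the LSB is lost
    return x >> 1

def rol(x):              # circular left: the MSB re-enters at the right
    return ((x << 1) | (x >> (WIDTH - 1))) & MASK

def ror(x):              # circular right: the LSB re-enters at the left
    return (x >> 1) | ((x & 1) << (WIDTH - 1))

def shr_arithmetic(x):   # sign bit is preserved (division by 2)
    sign = x & (1 << (WIDTH - 1))
    return sign | (x >> 1)

print(f"{shr_arithmetic(0b1010):04b}")  # 1101: -6 / 2 = -3 in 2's complement
```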
Let us summarise the micro-operations using the following table:

Sl. No.   Micro-operations              Examples
1.        Register transfer             R1 ← R2 (register transfer); [MAR] ← R1 (register to memory)
2.        Arithmetic micro-operations   ADD: R1 ← R1 + R2; SUBTRACT: R1 ← R1 + R2' + 1; INCREMENT: R1 ← R1 + 1; DECREMENT: R1 ← R1 − 1
3.        Logic micro-operations        AND, OR, COMPLEMENT, XOR
4.        Shift micro-operations        Left or right shift: logical, arithmetic, circular
3. How is the memory read operation carried out using the system bus?
4. What are the differences between circular and logical shift micro-operations?
Instruction fetch: In this phase the instruction is brought from the address pointed to by the PC into the instruction register. The steps required are:

Transfer the address in the PC to the MAR (register transfer):
    MAR ← PC

The MAR puts its contents on the address bus for main memory location selection; the control unit instructs it to do so and also issues a memory-read signal. The word so read is placed on the data bus, where it is accepted by the data register. In parallel, the PC is incremented (memory read using the bus; increment):
    DR ← (MAR), PC ← PC + 1
The indirect address is handled using the following micro-operations:

    MAR ← DR (address)
    DR ← (MAR)
    IR (address) ← DR (address)

IR (address) and DR (address) now contain the effective address.
Thus, the address portion of IR now contains the effective address, which is the direct
address of the operand.
Execution: Now the instruction is ready for execution. A different opcode will require a different sequence of steps for execution. Therefore, let us discuss a few examples of the execution of some simple instructions, with the purpose of identifying some of the steps needed during instruction execution. Let us start the discussion with a simple case: an addition instruction. Suppose we have the instruction Add R1, A, which adds the content of memory location A to register R1, storing the result in R1. This instruction will be executed in the following steps:
Transfer the address portion of the instruction to the MAR (register transfer):
    MAR ← IR (address)

Read memory location A and bring the operand into the DR (memory read):
    DR ← (MAR)

Add the DR to R1 using the ALU and bring the result back to R1 (add micro-operation):
    R1 ← R1 + DR
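These three steps can be traced with a small Python sketch; the memory address and the register contents are invented for illustration:

```python
# Toy machine state: location A (address 100 here) holds the operand 25.
memory = {100: 25}
cpu = {"IR_addr": 100, "MAR": 0, "DR": 0, "R1": 10}

cpu["MAR"] = cpu["IR_addr"]        # MAR <- IR (address)   (register transfer)
cpu["DR"] = memory[cpu["MAR"]]     # DR  <- (MAR)          (memory read)
cpu["R1"] = cpu["R1"] + cpu["DR"]  # R1  <- R1 + DR        (add micro-operation)

print(cpu["R1"])  # 10 + 25 = 35
```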
Let us take another example: the instruction INCSKIP A, which increments A and skips the next instruction if the content of A has become zero. This is a complex instruction and requires intermediate decision-making. The micro-operations required for this instruction's execution are:
Transfer the address portion of the IR to the MAR (register transfer):
    MAR ← IR (address)

Read memory; on reading, the DR will contain the operand A (memory read):
    DR ← (MAR)

Transfer the contents of the DR to R1 (register transfer). We are assuming that the DR, although it can be used in computation, cannot be used as the destination of an ALU operation; thus we need to transfer its content to a general purpose register R1 where the operation can be performed:
    R1 ← DR

Increment R1 (increment micro-operation):
    R1 ← R1 + 1

Transfer the content of R1 to the DR (register transfer):
    DR ← R1

Store the contents of the DR into location A using the MAR (memory write). This operation proceeds as follows: the address bits are applied to the address bus by the MAR, the data is put on the data bus, and the control unit provides the control signal for memory write, resulting in a memory write at the location specified by the MAR:
    (MAR) ← DR

If the content of R1 is zero, then increment the PC by one, thus skipping the next instruction (increment on a condition). This operation can be performed in parallel with the memory write. Please note that in this last step a comparison and an action are taken as a single step; this is possible because it is a simple comparison based on status flags:
    If R1 = 0 then PC ← PC + 1
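The INCSKIP steps can likewise be traced in Python; the 8-bit width, the address and the initial contents are assumptions of the sketch:

```python
memory = {200: 0xFF}      # location A holds 255, i.e. -1 in 8-bit 2's complement
cpu = {"IR_addr": 200, "MAR": 0, "DR": 0, "R1": 0, "PC": 50}

cpu["MAR"] = cpu["IR_addr"]          # MAR <- IR (address)  (register transfer)
cpu["DR"] = memory[cpu["MAR"]]       # DR  <- (MAR)         (memory read)
cpu["R1"] = cpu["DR"]                # R1  <- DR            (register transfer)
cpu["R1"] = (cpu["R1"] + 1) & 0xFF   # R1  <- R1 + 1        (increment)
cpu["DR"] = cpu["R1"]                # DR  <- R1            (register transfer)
memory[cpu["MAR"]] = cpu["DR"]       # (MAR) <- DR          (memory write)
if cpu["R1"] == 0:                   # conditional increment, in parallel
    cpu["PC"] += 1                   # PC <- PC + 1: skip the next instruction

print(memory[200], cpu["PC"])  # 0 51
```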
Let us now take an example of a branching operation. Suppose we are using the first location of the subroutine to store the return address; then the steps involved in the subroutine call (CALL A) can be:

Transfer the contents of the address portion of the IR to the MAR (register transfer), and, in parallel, transfer the return address, that is, the contents of the PC, to the DR (register transfer):
    MAR ← IR (address), DR ← PC

Transfer the branch address that is stored in the address part of the instruction to the program counter (register transfer):
    PC ← IR (address)

Store the DR using the MAR; thus, the return address is stored at the first location of the subroutine. (This operation is normally done on a stack, but in this example we are storing the return address in the first location of the subroutine.) This micro-operation can be performed in parallel with the previous one (memory write):
    (MAR) ← DR
Finally, increment the PC so that execution continues from the location following the stored return address:
    PC ← PC + 1
Thus, the number of steps required in execution may differ from instruction to
instruction.
Interrupt processing: On completion of the execution of an instruction, the machine checks whether there is any pending interrupt request among the interrupts that are enabled. If an enabled interrupt has occurred, then that interrupt may be processed. The nature of interrupts varies from machine to machine; however, let us discuss one simple illustration of interrupt processing. A simple sequence of steps followed in the interrupt phase is:

    DR ← PC
    MAR ← address of the location for saving the return address
    (MAR) ← DR
    PC ← address of the first instruction of the interrupt service program
After completing the above interrupt processing, the CPU will fetch the next instruction, which will be an instruction of the interrupt service program. Thus, during this time the CPU might be doing the interrupt processing or executing the user program. Please note that each instruction of the interrupt service program is executed as an instruction in an instruction cycle.
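The interrupt-entry steps can be sketched the same way; the save location and the service program address are purely illustrative assumptions:

```python
SAVE_LOC = 0      # assumed fixed location for saving the return address
ISR_START = 1     # assumed address of the first service program instruction

memory = {SAVE_LOC: 0}
cpu = {"PC": 123, "MAR": 0, "DR": 0}

cpu["DR"] = cpu["PC"]             # DR  <- PC
cpu["MAR"] = SAVE_LOC             # MAR <- address of the save location
memory[cpu["MAR"]] = cpu["DR"]    # (MAR) <- DR: return address saved
cpu["PC"] = ISR_START             # PC  <- first instruction of the service program
```

The next fetch cycle then automatically picks up the first instruction of the service program.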
Please note that for a complex machine the instruction cycle will not be as simple as this. You can refer to the further readings for more complex instruction cycles.
Time slot →      1    2    3    4    5    6    7    8    9    10   11
Instruction 1    IF   ID   OF   EX   SR
Instruction 2         IF   ID   OF   EX   SR
Instruction 3              IF   ID   OF   EX   SR
Instruction 4                   IF   ID   OF   EX   SR
Instruction 5                        IF   ID   OF   EX   SR
Instruction 6                             IF   ID   OF   EX   SR
Instruction 7                                  IF   ID   OF   EX   SR

(IF = instruction fetch, ID = instruction decode, OF = operand fetch, EX = execute, SR = store result)
The pipeline stages are like steps: each step of the pipeline is to be completed in one time slot. The size of the time slot is governed by the stage taking the maximum time. Thus, if the times taken by the various stages are almost equal, we get the best results.
The execution of the first instruction is completed at the end of the 5th time slot, but from then on one instruction completes in every time slot. So, under ideal conditions, one instruction is executed in the pipeline in each time slot.

Please note that from the 5th time slot onwards the pipe is full. In the 5th time slot the stages of execution of the five instructions are:

    SR (instruction 1)
    EX (instruction 2)
    OF (instruction 3)
    ID (instruction 4)
    IF (instruction 5)
From the 5th time slot onwards, there may be register or memory conflicts among the instructions making memory and register references; that is, various stages may refer to the same registers or memory locations. This slows down the instruction pipeline: a later instruction has to wait until the earlier instructions have completed, effectively pushing the whole pipeline back by one time slot.
How can we minimise the problems occurring due to branch instructions? We can use several mechanisms that may minimise the effect of the branch penalty.
2) An instruction cycle does not include an indirect cycle if the operands are stored in registers.

3) The interrupt cycle results only in jumping to an interrupt service routine. The actual processing of the instructions of this routine is performed in instruction cycles.
2.8 SUMMARY

In this unit, we have discussed in detail the register organisation and a simple structure of the CPU. After this we discussed in detail the micro-operations and their implementation in hardware using simple logical circuits. While discussing micro-operations our main emphasis was on simple arithmetic, logic and shift micro-operations, in addition to register transfer and memory transfer. The knowledge you have acquired about register sets and condition codes gives us the idea that conditional micro-operations can be implemented by simply checking flags and condition codes. This idea will become clearer after we go through Units 3 and 4.

We concluded the discussion of this unit by providing a simple approach to instruction execution using micro-operations. We have also defined the concept of the instruction pipeline. We will be using this approach for discussing control unit details in Units 3 and 4. The key terms used in this unit include micro-operations and the instruction pipeline.
You will also get the details on 8086 microprocessor register sets, conditional codes,
instructions etc. in Unit 1 of Block 4.
You can refer to further readings for more register organisation examples and for
more details on micro-operations and instruction execution.
Registers which are used only for the calculation of operand addresses are called address registers.

5 bits.

It helps in implementing parallelism in the instruction execution unit.

Yes. Normally, the first few hundred words of memory are allocated for storing control information.

The bits circulate, so after a complete cycle the data is still intact in a circular shift; not so in a logical shift.
ALU Organisation

                                      Page No.
3.0  Introduction                     53
3.1  Objectives                       53
3.2  ALU Organisation                 53
     3.2.1
     3.2.2
3.3  Arithmetic Processors            62
3.4  Summary                          63
3.5  Solutions/Answers                64
3.0 INTRODUCTION

By now we have discussed the instruction sets and register organisation, followed by a discussion on micro-operations and instruction execution. In this unit, we will first discuss the ALU organisation. Then we will discuss the floating point ALU and arithmetic co-processors, which are commonly used for floating point computations. This unit provides a detailed view of the implementation of simple micro-operations, which include register transfer, arithmetic, logic and shift micro-operations. Finally, the construction of a simple ALU is given. Thus, this unit provides you with basic insight into the computer system. The next unit covers details of the control unit. Together these units describe the two most important components of the CPU: the ALU and the CU.
3.1 OBJECTIVES
After going through this unit, you will be able to:
3.2
ALU ORGANISATION
Figure: A simple ALU organisation. The accumulator register (AC), multiplier-quotient register (MQ) and data register (DR) are connected through the parallel adder and other logic circuits; the control unit exchanges flags and control signals with these units over the bus.
The above structure has three registers, AC, MQ and DR, for data storage. Let us assume that each is one word long. Please note that the parallel adder and other logic circuits (that is, the arithmetic and logic circuits) have two inputs and only one output in this diagram. This implies that any ALU operation can have at most two input values and will generate a single output, along with the other status bits. In the present case the two inputs are the AC and DR registers, while the output is the AC register. The AC and MQ registers are generally used as a single AC.MQ register, which is capable of left and right shift operations. Some of the micro-operations that can be defined on this ALU are:
Addition:       AC ← AC + DR
Subtraction:    AC ← AC − DR
AND:            AC ← AC ∧ DR
OR:             AC ← AC ∨ DR
Exclusive OR:   AC ← AC ⊕ DR
NOT:            AC ← AC'
In this ALU organisation multiplication and division were implemented using shift and add/subtract operations. The MQ (multiplier-quotient) register is a special register used for the implementation of multiplication and division. We are not giving the details of how this register can be used for implementing the multiplication and division algorithms; for more details on these algorithms, such as Booth's algorithm, please refer to the further readings.
For multiplication or division operations the DR register stores the multiplicand or divisor respectively. The result of multiplication or division, on applying the appropriate algorithm, is finally obtained in the AC.MQ register combination. These operations can be represented as:

Multiplication:  AC.MQ ← DR × MQ
Division:        AC.MQ ← MQ ÷ DR
The DR is another important register; it is used for storing the second operand. In fact, it acts as a buffer register that stores the data brought from the memory for an instruction. In machines where we have general purpose registers, any of the registers can be utilised as the AC, MQ and DR.
Bit Slice ALUs

It was feasible to manufacture small (such as 4- or 8-bit) fixed point ALUs on a single IC chip. If these chips are designed as expandable types, then using these 4- or 8-bit ALU chips we can build 16-, 32- or 64-bit array-like circuits. These are called bit-slice ALUs. The basic advantage of such ALUs is that they can be constructed for a desired word size. More details on bit-slice ALUs can be obtained from the further readings.
Figure: Bus line selection using four 4×1 multiplexers (MUX 0 to MUX 3). The corresponding output bits of registers A, B, C and D are connected to data inputs 0 to 3 of each multiplexer, and the common selection lines C1 and C0 determine which register drives the bus.
When C1C0 = 00, the 0th data input of every multiplexer is selected; this causes the bus lines to receive the content of register A, since the outputs of register A are connected to the 0th data inputs of the multiplexers, which are then applied to the outputs that form the bus. Similarly, when C1C0 = 01, register B is selected, and so on. The following table shows the register that is selected for each of the four possible values of the selection lines:
C1   C0   Register selected
0    0    A
0    1    B
1    0    C
1    1    D

Figure 3: Bus Line Selection
To construct a bus for 8 registers of 16 bits each, you would require 16 multiplexers, one for each line in the bus. The number of multiplexers needed to construct the bus is equal to the number of bits in each register. Each multiplexer must have eight data input lines and three selection lines (2³ = 8) to multiplex one bit of the eight registers.
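The multiplexer-based bus can be modelled in Python; the register contents are arbitrary, and the bitwise version makes explicit that one multiplexer serves each bus line:

```python
def bus_select(registers, c1, c0):
    """Word placed on the bus: registers = [A, B, C, D], selected by C1 C0."""
    return registers[(c1 << 1) | c0]

def bus_select_bitwise(registers, c1, c0, width=16):
    """Same selection, built line by line: one 4x1 multiplexer per bus line."""
    sel = (c1 << 1) | c0
    bus = 0
    for i in range(width):                 # multiplexer for bus line i
        bus |= ((registers[sel] >> i) & 1) << i
    return bus

regs = [0x1111, 0x2222, 0x3333, 0x4444]   # contents of A, B, C, D (arbitrary)
print(hex(bus_select(regs, 1, 0)))        # 0x3333: register C is on the bus
```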
Implementation of Arithmetic Circuits for Arithmetic Micro-operation
An arithmetic circuit can be implemented using a number of full adder circuits or
parallel adder circuits. Figure 4 shows a logical implementation of a 4-bit arithmetic
circuit. The circuit is constructed by using 4 full adders and 4 multiplexers.
The diagram of a 4-bit arithmetic circuit has four 4×1 multiplexers and four full adders (FA). Please note that a full adder is a circuit that can add two input bits and a carry-in bit to produce one sum bit and a carry-out bit. So what does the adder do? It just adds three bits. What does the multiplexer do? It controls one of the input operands. Thus, the combination produces a series of micro-operations.
Let us find out how the multiplexer control lines change one of the inputs to the adder circuit. Please refer to the following table. (Please note the convention, valid only for this table: an uppercase letter indicates a data word, whereas a lowercase letter indicates a bit.)

Control input      Output of the 4×1 multiplexers (a) (b) (c) (d)   Y input to adder      Comments
S1 = 0, S0 = 0     b0   b1   b2   b3                                Y = B                 data bits of B
S1 = 0, S0 = 1     b0'  b1'  b2'  b3'                               Y = B'                complement of B
S1 = 1, S0 = 0     0    0    0    0                                 Y = 0                 all-0s word
S1 = 1, S0 = 1     1    1    1    1                                 Y = F (hex), all 1s   all-1s word
Now let us discuss how by coupling carry bit (Cin) with these input bits we can obtain
various micro-operations.
Inputs to the circuit:

Register A bits a0, a1, a2 and a3 go to the corresponding X bits of the full adders (FA). Please note that each bit of register A and register B is fed to a different full adder unit.

Please also note that each of the four inputs from A is applied to the X inputs of the binary adder, and each of the four inputs from B is connected to the data inputs of the multiplexers. This means that the A input goes directly to the adder, but the B input can be manipulated through the multiplexers to create a number of different input values, as given in the figure above. The B inputs through the multiplexers are controlled by the two selection lines S1 and S0. Thus, using various combinations of S1 and S0 we can select the data bits of B, the complement of B, the all-0s word, or the all-1s word.
The input carry Cin, which can be equal to 0 or 1, goes to the carry input of the full adder in the least significant position. The other carries are cascaded from one stage to the next, just as we pass the carry of the lower digits to the higher digits when adding by hand. The output of the binary adder is determined from the following arithmetic sum:

D = X + Y + Cin, that is, D = A + Y + Cin

By controlling the value of Y with the two selection lines S1 and S0, and making Cin equal to 0 or 1, it is possible to implement the eight arithmetic micro-operations listed in the truth table below.
S1   S0   Cin   Y value   D = A + Y + Cin   Equivalent micro-operation       Micro-operation name
0    0    0     B         D = A + B         R ← R1 + R2                      Add
0    0    1     B         D = A + B + 1     R ← R1 + R2 + 1                  Add with carry
0    1    0     B'        D = A + B'        R ← R1 + R2'                     Subtract with borrow
0    1    1     B'        D = A + B' + 1    R ← R1 + 2's complement of R2    Subtract
1    0    0     0         D = A             R ← R1                           Transfer
1    0    1     0         D = A + 1         R ← R1 + 1                       Increment
1    1    0     All 1s    D = A − 1         R ← R1 + (all 1s)                Decrement
1    1    1     All 1s    D = A             R ← R1                           Transfer

Why is D = A + B' called subtract with borrow?

D = A + B' = (A − 1) + (B' + 1) = (A − 1) + (2's complement of B) = (A − 1) − B

Hence the name: subtract with borrow.
When S1S0 = 10, the value 0 is applied to the Y inputs of the full adder.
If Cin = 0, then the output D = A + 0 + Cin, i.e. D = A.
If Cin = 1, then D = A + 0 + 1, i.e. D = A + 1.
The first is a simple data transfer micro-operation, while the second is an increment micro-operation.

When S1S0 = 11, the all-1s word is applied to the Y inputs of the full adder.
If Cin = 0, then the output D = A + (all 1s) + Cin, i.e. D = A − 1. (How? Let us explain with the help of the following example.)
Example: Let us assume that register A is 4 bits long and contains the value 0101, and it is added to the all-1s value:

    0101
  + 1111
  ------
  1 0100

The leading 1 is the carry-out and is discarded. Thus, on addition with all 1s the number has actually been decremented by one.
If Cin = 1, then D = A + (all 1s) + 1, i.e. D = A.
The first is the decrement micro-operation, while the second is a data transfer micro-operation.
Please note that the micro-operation D = A is generated twice, so there are only seven distinct micro-operations possible through the proposed arithmetic circuit.
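The whole arithmetic circuit reduces to D = A + Y + Cin, with Y chosen by S1 S0. A word-level Python sketch (4-bit width assumed, following the selection order B, B', 0, all 1s from the table above):

```python
WIDTH = 4
MASK = (1 << WIDTH) - 1

def y_input(b, s1, s0):
    """What the multiplexers feed to the adder's Y inputs: B, B', 0 or all 1s."""
    return [b, ~b & MASK, 0, MASK][(s1 << 1) | s0]

def arithmetic_circuit(a, b, s1, s0, cin):
    """D = A + Y + Cin, truncated to the register width."""
    return (a + y_input(b, s1, s0) + cin) & MASK

a, b = 0b0101, 0b0011                      # A = 5, B = 3
print(arithmetic_circuit(a, b, 0, 0, 0))   # A + B = 8   (add)
print(arithmetic_circuit(a, b, 0, 1, 1))   # A - B = 2   (subtract)
print(arithmetic_circuit(a, b, 1, 0, 1))   # A + 1 = 6   (increment)
print(arithmetic_circuit(a, b, 1, 1, 0))   # A - 1 = 4   (decrement)
```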
Implementation of Logic Micro-operations

For implementation, let us first ask the question: how many logic operations can be performed on two binary variables? We can have four possible combinations of the inputs of two variables: 00, 01, 10 and 11. For these 4 input combinations we can have 2⁴ = 16 possible combinations of truth values for a function. This implies that for two variables we can have 16 logical operations. This fact will be clearer from the following figure.
Values for
xy = 00 01 10 11   Function        Micro-operation      Comments
0  0  0  0         F0 = 0          R ← 0                Clear
0  0  0  1         F1 = x·y        R ← R1 ∧ R2          AND
0  0  1  0         F2 = x·y'       R ← R1 ∧ R2'         R1 AND complement of R2
0  0  1  1         F3 = x          R ← R1               Transfer of R1
0  1  0  0         F4 = x'·y       R ← R1' ∧ R2         R2 AND complement of R1
0  1  0  1         F5 = y          R ← R2               Transfer of R2
0  1  1  0         F6 = x ⊕ y      R ← R1 ⊕ R2          Exclusive OR
0  1  1  1         F7 = x + y      R ← R1 ∨ R2          OR
1  0  0  0         F8 = (x + y)'   R ← (R1 ∨ R2)'       NOR
1  0  0  1         F9 = (x ⊕ y)'   R ← (R1 ⊕ R2)'       Exclusive NOR
1  0  1  0         F10 = y'        R ← R2'              Complement of R2
1  0  1  1         F11 = x + y'    R ← R1 ∨ R2'         R1 OR complement of R2
1  1  0  0         F12 = x'        R ← R1'              Complement of R1
1  1  0  1         F13 = x' + y    R ← R1' ∨ R2         R2 OR complement of R1
1  1  1  0         F14 = (x·y)'    R ← (R1 ∧ R2)'       NAND
1  1  1  1         F15 = 1         R ← all 1s           Set all the bits to 1
Please note that in the figure above the micro-operations are derived by replacing the x and y of the Boolean function with registers R1 and R2, applied to each corresponding bit of R1 and R2; each of these bits is treated as a binary variable. In many computers only four micro-operations are implemented: AND, OR, XOR (exclusive OR) and complement. The other 12 micro-operations can be derived from these four micro-operations. Figure 8 shows one stage, the ith bit stage, of the four logic operations. Please note that the circuit consists of 4 gates and a 4×1 MUX. The ith bits of registers R1 and R2 are passed through the circuit, and on the basis of the selection inputs S0 and S1 the desired micro-operation is obtained.
Please note that in this figure we have made reference to the two previous figures for the arithmetic and logic circuits. This stage of the ALU has two data inputs: the ith bits of the registers to be manipulated. However, the (i−1)th or (i+1)th bit is also fed in for the case of a shift micro-operation on a single register. There are four selection lines, which determine which micro-operation (arithmetic, logic or shift) is performed on the input. Fi is the resultant bit after the desired micro-operation. Let us see how the value of Fi changes on the basis of the four select inputs; this is shown in Figure 10.

Please note that in Figure 10 the arithmetic micro-operations have both S3 and S2 bits as zero, and the input Ci is important only for the arithmetic micro-operations. For the logic micro-operations the S3, S2 values are 01. The values 10 and 11 cause shift micro-operations.
For the shift micro-operations, the S1 and S0 values and the Ci value do not play any role.

S3   S2   S1   S0   Ci   F               Micro-operation    Name
0    0    0    0    0    F = x           R ← R1             Transfer
0    0    0    0    1    F = x + 1       R ← R1 + 1         Increment
0    0    0    1    0    F = x + y       R ← R1 + R2        Addition
0    0    0    1    1    F = x + y + 1   R ← R1 + R2 + 1    Addition with carry
0    0    1    0    0    F = x + y'      R ← R1 + R2'       Subtract with borrow
0    0    1    0    1    F = x + y' + 1  R ← R1 − R2        Subtract
0    0    1    1    0    F = x − 1       R ← R1 − 1         Decrement
0    0    1    1    1    F = x           R ← R1             Transfer
0    1    0    0    ×    F = x·y         R ← R1 ∧ R2        AND
0    1    0    1    ×    F = x + y       R ← R1 ∨ R2        OR
0    1    1    0    ×    F = x ⊕ y       R ← R1 ⊕ R2        Exclusive OR
0    1    1    1    ×    F = x'          R ← R1'            Complement
1    0    ×    ×    ×    F = shl(x)      R ← shl(R1)        Shift left
1    1    ×    ×    ×    F = shr(x)      R ← shr(R1)        Shift right

The first eight rows are the arithmetic micro-operations, the next four are the logic micro-operations, and the last two are the shift micro-operations.
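The selection table can be modelled word by word in Python (4-bit width assumed; the arithmetic group follows the Y ordering 0, B, B', all 1s used in the table above):

```python
WIDTH = 4
MASK = (1 << WIDTH) - 1

def alu(s3, s2, s1, s0, a, b, ci):
    """One-word model of the arithmetic-logic-shift selection table."""
    if (s3, s2) == (0, 0):                                  # arithmetic group
        y = [0, b, ~b & MASK, MASK][(s1 << 1) | s0]
        return (a + y + ci) & MASK                          # F = A + Y + Ci
    if (s3, s2) == (0, 1):                                  # logic group
        return [a & b, a | b, a ^ b, ~a & MASK][(s1 << 1) | s0]
    if (s3, s2) == (1, 0):                                  # shift left
        return (a << 1) & MASK
    return a >> 1                                           # shift right

a, b = 0b0110, 0b0011                  # A = 6, B = 3
print(alu(0, 0, 0, 1, a, b, 0))        # A + B = 9     (addition)
print(alu(0, 0, 1, 0, a, b, 1))        # A - B = 3     (subtract)
print(alu(0, 1, 1, 0, a, b, 0))        # A xor B = 5   (exclusive OR)
print(alu(1, 0, 0, 0, a, b, 0))        # shl A = 12    (shift left)
```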
3.4 SUMMARY

In this unit, we have discussed in detail the hardware implementation of micro-operations. The unit starts with an implementation of the bus, which is the backbone of any register transfer operation. This is followed by a discussion of the arithmetic circuit and the micro-operations that can be performed with it using full adder circuits. The implementation of the logic micro-operations has also been discussed, leading to the logical construction of a simple arithmetic logic shift unit. The unit thus builds up the basic ALU from the units constructed for the implementation of micro-operations.

In the later part of the unit, we discussed arithmetic processors. Finally, we presented a few chipsets that support the working of a processor for input/output functions from the keyboard, printer, etc.
1. False
2. False
3. True
                                      Page No.
4.0  Introduction                     65
4.1  Objectives                       65
4.2  The Control Unit                 65
4.3  The Hardwired Control            71
4.4  Wilkes Control                   72
4.5  The Micro-Programmed Control     74
4.6  The Micro-Instructions           75
4.7                                   78
4.8                                   81
4.9                                   81
4.0 INTRODUCTION

By now we have discussed instruction sets and register organisation, followed by a discussion on micro-operations and a simple arithmetic logic unit circuit. We have also discussed the floating point ALU and arithmetic processors, which are commonly used for floating point computations.

In this unit we are going to discuss the functions of a control unit and its structure, followed by the hardwired type of control unit. We will then discuss the micro-programmed control unit, which is quite popular in modern computers because of its flexibility in designing. We will start the discussion with several definitions about the unit, followed by the Wilkes control unit. Finally, we will discuss the concepts involved in micro-instruction execution.
4.1 OBJECTIVES
After going through this unit you will be able to:
But how does a control unit control the above operations? What are the functional requirements of the control unit? What is its structure? Let us explore the answers to these questions in the next sections.

Functional Requirements of a Control Unit

Let us first try to define the functions that a control unit must perform in order to make things happen. But in order to define the functions of a control unit, one must know what resources and means it has at its disposal. A control unit must know about the:

(a) basic components of the CPU;
(b) micro-operations this CPU performs.
The CPU of a computer consists of the following basic functional components:
The Arithmetic Logic Unit (ALU), which performs the basic arithmetic and
logical operations.
Registers which are used for information storage within the CPU.
Internal Data Paths: These paths are useful for moving the data between two
registers or between a register and ALU.
External Data Paths: The roles of these data paths are normally to link the CPU
registers with the memory or I/O interfaces. This role is normally fulfilled by the
system bus.
The Control Unit: This causes all the operations to happen in the CPU.
Micro-operations for performing arithmetic, logic and shift operations. These micro-operations involve the use of registers for input and output.

The basic responsibility of the control unit is to guide the various components of the CPU to perform a specific sequence of micro-operations that achieves the execution of an instruction.
What are the functions that a control unit performs to make instruction execution feasible? Instruction execution is achieved by executing micro-operations in a specific sequence; for different instructions this sequence may be different. Thus, the control unit must perform two basic functions:
But how are these two tasks achieved? The control unit generates control signals,
which in turn are responsible for achieving the above two tasks. But, how are these
control signals generated? We will answer this question in later sections. First let us
discuss a simple structure of control unit.
A control unit has a set of input values on the basis of which it produces an output
control signal, which in turn performs micro-operations. These output signals control
the execution of a program. A general model of control unit is shown in Figure 1.
In the model given above the control unit is a black box, which has certain inputs and
outputs.
The inputs to the control unit are:
Flags: Flags are used by the control unit for determining the status of the CPU and the outcome of a previous ALU operation. For example, a set zero flag conveys to the control unit that, for the instruction ISZ (skip the next instruction if the zero flag is set), the next instruction is to be skipped. In such a case the control unit causes the PC to be incremented by the instruction length, thus skipping the next instruction.
Control Signals from Control Bus: Some of the control signals are provided to
the control unit through the control bus. These signals are issued from outside the
CPU. Some of these signals are interrupt signals and acknowledgement signals.
On the basis of the input signals the control unit activates certain output control
signals, which in turn are responsible for the execution of an instruction. These output
control signals are:
Control signals, which are required within the CPU: These control signals
cause two types of micro-operations, viz., for data transfer from one register to
another; and for performing an arithmetic, logic and shift operation using ALU.
Control signals to control bus: These control signals transfer data from or to
CPU register to or from memory or I/O interface. These control signals are
issued on the control bus to activate a data path on the data / address bus etc.
Now, let us discuss the requirements of such a unit. A prime requirement for the control unit is that it must know how all the instructions will be executed. It should also know about the nature of the results and the indications of possible errors. All this is achieved with the help of flags, op-codes, the clock and some control signals to itself.

A control unit contains a clock portion that provides clock pulses. This clock signal is used for measuring the timing of the micro-operations. In general, the timing signals from the control unit are kept sufficiently long to accommodate the propagation delays of signals within the CPU along the various data paths. Since, within the same instruction cycle, different control signals are generated at different times for performing different micro-operations, a counter can be utilised with the clock to keep the count. However, at the end of each instruction cycle the counter should be reset to its initial condition. Thus, the clock supplied to the control unit must provide counted timing signals. Examples of the functionality of control units, along with timing diagrams, are given in the further readings.
How are these control signals applied to achieve the particular operation? The
control signals are applied directly as the binary inputs to the logic gates of the logic
circuits. All these inputs are the control signals, which are applied to select a circuit
(for example, select or enable input) or a path (for example, multiplexers) or any other
operation in the logic circuits.
A program execution consists of a sequence of instruction cycles. Each instruction
cycle is made up of a number of sub cycles. One such simple subdivision includes
fetch, indirect, execute, and interrupt cycles, with only fetch and execute cycles
always occurring. Each sub cycle involves one or more micro-operations.
Let us revisit the micro-operations described in Unit 2 to discuss how the events of
any instruction cycle can be described as a sequence of such micro-operations.
The Fetch Cycle
The beginning of each instruction cycle is the fetch cycle, and causes an instruction to
be fetched from memory.
The fetch cycle consists of four micro-operations that are executed in three timing
steps. The fetch cycle can be written as:
T1: MAR ← PC
T2: MBR ← [MAR]
    PC ← PC + I
T3: IR ← MBR
where I is the instruction length. We assume that a clock is available for timing
purposes and that it emits regularly spaced clock pulses. Each clock pulse defines a
time unit. Thus, all the units are of equal duration. Each micro-operation can be
performed within the time of a single time unit. The notation (T1, T2, T3) represents
successive time units. What is done in these time units?

In the first time unit, the contents of the PC are moved to the MAR.

In the second time unit, the contents of the memory location specified by the MAR are moved to the MBR, and the contents of the PC are incremented by I.

In the third time unit, the contents of the MBR are moved to the IR.
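The three timing steps of the fetch cycle can be traced in Python; the instruction word, its address and I = 1 are assumptions of the sketch:

```python
I = 1                                   # assumed instruction length (in words)
memory = {30: "ADD R1, X"}              # the instruction stored at address 30
cpu = {"PC": 30, "MAR": 0, "MBR": 0, "IR": None}

cpu["MAR"] = cpu["PC"]                  # T1: MAR <- PC
cpu["MBR"] = memory[cpu["MAR"]]         # T2: MBR <- [MAR]
cpu["PC"] = cpu["PC"] + I               #     PC  <- PC + I (same time unit)
cpu["IR"] = cpu["MBR"]                  # T3: IR  <- MBR

print(cpu["IR"], cpu["PC"])  # ADD R1, X 31
```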
Once an instruction is fetched, the next step is to fetch the operands. Considering the
same example as of Unit 2, the instruction may have direct and indirect addressing
modes. An indirect address is handled using the indirect cycle. The following micro-operations are required in the indirect cycle:

T1: MAR ← IR (address)
T2: MBR ← [MAR]
T3: IR (address) ← MBR (address)
The MAR is loaded with the address field of the IR. Then memory is read to fetch the address of the operand, which is transferred to the address field of the IR through the MBR, since data is received in the MBR during a read operation.
Thus, the IR now is in the same state as of direct address, viz., as if indirect addressing
had not been used. IR is now ready for the execute cycle.
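The indirect cycle above can be sketched the same way. In this illustrative fragment (the addresses and values are assumptions), the address field of the IR points to a memory word that holds the operand's actual address; after T3 the IR is in the same state as for a direct address:

```python
# A sketch of the indirect cycle: resolve one level of indirection.
memory = {200: 500, 500: 42}           # word 200 holds a pointer to word 500 (assumed)
IR = {"opcode": "ADD", "address": 200} # state after the fetch cycle (hypothetical)

MAR = IR["address"]                    # T1: MAR <- IR(address)
MBR = memory[MAR]                      # T2: MBR <- [MAR]
IR["address"] = MBR                    # T3: IR(address) <- MBR(address)

print(IR["address"])                   # 500: the IR now holds the direct address
```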
The Execute Cycle
The fetch and indirect cycles each involve a small, fixed sequence of micro-operations that is common to all instructions.
This is not true of the execute cycle. For a machine with N different opcodes, there are
N different sequences of micro-operations that can occur. Let us consider some
hypothetical instructions:
An add instruction that adds the contents of memory location X to Register R1 with
R1 storing the result:
ADD R1, X
The sequence of micro-operations may be:
T1: MAR ← IR(address)
T2: MBR ← [MAR]
T3: R1 ← R1 + MBR
At the beginning of the execute cycle, the IR contains the ADD instruction and its direct operand address (memory location X). At time T1, the address portion of the IR is transferred to the MAR. At T2, the referenced memory location is read into the MBR. Finally, at T3, the contents of R1 and the MBR are added by the ALU.
Let us discuss one more instruction:
ISZ X increments the content of memory location X by 1. If the result is 0, the next instruction in the sequence is skipped. A possible sequence of micro-operations for this instruction may be:
T1: MAR ← IR(address)
T2: MBR ← [MAR]
T3: MBR ← MBR + 1
T4: [MAR] ← MBR
    If (MBR = 0) then (PC ← PC + I)
Please note that for this machine we have assumed that MBR can be incremented by
ALU directly.
The Central
Processing Unit
The PC is incremented if MBR contains 0. This test and action can be implemented as
one micro-operation. Note also that this micro-operation can be performed during the
same time unit during which the updated value in MBR is stored back to memory.
Such instructions are useful in implementing looping.
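The ISZ sequence can be sketched as a small function. This is an illustration only: the assumption from the text that the ALU can increment the MBR directly is kept, and an 8-bit word with wraparound is an added assumption so that the skip case can be shown.

```python
# A sketch of the ISZ micro-operation sequence (assumes the ALU can
# increment MBR directly, and an 8-bit word that wraps around).
def isz(memory, ir_address, PC, I=1, word_mask=0xFF):
    MAR = ir_address                   # T1: MAR <- IR(address)
    MBR = memory[MAR]                  # T2: MBR <- [MAR]
    MBR = (MBR + 1) & word_mask        # T3: MBR <- MBR + 1
    memory[MAR] = MBR                  # T4: [MAR] <- MBR
    if MBR == 0:                       #     if (MBR = 0) then PC <- PC + I
        PC += I                        # skip the next instruction
    return PC

mem = {300: 0xFF}                      # incrementing 0xFF wraps to 0
print(isz(mem, 300, PC=50))            # PC becomes 51: the skip is taken
```

When the incremented value is non-zero, the PC is left unchanged and no skip occurs, which is what makes the instruction useful as a loop counter test.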
The Interrupt Cycle
On completion of the execute cycle, the current instruction execution is completed. At this point, a test is made to determine whether any enabled interrupts have occurred. If so, the interrupt cycle is performed. This cycle does not execute an interrupt but causes the start of execution of the Interrupt Service Routine (ISR). Please note that the ISR is executed just like another program, using instruction cycles. The nature of this cycle varies greatly from one machine to another. A typical sequence of micro-operations for the interrupt cycle is:
T1: MBR ← PC
T2: MAR ← Save-Address
    PC ← ISR-Address
T3: [MAR] ← MBR
At time T1, the contents of the PC are transferred to the MBR, so that they can be
saved for return from the interrupt. At time T2 the MAR is loaded with the address at
which the contents of the PC are to be saved, and PC is loaded with the address of the
start of the interrupt-servicing routine. At time T3 MBR, which contains the old value
of the PC, is stored in the memory. The processor is now ready to begin the next
instruction cycle.
The Instruction Cycle
The instruction cycle for this given machine consists of four cycles. Assume a 2-bit
instruction cycle code (ICC). The ICC can represent the state of the processor in terms
of cycle. For example, we can use:
00 : Fetch
01 : Indirect
10 : Execute
11 : Interrupt
At the end of each of the four cycles, the ICC is set appropriately. Please note that an
indirect cycle is always followed by the execute cycle and the interrupt cycle is
always followed by the fetch cycle. For both the execute and fetch cycles, the next
cycle depends on the state of the system. Let us show an instruction execution using a timing diagram and instruction cycles:
Please note that the address lines determine the location in memory, and the read/write signal controls whether data is being read or written. For example, at time T2 in M2 the read control signal becomes active: the MAR value is kept enabled on the address inputs A9–A0, and the data lines are enabled to accept data from the RAM, thus enabling a typical RAM data output on the data bus.
For a read, no data is put on the bus by the CPU; it is placed on the data bus by memory after the read control signal to memory is activated. A write operation is activated along with the data bus carrying the output value.
This diagram is used for illustration of timing and control. However, more
information on these topics can be obtained from further readings.
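The cycle-to-cycle transitions described above (indirect always followed by execute, interrupt always followed by fetch, the others state-dependent) can be sketched as a small state machine over the 2-bit ICC. The decision inputs (whether indirect addressing is needed, whether an interrupt is pending) are modelled here as simple flags for illustration:

```python
# The ICC transition rules sketched as a state machine.
FETCH, INDIRECT, EXECUTE, INTERRUPT = 0b00, 0b01, 0b10, 0b11

def next_icc(icc, indirect_needed=False, interrupt_pending=False):
    if icc == FETCH:                   # fetch -> indirect or execute
        return INDIRECT if indirect_needed else EXECUTE
    if icc == INDIRECT:                # indirect is always followed by execute
        return EXECUTE
    if icc == EXECUTE:                 # execute -> interrupt or fetch
        return INTERRUPT if interrupt_pending else FETCH
    return FETCH                       # interrupt is always followed by fetch
```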
A decoder has n binary inputs and 2^n binary outputs. Each of the 2^n different input patterns activates a single unique output line.
The clock portion of the control unit issues a repetitive sequence of pulses for the duration of the micro-operation(s). These timing signals control the sequence of execution of instructions and determine what control signal needs to be applied at what time for instruction execution.
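The n-to-2^n behaviour of the decoder can be illustrated with a short sketch (purely behavioural, not a gate-level model):

```python
# Behavioural sketch of an n-to-2^n decoder: each input pattern
# activates exactly one output line.
def decode(inputs):                    # inputs: tuple of n bits, MSB first
    n = len(inputs)
    index = int("".join(str(b) for b in inputs), 2)
    outputs = [0] * (2 ** n)
    outputs[index] = 1                 # a single unique output line goes high
    return outputs

print(decode((1, 0, 1)))               # 3-to-8 decoder: line 5 is active
```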
2. Control field, which indicates the control lines that are to be activated, and
3. Address field, which provides the address of the next micro-instruction to be executed.
The control memory in Wilkes' control is organised as a PLA-like matrix made of diodes. This is a partial matrix and consists of two components: the control signals and the address of the next micro-instruction. Register I contains the address of the next micro-instruction, that is, of one step of instruction execution, for example T1 in M1 or T2 in M2, as in Figure 2. On decoding, the control signals are generated that cause execution of the micro-operation(s) of that step. In addition, the control unit indicates the address of the next micro-operation, which gets loaded through Register II into Register I. Register I can also be loaded from Register II under the enable-IR input control signal; this passes the address of the first micro-instruction of the execute cycle. During a machine cycle one row of the matrix is activated. The first part of the row generates the control signals that control the operations of the processor. The second part generates the address of the row to be selected in the next machine cycle.
At the beginning of the cycle, the address of the row to be selected is contained in
register I. This address is the input to the decoder, which is activated by a clock pulse.
This activates the row of the control matrix. The two-register arrangement is needed,
as the decoder is a combinational circuit; with only one register, the output would
become the input during a cycle. This may be an unstable condition due to repetitive
loop.
The micro-instructions are stored in the control memory. The address register for the control memory contains the address of the next micro-instruction that is to be read. The control memory Buffer Register receives the micro-instruction that has been read. A
micro-instruction execution primarily involves the generation of desired control
signals and signals used to determine the next micro-instruction to be executed. The
sequencing logic section loads the control memory address register. It also issues a
read command to control memory. The following functions are performed by the
micro-programmed control unit:
1. The sequencing logic unit specifies the address of the control memory word that is to be read, in the Address Register of the Control Memory. It also issues the READ signal.
2. The desired control memory word is read into the control memory Buffer Register.
3. The content of the control memory buffer register is decoded to create control signals and next-address information for the sequencing logic unit.
4. The sequencing logic unit finds the address of the next control word on the basis of the next-address information from the decoder and the ALU flags.
As we have discussed earlier, the execute-cycle micro-operation steps differ across instructions; in addition, the addressing mode may differ. All such information generally depends on the opcode in the Instruction Register (IR). Thus, an IR input to the Address Register for Control Memory is desirable, so there exists a decoder from the IR to the Address Register for control memory (refer to Figure 5). This decoder translates the opcode of the IR into a control memory address.
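The four functions listed above can be sketched as a read/decode/sequence loop over a toy control memory. The micro-word layout (a list of control-signal names plus a next-address field) and the contents shown are invented for illustration only:

```python
# A toy control memory and the read/decode/sequence loop of a
# micro-programmed control unit. Contents are illustrative assumptions.
control_memory = {
    0: {"signals": ["MAR<-PC"],                "next": 1},
    1: {"signals": ["MBR<-[MAR]", "PC<-PC+I"], "next": 2},
    2: {"signals": ["IR<-MBR"],                "next": 0},  # back to fetch
}

def run_cycles(start_address, steps):
    address_register = start_address   # Address Register of the Control Memory
    issued = []
    for _ in range(steps):
        # 1-2: the addressed control word is READ into the Buffer Register
        buffer_register = control_memory[address_register]
        # 3: decoding the buffer register yields the control signals
        issued.extend(buffer_register["signals"])
        # 4: the sequencing logic picks the next control word address
        address_register = buffer_register["next"]
    return issued

print(run_cycles(0, 3))
```

Three passes through the loop issue the fetch-cycle signals in order, mirroring steps 1–4 of the list above.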
2.
What will be the control signals and address of the next micro-instruction in the
Wilkes control example of Figure 4, if the entry address for a machine instruction
selects the last but one (branching control line) and the conditional bit value for
branch is true?
..................................................................................................................................
..................................................................................................................................
The next important question about micro-instructions is: how are they organised in the control memory? One of the simplest ways to organise control memory is to arrange the micro-instructions for the various sub-cycles of the machine instruction in the memory. Figure 6 shows such an organisation.
In a horizontal micro-instruction, every control signal is represented by a bit of the micro-instruction; such a micro-instruction may be hundreds of bits long. A typical horizontal micro-instruction with its related fields is shown in Figure 7(a).
In a vertical micro-instruction, many similar control signals can be encoded into a few micro-instruction bits. For example, 16 ALU operations, which would require 16 individual control bits in a horizontal micro-instruction, need only 4 encoded bits in a vertical micro-instruction. Similarly, in a vertical micro-instruction only 3 bits are needed to select one of eight registers. However, these encoded bits need to be passed through the respective decoders to obtain the individual control signals. This is shown in Figure 7(b).
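The bit-count argument above generalises: k mutually exclusive control choices need k bits horizontally but only ceil(log2(k)) bits vertically. A two-line sketch confirms the figures used in the text:

```python
# Encoded width for k mutually exclusive control choices.
from math import ceil, log2

def vertical_bits(k):
    return ceil(log2(k))

print(vertical_bits(16))   # 16 ALU operations -> 4 encoded bits
print(vertical_bits(8))    # one of eight registers -> 3 bits
```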
In general, a horizontal control unit is faster but requires wider instruction words, whereas vertical control units, although they require a decoder, have shorter instruction words. Most systems use neither purely horizontal nor purely vertical micro-instructions (Figure 7(c)).
1. One bit is needed for each control signal; therefore, the number of bits required in a micro-instruction is high.
2. It presents a detailed hardware view, as the need for each control signal can be determined.
3. Since each of the control signals can be controlled individually, these micro-instructions are difficult to program. However, concurrency can be exploited easily.
4. Almost no control logic is needed to decode the instruction, as there is a one-to-one mapping of control signals to the bits of the micro-instruction. Thus, execution of a micro-instruction, and hence of the micro-program, is faster.
5. The unencoded micro-instruction aims at optimising the performance of a machine.
In most cases, the design is kept between the two extremes. The LSI-11 (highly encoded) and the IBM 3033 (unencoded) control units are close examples of these two approaches.
Execution/decoding of slightly encoded micro-instructions
In general, micro-programmed control unit designs are neither completely unencoded nor highly encoded; they are slightly encoded. This reduces the width of the control memory and the micro-programming effort. The basic technique for encoding is shown in Figure 8. The micro-instruction is organised as a set of fields. Each field contains a code which, upon decoding, activates one or more control signals. The execution of a micro-instruction means that every field is decoded and generates control signals. Thus, with N fields, N simultaneous actions can be specified. Each action results in the activation of one or more control signals. Generally, each control signal is activated by no more than one field. The design of an encoded micro-instruction format can be stated in simple terms:
Organise the format into independent fields. That is, each field depicts a set of actions such that actions from different fields can occur simultaneously.
Define each field such that the alternative actions that can be specified by the field are mutually exclusive. That is, only one of the actions specified for a given field can occur at a time.
Another aspect of encoding is whether it is direct or indirect (Figure 8). With indirect
encoding, one field is used to determine the interpretation of another field.
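The field-by-field execution of a slightly encoded micro-instruction described above can be sketched as follows. The field names, codes, and control-signal names here are invented for illustration; the point is that every field is decoded independently and all their signals are activated together:

```python
# Sketch of executing a slightly encoded (direct-encoded) micro-instruction:
# each field is decoded by its own decoder, activating its control signals.
FIELD_DECODERS = {
    "alu":  {0b00: [], 0b01: ["ALU_ADD"], 0b10: ["ALU_SUB"], 0b11: ["ALU_INC"]},
    "path": {0b0: [], 0b1: ["MBR_TO_BUS", "BUS_TO_IR"]},  # one code, two signals
}

def execute_micro_instruction(word):   # word: mapping of field -> code
    signals = []
    for field, code in word.items():   # every field is decoded...
        signals.extend(FIELD_DECODERS[field][code])  # ...and activates its signals
    return signals

print(execute_micro_instruction({"alu": 0b01, "path": 0b1}))
```

Note that within one field the alternatives are mutually exclusive (one code selects one action), while across fields the actions occur simultaneously — exactly the two design rules stated above.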
Another aspect of micro-instruction execution is micro-instruction sequencing, which involves calculating the address of the next micro-instruction. In general, the next micro-instruction can be (refer to Figure 6) the next micro-instruction in sequence, a micro-instruction reached via a branch, or the first micro-instruction of the next machine instruction.
A detailed discussion on these topics is beyond this unit. You must refer to further
readings for more detailed information on Micro-programmed Control Unit Design.
Figure 8: Micro-instruction Encoding
2. State True or False:
a)
b)
c)
d)
e)
f) Status bits supplied from the ALU to the sequencing logic have no role to play in the sequencing of micro-instructions.
3.
4.
..................................................................................................................................
..................................................................................................................................
4.8 SUMMARY
In this unit, we have discussed the organisation of control units. Hardwired, Wilkes and micro-programmed control units have been discussed. The key to such control units is the micro-instruction, whose types and formats have been briefly described in this unit. The functioning of a micro-programmed unit, that is, micro-programmed execution, has also been discussed. The control unit is the key to the optimised performance of a computer. The information given in this unit can be supplemented by going through the further readings.
2. (a) False (b) False (c) True (d) False (e) True (f) False.
3. Wilkes control typically has one address field. However, for a conditional branching micro-instruction, it contains two addresses. The Wilkes control, in fact, is a hardware representation of a micro-programmed control unit.
4.
Unencoded micro-instructions           Highly encoded micro-instructions
Large number of bits                   Relatively fewer bits
Difficult to program                   Easy to program
No decoding logic needed               Decoding logic needed
Present a detailed hardware view       -
Optimise machine performance           -
Reduced Instruction Set Computer Architecture
                                                  Page No.
5.0 Introduction                                       83
5.1 Objectives                                         83
5.2 Introduction to RISC                               83
    5.2.1
    5.2.2
    5.2.3
5.3 RISC Architecture                                  88
5.4 The Use of Large Register File                     90
5.5 Comments on RISC                                   93
5.6 RISC Pipelining                                    94
5.7 Summary                                            98
5.8 Solutions/Answers                                  98
5.0 INTRODUCTION
In the previous units, we have discussed the instruction set, register organisation, pipelining, and control unit organisation. The trend of those years was to have a large instruction set, a large number of addressing modes and about 16 to 32 registers. However, there existed a school of thought in favour of simplicity in the instruction set. This view was mainly based on studies of the types of programs being written for various machines. It led to the development of a new type of computer called the Reduced Instruction Set Computer (RISC). In this unit, we will discuss RISC machines. Our emphasis will be on the basic principles of RISC and its pipeline. We will also discuss the arithmetic and logic units here.
5.1 OBJECTIVES
After going through this unit you should be able to:
If we review the history of computer families, we find that the most common
architectural change is the trend towards even more complex machines.
MO ← M1 + M2 and EO ← E2
MO ← M1 + M2 and EO ← E1
If all these steps are coded as separate machine instructions, then this simple operation will require many instruction execution cycles. If, instead, the operation is made part of the machine instruction set as ADDF A, B (add floating point numbers A and B and store the result in A), it becomes a single machine instruction, and all the steps above are coded as micro-operations in the control unit micro-program. Thus, just one instruction cycle (although a long one) may be needed, requiring just one instruction fetch, whereas the equivalent program of simple instructions would require many instruction fetches.
However, faster cache memory for Instruction and data stored in registers can create
an almost similar instruction execution environment. Pipelining can further enhance
such speed. Thus, creating an instruction as above may not result in faster execution.
Microcode and VLSI Technology
The control unit of a computer can be constructed in two ways: by creating a micro-program that executes micro-instructions, or by building circuits for each instruction's execution. Micro-programmed control allows complex architectures to be implemented more cost-effectively than hardwired control, as the cost of expanding an instruction set is very small: only a few more micro-instructions in the control store. Thus, it may be reasoned that moving subroutines such as string editing, integer-to-floating-point conversion, and mathematical evaluations such as polynomial evaluation into the control unit micro-program is more cost-effective.
Code Density and Smaller, Faster Programs
Memory was very expensive in older computers. Thus, there was a need for low memory utilisation; that is, it was cost-effective to have smaller, compact programs. Hence, it was opined that the instruction set should be more complex, so that programs are smaller. However, increased complexity of instruction sets had resulted in
instruction sets and addressing modes requiring more bits to represent them. Code compaction is important, but the cost of 10 per cent more memory is often far less than the cost of squeezing 10 per cent out of the code through CPU architectural innovations.
Smaller programs are advantageous because they require less RAM. However, memory today is very inexpensive, so this potential advantage is no longer compelling. More important, smaller programs should improve performance. How? Fewer instructions mean fewer instruction bytes to be fetched.
However, the problem with this reasoning is that it is not certain that a CISC program will be smaller than the corresponding RISC program. In many cases the CISC program, expressed in symbolic machine language, may be shorter, but the number of bits of machine code may not be noticeably smaller. This results from the fact that RISC uses register addressing, which requires fewer bits in general. In addition, compilers on CISCs often favour the simpler instructions, so the conciseness of complex instructions seldom comes into play.
Let us explain this with the help of the following example:
Assumptions:
The complex instruction is Add C, A, B, with 16-bit addresses and 8-bit data operands.
All the operands are direct memory-reference operands.
The machine has 16 registers, so the size of a register address is 4 bits (2^4 = 16).
The machine uses an 8-bit opcode.
Memory-to-Memory                       Register-to-Register
Add C, A, B                            Load  rA, A
(8 + 16 + 16 + 16 bits)                Load  rB, B
                                       Add   rC, rA, rB
                                       Store C, rC
Instruction size (I) = 56 bits         I = 104 bits
Data size (D) = 24 bits                D = 24 bits
Total memory load (M) = 80 bits        M = 128 bits
(a) Add A & B to store result in C
Memory-to-Memory                       Register-to-Register
Add C, A, B                            Load  rA, A
Add A, C, D                            Load  rB, B
Sub D, D, B                            Add   rC, rA, rB
                                       Load  rD, D
                                       Add   rA, rC, rD
                                       Sub   rD, rD, rB
                                       Store D, rD
Instruction size (I) = 168 bits        I = 172 bits
Data size (D) = 72 bits                D = 32 bits
Total memory load (M) = 240 bits       M = 204 bits
(b) Execution of the instruction sequence: C = A + B, A = C + D, D = D - B
Figure 1: Program size for different instruction set approaches
So, the expectation that a CISC will produce smaller programs may not be realised.
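The bit counts in Figure 1 follow directly from the stated field widths; a short calculation (an illustration, using the same assumptions of 8-bit opcode, 16-bit memory address, 4-bit register address, and 8-bit data operands) reproduces them:

```python
# Reproducing Figure 1's arithmetic.
OP, ADDR, REG, DATA = 8, 16, 4, 8      # field widths in bits (from the text)

# (a) C = A + B
mm_I = OP + 3 * ADDR                   # Add C, A, B  -> 56 bits
rr_I = 2 * (OP + REG + ADDR) + (OP + 3 * REG) + (OP + REG + ADDR)
mm_M = mm_I + 3 * DATA                 # 80 bits total memory traffic
rr_M = rr_I + 3 * DATA                 # 128 bits
print(mm_I, rr_I, mm_M, rr_M)          # 56 104 80 128

# (b) C = A + B; A = C + D; D = D - B
mm_I2 = 3 * (OP + 3 * ADDR)            # 168 bits
rr_I2 = 3 * (OP + REG + ADDR) + 3 * (OP + 3 * REG) + (OP + REG + ADDR)
mm_M2 = mm_I2 + 9 * DATA               # 240 bits (nine data operand accesses)
rr_M2 = rr_I2 + 4 * DATA               # 204 bits (three loads, one store)
print(mm_I2, rr_I2, mm_M2, rr_M2)      # 168 172 240 204
```

The register-to-register style loses on a single statement but wins on the three-statement sequence, because intermediate results stay in registers.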
Support for High-Level Language
With the increasing use of more and higher level languages, manufacturers had
provided more powerful instructions to support them. It was argued that a stronger
instruction set would reduce the software crisis and would simplify the compilers.
Another important reason for such a movement was the desire to improve
performance.
However, even though the instructions that were closer to the high level languages
were implemented in Complex Instruction Set Computers (CISCs), still it was hard to
exploit these instructions since the compilers were needed to find those conditions that
exactly fit those constructs. In addition, the task of optimising the generated code to
minimise code size, reduce instruction execution count, and enhance pipelining is
much more difficult with such a complex instruction set.
Another motivation for increasingly complex instruction sets was that a complex HLL operation would execute more quickly as a single machine instruction than as a series of more primitive instructions. However, because of the bias of programmers towards simpler instructions, it may turn out otherwise. CISC needs a more complex control unit, with a larger micro-program control store to accommodate the richer instruction set; this increases the execution time of the simpler instructions.
Thus, it is far from clear that the trend to complex instruction sets is appropriate. This
has led a number of groups to pursue the opposite path.
Operations (typical dynamic frequencies observed in high-level language programs):
Simple assignment: 35-45%
Looping: 2-6%
Procedure calls: the most time-consuming operation
Observations
Most of the scalars were found to be local variables, whereas most of the arrays and structures were global variables.
Most dynamically called procedures pass fewer than six arguments.
The number of local scalar variables is usually fewer than six.
A good machine design should attempt to optimise the performance of the most time-consuming features of high-level programs.
Performance can be improved by more register references rather than more memory references.
There should be an optimised instruction pipeline such that any change in the flow of execution is taken care of.
One instruction per cycle: A machine cycle is the time taken to fetch two operands from registers, perform an ALU operation on them and store the result in a register. Thus, a RISC instruction executes in about the same time as a micro-instruction on a CISC machine. With such simple instructions executed directly rather than through micro-instructions, fast logic circuits can be used for the control unit, increasing execution efficiency further.
Thus, RISC is potentially a very strong architecture. It has high performance potential
and can support VLSI implementation. Let us discuss these points in more detail.
Register Nos.              Used for
0-9                        Global variables required by fA, fB and fC
10-83                      Unused
84-89   (6 registers)      Temporary variables of fC: parameters of fC that may be passed to the next call
90-99   (10 registers)     Local variables of fC
100-105 (6 registers)      Parameters passed from fB to fC (temporary variables of fB, parameters of fC)
106-115 (10 registers)     Local variables of fB
116-121 (6 registers)      Parameters passed from fA to fB (temporary variables of fA, parameters of fB)
122-131 (10 registers)     Local variables of fA
132-137 (6 registers)      Parameters passed to fA
Please note the functioning of the registers: at any point of time, only the global registers and one window of registers are visible and addressable, as if they were the only registers. Thus, for programming purposes there may be only 32 registers (10 global + 6 parameter + 10 local + 6 temporary), although the register file in the above example has a total of 138 registers.
But what is the maximum nesting of function calls that can be allowed? Let us describe it with the help of a circular buffer diagram; technically, the register windows above have to be organised circularly in the call-return hierarchy.
This organisation is shown in the following figure. The register buffer is filled as function A calls function B, function B calls function C, and function C calls function D. Function D is the current function. The current window pointer (CWP) points to the register window of the most recent function (function D in this case). Any register reference by a machine instruction is added to the contents of this pointer to determine the actual physical register. The saved window pointer (SWP), on the other hand, identifies the window most recently saved in memory; this action is needed when a further call is made and there is no space for it. If function D now calls function E, the arguments for function E are placed in D's temporary registers (indicated by D temp) and the CWP is advanced by one window.
If function E now makes a call to function F, the call cannot be made with the current status of the buffer unless we free space equivalent to exactly one window. This condition can easily be detected: the current window pointer, on incrementing, becomes equal to the saved window pointer. Now we need to create space; how can we do it? The simplest way is to swap fA's window out to memory and use that space. Thus, an N-window register file can support N - 1 levels of function calls.
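The circular-buffer behaviour can be sketched in a few lines. This is an illustrative model only (the window contents themselves are not modelled, just the two pointers and the spill count):

```python
# Sketch of the circular register-window buffer: N windows support
# N - 1 nested calls; when CWP would catch up with SWP, the oldest
# window is first saved (spilled) to memory.
class WindowFile:
    def __init__(self, n_windows):
        self.n = n_windows
        self.cwp = 0                   # current window pointer
        self.swp = 0                   # saved window pointer
        self.spilled = 0               # windows saved to memory so far

    def call(self):
        if (self.cwp + 1) % self.n == self.swp:  # buffer full: window overflow
            self.swp = (self.swp + 1) % self.n   # save the oldest window
            self.spilled += 1
        self.cwp = (self.cwp + 1) % self.n

wf = WindowFile(6)                     # 6 windows
for _ in range(5):                     # 5 nested calls fit without spilling
    wf.call()
print(wf.spilled)                      # 0
wf.call()                              # the 6th nested call forces a spill
print(wf.spilled)                      # 1
```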
Thus, the register file, organized in the form as above, is a small fast register buffer
that holds most of the variables that are likely to be used heavily. From this point of
view the register file acts almost like a cache memory.
So let us find how the two approaches are different:
Characteristics of large-register-file and cache organizations
Large Register File                              Cache
Holds local variables for almost all             Recently used local variables are fetched
functions; this saves time.                      from main memory; dynamic use optimises
                                                 memory.
The variables are held individually.             The transfer from memory is block-wise.
Global variables are assigned by the             It stores recently used variables; it cannot
compiler.                                        keep track of future use.
Save/restore needed only after the maximum       Save/restore based on cache replacement
call nesting is exceeded (that is, N - 1         algorithms.
open windows).
Faster register addressing.                      Memory addressing.
All but one of the points above show rough equality between the two approaches; the basic difference lies in the addressing overhead.
The following figure shows the difference. A short register address (R) is added to the current window pointer (W#) to generate the address in the register file, which is then decoded for register access. A cache reference, on the other hand, is generated from a long memory address, which first goes through comparison logic to ascertain the presence of the data; if the data is present, it goes through a select circuit. Thus, for simple variable access, the register file is superior to cache memory.
a. RISC has a large register file so that more variables can be stored in registers for longer periods of time.
b. Only global variables are stored in registers.
c. Variables are passed as parameters in registers, using the temporary registers of a window.
d. Cache is superior to a large register file as it stores the most recently used local scalars.
2. An overlapped register window RISC machine is having 32 registers. Suppose 8
of these registers are dedicated to global variables and the remaining 24 are split
for incoming parameters, local and scalar variables and outgoing parameters.
What are the ways of allocating these 24 registers in the three categories?
..............................................................................................................................
..............................................................................................................................
..............................................................................................................................
CISCs provide better support for high-level languages as they include high-level
language constructs such as CASE, CALL etc.
Yes, CISC architecture tries to narrow the gap between assembly and High-Level Language (HLL); however, this support comes at a cost. In fact, the support can be measured as the inverse of the cost of using typical HLL constructs on a particular machine. If the architect provides a feature that looks like an HLL construct but runs slowly, or has many options, the compiler writer may omit the feature, or the HLL programmer may avoid the construct because it is slow and cumbersome. Thus, the comment above does not hold.
It is more difficult to write a compiler for a RISC than a CISC.
Studies have shown that this is not so, for the following reasons:
If an instruction can be executed in more than one way, then more cases must be considered, and the compiler writer has to balance the speed of the compiler against the quality of the generated code. In CISCs, compilers need to analyse the potential usage of all available instructions, which is time-consuming. Thus, it is desirable that there be at least one good way to do everything. In RISC, there are few choices; for example, if an operand is in memory, it must first be loaded into a register. Thus, RISC requires only simple case analysis, which means a simpler compiler, although more machine instructions will be generated in each case.
RISC is tailored for C language and will not work well with other high level
languages.
However, studies of other high-level languages found that the most frequently executed operations in those languages are also the same simple HLL constructs found in C, for which RISC has been optimised. Unless an HLL changes the paradigm of programming, we will get similar results.
The good performance is due to the overlapped register windows; the reduced
instruction set has nothing to do with it.
Certainly, a major portion of the speed-up is due to the overlapped register windows of the RISC, which provide support for function calls. However, please note that these register windows are possible because the proportion of chip area needed by the control unit shrinks (from around 50 per cent to around 6 per cent), freeing space for registers. In addition, the control is simpler in RISC than in CISC, further helping the simple instructions to execute faster.
Let us explain pipelining in RISC with an example program execution. Take the following program (R indicates register):
LOAD RA, A        (Load from memory location A)
LOAD RB, B        (Load from memory location B)
ADD  RC, RA, RB   (RC = RA + RB)
SUB  RD, RA, RB   (RD = RA - RB)
MUL  RE, RC, RD   (RE = RC x RD)
STOR C, RE        (Store RE in memory location C)
Return to main.
[Figure: sequential execution of the program above. Each instruction completes its fetch (F) and execute (E) phases, plus a data-transfer (D) phase for the loads and the store, before the next instruction begins. Total time = 17 units.]
Figure 7 shows a simple pipelining scheme, in which F and E phases of two different
instructions are performed simultaneously. This scheme speeds up the execution rate
of the sequential scheme.
[Figure 7: the same program with the F and E phases of successive instructions overlapped. Total time = 11 units.]
Please note that the pipeline above is not running at its full capacity. This is because
of the following problems:
We are assuming a single port memory thus only one memory access is allowed at
a time. Thus, Fetch and Data transfer operations cannot occur at the same time.
Thus, you may notice blank in the time slot 3, 5 etc.
The last instruction is an unconditional jump. Please note that after this instruction
the next instruction of the calling program will be executed. Although not visible
in this example a branch instruction interrupts the sequential flow of instruction
execution. Thus, causing inefficiencies in the pipelined execution.
This pipeline can be improved simply by allowing two memory accesses at a time; the modified pipeline is shown below. Note, though, that any pipeline may still suffer penalties because of data dependencies and branch instructions. A good pipeline has phases of equal duration.
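The effect of overlapping the F and E phases can be approximated with a simple formula. The sketch below is a back-of-the-envelope illustration only, not the text's exact cycle-by-cycle model: the text's figures of 17 and 11 units exceed these ideal values because loads and stores need an extra data-transfer slot on the single-port memory.

```python
# A k-stage pipeline finishes n instructions in k + (n - 1) time units
# once full, versus k * n units when every instruction runs to completion
# before the next one starts.
def sequential_time(n: int, k: int = 2) -> int:
    """Time units for n instructions of k phases each, with no overlap."""
    return k * n

def pipelined_time(n: int, k: int = 2) -> int:
    """Time units for n instructions on an ideal k-stage pipeline."""
    return k + (n - 1)

# The 7-instruction program above, on an ideal 2-stage (F/E) pipeline:
print(sequential_time(7), pipelined_time(7))  # 14 vs 8
```

With memory-port stalls added back in, the sequential and pipelined counts grow to the 17 and 11 units quoted in the text.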
The Central
Processing Unit
[Figure (not reproduced): the modified pipeline, with two simultaneous memory accesses allowed, executing the same program with overlapped F and E phases. Total time = 8 units.]
Optimization of Pipelining

RISC machines can employ a very efficient pipeline scheme because of their simple and regular instructions. Like all instruction pipelines, the RISC pipeline suffers from the problems of data dependencies and branch instructions. RISC reduces the branch problem by using a technique called delayed branching.

One common technique used to avoid the branch penalty is to pre-fetch the branch destination as well. RISC follows a branch optimization technique called the delayed jump, as shown in the example given below:
Load RA ← M(A)
Load RB ← M(B)
Add  RC ← RA + RB
Sub  RD ← RA - RB
If RD < 0 Return
Stor RC → M(C)
Return

[Figure (not reproduced): (a) pipelined execution of the program above, showing the F and E phases of each instruction.]

(b) A no-operation instruction inserted after the If allows the decision of the If instruction to become known, so that the correct next instruction can be fetched.
Load RA ← M(A)
Load RB ← M(B)
Sub  RD ← RA - RB
If RD < 0 Return
Add  RC ← RA + RB
Stor RC → M(C)
Return

(c) The branch condition is calculated earlier, so the pipeline need not be emptied. This is the delayed branch.

Figure 9: Delayed Branch
Finally, let us summarize the basic differences between the CISC and RISC architectures. The following table lists these differences:

CISC                                              RISC
Large instruction set, variable formats           Small instruction set, fixed instruction format
Complex, usually microprogrammed control unit     Simple, mostly hardwired control unit
Many instructions can reference memory            Memory access mainly via LOAD and STORE
Relatively few general-purpose registers          Large register file (register windows)
Instructions may take many cycles                 Most instructions complete in one cycle
3. What are the problems of RISC architecture? How are these problems
compensated such that there is no reduction in performance?
..............................................................................................................................
..............................................................................................................................
..............................................................................................................................
5.7 SUMMARY
RISC represents a new style of computer that takes less time to build yet provides higher performance. While traditional machines support HLLs with instructions that look like HLL constructs, the RISC machine supports the use of HLLs with instructions that HLL compilers can use efficiently. The loss of complexity has not reduced RISC's functionality; the chosen subset, especially when combined with the register window scheme, emulates more complex machines. It also appears that we can build such a single-chip computer much sooner and with much less effort than traditional architectures.
Thus, we see that because of all the features discussed above, the RISC architecture
should prove to be far superior to even the most complex CISC architecture.
In this unit we have also covered the details of the pipelined features of the RISC
architecture, which further strengthen our arguments for the support of this
architecture.
2.
a) False
b) False
c) False
d) True
e) False
f) True
g) False
[Figure residue (not reproduced): overlapping register windows. Each window consists of incoming parameter registers, local registers and outgoing parameter registers; the outgoing parameter registers of one window overlap the incoming parameter registers of the next window, so parameters are passed without copying.]
- It has a single-port memory, restricting access to one device at a time
- Branch instructions
- The data dependencies between the instructions
UNIT 1  MICROPROCESSOR ARCHITECTURE

Structure                                        Page No.
1.0  Introduction                                    5
1.1  Objectives                                      5
1.2  Microcomputer Architecture                      5
1.3  Structure of 8086 CPU                           7
     1.3.1
     1.3.2
1.4                                                 11
1.5                                                 13
1.6  Addressing Modes                               29
     1.6.1
     1.6.2
     1.6.3
     1.6.4
1.7  Summary                                        33
1.8  Solutions/Answers                              33
1.0 INTRODUCTION
In the previous blocks of this course, we have discussed concepts relating to CPU organisation, the register set, the instruction set and addressing modes, with a few examples. Let us now look at one microprocessor architecture with regard to all of the above concepts. We have selected one of the simplest processors, the 8086, for this purpose. Although the processor technology is old, all the concepts remain valid for higher-end Intel processors. Therefore, in this unit, we will discuss the 8086 microprocessor in some detail.

We start the discussion with the basic microcomputer architecture. This discussion is followed by details of the components of the CPU of the 8086 microprocessor. We then discuss the register organisation for this processor, followed by its instruction set and addressing modes. Thus, this unit presents fairly exhaustive details of the 8086 microprocessor. These details will then be used in assembly programming.
1.1 OBJECTIVES
After going through this unit, you should be able to:
Assembly Language
Programming
Bus Sizes

1. The address bus: the 8085 microprocessor has 16 address lines, so it can access up to 2^16 = 64 KB. The 8086 microprocessor has a 20-bit address bus, so it can directly access up to 2^20 = 1 MB of RAM.
2. The data bus width is the number of bits that can be transferred simultaneously. It is 16 bits in the 8086.
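The relationship between address lines and addressable memory can be checked with a one-line computation (a sketch for illustration):

```python
# Each additional address line doubles the number of distinct addresses.
def addressable_bytes(address_lines: int) -> int:
    return 2 ** address_lines

print(addressable_bytes(16))  # 8085: 65536 bytes = 64 KB
print(addressable_bytes(20))  # 8086: 1048576 bytes = 1 MB
```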
Microprocessors
The microprocessor is a complete CPU on a single chip. The main advantages of the
microprocessor are:
More throughput
The assembly language for more advanced chips subsumes the simplest 8086/ 8088
assembly language. Therefore, we will confine our discussions to Intel 8086/8088
assembly language. You must refer to the further readings for more details on
assembly language of Pentium, G4 and other processors.
The word independent implies that these two units can function in parallel with each other. In other words, they may be considered as two stages of an instruction pipeline. The BIU is responsible for reading instructions from memory and for reading or writing data from/to memory or an input/output port. The instruction/data is then passed to the execution unit. The BIU consists of:
(a)
(b)
The addresses that need to be used in programs are relocatable, as they are offsets. Thus, segmentation supports relocatability. Although an address is 20 bits in size, only the offset within the maximum segment size, that is 16 bits, needs to be kept in the instruction, thus reducing the instruction length.
Although the size of each segment can be 64K, since segments may overlap we can create variable-sized segments, with 64K as the maximum. Each segment has a specific function. The 8086 supports the following segments:

As per the model of an assembly program, it can have more than one segment of any type. However, at a time only four segments, one of each type, can be active.

The 8086 has 20 address lines and hence uses 20-bit addresses. However, all the registers, including the segment registers, are only 16 bits wide. So how may a 20-bit address be formed from these 16-bit values?
Let us take a simple mapping procedure: the top four hex digits of the initial physical address constitute the segment address. You can add an offset of 16 bits (4 hex digits), from 0000h to FFFFh, to it. Thus, a typical segment which starts at physical address 10000h will range from 10000h to 1FFFFh. The segment register for this segment will contain 1000h, and the offset will range from 0000h to FFFFh. But how will the segment address and offset be added to calculate the physical address? Let us explain using the following examples:
Example 1 (in the figure above)

The value of the stack segment register (SS) = 6000h
The value of the stack pointer (SP), which is the offset = 0010h

Thus, the physical address of the top of the stack is:

    SS         6 0 0 0 0     (implied zero appended to SS)
  + SP           0 0 1 0
  Physical     6 0 0 1 0

Example 2

The physical address of the next instruction is formed in the same way: the implied zero is appended to the code segment register (CS), and the instruction pointer (IP) is added to it. In the figure, this calculation yields the physical address 45A04h.

Instruction Pointer

The instruction pointer holds the offset of the current instruction within the code segment. It is used for calculating the address of the instruction, as shown above.
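The calculation above can be sketched in a few lines of Python (an illustration of the rule, not 8086 code): the segment value is shifted left by one hex digit (4 bits, the "implied zero") and the offset is added.

```python
def physical_address(segment: int, offset: int) -> int:
    # Shift the 16-bit segment left by 4 bits and add the 16-bit offset;
    # keep only 20 bits, matching the 8086's 20 address lines.
    return ((segment << 4) + offset) & 0xFFFFF

# Example 1: SS = 6000h, SP = 0010h -> top of stack at 60010h
print(hex(physical_address(0x6000, 0x0010)))  # 0x60010
```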
1.3.2  Execution Unit (EU)

The execution unit performs all the ALU operations. The execution unit of the 8086 is 16 bits wide. It also contains the control unit, which instructs the bus interface unit about which memory location to access and what to do with the data. The control unit also performs decoding and execution of the instructions. The EU consists of the following:
(a)
Registers
All CPUs have a defined number of operational registers. 8086 has several general
purpose and special purpose registers. We will discuss these registers in the following
sections.
Special Registers

A Last-In-First-Out (LIFO) stack is a data structure used for parameter passing, return-address storage, etc. The 8086 stack can be up to 64K bytes. The base of the stack is pointed to by the stack segment (SS) register, while the offset of the top of the stack is stored in the Stack Pointer (SP) register. Please note that although memory in the 8086 is byte-addressable, the stack is a word stack; that is, any push operation occupies two bytes.
Flags Register

A flag represents a condition code, that is, a 0 or 1; thus, it can be represented using a flip-flop. The 8086 employs a 16-bit flag register containing nine flags. The following table shows the flags of the 8086.

Flag | Meaning         | Comments

Conditional flags represent the result of the last arithmetic or logical instruction executed; they are set by conditions generated as a result of that instruction. The conditional flags are:

CF   | Carry Flag      | 1 if there is a carry out
PF   | Parity Flag     | 1 on even parity, 0 on odd parity
AF   | Auxiliary Flag  | Set (1) if an auxiliary carry for BCD occurs
ZF   | Zero Flag       | Set if the result is equal to zero
SF   | Sign Flag       | Indicates the sign of the result (1 for minus, 0 for plus)
OF   | Overflow Flag   | Set whenever there is an overflow in the result
Control flags are set or reset deliberately to control the operations of the execution unit. The control flags of the 8086 are:

TF   | Single-step trap flag   | Used for single-stepping through a program
IF   | Interrupt enable flag   |
DF   | String direction flag   |
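As an illustration of how the conditional flags are derived (a simplified sketch, not the processor's exact logic), the following computes the flags for an 8-bit addition, using the usual definitions of AF and OF:

```python
def add8_flags(a: int, b: int):
    total = a + b
    result = total & 0xFF
    flags = {
        "CF": int(total > 0xFF),                               # carry out of bit 7
        "ZF": int(result == 0),
        "SF": result >> 7,                                     # copy of bit 7
        "PF": int(bin(result).count("1") % 2 == 0),            # even parity -> 1
        "AF": int((a & 0x0F) + (b & 0x0F) > 0x0F),             # carry out of bit 3 (BCD)
        "OF": int(((a ^ result) & (b ^ result) & 0x80) != 0),  # signed overflow
    }
    return result, flags

result, flags = add8_flags(0x7F, 0x01)  # 127 + 1 overflows the signed range
print(hex(result), flags)
```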
1. What is the purpose of the queue in the bus interface unit of the 8086 microprocessor?

2. Find out the physical addresses for the following segment register : offset pairs:
   (a) SS:SP = 0100h:0020h
   (b) DS:BX = 0200h:0100h
   (c) CS:IP = 4200h:0123h
3.
(a)
(b)
(c)
The Source Index (SI) and Destination Index(DI) registers in 8086 can also be
used as general registers.
(d)
[Label:] Op-code Operand(s) ; Comment
For example, to add the content of AL and BL registers to get the result in AL, we use
the following assembly instruction.
NEXT: ADD
AL,BL
; AL ← AL + BL
Please note that NEXT is the label field. It is giving an identity to the statement. It is
an optional field, and is used when an instruction is to be executed again through a
LOOP or GO TO. ADD is symbolic op-code, for addition operation. AL and BL are
the two operands of the instructions. Please note that the number of operands is
dependent upon the instructions. 8086 instructions can have zero, one or two
operands. An operand in 8086 can be:
1.
2.
3.
4.
A register
A memory location
A constant called literal
A label.
MNEMONIC: MOV des, src
DESCRIPTION: des ← src. Both operands should be bytes, or both words. The src operand can be a register, a memory location or an immediate operand; des can be a register or a memory operand. Restriction: both source and destination cannot be memory operands at the same time.
EXAMPLE:
MOV CX, 037AH   ; CX register is initialized with the immediate value 037AH
MOV AX, BX      ; AX ← BX
PUSH operand
    PUSH BX         ; decrement the stack pointer by two, and copy BX to the stack
POP des
    POP AX          ; copy the content of the top of the stack to AX
XCHG
    XCHG DX, AX     ; exchange the word in DX with the word in AX
XLAT
    (An example is available in Unit 3.)
IN accumulator, port address
    IN AL, 028h     ; read a byte from port 028h into the AL register
OUT port address, accumulator
LEA register, source
LDS des-reg
LES des-reg
LAHF
SAHF
PUSHF
POPF
ADD des, src
DESCRIPTION: Adds byte to byte, or word to word. The source may be an immediate operand, a register or a memory location. The rules for operands are the same as those of the MOV instruction. To add a byte to a word, first copy the byte to a word location, then fill the upper byte of the word with zeros. This instruction affects the following flags: AF, CF, OF, PF, SF, ZF.
EXAMPLE:
    ADD AL, 74H     ; add the number 74H to the AL register, and store the result back in AL
    ADD DX, BX      ; add the contents of BX to DX and store the result in DX; BX remains unaffected

ADC des, src
DESCRIPTION: Add byte + byte + carry flag, or word + word + carry flag; it adds the two operands together with the carry flag. All other details are the same as for the ADD instruction.

INC des
DESCRIPTION: Increments the specified byte or word operand by one. The operand can be a register or a memory location. It can affect the AF, SF, ZF, PF and OF flags. It does not affect the carry flag; that is, if you increment a byte operand containing FFh, the result is 00h with the carry flag unchanged.
EXAMPLE:
    INC BX          ; add 1 to the contents of the BX register
    INC BL          ; add 1 to the contents of the BL register
AAA
    ADD AL, BL      ; AL = 00110101 (ASCII 5), BL = 00111001 (ASCII 9)
                    ; after addition AL = 01101110, that is, 6EH:
                    ; an incorrect temporary result
    AAA             ; AL = 00000100, unpacked BCD for 04;
                    ; carry = 1, indicating the result is 14

DAA
    ; AL = 0101 1001 (59 BCD), BL = 0011 0101 (35 BCD)
    ADD AL, BL      ; AL = 10001101, or 8Eh (incorrect BCD)
    DAA             ; AL = 1001 0100, 94 BCD: correct

SUB des, src
    SUB AX, 3427h   ; subtract 3427h from the AX register, and store the result back in AX

SBB des, src
    SBB AL, CH      ; subtract the contents of CH and CF from AL and store the result back in AL

DEC src
    DEC BP          ; decrement the contents of the BP register by one

NEG src
    NEG AL          ; replace the number in AL with its 2's complement
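The AAA and DAA adjustments shown above can be mimicked with the following simplified sketch; it implements only the fix-up rules illustrated in the examples, not every flag detail of the real instructions:

```python
def aaa(al: int, ah: int = 0):
    # ASCII adjust after addition of two unpacked BCD / ASCII digits.
    if (al & 0x0F) > 9:
        al += 0x06
        ah = (ah + 1) & 0xFF
        carry = 1
    else:
        carry = 0
    return al & 0x0F, ah, carry   # only the low nibble is kept, as in the example

def daa(al: int):
    # Decimal adjust after addition of two packed BCD bytes (simplified).
    cf = 0
    if (al & 0x0F) > 9:
        al += 0x06
    if (al >> 4) > 9:
        al += 0x60
        cf = 1
    return al & 0xFF, cf

print(aaa(0x35 + 0x39))  # ADD AL,BL gave 6Eh; AAA -> (4, 1, 1): BCD 04, carry 1
print(daa(0x8E))         # ADD AL,BL gave 8Eh; DAA -> (0x94, 0): 94 BCD, correct
```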
CMP des, src
    CMP CX, BX      ; compare the CX register with the BX register.
                    ; The CF, ZF and SF flags are set as follows:
                    ; CX = BX : 0 1 0 (result of the subtraction is zero)
                    ; CX > BX : 0 0 0 (no borrow required, therefore CF = 0)
                    ; CX < BX : 1 0 1 (the subtraction requires a borrow, so CF = 1)

AAS
    ; AL = 0011 0101 (ASCII 5), BL = 0011 1001 (ASCII 9)
    SUB AL, BL      ; (5 - 9) result: AL = 1111 1100 = -4 in 2's complement, CF = 1
    AAS             ; result: AL = 0000 0100 = BCD 04, CF = 1 (borrow needed)

DAS
    ; AL = 86 BCD, BH = 57 BCD
    SUB AL, BH      ; AL = 2Fh, CF = 0
    DAS             ; results in AL = 29 BCD
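The flag settings listed for CMP can be reproduced with this small sketch: CMP computes des - src, throws the result away, and keeps only the flags.

```python
def cmp_flags(des: int, src: int, bits: int = 16):
    mask = (1 << bits) - 1
    result = (des - src) & mask
    return (int(des < src),              # CF: a borrow was needed
            int(result == 0),            # ZF: the operands were equal
            int(result >> (bits - 1)))   # SF: top bit of the discarded result

print(cmp_flags(5, 5))  # CX = BX -> (0, 1, 0)
print(cmp_flags(5, 4))  # CX > BX -> (0, 0, 0)
print(cmp_flags(4, 5))  # CX < BX -> (1, 0, 1)
```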
MUL src
    MUL BH          ; AX = AL * BH = 002Dh
    AAM             ; AX = 00000100 00000101, unpacked BCD 45: correct result

DIV src
    ; AX = 37D7h = 14295 decimal, BH = 97h = 151 decimal
    DIV BH          ; AX / BH: quotient AL = 5Eh = 94 decimal,
                    ; remainder AH = 65h = 101 decimal

IDIV src
    ; AL = 11001010 = -26h = -38 decimal
    ; CH = 00000011 = +3h = 3 decimal
    ; According to the operand rules, to divide by a byte the dividend must be
    ; present in a word register, i.e. AX. So, first convert the operand in AL
    ; to a word operand. This can be done by sign-extending the AL register,
    ; which makes AX = 11111111 11001010. (Sign extension can also be done with
    ; the help of an instruction, discussed later.)
    IDIV CH         ; AX / CH
                    ; quotient AL = 11110100 = -0CH = -12 decimal
                    ; remainder AH = 11111110 = -02H = -2 decimal
                    ; Although the quotient is actually closer to -13
                    ; (-12.66667) than to -12, the 8086 truncates the
                    ; result to give -12.
AAD
    ; AX = 0607, unpacked BCD for 6 and 7; CH = 09h
    AAD             ; adjust to binary before division: AX = 0043h = 67 decimal
    DIV CH          ; divide AX by the unpacked BCD in CH
                    ; quotient AL = 07 unpacked BCD, remainder AH = 04 unpacked BCD
                    ; PF = SF = ZF = 0

CWD
CBW
    ; AL = 10011011 = -101 decimal, AH = 00000000
    CBW             ; convert the signed byte in AL to a signed word in AX
                    ; AX = 11111111 10011011 = -101 decimal

1.5.3

These instructions are used at the bit level. They can be used for testing a zero bit, setting or resetting a bit, and shifting bits across registers. Let us look at some such basic instructions.
MNEMONIC: NOT des
DESCRIPTION: Complements each bit to produce the 1's complement of the specified byte or word operand. The operand can be a register or a memory operand.
EXAMPLE:
    ; BX = 0011 1010 0001 0000
    NOT BX          ; BX = 1100 0101 1110 1111

MNEMONIC: AND des, src
DESCRIPTION: Bitwise AND of two byte or word operands; the result is des ← des AND src. The source can be an immediate operand, a register or a memory operand; the destination can be a register or a memory operand. Both operands cannot be memory operands at the same time. The CF and OF flags are both zero after the AND operation; PF, SF and ZF are updated; AF is left undefined.
EXAMPLE:
    ; BH = 0011 1010 before
    AND BH, 0Fh     ; BH = 0000 1010 after the AND operation

MNEMONIC: OR des, src
DESCRIPTION: ORs each corresponding bit of the byte or word operands; des ← des OR src. The other operand rules are the same as for AND.

MNEMONIC: XOR des, src
DESCRIPTION: XORs each corresponding bit of the byte or word operands; des ← des XOR src. The operand rules are the same as for AND and OR.
SHL/SAL des, count
    [Shift diagram (not reproduced): bits move from the LSB towards the MSB; the old MSB goes into CF and a 0 enters the LSB.]
    ; AL = 0101 0001

SHR des, count
    ; BX = 1000 1001
    SHR BX, 01      ; result: CF = 1, BX = 0100 0100; a 0 is shifted into the MSB
    MOV CL, 02
    SHR BX, CL      ; with the same initial BX, the result would be
                    ; CF = 0, BX = 0010 0010

SAR des, count
    ; AL = 0001 1101 = +29 decimal, CF = 0
    SAR AL, 01      ; AL = 0000 1110 = +14 decimal, CF = 1
                    ; OF = PF = SF = ZF = 0
    ; BH = 1111 0011 = -13 decimal
    SAR BH, 01      ; BH = 1111 1001 = -7 decimal, CF = 1
                    ; OF = ZF = 0, PF = SF = 1
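The difference between the logical shift (SHR) and the arithmetic shift (SAR) can be seen in this 8-bit sketch; register widths and flag details are simplified for illustration:

```python
def shr8(value: int):
    # Logical shift right: a 0 enters the MSB; the old LSB goes to CF.
    return (value >> 1) & 0x7F, value & 1

def sar8(value: int):
    # Arithmetic shift right: the sign bit is replicated into the MSB,
    # so negative two's-complement values stay negative.
    return ((value >> 1) | (value & 0x80)) & 0xFF, value & 1

print(shr8(0b10001001))  # (0b01000100, 1): matches the SHR BX,01 example
print(sar8(0b11110011))  # (0b11111001, 1): -13 becomes -7, as with SAR BH,01
```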
1. Point out the error(s), if any, in the following 8086 assembly instructions:
   a. PUSHF AX
   b. MOV AX, BX
   c. XCHG MEM_WORD1, MEM_WORD2
   d. AAA BL, CL
   e. IDIV AX, CH

2. State whether the following are true or false:
(a) LEA and MOV instruction serve the same purpose. The only difference
between the two is the type of operands they take.
(b) NEG instruction produces 1's complement of a number.
(c) MUL instruction assumes one of the operands to be present in the AL or
AX register.
(d) TEST instruction performs an OR operation, but does not change the value
of operands.
(e) Suppose AL contains 0110 0101 and CF is set, then instructions ROL AL
and RCL AL will produce the same results.
CALL procedure-name

DESCRIPTION: This instruction results in a procedure/function call. The return address is saved on the stack. There are two basic types of CALL:
- NEAR or intra-segment call: the call is made to a procedure in the same segment as the calling program. The saved return address is just the IP.
- FAR or inter-segment call: the call is made to a procedure in a segment other than that of the calling program. Both IP and CS are saved as the return address.

EXAMPLE:
CALL proc1
CALL proc2

The new instruction address is determined by the procedure declaration: proc1 is a NEAR procedure, so only IP is involved; proc2 involves a new CS:IP pair.

On the call to proc1:
    Stack ← IP
    IP ← address offset of proc1

On the call to proc2:
    Stack[top] ← CS
    Stack[top] ← IP
    CS ← code segment of proc2
    IP ← address offset of proc2

Here we assume that proc1 is defined within the same segment as the calling procedure, while proc2 is defined in another segment. As far as the calling program is concerned, both procedures are called in the same manner. But while declaring these procedures, we declare proc1 as a NEAR procedure and proc2 as a FAR procedure, as follows:

proc1 PROC NEAR
proc2 PROC FAR

A procedure can also be called indirectly through a register:

LEA BX, proc1   ; initialize BX with the offset of the procedure proc1
CALL BX         ; call proc1 indirectly, using the BX register

RET 6
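The difference in saved state between NEAR and FAR calls can be modelled as below; the list-based stack and the register values are hypothetical, purely for illustration:

```python
def near_call(stack, ip, target_offset):
    stack.append(ip)          # only the offset (IP) is saved
    return target_offset      # new IP

def far_call(stack, cs, ip, target_cs, target_offset):
    stack.append(cs)          # the segment (CS) is saved as well...
    stack.append(ip)          # ...followed by the offset (IP)
    return target_cs, target_offset

stack = []
ip = near_call(stack, 0x0102, 0x0200)        # NEAR: one word on the stack
cs, ip = far_call(stack, 0x4000, ip, 0x5000, 0x0010)  # FAR: two more words
print([hex(w) for w in stack])  # ['0x102', '0x4000', '0x200']
```

A NEAR return pops one word back into IP; a FAR return pops two, restoring IP and then CS.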
JMP label

DESCRIPTION: Unconditionally go to the specified address and get the next instruction from the label specified. The label identifies the instruction to which the jump is to take place within the program, or it could be a register that has been initialised with the offset value. A JMP can be a NEAR jump or a FAR jump, just like CALL.

Conditional jumps

All conditional jumps follow some comparison, or any instruction that affects the flags.
MOV CX, 05
MOV BX, 04
CMP CX, BX      ; this instruction will set various flags, such as ZF and CF
JE LABEL1       ; a conditional jump can now be applied: it checks ZF and,
                ; if it is set (implying CX = BX), jumps to LABEL1;
                ; otherwise control simply falls through to the next
                ; instruction.
                ; In the example above, as CX is not equal to BX, the jump
                ; will not take place and the instruction following the
                ; conditional jump will be executed. However, JNE (jump if
                ; not equal), JA (jump if above) or JAE (jump if above or
                ; equal), if applied instead of JE, would cause the jump
                ; to occur.
All the conditional jump instructions given below are self-explanatory:

JA/JNBE    Jump if above / jump if not below nor equal
JAE/JNB    Jump if above or equal / jump if not below
JB/JNAE    Jump if below / jump if not above nor equal
JBE/JNA    Jump if below or equal / jump if not above
JC         Jump if carry
JE/JZ      Jump if equal / jump if zero
JNC        Jump if no carry
JNE/JNZ    Jump if not equal / jump if not zero
JO         Jump if overflow
JNO        Jump if no overflow
JP/JPE     Jump if parity / jump if parity even
JNP/JPO    Jump if no parity / jump if parity odd
JG/JNLE    Jump if greater / jump if not less nor equal
JGE/JNL    Jump if greater or equal / jump if not less
JL/JNGE    Jump if less / jump if not greater nor equal
JLE/JNG    Jump if less or equal / jump if not greater
JS         Jump if sign
JNS        Jump if no sign
LOOP label
LOOPE/ LOOPZ
label
The loop continues until either CX is decremented to zero, which means all the elements of the array are equal to 0FFh, or an element is found in the array which is not equal to 0FFh. In the latter case, the CX register may still be greater than zero when control comes out of the loop. This can be coded as follows (please note that you might not understand everything at this point, because you are not yet familiar with the various addressing modes; just concentrate on the LOOPE instruction):
JCXZ label
DESCRIPTION: REP is an instruction prefix, not an instruction in itself. It causes the CX register to be decremented and the following string instruction to be repeated until CX becomes equal to zero.

REPE/REPZ
REPNE/REPNZ

MOVS/MOVSB/MOVSW
EXAMPLES:
REP MOVSB STR1, STR2

The above example copies the source string to the destination, byte by byte. The CX register is initialized with the length of the source string; REP repeats the MOVSB operation, which copies a source-string byte to the destination, until the CX register becomes equal to zero.
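The REP MOVSB behaviour described above can be sketched as a simple loop; the direction flag is assumed clear (so SI and DI increment), and memory is modelled as a plain list:

```python
def rep_movsb(memory, si, di, cx):
    while cx > 0:                     # repeat until CX reaches zero
        memory[di] = memory[si]       # MOVSB: move one byte
        si, di, cx = si + 1, di + 1, cx - 1
    return si, di, cx

mem = list(b"HELLO....")              # source at 0..4, destination area at 5..8
rep_movsb(mem, 0, 5, 4)               # copy 4 bytes forward
print(bytes(mem))                     # b'HELLOHELL'
```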
CMPS/CMPSB/
CMPSW
SCAS/SCASB/
SCASW
LODS/LODSB/
LODSW
STOS/STOSB/
STOSW
STD
CLD
STC — Sets the carry flag to 1.
CLC — Clears the carry flag to 0.
CMC — Complements the state of the carry flag, from 0 to 1 or from 1 to 0 as the case may be.
    CMC             ; invert the carry flag
STD — Sets the direction flag to 1. The string instructions move either forward (incrementing SI and DI) or backward (decrementing SI and DI) based on this flag value. STD does not affect any other flag. A set direction flag causes strings to be processed from right to left.
CLD — The opposite of STD: it clears the direction flag, so the string instructions process from left to right.
    CLD
There are many process control instructions other than these; please refer to the further readings for them. They include instructions for setting and clearing the interrupt flag, halting the computer, LOCK (locking the bus), NOP, etc.
Mode: Direct
Description: The effective address is the displacement of the memory variable.

Mode: Register indirect
Description: The effective address is the contents of a register.
Examples: [BX], [SI], [DI], [BP]

Mode: Based
Examples: LIST[BX] (OFFSET LIST + BX), [BP + 1]

Mode: Indexed
Examples: LIST[SI], [LIST + DI], [DI + 2]

Mode: Based Indexed
Examples: [BX + SI], [BX][DI], [BP + DI], [BX + SI + 2]
Mode: Register
Description: In general, the register addressing mode is the most efficient, because registers are within the CPU and do not require a memory access.
Examples:
MOV AL, CH
MOV AX, CX
1.6.3

Mode: Immediate
Description: The operand is a constant coded into the instruction itself. Please note that in the last example the expression (2 + 3)/5 is evaluated at assembly time.
Examples:
MOV AL, 10
MOV AL, 'A'
MOV AX, 'AB'
MOV AX, 64000
MOV AL, (2 + 3)/5
A direct operand refers to the contents of memory at an address implied by the name
of the variable.
Mode: Direct
Description: Direct operands are also called relocatable operands, as they represent the offset of a label from the beginning of a segment. Reloading the program, even into a different segment, will not change the offset; that is why we call them relocatable. Please note that, by default, a variable is considered to be in the data segment (DS) and a code label in the code segment (CS). Thus, in the example, COUNT is taken to be in the data segment by default.
Examples:
MOV COUNT, CL    ; move CL to COUNT (a byte variable)
MOV AL, COUNT    ; move COUNT to AL
JMP LABEL1       ; jump to LABEL1
MOV AX, DS:5     ; segment register and offset
MOV BX, CSEG:2Ch ; segment name and offset
MOV AX, ES:COUNT ; segment register and variable
The indirect addressing modes are:
- Register indirect
- Based indirect
- Indexed indirect
- Based indexed
- Based indexed with displacement

Mode: Register indirect
Description: Indirect operands are particularly powerful when processing lists or arrays, because a base or an index register may be modified at run time.
Examples:
MOV BX, OFFSET ARRAY  ; point to the start of the array
MOV AL, [BX]          ; get the first element
INC BX                ; point to the next
MOV DL, [BX]          ; get the second element

The brackets around BX signify that we are referring to the contents of the memory location whose address is stored in BX. In the following example, three bytes of an array are added together:

MOV SI, OFFSET ARRAY  ; address of the first byte
MOV AL, [SI]          ; move the first byte to AL
INC SI                ; point to the next byte
ADD AL, [SI]          ; add the second byte
INC SI                ; point to the third byte
ADD AL, [SI]          ; add the third byte
Mode: Based Indexed (based indirect and indexed indirect)
Description: In this type of addressing, the operand's effective address is formed by combining a base register with an index register.
Examples:
MOV AL, [BP][SI]
MOV DX, [BX + SI]
ADD CX, [DI][BX]
; Two base registers or two index registers cannot be combined,
; so the following would be incorrect:
MOV DL, [BP + BX]     ; error: two base registers
MOV AX, [SI + DI]     ; error: two index registers
MOV DX, ARRAY[BX][SI]
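Effective-address formation for the based-indexed mode can be sketched as a simple sum; the register values below are hypothetical, for illustration only:

```python
def effective_address(base=0, index=0, displacement=0):
    # Based indexed with displacement: EA = base + index + displacement,
    # truncated to 16 bits (the width of an offset).
    return (base + index + displacement) & 0xFFFF

# e.g. ARRAY[BX][SI], assuming OFFSET ARRAY = 0100h, BX = 0008h, SI = 0002h
print(hex(effective_address(base=0x0008, index=0x0002, displacement=0x0100)))  # 0x10a
```

The resulting 16-bit effective address is then combined with the appropriate segment register, exactly as in the segment:offset calculation shown earlier in the unit.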
1.7
SUMMARY
In this unit, we have studied one of the most popular microprocessors, viz. the Intel 8086. It serves as a base for all its successors: the 8088, 80186, 80286, 80386, 80486 and Pentium. Programs written for the 8086 can be run directly on any of its successors. Therefore, though the 8086 has become obsolete from the market point of view, it is still needed to understand the advanced microprocessors.

To summarize the features of the 8086, we can say that it has:
1.8 SOLUTIONS/ANSWERS
Check Your Progress 1
1.
2.
= 42000h + 0123h
= 42123h
3.
(a)
(b)
(c)
(d)
2.
(a)
(b)
(c)
(d)
(e)
False
False
True
False
False
False
True
True
True
True
True
False
UNIT 2  INTRODUCTION TO ASSEMBLY LANGUAGE PROGRAMMING

Structure                                              Page No.
2.0  Introduction                                          35
2.1  Objectives                                            35
2.2  The Need and Use of the Assembly Language             35
2.3  Assembly Program Execution                            36
2.4  An Assembly Program and its Components                41
     2.4.1
     2.4.2
2.5  Interrupts                                            45
2.6  DOS Function Calls (Using INT 21H)                    51
2.7  COM Programs                                          53
2.8  EXE Programs                                          55
2.9                                                        56
2.10                                                       56
2.0 INTRODUCTION
In the previous unit, we discussed the 8086 microprocessor: its register set, instruction set and addressing modes. In this and the two subsequent units, we will discuss the assembly language for the 8086/8088 microprocessor. Unit 1 is the basic building block that will help in a better understanding of the assembly language. In this unit, we will discuss the importance of assembly language and the basic components of an assembly program, followed by discussions of the program development tools available. We will then discuss COM programs and EXE programs. Finally, we will present a complete example.

For all our discussions, we have used the Microsoft Assembler (MASM). However, the assembly language directives may change for different assemblers. Therefore, before running an assembly program, you must consult the reference manual of the assembler you are using.
2.1 OBJECTIVES
After going through this unit you should be able to:
Machine language greatly depends on the machine and is difficult for most people to write in its 0-1 form.
Debugging is difficult.
Deciphering machine code is very difficult; thus the program logic will be difficult to understand.
Assembly Language provides more control over handling particular hardware and
software, as it allows you to study the instructions set, addressing modes,
interrupts etc.
Assembly Programming generates smaller, more compact executable modules: as
the programs are closer to machine, you may be able to write highly optimised
programs. This results in faster execution of programs.
Assembly language programs are at least 30% denser than the same programs written in a high-level language. The reason is that, as of today, compilers produce a long sequence of machine code for every HLL statement, whereas assembly language produces a single machine instruction per statement. This is true especially in the case of string-related programs.
On the other hand assembly language is machine dependent. Each microprocessor has
its own set of instructions. Thus, assembly programs are not portable.
Assembly language has very few restrictions or rules; nearly everything is left to the
discretion of the programmer. This gives lots of freedom to programmers in
construction of their system.
Uses of Assembly Language
Assembly language is used primarily for writing short, specific, efficient interfacing
modules/ subroutines. The basic idea of using assembly is to support the HLL with
some highly efficient but nonportable routines. It will be worth mentioning here that
UNIX mostly is written in C but has about 5-10% machine dependent assembly code.
Similarly in telecommunication application assembly routine exists for enhancing
efficiency.
An assembly program can be translated into machine code in two ways:
1) Manual assembly
2) By using an assembler.
Manual Assembly

This is an old method that required the programmer to translate each opcode into its numerical machine-language representation by looking it up in a table of the microprocessor's instruction set, which contains both the assembly and machine language instructions. Manual assembly is acceptable for short programs but becomes very inconvenient for large programs. The Intel SDK-85 and most of the earlier university kits were programmed using manual assembly.
Using an Assembler

The symbolic instructions that you code in assembly language are known as the source program. An assembler program translates the source program into machine code, which is known as the object program.

Source Program (mnemonic instructions) → Assembler → Object Program (machine instructions)
Step 3: The last step is to load the program for execution. Because the loader knows
where the program is going to load in memory, it is now able to resolve any
remaining address still left incomplete in the header. The loader drops the
header and creates a program segment prefix (PSP) immediately before the
program is loaded in memory.
Errors

Two kinds of errors can occur in assembly programs:

a. Programming errors: these are the familiar errors you can encounter in the course of executing a program written in any language.
b. System errors: these are unique to assembly language, which permits low-level operations. A system error is one that corrupts or destroys the system under which the program is running; in assembly language there is no supervising run-time environment to prevent such operations.
2.4

Sample Program

In this program, we just display the message: HAVE A NICE DAY!
Line    Offset  Machine Code    Source Code
0001                            DATA SEGMENT
0002    0000                    MESSAGE DB 'HAVE A NICE DAY!$'
0003                            DATA ENDS
0004                            STACK SEGMENT
0005                            STACK 0400H
0006                            STACK ENDS
0007                            CODE SEGMENT
0008                            ASSUME CS:CODE, DS:DATA, SS:STACK
0009    Offset
0010    0000    B8XXXX          MOV AX, DATA
0011    0003    8ED8            MOV DS, AX
0012    0005    BAXXXX          MOV DX, OFFSET MESSAGE
0013    0008    B409            MOV AH, 09H
0014    000A    CD21            INT 21H
0015    000C    B8004C          MOV AX, 4C00H
0016    000F    CD21            INT 21H
0017                            CODE ENDS
0018                            END
The assembler assigns line numbers to the statements in the source file
sequentially. If the assembler issues an error message, the message will contain a
reference to one of these line numbers.
The second column from the left contains offsets. Each offset indicates the
address of an instruction or a datum as an offset from the base of its logical
segment. For example, the statement at line 0010 produces machine language at offset
0000H of the CODE SEGMENT, and the statement at line 0002 produces
machine language at offset 0000H of the DATA SEGMENT.
The third column in the annotation displays the machine language produced by
each instruction in the program.
Segment numbers: There is a good reason for not leaving the determination of
segment numbers up to the assembler. It allows programs written in 8086 assembly
language to be almost entirely relocatable. They can be loaded practically anywhere
in memory and run just as well. Program 1 has to store the message 'Have a nice
day$' somewhere in memory. It is located in the DATA SEGMENT. Since the
characters are stored in ASCII, the message will occupy 15 bytes (please note that each
blank is also a character) in the DATA SEGMENT.
Missing offset: The xxxx in the machine language for the instruction at line 0010 is
there because the assembler does not know the DATA segment location; that will be
determined at load time. The loader must supply that value.
Name refers to the address of a data item such as COUNTER, ARR, etc.
Label refers to the address of an instruction, procedure or segment. For example,
MAIN is the label for a procedure:
MAIN PROC FAR
A20: MOV BL, 45 ; defines a label A20
An identifier can use letters, digits or special characters, but it always starts with a
letter.
Parameters: A parameter extends and refines the meaning that the assembler
attributes to the keyword in a statement. The number of parameters depends on
the statement.
Comments: A comment is a string of text that serves only as internal documentation
for a program. A semicolon identifies all subsequent text in a statement as a
comment.
2.4.2 Directives
Assembly languages support a number of statements that enable you to control the
way in which a source program assembles and lists. These statements, called
directives, act only while the assembly is in progress and generate no machine-executable code. Let us discuss some common directives.
1. List: A list directive causes the assembler to produce an annotated listing on the
printer, the video screen, a disk drive or some combination of the three. An
annotated listing shows the text of the assembly language programs, numbers of
each statement in the program and the offset associated with each instruction and
each datum. The advantage of list directive is that it produces much more
informative output.
2. HEX: The HEX directive facilitates the coding of hexadecimal values in the
body of the program. It directs the assembler to treat tokens in the source file
that begin with a dollar sign as numeric constants in hexadecimal notation.
3. PROC Directive: The code segment contains the executable code for a
program, which consists of one or more procedures, defined initially with the
PROC directive and ended with the ENDP directive.
Procedure-name   PROC FAR   ; beginning of procedure
Procedure-name   ENDP       ; end of procedure
5. ASSUME Directive: An .EXE program uses the SS register to address the base
of the stack, DS to address the base of the data segment, CS to address the base of the
code segment and the ES register to address the base of the extra segment. This directive
tells the assembler to correlate a segment register with a segment name. For example:
ASSUME SS: stack_seg_name, DS: data_seg_name, CS: code_seg_name
6. END: The END statement marks the end of the source program, and its operand
names the label of the first instruction to be executed.
The loader sets the IP (Instruction Pointer) register to that value. This sets CS:IP
to the segment-relative address of the first instruction in the program.
STACK SEGMENT
The 8086 microprocessor supports a word stack. The stack segment parameters
tell the assembler to alert the linker that this segment statement defines the
program stack area.
A program must have a stack area because the computer is continuously carrying
on several background operations that are completely transparent, even to an
assembly language programmer; for example, a real-time clock. Every 55
milliseconds the real-time clock interrupts the CPU. The CPU records the state of
its registers and then goes about updating the system clock. When it finishes
servicing the system clock, it has to restore the registers and go back to doing
whatever it was doing when the interruption occurred. All such information gets
recorded in the stack. If your program had no stack and the real-time clock were
to pulse while the CPU is running your program, there would be no way for the
CPU to find the way back to your program when it was through updating the
clock. The default stack allocation is 0400H bytes. Please note that if you have
not specified a stack segment, it is created automatically.
DATA SEGMENT
It contains the data allocation statements for a program. This segment is very
useful as it shows the data organization.
Defining Types of Data
The following format is used for data definition:
{Name} <Directive> <Expression>
Name: a program references the data item through the name, although it is
optional.
Directive: specifies the data type.
Expression: represents a value, or an expression evaluated to a value.
The list of directives is given below:

Directive   Description         Number of Bytes
DB          Define byte         1
DW          Define word         2
DD          Define double word  4
DQ          Define quad word    8
DT          Define ten bytes    10
ARRAY   DB   10 DUP (0)
In the above statement ARRAY is the name of the data item, which is of byte
type (DB). This array contains 10 duplicated zero values, that is, 10 zeros.
Examples of data definitions:

        EQU     20          ; EQU equates a name to a value
        DB      0111001B    ; binary value
VALI    DW      7341Q       ; octal value
DECIMAL DB      49          ; decimal value
HEX     DW      03B2AH      ; hexadecimal value
ASCII   DB      'EXAMPLE'   ; ASCII string
2.5.1 Interrupts
An interrupt causes the interruption of an ongoing program. Some common sources of
interrupts are the keyboard, printer, monitor, an error condition, a trap, etc.
The 8086 recognises two kinds of interrupts: hardware interrupts and software
interrupts.
In the 8086, an interrupt instruction is processed using the interrupt vector table
(IVT). The IVT is located in the first 1K bytes of memory and has a total of 256
entries, each of 4 bytes. An entry in the interrupt vector table is identified by the
number given in the interrupt instruction. The entry stores the address of the operating
system subroutine that is used to process the interrupt. This address may be different
for different machines. Figure 1 shows the processing of an interrupt.
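The table lookup described above can be sketched in Python; the handler names here are illustrative, not actual BIOS contents:

```python
# Sketch of INT-number dispatch through the IVT: the table occupies the
# first 1K bytes of memory (256 entries of 4 bytes each), so the entry
# for interrupt n starts at memory address 4 * n.
HANDLERS = {0x00: "divide-by-zero handler",   # hypothetical handler names
            0x21: "DOS function dispatcher"}

def ivt_entry_address(int_number):
    """Return the memory address of the IVT entry for INT int_number."""
    return 4 * int_number

print(ivt_entry_address(0x21), HANDLERS[0x21])  # entry for INT 21H is at 132
```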
The advantage of this type of call is that it appears static to a programmer but flexible
to a system design engineer. For example, INT 00H is a special system-level vector
that points to the recover-from-division-by-zero subroutine. If a new designer wants
to move that subroutine in memory, he or she simply adjusts the entry of interrupt 00H
in the IVT to the new location. Thus, from the system programmer's point of view,
it is relatively easy to change the vectors under program control.
One of the most commonly used interrupts for input/output is called the DOS function
call. Let us discuss it in the next subsection.
Some DOS function calls (the function number is placed in AH before INT 21H):

AH = 01H: Read a single character from the keyboard and echo it on the monitor.
The input value is returned in the AL register.
Example: to get one character of input into a variable in the data segment, you may
include the following in the code segment:
        MOV AH, 01
        INT 21H
        MOV X, AL
(Please note that the interrupt call returns the value in AL, which is then transferred
to the data segment variable X. X must be of byte type.)

AH = 02H: Display the character in the DL register on the screen.
Example: to print a character, let us say '?', on the screen we may use the following
set of commands:
        MOV AH, 02H
        MOV DL, '?'
        INT 21H

AH = 08H: Read a single character from the keyboard without echo.
Example: the same as for function 01H; the only difference in this case is that the
input character is not displayed:
        MOV AH, 08H
        INT 21H
        MOV X, AL

AH = 09H: Display a string terminated by '$'.
Example: to print the string 'hello world' followed by a carriage return (a control
character) we may use the following assembly program segment.
Example of AH = 09H:

CR      EQU     0DH             ; ASCII code of carriage return
DATA    SEGMENT
STRING  DB      'HELLO WORLD', CR, '$'
DATA    ENDS
CODE    SEGMENT
        :
        MOV AX, DATA
        MOV DS, AX
        MOV AH, 09H
        MOV DX, OFFSET STRING   ; store the offset of the string in DX
        INT 21H
AH = 0AH: Buffered keyboard input (reads a string of characters into a buffer).
AH = 4CH: Terminate the program and return control to DOS.
; Read a single digit into the BL register with echo. No error check in the program.
        MOV AH, 01H
        INT 21H
; Assuming that the value entered is a digit, its ASCII code is now in AL.
; If the key pressed is 1, then ASCII 31H is stored in AL. The digits are
; stored as: 0 as ASCII 30H, 1 as 31H, 2 as 32H ... 9 as 39H.
; To get the digit 1 in memory, subtract the ASCII value of '0': 31H - 30H = 1.
        MOV BL, AL
        SUB BL, '0'     ; '0' is the ASCII code of the digit 0
; OR
        SUB BL, 30H
; Now BL contains a single digit 0 to 9.
; The only code missing here is a check of whether the input is in the
; specified range.
CODE ENDS
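The SUB BL, 30H conversion above is plain arithmetic on ASCII codes; a quick Python check of the same rule:

```python
# The subtraction in the listing converts an ASCII digit to its value:
# '1' is stored as 31H, and 31H - 30H = 1.
def ascii_to_digit(ch):
    code = ord(ch)          # e.g. ord('1') == 0x31
    return code - 0x30      # subtract ASCII '0' (30H)

print(ascii_to_digit('1'))  # 1
```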
(iii) Input of two-digit numbers (10, 11 ... 99)
; If we want to store 39, it is actually 30 + 9, that is, 3 * 10 + 9.
; To input this value through the keyboard, first input the tens digit,
; e.g., 3, and then type 9.
        MOV AH, 08H
        INT 21H
        MOV BL, AL      ; if we have input 39, then BL first holds the
                        ; character '3'; convert it to 3 using the previous
                        ; logic, that is, 33H - 30H = 3
        SUB BL, '0'
; To get 30, multiply by 10. MUL takes a single operand, so the digit is
; moved to AL and multiplied by 10 held in BH; the product is left in AL.
        MOV AL, BL
        MOV BH, 0AH
        MUL BH          ; AX = AL * BH, that is, 3 * 10 = 30
        MOV BL, AL      ; now BL stores 30
; Input another digit from the keyboard.
        MOV AH, 08H
        INT 21H
        MOV DL, AL      ; store AL in DL
        SUB DL, '0'     ; 39H - 30H = 9
; Now BL contains the value 30 and DL has the value 9; add them to get the
; required number.
        ADD BL, DL      ; now BL stores 39. We have the two-digit value in BL.
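The tens-times-ten-plus-units arithmetic above, checked in Python ('3' and '9' are the keystrokes assumed by the example):

```python
# Build 39 from the keystrokes '3' and '9':
# convert each ASCII digit, multiply the first by 10, then add.
def two_digit(first, second):
    tens = ord(first) - 0x30    # '3' (33H) - 30H = 3
    units = ord(second) - 0x30  # '9' (39H) - 30H = 9
    return tens * 10 + units    # 3 * 10 + 9 = 39

print(two_digit('3', '9'))  # 39
```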
Explanation
The above DATA segment creates an input buffer BUFF of a maximum of 50
characters. On input of the data JAIN followed by the Enter key, the data would be
stored as: 50 (the maximum length), 4 (the actual length), 'J', 'A', 'I', 'N', 0DH
(the carriage return).
CODE SEGMENT
;       set the DS register
        MOV AX, DATA
        MOV DS, AX
;       read the keyboard
        MOV AH, 08H
        INT 21H
;       save the input
        MOV BL, AL
;       display the first part of the message
        MOV AH, 09H
        MOV DX, OFFSET MESSAGE
        INT 21H
;       display the character in the BL register
        MOV AH, 02H
        MOV DL, BL
        INT 21H
;       exit to DOS
        MOV AX, 4C00H
        INT 21H
CODE ENDS
DATA SEGMENT
MESSAGE DB 'The letter you typed is $'
DATA ENDS
END
These programs are stored on disk with the extension .COM. A COM program
requires less disk space than an equivalent EXE program. At run time the COM
program places the stack automatically at the end of the segment, so it uses at least
one complete segment.
EXE-format assembler programs are more easily converted into subroutines for
high-level languages. EXE programs are also more easily relocatable, because there
is no ORG statement forcing the program to be loaded at a specific address.
                                ; First number
                                ; Second number
RESULT  DB      ?               ; Put sum here
CARRY   DB      ?               ; Put any carry here
DATA ENDS
CODE SEGMENT
        ASSUME CS:CODE, DS:DATA
START:  MOV AX, DATA            ; Initialise data segment
        MOV DS, AX              ; register using AX
        MOV AL, NUM1            ; Bring the first number into AL
        ADD AL, NUM2            ; Add the 2nd number to AL
        MOV RESULT, AL          ; Store the result
        RCL AL, 01              ; Rotate carry into Least Significant Bit (LSB)
        AND AL, 00000001B       ; Mask out all but the LSB
        MOV CARRY, AL           ; Store the carry
        MOV AX, 4C00h           ; Terminate to DOS
        INT 21h
CODE ENDS
END START
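The RCL/AND pair in the listing above extracts bit 8 of the sum; the same effect, modelled in Python:

```python
# Add two bytes: keep the low 8 bits as RESULT and bit 8 (the carry)
# as CARRY, exactly what MOV RESULT,AL then RCL AL,01 / AND AL,01 produce.
def add_bytes(a, b):
    total = a + b
    result = total & 0xFF        # what ends up in AL / RESULT
    carry = (total >> 8) & 0x01  # what RCL + AND leave in CARRY
    return result, carry

print(add_bytes(0xFF, 0x02))  # (1, 1): sum wraps to 01 with carry set
```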
2.7
Now that we have seen all the details of assembly language programming, let us
discuss the art of writing assembly programs in brief.
Preparation of writing the program
1. Write an algorithm for your program closer to assembly language. For example,
the algorithm for preceding program would be:
get NUM1
add NUM2
put sum into memory at RESULT
position carry bit in LSB of byte
mask off upper seven bits
store the result in the CARRY location.
2. Specify the input and output required.
3. Study the instruction set carefully. This step helps in identifying the available
instructions, their formats and their constraints. For example, the segment registers
cannot be initialised directly by a memory variable; instead, we first move the
segment address into a register, and then move the contents of that register into the
segment register.
You can exit to DOS by using interrupt routine 21H, with function 4CH placed in the
AH register.
It is good practice to first code your program on paper and to use comments liberally.
This makes programming easier, and also helps you understand your program later.
Please note that the number of comments does not affect the size of the program.
After developing the program, you may assemble it using an assembler, correct any
errors, and finally create the EXE file for execution.
String output and input can be achieved using INT 21H with
function numbers 09H and 0AH respectively.
2.8 SUMMARY
(a)
2.
3.
(a)
(b)
(c)
(d)
(e)
(f)
False
False
True
True
False
True
False
False
True
False
True
True
True
True
True
False
False
False
True
False
True
True
False
True
Assembly Language Programming (Part I)

Structure                                           Page Nos.
3.0  Introduction                                      57
3.1  Objectives                                        57
3.2  Simple Assembly Programs                          57
     3.2.1  Data Transfer
     3.2.2  Simple Arithmetic Application
     3.2.3  Application Using Shift Operations
     3.2.4  Larger of the Two Numbers
3.3  String Processing                                 63
3.4  Some More Arithmetic Problems                     69
3.5  Summary                                           75
3.6  Solutions/Answers                                 75
3.0 INTRODUCTION
After discussing a few essential directives, program development tools and simple
programs, let us discuss assembly language programs in more detail. In this unit, we
start our discussion with simple assembly programs, which fulfil simple tasks such as
data transfer, arithmetic operations, and shift operations. A key example here is
finding the larger of two numbers. Thereafter, we discuss more complex
programs showing how loops and various comparisons are used to implement tasks
like code conversion, coding characters, finding the largest element in an array, etc.
Finally, we discuss more complex arithmetic and string operations. You may refer
to the further readings for more discussion of these programming concepts.
3.1 OBJECTIVES
After going through this unit, you should be able to:
write assembly programs with simple arithmetic, logical and shift operations;
implement loops;
use comparisons for implementing various comparison functions;
write simple assembly programs for code conversion; and
write simple assembly programs for implementing arrays.
; Program 1: This program shows the difference of MOV and XCHG instructions:
DATA SEGMENT
VAL     DW      5678H
DATA ENDS
CODE SEGMENT
        ASSUME CS: CODE, DS: DATA
MAINP:  MOV AX, DATA            ; initialise DS (needed to access VAL)
        MOV DS, AX
        MOV AX, 1234H           ; AH=12 & AL=34
        XCHG AH, AL             ; AH=34 & AL=12
        MOV AX, 1234H           ; AH=12 & AL=34
        MOV BX, VAL             ; BH=56 & BL=78
        XCHG AX, BX             ; AX=5678 & BX=1234
        XCHG AH, BL             ; AH=34, AL=78, BH=12, & BL=56
        MOV AX, 4C00H           ; halt using INT 21H
        INT 21H
CODE ENDS
END MAINP
(Note: VAL is defined with DW, since a byte (DB) cannot hold the word value 5678H
loaded into BX.)
Discussion:
Just keep on changing values as desired in the program.
; Program 2: Program for interchanging the values of two Memory locations
; input: Two memory variables of same size: 8-bit for this program
DATA SEGMENT
VALUE1  DB      0Ah
VALUE2  DB      14h
DATA ENDS
CODE SEGMENT
        ASSUME CS:CODE, DS:DATA
        MOV AX, DATA
        MOV DS, AX
        MOV AL, VALUE1
        XCHG VALUE2, AL
        MOV VALUE1, AL
        MOV AX, 4C00h           ; terminate to DOS
        INT 21h
CODE ENDS
END
Discussion:
The question is: why can't we simply use the XCHG instruction with two memory
variables as operands? To answer this question, let us look at a constraint of the
MOV and XCHG instructions: neither instruction allows both of its operands to be
memory locations, so a register must be used as an intermediary.
The statement MOV AL, VALUE1 copies VALUE1, that is 0Ah, into the AL
register:
AX: AH = 00, AL = 0A            VALUE1 = 0A, VALUE2 = 14

The instruction XCHG AL, VALUE2 exchanges the value of AL with VALUE2.
Now AL and VALUE2 contain the following values:

AX: AH = 00, AL = 14            VALUE1 = 0A, VALUE2 = 0A

Finally, MOV VALUE1, AL copies AL into VALUE1:

                                VALUE1 = 14, VALUE2 = 0A
Other statements in the above program have already been discussed in the preceding
units.
The next step is to move the carry bit into the AH register such that AX (AH:AL)
reflects the full added value. This is done using the ADC instruction.
The ADC AH,00h instruction will add the immediate number 00h to the contents of
the carry flag and the contents of the AH register. The result will be left in the AH
register. Since we had cleared AH to all zeros, before the add, we really are adding
00h + 00h + CF. The result of all this is that the carry flag bit is put in the AH register,
which was desired by us.
Finally, to get the average, we divide the sum in AX by 2. A more general
program would handle positive as well as negative numbers. After the division, the
8-bit quotient is left in the AL register, from where it can be copied into the memory
location named AVGE.
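The ADC AH, 00H trick described above can be modelled in Python to confirm that the carry ends up in the high byte before the division:

```python
# Average of two unsigned bytes: the 8-bit sum may carry; ADC AH,00H
# computes 00H + 00H + CF, placing the carry in AH so AX holds the
# full 16-bit sum, which is then halved.
def average(n1, n2):
    al = (n1 + n2) & 0xFF            # low byte of the sum
    carry = 1 if n1 + n2 > 0xFF else 0
    ah = 0 + 0 + carry               # ADC AH, 00H
    ax = (ah << 8) | al              # AX now holds the full sum
    return ax // 2                   # quotient left in AL

print(average(0xFF, 0x01))  # 128 (0x80)
```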
ASCII '5' = 00110101  ->  unpacked BCD 00000101
ASCII '9' = 00111001  ->  unpacked BCD 00001001

   0101 0000      (5 rotated into the upper nibble)
OR 0000 1001      (9 in the lower nibble)
 = 0101 1001      (packed BCD 59)
;The assembly language program for the above can be written in the following
; manner.
; ABSTRACT      : combines two ASCII digits into a packed BCD number
; REGISTERS     : uses CS, AL, BL, CL
; PORTS         : none used
CODE    SEGMENT
        ASSUME CS:CODE
START:  MOV BL, '5'     ; Load first ASCII digit into BL
        MOV AL, '9'     ; Load second ASCII digit into AL
        AND BL, 0Fh     ; Mask upper 4 bits of first digit
        AND AL, 0Fh     ; Mask upper 4 bits of second digit
        MOV CL, 04h     ; Load CL for 4 rotates
        ROL BL, CL      ; Rotate BL 4 bit positions
        OR  AL, BL      ; Combine nibbles; AL now contains 59
                        ; as packed BCD
CODE    ENDS
        END START
Discussion:
The 8086 does not have an instruction to swap the upper and lower four bits of a byte;
therefore, we need to use a rotate instruction, and that too 4 times. Of the two
rotate instructions, ROL and RCL, we have chosen ROL, as it rotates the byte left by
one or more positions; RCL, on the other hand, moves the MSB into the carry flag and
brings the original carry flag into the LSB position, which is not what we want.
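The AND/ROL/OR packing sequence of Program 4, written out as Python bit operations:

```python
# Pack two ASCII digits into one packed-BCD byte: mask each to its low
# nibble (AND 0Fh), shift the first digit into the high nibble (the four
# ROLs), then combine (OR).
def pack_bcd(d1, d2):
    hi = (ord(d1) & 0x0F) << 4   # AND BL,0Fh then ROL BL by 4
    lo = ord(d2) & 0x0F          # AND AL,0Fh
    return hi | lo               # OR AL,BL

print(hex(pack_bcd('5', '9')))  # 0x59
```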
Let us now look at a program that uses RCL instructions. This will make the
difference between the instructions clear.
; Program 5: Add a byte number from one memory location to a byte from the next
memory location and put the sum in the third memory location. Also, save the carry
flag in the least significant bit of the fourth memory location.
; ABSTRACT
;
;
;
; ALGORITHM:
;
get NUM1
;
add NUM2 in it
;
put sum into memory location RESULT
;
rotate carry in LSB of byte
;
mask off upper seven bits of byte
;
store the result in the CARRY location.
;
; PORTS
: None used
; PROCEDURES
: None used
; REGISTERS
: Uses CS, DS, AX
;
DATA    SEGMENT
NUM1    DB      25h             ; First number
NUM2    DB      80h             ; Second number
RESULT  DB      ?               ; Put sum here
CARRY   DB      ?               ; Put any carry here
DATA    ENDS
CODE    SEGMENT
        ASSUME CS:CODE, DS:DATA
START:  MOV AX, DATA            ; Initialise data segment
        MOV DS, AX              ; register using AX
        MOV AL, NUM1            ; Load the first number into AL
        ADD AL, NUM2            ; Add 2nd number into AL
        MOV RESULT, AL
        RCL AL, 01
        AND AL, 00000001B
        MOV CARRY, AL
        MOV AH, 4CH
        INT 21H
CODE    ENDS
        END START
Discussion:
RCL instruction brings the carry into the least significant bit position of the AL
register. The AND instruction is used for masking higher order bits, of the carry, now
in AL.
In a similar manner we can also write applications using other shift instructions.
The CMP destination, source instruction sets the flags as follows:

Condition                Flag(s) affected
destination < source     Carry flag = 1
destination = source     Zero flag = 1
destination > source     Carry = 0, Zero = 0

Let's look at three examples that show how the flags are set when numbers are
compared. In example 1 BL is less than 10, so the carry flag is set. In example 2, the
zero flag is set because both operands are equal. In example 3, the destination (BX) is
greater than the source, so both the zero and the carry flags are clear.
Example 1: CMP BL, 10 (with BL < 10)    ; Carry flag = 1
Example 2: CMP AX, BX (with AX = BX)    ; Zero flag = 1
Example 3: CMP BX, CX (with BX > CX)    ; Carry = 0, Zero = 0
In the following section we will discuss an example that uses the flags set by the CMP
instruction.
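The flag table above can be modelled as a small Python function (unsigned comparison only):

```python
# CMP subtracts source from destination, discards the result, and keeps
# the flags: a borrow sets CF, a zero difference sets ZF.
def cmp_flags(dest, src):
    diff = dest - src
    carry = 1 if diff < 0 else 0   # borrow needed => CF = 1
    zero = 1 if diff == 0 else 0
    return carry, zero

print(cmp_flags(5, 10))   # (1, 0): dest < src sets the carry flag
```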
State True or False:                                                    T F
1. In a MOV instruction, the immediate operand value for an 8-bit destination
   cannot exceed F0h.
2.
3.
4. A single instruction cannot swap the upper and lower four bits of a byte
   register.
5.
6.

Example 4:
        CMP AX, BX
        JE  THERE
        ADD AX, 02
THERE:  MOV CL, 07      ; load 07 into CL

In the example above, control transfers directly to the label THERE if the value stored
in the AX register is equal to that of the BX register. The same example can be
rewritten in the following manner, using different jumps.
Example 5:
        CMP AX, BX
        JNE FIX
        JMP THERE
FIX:    ADD AX, 02
THERE:  MOV CL, 07

The above code is not efficient, but it suggests that there are many ways in which a
conditional jump can be implemented. Select the most optimal way.
Example 6:
        CMP DX, 00      ; checks if DX is zero
        JE  Label1      ; if yes (ZF = 1), jump to Label1
Label1: ----            ; control comes here if DX = 0
Example 7:
        MOV AL, 10      ; moves 10 to AL
        CMP AL, 20      ; checks if AL < 20, i.e. CF = 1
        JL  Lab1        ; if less, jump to Lab1
Lab1:   ------
LOOPING
; Program 6: Assume a constant inflation factor that is added to a series of prices
; stored in the memory. The program copies the new price over the old price. It is
; assumed that price data is available in BCD form.
; The algorithm:
;Repeat
;
Read a price from the array
;
Add inflation factor
;
Adjust result to correct BCD
;
Put result back in array
;
Until all prices are inflated
; REGISTERS: Uses DS, CS, AX, BX, CX
; PORTS
: Not used
ARRAYS  SEGMENT
PRICES  DB      36h, 55h, 27h, 42h, 38h, 41h, 29h, 39h
ARRAYS  ENDS
CODE    SEGMENT
        ASSUME CS:CODE, DS:ARRAYS
START:  MOV AX, ARRAYS  ; Initialise data segment
        MOV DS, AX      ; register using AX
        LEA BX, PRICES  ; Initialise pointer to base of array
        MOV CX, 0008h   ; Initialise counter to 8, as the array has 8
                        ; values
DO_NEXT:
        MOV AL, [BX]    ; Copy a price to AL; BX is addressed in
                        ; indirect mode
        ADD AL, 0Ah     ; Add inflation factor
        DAA             ; Make sure that the result is BCD
        MOV [BX], AL    ; Copy result back to memory
        INC BX          ; Increment BX to make it point to the next price
        DEC CX          ; Decrement counter register
        JNZ DO_NEXT     ; If not last (last would be when CX becomes 0),
                        ; loop back to DO_NEXT
        MOV AH, 4CH     ; Return to DOS
        INT 21H
CODE    ENDS
        END START
Discussion:
Please note the use of the instruction LEA BX, PRICES: it loads the BX register with
the offset of the array PRICES in the data segment. [BX] is an indirection through BX
and refers to the value stored at that element of the array PRICES. BX is incremented
to point to the next element of the array. The CX register acts as a loop counter and is
decremented by one to keep a check on the bounds of the array. Once the CX register
becomes zero, the zero flag is set to 1 and the loop terminates, because JNZ does not
loop back when the zero flag is 1.
The same program can be written using the LOOP instruction; in that case, the
DEC CX and JNZ DO_NEXT instructions are replaced by a single LOOP DO_NEXT
instruction. LOOP decrements the value of CX and jumps to the given label only if
CX is not equal to zero.
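Program 6 depends on DAA keeping the prices in BCD after a binary ADD. A Python sketch of the decimal-adjust rule (add 6 to any nibble that overflowed):

```python
# Packed-BCD addition the way ADD + DAA performs it: do a binary add,
# then add 06h if the low nibbles carried past 9, and 60h if the whole
# byte exceeds BCD 99.
def bcd_add(a, b):
    s = a + b
    if (a & 0x0F) + (b & 0x0F) > 9:   # half-carry: adjust the low nibble
        s += 0x06
    if s > 0x99:                      # carry out of the high nibble
        s += 0x60
    return s & 0xFF

print(hex(bcd_add(0x36, 0x0A)))  # 0x46: BCD 36 + 10 = BCD 46
```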
Let us demonstrate the use of the LOOP instruction with the help of the following program:
; Program 7: This program prints the alphabet (A-Z)
; Registers used: AX, CX, DX
CODE SEGMENT
        ASSUME CS:CODE
MAINP:  MOV CX, 1AH     ; 1AH = 26, the number of letters
        MOV DL, 41H     ; 41H is the ASCII code of 'A'
NEXTC:  MOV AH, 02H
        INT 21H
        INC DL
        LOOP NEXTC
        MOV AX, 4C00H   ; exit to DOS
        INT 21H
CODE ENDS
END MAINP
EQUAL:  MOV DL, 'N'
        INT 21H
CONTINUE:
        LOOP NEXTP
        MOV AH, 4CH
        INT 21H
CODE ENDS
END MAINP
Discussion:
The loop in this program will be executed at least 3 times.
DATA    SEGMENT
ARRAY   DW      ...             ; seven word values
LARGE   DW      ?
SMALL   DW      ?
DATA    ENDS
CODE    SEGMENT
        MOV AX, DATA            ; Initialise DS
        MOV DS, AX
        MOV DI, OFFSET ARRAY    ; DI points to the array
        MOV AX, [DI]            ; AX contains the first element
        MOV DX, AX              ; initialise large in DX register
        MOV BX, AX              ; initialise small in BX register
        MOV CX, 6               ; initialise loop counter
A1:     MOV AX, [DI]            ; get next array value
        CMP AX, BX              ; Is the new value smaller?
        JGE A2                  ; If greater (not smaller), jump to A2
                                ; to check whether it is larger than the
                                ; large value in DX
        MOV BX, AX              ; Otherwise it is smaller, so move it to
                                ; the smallest value (BX register)
        JMP A3                  ; as it is small, there is no need to
                                ; compare it with the large value, so jump
                                ; to A3 to continue or terminate the loop
A2:     CMP AX, DX              ; compare with the large value in DX
        JLE A3                  ; if less or equal, it is not larger, so
                                ; jump to A3 to continue or terminate
        MOV DX, AX              ; otherwise it is the larger value, so move
                                ; it to DX, which stores the large value
A3:     ADD DI, 2               ; DI now points to the next number
        LOOP A1                 ; repeat the loop until CX = 0
        MOV LARGE, DX
        MOV SMALL, BX
CODE    ENDS
Discussion:
Since the data is of word type, which is equal to 2 bytes, and memory is organised
byte-wise, DI is incremented by 2 to point to the next array value.
Suppose the numbers 1234 and 3210 are entered; the sum is 4444.
As each digit is input, we store its ASCII code in a memory byte. After the
first number is input, it is stored as follows:

The number is entered as:   1    2    3    4     (ASCII digits)
hexadecimal storage:        31   32   33   34
Each of these numbers is input as its equivalent ASCII digits and needs to be
converted: either the digit string is converted to a 16-bit binary value that can be used
for computation, or the ASCII digits themselves are added, followed by an
instruction that adjusts the sum. Let us use the conversion operation to
perform these calculations here.
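The digit-string-to-binary conversion mentioned above, sketched in Python using the stored ASCII codes:

```python
# Fold a stored ASCII digit string (most significant digit first) into a
# 16-bit binary value: value = value * 10 + (code - 30H) for each byte.
def ascii_digits_to_binary(codes):
    value = 0
    for code in codes:
        value = value * 10 + (code - 0x30)
    return value

print(ascii_digits_to_binary([0x31, 0x32, 0x33, 0x34]))  # 1234
```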
Another important data format is packed decimal (packed BCD). A packed
BCD number contains two decimal digits per byte. The packed BCD format has the
following advantages:
The BCD numbers allow accurate calculations for almost any number of
significant digits.
Conversion of packed BCD numbers to ASCII (and vice versa) is relatively fast.
An implicit decimal point may be used, its position being kept track of in a
separate variable.
The instructions DAA (decimal adjust after addition) and DAS (decimal adjust after
subtraction) are used for adjusting the result of an addition or subtraction operation on
packed decimal numbers. However, no such instruction exists for multiplication and
division. For multiplication and division the number must first be unpacked,
then multiplied or divided, and packed again. The instructions DAA and DAS have
already been explained in Unit 1.
ERROR:          MOV AL, 0FFh
CONVERTED:      MOV AX, 4C00h
                INT 21h
CODE    ENDS
        END START
Discussion:
The above program converts a single hex digit represented by an ASCII
character. It can be extended to accept more ASCII values and
convert them into a 16-bit binary number.
        MOV IND, 00             ; ind := 0
L2:     CMP IND, 08             ; ind < 9?
        JG  L1                  ; not so; skip
        LEA AX, STR1            ; offset of STR1 in AX register
        MOV BX, IND             ; it uses a register for indexing into
                                ; the array
        LEA CX, STR2            ; STR2 in CX
        MOV DL, BYTE PTR CX[BX]
        CMP DL, BYTE PTR AX[BX] ; str1[ind] = str2[ind]?
        JNE L1                  ; no, skip
        MOV IND, BX
        ADD IND, 01
        JMP L2                  ; loop back
L1:

What we find in the above code is a large piece of code that could have been improved
considerably if the 8086 string instructions had been used.
; Program 11: Matching two strings of the same length stored in memory.
; REGISTERS : Uses CS, DS, ES, AX, DX, CX, SI, DI
DATA    SEGMENT
PASSWORD DB     'FAILSAFE'      ; source string
DESTSTR  DB     'FEELSAFE'      ; destination string
MESSAGE  DB     'Strings are equal $'
DATA    ENDS
CODE    SEGMENT
        ASSUME CS:CODE, DS:DATA, ES:DATA
        MOV AX, DATA
        MOV DS, AX              ; Initialise data segment register
        MOV ES, AX              ; Initialise extra segment register,
; as the destination string is considered to be in the extra segment. Please note
; that ES is initialised to the same segment as DS.
        LEA SI, PASSWORD        ; Load source pointer
        LEA DI, DESTSTR         ; Load destination pointer
        MOV CX, 08              ; Load counter with string length
        CLD                     ; Clear direction flag so that comparison
                                ; is done in the forward direction
        REPE CMPSB              ; Compare the two strings byte by byte
        JNE NOTEQUAL            ; If not equal, jump to NOTEQUAL
        MOV AH, 09              ; else display message
        MOV DX, OFFSET MESSAGE
        INT 21h                 ; display the message
NOTEQUAL:
        MOV AX, 4C00h           ; interrupt function to halt
        INT 21h
CODE    ENDS
        END
Discussion:
In the above program the instruction CMPSB compares the two strings, pointed to by
SI in the data segment and by the DI register in the extra segment. The strings are
compared byte by byte, and the pointers SI and DI are then incremented to the next
byte. Please note that the last letter B in the instruction indicates a byte; if it were W,
that is, if the instruction were CMPSW, then the comparison would be done word by
word and SI and DI would be incremented by 2, that is, to the next word. The REPE
prefix in front of the instruction tells the 8086 to decrement the CX register by one
and continue executing the CMPSB instruction until the counter (CX) becomes zero.
Thus, the code size is substantially reduced.
Similarly, you can write efficient programs for moving one string to another, using
MOVS, and for scanning a string for a character, using SCAS.
At 5 MHz, one clock cycle takes 1/(5 x 10^6) seconds. A delay of 1 x 10^-3 seconds
therefore requires (1 x 10^-3)/(1/(5 x 10^6)) = 5000 clock cycles.

        MOV CX, N
DELAY:  NOP             ; 3 cycles
        NOP             ; 3 cycles
        LOOP DELAY      ; 17 or 5 cycles

The LOOP instruction takes 17 clock cycles when the condition is true and 5 clock
cycles otherwise. The condition will be true N times, and false only once, when
control comes out of the loop.
To calculate N: total clock cycles = N x (3 + 3 + 17) - 12, since the final LOOP takes
5 cycles instead of 17. For 5000 cycles this gives N of approximately 218.
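The cycle arithmetic above can be checked directly; the 12-cycle correction for the final LOOP pass follows from 17 - 5:

```python
# Delay-loop sizing: at 5 MHz one cycle is 1/(5e6) s, so 1 ms needs 5000
# cycles. Each loop pass costs NOP + NOP + taken LOOP = 3 + 3 + 17 cycles,
# except the final pass, where LOOP falls through in 5 cycles (17 - 5 = 12
# fewer in total).
CYCLES_NEEDED = round(1e-3 * 5e6)   # cycles in 1 ms at 5 MHz
PER_PASS = 3 + 3 + 17

n = round((CYCLES_NEEDED + 12) / PER_PASS)   # loop count N
print(CYCLES_NEEDED, n)
```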
   20 11 01 10 FF          (NUM1, most significant byte first)
 + FF 40 30 20 10          (NUM2)
 ------------------
 1 1F 51 31 31 0F          (carry 1 and the five-byte sum)

Let us also assume that the numbers are stored least significant byte first, and put
in memory in two arrays. The result is stored in a third array, SUM. SUM also
contains the carry-out information and is thus one byte longer than the number arrays.
; Program 12: Add two five-byte numbers using arrays
; ALGORITHM:
;
Make count = LEN
;
Clear the carry flag
;
Load address of NUM1
;
REPEAT
;
Put byte from NUM1 in accumulator
;
Add byte from NUM2 to accumulator + carry
;
Store result in SUM
;
Decrement count
;
Increment to next address
;
UNTIL count = 0
;
Rotate carry into LSB of accumulator
;
Mask all but LSB of accumulator
;
Store carry result, address pointer in correct position.
; PORTS
: None used
; PROCEDURES
: None used
; REGISTERS
: Uses CS, DS, AX, CX, BX, DX
DATA    SEGMENT
NUM1    DB      0FFh, 10h, 01h, 11h, 20h        ; LSB first
NUM2    DB      10h, 20h, 30h, 40h, 0FFh
SUM     DB      6 DUP(0)
DATA    ENDS
LEN     EQU     05h
CODE    SEGMENT
        ASSUME CS:CODE, DS:DATA
START:  MOV AX, DATA            ; initialise data segment
        MOV DS, AX              ; using AX register
        MOV SI, 00              ; load displacement of 1st number;
                                ; SI is being used as index register
        MOV CX, 0000            ; clear counter
        MOV CL, LEN             ; set up count to designed length
        CLC                     ; clear carry, ready for addition
AGAIN:  MOV AL, NUM1[SI]        ; get a byte from NUM1
        ADC AL, NUM2[SI]        ; add the byte from NUM2 with carry
        MOV SUM[SI], AL
        INC SI
        LOOP AGAIN
FINISH: RCL AL, 01h             ; rotate the final carry into the LSB of AL
        AND AL, 01h             ; mask all but the LSB
        MOV SUM[SI], AL         ; store the carry as the sixth byte of SUM
        MOV AX, 4C00h
        INT 21h
CODE    ENDS
        END START
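Program 12's byte-wise ADC loop, modelled in Python with sample five-byte values stored least significant byte first:

```python
# Multi-byte addition: add the arrays byte by byte, propagating the carry
# (the ADC of the listing), and store the final carry as a sixth byte.
def add_multibyte(num1, num2):
    out, carry = [], 0
    for a, b in zip(num1, num2):     # LSB first, like NUM1[SI]/NUM2[SI]
        s = a + b + carry            # ADC AL, NUM2[SI]
        out.append(s & 0xFF)         # MOV SUM[SI], AL
        carry = s >> 8
    out.append(carry)                # final carry into the last byte of SUM
    return out

# sample five-byte operands, least significant byte first
print(add_multibyte([0xFF, 0x10, 0x01, 0x11, 0x20],
                    [0x10, 0x20, 0x30, 0x40, 0xFF]))
```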
THOU    EQU     3E8h            ; 1000 = 3E8h
DATA    SEGMENT
BCD     DW      4567h
HEX     DW      ?               ; storage reserved for the result
DATA    ENDS
CODE    SEGMENT
        ASSUME CS:CODE, DS:DATA
START:  MOV     AX, DATA        ; initialise data segment
        MOV     DS, AX          ; using AX register
        MOV     AX, BCD         ; get the BCD number; AX = 4567
        MOV     BX, AX          ; copy number into BX; BX = 4567
        MOV     AL, AH          ; place for upper 2 digits; AX = 4545
        MOV     BH, BL          ; place for lower 2 digits; BX = 6767
        ; split up the number so that we have one digit in each register
        MOV     CL, 04          ; bit count for rotate
        ROR     AH, CL          ; digit 1 (MSB) in lower four bits of AH;
                                ; AX = 5445
        ROR     BH, CL          ; digit 3 in lower four bits of BH;
                                ; BX = 7667
        AND     AX, 0F0Fh       ; mask upper four bits of each digit;
                                ; AX = 0405
        AND     BX, 0F0Fh       ; BX = 0607
        MOV     CX, AX          ; copy AX into CX so that AX can be used
                                ; for multiplication; CX = 0405
        MOV     AX, 0000h       ; zero AH and AL
        ; now multiply each digit by its place value
        MOV     AL, CH          ; digit 1 to AL for multiply
        MOV     DI, THOU        ; no immediate multiplication is allowed,
                                ; so move thousand (3E8h) to DI
        MUL     DI              ; digit 1 (4) * 1000; result in DX and AX.
                                ; Because the BCD number will not be greater
                                ; than 9999, the result will be in AX only;
                                ; AX = 4000
        MOV     DH, 00h         ; zero DH
        MOV     DL, BL          ; move BL to DL, so DL = 7
        ADD     DX, AX          ; add AX; so DX = 4007
        MOV     AX, 0064h       ; load value for 100 into AX
        MUL     CL              ; multiply by digit 2 from CL
        ADD     DX, AX          ; add to total in DX; DX now contains
                                ; (7 + 4000 + 500)
        MOV     AX, 000Ah       ; load value of 10 into AX
        MUL     BH              ; multiply by digit 3 in BH
        ADD     DX, AX          ; add to total in DX; DX contains
                                ; (7 + 4000 + 500 + 60)
        MOV     HEX, DX         ; put result in HEX for return
        MOV     AX, 4C00h       ; return to DOS
        INT     21h
CODE    ENDS
        END     START
1.      Why should we perform string processing in 8086 assembly language and
        not in a high-level language?
2.
3.
3.5 SUMMARY

In this unit, we covered some basic aspects of assembly language programming. We
started with some elementary arithmetic problems and code conversion problems,
covered various types of loops, and graduated to string processing and slightly more
complex arithmetic. As part of good programming practice, we also noted some
points that should be kept in mind while coding.
3.6 SOLUTIONS/ANSWERS
        MOV     AX, A           ; bring A in AX
        SUB     AX, B           ; subtract B
        MOV     DX, 0000h       ; move 0 to DX as it will be used for word division
        MOV     BX, 10          ; move the divisor 10 to BX
        DIV     BX              ; divide
        MUL     C               ; ((A - B) / 10 * C) in AX
        MUL     AX              ; square AX to get ((A - B) / 10 * C) ** 2
        MOV     CX, COUNT       ; number of elements
        MOV     AX, 0000h
        MOV     SI, AX          ; start at the first element
        ; add the elements of the array into AX one by one
AGAIN:  ADD     AX, ARRAY[SI]   ; another way of handling arrays
        ADD     SI, 2           ; select the next element of the word array
        LOOP    AGAIN           ; add all the elements of the array; the loop
                                ; terminates when CX becomes zero
        MOV     TOTAL, AX       ; store the result in TOTAL
UNIT 4  ASSEMBLY LANGUAGE PROGRAMMING (PART-II)

Structure                                                       Page No.
4.0     Introduction                                                 77
4.1     Objectives                                                   77
4.2     Use of Arrays in Assembly                                    77
4.3     Modular Programming                                          80
        4.3.1   The Stack
        4.3.2   FAR and NEAR Procedures
        4.3.3   Parameter Passing in Procedures
        4.3.4   External Procedures
4.4     Interfacing Assembly Language to High-Level Languages        93
        4.4.1   Simple Interfacing
        4.4.2   Interfacing Subroutines with Parameter Passing
4.5     Interrupts                                                   97
4.6     Device Drivers in Assembly                                   99
4.7     Summary                                                     101
4.8     Solutions/Answers                                           102
4.0 INTRODUCTION

In the previous units, we have discussed the instruction set, addressing modes, and
other tools needed to develop assembly language programs. We shall now use this
knowledge to develop more advanced programs. We have divided this unit broadly
into four sections. In the first section, we discuss the design of some simple data
structures using the basic data types. Once programs become lengthier, it is
advisable to divide them into small modules that can be easily written, tested and
debugged. This leads to the concept of modular programming, the topic of the
second section of this unit. In the third section, we discuss some techniques for
interfacing assembly language programs to high-level languages. We explain the
concepts using C and C++, two of the most popular high-level languages. In the
fourth section we design some tools necessary for interfacing the microprocessor
with external hardware modules.
4.1 OBJECTIVES

After going through this unit, you should be able to:

4.2 USE OF ARRAYS IN ASSEMBLY

An array is referenced using a base value and an index. To facilitate addressing in
arrays, the 8086 provides two base registers, viz. BX and BP, and two index
registers, viz. SI and DI (the latter two are also used for string processing). Note that
only these four registers can appear in an 8086 memory address calculation; an
arbitrary general-purpose register cannot be used for indexing.
An important application of arrays is tables, which are used to store related
information. For example, the names of all the students in a class, their CGPA, the
list of all the books in a library, or even the list of people residing in a particular area
can be stored in different tables. An important application of tables is character
translation, which can be used for data encryption or for translation from one data
type to another. A critical factor for such applications is speed, which happens to be
a strength of assembly language. The instruction used for such applications is
XLAT.
Let us explain this instruction with the help of an example:

Example 1:
Assume a table HEXA containing the characters of all 16 hexadecimal digits:

HEXA    DB      '0123456789ABCDEF'

        MOV     AL, 0Ah                 ; index value
        MOV     BX, OFFSET HEXA         ; offset of the table HEXA
        XLAT                            ; AL now holds the character 'A',
                                        ; the table entry at index 0Ah
The above task can be done without the XLAT instruction, but it requires a longer
series of instructions, such as:

        MOV     AL, 0Ah                 ; index value
        MOV     BX, OFFSET HEXA         ; offset of the table HEXA
        PUSH    BX                      ; save the offset
        ADD     BL, AL                  ; add the index value to the
                                        ; HEXA offset
        MOV     AL, [BX]                ; retrieve the entry
        POP     BX                      ; restore BX
Let us use the XLAT instruction for data encoding. When you want to transfer a
message over a telephone line, such encoding may be a good way of preventing
other users from reading it. Here is a sample program for encoding.

PROGRAM 1:
; A program for encoding ASCII alphanumerics.
; ALGORITHM:
;       create the code table
;       read the input string character by character
;       translate each character using the code table
;       output the string
DATA    SEGMENT
CODETABLE DB    48 DUP(0)               ; no translation of first
                                        ; 48 ASCII codes
        DB      '4590821367'            ; ASCII codes 48 - 57 (30h - 39h)
        DB      7 DUP(0)                ; no translation of
                                        ; these 7 characters
        DB      'GVHZUSOBMIKPJCADLFTYEQNWXR'    ; codes for 'A' - 'Z'
        DB      6 DUP(0)                ; no translation
        DB      'gvhzusobmikpjcadlftyeqnwxr'    ; codes for 'a' - 'z'
        DB      133 DUP(0)              ; no translation of remaining
                                        ; characters
DATA    ENDS
CODE    SEGMENT
        ASSUME CS:CODE, DS:DATA
        MOV     AX, DATA
        MOV     DS, AX                  ; initialize DS
        MOV     BX, OFFSET CODETABLE
GETCHAR:
        MOV     AH, 06                  ; direct console I/O,
        MOV     DL, 0FFh                ; input request
        INT     21h
        JZ      QUIT                    ; no more input: exit
        MOV     DL, AL                  ; save the original character in DL
        XLAT    CODETABLE               ; translate it
        CMP     AL, 0                   ; a zero entry means no translation,
        JE      PUTCHAR                 ; so output the original character
        MOV     DL, AL                  ; otherwise output the translation
PUTCHAR:
        MOV     AH, 02                  ; write DL to output
        INT     21h
        JMP     GETCHAR                 ; get another character
QUIT:
        MOV     AX, 4C00h
        INT     21h
CODE    ENDS
        END
Discussion:
The program above encodes its input. For example, the line

        A SECRET Message

from an input file will be encoded as:

        G TUHFUY Juttgou

If the program file name is coding.asm, the assembled program can be run using the
following command line:

        coding < infile > outfile

where infile is the input data file and outfile is the output data file.

You can write more such applications using tables in 8086 assembly.
2.
3.
4.
5.
6.
7.
8.
(Figure: a program divided into modules A, B, C, D and E)
You can divide a program into subroutines or procedures, and CALL a procedure
whenever it is needed. A subroutine call transfers control to the subroutine's
instructions and then brings control back to the calling program.
4.3.1 The Stack

Mnemonics       Description
PUSH SRC        SP = SP - 2; the locations SP + 1 and SP are assigned the
                high and low bytes of SRC
POP DST         DST is assigned the word stored at the stack top;
                SP = SP + 2

4.3.2 FAR and NEAR Procedures
Procedures provide the primary means of breaking the code of a program into
modules. Procedures have one major disadvantage: they require extra code to join
them together in such a way that they can communicate with each other. This extra
code is sometimes referred to as linkage overhead.
A procedure call involves:

1.      Unlike other branch instructions, a procedure call must save the address of
        the next instruction, so that the return will be able to branch back to the
        proper place in the calling program.
2.      The registers used by the procedure need to be stored before their contents
        are changed, and then restored just before the procedure finishes.
3.
After a NEAR call, the stack contains the offset of the return address:

        .
        IP HIGH
        IP LOW
        .
        .

Please note that the stack grows towards the base of the stack segment, so the stack
becomes full at offset 0000h. Also, a push operation decrements SP by 2, as the
stack is a word stack (word size in 8086 = 16 bits) while memory is byte organised.
A FAR procedure call, also known as an intersegment call, is a call made to a
separate code segment: control is transferred outside the current segment. Therefore
both CS and IP need to be stored as the return address. The values on the stack after
such a call look like:

        Initial stack top
        .
        CS HIGH
        CS LOW
        IP HIGH
        IP LOW
        .
        .
        Low address

When the 8086 executes a FAR call, it first stores the contents of the code segment
register, followed by the contents of IP, on the stack. A RET from a NEAR
procedure pops two bytes into IP; a RET from a FAR procedure pops four bytes
from the stack.
A procedure is defined within the source code by placing directives of the form:

procedure_name  PROC    <attribute>
        ...
procedure_name  ENDP

In the first case the <attribute> NEAR should be used, as the procedure and the
code that calls it are in the same segment. For the latter two cases the <attribute>
must be FAR.

Let us describe an example of a procedure call using a NEAR procedure: a program
that contains a call to a procedure in the same segment.
PROGRAM 2:
Write a program that collects data samples from a port at 1 ms intervals. The upper
4 bits of each sample are masked off, and the result is stored in an array, in
successive locations.
; REGISTERS
; PROCEDURES : WAIT
DATA_SEG        SEGMENT
PRESSURE        DW      100 DUP(0)      ; storage for the samples
NBR_OF_SAMPLES  EQU     100
PRESSURE_PORT   EQU     0FFF8h
DATA_SEG        ENDS

; (stack segment, referenced below, defined as in the later programs)
STACK_SEG       SEGMENT STACK
                DW      100 DUP(0)
STACK_TOP       LABEL   WORD
STACK_SEG       ENDS
CODE_SEG SEGMENT
        ASSUME CS:CODE_SEG, DS:DATA_SEG, SS:STACK_SEG
START:  MOV     AX, DATA_SEG            ; initialise data segment register
        MOV     DS, AX
        MOV     AX, STACK_SEG           ; initialise stack segment register
        MOV     SS, AX
        MOV     SP, OFFSET STACK_TOP    ; initialise stack pointer to top of
                                        ; stack
        LEA     SI, PRESSURE            ; SI points to start of array
                                        ; PRESSURE
        MOV     BX, NBR_OF_SAMPLES      ; load BX with number of samples
        MOV     DX, PRESSURE_PORT       ; point DX at the input port;
                                        ; it can be any A/D converter or
                                        ; data port
READ_NEXT:
        IN      AX, DX                  ; read a sample
        AND     AX, 0FFFH               ; mask the upper 4 bits
        MOV     [SI], AX                ; store it in the array
        CALL    WAIT                    ; wait between samples
        INC     SI                      ; point to the next word
        INC     SI
        DEC     BX                      ; count down the samples
        JNZ     READ_NEXT
STOP:   NOP

WAIT    PROC    NEAR
        MOV     CX, 2000H               ; delay count
HERE:   LOOP    HERE                    ; loop until CX = 0
        RET
WAIT    ENDP
CODE_SEG ENDS
        END     START
Discussion:
Please note that the CALL to the procedure above does not indicate whether the call
is to a NEAR procedure or a FAR procedure. This distinction is made at the time of
defining the procedure.

The procedure above can also be made a FAR procedure by changing its definition
to:

WAIT    PROC    FAR
        .
        .
WAIT    ENDP

The procedure can now, if need be, be defined in another segment of the same
assembly language file.

Let us now discuss a program that uses a procedure to convert a BCD number to
binary.
PROGRAM 3:
Conversion of a BCD number to binary using a procedure.

Algorithm for the conversion procedure:
        Take a packed BCD digit pair and separate the two BCD digits.
        Multiply the upper digit by 10 (0Ah).
        Add the lower digit to the result of the multiplication.
The implementation of the procedure depends on the parameter-passing scheme. Let
us demonstrate this with the help of three programs.

Program 3 (a): Use of registers for parameter passing. This program uses the AH
register for passing the parameter. We assume that the input data is available in the
memory location BCD and that the result is stored in the location BIN.
;REGISTERS
;PROCEDURES
DATA_SEG        SEGMENT
BCD             DB      25h
BIN             DB      ?
DATA_SEG        ENDS

STACK_SEG       SEGMENT
                DW      100 DUP(0)
TOP_STACK       LABEL   WORD
STACK_SEG       ENDS
CODE_SEG        SEGMENT
        ASSUME CS:CODE_SEG, DS:DATA_SEG, SS:STACK_SEG
START:  MOV     AX, DATA_SEG            ; initialise data segment
        MOV     DS, AX                  ; using AX register
        MOV     AX, STACK_SEG           ; initialise stack
        MOV     SS, AX                  ; segment register. Why
                                        ; stack?
        MOV     SP, OFFSET TOP_STACK    ; initialise stack pointer
        MOV     AH, BCD                 ; pass the BCD value in AH
        CALL    BCD_BINARY              ; do the conversion
        MOV     BIN, AH                 ; store the result in the
                                        ; memory
        :
        :                               ; remaining program can be put here
;PROCEDURE      : BCD_BINARY - Converts a BCD number to binary.
;INPUT          : AH with BCD value
;OUTPUT         : AH with binary value
;DESTROYS       : AX
BCD_BINARY      PROC    NEAR
        PUSHF                           ; save flags
        PUSH    BX                      ; and registers used in the procedure
        PUSH    CX                      ; before starting the conversion
        ; do the conversion
        MOV     BH, AH                  ; save copy of BCD in BH
        AND     BH, 0Fh                 ; and mask the higher bits; the lower
                                        ; digit is in BH
        AND     AH, 0F0h                ; mask the lower bits; the higher digit
                                        ; is in AH, but in the upper 4 bits
        MOV     CH, 04                  ; so move the upper BCD digit to the
        ROR     AH, CH                  ; lower four bits of AH
        MOV     AL, AH                  ; move the digit to AL for multiplication
        MOV     CL, 0Ah                 ; put 10 in CL (BH still holds the lower
                                        ; digit, so BH must not be used here)
        MUL     CL                      ; multiply the upper BCD digit in AL
                                        ; by 0Ah; the result is in AL
        MOV     AH, AL                  ; the result will not exceed 8 bits,
                                        ; so move AL to AH
        ADD     AH, BH                  ; add the lower BCD digit to the MUL
                                        ; result; end of conversion, binary
                                        ; result in AH
        POP     CX                      ; restore registers
        POP     BX
        POPF
        RET
BCD_BINARY      ENDP
CODE_SEG        ENDS
        END     START
Discussion:
The program above is not an optimal program, as it does not use registers
minimally. By now you should be able to follow it. The program copies the BCD
number from memory into the AH register, and the AH register is used as-is in the
procedure. Thus the contents of the AH register are used in the calling program as
well as in the procedure; in other words, they have been passed from the main
program to the procedure. The result of the subroutine is also passed back in AH as
the returned value, so the calling program finds the result in the AH register.

The advantage of using registers for passing parameters is the ease with which they
can be handled. The disadvantage, however, is the limit on the number of
parameters that can be passed. For example, one cannot pass an array of 100
elements to a procedure using registers.
PROGRAM 3: Version 2 (parameters passed through memory pointers)

DATA_SEG        SEGMENT
BCD             DB      25h
BIN             DB      ?
DATA_SEG        ENDS

STACK_SEG       SEGMENT STACK
                DW      100 DUP(0)
TOP_STACK       LABEL   WORD
STACK_SEG       ENDS
CODE_SEG        SEGMENT
        ASSUME CS:CODE_SEG, DS:DATA_SEG, SS:STACK_SEG
START:  MOV     AX, DATA_SEG            ; initialize data
        MOV     DS, AX                  ; segment using AX register
        MOV     AX, STACK_SEG           ; initialize stack
        MOV     SS, AX                  ; segment. Why stack?
        MOV     SP, OFFSET TOP_STACK    ; initialize stack pointer
; Put pointers to the BCD storage and the result in SI and DI prior to the
; procedure call.
        MOV     SI, OFFSET BCD          ; SI now points to BCD
        MOV     DI, OFFSET BIN          ; DI points to BIN
                                        ; (returned value)
        CALL    BCD_BINARY              ; call the conversion
                                        ; procedure
        NOP                             ; continue with program
                                        ; here

; PROCEDURE     : BCD_BINARY - Converts a BCD number to binary.
; INPUT         : SI points to the BCD value, DI to the result
; OUTPUT        : binary value stored at [DI]
; DESTROYS      : nothing
BCD_BINARY      PROC    NEAR
        PUSHF
        PUSH    AX
        PUSH    BX
        PUSH    CX
        MOV     AL, [SI]                ; get the BCD value
        :
        :                               ; (the conversion steps follow as in
                                        ; Program 3 (a))
PROGRAM 3: Version 3 (parameter passed on the stack)

DATA_SEG        SEGMENT
BCD             DB      25h
BIN             DB      ?
DATA_SEG        ENDS

STACK_SEG       SEGMENT STACK
                DW      100 DUP(0)
TOP_STACK       LABEL   WORD
STACK_SEG       ENDS
CODE_SEG        SEGMENT
        ASSUME CS:CODE_SEG, DS:DATA_SEG, SS:STACK_SEG
START:  MOV     AX, DATA_SEG            ; initialise data segment
        MOV     DS, AX                  ; using AX register
        MOV     AX, STACK_SEG           ; initialise stack segment
        MOV     SS, AX                  ; using AX register
        MOV     SP, OFFSET TOP_STACK    ; initialise stack pointer
        MOV     AL, BCD                 ; move BCD value into AL
        PUSH    AX                      ; and push it onto the word stack
        CALL    BCD_BINARY              ; do the conversion
        POP     AX                      ; get the binary value
        MOV     BIN, AL                 ; and save it
        NOP                             ; continue with program
; PROCEDURE    : BCD_BINARY - Converts a BCD number to binary.
; INPUT        : None - BCD value assumed to be on the stack before the call
; OUTPUT       : None - binary value on top of the stack after the return
; DESTROYS     : Nothing
BCD_BINARY      PROC    NEAR
        PUSHF                           ; save flags
        PUSH    AX                      ; and registers
        PUSH    BX
        PUSH    CX
        PUSH    BP                      ; BP. Why BP?
        MOV     BP, SP                  ; make a copy of the
                                        ; stack pointer in BP
        MOV     AX, [BP + 12]           ; get the BCD number from the stack.
                                        ; But why is it at location [BP + 12]?
; Please note 5 PUSH statements + 1 CALL which is intra-segment (so just IP is
; stored): a total of 6 words are pushed after the parameter, and since it is a
; word stack, the BCD value lies 6 * 2 = 12 locations above the stack top.
; Hence [BP + 12] (refer to the figure given below).
        MOV     BL, AL                  ; save copy of BCD in BL
        AND     BL, 0Fh                 ; mask lower 4 bits
        AND     AL, 0F0h                ; separate upper 4 bits
        MOV     CL, 04                  ; move upper BCD digit to low
        ROR     AL, CL                  ; position for multiply
        MOV     BH, 0Ah                 ; load 10 in BH
        MUL     BH                      ; multiply upper BCD digit in AL by 10;
                                        ; the result is in AL
        ADD     AL, BL                  ; add lower BCD digit to the result
        MOV     [BP + 12], AX           ; put the binary result on the stack
        ; restore flags and registers
        POP     BP
        POP     CX
        POP     BX
        POP     AX
        POPF
        RET
BCD_BINARY      ENDP
CODE_SEG        ENDS
        END     START
Discussion:
The parameter is pushed on the stack before the procedure call. The procedure call
causes the current instruction pointer to be pushed onto the stack. In the procedure,
the flags and the AX, BX, CX and BP registers are also pushed, in that order. Thus
the stack looks like:

        X               <- stack top before PUSH AX (SP = 0090h)
        AH              <- parameter, after PUSH AX (SP = 008Eh)
        AL
        IP HIGH         <- return address (CALL)
        IP LOW
        FLAG H          <- PUSHF
        FLAG L
        AH              <- PUSH AX
        AL
        BH              <- PUSH BX
        BL
        CH              <- PUSH CX
        CL
        BP HIGH         <- PUSH BP
        BP LOW          <- SP (and BP) after MOV BP, SP
        :
        :

The instruction MOV BP, SP transfers the contents of SP to the BP register. Now
BP can be used to access any location in the stack by adding an appropriate offset to
it. For example, the instruction MOV AX, [BP + 12] transfers the word beginning at
the 12th byte above the top of the stack to the AX register. It does not change the
contents of the BP register or of the top of the stack: it copies the parameter value
(the pushed AH and AL) at offset 008Eh into the AX register. This instruction is not
equivalent to a POP instruction.

Stacks are useful for writing procedures for multi-user system programs or
recursive procedures. It is good practice to make a stack diagram, as above, while
using procedure calls through stacks; this helps in reducing programming errors.
3. STACK: If segments in different object modules have the same name and the
   combine type is STACK, they become one segment whose length is the sum of
   the lengths of the individual segments.

These details will become clearer after you go through Program 4 and the further
readings.
Identifiers
a)
b)
Let us explain all the above with the help of the following example:
PROGRAM 4:
Write a procedure that divides a 32-bit number by a 16-bit number. The procedure
should be general: it is defined in one module, and can be called from another
assembly module.

; REGISTERS  : Uses CS, DS, SS, AX, SP, BX, CX
; PROCEDURES : FAR procedure SMART_DIV
DATA_SEG        SEGMENT WORD PUBLIC
DIVIDEND        DW      2345h, 89ABh    ; dividend =
                                        ; 89AB2345h
DIVISOR         DW      5678h           ; 16-bit divisor
MESSAGE         DB      'INVALID DIVIDE', '$'
DATA_SEG        ENDS
MORE_DATA       SEGMENT WORD
QUOTIENT        DW      2 DUP(0)
REMAINDER       DW      0
MORE_DATA       ENDS

STACK_SEG       SEGMENT STACK
                DW      100 DUP(0)
TOP_STACK       LABEL   WORD
STACK_SEG       ENDS

                PUBLIC  DIVISOR
PROCEDURES      SEGMENT WORD PUBLIC
                EXTRN   SMART_DIV:FAR   ; SMART_DIV is declared as an
                                        ; external label in the PROCEDURES
                                        ; segment, of type FAR
PROCEDURES      ENDS
; declare the code segment as PUBLIC so that it can be merged with other PUBLIC
; segments
CODE_SEG        SEGMENT WORD PUBLIC
        ASSUME CS:CODE_SEG, DS:DATA_SEG, SS:STACK_SEG
START:  MOV     AX, DATA_SEG            ; initialize data segment
        MOV     DS, AX                  ; using AX register
        MOV     AX, STACK_SEG           ; initialize stack segment
        MOV     SS, AX                  ; using AX register
        MOV     SP, OFFSET TOP_STACK    ; initialize stack pointer
        MOV     AX, DIVIDEND            ; load low word of
                                        ; dividend
        MOV     DX, DIVIDEND + 2        ; load high word of
                                        ; dividend
        MOV     CX, DIVISOR             ; load divisor
        :
        :                               ; (call SMART_DIV and store the results)
; MODULE       : PROCEDURES
; INPUT        : DX:AX = 32-bit dividend, CX = 16-bit divisor
; OUTPUT       : DX:AX = quotient, CX = remainder; carry set on divide by zero
; The procedure SMART_DIV is defined below, in the PROCEDURES segment.
PROCEDURES      SEGMENT PUBLIC
SMART_DIV       PROC    FAR
        ASSUME CS:PROCEDURES, DS:DATA_SEG
        CMP     DIVISOR, 0              ; this is just to demonstrate the use of
                                        ; an external variable; otherwise we can
                                        ; check it through the CX register, which
                                        ; contains the divisor
        JE      ERROR_EXIT              ; if divisor = 0, exit procedure
        MOV     BX, AX                  ; save low order of dividend
        MOV     AX, DX                  ; position high word for 1st divide
        MOV     DX, 0000h               ; zero DX
        DIV     CX                      ; DX:AX / CX, quotient in AX,
                                        ; remainder in DX
        MOV     BP, AX                  ; transfer high order of final result to BP
        MOV     AX, BX                  ; get back low order of dividend. Note
                                        ; DX contains the remainder, so DX:AX is
                                        ; the actual number
        DIV     CX                      ; DX:AX / CX, quotient in AX,
                                        ; 2nd remainder, that is the final
                                        ; remainder, in DX
        MOV     CX, DX                  ; pass final remainder in CX
        MOV     DX, BP                  ; pass high order of quotient in DX;
                                        ; AX contains the low word of the quotient
        CLC                             ; clear carry to indicate valid result
        JMP     EXIT                    ; finished
ERROR_EXIT:
        STC                             ; set carry to indicate divide by zero
EXIT:   RET
SMART_DIV       ENDP
PROCEDURES      ENDS
        END
Discussion:
The procedure accesses the data item named DIVISOR, which is defined in the
main module; therefore the statement EXTRN DIVISOR:WORD is necessary to
inform the assembler that this data name is found in some other segment. The data
type is defined to be of word type. Please note that DIVISOR is declared in the
same segment name as that of the main module, that is DATA_SEG, and that the
procedure is in a PUBLIC segment.
(a)
(b)     A FAR call uses one word in the stack for storing the return address.
(c)
(d)
(e)
(f)
(g)
(h)
(i)
(j)
2.
4.4 INTERFACING ASSEMBLY LANGUAGE TO HIGH-LEVEL
    LANGUAGES
By now you can write procedures, both external and internal, and pass parameters,
especially through the stack. Let us use these concepts to see how assembly
language can be interfaced to some high-level language programs. It is very
important to learn this, because it lets you combine the advantages of both types of
languages: the ease of programming of high-level languages, and the speed and
reach of assembly language. Assembly language can be interfaced with most
high-level languages, such as C and C++, and with database management systems.

What are the main considerations for interfacing assembly with a high-level
language (HLL)? The answers depend on the HLL. Let us take C as the language for
interfacing. C is very useful for writing user-interface programs, but the code
produced by a C compiler does not always execute fast enough for
telecommunications or graphics applications. Therefore, system programs are often
written with a combination of C and assembly language functions: the main user
interface may be written in C, and specialised high-speed functions in assembly
language.
The guidelines for calling assembly subroutines from C are:

(i)     Memory model: The calling program and the called assembly program
        must be defined with the same memory model. One of the most common
        conventions, which makes NEAR calls, is .MODEL SMALL, C.

(ii)    Naming convention: The naming convention normally involves an
        underscore (_) character preceding the segment or function name. Please
        note, however, that this underscore is not used while making the call from a
        C function. Please be careful about case sensitivity.

        You must give a specific segment name to the code segment of your
        assembly language subroutine. The name varies from compiler to
        compiler. Microsoft C and Turbo C require the code segment name to be
        _TEXT, or a segment name with the suffix _TEXT; they also require the
        name _DATA for the data segment.
(iii)   The arguments from C to the assembly language routine are passed through
        the stack. For example, a function call in C:

        function_name (arg1, arg2, ..., argn) ;

        pushes the value of each argument on the stack in reverse order: argn is
        pushed first and arg1 is pushed last. A value or a pointer to a variable can
        be passed on the stack. Since the stack in the 8086 is a word stack, values
        and pointers are stored as words on the stack, or as multiples of the word
        size when a value exceeds 16 bits.
(iv)    You should remember to save any special-purpose registers (such as CS,
        DS, SS, ES, BP, SI or DI) that may be modified by the assembly language
        routine. If you fail to save them, you may face undesirable or
        unexplainable consequences when control is returned to the C program.
        There is, however, no need to save AX, BX, CX or DX, as they are
        considered volatile.

(v)

(vi)    Returned value: The called assembly routine uses the following registers
        for returned values:

        C type          Size                    Register
        char            byte (DB)               AL
        int / near      word (DW)               AX
        long / far      double word (DD)        DX:AX
PROGRAM 5:
Write an assembly function that hides the cursor. Call it from a C program.

        .MODEL  small, C
        .CODE
        PUBLIC  CUROFF
CUROFF  PROC
        MOV     AH, 3                   ; get the current cursor position
        XOR     BX, BX                  ; empty the BX register
        INT     10h                     ; use INT 10h to do the above
        OR      CH, 20h                 ; force the OFF condition
        MOV     AH, 01                  ; set the new cursor values
        INT     10h
        RET
CUROFF  ENDP
        END
For details on the various interrupt functions used in this program, refer to the
further readings.

The C program to test this routine is as follows:

#include <stdio.h>
void curoff(void);
void main()
{
        printf("%s\n", "The cursor is now turning off");
        curoff();
}

You can write another procedure in assembly language to turn the cursor on. This
can be done by replacing the OR CH, 20h instruction with AND CH, 1Fh. You can
call this new function from the C program to turn the cursor on after curoff.
PROGRAM 6:
Write a subroutine, callable from C, for toggling the cursor, using the old (full)
segment directives.

; use small memory model for C near code segment
_DATA   SEGMENT WORD PUBLIC 'DATA'
CURVAL  EQU     [BP+4]                  ; parameter
_DATA   ENDS

DGROUP  GROUP   _DATA
_TEXT   SEGMENT BYTE PUBLIC 'CODE'
        ASSUME  CS:_TEXT, DS:DGROUP, SS:DGROUP
        PUBLIC  _CURSW
_CURSW  PROC    NEAR
        PUSH    BP                      ; BP register of caller is saved
        MOV     BP, SP                  ; BP is pointing to the stack now
        MOV     AX, CURVAL              ; get the parameter
        CMP     AX, 0H
        JZ      CUROFF                  ; execute code for cursor off
        CMP     AX, 01H
        JZ      CURON                   ; execute code for cursor on
        JMP     OVER                    ; error in parameter, do nothing
CUROFF:                                 ; write code for cursor off here
        :
        :
        JMP     OVER
CURON:                                  ; write code for cursor on here
        :
        :
OVER:   POP     BP                      ; restore the caller's BP
        RET
_CURSW  ENDP
_TEXT   ENDS
        END
Why is the parameter found at [BP+4]? Look at the stack for the answer:

        Parameter (0 or 1)      BP + 4
        Return address          BP + 2
        Old value of BP         BP + 0
PROGRAM 7:
Write a subroutine, callable from C, that toggles the cursor. It takes one argument
whose value selects between on (1) and off (0), using simplified directives:

        .MODEL  small, C
        .CODE
        PUBLIC  CURSW
CURSW   PROC    switch:word
        MOV     AX, switch              ; get flag value
        OR      AX, AX                  ; test zero / nonzero
        :
        :                               ; routine to test the switch and turn
                                        ; the cursor on or off accordingly
4.5 INTERRUPTS

Interrupts are signals that cause the central processing unit to suspend the currently
executing program and transfer to a special program called an interrupt handler. The
interrupt handler determines the cause of the interrupt, services the interrupt, and
finally returns control to the point of interruption. Interrupts are caused by events,
external or internal to the CPU, that require immediate attention. Some external
events that cause interrupts are:

- Completion of an I/O process
- Detection of a hardware failure

An 8086 interrupt can occur because of one of the following reasons:

1.      An external signal applied to the NMI or INTR input pin (a hardware
        interrupt).
2.      Execution of an INT instruction (a software interrupt).
3.      A condition produced by an instruction, such as an attempt to divide by
        zero (an exception).
A typical interrupt service routine (ISR) performs the following steps:

- Save the system context (the registers, flags, etc. that will be modified by the
  ISR).
- Disable the interrupts that may cause interference if allowed to occur during this
  ISR's processing.
- Enable those interrupts that may still be allowed to occur during this ISR's
  processing.
- Determine the cause of the interrupt.
- Take the appropriate action for the interrupt, such as receiving and storing data
  from the serial port, or setting a flag to indicate the completion of a disk sector
  transfer.
- Restore the system context.
- Re-enable any interrupt levels that were blocked during this ISR's execution.
- Resume execution of the process that was interrupted.
MS-DOS provides facilities that enable you to install well-behaved interrupt
handlers, such that they will not interfere with operating system functions or other
interrupt handlers. These functions are:

        Function                        Action
        Int 21h function 25h            Set interrupt vector
        Int 21h function 35h            Get interrupt vector
        Int 21h function 31h            Terminate and stay resident
Here are a few rules that must be kept in mind while writing your own interrupt
service routines:

1.      Use Int 21h, function 35h to get the required entry from the interrupt
        vector table (IVT). Save this entry for later use.
2.      Use Int 21h, function 25h to modify the IVT.
3.      If your program is not going to stay resident, save the contents of the IVT,
        and later restore them when your program exits.
4.      If your program is going to stay resident, use one of the terminate and stay
        resident functions to reserve the proper amount of memory for your
        handler.
Let us now write an interrupt routine to handle division by zero. This file can be
loaded like a COM file, but it makes itself permanently resident for as long as the
system is running.

This ISR is divided into two major sections: the initialisation and the interrupt
handler. The initialisation procedure (INIT) is executed only once, when the
program is run from the DOS level. INIT takes over the type-zero interrupt vector,
prints a sign-on message, and then performs a terminate-and-stay-resident exit to
MS-DOS. This special exit reserves the memory occupied by the program, so that it
is not overwritten by subsequent application programs. The interrupt handler
(ZDIV) receives control when a divide-by-zero interrupt occurs.

CR      EQU     0Dh
LF      EQU     0Ah
BEEP    EQU     07h
BACKSP  EQU     08h

INIT    PROC    NEAR
        MOV     AX, 2500h               ; function 25h: set interrupt vector 0
        MOV     DX, OFFSET ZDIV         ; DS:DX points to the new handler
        INT     21h
        MOV     AH, 09                  ; function 09h: display the
        MOV     DX, OFFSET SIGNON       ; sign-on message
        INT     21h
        :
        :                               ; terminate and stay resident
INIT    ENDP
ZDIV    PROC    FAR
        PUSH    AX                      ; save all registers used
        PUSH    BX
        PUSH    CX
        PUSH    DX
        PUSH    SI
        PUSH    DI
        PUSH    BP
        PUSH    DS
        PUSH    ES
        MOV     AX, CS                  ; set DS = CS so the messages
        MOV     DS, AX                  ; can be addressed
        MOV     DX, OFFSET WARN         ; display the warning message
        MOV     AH, 9
        INT     21h
ZDIV1:  MOV     AH, 1                   ; read the operator's response
        INT     21h
        CMP     AL, 'C'                 ; continue?
        JE      ZDIV3
        CMP     AL, 'Q'                 ; quit?
        JE      ZDIV2
        MOV     DX, OFFSET BAD          ; invalid response: prompt again
        MOV     AH, 9
        INT     21h
        JMP     ZDIV1
ZDIV2:  MOV     AX, 4CFFh               ; quit: exit with return code FFh
        INT     21h
ZDIV3:  MOV     DX, OFFSET CRLF         ; continue: print CR/LF, restore the
        MOV     AH, 9                   ; registers and return to the
        INT     21h                     ; interrupted program
        POP     ES
        POP     DS
        POP     BP
        POP     DI
        POP     SI
        POP     DX
        POP     CX
        POP     BX
        POP     AX
        IRET
ZDIV    ENDP
SIGNON  DB      CR, LF, '…', '$'        ; sign-on message
WARN    DB      CR, LF, BEEP, '…', '$'  ; divide-by-zero warning and prompt
BAD     DB      CR, LF, '…', '$'        ; invalid-response message
CRLF    DB      CR, LF, '$'
PGM_LEN EQU     $ - INIT
CSEG    ENDS
        END
4.6 DEVICE DRIVERS IN ASSEMBLY

A device driver consists of three parts: the header, the strategy procedure, and the
interrupt procedure. The driver has either a .sys or an .exe extension and is
originated at offset address 0000h.
The Header

The header contains information that allows DOS to identify the driver. It also
contains pointers that allow it to chain to other drivers loaded into the system. The
header section of a device driver is 18 bytes in length and contains pointers and the
name of the driver. The structure of the header is:

CHAIN   DD      -1              ; link to next driver
ATTR    DW      0               ; driver attribute
STRT    DW      START           ; address of strategy procedure
INTER   DW      INT             ; address of interrupt procedure
DNAME   DB      'MYDRIVER'      ; driver name

The first double word contains a -1, which informs DOS that this is the last driver
in the chain. If additional drivers are added, DOS inserts a chain address into this
double word as a segment and offset address; the chain address points to the next
driver in the chain. This allows additional drivers to be installed at any time.
The attribute word indicates the type of headers included for the driver and the type
of device the driver installs. It also indicates whether the driver controls a character
device or a block device.
The Strategy Procedure

The strategy procedure is called when the driver is loaded into memory by DOS,
and whenever the controlled device requests service. The main purpose of the
strategy procedure is to save the request header and its address for use by the
interrupt procedure. The request header is used by DOS to communicate commands
and other information to the interrupt procedure in the device driver.

The request header contains its own length as its first byte. This is necessary
because the length of the request header varies from command to command. The
return status word communicates information back to DOS from the device driver.

The initialise driver command (00H) is always executed when DOS initialises the
device driver. The initialisation command passes a message to the video display
indicating that the driver is loaded into the system, and returns to DOS the amount
of memory needed by the driver. Only a limited set of DOS INT 21H functions may
be used during initialisation. You can get more details on the strategy procedure
from the further readings.
The Interrupt Procedure
The interrupt procedure uses the request header to determine the function requested by
DOS, and it performs all the functions of the device driver. The interrupt procedure
must respond to at least the initialise driver command (00H) and any other
commands required to control the device operated by the device driver. You may
refer to the further readings for more details and examples of device drivers.
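As an illustration, the skeleton of an interrupt procedure might look as follows. The
sketch assumes the strategy procedure saved the request-header address in RH_PTR,
that the command code is the byte at offset 2 of the request header and the status word
is at offset 3, and uses the label INTR (a stand-in for the INT label named in the device
header, since INT is a reserved word in MASM):

INTR    PROC FAR
        PUSH DS
        PUSH BX
        LDS  BX, CS:RH_PTR            ; DS:BX -> request header
        CMP  BYTE PTR [BX+2], 00H     ; initialise driver command?
        JE   DO_INIT
        MOV  WORD PTR [BX+3], 8103H   ; done + error: unknown command
        JMP  DONE
DO_INIT:
        ; ... driver initialisation code ...
        MOV  WORD PTR [BX+3], 0100H   ; status: done, no error
DONE:   POP  BX
        POP  DS
        RET
INTR    ENDP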
Assembly Language
Programming
(Part II)
4.7 SUMMARY
In this unit, we studied several programming techniques, ranging from arrays
to interrupts.
Arrays can be of byte type or word type, but the addressing of the arrays is always
done with respect to bytes. For a word array, the address will be incremented by two
for the next access.
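For instance, to read successive elements of a hypothetical word array WLIST, the
index is advanced by 2 bytes each time:

WLIST   DW  10, 20, 30         ; word array
        MOV  SI, 0
        MOV  AX, WLIST[SI]     ; first element
        ADD  SI, 2             ; next word lies 2 bytes ahead
        MOV  AX, WLIST[SI]     ; second element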
As the programs become larger and larger, it becomes necessary to divide them into
smaller modules called procedures. The procedures can be NEAR or FAR depending
upon where they are defined and from where they are called. The parameters to the
procedures can be passed through registers, through memory, or through the stack.
Passing parameters in registers is easier, but limits the total number of parameters
that can be passed. Passing parameters through memory locations is straightforward,
but limits the reusability of the procedure. Passing parameters through the stack is the
most complex of all, but is the standard way to do it. Even when assembly language
programs are interfaced to high-level languages, the parameters are passed on the stack.
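A sketch of stack-based passing for a hypothetical NEAR procedure SUM2 that adds
two words (with a NEAR call the return address occupies one word, so the parameters
start at [BP+4] after BP is saved):

        MOV  AX, 5
        PUSH AX                ; second parameter
        MOV  AX, 7
        PUSH AX                ; first parameter
        CALL SUM2              ; result returned in AX
        ADD  SP, 4             ; caller removes the parameters

SUM2    PROC NEAR
        PUSH BP
        MOV  BP, SP
        MOV  AX, [BP+4]        ; first parameter
        ADD  AX, [BP+6]        ; second parameter
        POP  BP
        RET
SUM2    ENDP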
Interrupt Service Routines are used to service interrupts that may arise because of
some exceptional condition. An interrupt service routine can be modified by rewriting
it and overwriting its entry in the interrupt vector table.
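For example, a new service routine can be installed through the DOS services: INT
21H function 35H reads the current vector and function 25H writes a new one. The
interrupt number 60H and the label NEW_ISR below are hypothetical:

        MOV  AH, 35H            ; get interrupt vector
        MOV  AL, 60H            ; interrupt number
        INT  21H                ; old vector returned in ES:BX
        ; save ES:BX here if the old routine must be restored later
        PUSH CS
        POP  DS
        MOV  DX, OFFSET NEW_ISR ; DS:DX -> new service routine
        MOV  AH, 25H            ; set interrupt vector
        MOV  AL, 60H
        INT  21H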
This completes the discussion on microprocessors and assembly language
programming. The above programming was done for 8086 microprocessor, but can be
tried on 80286 or 80386 processors also, with some modification in the assembler
directives. The assembler used here is MASM, Microsoft assembler. The assembly
language instructions remain the same for all assemblers, though the directives vary
from one assembler to another. For further details on the assembler, you can refer to
their respective manuals. You must refer to further readings for topics such as
Interrupts, device drivers, procedures etc.
4.8 SOLUTIONS/ANSWERS
We will give you an algorithm using the XLAT instruction. Please code and run the
program yourself.
2.
(a) True (b) False (c) False (d) True (e) False (f) True (g) True (h) False
(i) False (j) True.
2.
(a), (b) [Stack diagram: three snapshots of the stack (Original, after (a), after (b)),
showing SP and the stored bytes 00, 50, 30 and 55, with the stack growing toward the
low address.]
(c) The return from FIRST can occur only after the return from SECOND. Therefore,
the stack will be back in its original state.
(a) False (b) False (c) True (d) True (e) False
2.