
Coa Unit 1 Notes


UNIT 1

BASIC STRUCTURE OF COMPUTERS

INTRODUCTION
COMPUTER
A computer is a machine that accepts input information in digitized form, processes it according to a set of stored instructions, and produces the resulting output information.
PROGRAM AND DATA
The set of stored instructions written to solve a task is called a program, and the input and output information is called data.
The internal storage where programs are stored is called memory.
Characteristics of computer

1 Speed: Computers perform various operations at a very high speed.


2 Accuracy: Computers are very accurate. They do not make mistakes in calculations.
3 Reliability: Computers give correct and consistent results, even when used in adverse conditions. Most errors are caused by human intervention, not by the computer. Computer output is reliable provided that the input data and the instructions (programs) are correct; incorrect input data and unreliable programs give wrong results.
4 Storage Capacity: A computer can store a large amount of data, which can be retrieved at any time in a fraction of a second. The data can be kept in permanent storage devices such as hard disks and CDs.
5 Versatility: Computers can do a variety of jobs based on the instructions given to them. They are used in almost every field, making tasks easier.
Limitations of a Computer:-
1) Not intelligent
2) Inactive

Computer = Hardware + Software

Hardware:
• Hardware is the physical aspect of computers, telecommunications, and other devices.
• Hardware implies permanence and invariability.
• The components include the keyboard, floppy drive, hard disk, monitor, CPU, printer, wires, transistors, circuits, etc.

Software:
It is a set of programs used to perform certain tasks.

A program is a set of instructions that carries out a particular task.

Hardware and Software

Hardware | Software
The physical components that make up the system. | A set of programs used to perform certain tasks (the logical component).
Components include the keyboard, floppy drive, hard disk, monitor, CPU, printer, wires, transistors, circuits, etc. | Examples include compilers, loaders, banking software, library software, payroll software, etc.
Hardware works based on instructions. | Software tells the hardware what to do.

TYPES OF COMPUTER
Computers are classified according to size, cost, processing power, and type of use.
Some types of computers are:
• Personal computers (PCs)
  Widely used in homes, schools, and business offices.
• Notebook computers
  A compact version of the PC, with all the components packaged into a single unit.
• Workstations
  They have high-resolution graphics input/output capabilities.
• Desktop computers
  They have processing and storage units, a visual display, and audio output units.
• Enterprise systems (mainframes) and servers
  Mainframes are used for business data processing in medium and large corporations that require more computing power and storage capacity.
  Servers contain sizable database storage units and are capable of handling large volumes of requests to access the data. The requests and responses are transported over Internet communication facilities.
• Supercomputers
  Used for large-scale numerical calculations required in applications such as weather forecasting and aircraft design and simulation.

I. FUNCTIONAL UNITS
A computer consists of five functionally independent main parts:
• Input unit
• Memory unit
• Arithmetic and logic unit
• Output unit
• Control unit

Figure: Basic functional units of a computer (input unit, output unit, main memory unit, secondary memory, ALU, and control unit).

The operation of a computer can be summarized as follows:


• The computer accepts programs and data through an input unit and stores them in the memory.
• The stored data are processed by the arithmetic and logic unit under program control.
• The processed data are delivered through the output unit.
• All of the above activities are directed by the control unit.
• Information is either stored in the computer's memory for later use or used immediately by the ALU to perform the desired operations.
• Instructions are explicit commands that
  - manage the transfer of information within a computer as well as between the computer and its I/O devices;
  - specify the arithmetic and logic operations to be performed.
• To execute a program, the processor fetches the instructions one after another and performs the desired operations.
• The processor accepts only machine language programs.
• A compiler is used to obtain the machine language program.
Note: A compiler is software (a translator) that converts a high-level language program (source program) into a machine language program (object program).
Input unit
• The computer accepts coded information through the input unit. The input can come from human operators, from electromechanical devices such as keyboards, or from other computers over communication lines.
• Examples of input devices:
  - Keyboards.
  - Joysticks, trackballs, and mice, which are used as graphic input devices in conjunction with a display.
  - Microphones, which capture audio input that is then sampled and converted into digital codes for storage and processing.
Keyboard
• It is a common input device.
• Whenever a key is pressed, the corresponding letter or digit is automatically translated into its binary code and transmitted over a cable to the memory of the computer.
Memory unit
• The memory unit stores programs as well as data.
• Memory is classified into primary and secondary storage.
Primary storage
• It is also called main memory.
• It operates at high speed and is expensive.
• It is made up of a large number of semiconductor storage cells, each capable of storing one bit of information.
• These cells are handled in groups of fixed size called words. This allows the contents of one word (n bits) to be read or written in a single basic operation, instead of one bit per operation.
• Each word is associated with a distinct address that identifies its location. A given word is accessed by specifying its address.
• Word length
  - The number of bits in each word is called the word length of the computer.
  - Typical word lengths range from 16 to 64 bits.
• Programs must reside in the primary memory during execution.
• RAM
  - RAM stands for Random Access Memory: memory in which any location can be reached in a short and fixed amount of time by specifying its address.
  - Memory access time
    - The time required to access one word is called the memory access time.
    - This time is fixed and independent of the location of the word being accessed.
    - It typically ranges from a few nanoseconds (ns) to about 100 ns.
• Caches
  - They are small, fast RAM units.
  - They are tightly coupled with the processor.
  - They are often contained on the same integrated-circuit (IC) chip as the processor to achieve high performance.
Secondary storage
• It is slower than primary memory.
• It is cheaper than primary memory.
• Its capacity is high.
• It is used to store information that is not accessed frequently.
• Secondary storage devices include magnetic tapes and disks, optical disks (CD-ROMs), and floppy disks.
Arithmetic and logic unit
The arithmetic and logic unit (ALU) and the control unit together form the processor.
Most computer operations are actually executed in the ALU of the processor.
Example:
Suppose two numbers located in the memory are to be added. They are brought into the processor, and the actual addition is carried out by the ALU.
Registers:
• Registers are high-speed storage elements in the processor.
• Each register can store one word of data.
• When operands are brought into the processor for an operation, they are stored in registers.
• Accessing data in a register is faster than accessing data in the memory.
Output unit
• The function of the output unit is to present processed results to the outside world in human-understandable form.
• Examples of output devices are graphical displays and printers (inkjet, laser, dot matrix, and so on). Laser printers are generally the fastest of these.
Control unit
• The control unit coordinates the operation of the memory, the arithmetic and logic unit, the input unit, and the output unit. It sends control signals to the other units and senses their states.
• Example:
  - Data transfers between the processor and the memory are controlled by the control unit through timing signals.
  - Timing signals are signals that determine when a given action is to take place.
  - Conceptually, the control unit is a well-defined, physically separate unit that interacts with other parts of the machine; in practice, much of the control circuitry is physically distributed throughout the machine.
  - A set of control lines carries the signals used for timing and synchronization of events in all units.
Differences between:

Primary Memory | Secondary Memory
Also called main memory. | Also called auxiliary memory.
Data access is faster. | Data access is slower.
The CPU can access it directly. | The CPU cannot access it directly.
Semiconductor memory. | Magnetic memory.
Data storage capacity is smaller. | Data storage capacity is much larger.
Expensive. | Inexpensive.
Internal memory. | External memory.
Examples: RAM, ROM. | Examples: hard disk, floppy disk, magnetic tape, etc.

RAM | ROM
Random Access Memory. | Read Only Memory.
Volatile memory. | Non-volatile memory.
Contents are lost when power is turned off. | Contents are not lost when power is turned off.
Temporary storage medium. | Permanent storage medium.
Data can be read and written. | Data can only be read; it cannot be written.
Programs are brought into RAM just before execution. | BIOS and monitor programs are stored in ROM.

Categories of Software:

System software | Application software
A collection of programs written by expert programmers or manufacturers. | A collection of programs written by users (programmers).
Used to control the computer system. | Written to perform a particular task.
Helps in executing other programs. | Not used for executing other programs.
Examples: compilers, loaders, operating systems, etc. | Examples: banking software, library software, payroll software, etc.

II.BASIC OPERATIONAL CONCEPTS

To perform a given task on a computer, an appropriate program must be stored in the memory.


Individual instructions are brought from the memory into the processor, which executes the specified
operations. Data to be used as operands are also stored in the memory.
Consider an instruction
Add LOCA, R0
• This instruction adds the operand at memory location LOCA to the operand in register R0 of the processor and stores the result in R0.
• The original contents of LOCA are preserved, whereas the contents of R0 are overwritten.
This instruction requires the following steps:
1) The instruction is fetched from the memory into the processor.
2) The operand at LOCA is fetched and added to the contents of R0.
3) The resulting sum is stored in register R0.
The Add instruction above combines a memory access operation with an ALU operation. The same task can be performed by a two-instruction sequence:
Load LOCA, R1
Add R1, R0
• The first instruction transfers the contents of memory location LOCA into register R1.
• The second instruction adds the contents of R1 and R0 and places the sum in R0.
• The first instruction destroys the old contents of R1 and preserves the value of LOCA; the second instruction destroys the old contents of R0.
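The equivalence of the single Add instruction and the Load/Add pair can be checked with a small simulation. This is only a sketch: the dictionaries standing in for memory and registers, the helper function names, and the sample values 7 and 5 are assumptions made for illustration.

```python
# Illustrative model: memory and registers are plain dictionaries.
memory = {"LOCA": 7}
initial_regs = {"R0": 5, "R1": 0}

def add_mem_to_reg(mem, regs, loc, reg):
    """Add LOCA, R0: reg <- [loc] + [reg]; the memory location is unchanged."""
    regs[reg] = mem[loc] + regs[reg]

def load(mem, regs, loc, reg):
    """Load LOCA, R1: reg <- [loc]; the old contents of reg are destroyed."""
    regs[reg] = mem[loc]

def add_reg(regs, src, dst):
    """Add R1, R0: dst <- [src] + [dst]; the old contents of dst are destroyed."""
    regs[dst] = regs[src] + regs[dst]

# Single-instruction form.
single = dict(initial_regs)
add_mem_to_reg(memory, single, "LOCA", "R0")

# Two-instruction form.
double = dict(initial_regs)
load(memory, double, "LOCA", "R1")
add_reg(double, "R1", "R0")

print(single["R0"], double["R0"], memory["LOCA"])  # 12 12 7
```

Both forms leave the same sum in R0 and leave LOCA untouched; the two-instruction form additionally overwrites R1.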
Connection between memory and the processor
Transfers between the memory and the processor are started by sending the address of the memory location to be accessed to the memory unit and issuing the appropriate control signals. The data are then transferred to or from the memory.

Figure: Connections between the processor and the memory. The processor contains the MAR, MDR, control circuitry, PC, IR, ALU, and n general-purpose registers R0 to Rn-1.


In addition to the ALU and the control unit, the processor contains a number of registers used for different purposes:
• Instruction register (IR)
• Program counter (PC)
• Memory address register (MAR)
• Memory data register (MDR)
• General-purpose registers (R0 to Rn-1)

Instruction register (IR): The IR holds the instruction that is currently being executed. Its output is available to the control circuits, which generate the timing signals that control the various processing elements involved in executing the instruction.
Program counter (PC): A special-purpose register that contains the address of the next instruction to be fetched and executed. During the execution of an instruction, the PC is updated to point to the next instruction. It thus keeps track of the execution of the program.
Memory address register (MAR): The MAR holds the address of the memory location to be accessed.
Memory data register (MDR): The MDR contains the data to be written into or read from the memory location pointed to by the MAR. Together, the MAR and the MDR facilitate communication between the memory and the processor.
Operating steps
• Initially the program resides in the memory (it usually gets there through the input unit), and the PC is set to point to the first instruction of the program.
• The contents of the PC are transferred to the MAR, and a Read control signal is sent to the memory. The addressed word (in this case, the first instruction of the program) is read out of the memory and loaded into the MDR.
• Next, the contents of the MDR are transferred to the IR. At this point the instruction is ready to be decoded and executed.
• If the instruction involves an operation to be performed by the ALU, the required operands must be obtained. If an operand resides in the memory (it could also be in a general-purpose register in the processor), it is fetched into the MDR by sending its address to the MAR and initiating a Read cycle. The fetched operand is then transferred to the ALU. After one or more operands have been fetched in this way, the ALU can perform the desired operation.
• If the result of the operation is to be stored in the memory, the result is sent to the MDR, the address of the location where it is to be stored is sent to the MAR, and a Write cycle is initiated.
• During the execution of the current instruction, the contents of the PC are incremented to point to the next instruction to be executed. Thus, as soon as execution of the current instruction is completed, a new instruction fetch may be started.
• Note: In addition to transferring data between the memory and the processor, the computer accepts data from input devices and sends data to output devices. Some machine instructions are therefore provided to handle I/O transfers.
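The operating steps can be traced with a toy interpreter. This is a sketch under several assumptions that are not in the notes: instructions are stored as Python tuples, each instruction occupies one memory word, and the Store and Halt instructions are hypothetical additions so that the program can write a result and stop.

```python
def run(memory, start):
    """Run a tiny program; returns the general-purpose registers at Halt."""
    regs = {"R0": 0, "R1": 0}
    pc = start
    while True:
        mar = pc               # contents of PC -> MAR
        mdr = memory[mar]      # Read cycle: addressed word -> MDR
        ir = mdr               # contents of MDR -> IR
        pc += 1                # PC now points to the next instruction
        op, *args = ir         # decode
        if op == "Load":       # Load LOC, Ri: operand fetched through MAR/MDR
            loc, reg = args
            mar = loc
            mdr = memory[mar]
            regs[reg] = mdr
        elif op == "Add":      # Add Ri, Rj: Rj <- [Ri] + [Rj], done by the ALU
            src, dst = args
            regs[dst] = regs[src] + regs[dst]
        elif op == "Store":    # hypothetical: result -> MDR, address -> MAR, Write
            reg, loc = args
            mar, mdr = loc, regs[reg]
            memory[mar] = mdr
        elif op == "Halt":     # hypothetical: stop the simulation
            return regs
        else:
            raise ValueError(f"unknown instruction {op!r}")

# Program at addresses 0..4, data at addresses 100..102 (all values illustrative).
memory = {
    0: ("Load", 100, "R0"),
    1: ("Load", 101, "R1"),
    2: ("Add", "R1", "R0"),
    3: ("Store", "R0", 102),
    4: ("Halt",),
    100: 4, 101: 6, 102: 0,
}
registers = run(memory, start=0)
print(registers["R0"], memory[102])  # 10 10
```

The PC is incremented right after the fetch, which matches the point that a new instruction fetch can begin as soon as the current instruction completes.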

Interruption
Normal execution of a program may be interrupted if some device requires urgent service from the processor. For example, a monitoring device in a computer-controlled industrial process may detect a dangerous condition. To deal with the situation immediately, the normal execution of the current program must be interrupted. To do this, the device raises an interrupt signal. An interrupt is a request from an I/O device for service by the processor. The processor provides the requested service by executing an appropriate interrupt service routine.
Because executing the interrupt service routine may alter the internal state of the processor, that state must be saved in memory before the interrupt is serviced. Normally, the contents of the PC, the general-purpose registers, and some control information are saved. When the interrupt service routine is completed, the state is restored and the processor continues execution of the interrupted program.
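The save-and-restore idea can be sketched as follows. The dictionary used for the processor state, the list used as the save area, and the `alarm_routine` function are all hypothetical stand-ins chosen for illustration; real processors do this in hardware and in the service routine itself.

```python
def handle_interrupt(state, service_routine, saved_states):
    """Save the processor state, run the service routine, then restore the state."""
    saved_states.append(dict(state))   # save PC and registers in memory
    service_routine(state)             # the routine may change PC and registers
    restored = saved_states.pop()
    state.clear()
    state.update(restored)             # the interrupted program can now resume

def alarm_routine(state):
    # Hypothetical service routine: it overwrites PC and R0 while it runs.
    state["PC"] = 9000
    state["R0"] = -1

processor = {"PC": 42, "R0": 7}
save_area = []
handle_interrupt(processor, alarm_routine, save_area)
print(processor)  # {'PC': 42, 'R0': 7}
```

Without the saved copy, the interrupted program would resume with the PC and R0 values left behind by the service routine.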

III. BUS STRUCTURES


• A bus is a group of lines that serves as a connecting path for several individual parts of a computer to transfer data between them.
To achieve a reasonable speed of operation, a computer must be organized so that
• all its units can handle one full word of data at a given time;
• when a word of data is transferred over a bus, all its bits are transferred in parallel, that is, simultaneously over many wires, or lines, one bit per line.

Figure: Single bus structure (input, output, memory, and processor connected to one bus).

The bus must have separate lines for carrying data, address, and control signals. A single bus is used to interconnect all the units, as shown in the figure. Because the bus can be used for only one transfer at a time, only two units can actively use the bus at any given time.
The advantages of the single bus structure are its low cost and its flexibility for attaching peripheral devices.
Multiple Bus structure
A system with multiple buses allows two or more transfers to take place at the same time. This concurrency leads to higher performance, but at increased cost.

The use of Buffer registers


The speeds of the devices connected to a common bus vary. Input and output devices such as keyboards and printers are relatively slow compared with the processor and with storage devices such as optical disks.
Consider, for example, the transfer of an encoded character from the processor to a character printer. Because the printer is much slower than the processor, the processor would be used inefficiently if it had to wait for the printer during each transfer.
To overcome this problem, a buffer register is used.
Buffer registers
A buffer register is an electronic register included with a device to hold information during transfers.
When the processor sends a set of characters to a printer, the characters are transferred to the printer buffer (the buffer register of the printer). Once the printer buffer is loaded, the processor and the bus are no longer needed for the transfer, and the processor can be released for other activity.
Purpose of buffer registers:
• They prevent a high-speed processor from being locked to a slow I/O device.
• They smooth out timing differences between slow and fast devices.
• They allow the processor to switch rapidly from one device to another.

IV. PERFORMANCE AND METRICS

The performance of a computer can be measured by the speed with which it executes programs.
The speed of the computer is affected by
• the hardware design;
• the machine language instruction set of the computer, because programs are usually written in a high-level language and must be translated into machine instructions;
• the compiler, which translates the high-level language into machine language.
For best performance, the compiler, the machine instruction set, and the hardware must be designed in a coordinated way.
Consider a time-line diagram that describes how the operating system overlaps processing, disk transfers, and printing for several programs to make the best possible use of the available resources. The total time required to execute the program is t5 - t0. This is called the elapsed time, and it is a measure of the performance of the entire computer system.
It is affected by the speed of the processor, the disk, and the printer. To discuss the performance of the processor, we should consider only the periods during which the processor is active.

Figure: User program and OS routines sharing the processor (time line from t0 to t5 showing the activity of the printer, the disk, the OS routines, and the program).

The elapsed time for the execution of a program depends on the hardware involved in executing it. This hardware includes the processor and the memory, which are usually connected by a bus (as shown in the bus structure diagram).

Figure: The processor cache (main memory, cache memory, and processor connected by a bus).

When execution of a program starts, all program instructions and the required data are stored in the main memory. As execution proceeds, instructions are fetched from the main memory one by one by the processor, and a copy is placed in the cache. When the execution of an instruction calls for data located in the main memory, the data are fetched and a copy is placed in the cache. If the same instruction or data item is needed later, it is read directly from the cache. The processor and a small cache memory are fabricated on a single IC chip. The internal speed of such a chip is much higher than the speed at which instructions and data can be fetched from the main memory. A program executes faster if the movement of instructions and data between the main memory and the processor is minimized, which is achieved by using the cache.
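The effect of the cache on main-memory traffic can be illustrated with a short count of reads. The sketch below assumes an idealized cache of unlimited size that never evicts anything, and the address trace (a four-instruction loop executed ten times) is made up for illustration.

```python
def count_main_memory_reads(addresses):
    """Count how many accesses must go to main memory when a cache keeps copies."""
    cache = set()
    main_memory_reads = 0
    for address in addresses:
        if address not in cache:
            main_memory_reads += 1   # first use: fetch from main memory
            cache.add(address)       # and place a copy in the cache
        # later uses are read directly from the cache
    return main_memory_reads

loop_trace = [0, 1, 2, 3] * 10       # the same four instructions, ten passes
reads = count_main_memory_reads(loop_trace)
print(reads, "of", len(loop_trace))  # 4 of 40
```

Only the first pass through the loop reaches main memory; the remaining 36 accesses are served by the cache, which is why programs that reuse instructions and data benefit most.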
To evaluate performance, we discuss the following:
• Processor clock
• Basic performance equation
• Pipelining and superscalar operation
• Clock rate
• Instruction set: CISC and RISC
• Compiler
• Performance measurement
Processor clock

Processor circuits are controlled by a timing signal called a clock. The clock defines regular time intervals, called clock cycles. To execute a machine instruction, the processor divides the action to be performed into a sequence of basic steps, such that each step can be completed in one clock cycle.
The length of one clock cycle, P, is an important parameter that affects processor performance. It is inversely related to the clock rate:
R = 1/P
The clock rate is measured in cycles per second.
Processors used in today's personal computers and workstations have clock rates that range from a few hundred million to over a billion cycles per second. One cycle per second is called a hertz (Hz). The term "million" is denoted by the prefix Mega (M) and "billion" by the prefix Giga (G). Hence, 500 million cycles per second is usually abbreviated to 500 megahertz (MHz), and 1250 million cycles per second to 1.25 gigahertz (GHz). The corresponding clock periods are 2 and 0.8 nanoseconds (ns), respectively.
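The two clock periods quoted above follow directly from R = 1/P. A minimal check, where the function name is just an illustrative choice:

```python
def clock_period_ns(clock_rate_hz):
    """Clock period P in nanoseconds for a clock rate R given in hertz (P = 1/R)."""
    return 1e9 / clock_rate_hz

period_500_mhz = clock_period_ns(500e6)    # 500 MHz
period_1_25_ghz = clock_period_ns(1.25e9)  # 1.25 GHz
print(period_500_mhz, period_1_25_ghz)     # 2.0 0.8
```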

Basic performance equation


Let T be the time required for the processor to execute a program written in a high-level language. The compiler generates a machine language object program corresponding to the source program.
Assume that complete execution of the program requires the execution of N machine language instructions.
Assume that the average number of basic steps needed to execute one machine instruction is S, where each basic step is completed in one clock cycle.
If the clock rate is R cycles per second, the program execution time is given by
T = (N x S) / R
This is often called the basic performance equation.
To achieve high performance, T should be reduced. T can be reduced by reducing N and S and by increasing R.
• N is reduced if the source program is compiled into fewer machine instructions.
• S is reduced if instructions require fewer basic steps or if the execution of instructions is overlapped.
• R is increased by using a higher-frequency clock, i.e., by reducing the time required to complete a basic execution step.
• N, S, and R are not independent; changing one may affect another.
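A worked instance of the basic performance equation may help. The values of N, S, and R below are hypothetical, picked only to give round numbers, and the sketch treats the three parameters as if they could be changed independently, which the last point above warns is not generally true.

```python
def execution_time(n_instructions, steps_per_instruction, clock_rate_hz):
    """Basic performance equation: T = (N x S) / R, in seconds."""
    return n_instructions * steps_per_instruction / clock_rate_hz

# Hypothetical program: N = 10^9 instructions, S = 4 steps, R = 500 MHz.
base = execution_time(1_000_000_000, 4, 500_000_000)
faster_clock = execution_time(1_000_000_000, 4, 1_000_000_000)  # R doubled
fewer_steps = execution_time(1_000_000_000, 1, 500_000_000)     # S reduced to 1
print(base, faster_clock, fewer_steps)  # 8.0 4.0 2.0
```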

Pipelining and Superscalar operation


Pipelining
Pipelining is a technique of overlapping the execution of successive instructions. It improves performance.
Consider the instruction
Add R1, R2, R3
This instruction adds the contents of registers R1 and R2 and places the sum in R3. The contents of R1 and R2 are first transferred to the inputs of the ALU. After the addition is performed, the result is transferred to register R3.
The processor can read the next instruction to be executed while performing the addition for the current instruction, and while the result of the addition is being transferred to R3, the operands required for the next instruction can be transferred to the ALU. This overlapping of instruction execution is called pipelining.
• If all instructions are overlapped to the maximum degree, the effective value of S is 1. This ideal cannot always be achieved.
• Individual instructions still require several clock cycles to complete, but for the purpose of computing T, the effective value of S is 1.
Superscalar operation
A higher degree of concurrency can be achieved if multiple instruction pipelines are implemented in the processor. This means that multiple functional units are used, creating parallel paths through which different instructions can be executed in parallel. With such an arrangement, it becomes possible to start the execution of several instructions in every clock cycle. This mode of operation is called superscalar execution. It makes it possible to reduce the effective value of S to less than 1.
Parallel execution must preserve the logical correctness of programs; that is, the results produced must be the same as those produced by serial execution of the program instructions.
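Why the effective value of S approaches 1 with pipelining, and can drop below 1 with superscalar execution, can be seen from an idealized cycle count. The model below is an assumption, not something stated in these notes: a pipeline with a fixed number of stages, no stalls of any kind, and a fixed number of instructions started per cycle.

```python
import math

def effective_s(n_instructions, stages, issue_width=1):
    """Average cycles per instruction in an idealized, stall-free pipeline."""
    # The first group of instructions needs `stages` cycles to complete;
    # after that, one group of `issue_width` instructions completes per cycle.
    cycles = stages + math.ceil(n_instructions / issue_width) - 1
    return cycles / n_instructions

pipelined = effective_s(1000, stages=4)                    # one pipeline
superscalar = effective_s(1000, stages=4, issue_width=2)   # two pipelines
print(pipelined, superscalar)  # 1.003 0.503
```

Each instruction still takes four cycles from start to finish, but for computing T only the average count per instruction matters.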
Clock Rate
There are two possibilities for increasing the clock rate, R.
• First, improving the integrated-circuit (IC) technology makes logic circuits faster, which reduces the time needed to complete a basic step. This allows the clock period, P, to be reduced and the clock rate, R, to be increased.
• Second, reducing the amount of processing done in one basic step also makes it possible to reduce the clock period, P. However, if the actions that have to be performed by an instruction remain the same, the number of basic steps needed may increase.
Increases in the value of R that are due entirely to improvements in IC technology affect all aspects of the processor's operation equally, with the exception of the time it takes to access the main memory. In the presence of a cache, the percentage of accesses to the main memory is small. Hence, much of the performance gain expected from the faster technology can be realized. In this case the value of T is reduced by the same factor as R is increased, because S and N are not affected.
Instruction Set: CISC and RISC
CISC: Complex Instruction Set Computers
RISC: Reduced Instruction Set Computers
• Simple instructions require a small number of basic steps to execute.
• Complex instructions involve a large number of steps.
• For a processor that has only simple instructions, a large number of instructions may be needed to perform a given programming task. This could lead to a large value of N and a small value of S.
• On the other hand, if individual instructions perform more complex operations, fewer instructions will be needed, leading to a lower value of N and a larger value of S. It is not obvious that one choice is better than the other.
• Processors with simple instructions are called Reduced Instruction Set Computers (RISC), and processors with more complex instructions are called Complex Instruction Set Computers (CISC).
• The choice of instruction set must take pipelining into account, because pipelining can bring the effective value of S close to 1.
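The trade-off between N and S can be made concrete by comparing the product N x S for two designs. All of the numbers below are invented for illustration; with different values the comparison can go either way, which is the point of the "not obvious" remark above.

```python
def total_cycles(n_instructions, steps_per_instruction):
    """Total clock cycles for a program: N x S."""
    return n_instructions * steps_per_instruction

# Hypothetical figures for the same task on two instruction sets.
simple_set = total_cycles(2_000_000, 1.5)            # more instructions, fewer steps each
complex_set = total_cycles(1_000_000, 2.5)           # fewer instructions, more steps each
simple_set_pipelined = total_cycles(2_000_000, 1.0)  # effective S close to 1

print(simple_set, complex_set, simple_set_pipelined)  # 3000000.0 2500000.0 2000000.0
```

In this made-up case the complex instruction set needs fewer cycles until pipelining brings the effective S of the simple instruction set down to 1.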
Compiler
A compiler translates a high-level language program into a sequence of machine instructions. To
reduce N, we need to have a suitable machine instruction set and a compiler that makes good use of it.

An optimizing compiler takes advantage of various features of the target processor to reduce
the product N x S, which is the total number of clock cycles needed to execute a program. The number
of cycles is dependent not only on the choice of instructions, but also on the order in which they appear
in the program. The compiler may rearrange program instructions to achieve better performance
without changing the logic of the program.
The compiler and the processor are closely linked through the architecture, so they should be designed together.
Performance Measurement
The computer community adopted the idea of measuring computer performance using
benchmark programs. To make comparisons possible, standardized programs must be used. The
performance measure is the time it takes a computer to execute a given benchmark program.
A nonprofit organization called the Standard Performance Evaluation Corporation (SPEC) selects and publishes benchmark programs. The SPEC rating for one program is computed as
Running time on the reference computer
SPEC rating = ------------------------------------------------
Running time on the computer under test

The test is repeated for all the programs in the SPEC suite, and the geometric mean of the results is computed. Let SPEC_i be the rating for program i in the suite. The overall SPEC rating for the computer is given by

\[ \text{SPEC rating} = \left( \prod_{i=1}^{n} \text{SPEC}_i \right)^{1/n} \]

where n is the number of programs in the suite.
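The overall rating can be computed in a few lines. The running times below are made-up values for a two-program suite, not real SPEC data, and the function name is an illustrative choice.

```python
import math

def spec_rating(reference_times, test_times):
    """Geometric mean of the per-program ratios (reference time / time under test)."""
    ratios = [ref / test for ref, test in zip(reference_times, test_times)]
    return math.prod(ratios) ** (1 / len(ratios))

# Hypothetical suite of two programs: per-program ratings are 2.0 and 8.0.
overall = spec_rating([100.0, 400.0], [50.0, 50.0])
print(overall)  # 4.0
```

The geometric mean of 2 and 8 is 4, whereas the arithmetic mean would be 5; the geometric mean keeps one unusually favorable program from dominating the result.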

V. INSTRUCTION AND INSTRUCTION SEQUENCING
A computer must have instructions capable of performing four types of basic operations:
• Data transfers between the memory and the processor registers
• Arithmetic and logic operations on data
• Program sequencing and control
• I/O transfers
To understand the first two types of instructions, we need some notation.

Register Transfer Notation (RTN)


Data transfers can be represented using the standard notation given below. Processor registers are represented by names such as R0, R1, R2, and so on. Addresses of memory locations are represented by names such as LOC, PLACE, and MEM. I/O registers are represented by names such as DATAIN and DATAOUT. The contents of a location are denoted by placing square brackets around the name of the location.
Example 1: R1 ← [LOC]
This expression states that the contents of memory location LOC are transferred into processor register R1.
Example 2: R3 ← [R1] + [R2]
This expression states that the contents of processor registers R1 and R2 are added and the result is stored in processor register R3.
This type of notation is known as Register Transfer Notation (RTN).
Note: The right-hand side of an RTN expression always denotes a value, and the left-hand side is the name of a location where the value is to be placed, overwriting the old contents of that location.
Assembly Language Notation
To represent machine instructions, assembly language uses statements such as the following:
• To transfer the data from memory location LOC to processor register R1:
  Move LOC,R1
• To add the two numbers in registers R1 and R2 and place their sum in register R3:
  Add R1,R2,R3
_____________________________________________________________
BASIC INSTRUCTION TYPES
The operation of addition of two numbers is a fundamental capability in any computer. The
statement
C= A + B
in a high-level language program is a command to the computer to add the current values of the two
variables called A and B, and to assign the sum to a third variable, C.
When the program containing this statement is compiled, the three variables A, B, and C are assigned to distinct locations in the memory.

Hence the above high-level language statement requires the action
C ← [A] + [B]
to take place in the computer. Here [A] and [B] represent the contents of locations A and B, respectively.

To carry out this action, the contents of memory locations A and B are fetched from the memory and transferred into the processor, where their sum is computed. The result is then sent back to the memory and stored in location C.
A basic operation such as this can be encoded using instructions of several formats:
• 3-address instruction
• 2-address instruction
• 1-address instruction
• 0-address instruction

Let us first assume that this action is to be accomplished by a single machine instruction.
Furthermore, assume that this instruction contains the memory addresses of the three operands - A, B,
and C. This three-address instruction can be represented symbolically as
Add A,B,C

Operands A and B are called the source operands, C is called the destination operand, and Add
is the operation to be performed on the operands. A general instruction of this type has the format
Operation Source1,Source2,Destination
• If k bits are needed to specify the memory address of each operand, the encoded form of the
above instruction must contain 3k bits for addressing purposes in addition to the bits needed to
denote the Add operation.
• For a modern processor with a 32-bit address space, a 3-address instruction is too large to fit in
one word for a reasonable word length. Thus, a format that allows multiple words to be used for
a single instruction would be needed to represent an instruction of this type.
• An alternative approach is to use a sequence of simpler instructions to perform the same task,
with each instruction having only one or two operands. Suppose that two-address instructions of
the form
Operation Source,Destination
are available. An Add instruction of this type is
Add A,B
which performs the operation B ← [A] + [B].
• When the sum is calculated, the result is sent to the memory and stored in location B, replacing
the original contents of this location. This means that operand B is both a source and a
destination.
• A single two-address instruction cannot be used to solve our original problem, which is to add
the contents of locations A and B, without destroying either of them, and to place the sum in
location C.

The problem can be solved by using another two-address instruction that copies the contents of
one memory location into another. Such an instruction is
Move B,C
which performs the operation C ← [B], leaving the contents of location B unchanged. The word
"Move" is a misnomer here; it should be "Copy."

However, this instruction name is deeply entrenched in computer nomenclature. The operation C ← [A]
+ [B] can now be performed by the two-instruction sequence

Move B,C
Add A,C
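
The effect of this two-instruction sequence can be checked with a small simulation. The sketch below is only an illustration: the dictionary used as memory and the starting values 5 and 7 are assumptions made for the example, not part of any real machine.

```python
# Memory modelled as a dictionary from a symbolic location name to its contents.
# The initial values are arbitrary and chosen only for illustration.
memory = {"A": 5, "B": 7, "C": 0}


def move(src, dst):
    """Move src,dst : dst <- [src]. The source keeps its value, so this is a copy."""
    memory[dst] = memory[src]


def add(src, dst):
    """Add src,dst : dst <- [src] + [dst]. The destination is overwritten."""
    memory[dst] = memory[src] + memory[dst]


move("B", "C")  # Move B,C
add("A", "C")   # Add A,C

print(memory)
```

After the two instructions, C holds the sum while A and B still hold their original values, which is exactly what a single two-address Add could not achieve.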
• In all the instructions given above, the source operands are specified first, followed by the
destination. This order is used in the assembly language expressions for machine instructions in
many computers.
• But there are also many computers in which the order of the source and destination operands is
reversed. It is unfortunate that no single convention has been adopted by all manufacturers.
• In fact, even for a particular computer, its assembly language may use a different order for
different instructions. We have defined three- and two-address instructions. But even
two-address instructions will not normally fit into one word for usual word lengths and address sizes.
• Another possibility is to have machine instructions that specify only one memory operand.
• When a second operand is needed, as in the case of an Add instruction, it is understood
implicitly to be in a unique location. A processor register, usually called the accumulator, may
be used for this purpose. Thus, the one-address instruction
Add A
means the following: Add the contents of memory location A to the contents of the accumulator
register and place the sum back into the accumulator. Let us also introduce the one-address
instructions
Load A
and
Store A
• The Load instruction copies the contents of memory location A into the accumulator, and the
Store instruction copies the contents of the accumulator into memory location A. Using only
one-address instructions, the operation C ← [A] + [B] can be performed by executing the
sequence of instructions
Load A
Add B
Store C
• Note that the operand specified in the instruction may be a source or a destination, depending on
the instruction.
• In the Load instruction, address A specifies the source operand, and the destination location, the
accumulator, is implied.
• On the other hand, C denotes the destination location in the Store instruction, whereas the
source, the accumulator, is implied.
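
A single-accumulator machine of this kind can be sketched in a few lines. The class name AccumulatorMachine and the sample values are hypothetical and serve only to show how the accumulator acts as the implied operand.

```python
class AccumulatorMachine:
    """Toy one-address machine: every instruction names one memory location,
    and the accumulator is the implied second operand."""

    def __init__(self, memory):
        self.memory = dict(memory)  # symbolic location name -> contents
        self.acc = 0                # the accumulator register

    def load(self, addr):
        # Load addr : accumulator <- [addr]
        self.acc = self.memory[addr]

    def add(self, addr):
        # Add addr : accumulator <- accumulator + [addr]
        self.acc += self.memory[addr]

    def store(self, addr):
        # Store addr : addr <- accumulator
        self.memory[addr] = self.acc


m = AccumulatorMachine({"A": 5, "B": 7, "C": 0})  # illustrative values
m.load("A")
m.add("B")
m.store("C")
print(m.memory["C"])
```

In load and add the named location is a source, while in store it is the destination, matching the remarks above.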
• Some early computers were designed around a single accumulator structure. Most modern
computers have a number of general-purpose processor registers, typically 8 to 32, and even
considerably more in some cases.
• Access to data in these registers is much faster than to data stored in memory locations because
the registers are inside the processor. Because the number of registers is relatively small, only a
few bits are needed to specify which register takes part in an operation. For example, for 32
registers, only 5 bits are needed.
• This is much less than the number of bits needed to give the address of a location in the
memory. Because the use of registers allows faster processing and results in shorter instructions,
registers are used to store data temporarily in the processor during processing.
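
The bit counts quoted above follow from a base-2 logarithm. As a rough check, the helper bits_needed below (a name introduced here only for illustration) computes the field width for a given number of registers or memory locations.

```python
def bits_needed(count):
    """Smallest number of bits that can distinguish `count` items (count >= 1)."""
    return max(1, (count - 1).bit_length())


register_bits = bits_needed(32)      # field width to name one of 32 registers
address_bits = bits_needed(2 ** 32)  # field width for a 32-bit address space

# Operand fields of a three-operand instruction in each style.
three_registers = 3 * register_bits
three_addresses = 3 * address_bits

print(register_bits, address_bits, three_registers, three_addresses)
```

Under these assumptions a three-register instruction needs 15 operand bits, against 96 for three full memory addresses, which is why register-only instructions fit comfortably in one 32-bit word.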
Let Ri represent a general-purpose register. The instructions
Load A,Ri
Store Ri,A
and
Add A,Ri
are generalizations of the Load, Store, and Add instructions for the single-accumulator case, in which
register Ri performs the function of the accumulator.
• Even in these cases, when only one memory address is directly specified in an instruction, the
instruction may not fit into one word.
• When a processor has several general-purpose registers, many instructions involve only
operands that are in the registers. In fact, in many modern processors, computations can be
performed directly only on data held in processor registers. Instructions such as
Add Ri,Rj
or
Add Ri,Rj,Rk
are of this type.
• In both of these instructions, the source operands are the contents of registers Ri and Rj. In the
first instruction, Rj also serves as the destination register, whereas in the second instruction, a
third register, Rk, is used as the destination. Such instructions, where only register names are
contained in the instruction, will normally fit into one word.
• It is often necessary to transfer data between different locations. This is achieved with the
instruction
Move Source,Destination
which places a copy of the contents of Source into Destination.
• When data are moved to or from a processor register, the Move instruction can be used rather
than the Load or Store instructions because the order of the source and destination operands
determines which operation is intended. Thus,
Move A,Ri
is the same as
Load A,Ri
and
Move Ri,A
is the same as
Store Ri,A

• In processors where arithmetic operations are allowed only on operands that are in processor
registers, the C = A + B task can be performed by the instruction sequence
Move A,Ri
Move B,Rj
Add Ri,Rj
Move Rj,C
• In processors where one operand may be in the memory but the other must be in a register, an
instruction sequence for the required task would be
Move A,Ri
Add B,Ri
Move Ri,C
• The speed with which a given task is carried out depends on the time it takes to transfer
instructions from memory into the processor and to access the operands referenced by these
instructions.
• Transfers that involve the memory are much slower than transfers within the processor. Hence, a
substantial increase in speed is achieved when several operations are performed in succession on
data in processor registers without the need to copy data to or from the memory.
• When machine language programs are generated by compilers from high-level languages, it is
important to minimize the frequency with which data is moved back and forth between the
memory and processor registers.
We used the task C ← [A] + [B] as an example to illustrate instruction formats. The figure below shows a possible
program segment for this task as it appears in the memory of a computer. We have assumed that the
computer allows one memory operand per instruction and has a number of processor registers. We
assume that the word length is 32 bits and the memory is byte addressable. The three instructions of the
program are in successive word locations, starting at location i. Since each instruction is 4 bytes long,
the second and third instructions start at addresses i + 4 and i + 8.
For simplicity, we also assume that a full memory address can be directly specified in
a single-word instruction, although this is not usually possible for address space sizes
and word lengths of current processors.

Fig: A program for C ← [A] + [B]
Execution steps of the above program:
• The processor contains a register called the program counter (PC), which holds the address of
the instruction to be executed next.
• To begin executing a program, the address of its first instruction (i in our example) must be
placed into the PC.
• Then, the processor control circuits use the information in the PC to fetch and execute
instructions, one at a time, in the order of increasing addresses. This is called straight-line
sequencing.
• During the execution of each instruction, the PC is incremented by 4 to point to the next
instruction.
• Thus, after the Move instruction at location i + 8 is executed, the PC contains the value i + 12,
which is the address of the first instruction of the next program segment.
• Executing a given instruction is a two-phase procedure.
   o In the first phase, called instruction fetch, the instruction is fetched from the memory location
   whose address is in the PC. This instruction is placed in the instruction register (IR) in the
   processor.
   o At the start of the second phase, called instruction execute, the instruction in IR is examined to
   determine which operation is to be performed.
• The specified operation is then performed by the processor. This often involves fetching
operands from the memory or from processor registers, performing an arithmetic or logic
operation, and storing the result in the destination location.
• At some point during this two-phase procedure, the contents of the PC are advanced to point to
the next instruction. When the execute phase of an instruction is completed, the PC contains the
address of the next instruction, and a new instruction fetch phase can begin.
• In most processors, the execute phase itself is divided into a small number of distinct phases
corresponding to fetching operands, performing the operation, and storing the result.
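
The fetch and execute phases described above can be imitated with a short simulation of the three-instruction program. This is a sketch under simplifying assumptions: the starting address 1000 standing for i, the data values, and the use of Python tuples instead of encoded 32-bit words are all choices made for illustration.

```python
I = 1000  # hypothetical value of the address i of the first instruction

# Three instructions in successive 4-byte word locations.
program = {
    I:     ("Move", "A", "R0"),
    I + 4: ("Add", "B", "R0"),
    I + 8: ("Move", "R0", "C"),
}
data = {"A": 5, "B": 7, "C": 0}  # memory operands, illustrative values
registers = {"R0": 0}


def read(operand):
    """Return the contents of a register or of a memory location."""
    return registers[operand] if operand in registers else data[operand]


def write(operand, value):
    """Store a value into a register or into a memory location."""
    if operand in registers:
        registers[operand] = value
    else:
        data[operand] = value


pc = I      # the PC is loaded with the address of the first instruction
trace = []  # PC value after each instruction, recorded for inspection
while pc in program:
    ir = program[pc]   # instruction fetch: the word at [PC] goes into IR
    pc += 4            # PC advanced to point to the next instruction
    op, src, dst = ir  # instruction execute: examine IR
    if op == "Move":
        write(dst, read(src))
    elif op == "Add":
        write(dst, read(src) + read(dst))
    trace.append(pc)

print(data["C"], pc - I)
```

When the loop stops, the PC holds i + 12, the address that follows the last instruction, as stated in the steps above.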
BRANCHING
Consider the task of adding a list of n numbers. The addresses of the memory locations
containing the n numbers are symbolically given as NUM1, NUM2, . . . , NUMn, and a separate Add
instruction is used to add each number to the contents of register R0. After all the numbers have been
added, the result is placed in memory location SUM. Instead of using a long list of Add instructions, it
is possible to place a single Add instruction in a program loop. The loop is a straight-line sequence of
instructions executed as many times as needed. It starts at location LOOP and ends at the instruction
Branch>0. During each pass through this loop, the address of the next list entry is determined, and that
entry is fetched and added to R0. Now, we concentrate on how to create and control a program loop.

Assume that the number of entries in the list, n, is stored in memory location N. Register R1 is used as a
counter to determine the number of times the loop is executed. Hence, the contents of location N are
loaded into register R1 at the beginning of the program. Then, within the body of the loop, the
instruction
Decrement R1
reduces the contents of R1 by 1 each time through the loop. (A similar type of operation is performed
by an Increment instruction, which adds 1 to its operand.) Execution of the loop is repeated as long as
the result of the decrement operation is greater than zero.

Fig: A straight-line program for adding n numbers.

Fig: Using a loop to add n numbers

We now introduce branch instructions. This type of instruction loads a new value into the program
counter. As a result, the processor fetches and executes the instruction at this new address, called the
branch target, instead of the instruction at the location that follows the branch instruction in sequential
address order. A conditional branch instruction causes a branch only if a specified condition is
satisfied. If the condition is not satisfied, the PC is incremented in the normal way, and the next
instruction in sequential address order is fetched and executed.
In the above program, the instruction
Branch>0 LOOP
(branch if greater than 0) is a conditional branch instruction that causes a branch to
location LOOP if the result of the immediately preceding instruction, which is the
decremented value in register R1, is greater than zero. This means that the loop is
repeated as long as there are entries in the list that are yet to be added to R0. At the
end of the nth pass through the loop, the Decrement instruction produces a value of
zero, and, hence, branching does not occur. Instead, the Move instruction is fetched
and executed. It moves the final result from R0 into memory location SUM.
The capability to test conditions and subsequently choose one of a set of alternative
ways to continue computation has many more applications than just loop control. Such
a capability is found in the instruction sets of all computers and is fundamental to the
programming of most nontrivial tasks.
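
The control flow of the loop program can be mirrored step by step in a high-level language. The function add_list below is a hypothetical sketch: r0 and r1 stand for registers R0 and R1, and the index next_entry stands in for the step that determines the address of the next list entry. Like the assembly loop, it assumes the list has at least one entry.

```python
def add_list(numbers):
    """Sum a non-empty list the way the loop program does."""
    r0 = 0             # clear R0, the running sum
    r1 = len(numbers)  # load the count n from location N into R1
    next_entry = 0     # stands in for "determine address of next number"
    while True:        # LOOP:
        r0 += numbers[next_entry]  # add the next number to R0
        next_entry += 1
        r1 -= 1                    # Decrement R1
        if not (r1 > 0):           # Branch>0 LOOP is not taken once R1 reaches 0
            break
    return r0          # Move R0,SUM


total = add_list([3, 1, 4, 1, 5])
print(total)
```

The test is made after the body has run, so the body executes once even for a single entry, just as the Decrement and Branch>0 pair sits at the end of the loop.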

CONDITION CODES
The processor keeps track of information about the results of various operations for use by
subsequent conditional branch instructions. This is accomplished by recording the required information
in individual bits, often called condition code flags. These flags
are usually grouped together in a special processor register called the condition code
register or status register. Individual condition code flags are set to 1 or cleared to 0,
depending on the outcome of the operation performed.
Four commonly used flags are
N (negative) Set to 1 if the result is negative; otherwise, cleared to 0
Z (zero) Set to 1 if the result is 0; otherwise, cleared to 0
V (overflow) Set to 1 if arithmetic overflow occurs; otherwise, cleared to 0
C (carry) Set to 1 if a carry-out results from the operation; otherwise, cleared to 0

• The N and Z flags indicate whether the result of an arithmetic or logic operation is negative or
zero.
• The N and Z flags may also be affected by instructions that transfer data, such as Move, Load,
or Store.
• This makes it possible for a later conditional branch instruction to cause a branch based on the
sign and value of the operand that was moved.
• Some computers also provide a special Test instruction that examines a value in a register or in
the memory and sets or clears the N and Z flags accordingly.

• The V flag indicates whether overflow has taken place. Overflow occurs when the result of an
arithmetic operation is outside the range of values that can be represented by the number of bits
available for the operands.
• The processor sets the V flag to allow the programmer to test whether overflow has occurred
and branch to an appropriate routine that corrects the problem.
• Instructions such as BranchIfOverflow are provided for this purpose. A program interrupt may
occur automatically as a result of the V bit being set, and the operating system will resolve what
to do.
• The C flag is set to 1 if a carry occurs from the most significant bit position during an
arithmetic operation. This flag makes it possible to perform arithmetic operations on operands
that are longer than the word length of the processor. Such operations are used in multiple-
precision arithmetic.
• The instruction Branch>0 is an example of a branch instruction that tests one or more of the
condition flags.
• It causes a branch if the value tested is neither negative nor equal to zero. That is, the branch is
taken if neither N nor Z is 1.
• Many other conditional branch instructions are provided to enable a variety of conditions to be
tested. The conditions are given as logic expressions involving the condition code flags.
• In some computers, the condition code flags are affected automatically by instructions that
perform arithmetic or logic operations. However, this is not always the case.
• A number of computers have two versions of an Add instruction, for example. One version,
Add, does not affect the flags, but a second version, AddSetCC, does.

• This provides the programmer—and the compiler—with more flexibility when preparing
programs for pipelined execution.
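
The four flags can be derived from an addition as follows. This is an illustrative sketch, not the circuit of any particular processor: the function names and the default 8-bit width are assumptions, and overflow is detected with the usual rule that both operands have the same sign while the result has the opposite sign.

```python
def add_with_flags(a, b, bits=8):
    """Add two `bits`-wide values and return (result, flags) with N, Z, V, C."""
    mask = (1 << bits) - 1
    a &= mask
    b &= mask
    raw = a + b          # sum before truncation to the word length
    result = raw & mask  # the value actually kept in the register
    sign = bits - 1      # position of the most significant (sign) bit
    flags = {
        "N": (result >> sign) & 1,  # result is negative in 2's complement
        "Z": int(result == 0),      # result is zero
        "V": (((a ^ result) & (b ^ result)) >> sign) & 1,  # signed overflow
        "C": (raw >> bits) & 1,     # carry out of the most significant bit
    }
    return result, flags


def branch_if_greater_than_zero(flags):
    """Branch>0 is taken only if neither N nor Z is 1."""
    return flags["N"] == 0 and flags["Z"] == 0


print(add_with_flags(0x7F, 0x01))  # largest positive 8-bit value plus one
print(add_with_flags(0xFF, 0x01))  # -1 plus 1 in 2's complement
```

The first call shows an overflow without a carry, and the second a carry without an overflow, which is why V and C are kept as separate flags.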

GENERATING MEMORY ADDRESSES


The purpose of the instruction block at LOOP is to add a different number from the list during
each pass through the loop. Hence, the Add instruction in that block must refer to a different address
during each pass. How are the addresses to be specified? The memory operand address cannot be given
directly in a single Add instruction in the loop. Otherwise, it would need to be modified on each pass
through the loop. As one possibility, suppose that a processor register, Ri, is used to hold the memory
address of an operand. If it is initially loaded with the address NUM1 before the loop is entered and is
then incremented by 4 on each pass through the loop, it can provide the needed capability.
This situation, and many others like it, gives rise to the need for flexible ways to
specify the address of an operand. The instruction set of a computer typically provides
a number of such methods, called addressing modes. While the details differ from one
computer to another, the underlying concepts are the same.
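
The idea of holding the operand address in a register and advancing it by 4 on each pass can be sketched as below. The base address 2000 for NUM1 and the list contents are hypothetical; memory is modelled as a dictionary keyed by byte address with one 4-byte word per entry.

```python
NUM1 = 2000  # hypothetical byte address of the first list entry
numbers = [3, 1, 4, 1, 5]

# Successive 32-bit words occupy byte addresses NUM1, NUM1 + 4, NUM1 + 8, ...
memory = {NUM1 + 4 * k: value for k, value in enumerate(numbers)}

ri = NUM1     # register Ri initially holds the address NUM1
r0 = 0        # running sum
visited = []  # addresses used by the Add on each pass
for _ in range(len(numbers)):
    r0 += memory[ri]  # the Add takes its operand from the address held in Ri
    visited.append(ri)
    ri += 4           # advance Ri to the next word

print(r0, visited)
```

The same Add is executed on every pass; only the contents of Ri change, so the instruction itself never has to be modified.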

VI. HARDWARE
The traffic-light controller is a very simple special-purpose computer system requiring only a
few of the physical hardware components that constitute a general-purpose computer system. The four
major hardware blocks of a general-purpose computer system are its memory unit (MU), arithmetic and
logic unit (ALU), input/output unit (IOU), and control unit (CU). Input/output (I/O) devices transfer
data into and out of the memory unit. In some systems, I/O devices send and receive data
to and from the ALU rather than the MU. Programs reside in the memory unit. The ALU processes
the data taken from the memory unit (or the ALU) and stores the processed data back in the memory
unit (or the ALU). The control unit coordinates the activities of the other three units. It retrieves
instructions from programs resident in the MU, decodes these instructions, and directs the ALU to
perform corresponding processing steps. It also oversees I/O operations. A keyboard and a mouse are
the most common input devices nowadays. A video display and a printer
are the most common output devices. Scanners are used to input data from hardcopy sources. Magnetic
tapes and disks are used as I/O devices. These devices are also used as memory devices to increase the
capacity of the MU. The console is a special-purpose I/O device that permits the system operator to
interact with the computer system. In modern-day computer systems, the console is typically a
dedicated terminal.

Figure : Typical computer system.

VII. SOFTWARE
The hardware components of a computer system are electronic devices in which the basic unit of
information is either a 0 or a 1, corresponding to two states of an electronic signal. For instance, in one
of the popular hardware technologies a 0 is represented by 0 V while a 1 is represented by 5 V.
Programs and data must therefore be expressed using this binary alphabet consisting of 0 and 1.
Programs written using only these binary digits are machine language programs. At this level of
programming, operations such as ADD and SUBTRACT are each represented by a unique pattern of 0s
and 1s, and the computer hardware is designed to interpret these sequences. Programming at this level
is tedious since the programmer has to work with sequences of 0s and 1s and needs to have very
detailed knowledge of the computer structure. The tedium of machine language programming is
partially alleviated by using symbols such as ADD and SUB rather than patterns of 0s and 1s for these
operations. Programming at the symbolic level is called assembly language programming. An assembly
language programmer also is required to have a detailed knowledge of the machine structure, because
the operations permitted in the assembly language are primitive and the instruction format and
capabilities depend on the hardware organization of the machine. An assembler program is used to
translate assembly language programs into machine language. Use of high-level programming
languages such as FORTRAN, COBOL, C, and Java further reduces the requirement of an intimate
knowledge of the machine organization. A compiler program is needed to translate a high-level
language program into the machine language. A separate compiler is needed for each high-level
language used in programming the computer system. Note that the assembler and the compiler are also
programs written in one of those languages and can translate an assembly or high-level language
program, respectively, into the machine language.
The figure below shows the sequence of operations that occurs once a program is developed. A
program written in either the assembly language or a high-level language is called a source program. An
assembly language source program is translated by the assembler into the machine language program.
This machine language program is the object code. A compiler converts a high-level language source
into object code. The object code ordinarily resides on an intermediate device such as a magnetic disk
or tape. A loader program loads the object code from the intermediate device into the memory unit. The
data required by the program will be either available in the memory or supplied by an input device
during the execution of the program. The effect of program execution is the production of processed
data or results.

Figure : Program translation and execution.
System
Operations such as selecting the appropriate compiler for translating the source into object code;
loading the object code into the memory unit; and starting, stopping, and accounting for the computer
system usage are automatically done by the system. A set of supervisory programs that permit such
automatic operation is usually provided by the computer system manufacturer. This set, called the
operating system, receives the information it needs through a set of command language statements from
the user and manages the overall operation of the computer system. Operating system and other utility
programs used in the system may reside in a memory block that is typically read-only. Special devices
are needed to write these programs into read-only memory. Such programs and commonly used data are
termed firmware. The figure below is a simple rendering of the complete hardware-software
environment of a general-purpose computer system.

Figure: Hardware and software components.

Definition:
Software is a collection of programs written to solve a problem using a computer. Software is of
two types:
• System software
• Application software
The differences between system software and application software:

System software                          | Application software
A collection of programs responsible for | A collection of programs focused on the
the coordination of all activities in a  | particular application (problem) to be
computing system                         | solved
Purely machine dependent                 | Machine independent
Examples: compiler, assembler, linker,   | Examples: MS Office, accounting system,
debugger, text editor, loader, OS        | ticket reservation

System software performs the following functions:

• Receiving and interpreting user commands.
• Entering and editing application programs and storing them as files in secondary storage
devices, e.g., text editors.
• Managing the storage and retrieval of files in secondary storage devices.
• Running standard application programs such as word processors or spreadsheets, with data
supplied by the user.
• Controlling I/O units to receive input and produce output.
• Translating source programs into object programs, e.g., a compiler.
• Linking and running user-written programs.
Compiler
A compiler is system software that translates a high-level language program (source program)
written in a language such as C or C++ into a machine language program (object program).
Text editor
It is used for entering and editing application programs. The user enters commands that allow the
statements of a source program to be typed in and saved as a file in secondary storage. A file can be
referred to by a name chosen by the user.
Operating System
The operating system is a large program, or rather a collection of routines, that is used to control the
sharing of and interaction among various computer units as they execute application programs.
Other tasks of the OS are:
• To assign memory and magnetic disk space to program and data files.
• To move data between memory and disk units.
• To handle I/O operations.
Steps involved in running an application program
1) Transfer the program to be executed from secondary storage into main memory.
2) Start executing the program.
3) Read the required data for the program from memory and perform the specified computation on
the data.
4) Print the result.
Role of operating system in running the program
1) When the executing program requires some data from the memory, it sends a request to the
operating system. The operating system fetches the requested data and passes control back to the
program, which then proceeds to perform the required computation.
2) When the computation is completed and the results are ready to be printed, the program again
sends a request to the operating system. An OS routine then causes the printer to print the result.

The time line diagram below illustrates the sharing of the processor execution time. During the
time period t0 to t1, the OS initiates loading of the application program from disk into main
memory, waits until loading is complete, and then passes execution control to the application program. OS
routines are active again during the periods t2 to t3 and t4 to t5. During t1 to t2 and t3 to t4 the
processor performs the actual execution of the program.

[Timing diagram: activity of the Printer, Disk, OS routines, and Program plotted against time, with instants t0 to t5 marked]

Figure : User program and OS routine sharing of the processor

From t4 to t5 the OS transfers the file from main memory to the printer to print the result. During this
period the processor is free and can execute another program until printing is completed. Thus the operating
system manages the concurrent execution of several programs to make the best possible use of
computer resources; this is called multiprogramming or multitasking.

MEMORY LOCATIONS AND ADDRESSES
Computer memory consists of millions of storage cells. Each cell can store a bit (0 or 1) of
information. Usually n bits are grouped, so that such group of bits can be stored and retrieved in a single
basic operation. Each group of n bits is called a word of information, and n is called word length. Thus
memory of a computer can be schematically represented as a collection of words.

Figure : Memory words


Characteristics of word length:
• Word length of modern computers ranges from 16 to 64 bits.
• If the word length of a computer is 32 bits, then a single word can store a 32-bit 2's complement
number or four ASCII characters, each occupying 8 bits (a unit of 8 bits is called a byte).
Machine instructions may require one or more words for their representation.
The format for encoding the machine instructions into memory word

Address and name representations to store an information:


Accessing the memory to store or retrieve a single item of information, either a word or a byte,
requires distinct names or addresses for each item location.
Normally the numbers from 0 through 2^k - 1, for some suitable value of k, are used as the
addresses of successive locations in the memory. The 2^k addresses constitute the address space of
the computer, and the memory can have up to 2^k addressable locations.
For example:
• A 24-bit address generates an address space of 2^24 (16,777,216) locations. This number is
usually written as 16M (16 mega), where 1M is the number 2^20 (1,048,576).
• A 32-bit address creates an address space of 2^32 or 4G (4 giga) locations, where 1G is 2^30.
• Other notational conventions that are commonly used are K (kilo) for the number 2^10 (1,024),
and T (tera) for the number 2^40.
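
These sizes are easy to verify numerically. In the sketch below the helper names address_space and describe are introduced only for this example; the unit constants follow the powers of two given above.

```python
K = 2 ** 10  # kilo
M = 2 ** 20  # mega
G = 2 ** 30  # giga
T = 2 ** 40  # tera


def address_space(k):
    """Number of distinct addresses that a k-bit address can express."""
    return 2 ** k


def describe(k):
    """Write the size of a k-bit address space using K, M, G or T where it divides evenly."""
    n = address_space(k)
    for name, unit in (("T", T), ("G", G), ("M", M), ("K", K)):
        if n >= unit and n % unit == 0:
            return f"{n // unit}{name}"
    return str(n)


for k in (16, 24, 32):
    print(k, address_space(k), describe(k))
```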
Byte Addressability
A byte is always 8 bits, but the word length typically ranges from 16 to 64 bits. It is impractical
to assign distinct addresses to individual bit locations in the memory. The practical assignment is to
have successive addresses refer to successive byte locations in the memory.
A byte-addressable memory is one in which successive addresses refer to successive byte locations in
the memory. Thus the bytes in a memory are addressed as 0, 1, 2, ..., and if the word length of the machine
is 32 bits, successive words are located at addresses 0, 4, 8, ..., with each word consisting of four bytes.

Big-Endian And Little Endian Assignments


There are two ways of assigning byte addresses across a word:
• Big-endian assignment
• Little-endian assignment
Big-endian
In the big-endian assignment, lower byte addresses are used for the more significant bytes (the
leftmost bytes) of the word.
Little-endian
In the little-endian assignment, lower byte addresses are used for the less significant bytes (the
rightmost bytes) of the word.

In both the big-endian and little-endian assignments, byte addresses 0, 4, 8, ... are taken as the addresses of
successive words in the memory (i.e., the word length is 4 bytes), and these are the addresses used when
specifying memory read and write operations for words.
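
The two assignments can be seen by laying out the same 32-bit word both ways. The sketch below uses Python's built-in int.to_bytes and int.from_bytes; the sample word 0x12345678 and the helper layout are chosen only for illustration.

```python
word = 0x12345678  # sample 32-bit value; 0x12 is its most significant byte

big = word.to_bytes(4, byteorder="big")        # most significant byte first
little = word.to_bytes(4, byteorder="little")  # least significant byte first


def layout(data, base=0):
    """Map each byte address, starting at `base`, to the byte stored there (in hex)."""
    return {base + offset: f"{byte:02X}" for offset, byte in enumerate(data)}


print("big-endian:   ", layout(big))
print("little-endian:", layout(little))
```

In both cases the word as a whole is still referred to by byte address 0; only the placement of the individual bytes within it differs.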

Word alignment
There are two kinds of address:
• Aligned address
• Unaligned address
Aligned address
Words are said to be aligned in memory if they begin at a byte address that is a multiple of the
number of bytes in a word.
For example,
• In a machine with a 32-bit (4-byte) word length, the number of bytes in a word is 4. In this case
words are said to have aligned addresses if they begin at byte addresses 0, 4, 8, 12, ..., i.e.,
multiples of the number of bytes in a word.
• Similarly, if the word length is 16 bits (2 bytes), aligned words begin at byte
addresses 0, 2, 4, ...
Unaligned address
Words are said to be unaligned in memory if they do not begin at a byte address that is a
multiple of the number of bytes in a word.
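
The alignment rule amounts to a remainder test. The function name is_aligned below is hypothetical and is used only to make the definition concrete.

```python
def is_aligned(byte_address, word_bytes):
    """A word is aligned if its byte address is a multiple of the word size in bytes."""
    return byte_address % word_bytes == 0


# Aligned byte addresses for 32-bit (4-byte) words and 16-bit (2-byte) words.
aligned_32 = [a for a in range(0, 17) if is_aligned(a, 4)]
aligned_16 = [a for a in range(0, 7) if is_aligned(a, 2)]

print(aligned_32)
print(aligned_16)
print(is_aligned(6, 4))  # a 4-byte word starting at byte address 6 is unaligned
```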
Accessing numbers, characters, and character strings
• A number occupies one word. It can be accessed in the memory by specifying its word address.
• A character occupies one byte. It can be accessed in the memory by specifying its byte address.
• Accessing strings
   o The beginning of the string is indicated by giving the byte address of its first character.
   Successive byte locations contain successive characters of a string.
   o There are two ways to indicate the length of the string:
      • A special control character with the meaning "end of string" can be used as the
      last character in the string.
      • A separate memory word location or processor register can contain a number
      indicating the length of the string in bytes.
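
Both ways of marking the length of a string can be sketched with byte sequences. The use of the value 0 as the "end of string" control character is an assumption made here (it is the convention of C strings); the notes do not fix a particular character.

```python
text = "HELLO"
END_OF_STRING = 0  # assumed control character marking the end of the string

# Method 1: a special end-of-string character follows the last character.
terminated = text.encode("ascii") + bytes([END_OF_STRING])

# Method 2: the length in bytes is kept separately (in a word or a register).
counted = (len(text), text.encode("ascii"))


def length_by_scanning(buffer):
    """Count the bytes that precede the end-of-string character."""
    n = 0
    while buffer[n] != END_OF_STRING:
        n += 1
    return n


print(length_by_scanning(terminated), counted[0])
```

With the first method the length is found by scanning successive byte addresses; with the second it is available at once, at the cost of an extra word or register.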
MEMORY OPERATIONS

To execute an instruction, the processor control circuits must cause the word containing the
instruction to be transferred from the memory to the processor. Operands and results must also be
moved between the memory and the processor.
Thus, two basic memory operations are needed, they are
¾ Load
¾ Store
Load
o The load operation transfers a copy of the content of a specific memory location to the
processor.
o To start a load operation, the processor sends the address of the desired location to the
memory.
o The memory reads the data stored at that address and sends them to the processor.

[Figure: for a Load, the processor sends the Address and a Read signal to the memory, and the memory returns the Data]

Store
• The store operation transfers an item of information from the processor register to a specific
memory location.
• The processor sends the address of the desired memory location to the memory, together with
the data to be written into that location.

[Figure: for a Store, the processor sends the Address, a Write signal, and the Data to the memory]

An information item of either one word or one byte can be transferred between the processor and the
memory in a single operation. A processor register can hold one word of information at a time.

VIII. INSTRUCTION SET ARCHITECTURE

Interface between the high level language and the machine language

It has the following parts:


• Instruction set
• Addressing modes
• Instruction formats
• Instruction representation
Instructions
Logical instructions
AND, OR, XOR, Shift
Arithmetic instructions
Data types
Integers: Unsigned, Signed, Byte, Short, Long
Real numbers: Single-precision (float), Double-precision (double)
Operations
Addition, Subtraction, Multiplication, Division
Data transfer instructions
Register transfer: Move
Memory transfer: Load, Store
I/O transfer: In, Out

Control transfer instructions
• Unconditional branch
• Conditional branch
• Procedure call
• Return
Addressing modes
Specification of operands in instructions
Different addressing modes:
• Register direct: Value of operand in a register
• Register indirect: Address of operand in a register
• Immediate: Value of operand
• Memory direct: Address of operand
• Indexed: Base register, Index register
• Relative: Base register, Displacement
• Indexed relative: Base register, Index register,
Instruction formats
3-operand instructions
ADD op1, op2, op3; op1 ← op2 + op3
2-operand instructions
ADD op1, op2; op1 ← op1 + op2
1-operand instructions
INC op1; op1 ← op1 + 1
Types of operands:
• Register operands
• Memory operands specified using addressing modes
Effect of instruction format:
• Instruction length
• Number of instructions for a program
• Complexity of instruction decoding (Control unit)

Complex Instruction Set Computer (CISC) processors:
• 2-operand instructions and 1-operand instructions
• Any instruction can use memory operands
• Many addressing modes
• Complex instruction formats: varying-length instructions
• Microprogrammed control unit

Reduced Instruction Set Computer (RISC) processors:
• 3-operand instructions, 2-operand instructions, and 1-operand instructions
• Load-Store Architecture (LSA) processors:
– Only memory transfer instructions (Load and Store) can use memory operands.
– All other instructions can use register operands only.
• A few addressing modes
• Simple instruction formats: fixed-length instructions
• Hardwired control unit

IX. ADDRESSING MODES

In general, a program operates on data that reside in the computer’s memory. These data can be
organized in a variety of ways. If we want to keep track of students’ names, we can write them in a list.
If we want to associate information with each name, for example to record telephone numbers or marks
in various courses, we may organize this information in the form of a table. Programmers use
organizations called data structures to represent the data used in computations. These include lists,
linked lists, arrays, queues, and so on.
Programs are normally written in a high-level language, which enables the programmer to use
constants, local and global variables, pointers, and arrays. When translating a high-level language
program into assembly language, the compiler must be able to implement these constructs using the
facilities provided in the instruction set of the computer in which the program will be run. The different
ways in which the location of an operand is specified in an instruction are referred to as addressing
modes.

1. IMPLEMENTATION OF VARIABLES AND CONSTANTS
Variables and constants are the simplest data types and are found in almost every computer
program. In assembly language, a variable is represented by allocating a register or a memory location
to hold its value. Thus, the value can be changed as needed using appropriate instructions.
We access an operand by specifying the name of the register or the address of the memory
location where the operand is located.
Register mode
The operand is the contents of a processor register; the name (address) of the register is given in
the instruction. It is used to access the variables in the program.
Absolute mode
The operand is in a memory location; the address of this location is given explicitly in the
instruction. It is also called Direct mode. It is also used to access the variables in the program.
Example instruction for register and absolute mode:
Move LOC, R2
uses the register and absolute modes. The processor registers are used as temporary storage locations
where the data in a register are accessed using the Register mode. The Absolute mode can represent
global variables in a program. A declaration such as
Integer A, B;
in a high-level language program will cause the compiler to allocate a memory location to each of the
variables A and B. Absolute mode can be used to access the variables in the program.
Immediate mode

Address and data constants can be represented in assembly language using the Immediate
mode. The operand is given explicitly in the instruction.
For example, the instruction
Move 200_immediate, R0
places the value 200 in register R0. Clearly, the Immediate mode is only used to specify the value of a
source operand. Using a subscript to denote the Immediate mode is not appropriate in assembly
languages. A common convention is to use the sharp sign (#) in front of the value to indicate that this
value is to be used as an immediate operand.
Hence, we write the instruction above in the form
Move #200, R0
Constant values are used frequently in high-level language programs. For example,
the statement
A=B+6
contains the constant 6. Assuming that A and B have been declared earlier as variables
and may be accessed using the Absolute mode, this statement may be compiled as
follows:
Move B, R1
Add #6, R1
Move R1, A
Constants are also used in assembly language to increment a counter, test for some bit pattern, and so
on.
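
The three-instruction sequence for A = B + 6 can be traced with a small Python sketch. This is an
illustration under stated assumptions, not a real assembler: the helper names read_operand, write_operand,
move and add are hypothetical, memory locations are modelled by their names, and B is assumed to hold 10.

```python
memory = {"A": 0, "B": 10}   # variables reached through the Absolute mode
registers = {"R1": 0}        # operands reached through the Register mode


def read_operand(op):
    """Return the value of a source operand."""
    if op.startswith("#"):   # Immediate mode: the value is in the instruction
        return int(op[1:])
    if op in registers:      # Register mode
        return registers[op]
    return memory[op]        # Absolute mode


def write_operand(op, value):
    """Write a value to a destination operand."""
    if op.startswith("#"):
        raise ValueError("an immediate operand cannot be a destination")
    if op in registers:
        registers[op] = value
    else:
        memory[op] = value


def move(src, dst):
    write_operand(dst, read_operand(src))


def add(src, dst):
    write_operand(dst, read_operand(dst) + read_operand(src))


move("B", "R1")    # Move B, R1
add("#6", "R1")    # Add #6, R1
move("R1", "A")    # Move R1, A
```

The check in write_operand reflects the remark above that the Immediate mode only makes sense for a
source operand.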

2. INDIRECTION AND POINTERS
In the addressing modes that follow, the instruction does not give the operand or its address
explicitly. Instead, it provides information from which the memory address of the operand can be
determined. We refer to this address as the effective address (EA) of the operand.
Indirect mode
The effective address of the operand is the contents of a register or memory location whose
address appears in the instruction. We denote indirection by placing the name of the register or the
memory address given in the instruction in parentheses.
For example, in the instruction
Add (R1), R0
the operand is accessed indirectly through register R1. To execute the Add instruction, the processor
uses the value B, which is in register R1, as the effective address of the operand. It requests a read
operation from the memory to read the contents of location B. The value read is the desired operand,
which the processor adds to the contents of register R0. Indirect addressing through a memory location
is also possible. In this case, the processor first reads the contents of memory location A, then
requests a second read operation using the value B as an address to obtain the operand.

Fig: Indirect addressing.

The register or memory location that contains the address of an operand is called a pointer. Consider
the analogy of a treasure hunt: In the instructions for the hunt you may be told to go to a house at a
given address. Instead of finding the treasure there, you find a note that gives you another address
where you will find the treasure. By changing the note, the location of the treasure can be changed, but
the instructions for the hunt remain the same. Changing the note is equivalent to changing the contents
of a pointer in a computer program. For example, by changing the contents of register R1 or location A,
the same Add instruction fetches different operands to add to register R0.

Let us now return to the program for adding a list of numbers. Indirect addressing can be used to access
successive numbers in the list, resulting in the program shown in the figure below.

Fig: Use of indirect addressing in the program

Register R2 is used as a pointer to the numbers in the list, and the operands are accessed indirectly
through R2. The initialization section of the program loads the counter value n from memory location N
into R1 and uses the immediate addressing mode to place the address value NUM1, which is the address of
the first number in the list, into R2. Then it clears R0 to 0. The first two instructions in the loop
implement the unspecified instruction block starting at LOOP. The first time through the loop, the
instruction
Add (R2), R0
fetches the operand at location NUM1 and adds it to R0. The second Add instruction adds 4 to the
contents of the pointer R2, so that it will contain the address value NUM2 when the above instruction is
executed in the second pass through the loop.
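
The list-addition loop just described can be traced in Python as a rough sketch. The addresses chosen for
NUM1, N and SUM and the list contents are assumptions for illustration; the Decrement, Branch>0 and final
Move to SUM steps are assumed from the earlier form of the list-addition program rather than taken from
this passage.

```python
NUM1 = 1000                      # assumed address of the first number
N, SUM = 900, 904                # assumed addresses of the count and the result
numbers = [5, -2, 7, 10]         # assumed list contents

# Byte-addressable memory with 4-byte words: successive numbers are 4 apart.
memory = {NUM1 + 4 * i: value for i, value in enumerate(numbers)}
memory[N] = len(numbers)

r1 = memory[N]                   # Move N, R1       (counter)
r2 = NUM1                        # Move #NUM1, R2   (pointer, Immediate mode)
r0 = 0                           # Clear R0
while True:
    r0 += memory[r2]             # LOOP: Add (R2), R0  (Indirect mode through R2)
    r2 += 4                      # Add #4, R2          (advance the pointer)
    r1 -= 1                      # Decrement R1
    if not r1 > 0:               # Branch>0 LOOP
        break
memory[SUM] = r0                 # Move R0, SUM
```

The instruction Add (R2), R0 never changes; only the contents of the pointer R2 change from one pass to
the next.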
Consider the C-language statement
A= *B;
where B is a pointer variable. This statement may be compiled into
Move B, R1
Move (R1), A
Using indirect addressing through memory, the same action can be achieved with
Move (B), A
Despite its apparent simplicity, indirect addressing through memory has proven to be of limited
usefulness as an addressing mode, and it is seldom found in modern computers. An instruction that
involves accessing the memory twice to get an operand is not well suited to pipelined execution.
Indirect addressing through registers is used extensively. The program shows the flexibility it provides.
Also, when absolute addressing is not available, indirect addressing through registers makes it possible
to access global variables by first loading the operand’s address in a register.

3. INDEXING AND ARRAYS


Indexing is useful in dealing with lists and arrays.
Index mode
The effective address of the operand is generated by adding a constant value to the contents of a
register. The register used may be either a special register provided for this purpose or, more
commonly, any one of a set of general-purpose registers in the processor. In either case, it is
referred to as an index register. We indicate the Index mode symbolically as
X (Ri)
where X denotes the constant value contained in the instruction and Ri is the name of the register
involved. The effective address of the operand is given by
EA = X + [Ri ]

The contents of the index register are not changed in the process of generating the effective address. In
an assembly language program, the constant X may be given either as an explicit number or as a
symbolic name representing a numerical value. When the instruction is translated into machine code,
the constant X is given as a part of the instruction and is usually represented by fewer bits than the word
length of the computer. Since X is a signed integer, it must be sign-extended to the register length
before being added to the contents of the register. The index register, R1, contains the address of a
memory location, and the value X defines an offset (also called a displacement) from this address to the
location where the operand is found.
An alternative use: Constant X corresponds to a memory address, and the contents of the index
register define the offset to the operand. In either case, the effective address is the sum of two values;
one is given explicitly in the instruction, and the other is stored in a register.
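
The effective-address calculation EA = X + [Ri], including the sign extension of X, can be sketched as
follows. The function names and the 16-bit width assumed for the X field are illustrative choices, not
values given in these notes.

```python
def sign_extend(value, bits):
    """Interpret the low `bits` bits of value as a two's-complement number."""
    value &= (1 << bits) - 1
    sign_bit = 1 << (bits - 1)
    return value - (1 << bits) if value & sign_bit else value


def index_ea(x_field, ri_contents, x_bits=16, word_bits=32):
    """EA = X + [Ri], with X sign-extended to the register length."""
    return (sign_extend(x_field, x_bits) + ri_contents) & ((1 << word_bits) - 1)


ea_forward = index_ea(20, 1000)       # positive offset from the address in Ri
ea_backward = index_ea(0xFFFC, 1000)  # 0xFFFC is -4 as a 16-bit signed value
```

The function only computes an address; as noted above, the contents of the index register itself are not
changed.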

Fig: Indexed addressing.

To see the usefulness of indexed addressing, consider a simple example involving a list of test
scores for students taking a given course. Assume that the list of scores begins at location LIST. A
four-word memory block comprises a record that stores the relevant information for each student. Each
record consists of the student’s identification number (ID), followed by the scores the student earned on
three tests. There are n students in the class, and the value n is stored in location N immediately in front
of the list. The addresses given in the figure for the student IDs and test scores assume that the memory
is byte addressable and that the word length is 32 bits.
We should note that the list represents a two-dimensional array having n rows and four columns.
Each row contains the entries for one student, and the columns give the IDs and test scores.

Fig: A list of students’ marks.


Suppose that we wish to compute the sum of all scores obtained on each of the tests and store
these three sums in memory locations SUM1, SUM2, and SUM3. In the body of the loop, the program
uses the Index addressing mode to access each of the three scores in a student’s record. Register R0 is
used as the index register. Before the loop is entered, R0 is set to point to the ID location of the first
student record; thus, it contains the address LIST.

On the first pass through the loop, test scores of the first student are added to the running sums held
in registers R1, R2, and R3, which are initially cleared to 0. These scores are accessed using the Index
addressing modes 4(R0), 8(R0), and 12(R0). The index register R0 is then incremented by 16 to point to
the ID location of the second student. Register R4, initialized to contain the value n, is decremented
by 1 at the end of each pass through the loop. When the contents of R4 reach 0, all student records have
been accessed, and the loop terminates. Until then, the conditional branch instruction transfers control
back to the start of the loop to process the next record. The last three instructions transfer the
accumulated sums from registers R1, R2, and R3, into memory locations SUM1, SUM2, and SUM3, respectively.

It should be emphasized that the contents of the index register, R0, are not changed when it is used in
the Index addressing mode to access the scores. The contents of R0 are changed only by the last Add
instruction in the loop, to move from one student record to the next. In general, the Index mode
facilitates access to an operand whose location is defined relative to a reference point within the data
structure in which the operand appears. In the example just given, the ID locations of successive student
records are the reference points, and the test scores are the operands accessed by the Index addressing
mode.
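
The test-score program can be followed step by step in a Python sketch. The address chosen for LIST and
the three student records are made-up values for illustration; the variables r0 to r4 mirror the
registers named in the text.

```python
LIST = 1000   # assumed starting address of the list
records = [(101, 70, 80, 90), (102, 60, 75, 85), (103, 50, 65, 95)]  # (ID, test1, test2, test3)

# Each record occupies four 4-byte words, so records start 16 bytes apart.
memory = {}
for i, record in enumerate(records):
    for j, word in enumerate(record):
        memory[LIST + 16 * i + 4 * j] = word

r0 = LIST                 # index register: ID location of the first record
r1 = r2 = r3 = 0          # running sums for the three tests
r4 = len(records)         # loop counter, n
while True:
    r1 += memory[4 + r0]      # Add 4(R0), R1
    r2 += memory[8 + r0]      # Add 8(R0), R2
    r3 += memory[12 + r0]     # Add 12(R0), R3
    r0 += 16                  # Add #16, R0  (move to the next record)
    r4 -= 1                   # Decrement R4
    if not r4 > 0:            # Branch>0 LOOP
        break
sums = (r1, r2, r3)           # values to be stored in SUM1, SUM2, SUM3
```

Within one pass, r0 is read three times with different constants but modified only once, by the last Add.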

Fig: Indexed addressing used in accessing test scores in the list
We have introduced the most basic form of indexed addressing. Several variations of this basic form
provide for very efficient access to memory operands in practical programming situations. For example,
a second register may be used to contain the offset X, in which case we can write the Index mode as
(Ri, Rj)
The effective address is the sum of the contents of registers Ri and Rj. The second register is usually
called the base register. This form of indexed addressing provides more flexibility in accessing
operands, because both components of the effective address can be changed. As an example of where
this flexibility may be useful, consider again the student record data structure shown in the figure
of the students’ marks list. In the above program, we used different index values in the three Add
instructions at the beginning of the loop
to access different test scores. Suppose each record contains a large number of items, many more than
the three test scores of that example. In this case, we would need the ability to replace the three Add
instructions with one instruction inside a second (nested) loop. Just as the successive starting locations
of the records (the reference points) are maintained in the pointer register R0, offsets to the individual
items relative to the contents of R0 could be maintained in another register. The contents of that register
would be incremented in successive passes through the inner loop.
Yet another version of the Index mode uses two registers plus a constant, which can be denoted as
X(Ri, Rj)
In this case, the effective address is the sum of the constant X and the contents of registers Ri and Rj.
This added flexibility is useful in accessing multiple components inside each item in a record, where the
beginning of an item is specified by the (Ri, Rj) part of the addressing mode. In other words, this mode
implements a three-dimensional array.

4. RELATIVE ADDRESSING
We have defined the Index mode using general-purpose processor registers. A useful version of
this mode is obtained if the program counter, PC, is used instead of a general purpose register. Then,
X(PC) can be used to address a memory location that is X bytes away from the location presently
pointed to by the program counter. Since the addressed location is identified “relative” to the program
counter, which always identifies the current execution point in a program, the name Relative mode is
associated with this type of addressing.
Relative mode

The effective address is determined by the Index mode using the program counter in place of
the general-purpose register Ri. This mode can be used to access data operands. But, its most common
use is to specify the target address in branch instructions. An instruction such as
Branch>0 LOOP
causes program execution to go to the branch target location identified by the name LOOP if the branch
condition is satisfied. This location can be computed by specifying it as an offset from the current value
of the program counter. Since the branch target may be either before or after the branch instruction, the
offset is given as a signed number.
Recall that during the execution of an instruction, the processor increments the PC to point to the next
instruction. Most computers use this updated value in computing the effective address in the Relative
mode. For example, suppose that the Relative mode is used to generate the branch target address LOOP in
the Branch instruction of the program in Figure 2.12. Assume that the four instructions of the loop body,
starting at LOOP, are located at memory locations 1000, 1004, 1008, and 1012. Hence, the updated contents
of the PC at the time the branch target address is generated will be 1016. To branch to location LOOP
(1000), the offset value needed is X = -16. Assembly languages allow branch instructions to be written
using labels to denote the branch target. When the assembler program processes such an instruction, it
computes the required offset value, -16 in this case, and generates the corresponding machine instruction
using the addressing mode -16(PC).
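
The offset calculation performed by the assembler can be written out as a short Python sketch. The function
names are hypothetical, and the 4-byte instruction size is inferred from the addresses 1000 to 1012 in the
example.

```python
def branch_offset(branch_address, target_address, instruction_size=4):
    """Offset X such that X(PC) reaches the target, using the updated PC."""
    updated_pc = branch_address + instruction_size   # PC already points past the branch
    return target_address - updated_pc


def branch_target(branch_address, offset, instruction_size=4):
    """Effective address generated by the Relative mode at run time."""
    return branch_address + instruction_size + offset


# The Branch instruction is the fourth instruction of the loop, at address 1012.
offset = branch_offset(1012, 1000)
target = branch_target(1012, offset)
```

A negative offset corresponds to a backward branch, as in a loop; a branch to a later instruction would
give a positive offset.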

5. ADDITIONAL MODES
We have given a number of common versions of the Index mode, not all of which may be
found in any one computer. Although these modes suffice for general computation, many computers
provide additional modes intended to aid certain programming tasks. The two modes described next are
useful for accessing data items in successive locations in the memory.
Autoincrement mode
The effective address of the operand is the contents of a register specified in the instruction.
After accessing the operand, the contents of this register are automatically incremented to point to the
next item in a list. We denote the Autoincrement mode by putting the specified register in parentheses,
to show that the contents of the register are used as the effective address, followed by a plus sign to
indicate that these contents are to be incremented after the operand is accessed. Thus, the
Autoincrement mode is written as
(Ri )+
Implicitly, the increment amount is 1 when the mode is given in this form. But in a byte addressable
memory, this mode would only be useful in accessing successive bytes of some list. To access
successive words in a byte-addressable memory with a 32-bit word length, the increment must be 4.
Computers that have the Autoincrement mode automatically increment the contents of the register by a
value that corresponds to the size of the accessed operand. Thus, the increment is 1 for byte-sized
operands, 2 for 16-bit operands, and 4 for 32-bit operands. Since the size of the operand is usually
specified as part of the operation code of an instruction, it is sufficient to indicate the
Autoincrement mode as (Ri)+.
If the Autoincrement mode is available, it can be used in the first Add instruction of the list-addition
program, and the second Add instruction can be eliminated. The modified program is shown in the figure
below.
As a companion for the Autoincrement mode, another useful mode accesses the items of a list in the
reverse order:

Autodecrement mode
The contents of a register specified in the instruction are first automatically decremented and
are then used as the effective address of the operand. We denote the Autodecrement mode by putting
the specified register in parentheses, preceded by a minus sign to indicate that the contents of the
register are to be decremented before being used as the effective address. Thus, we write
-(Ri )

Fig: The Autoincrement addressing mode used in the program


In this mode, operands are accessed in descending address order. The reader may wonder why the address
is decremented before it is used in the Autodecrement mode and incremented after it is used in the
Autoincrement mode. The main reason is that, with this arrangement, the two modes can be used together
to implement a stack. The actions performed by the Autoincrement and Autodecrement addressing modes can
also be achieved by using two instructions, one to access the operand and the other to increment or
decrement the register that contains the operand address. Combining the two operations in one instruction
reduces the number of instructions needed to perform the desired task.
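
The difference in ordering between the two modes can be seen in a small Python sketch. The class and
method names are hypothetical, and using the pair of modes to push and pop values on a stack is chosen
here purely as an illustration, with an assumed starting address of 2000 and 4-byte operands.

```python
class Machine:
    """Registers plus a byte-addressable memory holding 4-byte words."""

    def __init__(self):
        self.memory = {}
        self.reg = {}

    def read_autoinc(self, r, size=4):
        """(Ri)+ : use the register as the address, then increment it."""
        ea = self.reg[r]
        self.reg[r] += size
        return self.memory[ea]

    def write_autodec(self, r, value, size=4):
        """-(Ri) : decrement the register first, then use it as the address."""
        self.reg[r] -= size
        self.memory[self.reg[r]] = value


m = Machine()
m.reg["SP"] = 2000
m.write_autodec("SP", 11)      # stored at 1996
m.write_autodec("SP", 22)      # stored at 1992
first = m.read_autoinc("SP")   # reads 1992, then SP becomes 1996
second = m.read_autoinc("SP")  # reads 1996, then SP becomes 2000
```

Because one mode decrements before the access and the other increments after it, the values come back in
the reverse of the order in which they were written, and the register returns to its starting value.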

X. RISC

RISC stands for “Reduced Instruction Set Computer”.

IBM was the first company to define the RISC architecture, in the 1970s. This research was further
developed by the universities of Berkeley and Stanford to give basic architectural models. RISC can be
described as a philosophy with three basic principles:
(i) All instructions will be executed in a single cycle.
(ii) Memory will only be accessed via load and store instructions.
(iii) All execution units will be hardwired, with no microcoding.
The instruction set is the hardware “language” in which the software tells the processor what to do.
Because the instruction set is reduced, the vacated area of the chip can be used in ways that accelerate
the performance of more commonly used instructions, and it becomes easier to optimize the design.
Basically, the philosophy is that instructions are handled in parts:

• Fetch the instruction
• Get the arguments
• Perform the action
• Write back the result

For example, for an add instruction this means: r0 = r1 + r2

RISC CHARACTERISTICS

• Simple instruction set
• Same-length instructions
• Single machine-cycle instructions
Fig: R4000 Internal Block Diagram

CPU Register Overview

The R4000 has 32 general-purpose registers, a program counter (PC) register, and 2 registers that hold
the results of integer multiply and divide operations (HI and LO). The R4000 has no Program Status Word
(PSW) register as such; its function is covered by the Status and Cause registers incorporated within the
system control coprocessor (CP0).

CPU Instruction Set Overview

Each CPU instruction is 32 bits long.

There are three instruction formats:
• immediate (I-type)
• jump (J-type)
• register (R-type)
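
The three formats can be told apart by splitting the 32-bit instruction word into fields. The Python sketch
below uses the standard MIPS field layout (6-bit opcode; 5-bit rs, rt, rd and shamt; 6-bit funct; 16-bit
immediate; 26-bit jump target). It is a simplified illustration: the function name decode is hypothetical,
and treating every opcode other than 0, 2 and 3 as I-type is an assumption that ignores coprocessor and
other special encodings.

```python
def decode(word):
    """Split a 32-bit MIPS instruction word into its format-specific fields."""
    opcode = (word >> 26) & 0x3F
    if opcode == 0:                      # R-type: register operands
        return {
            "format": "R",
            "rs": (word >> 21) & 0x1F,
            "rt": (word >> 16) & 0x1F,
            "rd": (word >> 11) & 0x1F,
            "shamt": (word >> 6) & 0x1F,
            "funct": word & 0x3F,
        }
    if opcode in (2, 3):                 # J-type: j and jal
        return {"format": "J", "opcode": opcode, "target": word & 0x03FFFFFF}
    return {                             # I-type (simplifying assumption)
        "format": "I",
        "opcode": opcode,
        "rs": (word >> 21) & 0x1F,
        "rt": (word >> 16) & 0x1F,
        "immediate": word & 0xFFFF,
    }


r_type = decode(0x012A4020)   # add  $t0, $t1, $t2
i_type = decode(0x21280005)   # addi $t0, $t1, 5
j_type = decode(0x08000100)   # j with 26-bit target field 0x100
```

Because every instruction has the same length and the opcode always sits in the same six bits, decoding
needs only shifts and masks, which is what keeps the control logic simple.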

Memory Management Unit (MMU)

The MIPS R4000 processor provides a full-featured MMU which uses an on-chip translation lookaside
buffer (TLB) to translate virtual addresses into physical addresses.

System Control Coprocessor(CP0)

CP0 translates virtual addresses into physical addresses and manages exceptions and transitions between
kernel, supervisor, and user states. CP0 also controls the cache subsystem, as well as providing
diagnostic control and error recovery facilities.

Floating Point Unit(FPU), CP1

• The R4000 has an on-chip floating-point unit designated as CP1.
• The FPU extends the CPU instruction set to perform arithmetic operations on floating-point values.

The FPU features include:

• Full 64-bit operation.
• Load and store instruction set.
• Tightly coupled coprocessor interface.

XI. CISC

CISC, which stands for Complex Instruction Set Computer, is a philosophy for
designing chips that are easy to program and which make efficient use of memory. Each
instruction in a CISC instruction set might perform a series of operations inside the
processor. This reduces the number of instructions required to implement a given
program, and allows the programmer to learn a small but flexible set of instructions.

Since the earliest machines were programmed in assembly language and memory
was slow and expensive, the CISC philosophy made sense, and was commonly
implemented in such large computers as the PDP-11 and the DEC system 10 and 20
machines.

Most common microprocessor designs including the Intel(R) 80x86 and Motorola
68K series also follow the CISC philosophy. As we shall see, recent changes in software
and hardware technology have forced a re-examination of CISC. But first, let's take a
closer look at the decisions which led to CISC.

CISC philosophy 1: Use microcode

The earliest processor designs used dedicated (hardwired) logic to
decode and execute each instruction in the processor's instruction set. This worked well
for simple designs with few registers, but made more complex architectures hard to build,
as control path logic can be hard to implement. So, designers switched tactics: they built
some simple logic to control the data paths between the various elements of the
processor, and used a simplified microcode instruction set to control the data path logic.
This type of implementation is known as a microprogrammed implementation.

In a microprogrammed system, the main processor has some built-in memory
(typically ROM) which contains groups of microcode instructions which correspond with
each machine-language instruction. When a machine language instruction arrives at the
central processor, the processor executes the corresponding series of microcode
instructions. Because instructions could be retrieved up to 10 times faster from a local
ROM than from main memory, designers began to put as many instructions as possible
into microcode. In fact, some processors could be ordered with custom microcode which
would replace frequently used but slow routines in certain applications.
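
The idea of a control store that maps each machine-language instruction to a series of microcode
instructions can be sketched in Python. Everything named here is hypothetical: the opcodes LOAD, ADD and
ADD_TO_MEMORY, the micro-operations, and the single accumulator are invented for illustration and do not
describe any particular processor.

```python
# Micro-operations: each one performs a single simple data-path step.
def fetch_operand(state):
    state["temp"] = state["memory"][state["address"]]

def copy_temp_to_acc(state):
    state["acc"] = state["temp"]

def alu_add(state):
    state["acc"] += state["temp"]

def store_acc(state):
    state["memory"][state["address"]] = state["acc"]


# Control store (the "ROM"): one microcode sequence per machine instruction.
CONTROL_STORE = {
    "LOAD": (fetch_operand, copy_temp_to_acc),
    "ADD": (fetch_operand, alu_add),
    "ADD_TO_MEMORY": (fetch_operand, alu_add, store_acc),
}


def execute(opcode, address, state):
    """Run the microcode sequence that corresponds to one machine instruction."""
    state["address"] = address
    for micro_op in CONTROL_STORE[opcode]:
        micro_op(state)


state = {"acc": 0, "temp": 0, "address": 0, "memory": {10: 7, 11: 5}}
execute("LOAD", 10, state)            # acc = 7
execute("ADD_TO_MEMORY", 11, state)   # acc = 12, and location 11 is updated
```

Adding a new machine instruction in this model only means adding an entry to CONTROL_STORE, which
mirrors why microcoded designs were easy to extend.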

There are some real advantages to a microcoded implementation:

Since the microcode memory can be much faster than main memory, an
instruction set can be implemented in microcode without losing much speed over a purely
hard-wired implementation. New chips are easier to implement and require fewer
transistors than implementing the same instruction set with dedicated logic, and a
microprogrammed design can be modified to handle entirely new instruction sets quickly.

Using microcoded instruction sets, the IBM 360 series was able to offer the same
programming model across a range of different hardware configurations.

Some machines were optimized for scientific computing, while others were
optimized for business computing. However, since they all shared the same instruction
set, programs could be moved from machine to machine without re-compilation (but with
a possible increase or decrease in performance depending on the underlying hardware.)
This kind of flexibility and power made microcoding the preferred way to build new
computers for quite some time.

CISC philosophy 2: Build "rich" instruction sets

One of the consequences of using
a microprogrammed design is that designers could build more functionality into each
instruction. This not only cut down on the total number of instructions required to
implement a program, and therefore made more efficient use of a slow main memory, but
it also made the assembly-language programmer's life simpler. Soon, designers were
enhancing their instruction sets with instructions aimed specifically at the assembly
language programmer. Such enhancements included string manipulation operations,
special looping constructs, and special addressing modes for indexing through tables in
memory.

For example:

ABCD   Add Decimal with Extend
ADDA   Add Address
ADDX   Add with Extend
ASL    Arithmetic Shift Left
CAS    Compare and Swap Operands
NBCD   Negate Decimal with Extend
EORI   Logical Exclusive OR Immediate
TAS    Test Operand and Set

CISC philosophy 3: Build high-level instruction sets

Once designers started
building programmer-friendly instruction sets, the logical next step was to build
instruction sets which map directly from high-level languages. Not only does this
simplify the compiler writer's task, but it also allows compilers to emit fewer instructions
per line of source code. Modern CISC microprocessors, such as the 68000, implement
several such instructions, including routines for creating and removing stack frames with
a single call.

For example:
DBcc   Test Condition, Decrement and Branch
ROXL   Rotate with Extend Left
RTR    Return and Restore Codes
SBCD   Subtract Decimal with Extend
SWAP   Swap register Words
CMP2   Compare Register against Upper and Lower Bounds

The rise of CISC :

CISC Design Decisions: use microcode, build rich instruction sets, build high-level
instruction sets. Taken together, these three decisions led to the CISC philosophy
which drove all computer designs until the late 1980s, and is still in major use today.
(Note that "CISC" didn't enter the computer designer's vocabulary until the advent of
RISC; it was simply the way that everybody designed computers.) The next section
discusses the common characteristics that all CISC designs share, and how those
characteristics affect the operation of a CISC machine.

Characteristics of a CISC design

Introduction

While the chips that emerged from the 1970s and 1980s followed their own
unique design paths, most were bound by what we are calling the "CISC Design
Decisions". These chips all have similar instruction sets, and similar hardware
architectures.
In general terms, the instruction sets are designed for the convenience of the assembly-
language programmer and the hardware designs are fairly complex.

Instruction sets

The design constraints that led to the development of CISC (small amounts of
slow memory, and the fact that most early machines were programmed in assembly
language) give CISC instruction sets some common characteristics:
• A 2-operand format, where instructions have a source and a destination. For example, the add
instruction "add #5, D0" would add the number 5 to the contents of register D0 and place the result in
register D0.
• Register to register, register to memory, and memory to register commands.
• Multiple addressing modes for memory, including specialized modes for indexing through arrays.
• Variable-length instructions, where the length often varies according to the addressing mode.

Hardware architectures

Most CISC hardware architectures have several characteristics in common:
• Complex instruction-decoding logic, driven by the need for a single instruction to support
multiple addressing modes.
• A small number of general-purpose registers. This is the direct result of having instructions
which can operate directly on memory and the limited amount of chip space not dedicated to
instruction decoding, execution, and microcode storage.
• Several special-purpose registers. Many CISC designs set aside special registers for the stack
pointer, interrupt handling, and so on. This can simplify the hardware design somewhat, at the
expense of making the instruction set more complex.
• A "Condition code" register which is set as a side-effect of most instructions. This register
reflects whether the result of the last operation is less than, equal to, or greater than zero, and
records if certain error conditions occur.

The ideal CISC machine

CISC processors were designed to execute each instruction completely before
beginning the next instruction. Even so, most processors break the execution of an
instruction into several definite stages; as soon as one stage is finished, the processor
passes the result to the next stage:
• An instruction is fetched from main memory.
• The instruction is decoded: the controlling code from the microprogram identifies the type of
operation to be performed, where to find the data on which to perform the operation, and
where to put the result. If necessary, the processor reads in additional information from memory.
• The instruction is executed: the controlling code from the microprogram determines the
circuitry/hardware that will perform the operation.
• The results are written to memory.
In an ideal CISC machine, each complete instruction would require only one
clock cycle (which means that each stage would complete in a fraction of a cycle.) In
fact, this is the maximum possible speed for a machine that executes 1 instruction at a
time.

A realistic CISC machine


In reality, some instructions may require more than one clock cycle per stage.
However, a CISC design can tolerate this slowdown since the idea
behind CISC is to keep the total number of cycles small by having complicated things
happen within each cycle.

The advantages of CISC


• At the time of their initial development, CISC machines used available
technologies to optimize computer performance.
• Microprogramming is as easy as assembly language to implement, and much less
expensive than hardwiring a control unit.
• The ease of microcoding new instructions allowed designers to make CISC
machines upwardly compatible: a new computer could run the same programs as earlier
computers because the new computer would contain a superset of the instructions of the
earlier computers.
• As each instruction became more capable, fewer instructions could be used to
implement a given task. This made more efficient use of the relatively slow main
memory.
• Because microprogram instruction sets can be written to match the constructs of
high-level languages, the compiler does not have to be as complicated.

The disadvantages of CISC

Earlier generations of a processor family were generally contained as a subset in
every new version, so the instruction set and chip hardware became more complex with each
generation of computers.
So that as many instructions as possible could be stored in memory with the least
possible wasted space, individual instructions could be of almost any length. This means
that different instructions take different amounts of clock time to execute, slowing
down the overall performance of the machine.
Many specialized instructions aren't used frequently enough to justify their
existence: approximately 20% of the available instructions are used in a typical program.
CISC instructions typically set the condition codes as a side effect of the
instruction. Not only does setting the condition codes take time, but programmers have to
remember to examine the condition code bits before a subsequent instruction changes
them.

XII. ALU DESIGN

An Arithmetic and Logic Unit (ALU) is a combinational circuit that performs
logic and arithmetic micro-operations on a pair of n-bit operands (e.g. A[3:0] and B[3:0]).
The operations performed by an ALU are controlled by a set of function-select inputs. In
this lab you will design a 4-bit ALU with 3 function-select inputs: Mode M, Select S1
and S0 inputs. The mode input M selects between a Logic (M=0) and Arithmetic (M=1)
operation. The functions performed by the ALU are specified in Table 1.

Table 1: Functions of ALU

M = 0 (Logic)
S1  S0  C0  FUNCTION     OPERATION (bitwise)
0   0   X   Ai · Bi      AND
0   1   X   Ai + Bi      OR
1   0   X   Ai ⊕ Bi      XOR
1   1   X   (Ai ⊕ Bi)'   XNOR

M = 1 (Arithmetic)
S1  S0  C0  FUNCTION     OPERATION
0   0   0   A            Transfer A
0   0   1   A + 1        Increment A by 1
0   1   0   A + B        Add A and B
0   1   1   A + B + 1    Increment the sum of A and B by 1
1   0   0   A + B'       A plus one's complement of B
1   0   1   A - B        Subtract B from A (i.e. B' + A + 1)
1   1   0   A' + B       B plus one's complement of A
1   1   1   B - A        B minus A (i.e. A' + B + 1)

A block diagram is given in Figure 1.

Figure 1: Block diagram of the 4-bit ALU.

When doing arithmetic, you need to decide how to represent negative numbers.
As is commonly done in digital systems, negative numbers are represented in two’s
complement. This has a number of advantages over the sign and magnitude
representation such as easy addition or subtraction of mixed positive and negative
numbers. Also, the number zero has a unique representation in two’s complement. The
two’s complement of an n-bit number N is defined as,

2^n - N = (2^n - 1 - N) + 1

The last representation gives us an easy way to find two’s complement: take the
bit wise complement of the number and add 1 to it. As an example, to represent the
number -5, we take two’s complement of 5 (=0101) as follows,

5 = 0 1 0 1  -->  1 0 1 0   (bitwise complement)
                +       1
                  1 0 1 1   (two’s complement)
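
The invert-and-add-one rule can be sketched in a few lines of Python. This is only an illustration of the arithmetic above; the function names twos_complement and to_signed are hypothetical and not part of the lab material.

```python
def twos_complement(value: int, bits: int = 4) -> int:
    """Return the two's complement of `value` as an unsigned `bits`-wide pattern."""
    mask = (1 << bits) - 1          # 0b1111 for 4 bits
    inverted = ~value & mask        # bitwise complement, i.e. (2^n - 1) - N
    return (inverted + 1) & mask    # add 1, discard any carry out


def to_signed(pattern: int, bits: int = 4) -> int:
    """Interpret an unsigned `bits`-wide pattern as a two's complement number."""
    if pattern & (1 << (bits - 1)):  # sign bit set: value is pattern - 2^n
        return pattern - (1 << bits)
    return pattern


minus_five = twos_complement(0b0101)
print(format(minus_five, "04b"), to_signed(minus_five))
```

Applying twos_complement twice returns the original pattern, which is a quick sanity check on any hand calculation.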
Numbers represented in two’s complement lie within the range -2^(n-1) to
+(2^(n-1) - 1). For a 4-bit number this means that the number is in the range -8 to
+7. There is a potential problem we still need to be aware of when working
with two's complement, i.e. over- and underflow, as is illustrated in the example
below,

        0 1 0 0     (=carry Ci)
+5        0 1 0 1
+4    +   0 1 0 0
+9      0 1 0 0 1   = -7!

also,

        1 0 0 0     (=carry Ci)
-7        1 0 0 1
-2    +   1 1 1 0
-9      1 0 1 1 1   = +7!

Both calculations give the wrong result (-7 instead of +9, and +7 instead of -9),
because +9 and -9 are outside the allowable range for a 4-bit
two’s complement number. Whenever the result is larger than +7 or smaller than -8 there
is an overflow or underflow and the result of the addition or subtraction is wrong.
Overflow and underflow are easily detected: they occur exactly when the carry out of the
most significant stage (i.e. C4) differs from the carry out of the previous stage (i.e. C3).
You can assume that the inputs A and B are in two’s complement when they are presented to
the input of the ALU.
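
The C4 versus C3 test can be checked against the two examples with a small bit-level model. The sketch below is a software illustration only, not the required hardware design; the name add_4bit and its return layout are assumptions made for this example.

```python
def add_4bit(a: int, b: int, c0: int = 0):
    """Ripple-carry addition of two 4-bit patterns.

    Returns (sum, c3, c4, overflow), where c3 is the carry out of stage 2,
    c4 is the carry out of stage 3 (the most significant stage), and
    overflow = c3 XOR c4.
    """
    carry = c0
    carries = []                     # carries[i] holds C(i+1)
    result = 0
    for i in range(4):
        ai = (a >> i) & 1
        bi = (b >> i) & 1
        s = ai ^ bi ^ carry
        carry = (ai & bi) | (ai & carry) | (bi & carry)
        carries.append(carry)
        result |= s << i
    c3, c4 = carries[2], carries[3]
    return result, c3, c4, c3 ^ c4


print(add_4bit(0b0101, 0b0100))  # +5 + +4
print(add_4bit(0b1001, 0b1110))  # -7 + -2
print(add_4bit(0b0011, 0b0010))  # +3 + +2, stays in range
```

In the first two calls the last element of the tuple is 1, flagging the out-of-range results; in the third it is 0.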

b. Design strategies

When designing the ALU we will follow the principle "Divide and Conquer" in
order to use a modular design that consists of smaller, more manageable blocks, some of
which can be re-used. Instead of designing the 4-bit ALU as one circuit we will first
design a one-bit ALU, also called a bit-slice. These bit-slices can then be put together to
make a 4-bit ALU.

There are different ways to design a bit-slice of the ALU. One method consists
of writing the truth table for the one-bit ALU. This table has 6 inputs (M, S1, S0, C0, Ai
and Bi) and two outputs Fi and Ci+1. This can be done but may be tedious when it has to
be done by hand.

An alternative way is to split the ALU into two modules, one Logic and one
Arithmetic module. Designing each module separately will be easier than designing a bit-
slice as one unit. A possible block diagram of the ALU is shown in Figure 2. It consists
of three modules: 2:1 MUX, a Logic unit and an Arithmetic unit.

Figure 2: Block diagram of a bit-slice ALU

c. Displaying the results

In order to easily see the output of the ALU you will display the results on the
seven-segment displays and the LEDs (LD).

1. The result of the logic operation can be displayed on the LEDs (LD). Use also
one of these LEDs to display the overflow flag V.
2. Since you are working with a 4-bit representation for 2's complement numbers,
the maximum positive number is +7 and the most negative number is –8. Thus a
single seven-segment display can be used to show the magnitude of the number.
Use another seven-segment display for the “-” sign (e.g. use segment “g”).
3. There is one complication when using more than one of the seven-segment
displays on the Digilab board, as can be seen from the connections of the LED
segments of the displays. You will notice that the four seven-segment displays
share the same cathodes (A, B, ..., G). This implies that one cannot directly
connect the signals for the segments of the magnitude and sign to these terminals,
since that would short the outputs of the gates, which would damage the FPGA!
How could you solve this problem? Sketch a possible solution in your lab
notebook. (Hint: You can alternate the signals applied to the cathodes between
those of the Magnitude and Sign displays. If you do this faster than 30 times per
second the eye will not notice the flickering. You will also need to alternate the
anode signals). What type of circuit will be needed to accomplish this? You can
make use of an on-chip clock, called OSC4 that provides clock signals of 8MHz,
500KHz, 590Hz and 15Hz.
4. Figure 3 shows a schematic of the overall system, consisting of the ALU,
Decoder and Switching circuit, and Displays on the Digilab board.

Figure 3: Overall system, including the 4-bit ALU and display units.

d. Tasks:

Do the following tasks prior to coming to the lab. Write the answers to all
questions in your lab notebook prior to coming to the lab. There is no on-line submission
for the pre-lab. Ask the TA to sign the pre-lab section in your lab notebook at the start of
the lab session. You will also need to include the answers to the pre-lab questions in your
lab report.

1. Design the MUX. You can choose to design the MUX with gates or by writing
HDL (VHDL) code. Choose one of the two methods and write the design down in
your lab notebook.

2. Design of the Logic unit. Here you also have several choices to design this unit:

a. Write truth table, derive the K-map and give the minimum gate implementation
b. Use a 4:1 MUX and gates
c. Write an HDL file

As part of the pre-lab, you can choose any of the three methods. Briefly justify
why you chose a particular design method. Explain the design procedure and give the
logic diagram or the HDL file. In case you use a MUX, you need also to give the
schematic or the HDL file for the MUX.
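
Before committing to gates or HDL, the intended behaviour of the Logic unit (option b, a 4:1 MUX selecting among four bitwise results) can be modelled in software. This Python sketch is an assumption-laden illustration, not an HDL design; the name logic_unit is hypothetical and the function table follows Table 1 with M = 0.

```python
def logic_unit(s1: int, s0: int, a: int, b: int, bits: int = 4) -> int:
    """Behavioural model of the Logic unit: a 4:1 MUX over four bitwise functions."""
    mask = (1 << bits) - 1
    mux_inputs = (
        a & b,              # S1 S0 = 00: AND
        a | b,              # S1 S0 = 01: OR
        a ^ b,              # S1 S0 = 10: XOR
        ~(a ^ b) & mask,    # S1 S0 = 11: XNOR, masked to `bits` bits
    )
    return mux_inputs[(s1 << 1) | s0] & mask


for s1, s0 in ((0, 0), (0, 1), (1, 0), (1, 1)):
    print(s1, s0, format(logic_unit(s1, s0, 0b1100, 0b1010), "04b"))
```

A model like this is useful as a reference when checking simulation waveforms of the real circuit.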

3. Design the arithmetic unit. Again, here you have different choices to design and
implement the arithmetic unit. A particularly attractive method is one that makes
use of previously designed modules, such as your Full Adder. The arithmetic unit
basically performs additions on a set of inputs. By choosing the proper inputs, one
can perform a range of operations. This approach is shown in Figure 4. The only
blocks that need to be designed are the A Logic and B Logic circuits. You can
make use of your previously designed full adder (MYFA).

Figure 4: Schematic block diagram of the arithmetic unit.

a. Give the truth tables for the Xi and Yi functions. The inputs of Xi are S1, S0 and Ai;
the inputs of Yi are S1, S0 and Bi. Fill out the following tables. Notice that in Table 1,
which defines the ALU, the variable C0 acts as the carry input. Depending on the value of
C0, one performs the function on the odd or even entries of Table 1. As an example,
the first entry is "transfer A" (for C0=0) while the second one is "A+1" (for C0=1);
similarly for A + B and A + B + 1, etc.

S1  S0  Ai  Xi (A Logic)      S1  S0  Bi  Yi (B Logic)
0   0   0   .                 0   0   0   .
0   0   1   .                 0   0   1   .
0   1   0   .                 0   1   0   .
0   1   1   .                 0   1   1   .
1   0   0   .                 1   0   0   .
1   0   1   .                 1   0   1   .
1   1   0   .                 1   1   0   .
1   1   1   .                 1   1   1   .
Table 2: Truth tables for the A and B logic circuits.

b. Give the K-map for Xi and Yi functions. Find the minimum realization for
Xi and Yi.
c. Draw the logic diagram for Xi and Yi.
d. Design the circuit that detects over- or underflow.
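
One way to check a candidate Xi/Yi design is to simulate the adder-based arithmetic unit against Table 1. The Python sketch below assumes one possible choice read off the arithmetic rows of Table 1 (Xi = Ai except for S1 S0 = 11, where Xi = Ai'; Yi = 0, Bi, Bi', Bi for S1 S0 = 00, 01, 10, 11). Other realizations may be equally valid, and the function names are hypothetical.

```python
def full_adder(x: int, y: int, c: int):
    """One-bit full adder: returns (sum, carry_out)."""
    return x ^ y ^ c, (x & y) | (x & c) | (y & c)


def a_logic(s1: int, s0: int, ai: int) -> int:
    # Assumed choice: pass Ai, but complement it when S1 S0 = 11 (A' + B rows).
    return ai ^ (s1 & s0)


def b_logic(s1: int, s0: int, bi: int) -> int:
    # Assumed choice: 0 for 00, Bi for 01 and 11, Bi' for 10.
    if s1 == 0 and s0 == 0:
        return 0
    if s1 == 1 and s0 == 0:
        return bi ^ 1
    return bi


def arithmetic_unit(s1: int, s0: int, c0: int, a: int, b: int, bits: int = 4):
    """Returns (result, carry_out, overflow) for the selected operation."""
    carry = c0
    carry_into_msb = c0
    result = 0
    for i in range(bits):
        carry_into_msb = carry                       # after the loop: C3
        xi = a_logic(s1, s0, (a >> i) & 1)
        yi = b_logic(s1, s0, (b >> i) & 1)
        s, carry = full_adder(xi, yi, carry)
        result |= s << i
    return result, carry, carry_into_msb ^ carry     # overflow = C3 XOR C4


print(arithmetic_unit(1, 0, 1, 0b0101, 0b0011))  # A - B with A = 5, B = 3
```

Running all eight (S1, S0, C0) combinations for a few operand pairs and comparing against Table 1 is a cheap way to catch a wrong K-map entry before building the circuit.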

4. Design the decoder for the seven-segment displays. Remember that the segments of the
display are active-low. The decoders should be designed in such a way that when the
Logic Mode (M=0) is selected, only the LEDs are active and when the Arithmetic Mode
(M=1) is selected only the seven-segment displays are active.
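
As a reference for the decoder truth table, the active-low segment patterns can be tabulated in software. This Python sketch assumes the conventional segment naming a to g (g being the middle bar) and the usual digit shapes; it does not model the M-mode gating or the display multiplexing, and the names used are hypothetical.

```python
# Segments that are lit for each magnitude 0..8 (conventional digit shapes).
SEGMENTS_ON = {
    0: "abcdef", 1: "bc", 2: "abdeg", 3: "abcdg", 4: "bcfg",
    5: "acdfg", 6: "acdefg", 7: "abc", 8: "abcdefg",
}


def seven_segment_active_low(magnitude: int) -> dict:
    """Map a magnitude 0..8 to segment levels; 0 means lit (active-low)."""
    lit = SEGMENTS_ON[magnitude]
    return {seg: 0 if seg in lit else 1 for seg in "abcdefg"}


def display_signed(result_4bit: int):
    """Split a 4-bit two's complement result into (sign display, magnitude display)."""
    value = result_4bit - 16 if result_4bit & 0b1000 else result_4bit
    sign = {seg: 1 for seg in "abcdefg"}   # all segments off
    if value < 0:
        sign["g"] = 0                      # light segment g as the minus sign
    return sign, seven_segment_active_low(abs(value))


sign_display, magnitude_display = display_signed(0b1011)  # -5
print(sign_display, magnitude_display)
```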

XIII. FIXED AND FLOATING-POINT OPERATION
Definition:
An arithmetic operation performed on floating-point numbers; "this computer can
perform a million flops per second".
Floating point hardware was standard throughout the 7090/94 family. The 7090
had single precision (36-bit) floating point operations while the 7094/7094 II machines
also provided double precision (72-bit) floating point instructions. The fraction was
considered normalized if Bit-9 (or Bit-18 in Double Precision) contained the first 1-bit of
the fraction so that the floating point word was positioned to have no leading zero bits.
The characteristic for single precision numbers consisted of eight bits (Bits 1-8)
and defined the exponent of the number. Since the exponent could be either positive or
negative, but the hardware sign bit was already allocated for the fraction, the
exponent was algebraically signed in so-called excess form, where the characteristic was
formed by adding +128 to the exponent (e.g., an exponent of +12 would be coded
as 140 and -30 would be coded as 98). The allowable range for the single precision
exponent was -128 (decimal) to +127 (decimal), which yielded a floating point range
from approximately 10E-39 to 10E+39 (decimal).
As an example, single precision floating point 10.00 (decimal) was represented as
204500000000 (octal), which yielded a sign bit of 0; a characteristic of 204 (octal); and a
mantissa of 500000000 (octal). The zero sign bit indicated an algebraically positive
number; the 204 (octal) or 132 (decimal) characteristic indicated, after subtracting 128
(decimal), an exponent of 4; and the mantissa of 500000000 (octal) indicated a fraction of
(2 ** -1) + (2 ** -3) or 0.625 (decimal). Therefore, the floating point number was (2 ** 4)
* (0.625) or 10.00.
Other floating point examples: 0.00390625 (decimal) was represented by
171400000000 (octal); 44.00 (decimal) was represented by 206540000000 (octal); and
-20.00 (decimal) was represented by 605500000000 (octal).
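
The decoding steps for a 36-bit single precision word can be sketched as follows. This is a minimal Python illustration of the layout described above (sign bit, 8-bit excess-128 characteristic, 27-bit fraction); the function name decode_7090_single is hypothetical, and the sketch ignores unnormalized words and any machine-specific corner cases.

```python
def decode_7090_single(word: int) -> float:
    """Decode a 36-bit word: sign | 8-bit characteristic | 27-bit fraction."""
    sign = (word >> 35) & 1
    characteristic = (word >> 27) & 0xFF            # excess-128 exponent
    fraction_bits = word & ((1 << 27) - 1)
    fraction = fraction_bits / (1 << 27)            # binary point left of Bit 9
    value = fraction * 2.0 ** (characteristic - 128)
    return -value if sign else value


print(decode_7090_single(0o204500000000))
print(decode_7090_single(0o171400000000))
print(decode_7090_single(0o605500000000))
```

Octal literals (the 0o prefix) keep the words in the same notation as the text, so each example can be pasted in directly.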

IEEE STANDARD 754 FLOATING POINT NUMBERS

IEEE Standard 754 floating point is the most common representation today for
real numbers on computers, including Intel-based PC's, Macintoshes, and most Unix
platforms. This section gives a brief overview of IEEE floating point and its
representation.

Floating Point Numbers

There are several ways to represent real numbers on computers. Fixed point
places a radix point somewhere in the middle of the digits, and is equivalent to using
integers that represent portions of some unit. For example, one might represent 1/100ths
of a unit; if you have four decimal digits, you could represent 10.82, or 00.01. Another
approach is to use rationals, and represent every number as the ratio of two integers.
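
Both alternatives can be illustrated briefly in Python. The sketch assumes a hypothetical fixed-point format that stores hundredths of a unit in a plain integer, and uses the standard library Fraction type for the ratio-of-integers approach; the names SCALE, to_fixed and fixed_to_str are made up for this example.

```python
from fractions import Fraction

SCALE = 100  # assumption: one stored unit represents 1/100


def to_fixed(text: str) -> int:
    """Convert a decimal string such as "10.82" to its stored integer."""
    return int(Fraction(text) * SCALE)


def fixed_to_str(stored: int) -> str:
    """Format a non-negative stored integer with two integer and two fraction digits."""
    return f"{stored // SCALE:02d}.{stored % SCALE:02d}"


print(to_fixed("10.82"), fixed_to_str(1))

# Rational representation: every value is an exact ratio of two integers.
one_third = Fraction(1, 3)
print(one_third * 3 == 1)
```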

Floating-point representation - the most common solution - basically represents
reals in scientific notation. Scientific notation represents numbers as a base number and
an exponent. For example, 123.456 could be represented as 1.23456 × 10^2. In
hexadecimal, the number 123.abc might be represented as 1.23abc × 16^2.

Floating-point solves a number of representation problems. Fixed-point has a fixed
window of representation, which limits it from representing very large or very small
numbers. Also, fixed-point is prone to a loss of precision when two large numbers are
divided. Floating-point, on the other hand, employs a sort of "sliding window" of
precision appropriate to the scale of the number. This allows it to represent numbers from
1,000,000,000,000 to 0.0000000000000001 with ease.

Storage Layout

IEEE floating point numbers have three basic components: the sign, the exponent,
and the mantissa. The mantissa is composed of the fraction and an implicit leading digit
(explained below). The exponent base (2) is implicit and need not be stored.

The following figure shows the layout for single (32-bit) and double (64-bit)
precision floating-point values. The number of bits for each field are shown (bit ranges
are in square brackets):

                    Sign     Exponent     Fraction     Bias
Single Precision    1 [31]   8 [30-23]    23 [22-00]   127
Double Precision    1 [63]   11 [62-52]   52 [51-00]   1023
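
The field boundaries in the layout table can be checked by pulling a value apart with shifts and masks. The following Python sketch uses the standard struct module to obtain the raw bits; the helper names split_single and split_double are hypothetical.

```python
import struct


def split_single(x: float):
    """Return (sign, exponent field, fraction field) of x stored as 32-bit float."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    return bits >> 31, (bits >> 23) & 0xFF, bits & 0x7FFFFF


def split_double(x: float):
    """Return (sign, exponent field, fraction field) of x stored as 64-bit float."""
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    return bits >> 63, (bits >> 52) & 0x7FF, bits & ((1 << 52) - 1)


print(split_single(1.0))
print(split_single(-2.5))
print(split_double(1.0))
```

For 1.0 the stored exponent field equals the bias (127 or 1023) and the fraction field is zero.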

The Sign Bit

The sign bit is as simple as it gets. 0 denotes a positive number; 1 denotes a
negative number. Flipping the value of this bit flips the sign of the number.

The Exponent

The exponent field needs to represent both positive and negative exponents. To do
this, a bias is added to the actual exponent in order to get the stored exponent. For IEEE
single-precision floats, this value is 127. Thus, an exponent of zero means that 127 is
stored in the exponent field. A stored value of 200 indicates an exponent of (200-127), or
73. For reasons discussed later, exponents of -127 (all 0s) and +128 (all 1s) are reserved
for special numbers. For double precision, the exponent field is 11 bits, and has a bias of
1023.

The Mantissa

The mantissa, also known as the significand, represents the precision bits of the
number. It is composed of an implicit leading bit and the fraction bits. To find out the
value of the implicit leading bit, consider that any number can be expressed in scientific
notation in many different ways. For example, the number five can be represented as any
of these:

5.00 × 10^0

0.05 × 10^2

5000 × 10^-3

In order to maximize the quantity of representable numbers, floating-point numbers
are typically stored in normalized form. This basically puts the radix point after the first
non-zero digit. In normalized form, five is represented as 5.0 × 10^0.

A nice little optimization is available to us in base two, since the only possible non-
zero digit is 1. Thus, we can just assume a leading digit of 1, and don't need to represent
it explicitly. As a result, the mantissa has effectively 24 bits of resolution, by way of 23
fraction bits.

Putting it All Together

1. The sign bit is 0 for positive, 1 for negative.

2. The exponent's base is two.

3. The exponent field contains 127 plus the true exponent for single-precision, or
1023 plus the true exponent for double precision.

4. The first bit of the mantissa is typically assumed to be 1, so the mantissa is 1.f,
where f is the field of fraction bits.
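
The four rules combine into a single formula for normalized single-precision values: value = (-1)^sign × 1.f × 2^(exponent - 127). A minimal Python sketch follows; the function name single_value is hypothetical, and it deliberately rejects the reserved exponent fields (all 0s and all 1s).

```python
def single_value(sign: int, exponent: int, fraction: int) -> float:
    """Value of a normalized single-precision number from its three fields."""
    if not 1 <= exponent <= 254:
        raise ValueError("exponent fields 0 and 255 are reserved for special values")
    mantissa = 1 + fraction / 2 ** 23        # implicit leading 1 plus 23 fraction bits
    return (-1) ** sign * mantissa * 2.0 ** (exponent - 127)


print(single_value(0, 127, 0))          # true exponent 0
print(single_value(1, 128, 0x200000))   # fraction 0.25, true exponent 1
```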

Ranges of Floating-Point Numbers

Let's consider single-precision floats for a second. Note that we're taking
essentially a 32-bit number and re-jiggering the fields to cover a much broader range.
Something has to give, and it's precision. For example, regular 32-bit integers, with all
precision centered around zero, can precisely store integers with 32 bits of resolution.
Single-precision floating-point, on the other hand, is unable to match this resolution with
its 24 bits. It does, however, approximate this value by effectively truncating from the
lower end. For example:

  11110000 11001100 10101010 00001111     // 32-bit integer
= +1.1110000 11001100 10101010 × 2^31     // Single-Precision Float
= 11110000 11001100 10101010 00000000     // Corresponding Value

This approximates the 32-bit value, but doesn't yield an exact representation. On the
other hand, besides the ability to represent fractional components (which integers lack
completely), the floating-point value can represent numbers around 2^127, compared to the
maximum value of 32-bit integers, around 2^32.
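
The loss of the low-order bits can be reproduced by converting the same integer to single precision and back. This Python sketch uses the standard struct module; the helper name to_single_and_back is hypothetical. Note that the conversion rounds to the nearest representable value rather than strictly truncating; for this particular integer the two coincide.

```python
import struct


def to_single_and_back(n: int) -> int:
    """Store n as a 32-bit float and read it back as an integer."""
    (as_float,) = struct.unpack(">f", struct.pack(">f", float(n)))
    return int(as_float)


original = 0b11110000_11001100_10101010_00001111
recovered = to_single_and_back(original)
print(format(original, "032b"))
print(format(recovered, "032b"))
```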

The range of positive floating point numbers can be split into normalized numbers
(which preserve the full precision of the mantissa), and denormalized numbers (discussed
later) which use only a portion of the fraction's precision.

                   Denormalized                         Normalized                          Approximate Decimal
Single Precision   ± 2^-149 to (1 - 2^-23) × 2^-126     ± 2^-126 to (2 - 2^-23) × 2^127     ± ~10^-44.85 to ~10^38.53
Double Precision   ± 2^-1074 to (1 - 2^-52) × 2^-1022   ± 2^-1022 to (2 - 2^-52) × 2^1023   ± ~10^-323.3 to ~10^308.3

Since the sign of floating point numbers is given by a special leading bit, the range
for negative numbers is given by the negation of the above values. There are five distinct
numerical ranges that single-precision floating-point numbers are not able to represent:

1. Negative numbers less than -(2 - 2^-23) × 2^127 (negative overflow)

2. Negative numbers greater than -2^-149 (negative underflow)

3. Zero

4. Positive numbers less than 2^-149 (positive underflow)

5. Positive numbers greater than (2 - 2^-23) × 2^127 (positive overflow)

Overflow means that values have grown too large for the representation, much in the
same way that you can overflow integers. Underflow is a less serious problem because it
just denotes a loss of precision, which is guaranteed to be closely approximated by zero.
Here's a table of the effective range (excluding infinite values) of IEEE floating-point
numbers:

          Binary                     Decimal
Single    ± (2 - 2^-23) × 2^127      ~ ± 10^38.53
Double    ± (2 - 2^-52) × 2^1023     ~ ± 10^308.25


Note that the extreme values occur (regardless of sign) when the exponent is at the
maximum value for finite numbers (2^127 for single-precision, 2^1023 for double), and the
mantissa is filled with 1s (including the normalizing 1 bit).
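
These limits can be confirmed numerically. The Python sketch below builds the largest finite and smallest denormalized single-precision values from their bit patterns and compares them with the closed-form expressions; the helper single_from_bits is a hypothetical name, and Python's own float is assumed to be an IEEE double, as it is on common platforms.

```python
import math
import struct
import sys


def single_from_bits(bits: int) -> float:
    """Interpret a 32-bit pattern as an IEEE single-precision value."""
    return struct.unpack(">f", struct.pack(">I", bits))[0]


largest_single = single_from_bits(0x7F7FFFFF)      # exponent 254, fraction all 1s
smallest_denormal = single_from_bits(0x00000001)   # exponent 0, fraction = 1

print(largest_single == (2 - 2 ** -23) * 2.0 ** 127)
print(smallest_denormal == 2.0 ** -149)
print(sys.float_info.max == (2 - 2 ** -52) * 2.0 ** 1023)
print(math.log10(largest_single), math.log10(sys.float_info.max))
```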

Special Values

IEEE reserves exponent field values of all 0s and all 1s to denote special values in
the floating-point scheme.

Zero

As mentioned above, zero is not directly representable in the straight format, due
to the assumption of a leading 1 (we'd need to specify a true zero mantissa to yield a
value of zero). Zero is a special value denoted with an exponent field of zero and a
fraction field of zero. Note that -0 and +0 are distinct values, though they both compare
as equal.
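
The two zeros are easy to observe from a high-level language. A short Python sketch, assuming the usual IEEE-based float implementation:

```python
import math
import struct

pos_zero, neg_zero = 0.0, -0.0

zeros_compare_equal = pos_zero == neg_zero          # comparison ignores the sign
neg_zero_sign = math.copysign(1.0, neg_zero)        # recovers the sign bit
neg_zero_bits = struct.pack(">f", neg_zero).hex()   # only the sign bit is set

print(zeros_compare_equal, neg_zero_sign, neg_zero_bits)
```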

Denormalized

If the exponent is all 0s, but the fraction is non-zero (else it would be interpreted
as zero), then the value is a denormalized number, which does not have an assumed
leading 1 before the binary point. Thus, this represents a number (-1)^s × 0.f × 2^-126, where
s is the sign bit and f is the fraction. For double precision, denormalized numbers are of
the form (-1)^s × 0.f × 2^-1022. From this you can interpret zero as a special type of
denormalized number.

Infinity

The values +infinity and -infinity are denoted with an exponent of all 1s and a
fraction of all 0s. The sign bit distinguishes between negative infinity and positive
infinity. Being able to denote infinity as a specific value is useful because it allows
operations to continue past overflow situations. Operations with infinite values are well
defined in IEEE floating point.

Not A Number

The value NaN (Not a Number) is used to represent a value that does not represent a
real number. NaN's are represented by a bit pattern with an exponent of all 1s and a non-
zero fraction. There are two categories of NaN: QNaN (Quiet NaN) and SNaN
(Signalling NaN).

A QNaN is a NaN with the most significant fraction bit set. QNaN's propagate freely
through most arithmetic operations. These values pop out of an operation when the result
is not mathematically defined.

An SNaN is a NaN with the most significant fraction bit clear. It is used to signal an
exception when used in operations. SNaN's can be handy to assign to uninitialized
variables to trap premature usage. Semantically, QNaN's denote indeterminate
operations, while SNaN's denote invalid operations.

Special Operations

Operations on special numbers are well-defined by IEEE. In the simplest case,
any operation with a NaN yields a NaN result. Other operations are as follows:

Operation                  Result
n ÷ ±Infinity              0
±Infinity × ±Infinity      ±Infinity
±nonzero ÷ 0               ±Infinity
Infinity + Infinity        Infinity
±0 ÷ ±0                    NaN
Infinity - Infinity        NaN
±Infinity ÷ ±Infinity      NaN
±Infinity × 0              NaN
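
Most rows of this table can be reproduced directly in Python, whose float type follows IEEE arithmetic on common platforms. One caveat, noted in the code: Python raises ZeroDivisionError for division by zero instead of returning ±Infinity or NaN, so those rows are not evaluated here.

```python
import math

inf = math.inf

results = {
    "n / inf": 1.0 / inf,
    "inf * inf": inf * inf,
    "inf + inf": inf + inf,
    "inf - inf": inf - inf,
    "inf / inf": inf / inf,
    "inf * 0": inf * 0.0,
}

# Python traps division by zero rather than following the IEEE default result.
try:
    1.0 / 0.0
    python_traps_division_by_zero = False
except ZeroDivisionError:
    python_traps_division_by_zero = True

for name, value in results.items():
    print(name, "->", value)
```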

To sum up, the following are the corresponding values for a given representation:

Float Values (b = bias)

Sign  Exponent (e)       Fraction (f)       Value
0     00..00             00..00             +0
0     00..00             00..01 to 11..11   Positive Denormalized Real: 0.f × 2^(-b+1)
0     00..01 to 11..10   XX..XX             Positive Normalized Real: 1.f × 2^(e-b)
0     11..11             00..00             +Infinity
0     11..11             00..01 to 01..11   SNaN
0     11..11             10..00 to 11..11   QNaN
1     00..00             00..00             -0
1     00..00             00..01 to 11..11   Negative Denormalized Real: -0.f × 2^(-b+1)
1     00..01 to 11..10   XX..XX             Negative Normalized Real: -1.f × 2^(e-b)
1     11..11             00..00             -Infinity
1     11..11             00..01 to 01..11   SNaN
1     11..11             10..00 to 11..11   QNaN
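
The summary table translates almost line for line into a classifier over 32-bit patterns. This Python sketch is an illustration only; the function name classify_single and the label strings are hypothetical, and the quiet/signalling distinction follows the convention stated above (most significant fraction bit set means QNaN).

```python
def classify_single(bits: int) -> str:
    """Classify a 32-bit IEEE single-precision pattern according to the table."""
    sign = bits >> 31
    exponent = (bits >> 23) & 0xFF
    fraction = bits & 0x7FFFFF
    prefix = "-" if sign else "+"
    if exponent == 0:
        return prefix + ("0" if fraction == 0 else "denormalized")
    if exponent == 0xFF:
        if fraction == 0:
            return prefix + "Infinity"
        return "QNaN" if fraction & 0x400000 else "SNaN"
    return prefix + "normalized"


for pattern in (0x00000000, 0x00000001, 0x3F800000, 0xFF800000, 0x7FC00000):
    print(format(pattern, "08X"), classify_single(pattern))
```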
