Mailam Engineering College

(Approved by AICTE, New Delhi, Affiliated to Anna University, Chennai
& Accredited by National Board of Accreditation (NBA), New Delhi)
Mailam (Po), Villupuram (Dt). Pin: 604 304

DEPARTMENT OF COMPUTER APPLICATIONS
Computer Organization MC9211
UNIT IV
PROCESSOR DESIGN
Processor basics, CPU organization, data path design, control design, basic concepts,
hardwired control, microprogrammed control, pipeline control, hazards, superscalar operation.
Part A
1. What do you mean by pipelining?
[Jan 2012]
A pipeline may be visualized as a collection of segments called pipe stages through
which binary information flows. Each segment performs partial processing as dictated by the
task. The result obtained in each segment is transferred to the next segment in the pipeline.
The final result is obtained after the data passes through all the segments.
2. Explain latency and throughput.
Latency: Each instruction takes a certain amount of time to complete. This is called
latency. It is the time difference between when an instruction is issued and when it is
completed.
Throughput: The number of instructions completed in a given time is called
throughput.
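The two definitions above can be made concrete with a small calculation. The sketch below assumes an ideal pipeline with no stalls, in which every stage takes exactly one clock cycle; the function names and the example figures are illustrative and not part of the original notes.

```python
def pipeline_cycles(num_instructions: int, num_stages: int) -> int:
    """Clock cycles needed to finish all instructions in an ideal pipeline."""
    if num_instructions < 1 or num_stages < 1:
        raise ValueError("need at least one instruction and one stage")
    # The first instruction needs num_stages cycles to pass through the pipe;
    # every later instruction completes one cycle after its predecessor.
    return num_stages + (num_instructions - 1)


def latency(num_stages: int, cycle_time: float) -> float:
    """Time from issue to completion of a single instruction."""
    return num_stages * cycle_time


def throughput(num_instructions: int, num_stages: int, cycle_time: float) -> float:
    """Instructions completed per unit of time."""
    total_time = pipeline_cycles(num_instructions, num_stages) * cycle_time
    return num_instructions / total_time


if __name__ == "__main__":
    # Assumed example: 100 instructions, 5 stages, 2 ns clock cycle.
    print(pipeline_cycles(100, 5))
    print(latency(5, 2.0))
    print(throughput(100, 5, 2.0))
```

Under these assumptions the latency of one instruction stays at five cycles, while the throughput approaches one instruction per cycle as the number of instructions grows.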
3. What are the major characteristics of a pipeline?
Pipelining cannot be applied to a single indivisible task. It works by splitting multiple
tasks into a number of subtasks and operating on them simultaneously.
The speedup or efficiency achieved by pipelining depends on the number
of pipe stages and the number of available tasks that can be subdivided.
4. Define control word.

[May 2012]
The combination of control steps used for the generation of control signals is a
control word. A control word is a word whose individual bits represent the various control
signals.
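As a minimal sketch of this definition, a control word can be modelled as an integer in which each bit position stands for one control signal. The signal list and the bit positions below are assumptions chosen for illustration; a real processor defines its own layout.

```python
# Hypothetical bit layout: position i in the word corresponds to SIGNALS[i].
SIGNALS = ["PCout", "PCin", "MARin", "MDRout", "IRin",
           "Yin", "Zin", "Zout", "Read", "Add"]


def encode_control_word(active):
    """Return an integer with one bit set for every active control signal."""
    word = 0
    for name in active:
        word |= 1 << SIGNALS.index(name)
    return word


def decode_control_word(word):
    """Return the names of the control signals whose bits are set."""
    return [name for i, name in enumerate(SIGNALS) if (word >> i) & 1]


if __name__ == "__main__":
    cw = encode_control_word(["PCout", "MARin", "Read"])
    print(format(cw, "010b"), decode_control_word(cw))
```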
5. What are the various stages in a pipeline execution?
Instruction Fetch
Instruction Decode
Operand fetch
Opcode Execution
Write back
6. Define Pipeline Hazards?
A pipelined architecture works smoothly as long as it is able to take up a new task in
every machine cycle. In practice there are situations when the next instruction cannot be
executed in the following machine cycle. These events are called pipeline hazards.
7. When does a structural hazard occur in pipeline operation?
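A structural hazard occurs when two instructions require the same hardware resource at the
same time (see question 11). It is one of the three types of hazard that can occur in pipelining: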
[Jan 2013]
Prepared By
Mrs. V.Rekha AP / MCA
Data hazards.
Instruction hazards.
Structural hazards.
8. What are Hazards?

A hazard, also called a hurdle, is a situation that prevents the next instruction in
the instruction stream from executing during its designated clock cycle. A hazard
introduces a stall.
9. What is meant by Data hazards?
A data hazard is any condition in which either the source or the destination operands
of an instruction are not available at the time expected in the pipeline. As a result some
operation has to be delayed, and the pipeline stalls.
10. What is meant by Instruction hazards?
The pipeline may be stalled because of a delay in the availability of an instruction.
For example, this may be the result of a miss in the cache, requiring the instruction to be
fetched from the main memory. Such hazards are called instruction hazards or control hazards.
11. What is meant by Structural hazards?
A structural hazard is the situation in which two instructions require the use of a
given hardware resource at the same time. The most common case in which this hazard
may arise is access to memory.
12. What do you mean by out-of-order execution? Is it desirable?
In a pipelined processor in which several instructions are processed concurrently, it is
possible for instructions to finish out of sequence: one instruction finishes before another
that was issued earlier. As far as the main computation is concerned no hazards arise, but if
an interrupt occurs it creates a problem.
13. List the various branching techniques used in a microprogrammed control unit.
Bit-ORing
Using Conditional Variable
Wide Branch Addressing
14. What is micro programming and micro programmed control unit?
Microprogramming is a method of control unit design in which the control signal
selection and sequencing information is stored in a ROM or RAM called the control store or
control memory.
A microprogrammed control unit is a general approach to the implementation of the
control unit. Here control signals are generated by a program similar to machine language
programs.
15. Define the term hardwired control.
[Jan 2012]
A hardwired control unit uses fixed logic circuits to interpret instructions and generate
control signals from them. The fixed logic circuit block includes a combinational
decoder/encoder circuit that generates the required control outputs.
16. What is the necessity of grouping signals?
It is used to reduce the number of bits in the microinstruction.
It overcomes the drawback that assigning an individual bit to each control signal
results in long microinstructions.
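The saving from grouping can be estimated with a short calculation. The sketch below assumes that the signals within a group are mutually exclusive (at most one is active per microinstruction) and that each group reserves one extra code meaning "no signal"; the example groups are hypothetical.

```python
def ungrouped_bits(groups):
    """One bit per control signal."""
    return sum(len(group) for group in groups)


def grouped_bits(groups):
    """One binary-encoded field per group of mutually exclusive signals.

    A group of n signals needs n + 1 codes (the extra one means "none"),
    which takes ceil(log2(n + 1)) bits. For n >= 1 this equals n.bit_length().
    """
    return sum(len(group).bit_length() for group in groups)


if __name__ == "__main__":
    # Hypothetical grouping: bus sources, bus destinations, ALU functions.
    groups = [
        ["PCout", "MDRout", "Zout", "R0out", "R1out", "R2out", "R3out"],
        ["PCin", "IRin", "Zin", "R0in", "R1in", "R2in", "R3in"],
        ["Add", "Sub"],
    ]
    print(ungrouped_bits(groups), grouped_bits(groups))
```

The price of the shorter microinstruction is the extra decoding logic needed to turn each field back into individual signals.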
17. Define Job Sequencing.

It is the process of scheduling tasks that are awaiting initiation in order to avoid collisions
and achieve high throughput.
18. Write control signals for storing a word in memory.
R1out, MARin
R2out, MDRin, Write
MDRoutE, WMFC
19. What are the problems faced in an instruction pipeline?
Resource Conflicts
Data Dependency
Branch Difficulties
20. What is Register Renaming?
When a temporary register assumes the role of the permanent register whose data it is
holding, and is given the same name, this is called register renaming.
21. How can data hazards be prevented in pipelining?
Data hazards in instruction pipelining can be prevented by the following techniques.
Operand Forwarding
Software Approach
22. Define instruction set processor.

The processing unit, which executes machine instructions and coordinates the activities
of other units, is often called the instruction set processor (ISP), or simply the
processor. Its instruction decoder and control logic unit is responsible for implementing
the actions specified by the instruction loaded in the IR register.
23. What are the uses of the three registers Y, Z and TEMP?
Y, Z and TEMP are used by the processor for temporary storage during the execution of
some instructions. These registers are never used for storing data generated by one
instruction for later use by another instruction.
24. Define data path.
The registers, the ALU, and the interconnecting bus are collectively referred to as the
data path.
25. How will you transfer the contents of register R1 to register R4?
Enable the output of register R1 by setting R1out to 1.
This places the contents of R1 on the processor bus.
Enable the input of register R4 by setting R4in to 1.
This loads data from the processor bus into register R4.
26. Define processor clock.
All operations and data transfers within the processor take place within time periods
defined by the processor clock.
27. Define multiphase clocking.
When edge-triggered flip-flops are not used, two or more clock signals may be needed to
guarantee proper transfer of data. This is known as multiphase clocking.
28. What are the three steps required for a memory read operation?
R1out, MARin, Read
MDRinE, WMFC
MDRout, R2in
29. What actions are required to execute a complete instruction?
For the instruction Add (R3),R1:
Fetch the instruction
Fetch the first operand (the contents of the memory location pointed to by R3).
Perform the addition
Load the result into R1
30. Define register file.
In a three-bus structure used to connect the registers and the ALU of a processor, all
general-purpose registers are combined into a single block called the register file.
31. Define control store.
The micro routines for all instructions in the instruction set of a computer are stored
in a special memory called the control store.
32. Define vertical organization.
Highly encoded schemes that use compact codes to specify only a small number of
control functions in each microinstruction are referred to as a vertical organization.
33. Define horizontal organization.
A minimally encoded scheme in which many resources can be controlled with a single
microinstruction is called a horizontal organization.
34. Define hazard.
For example, pipelined operation may be stalled for two clock cycles, with normal
pipelined operation resuming in cycle 7. Any condition that causes the pipeline to stall is
called a hazard.
35. Define control hazards.
The Pipeline may also be stalled because of a delay in the availability of an
instruction. For example, this may be a result of a miss in the cache, requiring the
instruction to be fetched from the main memory. Such hazards are often called control
hazards.
36. Define stalls.
For example, if the Decode unit is idle in cycles 3 through 5, the Execute unit is idle in
cycles 4 through 6 and the Write unit is idle in cycles 5 through 7, such idle periods are
called stalls.
37. What is meant by dispatch unit?
The instruction queue can store several instructions. A separate unit, which we call
the dispatch unit, takes instructions from the front of the queue and sends them to the
execution unit.
38. What is branch folding?
When the instruction fetch unit executes a branch instruction concurrently with the
execution of other instructions, the technique is referred to as branch folding.
39. Define branch delay slot.
When execution of I2 is completed and a branch is to be made, the processor must
discard I3 and fetch the instruction at the branch target. The location following a branch
instruction is called a branch delay slot.
40. What is delayed branching?
A technique called delayed branching can minimize the penalty incurred as a result of
conditional branch instructions. The idea is simple. The instructions in the delay slots are
always fetched.
41. Define static branch prediction.
When the branch prediction decision is always the same every time a given instruction
is executed, the approach is called static branch prediction.
42. Define dynamic branch prediction.
An approach in which the prediction decision may change depending on execution
history is called dynamic branch prediction.
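One common way to let the prediction follow execution history is a 2-bit saturating counter. The notes do not name a specific scheme, so the sketch below is an assumption chosen for illustration: states 0 and 1 predict "not taken", states 2 and 3 predict "taken", and each actual outcome moves the counter one step.

```python
class TwoBitPredictor:
    """2-bit saturating counter: 0, 1 predict not taken; 2, 3 predict taken."""

    def __init__(self):
        self.state = 0  # assumed initial state: strongly not taken

    def predict(self):
        return self.state >= 2

    def update(self, taken):
        # Move one step toward the actual outcome, saturating at 0 and 3.
        if taken:
            self.state = min(3, self.state + 1)
        else:
            self.state = max(0, self.state - 1)


def count_mispredictions(outcomes):
    """Run the predictor over a sequence of actual branch outcomes."""
    predictor = TwoBitPredictor()
    misses = 0
    for taken in outcomes:
        if predictor.predict() != taken:
            misses += 1
        predictor.update(taken)
    return misses


if __name__ == "__main__":
    # A branch that is taken four times, falls through once, then is taken again.
    history = [True] * 4 + [False] + [True] * 4
    print(count_mispredictions(history))
```

A static scheme would make the same decision on every execution of this branch, whereas the counter adapts after the first two outcomes and tolerates the single fall-through without changing its prediction.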
43. Define multiple-issue.
A more aggressive approach is to equip the processor with multiple processing units
to handle several instructions in parallel in each processor stage. With this arrangement,
several instructions start execution in the same clock cycle, and the processor is said to
use multiple-issue.
44. Define commitment unit.
When out-of-order execution is allowed, a special control unit is needed to guarantee
in-order commitment. This is called the commitment unit.
45. Explain deadlock.
A deadlock is a situation that can arise when two units, A and B, use a shared
resource. Suppose that unit B cannot complete its task until unit A completes its task. At the
same time, unit B has been assigned a resource that unit A needs. If this happens, neither
unit can complete its task. Unit A is waiting for the resource it needs, which is being held by
unit B. At the same time, unit B is waiting for unit A to finish before it can release that
resource.
46. Define Superscalar operation.
Superscalar describes a microprocessor design that makes it possible for more than
one instruction at a time to be executed during a single clock cycle. In a superscalar design,
the processor or the instruction compiler is able to determine whether an instruction can be
carried out independently of other sequential instructions, or whether it has a dependency
on another instruction and must be executed in sequence with it.
47. List the disadvantages of superscalar operation.
The degree of intrinsic parallelism in the instruction stream is limited, i.e. there is a
limited amount of instruction-level parallelism.
The complexity and time cost of the dispatcher and the associated dependency-checking
logic.
The processing of branch instructions.
48. What information determines the control signals?

Contents of the control step counter
Contents of the instruction register
Contents of the condition code flags
External input signals, such as MFC and interrupt requests
[Dec 2011]
49. Differentiate precise and imprecise exceptions.

[Dec 2011]
A machine is said to support precise exceptions when it guarantees that all
instructions before the instruction causing the exception are executed and retired without
being affected by the exception, and that no instruction after the faulting instruction
changes the state of the machine before the exception is handled. A machine that
does not give such a guarantee is said to have imprecise exceptions.
Precise exceptions are a desirable attribute because they help the programmer reason about
the logic of the program, especially when debugging in the presence of an exception.
Moreover, imprecise exceptions can make the behavior of even a single-threaded program with
the same input non-deterministic.
50. List the techniques used for overcoming hazards.
Data forwarding.
Adding sufficient hardware.
Stalling instructions.
Document to find instruction in wrong order.
51. What are the techniques used to prevent Control hazards.
Scheduling instructions in delay slots
Loop unrolling
Conditional execution
Speculation (by both compiler and CPU)
52. What is instruction level parallelism and Loop Level Parallelism?
Pipelining increases performance by overlapping execution of independent
instructions. The potential to overlap instructions is called Instruction-Level Parallelism (ILP)
since the instructions are evaluated in parallel.
Parallelism among iterations in a loop is called as Loop Level Parallelism (LLP).
53. What is Name dependence?
Name dependence is said to occur when two instructions use the same register or
memory location and there is no flow of data between instructions that use the same name.
54. What are the types of Name dependences & its explanation?
Anti dependence: Assume instruction i precedes instruction j. Anti dependence occurs if
j writes to a register or memory location that i reads, and i is executed first. This
corresponds to a WAR hazard.
Output dependence: Assume instruction i precedes instruction j. Output dependence
occurs when i and j write to the same register or memory location, resulting in a WAW
hazard; the instruction order must be maintained.
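The WAR and WAW cases above, together with the ordinary read-after-write (RAW) data dependence, can be checked mechanically from the registers each instruction reads and writes. The sketch below is illustrative only; representing an instruction as a dictionary of read and write sets is an assumption made for the example.

```python
def classify_dependences(first, second):
    """Classify dependences between two instructions, where first precedes second.

    Each instruction is a dict with two sets: 'reads' and 'writes'.
    """
    found = []
    if first["writes"] & second["reads"]:
        found.append("RAW")  # second reads what first writes (true data dependence)
    if first["reads"] & second["writes"]:
        found.append("WAR")  # second writes what first reads (anti dependence)
    if first["writes"] & second["writes"]:
        found.append("WAW")  # both write the same location
    return found


if __name__ == "__main__":
    # Hypothetical pair: i: R3 <- R1 + R2, then j: R1 <- R4 + R5
    i = {"reads": {"R1", "R2"}, "writes": {"R3"}}
    j = {"reads": {"R4", "R5"}, "writes": {"R1"}}
    print(classify_dependences(i, j))
```

Because WAR and WAW involve no flow of data, renaming the destination register of the second instruction removes them, which is not possible for a RAW dependence.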
55. Define dynamic scheduling.
The CPU rearranges the instructions to reduce stalls while preserving dependences; this
technique is called dynamic scheduling. It uses a hardware-based mechanism to
rearrange the instruction execution order at run time and can handle
cases where dependences are unknown at compile time.
56. What are the advantages of super scalar processor?
[May 2012]
Hardware detects potential parallelism between instructions.
Hardware tries to issue as many instructions as possible in parallel.
Hardware performs register renaming.
If functional units are added in a new version of the architecture, or some other
improvements have been made to the architecture, old programs can benefit from
the additional potential for parallelism, because the new hardware will issue the old
instruction sequence in a more efficient way.
57. Compare and contrast hardwired and microprogramming control. [Jan 2013]
Hardwired control:
Hardwired control is a control mechanism that generates control signals by using an
appropriate finite state machine (FSM).
Hardwired systems are made to perform in a set manner, implemented with logic,
switches, etc. between any input and output in the system. Once built, the manner in
which the control is executed is fixed.
Hardwired control also can be used for implementing sophisticated CISC machines.
Microprogrammed control:
Microprogrammed control is a control mechanism to generate control signals by
using a memory called control storage (CS), which contains the control signals.
Although microprogrammed control seems to be advantageous to CISC machines,
since CISC requires systematic development of sophisticated control signals, there is
no intrinsic difference between these 2 control mechanisms.
The microprogrammed control is not always necessary to implement CISC machines.
Part -B
1. Write about general CPU organization with an example. (or) Explain the fundamental
concepts of processor operation.
[Jan 2012]
The processor fetches one instruction at a time and performs the operations specified.
Instructions are fetched from successive memory locations until a branch or a jump
instruction is encountered. The processor keeps track of the address of the memory location
containing the next instruction to be fetched using the program counter, PC. After fetching
an instruction, the contents of the PC are updated to point to the next instruction in the
sequence. A branch instruction may load a different value into the PC.
Another key register in the processor is the instruction register, IR. Suppose that each
instruction comprises 4 bytes, and that it is stored in one memory word. To execute an
instruction, the processor has to perform the following three steps:
Fetch the contents of the memory location pointed to by the PC. The contents of this
location are interpreted as an instruction to be executed. Hence, they are loaded into
the IR.
IR <- [[PC]]
Assuming that the memory is byte addressable, increment the contents of the PC by
4, that is,
PC <- [PC] + 4
Carry out the actions specified by the instruction in the IR.
Where an instruction occupies more than one word, steps 1 and 2 must be repeated as
many times as necessary to fetch the complete instruction. These two steps are usually
referred to as the fetch phase; step 3 constitutes the execution phase. In the single-bus
organization of the processor, the arithmetic and logic unit (ALU) and all the registers are
interconnected via a single common bus. This bus is internal to the processor and should not
be confused with the external bus that connects the processor to the memory and I/O devices.
The data and address lines of the external memory bus are connected to the internal
processor bus via the memory data register, MDR, and the memory address register, MAR,
respectively. Register MDR has two inputs and two outputs. Data may be loaded into MDR
either from the memory bus or from the internal processor bus. The data stored in MDR
may be placed on either bus.
The input of MAR is connected to the internal bus, and its output is connected to the
external bus. The control lines of the memory bus are connected to the instruction decoder
and control logic block. This unit is responsible for issuing the signals that control the
operation of all the units inside the processor and for interacting with the memory bus.
The number and use of the processor registers R0 through R(n - 1) vary considerably from
one processor to another. Registers may be provided for general-purpose use by the
programmer. Some may be dedicated as special-purpose registers, such as index registers
or stack pointers.
Three registers, Y, Z, and TEMP, have not been mentioned before. These registers are
transparent to the programmer, that is, the programmer need not be concerned with them
because they are never referenced explicitly by any instruction.
The multiplexer MUX selects either the output of register Y or a constant value 4 to be
provided as input A of the ALU. The constant 4 is used to increment the contents of the
program counter. The two possible values of the MUX control input Select are referred to as
Select4 and SelectY, for selecting the constant 4 or register Y, respectively.
As instruction execution progresses, data are transferred from one register to another, often
passing through the ALU to perform some arithmetic or logic operation. The instruction
decoder and control logic unit is responsible for implementing the actions specified by the
instruction loaded in the IR register.
The decoder generates the control signals needed to select the registers involved and direct
the transfer of data. The registers, the ALU, and the interconnecting bus are collectively
referred to as the datapath.
Single bus organization of the data path inside a processor
An instruction can be executed by performing one or more of the following operations in
some specified sequence:
Transfer a word of data from one processor register to another or to the ALU.
Perform arithmetic or a logic operation and store the result in a processor register.
Fetch the contents of a given memory location and load them into a processor
register.
Store a word of data from a processor register into a given memory location.
Register Transfers:
Instruction execution involves a sequence of steps in which data are transferred from
one register to another. For each register, two control signals are used to place the contents
of that register on the bus or to load the data on the bus into the register. The input and
output of register Ri are connected to the bus via switches controlled by the signals Riin and
Riout respectively. When Riin is set to 1, the data on the bus are loaded into Ri. Similarly,
when Riout is set to 1, the contents of register Ri are placed on the bus. While Riout is equal
to 0, the bus can be used for transferring data from other registers. Suppose that we wish
to transfer the contents of register R1 to register R4. This can be accomplished as follows:
Enable the output of register R1 by setting R1out to 1. This places the contents of R1
on the processor bus.
Enable the input of register R4 by setting R4in to 1. This loads data from the
processor bus into register R4.
Input and output gating for the registers
All operations and data transfers within the processor take place within time periods defined
by the processor clock. The control signals that govern a particular transfer are asserted at
the start of the clock cycle.
Performing Arithmetic And Logical Operation:
The ALU is a combinational circuit that has no internal storage. It performs
arithmetic and logic operations on the two operands applied to its A and B inputs. One
operand is the output of the multiplexer MUX and the other operand is obtained directly
from the bus. The result produced by the ALU is stored temporarily in register Z. Therefore,
a sequence of operations to add the contents of register R1 to those of register R2 and store
the result in register R3 is
R1out, Yin
R2out, SelectY, Add, Zin
Zout, R3in
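The three control steps above can be traced with a small simulation. This is a sketch under simplifying assumptions: registers are entries in a dictionary, the bus is a local variable that carries one value per step, and timing within a clock cycle is ignored.

```python
def run_add_sequence(regs):
    """Apply the three control steps that compute R3 <- [R1] + [R2]."""
    regs = dict(regs)  # work on a copy so the caller's registers are untouched

    # Step 1: R1out, Yin -> R1 drives the bus, Y latches the bus value.
    bus = regs["R1"]
    regs["Y"] = bus

    # Step 2: R2out, SelectY, Add, Zin -> ALU adds Y (input A) and the bus (input B).
    bus = regs["R2"]
    regs["Z"] = regs["Y"] + bus

    # Step 3: Zout, R3in -> Z drives the bus, R3 latches the bus value.
    bus = regs["Z"]
    regs["R3"] = bus
    return regs


if __name__ == "__main__":
    result = run_add_sequence({"R1": 5, "R2": 7, "R3": 0, "Y": 0, "Z": 0})
    print(result["R3"])
```

Three steps are needed because only one value can be on the single bus at a time, so Y and Z hold the intermediate values between steps.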
Fetching a Word from Memory:
The connection for register MDR has four control signals: MDRin and MDRout control
the connection to the internal bus, and MDRinE and MDRoutE control the connection to the
external bus. The circuit is easily modified to provide the additional connections.
Input and output gating for one register bit.
Connections and control signals for register MDR
Example:
MAR <- [R1]
Start a Read operation on the memory bus
Wait for the MFC response from the memory
Load MDR from the memory bus
R2 <- [MDR]
Storing a Word In Memory:

Writing a word into a memory location follows a similar procedure. The desired
address is loaded into MAR. Then, the data to be written are loaded into MDR, and a Write
command is issued. Hence, executing the instruction Move R2,(R1) requires the following
sequence:
R1out, MARin
R2out, MDRin, Write
MDRoutE, WMFC
As in the case of the read operation, the Write control signal causes the memory bus
interface hardware to issue a Write command on the memory bus. The processor remains in
step 3 until the memory operation is completed and an MFC response is received.
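The read and write sequences can be sketched as two short functions. The model is deliberately simplified and rests on assumptions: memory is a dictionary from address to word, and the wait for MFC is treated as completing instantly, so each comment marks the control step a line corresponds to rather than its timing.

```python
def read_word(regs, memory):
    """Load the word addressed by R1 into R2 (the memory read sequence)."""
    regs = dict(regs)
    regs["MAR"] = regs["R1"]            # R1out, MARin, Read
    regs["MDR"] = memory[regs["MAR"]]   # MDRinE, WMFC (memory drives the external bus)
    regs["R2"] = regs["MDR"]            # MDRout, R2in
    return regs


def write_word(regs, memory):
    """Store R2 at the address held in R1 (Move R2,(R1))."""
    regs = dict(regs)
    memory = dict(memory)
    regs["MAR"] = regs["R1"]            # R1out, MARin
    regs["MDR"] = regs["R2"]            # R2out, MDRin, Write
    memory[regs["MAR"]] = regs["MDR"]   # MDRoutE, WMFC (MDR drives the external bus)
    return regs, memory


if __name__ == "__main__":
    regs = {"R1": 8, "R2": 0, "MAR": 0, "MDR": 0}
    print(read_word(regs, {8: 99})["R2"])
    print(write_word({"R1": 8, "R2": 55, "MAR": 0, "MDR": 0}, {8: 0})[1])
```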
2. List and explain the steps involved in the execution of a complete instruction.
[May 2012]
Consider the instruction Add (R3),R1, which adds the contents of a memory location
pointed to by R3 to register R1. Executing this instruction requires the following actions:
Fetch the instruction.
Fetch the first operand (the contents of the memory location pointed to by R3).
Perform the addition.
Load the result into R1.
The sequence of control steps required to perform these operations on the single-bus
architecture proceeds as follows.
In step 1, the instruction fetch operation is initiated by loading the contents of the PC into
the MAR and sending a Read request to the memory. The Select signal is set to Select4,
which causes the multiplexer MUX to select the constant 4. This value is added to the
operand at input B, which is the contents of the PC, and the result is stored in register Z.
The updated value is moved from register Z back into the PC during step 2, while waiting for
the memory to respond. In step 3, the word fetched from the memory is loaded into the IR.
Steps 1 through 3 constitute the instruction fetch phase, which is the same for all
instructions. The instruction decoding circuit interprets the contents of the IR at the
beginning of step 4. This enables the control circuitry to activate the control signals for
steps 4 through 7, which constitute the execution phase. The contents of register R3 are
transferred to the MAR in step 4, and a memory read operation is initiated.
Then the contents of R1 are transferred to register Y in step 5, to prepare for the
addition operation. When the Read operation is completed, the memory operand is available
in register MDR, and the addition operation is performed in step 6. The contents of MDR are
gated to the bus, and thus also to the B input of the ALU, and register Y is selected as the
second input to the ALU by choosing SelectY. The sum is stored in register Z, then
transferred to R1 in step 7. The End signal causes a new instruction fetch cycle to begin by
returning to step 1.
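The seven steps described above can be replayed in a small table-driven simulation. The signal lists below are reconstructed from the prose, so their exact placement within each step is an assumption, as is the simplification that a memory read completes in the step that contains WMFC.

```python
# Control steps for Add (R3),R1 on the single-bus organization (reconstructed).
ADD_STEPS = [
    ["PCout", "MARin", "Read", "Select4", "Add", "Zin"],  # 1: start fetch, PC + 4
    ["Zout", "PCin", "Yin", "WMFC"],                      # 2: update PC, wait for memory
    ["MDRout", "IRin"],                                   # 3: load instruction into IR
    ["R3out", "MARin", "Read"],                           # 4: start operand read
    ["R1out", "Yin", "WMFC"],                             # 5: R1 to Y, wait for memory
    ["MDRout", "SelectY", "Add", "Zin"],                  # 6: Y + memory operand
    ["Zout", "R1in", "End"],                              # 7: store sum in R1
]


def execute(regs, memory, steps=ADD_STEPS):
    """Run the control steps against a register dict and a memory dict."""
    regs = dict(regs)
    for signals in steps:
        # Exactly one "...out" signal drives the bus in each step.
        source = next(s[:-3] for s in signals if s.endswith("out"))
        bus = regs[source]
        alu = None
        if "Add" in signals:
            a_input = 4 if "Select4" in signals else regs["Y"]
            alu = a_input + bus
        for s in signals:
            if s.endswith("in"):
                dest = s[:-2]
                # Z latches the ALU output; every other register latches the bus.
                regs[dest] = alu if dest == "Z" else bus
        if "WMFC" in signals:
            regs["MDR"] = memory[regs["MAR"]]  # memory responds, MDR is loaded
    return regs


if __name__ == "__main__":
    start = {"PC": 100, "MAR": 0, "MDR": 0, "IR": 0, "Y": 0, "Z": 0, "R1": 10, "R3": 200}
    final = execute(start, {100: "Add (R3),R1", 200: 32})
    print(final["R1"], final["PC"], final["IR"])
```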
Branch Instruction:
A branch instruction replaces the contents of the PC with the branch target address.
This address is usually obtained by adding an offset X, which is given in the branch
instruction, to the updated value of the PC. In the control sequence that implements an
unconditional branch instruction, processing starts, as usual, with the fetch phase. This
phase ends when the instruction is loaded into the IR in step 3.
The offset value is extracted from the IR by the instruction decoding circuit, which
will also perform sign extension if required. Since the value of the updated PC is already
available in register Y, the offset X is gated onto the bus in step 4, and an addition operation
is performed. The result, which is the branch target address, is loaded into the PC in step 5.
The offset X used in a branch instruction is usually the difference between the branch target
address and the address immediately following the branch instruction. For example, if the
branch instruction is at location 2000 and if the branch target address is 2050, the value of
X must be 46.
The reason for this can be readily appreciated from the control sequence. The PC is
incremented during the fetch phase, before knowing the type of instruction being executed.
Thus, when the branch address is computed in step 4, the PC value used is the updated
value, which points to the instruction following the branch instruction in the memory.
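The offset calculation can be checked with a one-line function. The sketch assumes 4-byte instructions, as in the fetch sequence earlier in these notes; the function and constant names are illustrative.

```python
INSTRUCTION_BYTES = 4  # assumed instruction length, matching PC <- [PC] + 4


def branch_offset(branch_address, target_address):
    """Offset X stored in a branch instruction located at branch_address."""
    updated_pc = branch_address + INSTRUCTION_BYTES  # PC after the fetch phase
    return target_address - updated_pc


if __name__ == "__main__":
    # Branch at location 2000 with target 2050, as in the example above.
    print(branch_offset(2000, 2050))
```

A backward branch simply yields a negative offset, which is why the decoding circuit sign-extends X before the addition.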
For a conditional branch such as Branch<0, the N condition code flag is tested in step 4.
Thus, if N = 0 the processor returns to step 1 immediately after step 4. If N = 1, step 5 is
performed to load a new value into the PC, thus performing the branch operation.
3. Discuss multiple bus organization.
All general-purpose registers are combined into a single block called the register file.
The register file is said to have three ports.
There are two outputs, allowing the contents of two different registers to be accessed
simultaneously and have their contents placed on buses A and B. The third port allows the
data on bus C to be loaded into a third register during the same clock cycle.
Buses A and B are used to transfer the source operands to the A and B inputs of the
ALU, where an arithmetic or logic operation may be performed. The result is transferred to
the destination over bus C. If needed, the ALU may simply pass one of its two input
operands unmodified to bus C.
The ALU control signals for such an operation are R=A or R=B. A second feature is the
introduction of the Incrementer unit, which is used to increment the PC by 4. Using the
Incrementer eliminates the need to add 4 to the PC using the main ALU, as was done in the
single bus organization.
Consider the three-operand instruction Add R4,R5,R6
In step 1, the contents of the PC are passed through the ALU, using the R=B control
signal, and loaded into the MAR to start a memory read operation. At the same time
the PC is incremented by 4. Note that the value loaded into MAR is the original
contents of the PC. The incremented value is loaded into the PC at the end of the
clock cycle and will not affect the contents of MAR.
In step 2, the processor waits for MFC and loads the data received into MDR, then
transfers them to IR in step 3.
Finally, the execution phase of the instruction requires only one control step to
complete, step 4. By providing more paths for data transfer a significant reduction in
the number of clock cycles needed to execute an instruction is achieved.
4. Explain Hardwired control with the block diagram, Micro Programmed control &
Micro instruction
[May 2012, Dec 2011 & Jan 2013]
The processor must have some means of generating the control signals needed in the
proper sequence. Computer designers use a wide variety of techniques to solve this
problem. The approaches used fall into one of two categories:
Hardwired control
Micro programmed control.
The required control signals are determined by the following information:
Contents of the control step counter
Contents of the instruction register
Contents of the condition code flags
External input signals, such as MFC and interrupt requests
The decoder/encoder block is a combinational circuit that generates the required control
outputs, depending on the state of all its inputs. Separating the decoding and encoding
functions gives a more detailed view of the control unit. For any instruction loaded in the
IR, one of the output lines INS1 through INSm is set to 1, and all other lines are set to 0.
The input signals to the encoder block are combined to generate the individual control
signals Yin, PCout, Add, End, and so on. As an example of how the encoder generates the
Zin control signal for the single-bus processor organization, the circuit implements the
logic function
Zin = T1 + T6 . ADD + T4 . BR + ...
This signal is asserted during time slot T1 for all instructions, during T6 for an Add
instruction, during T4 for an unconditional branch instruction, and so on. The End control
signal is generated in the same way from the logic function
End = T7 . ADD + T5 . BR + (T5 . N + T4 . N') . BRN + ...
where BRN stands for the Branch<0 instruction and N' is the complement of the N flag.
The End signal starts a new instruction fetch cycle by resetting the control step
counter to its starting value. The counter is also governed by a control signal called RUN.
When set to 1, RUN causes the counter to be incremented by one at the end of every clock
cycle. When RUN is equal to 0, the counter stops counting.
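The encoder logic and the step counter described above can be written out as Boolean functions. This is a sketch covering only the cases mentioned in these notes: Zin in T1 for all instructions, T6 for Add and T4 for an unconditional branch, and End at the last step of each sequence. The instruction labels ADD, BR and BRN are names assumed for the example.

```python
def z_in(step, instruction):
    """Zin is asserted in T1 for all instructions, T6 for ADD, T4 for BR."""
    return (step == 1
            or (step == 6 and instruction == "ADD")
            or (step == 4 and instruction == "BR"))


def end_signal(step, instruction, n_flag=False):
    """End is asserted in the final step of each control sequence."""
    if instruction == "ADD":
        return step == 7
    if instruction == "BR":
        return step == 5
    if instruction == "BRN":  # Branch<0: ends early when N = 0
        return (step == 5 and n_flag) or (step == 4 and not n_flag)
    return False


def next_step(step, run, end):
    """Control step counter: End resets it, RUN = 1 lets it advance."""
    if end:
        return 1
    return step + 1 if run else step


if __name__ == "__main__":
    step = 1
    trace = []
    while True:
        trace.append(step)
        if end_signal(step, "BRN", n_flag=False):
            break
        step = next_step(step, run=True, end=False)
    print(trace)
```

In hardware these functions are fixed AND/OR networks, which is why changing the instruction set means changing the wiring.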
The control hardware can be viewed as a state machine that changes from one state
to another in every clock cycle, depending on the contents of the instruction register, the
condition codes, and the external inputs. The outputs of the state machine are the control
signals. The sequence of operations carried out by this machine is determined by the wiring
of the logic elements, hence the name "hardwired." A controller that uses this approach can
operate at high speed. However, it has little flexibility, and the complexity of the instruction
set it can implement is limited.
A Complete Processor:
This structure has an instruction unit that fetches instructions from an instruction
cache or from the main memory when the desired instructions are not already in the cache.
It has separate processing units to deal with integer data and floating-point data. A data
cache is inserted between these units and the main memory. Using separate caches for
instructions and data is common practice in many processors today.
Micro programmed control

[May 2012]
An alternative to hardwired control is microprogrammed control, in which
control signals are generated by a program similar to machine language programs.
A control word (CW) is a word whose individual bits represent the various control signals.
Each of the control steps in the control sequence of an instruction defines a unique
combination of 1s and 0s in the CW.
In the CWs corresponding to the 7 steps of the control sequence, SelectY is represented by Select = 0 and Select4
by Select = 1. A sequence of CWs corresponding to the control sequence of a machine
instruction constitutes the microroutine for that instruction, and the individual control words
in this microroutine are referred to as microinstructions.
The microroutines for all instructions in the instruction set of a computer are stored in a
special memory called the control store. The control unit can generate the control signals for
any instruction by sequentially reading the CWs of the corresponding microroutine from the
control store. This suggests organizing the control unit around the control store.
To read the control words sequentially from the control store, a microprogram counter (μPC)
is used. Every time a new instruction is loaded into the IR, the output of the block labeled
"starting address generator" is loaded into the μPC.
In microprogrammed control, an alternative approach is to use conditional branch
microinstructions. In addition to the branch address, these microinstructions specify which
of the external inputs, condition codes, or, possibly, bits of the instruction register should be
checked as a condition for branching to take place.
The instruction Branch <0 may now be implemented by a microroutine. After loading this
instruction into IR, a branch microinstruction transfers control to the corresponding
microroutine, which is assumed to start at location 25 in the control store. This address is
the output of the starting address generator block. The microinstruction at location 25 tests the N
bit of the condition codes. If this bit is equal to 0, a branch takes place to location 0 to fetch
a new machine instruction. Otherwise, the microinstruction at location 26 is executed to put
the branch target address into register Z. The microinstruction in location 27 loads this
address into the PC.
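The Branch <0 microroutine just described can be traced with a small simulation. This is a minimal sketch, not the actual control store contents: the dictionary layout, the function name, and the textual descriptions of each microinstruction are assumptions made for illustration. Only the addresses 25 to 27 and the test of the N bit come from the text.

```python
def trace_branch_lt0(n_flag):
    """Return the control-store addresses visited for Branch<0, starting at 25."""
    # address -> (description, conditional-branch target or None)
    control_store = {
        25: ("if N == 0 then branch to location 0", 0),
        26: ("put the branch target address into Z", None),
        27: ("load Z into the PC", None),
    }
    upc = 25  # value supplied by the starting address generator
    trace = []
    while upc in control_store:
        trace.append(upc)
        _, target = control_store[upc]
        if target is not None and n_flag == 0:
            upc = target  # condition met: go back to instruction fetch
        else:
            upc += 1      # otherwise read the next microinstruction in sequence
    return trace
```

With N = 0 only location 25 is visited before control returns to the fetch microroutine; with N = 1 locations 25, 26 and 27 are executed in order.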
Microinstructions:
Horizontal and vertical organizations represent the two organizational extremes in
microprogrammed control. Many intermediate schemes are also possible, in which the
degree of encoding is a design parameter. The layout is a horizontal organization because it
groups only mutually exclusive microoperations in the same fields. As a result, it does not
limit in any way the processor's ability to perform various microoperations in parallel.
Highly encoded schemes that use compact codes to specify only a small number of control
functions in each microinstruction are referred to as a vertical organization. On the other
hand, the minimally encoded scheme, in which many resources can be controlled with a
single microinstruction, is called a horizontal organization.
The horizontal approach is useful when a higher operating speed is desired and when the
machine structure allows parallel use of resources. The vertical approach results in
considerably slower operating speeds because more microinstructions are needed to
perform the desired control functions.
5. Explain in detail the implementation of pipeline with a neat diagram. [Jan 2012]
In computer architecture, pipelining means executing machine instructions concurrently.
Pipelining is used in modern computers to achieve high performance. The speed of
execution of programs is influenced by many factors. One way to improve performance is to
use faster circuit technology to build the processor and the main memory. Another
possibility is to arrange the hardware so that more than one operation can be performed at
the same time. In this way, the number of operations performed per second is increased
even though the elapsed time needed to perform any one operation is not changed.
Pipelining is a particularly effective way of organizing concurrent activity in a computer
system. The basic idea is very simple. It is frequently encountered in manufacturing plants,
where pipelining is commonly known as an assembly-line operation. The processor executes
a program by fetching and executing instructions, one after the other. Let Fi and Ei refer to
the fetch and execute steps for instruction Ii. Execution of a program consists of a
sequence of fetch and execute steps: F1, E1, F2, E2, F3, E3, and so on.
Now consider a computer that has two separate hardware units, one for fetching
instructions and another for executing them. The instruction fetched by the fetch unit is
deposited in an intermediate storage buffer, B1. This buffer is needed to enable the
execution unit to execute the instruction while the fetch unit is fetching the next instruction.
The results of execution are deposited in the destination location specified by the
instruction. The data operated on by the instructions are assumed to be inside the block labeled
"Execution unit".
The computer is controlled by a clock whose period is such that the fetch and execute steps
of any instruction can each be completed in one clock cycle. Operation of the computer
proceeds as follows. In the first clock cycle, the fetch unit fetches an instruction I1 (step F1) and
stores it in buffer B1 at the end of the clock cycle. In the second clock cycle, the instruction
fetch unit proceeds with the fetch operation for instruction I2 (step F2). Meanwhile, the
execution unit performs the operation specified by instruction I1, which is available to it in
buffer B1 (step E1). By the end of the second clock cycle, the execution of instruction I1 is
completed and instruction I2 is available. Instruction I2 is stored in B1, replacing I1, which
is no longer needed. Step E2 is performed by the execution unit during the third clock cycle,
while instruction I3 is being fetched by the fetch unit. In this manner, both the fetch and
execute units are kept busy all the time.
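The overlap of fetch and execute steps in the two-stage pipeline can be tabulated with a short sketch. The function name and the dictionary layout are assumptions for illustration; the step labels Fi and Ei follow the text, and no stalls are modeled.

```python
def two_stage_schedule(n):
    """Map each clock cycle to the (fetch step, execute step) active in it."""
    schedule = {}
    for cycle in range(1, n + 2):  # n instructions occupy n + 1 cycles
        fetch = f"F{cycle}" if cycle <= n else None          # fetch unit idle at the end
        execute = f"E{cycle - 1}" if cycle >= 2 else None    # execute unit idle at the start
        schedule[cycle] = (fetch, execute)
    return schedule
```

For three instructions, cycle 2 holds (F2, E1) and cycle 3 holds (F3, E2), matching the description above: from the second cycle onward both units are busy until the instruction stream ends.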
In a four-stage pipeline, the processing of each instruction is divided into the following steps:
F Fetch: read the instruction from the memory.
D Decode: decode the instruction and fetch the source operand(s).
E Execute: perform the operation specified by the instruction.
W Write: store the result in the destination location.
Role of Cache Memory:

Each stage in a pipeline is expected to complete its operation in one clock cycle. Hence, the
clock period should be sufficiently long to complete the task being performed in any stage.
Pipelining is most effective in improving performance if the tasks being performed in
different stages require about the same amount of time.
Pipeline Performance:
A four-stage pipelined processor completes the processing of one instruction in each clock cycle, which means that
the rate of instruction processing is four times that of sequential operation. The potential
increase in performance resulting from pipelining is proportional to the number of pipeline
stages.
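The claim that the speedup approaches the number of stages can be checked with a small calculation. This sketch assumes an ideal k-stage pipeline with no stalls, in which the first instruction takes k cycles and each later instruction completes one cycle after the previous one. The function names are hypothetical.

```python
def pipeline_cycles(n, k):
    """Cycles needed to run n instructions on an ideal k-stage pipeline."""
    return k + n - 1  # k cycles to fill the pipeline, then one result per cycle


def speedup(n, k):
    """Ratio of sequential time (n * k cycles) to pipelined time."""
    return (n * k) / pipeline_cycles(n, k)
```

With k = 4, a single instruction gains nothing (speedup 1.0), while a long run of instructions approaches, but never reaches, a speedup of 4.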
6. What is a data hazard? How will you overcome it?

[May, Jan 2012 & 2013]
A data hazard is any condition in which either the source or the destination operands of an
instruction are not available at the time expected in the pipeline. As a result some operation
has to be delayed, and the pipeline stalls. A data hazard is a situation in which the pipeline
is stalled because the data to be operated on are delayed for some reason. Consider a
program that contains two instructions, I1 followed by I2. When this program is executed in
a pipeline, the execution of I2 can begin before the execution of I1 is completed. The
potential for obtaining incorrect results when operations are performed concurrently can be
demonstrated by a simple example. Assume that A = 5, and consider the following two
operations:
A ← 3 + A
B ← 4 × A
When these operations are performed in the order given, the result is B = 32. But if they
are performed concurrently, the value of A used in computing B would be the original value,
5, leading to an incorrect result. In contrast, the following two operations are independent
and can be performed concurrently:
A ← 5 × C
B ← 20 + C
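The effect of the dependency between A ← 3 + A and B ← 4 × A can be reproduced directly. In this sketch the two function names are hypothetical, and "concurrent" execution is modeled simply by letting both operations read the value of A that existed before either one ran.

```python
def in_order(a):
    """Perform A <- 3 + A, then B <- 4 * A, one after the other."""
    a = 3 + a
    b = 4 * a       # uses the updated A
    return a, b


def concurrent(a):
    """Both operations read the original A, as if started at the same time."""
    old_a = a
    a = 3 + old_a
    b = 4 * old_a   # uses the stale A: this is the data hazard
    return a, b
```

Starting from A = 5, in_order gives B = 32, whereas concurrent gives B = 20.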
For example, the two instructions Mul R2,R3,R4 and Add R5,R4,R6 give rise to a data
dependency. The result of the multiply instruction is placed into register R4, which in turn is
one of the two source operands of the Add instruction. Assume that the multiply operation
takes one clock cycle to complete. As the Decode unit decodes the Add
instruction in cycle 3, it realizes that R4 is used as a source operand.
Hence, the D step of that instruction cannot be completed until the W step of the multiply
instruction has been completed. Completion of step D2 must be delayed to clock cycle 5,
and is shown as step D2A. Instruction I3 is fetched in cycle 3, but its decoding must be
delayed because step D3 cannot precede D2. Hence, pipelined execution is stalled for two
cycles.
Operand forwarding:
The data hazard just described arises because one instruction, instruction I2, is waiting for
data to be written in the register file. However, these data are available at the output of the
ALU once the Execute stage completes step E1. Hence, the delay can be reduced, or possibly
eliminated, if we arrange for the result of instruction I1 to be forwarded directly for use in
step E2.
Consider the processor datapath involving the ALU and the register file. This arrangement is similar
to the three-bus structure, except that registers SRC1, SRC2, and RSLT have been added.
These registers constitute interstage buffers needed for pipelined operation. Registers SRC1
and SRC2 are part of buffer B2 and RSLT is part of B3. The data forwarding mechanism is
provided by the blue connection lines. The two multiplexers connected at the inputs to the
ALU allow the data on the destination bus to be selected instead of the contents of either
the SRC1 or SRC2 register. When the instructions Mul R2,R3,R4 and Add R5,R4,R6 are executed in this datapath, the
operations performed in each clock cycle are as follows. After decoding instruction I2 and
detecting the data dependency, a decision is made to use data forwarding. The operand not
involved in the dependency, register R5, is read and loaded in register SRC1 in clock cycle 3.
In the next clock cycle, the product produced by instruction I1 is available in register RSLT,
and because of the forwarding connection, it can be used in step E2. Hence, execution of I2
proceeds without interruption.
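Handling data hazards in software:
The task of detecting data dependencies and dealing with them can instead be left to the
software. The compiler introduces the two-cycle delay needed between instructions I1 and I2
by inserting NOP (no-operation) instructions:
I1: Mul R2,R3,R4
NOP
NOP
I2: Add R5,R4,R6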
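The NOP insertion shown above can be sketched as a small pass over a list of instructions. This is an illustration under assumptions, not a real compiler pass: instructions are modeled as hypothetical (opcode, source 1, source 2, destination) tuples, the required gap of two instruction slots is taken from the four-stage example in the text, and operand forwarding is assumed to be absent.

```python
NOP = ("NOP", None, None, None)


def insert_nops(program, delay=2):
    """Insert NOPs so that a consumer is at least `delay` slots behind its producer."""
    out = []
    for op, src1, src2, dest in program:
        needed = 0
        # Look back over the last `delay` emitted slots for a producer of a source.
        for back, prev in enumerate(reversed(out[-delay:]), start=1):
            if prev[3] is not None and prev[3] in (src1, src2):
                needed = max(needed, delay - back + 1)
        out.extend([NOP] * needed)
        out.append((op, src1, src2, dest))
    return out
```

Applied to Mul R2,R3,R4 followed by Add R5,R4,R6, the pass inserts two NOPs. If an independent instruction already separates the pair, only one NOP is needed.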
Side effect:
The data dependencies encountered in the preceding examples are explicit and easily
detected because the register involved is named as the destination in instruction I1 and as a
source in I2. Sometimes an instruction changes the contents of a register other than the
one named as the destination.
Classification of data dependent hazards:
The data-dependent hazards can be classified into three types according to various data
update patterns. Consider two instructions I1 and I2, with I1 occurring before I2 in program
order. Let D(i) denote the domain of instruction i (the locations it reads) and R(i) its range
(the locations it writes).
I. Read After Write (RAW) (flow dependence hazard): R(1) ∩ D(2) ≠ ∅
A RAW hazard refers to a situation where I2 refers to a result of I1 that has not yet
been calculated or retrieved.
II. Write After Read (WAR) (anti-dependence hazard): D(1) ∩ R(2) ≠ ∅
A WAR hazard arises in concurrent execution when I2 writes a location before I1 has read it.
III. Write After Write (WAW) (output dependence hazard): R(1) ∩ R(2) ≠ ∅
A WAW hazard may occur in a concurrent execution environment when I2 writes a location
before I1 writes it, leaving the wrong final value.
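The three hazard classes can be checked mechanically with set intersections. The sketch below assumes that D(i) is read as the set of locations instruction i reads (its domain) and R(i) as the set it writes (its range). The function name and the tuple layout are hypothetical.

```python
def classify_hazards(i1, i2):
    """i1 and i2 are (domain, range) pairs of sets; i1 precedes i2 in program order."""
    d1, r1 = i1
    d2, r2 = i2
    hazards = set()
    if r1 & d2:
        hazards.add("RAW")  # i2 reads something i1 writes
    if d1 & r2:
        hazards.add("WAR")  # i2 writes something i1 reads
    if r1 & r2:
        hazards.add("WAW")  # both write the same location
    return hazards
```

For Mul R2,R3,R4 followed by Add R5,R4,R6, the domains are {R2, R3} and {R5, R4}, the ranges are {R4} and {R6}, and the only hazard reported is RAW.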
7. Discuss Instruction hazards.
[Jan 2012 & 2013]
Pipelined execution of instructions reduces execution time and improves performance, provided
the execution units are supplied with a steady stream of instructions.
Whenever this stream is interrupted, the pipeline stalls, as in the case of a cache
miss. A branch instruction may also cause the pipeline to stall. The effect of branch
instructions and the techniques that can be used for mitigating their impact are discussed
for unconditional branches and conditional branches.
Unconditional branches:
Consider a sequence of instructions being executed in a two-stage pipeline. Instructions I1 to I3 are
stored at successive memory addresses, and I2 is a branch instruction. Let the branch
target be instruction Ik. In clock cycle 3, the fetch operation for instruction I3 is in progress
at the same time that the branch instruction is being decoded and the target address
computed. In clock cycle 4, the processor must discard I3, which has been incorrectly
fetched, and fetch instruction Ik. In the meantime, the hardware unit responsible for the
Execute (E) step must be told to do nothing during that clock period.
Either a cache miss or a branch instruction stalls the pipeline for one or more clock cycles.
To reduce the effect of these interruptions, many processors employ sophisticated fetch
units that can fetch instructions before they are needed and put them in a queue. Typically,
the instruction queue can store several instructions. A separate unit, which we call the
dispatch unit, takes instructions from the front of the queue and sends them to the
execution unit. This leads to an organization built around the instruction queue. The dispatch unit also performs the decoding
function.
To be effective, the fetch unit must have sufficient decoding and processing capability to
recognize and execute branch instructions. It attempts to keep the instruction queue filled
at all times to reduce the impact of occasional delays when fetching instructions. If there is
a delay in fetching instructions because of a branch or a cache miss, the dispatch unit
continues to issue instructions from the instruction queue. The fetch unit continues to fetch
instructions and add them to the queue.
Consider how the queue length changes and how it affects the relationship between different pipeline
stages. Suppose that instruction I1 introduces a 2-cycle stall. Since space is available in the
queue, the fetch unit continues to fetch instructions and the queue length rises to 3 in clock
cycle 6. Instruction I5 is a branch instruction. Instructions I1, I2, I3, I4 and Ik complete
execution in successive clock cycles. Hence, the branch instruction does not increase the
overall execution time. This technique is referred to as branch folding.
Reading more than one instruction in each clock cycle may reduce delay. Having an
instruction queue like this is also beneficial in dealing with cache misses. The instruction
queue mitigates the impact of branch instructions on performance through the process of
branch folding. It has a similar effect on stalls caused by cache misses. The effectiveness of
this technique is enhanced when the instruction fetch unit is able to read more than one
instruction at a time from the instruction cache.
Conditional branches and branch prediction:
I. Delayed branching
The processor fetches the next instructions before it determines whether the current instruction
is a branch instruction.
II. Branch Prediction (Static)
Another technique for reducing the branch penalty associated with conditional branches is to
attempt to predict whether or not a particular branch will be taken.
III. Dynamic Branch Prediction
The idea is that the processor hardware assesses the likelihood of a given branch being
taken by keeping track of branch decisions every time that instruction is executed.
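One common way to keep track of branch decisions is a 2-bit saturating counter per branch. These notes do not name a specific scheme, so the sketch below is an assumed example rather than the method intended by the text. The class and function names are hypothetical.

```python
class TwoBitPredictor:
    """2-bit saturating counter: states 0 and 1 predict not taken, 2 and 3 predict taken."""

    def __init__(self):
        self.counter = 0  # assumption: start in the strongly-not-taken state

    def predict(self):
        return self.counter >= 2

    def update(self, taken):
        # Move one step toward the observed outcome, saturating at 0 and 3.
        if taken:
            self.counter = min(3, self.counter + 1)
        else:
            self.counter = max(0, self.counter - 1)


def count_mispredictions(outcomes):
    """Run one predictor over a sequence of actual branch outcomes."""
    predictor = TwoBitPredictor()
    misses = 0
    for taken in outcomes:
        if predictor.predict() != taken:
            misses += 1
        predictor.update(taken)
    return misses
```

Because two wrong guesses are needed to flip the prediction, a single not-taken outcome in a run of taken branches costs one misprediction rather than two.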
8. Explain Datapath and control considerations
The three-bus structure is suitable for pipelined execution with a slight modification to support
a 4-stage pipeline. There are separate instruction and data caches that use separate
address and data connections to the processor. This requires two versions of the MAR
register, IMAR for accessing the instruction cache and DMAR for accessing the data cache.
The PC is connected directly to the IMAR, so that the contents of the PC can be transferred
to IMAR at the same time that an independent ALU operation is taking place. The data
address in DMAR can be obtained directly from the register file or from the ALU to support
the register indirect and indexed addressing modes. Separate MDR registers are provided
for read and write operations. Data can be transferred directly between these registers and
the register file during load and store operations without the need to pass through the ALU.
Buffer registers have been introduced at the inputs and output of the ALU. These are
registers SRC1, SRC2, and RSLT. Forwarding connections may be added if desired. The
instruction register has been replaced with an instruction queue, which is loaded from the
instruction cache. The output of the instruction decoder is connected to the control signal
pipeline. This pipeline holds the control signals in buffers B2 and B3.
The following operations can be performed independently in the processor:
Reading an instruction from the instruction cache
Incrementing the PC
Decoding an instruction
Reading from or writing into the data cache
Reading the contents of up to two registers from the register file
Writing into one register in the register file
Performing an ALU operation
9. Discuss about Superscalar Operation.

Pipelining makes it possible to execute instructions concurrently. Several instructions are
present in the pipeline at the same time, but they are in different stages of their execution.
While one instruction is performing an ALU operation, another instruction is being decoded
and yet another is being fetched from the memory. Instructions enter the pipeline in strict
program order.
The maximum throughput of a pipelined processor is one instruction per clock cycle. Processors
equipped with multiple processing units can handle several instructions in parallel and achieve
an instruction execution throughput of more than one
instruction per cycle. They are known as superscalar processors. Many modern high-performance processors use this approach.
In a superscalar processor, the detrimental effect on performance of various hazards
becomes even more pronounced. The compiler can avoid many hazards through judicious
selection and ordering of instructions. For example, the compiler should strive to interleave
floating-point and integer instructions.
This would enable the dispatch unit to keep both the integer and floating-point units busy
most of the time. In general, high performance is achieved if the compiler is able to arrange
program instructions to take maximum advantage of the available hardware units.
Out-of-order execution:
Instructions are dispatched in the same order as they appear in the program. However, their
execution may be completed out of order. This raises issues arising from dependencies among
instructions.
To guarantee a consistent state when exceptions occur, the results of the execution of
instructions must be written into the destination locations strictly in program order. This
means we must delay step W2 until cycle 6. In turn, the integer execution unit must retain
the result of instruction I2, and hence it cannot accept instruction I4 until cycle 6. If an
exception occurs during an instruction, all subsequent instructions that may have been
partially executed are discarded. This is called a precise exception. It is easier to provide
precise exceptions in the case of external interrupts: once the instructions whose execution
is pending have completed, the processor and all its
registers are in a consistent state, and interrupt processing can begin.
10. Difference between micro programmed and hardwired control.
[Jan 2012]
Hardwired control is a control mechanism that generates control signals by using an appropriate
finite state machine (FSM). Microprogrammed control is a control mechanism that generates
control signals by using a memory called control storage (CS), which contains the control
signals. Although microprogrammed control seems to be advantageous for CISC machines,
since CISC requires systematic development of sophisticated control signals, there is no
intrinsic difference between these two control mechanisms.
The pair of "microinstruction-register" and "control storage address register" can be
regarded as a "state register" for the hardwired control. Note that the control storage can
be regarded as a kind of combinational logic circuit. We can assign any 0, 1 values to each
output corresponding to each address, which can be regarded as the input for a
combinational logic circuit. This is a truth table.
The microprogrammed control is not always necessary to implement CISC machines.
Hardwired control also can be used for implementing sophisticated CISC machines.
Hardwired systems are made to perform in a set manner, implemented with logic, switches,
etc. between any input and output in the system. Once the system is built, the manner in which the control is
executed is fixed.
Microprogrammed systems are centered around a computer of some sort, often a
microcontroller in small systems, that controls the system using a program. Input is sent to
the computer, and the program determines what should be done with the input to come up
with an output. So the processor is between the input and the output, rather than a direct
link between the input and output.
The versatility of the microprogrammed system far exceeds the hardwired system. The
systems can also be considerably smaller. The size of a complex microcontroller can be quite
a bit smaller than a bunch of logic and switches for the same functionality.
11. What is branch penalty? Explain how branch penalty is reduced.
[Dec 2011]
A branch instruction loads the processor's program counter with a new non-sequential
value. Consequently, all the instructions whose execution was started before the branch was
taken are suddenly redundant and the pipeline has to be refilled with instructions following
the branch target address. The cost of executing an operation that causes a non-sequential
flow of control is known as the branch penalty.
Techniques for handling instructions that modify the flow of control aim at reducing or even
eliminating the bubble in a RISC pipeline caused when a branch is taken; that is, they are
concerned with ways of reducing the
branch penalty. Some of the techniques involve limiting the damage done by a branch and
some techniques attempt to predict the outcome of a branch before it has been executed.
Several instructions modify the flow of control; for example, the unconditional branch, the
conditional branch, the subroutine call, and the subroutine return. Internally generated
traps and exceptions and externally generated interrupts also modify the flow of control.
Subroutine call and returns are not normally regarded as branch operations from the
computer architect's point of view, but they have similar characteristics from the computer
designer's point of view; that is, they also incur a branch penalty. The unconditional branch
is always taken and forces execution to continue at the target address. An unconditional
branch is equivalent to the high-level language goto, and its outcome is known at compile time.
Reduce branch penalty:
The outcome of a conditional branch is determined by the state of one or more flag bits in
the processor's condition code register and is therefore not known until runtime. A
conditional branch may or may not be taken. When a branch is not taken, the outcome is sometimes
called in line because the next instruction immediately following the branch is executed. A
subroutine call is a type of unconditional branch that saves the return address. Similarly, a
subroutine return is an unconditional branch that fetches the target address from a register
or the stack. Some computers support conditional subroutine calls and returns.
The branch penalty can be reduced by the following steps:
1. Predict branch/jump instructions and the branch direction (taken or not taken)
2. Predict the branch/jump target address (for taken branches)
3. Speculatively execute instructions along the predicted path
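The cost of the branch penalty, and the benefit of prediction, can be estimated with a simple model. This is a rough sketch under stated assumptions: the pipeline otherwise completes one instruction per cycle, every penalized branch costs the same number of cycles, and the function and parameter names are hypothetical. The figures used in any call are made up for illustration.

```python
def effective_cpi(branch_fraction, penalized_fraction, penalty_cycles):
    """Average cycles per instruction when some branches incur a fixed penalty.

    branch_fraction    -- fraction of all instructions that are branches
    penalized_fraction -- fraction of those branches that pay the penalty
                          (all taken branches, or only mispredicted ones)
    penalty_cycles     -- cycles lost each time the penalty is paid
    """
    return 1.0 + branch_fraction * penalized_fraction * penalty_cycles
```

If one instruction in five is a branch and every branch costs 2 cycles, the average rises from 1.0 to about 1.4 cycles per instruction. If only one branch in ten pays the penalty, because the rest are predicted correctly, the average is about 1.04.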
Anna University Questions
Part A
1. What is meant by pipelining?
[Ref. No.: 1]
2. Define control word.
[Ref. No.: 4]
3. Define the term hardwired control.
[Ref. No.: 15]
4. What information determines the control signals?
[Ref. No.: 48]
5. Differentiate precise and imprecise exceptions.
[Ref. No.: 49]
6. What are the advantages of super scalar processor?
[Ref. No.: 56]
7. When does a structural hazard occur in pipeline operation?
[Ref. No.: 7]
8. Compare and contrast hardwired and microprogramming control.
[Ref. No.: 57]
Part B
1. Write about general CPU organization with example. (or) Explain the process
Fundamental concepts.
[Ref. No.: 1]
2. List and explain the steps involved in the execution of a complete Instruction Sets.
[Ref. No.: 2]
3. Explain Hardwired control with the block diagram, Micro Programmed control & Micro
instruction
[Ref. No.: 4]
4. Explain in detail the implementation of pipeline with a neat diagram.
[Ref. No.: 5]
5. What is a data hazard? How will you overcome it?
[Ref. No.: 6]
6. Discuss Instruction hazards.
[Ref. No.: 7]
7. Difference between micro programmed and hardwired control.
[Ref. No.: 10]
8. What is branch penalty? Explain how branch penalty is reduced.
[Ref. No.: 11]