Lecture # 7.
Course Instructor: Dr. Noshina
Outline
• Single cycle control
• ALU control
• Single cycle design problems
• An overview of pipelining
• Pipelining analogy
• RISC-V Pipeline
• Pipeline performance example
• Pipeline speedup
Single cycle Datapath design
Control
• Control unit takes input from
– the instruction opcode bits
• Control unit generates
– ALU control input
– write enable (possibly, read enable also)
signals for each storage element
– selector controls for each multiplexor
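The mapping from opcode bits to control outputs can be sketched as a simple lookup table. This is a minimal sketch, not the lecture's own implementation: the signal names (alu_src, mem_to_reg, and so on) and the four-instruction subset are assumptions taken from the usual textbook single-cycle RISC-V datapath.

```python
from typing import NamedTuple, Optional


class Control(NamedTuple):
    """Control outputs for one instruction (names are assumed, not from the slides)."""
    alu_src: int               # mux selector: 0 = register operand, 1 = immediate
    mem_to_reg: Optional[int]  # mux selector: 1 = write back memory data; None = don't care
    reg_write: int             # write enable for the register file
    mem_read: int              # read enable for data memory
    mem_write: int             # write enable for data memory
    branch: int                # 1 for beq
    alu_op: int                # 2-bit ALUOp passed on to the ALU control unit


# Keyed by the 7-bit opcode (instruction bits 6-0).
CONTROL_TABLE = {
    0b0110011: Control(0, 0, 1, 0, 0, 0, 0b10),     # R-type
    0b0000011: Control(1, 1, 1, 1, 0, 0, 0b00),     # ld
    0b0100011: Control(1, None, 0, 0, 1, 0, 0b00),  # sd
    0b1100011: Control(0, None, 0, 0, 0, 1, 0b01),  # beq
}


def main_control(instruction: int) -> Control:
    """Decode the opcode field and look up the control signals."""
    opcode = instruction & 0x7F
    return CONTROL_TABLE[opcode]
```

For example, main_control(0x003100B3) (the encoding of add x1, x2, x3) returns the R-type row, with reg_write set and both memory enables cleared.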
ALU Control
• Depending on the instruction class, the ALU will need
to perform one of four functions (and, or, add, sub):
– add for load/stores
– sub for branch on equal
– one of and, or, add, sub for R-type instructions.
• We can generate the 4-bit ALU control input using a
small control unit that has as inputs the funct7 and
funct3 fields of the instruction and a 2-bit control field,
which we call ALUOp.
ALUOp
• ALUOp indicates whether the operation to be
performed should be
– add (00) for loads and stores
– subtract and test if zero (01) for beq, or
– determined by the operation encoded in
the funct7 and funct3 fields (10).
• The output of the ALU control unit is a 4-bit
signal that directly controls the ALU by
generating one of the 4-bit combinations.
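The decoding described above can be sketched as a small function. The specific 4-bit output codes (0000 for and, 0001 for or, 0010 for add, 0110 for sub) are not given in these notes; they are assumed here from the standard textbook ALU, as are the funct7/funct3 values for the four R-type instructions.

```python
# Assumed 4-bit ALU control codes (textbook convention, not stated in the slides).
ALU_AND, ALU_OR, ALU_ADD, ALU_SUB = 0b0000, 0b0001, 0b0010, 0b0110

# (funct7, funct3) pairs for the R-type instructions covered in this lecture.
R_TYPE_FUNCT = {
    (0b0000000, 0b000): ALU_ADD,  # add
    (0b0100000, 0b000): ALU_SUB,  # sub
    (0b0000000, 0b111): ALU_AND,  # and
    (0b0000000, 0b110): ALU_OR,   # or
}


def alu_control(alu_op: int, funct7: int, funct3: int) -> int:
    """Return the 4-bit ALU control input for a given ALUOp and funct fields."""
    if alu_op == 0b00:   # loads and stores: add base + offset
        return ALU_ADD
    if alu_op == 0b01:   # beq: subtract and test for zero
        return ALU_SUB
    if alu_op == 0b10:   # R-type: operation comes from funct7/funct3
        return R_TYPE_FUNCT[(funct7, funct3)]
    raise ValueError(f"unsupported ALUOp: {alu_op:02b}")
```

Note that for ALUOp 00 and 01 the funct fields are ignored, which is why loads, stores, and branches need no entry in the R-type table.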
ALUOp
Instruction classes
Control Signals
Datapath with Control I
New multiplexor
Control Lines
Single-cycle Implementation Notes
• The steps are not really distinct as each instruction
completes in exactly one clock cycle – they simply
indicate the sequence of data flowing through the
datapath.
• The operation of the datapath during a cycle is purely
combinational – nothing is stored during a clock cycle.
• Therefore, the machine is stable in a particular state at
the start of a cycle and reaches a new stable state only
at the end of the cycle.
Load Instruction Steps: ld x9, offset(x19)
1. Fetch instruction and increment PC
2. Read base register from the register file: the base register (x19) is
given by bits 19-15 of the instruction
3. ALU computes the sum of the value read from the register file and the
sign-extended upper 12 bits (offset) of the instruction.
4. The sum from the ALU is used as the address for the data
memory
5. The data from the memory unit is written into the register file:
the destination register (x9) is given by bits 11-7 of the
instruction
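The five load steps above can be traced in a few lines of code. This is an illustrative sketch only: the register file is modelled as a list, the data memory as a dictionary, and the helper names (encode_ld, execute_ld) and the example values are hypothetical.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def encode_ld(rd: int, rs1: int, offset: int) -> int:
    """Build the 32-bit encoding of ld rd, offset(rs1) (opcode 0000011, funct3 011)."""
    return ((offset & 0xFFF) << 20) | (rs1 << 15) | (0b011 << 12) | (rd << 7) | 0b0000011


def execute_ld(instr: int, pc: int, regs: list, data_mem: dict) -> int:
    """Carry out the load steps and return the next PC."""
    next_pc = pc + 4                       # step 1: increment PC
    rs1 = (instr >> 15) & 0x1F             # step 2: base register, bits 19-15
    base = regs[rs1]
    offset = sign_extend(instr >> 20, 12)  # step 3: upper 12 bits, sign-extended
    address = base + offset                # step 3: ALU add
    data = data_mem[address]               # step 4: sum used as memory address
    rd = (instr >> 7) & 0x1F               # step 5: destination register, bits 11-7
    regs[rd] = data
    return next_pc


# Example: ld x9, 64(x19) with x19 = 1000 and the value 42 stored at address 1064.
regs = [0] * 32
regs[19] = 1000
data_mem = {1064: 42}
new_pc = execute_ld(encode_ld(9, 19, 64), 0, regs, data_mem)
```

In hardware all of these steps happen within the same clock cycle; the sequential code only mirrors the order in which data flows through the datapath.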
Branch Instruction Steps: beq x5, x6, offset
1. Fetch instruction and increment PC
2. Read two registers (x5 and x6) from the register file.
3. ALU performs a subtract on the data values from
the register file; the value of the PC is added to the
sign-extended 12-bit offset field of the instruction,
shifted left by one, to give the branch target address
4. The Zero result from the ALU is used to decide
which adder result (from step 1 or 3) to store in the
PC
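The branch steps above can be sketched the same way. This is a hedged illustration: the helper names are hypothetical, and the bit positions of the 12-bit branch offset field (instruction bits 31, 7, 30-25, and 11-8) are taken from the standard RISC-V B-type format, which these notes do not spell out.

```python
def sign_extend(value: int, bits: int) -> int:
    """Interpret the low `bits` bits of value as a two's-complement number."""
    sign = 1 << (bits - 1)
    return (value & (sign - 1)) - (value & sign)


def encode_beq(rs1: int, rs2: int, offset: int) -> int:
    """Build the encoding of beq rs1, rs2, offset (offset in bytes, must be even)."""
    imm = offset & 0x1FFF
    return (((imm >> 12) & 0x1) << 31 | ((imm >> 5) & 0x3F) << 25
            | rs2 << 20 | rs1 << 15 | 0b000 << 12
            | ((imm >> 1) & 0xF) << 8 | ((imm >> 11) & 0x1) << 7 | 0b1100011)


def execute_beq(instr: int, pc: int, regs: list) -> int:
    """Carry out the branch steps and return the next PC."""
    pc_plus_4 = pc + 4                                   # step 1: first adder
    a = regs[(instr >> 15) & 0x1F]                       # step 2: read rs1
    b = regs[(instr >> 20) & 0x1F]                       # step 2: read rs2
    zero = (a - b) == 0                                  # step 3: ALU subtract, Zero flag
    field = (((instr >> 31) & 0x1) << 11 | ((instr >> 7) & 0x1) << 10
             | ((instr >> 25) & 0x3F) << 4 | ((instr >> 8) & 0xF))
    target = pc + (sign_extend(field, 12) << 1)          # step 3: second adder
    return target if zero else pc_plus_4                 # step 4: select next PC


# Example: beq x5, x6, 16 at PC = 100 with equal register values.
regs = [0] * 32
regs[5] = regs[6] = 7
taken_pc = execute_beq(encode_beq(5, 6, 16), 100, regs)
```

The final line of execute_beq plays the role of the multiplexor that chooses between the two adder results.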
Single-Cycle Design Problems
• Assuming a fixed-period clock, every instruction uses exactly one clock
cycle, which implies:
– CPI = 1
– cycle time determined by length of the longest instruction path (load)
• but several instructions could run in a shorter clock cycle: waste
of time
• consider if we have more complicated instructions like floating
point!
– resources used more than once in the same cycle need to be
duplicated
• waste of hardware and chip area
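A small calculation makes the wasted time concrete. The latencies below are assumed, illustrative values (200 ps for memory and ALU operations, 100 ps for register file access), not figures from this lecture, and the set of units on each instruction's path is a simplification.

```python
# Assumed unit latencies in picoseconds (illustrative only).
LATENCY_PS = {
    "instr_mem": 200,
    "reg_read": 100,
    "alu": 200,
    "data_mem": 200,
    "reg_write": 100,
}

# Units each instruction class passes through in a single-cycle datapath.
PATHS = {
    "ld": ["instr_mem", "reg_read", "alu", "data_mem", "reg_write"],
    "sd": ["instr_mem", "reg_read", "alu", "data_mem"],
    "R-type": ["instr_mem", "reg_read", "alu", "reg_write"],
    "beq": ["instr_mem", "reg_read", "alu"],
}

# Delay of each instruction's path through the datapath.
path_delay = {name: sum(LATENCY_PS[u] for u in units) for name, units in PATHS.items()}

# With a fixed clock, the cycle time must cover the slowest instruction.
cycle_time = max(path_delay.values())

# Idle time per instruction: how much of each cycle is wasted.
wasted = {name: cycle_time - delay for name, delay in path_delay.items()}
```

Under these assumptions the load sets the cycle time at 800 ps, so a beq that needs only 500 ps leaves 300 ps of every cycle unused.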
An overview of pipelining
• Pipelining is an implementation technique
in which multiple instructions are overlapped
in execution.
• All steps in a task, called stages in
pipelining, operate concurrently.
• If we have separate resources for each
stage, we can pipeline the tasks.
• Pipelining improves performance by
increasing instruction throughput, as
opposed to decreasing the execution time of
CONTD…