Cpre 381 Processor Project 1
Instruction Set
Our instruction set is very similar to MIPS, but there are some differences between our
processor and a true MIPS processor. Our processor's memory is word addressed instead of byte
addressed, and its instruction set is a reduced version of the MIPS instruction set.
We have three types of 32-bit instructions: R-Type, I-Type, and Jump.
R-Type
R-Type instructions perform arithmetic and logic operations and store the result in a register.
R-Type instructions use the following format:
OPCODE (6-bit) | RS (5-bit) | RT (5-bit) | RD (5-bit) | SHAMT (5-bit) | FUNC (6-bit)
OPCODE Determines which operation we will use. For R-Type instructions, the opcode is
0b000000.
RS The first parameter register.
RT The second parameter register.
RD Destination register.
SHAMT This would be used to shift bits for R-Type instructions such as multiply. However,
time did not allow us to implement this, so these bits may hold any value.
FUNC Determines which of the following R-Type instructions is executed (bits marked X are
don't-cares):
Addition
000000 (6-bit) | RS (5-bit) | RT (5-bit) | RD (5-bit) | SHAMT (5-bit) | XX0000 (6-bit)
Subtraction
000000 (6-bit) | RS (5-bit) | RT (5-bit) | RD (5-bit) | SHAMT (5-bit) | XX0010 (6-bit)
Or
000000 (6-bit) | RS (5-bit) | RT (5-bit) | RD (5-bit) | SHAMT (5-bit) | XX0101 (6-bit)
or $1, $2, $3
$1 is RD, $2 is RS, and $3 is RT
Performs a bitwise OR of each bit of RS with the corresponding bit of RT and stores the result
in register RD.
And
000000 (6-bit) | RS (5-bit) | RT (5-bit) | RD (5-bit) | SHAMT (5-bit) | XX0100 (6-bit)
Slt
000000 (6-bit) | RS (5-bit) | RT (5-bit) | RD (5-bit) | SHAMT (5-bit) | XX1010 (6-bit)
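As a minimal sketch, the R-Type layout above can be packed into a 32-bit word as follows. The helper name encode_r is hypothetical, and writing the two don't-care FUNC bits (XX) and the unused SHAMT bits as 0 is an assumption of this sketch, not something the processor requires.

```python
# Low four FUNC bits taken from the tables above. The two upper bits are
# don't-cares (XX) and are written as 0 here (an assumption of this sketch).
FUNC = {"add": 0b0000, "sub": 0b0010, "or": 0b0101, "and": 0b0100, "slt": 0b1010}


def encode_r(op, rd, rs, rt, shamt=0):
    """Pack an R-Type instruction into a 32-bit word (hypothetical helper)."""
    for reg in (rd, rs, rt):
        if not 0 <= reg < 32:
            raise ValueError("register numbers must fit in 5 bits")
    opcode = 0b000000  # all R-Type instructions share this opcode
    return ((opcode << 26) | (rs << 21) | (rt << 16) | (rd << 11)
            | ((shamt & 0x1F) << 6) | FUNC[op])


word = encode_r("or", rd=1, rs=2, rt=3)  # or $1, $2, $3
print(f"{word:032b}")
```

Note that the assembly operand order (RD first) differs from the field order in the word (RS, RT, then RD).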
I-Type
I-Type instructions use immediate (constant) values. Our processor can perform three
immediate instructions: load word, store word, and branch equals. Load word loads a value
from memory into a register, and store word stores a value from a register into memory.
Branch equals performs a branch if the two parameter registers are equal. I-Type instructions
use the following format:
OPCODE (6-bit) | RS (5-bit) | RT (5-bit) | IMMEDIATE (16-bit)
Lw
100011 (6-bit) | RS (5-bit) | RT (5-bit) | IMMEDIATE (16-bit)
lw $1, 53($0)
$0 is RS, $1 is RT, and 53 is the IMMEDIATE
Loads into register RT the value at the memory address equal to the value in register RS plus
IMMEDIATE.
Sw
101011 (6-bit) | RS (5-bit) | RT (5-bit) | IMMEDIATE (16-bit)
sw $1, 53($0)
$0 is RS, $1 is RT, and 53 is the IMMEDIATE
Stores the value in register RT at the memory address equal to the value in register RS plus
IMMEDIATE.
Beq
000100 (6-bit) | RS (5-bit) | RT (5-bit) | IMMEDIATE (16-bit)
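The I-Type layout can be sketched the same way. The helper name encode_i is hypothetical. Masking the immediate to 16 bits, so that a negative value is stored in two's complement, is an assumption that matches the sign extender described later in this report.

```python
# Opcodes from the tables above.
OPCODE = {"lw": 0b100011, "sw": 0b101011, "beq": 0b000100}


def encode_i(op, rs, rt, immediate):
    """Pack an I-Type instruction into a 32-bit word (hypothetical helper)."""
    if not (0 <= rs < 32 and 0 <= rt < 32):
        raise ValueError("register numbers must fit in 5 bits")
    if not -0x8000 <= immediate <= 0xFFFF:
        raise ValueError("immediate must fit in 16 bits")
    # Masking keeps the low 16 bits, which is the two's complement
    # encoding for negative immediates.
    return (OPCODE[op] << 26) | (rs << 21) | (rt << 16) | (immediate & 0xFFFF)


print(hex(encode_i("lw", rs=0, rt=1, immediate=53)))  # lw $1, 53($0)
print(hex(encode_i("sw", rs=0, rt=1, immediate=53)))  # sw $1, 53($0)
```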
Jump
The jump instruction jumps to an absolute location in memory. Jump instructions use the
following format:
OPCODE (6-bit) | ADDRESS (26-bit)
OPCODE Determines which operation we will use. For Jump instructions, the Opcode is
0b000010.
ADDRESS The address in memory to jump to.
j
000010 (6-bit) | ADDRESS (26-bit)
j 53
53 is the ADDRESS
Jumps to the given address.
Design
For our processor we designed a 32-bit MIPS-like pipelined processor. We decided to build a
32-bit implementation because it did not seem much more difficult than a 16-bit processor, which
was the other suggested processor size, and because a 32-bit processor is superior to a 16-bit
processor with regard to the size of data we can manipulate. For example, any R-Type data
manipulation on a 16-bit processor can be performed on a 32-bit processor, and a 32-bit processor
can operate on values up to 2^16 times larger than those a 16-bit processor can hold.
The processor has five pipeline stages: instruction fetch, instruction decode, execution,
memory, and write back. Pipeline registers hold values between instruction fetch/instruction
decode (IF/ID), instruction decode/execution (ID/EX), execution/memory (EX/MEM), and
memory/write back (MEM/WB). These registers partition the stages, which shortens the clock
cycle time and lets the stages work on different instructions in parallel, and therefore
shortens the total execution time.
Our processor has two clock signals. Most components run on SYSTEM_CLOCK2; the exceptions are
the instruction memory and data memory, which use SYSTEM_CLOCK. SYSTEM_CLOCK is the
externally driven clock, and SYSTEM_CLOCK2 has twice the period of SYSTEM_CLOCK, so it runs
at half the speed. We did this because, when we generated the memories using the Quartus
wizard, registers were placed in front of the memory address inputs and we could not remove
them. An easy fix was to have the memories use a faster clock than the rest of the processor.
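The relationship between the two clocks can be illustrated with a small simulation. This report does not say how SYSTEM_CLOCK2 is derived, so the toggle-on-rising-edge divider below is an assumption, and divide_by_two is a hypothetical name.

```python
def divide_by_two(system_clock):
    """Toggle a derived clock on every rising edge of the input clock.

    Assumption: SYSTEM_CLOCK2 behaves like a toggle flip-flop clocked by
    SYSTEM_CLOCK. The real design may derive it differently.
    """
    clock2 = 0
    previous = 0
    samples = []
    for level in system_clock:
        if previous == 0 and level == 1:  # rising edge of SYSTEM_CLOCK
            clock2 ^= 1
        samples.append(clock2)
        previous = level
    return samples


system_clock = [0, 1] * 4  # four full periods of SYSTEM_CLOCK
print(system_clock)
print(divide_by_two(system_clock))  # two full periods in the same time
```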
We start the processor by first initializing all registers including the Program Counter (PC) to
zero, initializing all memory using our mif files, and setting our SYSTEM_CLOCK. From this
point our processor handles all the program instructions through the following pipeline stages.
Instruction Fetch
The purpose of the instruction fetch stage is to read the current instruction. The program
counter outputs an address, which is read by the instruction memory. This address tells the
instruction memory which word in memory to output into the IF/ID register. Then the program
counter is either incremented by one or modified by a branch or jump instruction.
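The program counter update can be sketched as a simple selection. Because memory is word addressed, the sequential step is one rather than four. The function name, the argument names, and the choice to give a jump priority over a branch are assumptions of this sketch. How the branch target itself is computed is not described here, so it is passed in as a value.

```python
def next_pc(pc, branch_taken=False, branch_target=0, jump=False, jump_address=0):
    """Select the next program counter value (hypothetical helper)."""
    if jump:
        return jump_address   # absolute address from the jump instruction
    if branch_taken:
        return branch_target  # supplied by the branch logic
    return pc + 1             # word-addressed memory: next instruction


print(next_pc(10))                        # sequential fetch
print(next_pc(10, jump=True, jump_address=53))  # j 53
```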
Instruction Decode
The instruction decode stage is where the instruction is decoded into its different parts. The
instruction to decode is the 32-bit instruction from the IF/ID register, which was written by the
previous stage. Bits 31-26 are the opcode and go into our control box. The other bits of
the instruction determine the registers to be read and written, whether or not memory will be
used, and everything else about what the instruction makes the processor do.
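Splitting an instruction word into its fields can be sketched as below, using the bit positions implied by the instruction formats earlier in this report. The function name decode and the dictionary keys are hypothetical. All fields are extracted unconditionally, and the opcode then decides which ones are meaningful.

```python
def decode(word):
    """Split a 32-bit instruction word into its fields (hypothetical helper)."""
    return {
        "opcode": (word >> 26) & 0x3F,   # bits 31-26
        "rs": (word >> 21) & 0x1F,       # bits 25-21
        "rt": (word >> 16) & 0x1F,       # bits 20-16
        "rd": (word >> 11) & 0x1F,       # bits 15-11 (R-Type only)
        "shamt": (word >> 6) & 0x1F,     # bits 10-6  (R-Type only)
        "func": word & 0x3F,             # bits 5-0   (R-Type only)
        "immediate": word & 0xFFFF,      # bits 15-0  (I-Type only)
        "address": word & 0x3FFFFFF,     # bits 25-0  (Jump only)
    }


fields = decode(0x8C010035)  # lw $1, 53($0)
print(fields["opcode"], fields["rs"], fields["rt"], fields["immediate"])
```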
The following components are used in this stage:
Control
This is a combinational circuit that takes the opcode and outputs control signals for the ALU
Control, as well as the following signals:
RegDst Determines which register will be written to.
ALUSrc Determines whether the offset from the instruction itself or Data 2 from the register
file is used as a source for the ALU.
MemtoReg Determines whether the data written into the register file comes from memory or
from the ALU.
RegWrite Determines whether or not data will be written to the registers.
MemRead Determines whether or not memory will be read from.
MemWrite Determines whether or not memory will be written to.
Branch Determines whether or not the instruction is a branch.
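A control unit of this kind amounts to a lookup table keyed by opcode. The table below is a sketch derived from the mux polarities given elsewhere in this report (RegDst 1 selects Rd, ALUSrc 1 selects Read Data 2, MemtoReg 1 selects ALU Out). The actual signal values were not listed, so treat every row as an assumption. Don't-care outputs are written as 0, and the jump instruction and the ALU Control signals are not modeled.

```python
from collections import namedtuple

Control = namedtuple(
    "Control", "RegDst ALUSrc MemtoReg RegWrite MemRead MemWrite Branch")

# Assumed values, derived from the mux descriptions in this report:
#   RegDst   1 = Rd,          0 = Rt
#   ALUSrc   1 = Read Data 2, 0 = sign-extended immediate
#   MemtoReg 1 = ALU Out,     0 = data memory output
CONTROL_TABLE = {
    0b000000: Control(1, 1, 1, 1, 0, 0, 0),  # R-Type
    0b100011: Control(0, 0, 0, 1, 1, 0, 0),  # lw
    0b101011: Control(0, 0, 0, 0, 0, 1, 0),  # sw
    0b000100: Control(0, 1, 0, 0, 0, 0, 1),  # beq
}
ALL_ZERO = Control(0, 0, 0, 0, 0, 0, 0)


def control(opcode):
    """Return the control signals for an opcode; unknown opcodes write nothing."""
    return CONTROL_TABLE.get(opcode, ALL_ZERO)


print(control(0b100011))  # lw
```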
Registers
This is a register file containing thirty-two 32-bit registers made up of D flip-flops. It can
read two registers at once and can also write a single register value on the negative edge of
SYSTEM_CLOCK. All registers are initialized to zero. Register zero cannot be written to
and is always 0.
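The behavior of the register file can be modeled in a few lines. This is a behavioral sketch only: the class and method names are hypothetical, and clock edges are not modeled.

```python
class RegisterFile:
    """Thirty-two 32-bit registers; register zero always reads as 0."""

    def __init__(self):
        self.regs = [0] * 32  # all registers are initialized to zero

    def read(self, rs, rt):
        """Read two registers at once."""
        return self.regs[rs], self.regs[rt]

    def write(self, rd, value, reg_write=True):
        """Write one register; writes to register zero are ignored."""
        if reg_write and rd != 0:
            self.regs[rd] = value & 0xFFFFFFFF


rf = RegisterFile()
rf.write(3, 333)
rf.write(0, 99)       # ignored: register zero cannot be written
print(rf.read(3, 0))
```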
Sign Extender
The sign extender interprets bits 15-0 of the instruction as a two's complement number and
outputs the equivalent thirty-two bit value.
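Sign extension copies bit 15 into the upper sixteen bits. A minimal sketch, with sign_extend_16 as a hypothetical name:

```python
def sign_extend_16(value):
    """Extend a 16-bit two's complement value to 32 bits."""
    value &= 0xFFFF
    if value & 0x8000:        # bit 15 set: the value is negative
        value |= 0xFFFF0000   # fill the upper sixteen bits with ones
    return value


print(hex(sign_extend_16(53)))      # positive values are unchanged
print(hex(sign_extend_16(0xFFFF)))  # -1 in 16 bits becomes -1 in 32 bits
```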
Sign Extender/Read Data 2 Mux
Selects either the sign-extended value or the Read Data 2 value, depending on whether ALUSrc
is 0 or 1 respectively.
NOP Mux
If the Hazard Detection Unit detects that an instruction is trying to use a register value
that has not yet been loaded from memory, this is a hazard, and the mux sets all the control
signals to zero. This essentially creates a No Operation (NOP) instruction, because nothing is
written. The NOP stalls the processor until the memory value is loaded and ready to use.
If the Hazard Detection Unit does not detect a hazard, this mux outputs the normal control
signals.
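The load-use check and the NOP mux can be sketched as below. The condition shown is the standard textbook one (a load in the ID/EX stage whose destination matches a source of the instruction being decoded). The exact condition used by our Hazard Detection Unit is not spelled out here, so this is an assumption, as are the function and argument names.

```python
def load_use_hazard(id_ex_mem_read, id_ex_rt, if_id_rs, if_id_rt):
    """True when the instruction in ID needs a value still being loaded."""
    return bool(id_ex_mem_read) and id_ex_rt in (if_id_rs, if_id_rt)


def nop_mux(control_signals, hazard):
    """Zero every control signal on a hazard, otherwise pass them through."""
    if hazard:
        return {name: 0 for name in control_signals}
    return dict(control_signals)


# lw $t1, 100($t3) followed by add $t4, $t1, $t5
# ($t1 = 9 and $t5 = 13 in the usual MIPS register numbering)
hazard = load_use_hazard(id_ex_mem_read=1, id_ex_rt=9, if_id_rs=9, if_id_rt=13)
print(hazard)
print(nop_mux({"RegWrite": 1, "MemWrite": 0, "MemRead": 0}, hazard))
```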
Execution
The execution stage is where the data manipulation takes place. This is done with our ALU. Not
only is the ALU used to compute R-Type instructions such as adding two registers together, but
it is also needed to calculate things such as the register-plus-offset address for immediate
type instructions.
The following components are used in this stage:
ALU Control
The ALU Control sends a three-bit code to the ALU to determine which of the following types of
arithmetic or logic to do: ADD (010), SUBTRACT (110), OR (001), AND (000), or SLT (111). The
ALU Control is a combinational circuit that takes the function field of the instruction,
together with the signals from Control, as its input.
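For R-Type instructions, the mapping from the function field to the three-bit ALU code can be sketched as a lookup on the low four FUNC bits, using the encodings listed in the Instruction Set section. The name alu_control is hypothetical, and the handling of lw, sw, and beq (which do not use the function field) is left out of this sketch.

```python
ALU_ADD, ALU_SUB, ALU_OR, ALU_AND, ALU_SLT = 0b010, 0b110, 0b001, 0b000, 0b111

# Low four FUNC bits -> ALU code. The upper two FUNC bits are don't-cares.
FUNC_TO_ALU = {
    0b0000: ALU_ADD,
    0b0010: ALU_SUB,
    0b0101: ALU_OR,
    0b0100: ALU_AND,
    0b1010: ALU_SLT,
}


def alu_control(func):
    """Return the three-bit ALU code for an R-Type function field."""
    return FUNC_TO_ALU[func & 0b1111]


print(f"{alu_control(0b100010):03b}")  # subtraction
```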
ALU
The ALU takes in two thirty-two bit inputs, which come from our data forwarding unit, and has
one thirty-two bit output that goes into the EX/MEM register. The arithmetic or logic that is
performed depends on the ALU control.
To design our thirty-two bit ALU we started by making a thirty-two bit adder, and to design
the thirty-two bit adder we started with a one-bit full adder. This one-bit adder could add two
input bits (A and B) along with the carry-in bit (Cin) using the simple digital logic output =
(A^B)^Cin, where ^ denotes XOR. Additionally, it could subtract one bit from another using a
subtract signal labeled BInv, with output = A^(B^BInv)^Cin. Instead of rippling thirty-two of
these bits together we used multiple carry look-ahead adders. The advantage of a carry
look-ahead adder over a ripple adder is that it takes less time to compute the output, because
it uses propagate and generate signals to determine the carry in. As a result, the adder
doesn't always have to wait for the addition of the previous bit to know whether it has a
carry in. A basic one-bit carry look-ahead block was cascaded into a four-bit carry look-ahead
unit. The one-bit adder signals were then fed into the four-bit design to give us a four-bit
adder:
These four-bit adders were then strung together into two separate two-level, 16-bit
carry-lookahead adders, each with a fifth carry look-ahead unit. These were then strung
together to give us a 32-bit carry-lookahead adder/subtractor.
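The generate and propagate idea can be sketched in software. The functions below are a simplified model and not our exact circuit: each four-bit block computes its carries directly from generate (A AND B) and propagate (A XOR B) signals, but the eight blocks are chained through their group generate and propagate outputs instead of through a second look-ahead level. The names cla4 and add_sub32 are hypothetical.

```python
def cla4(a, b, cin):
    """Four-bit carry look-ahead add: returns (sum, group_generate, group_propagate)."""
    g = [(a >> i) & (b >> i) & 1 for i in range(4)]    # generate:  A AND B
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(4)]  # propagate: A XOR B
    # Every carry is written directly in terms of g, p, and cin (no rippling).
    c1 = g[0] | (p[0] & cin)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & cin)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & cin)
    carries = [cin, c1, c2, c3]
    total = 0
    for i in range(4):
        total |= (p[i] ^ carries[i]) << i              # sum bit = A ^ B ^ carry
    group_g = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1]) | (p[3] & p[2] & p[1] & g[0])
    group_p = p[3] & p[2] & p[1] & p[0]
    return total, group_g, group_p


def add_sub32(a, b, subtract=0):
    """32-bit add or subtract built from eight four-bit blocks."""
    if subtract:
        b = ~b & 0xFFFFFFFF  # BInv: invert B, then add 1 through the carry in
    carry = subtract
    result = 0
    for block in range(8):
        shift = 4 * block
        s, gg, gp = cla4((a >> shift) & 0xF, (b >> shift) & 0xF, carry)
        result |= s << shift
        carry = gg | (gp & carry)  # carry out of this block
    return result, carry


print(add_sub32(5, 7))
print(add_sub32(10, 3, subtract=1))
```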
Once we had the 32-bit adder/subtractor we needed to add logic to do OR, AND, and SLT
operations. The OR and AND were made by using OR and AND gates at a one bit level for each
individual bit. The SLT was implemented using our subtractor to subtract the second input from
the first and then output the sign of this subtraction. All of these operations always happen
inside of the ALU, but the final result is multiplexed out according to ALU Control.
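The compute-everything-then-select structure can be sketched as follows, using the ALU codes from the ALU Control section. This is a behavioral sketch with a hypothetical function name. SLT is taken as the sign bit of the subtraction, as described above, so signed overflow is not handled specially.

```python
MASK = 0xFFFFFFFF


def alu(a, b, alu_code):
    """Compute every operation, then select one result by the ALU code."""
    a &= MASK
    b &= MASK
    difference = (a - b) & MASK
    results = {
        0b010: (a + b) & MASK,          # add
        0b110: difference,              # subtract
        0b001: a | b,                   # or
        0b000: a & b,                   # and
        0b111: (difference >> 31) & 1,  # slt: sign of (a - b)
    }
    return results[alu_code]            # the final multiplexer


print(alu(6, 3, 0b001))  # OR
print(alu(5, 7, 0b111))  # SLT
```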
Rt/Rd Mux
Selects register Rt or Rd from the ID/EX register stage, depending on whether RegDst from the
ID/EX stage is 0 or 1 respectively. This chooses which register will be written to during the
Write Back stage.
Execution/Memory Registers
Things registered in this stage are:
Memory
This stage is where the data memory is read from and written to. This stage is especially
important to our processor as there are no load immediate instructions in its instruction set and so
all data that is to be manipulated must first be initialized in the memory using the mif file.
The following component is used in this stage:
Data Memory
This holds multiple 32-bit values of data for our program to use. Data can be loaded from it, and
data can be stored into it. It was made with the Quartus memory wizard.
Write Back
The Write Back stage is where data, whether it has been calculated in the ALU or loaded from
memory, is stored into the register file.
The following components are used in this stage:
MemtoReg Mux
Selects either the data memory output or ALU Out from the MEM/WB register stage, depending on
whether the control signal MemtoReg is 0 or 1 respectively.
Registers
These are the same registers that were in the instruction decode stage, except now instead of
reading from them the processor is writing to them.
Control Design
Normally our processor has the instruction fetch, instruction decode, execution, memory, and
write back stages all happening at once as they are independent parts. In one cycle, data flows
from the instruction fetch to the instruction decode stage, the instruction decode to the execution
stage, the execution stage to the memory stage, and the memory stage to the write back stage.
However, there are some special cases that need to be taken into consideration. Consider the
following assembly code:
add $t1, $t2, $t3
add $t4, $t1, $t5
This is an example of a data hazard. The new value of $t1 has not yet reached the write back
stage when the second instruction tries to add it to $t5. To solve this we used data forwarding
from our forwarding unit. The forwarding unit takes the last output of the ALU, which was
stored in the EX/MEM register, and uses it as the first ALU input instead of the data read from
the registers. Also consider
lw $t1, 100($t3)
add $t4, $t1, $t5
Again, $t1 does not yet hold the value it needs when the add instruction executes, so this is
another example of a data hazard. However, it is slightly different from our last example. Here
the data needs to be forwarded from the MEM/WB register instead of the EX/MEM register, which
our forwarding unit also does. Additionally, we must insert NOPs with our hazard detection
unit at this point, because we must stall for at least one cycle before we can forward the
data from the MEM/WB register.
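The selection made by the forwarding unit for one ALU input can be sketched as below. The conditions are the standard textbook ones and are an assumption about our unit, as are the function name, the argument names, and the returned labels. The more recent result (EX/MEM) takes priority over the older one (MEM/WB).

```python
def forward_select(src_reg, ex_mem_reg_write, ex_mem_rd, mem_wb_reg_write, mem_wb_rd):
    """Choose where one ALU input comes from: 'EX/MEM', 'MEM/WB', or 'REG'."""
    if ex_mem_reg_write and ex_mem_rd != 0 and ex_mem_rd == src_reg:
        return "EX/MEM"   # result of the previous instruction
    if mem_wb_reg_write and mem_wb_rd != 0 and mem_wb_rd == src_reg:
        return "MEM/WB"   # result from two instructions back, or a loaded value
    return "REG"          # no hazard: use the value read from the register file


# add $t1, $t2, $t3 then add $t4, $t1, $t5 ($t1 = 9 in the usual MIPS numbering)
print(forward_select(9, ex_mem_reg_write=1, ex_mem_rd=9,
                     mem_wb_reg_write=0, mem_wb_rd=0))
# lw $t1, 100($t3), one stall cycle, then add $t4, $t1, $t5
print(forward_select(9, ex_mem_reg_write=0, ex_mem_rd=0,
                     mem_wb_reg_write=1, mem_wb_rd=9))
```

In the load case, the stall places a NOP in the EX/MEM register, so only the MEM/WB comparison matches.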
Test Program
Our first test program is designed to find the maximum value within a group of registers:
The beginning and end of the waveform from our program, which finds the largest value in a
group of registers and then puts that value into register three, are shown above. Looking at
the data memory, it can be seen that 333 is the largest value, and that this value is stored
into register 3 at the end of the program. Also note that only the beginning and middle parts
of the waveform are shown, as it would be impractical to print the entire program.
Our second test program has no useful function other than to check whether the processor does
what it is supposed to:
Following the waveform through each cycle and examining the outputs of the different
components, it is clear that this program executes as expected.