
Cpre 381 Processor Project 1


CprE 381 Processor Project

By Matt Clucas and Jacob Moyer

Instruction Set
Our instruction set is very similar to MIPS, but there are some differences between our
processor and a true MIPS processor. Our processor's memory is word addressed instead of byte
addressed, and its instruction set is a reduced version of the MIPS instruction set.
We have three types of 32-bit instructions: R-Type, I-Type, and Jump.

R-Type
R-Type instructions are used to do arithmetic and store the result into a register. R-Type
instructions use the following format:
OPCODE | RS    | RT    | RD    | SHAMT | FUNC
6-bit  | 5-bit | 5-bit | 5-bit | 5-bit | 6-bit

OPCODE: Determines which operation we will use. For R-Type instructions, the opcode is
0b000000.
RS: The first parameter register.
RT: The second parameter register.
RD: The destination register.
SHAMT: The shift amount. This would be used to shift bits for R-Type instructions such as
multiply. However, time did not allow us to add this, so these bits may have any value.
FUNC: Determines which of the following R-Type instructions is executed:

Addition
000000 | RS    | RT    | RD    | SHAMT | XX0000
6-bit  | 5-bit | 5-bit | 5-bit | 5-bit | 6-bit

add $1, $2, $3


$1 is RD, $2 is RS, and $3 is RT
Add registers rs and rt and store into register rd

Subtraction
000000 | RS    | RT    | RD    | SHAMT | XX0010
6-bit  | 5-bit | 5-bit | 5-bit | 5-bit | 6-bit

sub $1, $2, $3
$1 is RD, $2 is RS, and $3 is RT
Subtract rt from rs and store into register rd

Or
000000 | RS    | RT    | RD    | SHAMT | XX0101
6-bit  | 5-bit | 5-bit | 5-bit | 5-bit | 6-bit

or $1, $2, $3
$1 is RD, $2 is RS, and $3 is RT
Perform a bitwise OR operation on each bit of RS with RT and store into register rd
And
000000 | RS    | RT    | RD    | SHAMT | XX0100
6-bit  | 5-bit | 5-bit | 5-bit | 5-bit | 6-bit

and $1, $2, $3


$1 is RD, $2 is RS, and $3 is RT
Perform a bitwise AND operation on each bit of RS with RT and store into register rd

Slt
000000 | RS    | RT    | RD    | SHAMT | XX1010
6-bit  | 5-bit | 5-bit | 5-bit | 5-bit | 6-bit

slt $1, $2, $3


$1 is RD, $2 is RS, and $3 is RT
Subtract register rt from register rs and store the sign of the result into register rd, so
that rd is 1 when the result is negative (rs is less than rt) and 0 otherwise
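
The R-Type layout above can be illustrated with a small encoder. This is only a sketch and not
part of the original project: the function name encode_r_type and the FUNC table are
hypothetical, and the two don't-care (XX) bits of FUNC are assumed to be 0.

```python
# FUNC values from the R-Type descriptions above. The two high bits are
# don't-cares (XX) on this processor; setting them to 0 is an assumption.
FUNC = {"add": 0b000000, "sub": 0b000010, "or": 0b000101,
        "and": 0b000100, "slt": 0b001010}


def encode_r_type(name, rd, rs, rt, shamt=0):
    """Pack OPCODE | RS | RT | RD | SHAMT | FUNC into one 32-bit word."""
    for reg in (rd, rs, rt):
        if not 0 <= reg < 32:
            raise ValueError("register numbers must be in 0..31")
    opcode = 0b000000  # all R-Type instructions share this opcode
    return ((opcode << 26) | (rs << 21) | (rt << 16) | (rd << 11)
            | ((shamt & 0x1F) << 6) | FUNC[name])


# add $1, $2, $3  ->  RD = 1, RS = 2, RT = 3
print(f"{encode_r_type('add', rd=1, rs=2, rt=3):032b}")
```

Note that the assembly operand order (RD first) differs from the field order in the encoded
word (RS, RT, then RD), which is why the sketch takes the registers as named arguments.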

I-Type
I-Type instructions use immediate (constant) values. Our processor can perform three I-Type
instructions: load word, store word, and branch on equal. Load word loads a value from memory
into a register, and store word stores a value from a register into memory. Branch on equal
performs a branch if the two parameter registers are equal. I-Type instructions use
the following format:
OPCODE | RS    | RT    | IMMEDIATE
6-bit  | 5-bit | 5-bit | 16-bit

OPCODE: Determines which operation we will use.
RS: The first parameter register.
RT: The second parameter register.
IMMEDIATE: The immediate value.

Lw
100011 | RS    | RT    | IMMEDIATE
6-bit  | 5-bit | 5-bit | 16-bit

lw $1, 53($0)
$0 is RS, $1 is RT, and 53 is the IMMEDIATE
Loads into register rt the value at the memory address equal to the value in register rs + IMMEDIATE.

Sw
101011 | RS    | RT    | IMMEDIATE
6-bit  | 5-bit | 5-bit | 16-bit

sw $1, 53($0)
$0 is RS, $1 is RT, and 53 is the IMMEDIATE
Stores the value in rt at the memory address equal to the value in register rs +
IMMEDIATE.

Beq
000100 | RS    | RT    | IMMEDIATE
6-bit  | 5-bit | 5-bit | 16-bit

beq $1, $0, 53


$0 is RT, $1 is RS, and 53 is the IMMEDIATE
Branches if RS and RT are equal. This means that if the registers are equal, the new program
counter will be program counter + IMMEDIATE + 1.
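
As a rough illustration of the I-Type format and the branch rule above, the following sketch
encodes lw, sw, and beq and computes a taken-branch target. The helper names are hypothetical.
Treating the immediate as a signed 16-bit value and wrapping the result to the eight-bit
program counter are assumptions of this sketch.

```python
# Opcodes from the I-Type descriptions above.
OPCODES = {"lw": 0b100011, "sw": 0b101011, "beq": 0b000100}


def encode_i_type(name, rs, rt, imm):
    """Pack OPCODE | RS | RT | IMMEDIATE into one 32-bit word."""
    if not -(1 << 15) <= imm < (1 << 15):
        raise ValueError("immediate must fit in 16 signed bits")
    # Masking keeps the two's complement bit pattern of a negative immediate.
    return (OPCODES[name] << 26) | (rs << 21) | (rt << 16) | (imm & 0xFFFF)


def branch_target(pc, imm, pc_bits=8):
    """New PC for a taken beq: PC + IMMEDIATE + 1 (word addressed).

    Wrapping to pc_bits is an assumption based on the eight-bit PC.
    """
    return (pc + imm + 1) % (1 << pc_bits)


# lw $1, 53($0)  ->  RS = 0, RT = 1, IMMEDIATE = 53
print(hex(encode_i_type("lw", rs=0, rt=1, imm=53)))
# beq $1, $0, 53 at PC = 10
print(branch_target(pc=10, imm=53))
```

Because memory is word addressed, the immediate is added to the program counter directly,
without the shift by two that byte-addressed MIPS uses.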

Jump
The jump instruction jumps to an absolute location in memory. Jump instructions use the
following format:
OPCODE | ADDRESS
6-bit  | 26-bit

OPCODE: Determines which operation we will use. For Jump instructions, the opcode is
0b000010.
ADDRESS: The address in memory to jump to.

j
000010 | ADDRESS
6-bit  | 26-bit

j 53
53 is the ADDRESS
Jumps to the given address.
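
To tie the three formats together, here is a hedged sketch of a decoder that splits a 32-bit
word into the fields described above. The function name decode and the dictionary layout are
hypothetical. Treating every opcode other than 0b000000 and 0b000010 as I-Type is an assumption
that only holds for this reduced instruction set.

```python
def decode(word):
    """Split a 32-bit instruction word into its fields."""
    opcode = (word >> 26) & 0x3F
    if opcode == 0b000000:  # R-Type
        return {"type": "R", "opcode": opcode,
                "rs": (word >> 21) & 0x1F, "rt": (word >> 16) & 0x1F,
                "rd": (word >> 11) & 0x1F, "shamt": (word >> 6) & 0x1F,
                "func": word & 0x3F}
    if opcode == 0b000010:  # Jump
        return {"type": "J", "opcode": opcode, "address": word & 0x3FFFFFF}
    # Everything else is treated as I-Type (lw, sw, beq on this processor).
    return {"type": "I", "opcode": opcode,
            "rs": (word >> 21) & 0x1F, "rt": (word >> 16) & 0x1F,
            "immediate": word & 0xFFFF}


# j 53  ->  opcode 0b000010 followed by a 26-bit address
print(decode((0b000010 << 26) | 53))
```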

Design
For our processor we designed a 32-bit MIPS-like pipelined processor. We decided to build a
32-bit implementation because it did not seem much more difficult than a 16-bit processor, which
was the other suggested processor size, and because a 32-bit processor is superior to a 16-bit
processor with regard to the size of data we can manipulate. For example, any R-Type data
manipulation on a 16-bit processor can be performed on a 32-bit processor, and a 32-bit processor
can perform operations on values up to 2^16 times larger than the 16-bit processor.
The processor has five pipelined stages: instruction fetch, instruction decode, execution,
memory, and write back. Pipeline registers hold values between instruction fetch/instruction
decode (IF/ID), instruction decode/execution (ID/EX), execution/memory (EX/MEM), and
memory/write back (MEM/WB). These registers partition the stages so that each clock cycle only
needs to be long enough for one stage. This lets the stages work on different instructions in
parallel and therefore shortens the total execution time.
Our processor has two clock signals. Most components run on SYSTEM_CLOCK2, except the
instruction memory and data memory, which use SYSTEM_CLOCK. SYSTEM_CLOCK is the externally
driven clock, and SYSTEM_CLOCK2 runs at twice the period of SYSTEM_CLOCK, that is, at half its
frequency. This is because when we generated the memories using the Quartus wizard, registers
were placed in front of the memory addresses that we could not remove. Therefore, an easy fix
was to have the memories use a faster clock than the rest of the processor.
We start the processor by first initializing all registers including the Program Counter (PC) to
zero, initializing all memory using our mif files, and setting our SYSTEM_CLOCK. From this
point our processor handles all the program instructions through the following pipeline stages.

Instruction Fetch
The purpose of the instruction fetch stage is to read the current instruction. The program
counter outputs a value that is read by the instruction memory. This value tells the
instruction memory which word in memory to output into the IF/ID register. Then the program
counter is either incremented by one or modified by a branch or jump instruction.

The following components are used in this stage:


PC
The program counter is an eight-bit register that is initialized to zero. It stores the
address of the current instruction of the program.
Instruction Memory
This holds up to 256 values which are 32-bit instructions for our program. It was made with the
Quartus memory wizard.
PC Selector
This determines how the program counter is updated. Its inputs are two 6-bit op codes
(BranchOP and JumpOP), one 1-bit signal (BranchTaken), and three 8-bit PC values (BranchPC,
JumpPC, and PCPLUS1). BranchTaken indicates whether a branch instruction has been taken in the
Execution stage, in which case the program needs to branch to the value of BranchPC. The op
code is defined as bits 31-26 of the instruction memory's output. This op code is propagated to
the Instruction Decode stage in one cycle and to the Execution stage in two cycles. The PC
Selector compares the op code at the Execution stage (BranchOP) with the op code for a branch
(0b000100). If these are equal and BranchTaken is 1, the program counter is set to BranchPC.
Otherwise, it compares the op code at the Instruction Decode stage (JumpOP) with the op code
for a jump (0b000010). If they are equal, it sets the program counter to JumpPC, a signal that
comes from the decode stage. Otherwise the program counter is set to PCPLUS1, the current
program counter plus one, which is computed by a simple adder.
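
The priority order described for the PC Selector can be sketched as a small function. This is
an illustrative model, not the project's hardware description: the function name and argument
names are hypothetical, and it assumes that a branch op code with BranchTaken equal to 0 falls
through to the jump check.

```python
BRANCH_OPCODE = 0b000100
JUMP_OPCODE = 0b000010


def pc_selector(branch_op, jump_op, branch_taken, branch_pc, jump_pc, pc_plus1):
    """Choose the next 8-bit program counter value.

    A taken branch in the Execution stage has priority over a jump in the
    Instruction Decode stage, which has priority over PC + 1.
    """
    if branch_op == BRANCH_OPCODE and branch_taken:
        return branch_pc & 0xFF
    if jump_op == JUMP_OPCODE:
        return jump_pc & 0xFF
    return pc_plus1 & 0xFF


# No branch or jump in flight: the PC simply advances.
print(pc_selector(0, 0, 0, branch_pc=64, jump_pc=53, pc_plus1=11))
```

Giving the branch priority matters because the branch is the older instruction: a jump seen in
the decode stage at the same time was fetched after it and must not override it.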
Branch Op Mux
This mux normally outputs the op code from the Execution stage to the PC Selector. However,
when a branch is taken during the Execution stage or a jump is taken during the Instruction
Decode stage, the Flush Control sets the output of this mux to all zeroes for two cycles,
effectively disabling branches for the instructions which would be at the Instruction Decode
stage and the Instruction Fetch stage.
Jump Op Mux
Similar to the Branch Op Mux, this mux normally outputs the op code from the Instruction
Decode stage to the PC Selector. When a branch is taken during the Execution stage or a jump
is taken during the Instruction Decode stage, the Flush Control sets the output of this mux to
all zeroes for a number of cycles that depends on whether a branch or a jump was taken.
Again, this is to disable jumping for the instructions which would be at the Instruction Fetch
stage.

Instruction Fetch/Instruction Decode Registers


Things registered in this stage are:

Instruction Memory output

Program Counter output

Instruction Decode
The instruction decode stage is where the instruction is decoded into its different parts. The
instruction to decode is the 32-bit instruction from the IF/ID register, which was written to by the
previous stage. Bits 31-26 go into our control box and are called the OP Code. The other bits of
the instruction determine the registers to be read and written to, whether or not memory will be
used, and everything else about what the instruction makes the processor do.
The following components are used in this stage:
Control
This is a combinational circuit that takes the op code and outputs control signals for the ALU
Control, as well as the following signals: RegDst, which determines which register will be
written to; ALUSrc, which determines whether the offset from the instruction itself or Data 2
from the register file will be used as a source for the ALU; MemtoReg, which determines whether
the data written into the register file comes from memory or from the ALU; RegWrite, which
determines whether or not data will be written to the registers; MemRead, which determines
whether or not memory will be read from; MemWrite, which determines whether memory will be
written to; and Branch, which determines whether or not there is a branch instruction.
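
One way to picture the Control unit is as a lookup table from op code to control signals. The
table below is an assumption, not taken from the project: the report does not list the
per-instruction values, so they are inferred from the signal polarities stated in this
document (RegDst 1 selects Rd, ALUSrc 0 selects the sign-extended offset, MemtoReg 0 selects
data memory). The names SIGNALS, CONTROL_TABLE, and control are hypothetical.

```python
SIGNALS = ("RegDst", "ALUSrc", "MemtoReg", "RegWrite",
           "MemRead", "MemWrite", "Branch")

# Inferred values (an assumption of this sketch), using this document's
# polarities: RegDst 1 = Rd, ALUSrc 0 = offset, MemtoReg 0 = memory.
CONTROL_TABLE = {
    0b000000: (1, 1, 1, 1, 0, 0, 0),  # R-Type: write ALU result to Rd
    0b100011: (0, 0, 0, 1, 1, 0, 0),  # lw: write memory data to Rt
    0b101011: (0, 0, 0, 0, 0, 1, 0),  # sw: write Rt to memory
    0b000100: (0, 1, 0, 0, 0, 0, 1),  # beq: compare two registers
}


def control(opcode):
    """Return the control signals for an op code; unknown op codes act as a NOP."""
    values = CONTROL_TABLE.get(opcode, (0,) * len(SIGNALS))
    return dict(zip(SIGNALS, values))


print(control(0b100011))
```

Returning all zeroes for any other op code mirrors how the NOP Mux and Flush Write Mux turn an
instruction into a NOP by clearing its control signals.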
Registers
This is a register file containing thirty-two 32-bit registers made up of D flip-flops. It can
read two registers at once and write a single register value on the negative edge of
SYSTEM_CLOCK. All registers are initialized to zero, and register zero cannot be written to,
so it is always 0.
Sign Extender
The sign extender interprets bits 15-0 of the instruction as a two's complement number and
outputs an equivalent thirty-two bit value.
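
A minimal sketch of sign extension, assuming the 16-bit field arrives as an unsigned integer;
the function name is hypothetical:

```python
def sign_extend_16_to_32(imm16):
    """Extend a 16-bit two's complement value to 32 bits."""
    imm16 &= 0xFFFF
    if imm16 & 0x8000:              # sign bit set: value is negative
        return imm16 | 0xFFFF0000   # copy the sign bit into bits 31-16
    return imm16


print(hex(sign_extend_16_to_32(0xFFFF)))  # 16-bit pattern for -1
```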
Sign Extender/Read Data 2 Mux
Selects either the sign-extended value or the Read Data 2 value depending on whether ALUSrc is
0 or 1, respectively.
NOP Mux
If the Hazard Detection Unit detects that an instruction is trying to use a value from a
register that has not been loaded from memory yet, this is a hazard. In that case the mux sets
all the control signals to zero, which essentially creates a No Operation (NOP) instruction
because nothing is written. This stalls the processor until the memory value is loaded and
ready to use. If the Hazard Detection Unit does not detect a hazard, this mux outputs the
normal control signals.

Flush Write Mux


Similar to the NOP Mux, if a branch or jump is taken, the Flush Control will force this mux to
set the instruction controls to zero which effectively turns the instruction into a NOP. If no
branches or jumps are taken then the mux outputs the controls as normal.

Instruction Decode/Execution Registers


Things registered in this stage are:

Program Counter from the IF/ID register


Control Signals from the control output
Register RS which comes from bits 25-21 of the Instruction register in the IF/ID registers
Register RT which comes from bits 20-16 of the Instruction register in the IF/ID registers
Register RD which comes from bits 15-11 of the Instruction register in the IF/ID registers
The function field which comes from bits 5-0 of the Instruction registered in the IF/ID
registers
The offset which is the output of the sign extender
Read data 1 which is the read result from read register one
Read data 2 which is the read result from read register two
The multiplexed value of read data 2 and the sign extended bits

Execution
The execution stage is where the data manipulation takes place. This is done with our ALU. Not
only is the ALU used to compute R-Type instructions such as adding two registers together, but
it is also needed to calculate things such as the memory address (register value plus offset)
for I-Type instructions.
The following components are used in this stage:
ALU Control
The ALU Control sends a three-bit code to the ALU to determine which of the following types of
arithmetic or logic to do: add (010), subtract (110), OR (001), AND (000), or slt (111). The
ALU Control is a combinational circuit that takes the function field of the instruction, along
with the control signals from the Control unit, as input.
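
For R-Type instructions, the mapping from function field to ALU code can be sketched as a
lookup on the low four FUNC bits, using the codes listed in this report. The names are
hypothetical, and the sketch leaves out how lw, sw, and beq select their ALU operation because
the report does not describe it.

```python
# Low four FUNC bits -> three-bit ALU code, from the values in this report.
ALU_CODES = {
    0b0000: 0b010,  # add
    0b0010: 0b110,  # subtract
    0b0101: 0b001,  # or
    0b0100: 0b000,  # and
    0b1010: 0b111,  # slt
}


def alu_control(func):
    """Return the ALU code for an R-Type FUNC field; the top two bits are ignored."""
    return ALU_CODES[func & 0xF]


print(f"{alu_control(0b101010):03b}")  # slt with the XX bits set to 10
```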

ALU
The ALU takes in two thirty-two bit inputs, which come from our data forwarding unit, and has
one thirty-two bit output that goes into the EX/MEM register. The arithmetic or logic that is
performed depends on the ALU control.
To design our thirty-two bit ALU we first started with making a thirty-two bit adder. To design
the thirty-two bit adder we started with a one-bit full adder. This one-bit adder could add two
input bits (A and B) along with the carry in bit (Cin) with the simple digital logic
output = (A^B)^Cin, where ^ denotes XOR. Additionally, it could subtract one bit from another
using a subtract signal labeled BInv, with output = A^(B^BInv)^Cin. Instead of rippling
thirty-two of these bits together we used multiple carry look ahead adders. The advantage of a
carry look ahead adder over a ripple adder is that it takes less time to compute the output,
because it uses propagate and generate signals to determine the carry in. This means the adder
doesn't always have to wait for the addition of the previous bit to know whether it has a carry
in or not. A basic one-bit carry look ahead block was cascaded into a four-bit carry look ahead
unit. The one-bit adder signals were then fed into the four-bit design to give us a four-bit
adder.

These four-bit adders were then strung together into two separate two-level, 16-bit
carry-lookahead adders, each with a fifth carry look ahead unit. These were then strung together
to give us a 32-bit carry-lookahead adder/subtractor.
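
The generate and propagate idea can be modeled in software. The sketch below is a simplified
illustration, not the project's design: cla4 computes all four carries of a four-bit block
directly from generate and propagate signals, but add32 ripples the carry between the eight
blocks instead of using the two-level lookahead described above. It also assumes that BInv
supplies the initial carry in, which is what two's complement subtraction needs. All names are
hypothetical.

```python
def cla4(a, b, cin):
    """Four-bit carry-lookahead add. Returns (sum, carry_out)."""
    g = [(a >> i) & (b >> i) & 1 for i in range(4)]    # generate
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(4)]  # propagate
    # Every carry is computed from g, p, and cin, without waiting on a ripple.
    c = [cin,
         g[0] | (p[0] & cin),
         g[1] | (p[1] & g[0]) | (p[1] & p[0] & cin),
         g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0])
              | (p[2] & p[1] & p[0] & cin)]
    total = sum((p[i] ^ c[i]) << i for i in range(4))
    group_g = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
               | (p[3] & p[2] & p[1] & g[0]))
    group_p = p[3] & p[2] & p[1] & p[0]
    return total, group_g | (group_p & cin)


def add32(a, b, binv=0):
    """32-bit add (binv=0) or subtract (binv=1) built from eight cla4 blocks."""
    if binv:
        b = ~b & 0xFFFFFFFF   # invert B for subtraction
    carry = binv              # assumed: BInv also feeds the first carry in
    result = 0
    for k in range(8):
        nibble, carry = cla4((a >> 4 * k) & 0xF, (b >> 4 * k) & 0xF, carry)
        result |= nibble << (4 * k)
    return result


print(hex(add32(5, 7, binv=1)))  # 5 - 7 as a 32-bit two's complement value
```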
Once we had the 32-bit adder/subtractor we needed to add logic to do OR, AND, and SLT
operations. The OR and AND were made by using OR and AND gates at a one bit level for each
individual bit. The SLT was implemented using our subtractor to subtract the second input from
the first and then output the sign of this subtraction. All of these operations always happen
inside of the ALU, but the final result is multiplexed out according to ALU Control.
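
The "compute everything, then multiplex" structure can be sketched as follows. This is a
behavioral model only: it uses Python integer arithmetic rather than the carry-lookahead adder,
the function name is hypothetical, and, like the description above, slt is taken from the sign
bit of the subtraction without any overflow handling.

```python
MASK = 0xFFFFFFFF


def alu(a, b, code):
    """Compute all five results, then select one by the three-bit ALU code."""
    a &= MASK
    b &= MASK
    difference = (a - b) & MASK
    results = {
        0b010: (a + b) & MASK,    # add
        0b110: difference,        # subtract
        0b001: a | b,             # or
        0b000: a & b,             # and
        0b111: difference >> 31,  # slt: sign bit of a - b
    }
    return results[code]


print(alu(2, 3, 0b111))  # slt: is 2 less than 3?
```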
Rt/Rd Mux
Selects register Rt or Rd from the ID/EX register stage depending on if RegDst from the ID/EX
stage is a 0 or 1 respectively. This chooses what register will be written to during the Write Back
stage.

Execution/Memory Registers
Things registered in this stage are:

Control Signals from the ID/EX register stage


ALU output
Read data 2 from the ID/EX register stage
The output from Rt/Rd Mux named write address

Memory
This stage is where the data memory is read from and written to. This stage is especially
important to our processor as there are no load immediate instructions in its instruction set and so
all data that is to be manipulated must first be initialized in the memory using the mif file.
The following component is used in this stage:
Data Memory
This holds multiple 32-bit values of data for our program to use. Data can be loaded from it, and
data can be stored into it. It was made with the Quartus memory wizard.

Memory/Write Back Registers


Things registered in this stage are:

Control Signals from the EX/MEM register stage
Data Memory Output
ALU output from the EX/MEM register stage
Read data 2 from the EX/MEM stage
Write address from the EX/MEM stage

Write Back
The Write Back stage is where data, whether it has been calculated in the ALU or loaded from
memory, is stored into the register file.
The following components are used in this stage:
MemtoReg Mux
Selects either the data memory output or ALU Out from the MEM/WB register stage depending on
whether the control signal MemtoReg is 0 or 1, respectively.
Registers
These are the same registers that were in the instruction decode stage, except now instead of
reading from them the processor is writing to them.

Components Outside Pipelined Stages


SYSTEM_CLOCK
This clock is externally driven and input into the processor. It drives the entire processor
either directly or indirectly. It directly drives SYSTEM_CLOCK2, the instruction memory, and
the data memory. Through SYSTEM_CLOCK2, which has half the frequency of SYSTEM_CLOCK, it
indirectly drives every other part of the processor.
SYSTEM_CLOCK2
Outputs a clock signal that runs at half the frequency of SYSTEM_CLOCK for the pipeline
registers and register file to use.
Hazard Detection Unit
This stalls the pipeline when an instruction is trying to use a value that has not been loaded
from memory yet. It does this by disabling writes to the PC and the IF/ID registers and
setting all the control signals at the Instruction Decode stage to zero, which creates a NOP.
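
The report does not give the exact stall condition, so the sketch below uses the standard
textbook load-use check as an assumption: stall when the instruction in the ID/EX stage is a
load (MemRead is 1) and its destination Rt matches a source register of the instruction being
decoded. The function and signal names are hypothetical.

```python
def hazard_detect(id_ex_mem_read, id_ex_rt, if_id_rs, if_id_rt):
    """Return the stall controls for a load-use hazard (textbook condition)."""
    stall = bool(id_ex_mem_read) and id_ex_rt in (if_id_rs, if_id_rt)
    return {
        "pc_write": not stall,     # hold the PC during a stall
        "if_id_write": not stall,  # hold the IF/ID registers during a stall
        "insert_nop": stall,       # zero the control signals in decode
    }


# lw $9, 100($11) followed by add $12, $9, $13: the add must wait one cycle.
print(hazard_detect(id_ex_mem_read=1, id_ex_rt=9, if_id_rs=9, if_id_rt=13))
```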
Forwarding Unit
Determines the inputs for the ALU. Normally these are read data 1 and read data 2 from
the ID/EX register stage. However, if the EX/MEM write address is the same as the register
address of read data 1 or read data 2, and write enable is on, the unit uses the data in the
EX/MEM register for whichever register address matched the EX/MEM write address.
Additionally, if it does not forward data from the EX/MEM stage, it checks whether it needs to
forward data from the MEM/WB stage. When the MEM/WB write address is the same as the register
address of read data 1 or read data 2, and write enable is on, the unit uses the data in the
MEM/WB register for whichever register address matched the MEM/WB write address.
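
The forwarding priority can be sketched for a single ALU operand as follows. The function and
argument names are hypothetical. The sketch follows the description above literally, so it
does not add the extra check that textbook designs include to avoid forwarding a value
destined for register zero.

```python
def forward_select(src_reg, ex_mem_write_addr, ex_mem_write_enable,
                   mem_wb_write_addr, mem_wb_write_enable):
    """Pick the source of one ALU operand: EX/MEM first, then MEM/WB, else ID/EX."""
    if ex_mem_write_enable and ex_mem_write_addr == src_reg:
        return "EX/MEM"   # newest value: result of the previous instruction
    if mem_wb_write_enable and mem_wb_write_addr == src_reg:
        return "MEM/WB"   # older value: result from two instructions back
    return "ID/EX"        # no hazard: use the value read from the register file


# Both stages will write register 9: the EX/MEM copy is the newer one.
print(forward_select(9, 9, 1, 9, 1))
```

The same function would be evaluated once for each ALU input, with src_reg set to the register
address of read data 1 and read data 2 respectively.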
Flush
Flushes instructions that were read but that we do not want to execute. This is done by checking
whether a branch or jump took place, and then setting the write enables to zero for two cycles
if it was a branch and for one cycle if it was a jump. This unit also controls the Branch Op
Mux and Jump Op Mux and sets their outputs to 0 if a branch or jump has taken place. This
effectively disables any jump or branch instructions that were read or decoded before this
branch or jump.

Control Design
Normally our processor has the instruction fetch, instruction decode, execution, memory, and
write back stages all happening at once as they are independent parts. In one cycle, data flows
from the instruction fetch to the instruction decode stage, the instruction decode to the execution
stage, the execution stage to the memory stage, and the memory stage to the write back stage.

However, there are some special cases that need to be taken into consideration. Consider the
following assembly code:
Add $t1, $t2, $t3
Add $t4, $t1, $t5
This is an example of a data hazard. The new value of $t1 has not yet been propagated to the
write back stage when we try to add it to $t5. To solve this we use data forwarding from
our forwarding unit. It takes the last output of the ALU, which was stored in the
EX/MEM register, and uses it as the first ALU input instead of the data read from the
registers. Also consider
Lw $t1, 100($t3)
Add $t4, $t1, $t5
Again, $t1 does not yet hold the value it needs when the add executes, so this is another
example of a data hazard. However, this is slightly different from our last example. Here the
data needs to be forwarded from the MEM/WB register instead of the EX/MEM register, which
our forwarding unit also does. Additionally, we must insert NOPs with our hazard detection
unit at this point, because we must stall for at least one cycle before we can forward the
data from the MEM/WB register.

Test Program
Our first test program is designed to find the maximum value within a group of registers:

The data the first program used is as follows:

Start of Program 1 (Finds Largest Value in 10 registers)

End of Program 1 (Finds Largest Value in 10 registers)

The beginning and end of the waveform from our program, which finds the largest value in a
group of registers and then puts that value into register three, are shown above. Looking at
the data memory, it can be seen that 333 is the largest value, and that value is stored into
register 3 at the end of the program. Also note that only the beginning and end of the
waveform are shown, as it would be impractical to print the entire program.

Our second test program has no practical purpose other than to check that the processor does
what it is supposed to:

The data the second program used is as follows:

Start of Program 2 (Extra Tests)

End of Program 2 (Extra Tests)

Following the waveform through each cycle and examining the outputs of the different
components, it is clear that this program executes as expected.
