Design of A General Purpose 8 Bit RISC Processor For Computer Architecture Learning
Introduction
Instruction Execution
Memory Organization
Program Memory
o ROM TYPE
o Used to store sequence of instruction (Program)
o Organized as a linear sequence of 256 memory locations, each 16 bits wide
8-bit wide address
8-bit PC (program counter) to locate next instruction
o NON-VOLATILE
o Reading and writing operation
READING: When a program is being executed
WRITING: When a program is going to be loaded into the processor
Data Memory
o RAM type memory segment
Used to store data generated by the main program
Can be variable value or a constant value
256 locations, 8 bits wide
Address is 8 bit wide
VOLATILE MEMORY
2 operation modes
READ:
o Only needs to set the desired address and the data is
available at the output right away
WRITE: Must follow the steps
o 1. Set desired address at the address bus
o 2. Set the desired data in the input data bus
o 3. Data is stored at the desired location and is available at
the output bus
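The read and write behaviour above can be sketched as a small Python model. This is only an illustration: the class name DataMemory and the masking of out-of-range values to 8 bits are assumptions, not part of the original design.

```python
class DataMemory:
    """Sketch of a 256 x 8-bit RAM (names and masking are assumptions)."""

    def __init__(self):
        # Volatile: contents start at zero here and are not preserved.
        self.cells = [0] * 256

    def read(self, address):
        # READ: set the address; the data is available at the output.
        return self.cells[address & 0xFF]

    def write(self, address, data):
        # WRITE: 1. set the address, 2. set the input data,
        # 3. store it; the stored value then appears on the output bus.
        self.cells[address & 0xFF] = data & 0xFF
        return self.read(address)


ram = DataMemory()
out = ram.write(0x10, 0x1A5)  # a value wider than 8 bits is truncated
```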
Instruction set
Made up of the assembly code and the machine language binary format
Must be simple and robust
Simple instructions are selected so that the processor can execute them in the fewest
steps possible.
Instructions can be classified according to their purpose
o Operations: Affects the register Value
Arithmetic operations
ADD/Subtract registers and immediates
Logic Operations
AND, OR, NOT, SHIFT RIGHT, SHIFT LEFT, SWAP
o Program Control: Affect the execution order
Branch, Jumps, Conditional branches
o Data Transferring: Affect the memory contents
Load and storage
Addressing Modes
The way the processor connects to the different memories (to exchange information)
o Rule for interpreting or modifying the address field.
Processor includes six addressing modes
o Program Memory Direct
When a new value is introduced to the PC, causing it to change address
o Immediate
Use a constant value K to affect a register
o Data Memory Direct
Instructions use a constant value to select address to the data memory
(can read or write)
o Register Direct, Two Registers
Operates on two registers (arithmetic or logic) and saves the result in one of the
registers (destination register)
o Data Memory Indirect Through Register
Data memory is addressed by the contents of a register [RS], and a read or
write operation is executed on a second register. RD<=[RS] means: the contents
at address RS are saved to RD
o Register Direct, Single Register
One register from the GPR is used, and its contents are modified by an
arithmetic or logical operation
Instruction Format
Refers to the order of the bits of the instruction (organization)
Every instruction uses a different identifier or operation code (OPCODE).
o Length of OPCODE depends on the total number of instructions (5 bits for this
processor)
The instructions can be classified according to their format type
o Type J: Used for jumps
o Type I: Use a register and an immediate
o Type R: Use registers to perform operations
o Type D: Carry out operations without the need for parameters
Not all instructions use all of the bits but it is desirable to have a universal instruction
length
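As an illustration of a fixed 16-bit instruction word, the sketch below packs and unpacks a Type I instruction. The 5-bit OPCODE width comes from the notes; the exact field positions (OPCODE in bits 15 to 11, a 3-bit register in bits 10 to 8, an 8-bit constant K in bits 7 to 0) and the opcode value used are assumptions made for the example.

```python
def encode(opcode, rd, k):
    """Pack a Type I instruction into a 16-bit word (assumed layout)."""
    return ((opcode & 0x1F) << 11) | ((rd & 0x7) << 8) | (k & 0xFF)


def decode(word):
    """Split a 16-bit word back into (opcode, rd, k)."""
    return (word >> 11) & 0x1F, (word >> 8) & 0x7, word & 0xFF


# Hypothetical opcode 0b00011 acting on register 2 with constant 0x2A.
word = encode(0b00011, 2, 0x2A)
fields = decode(word)
```

Keeping every instruction the same length means the fetch logic never has to know the instruction type before reading it; unused bits are simply ignored by the decoder.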
Functional Units
A processor is made up of many hardware blocks that are necessary for data processing. All the
blocks have to satisfy certain logic design criteria to accomplish their task. Sometimes a designer
has to make their own block, and sometimes they can get it from a library of commonly used blocks.
The control unit must use the control signals that every block has to coordinate the functioning of
the blocks.
Program Counter
o Binary counter that produces the address of an instruction in the program
memory. This is how the computer keeps track of its location in a program. The
program counter must also be capable of loading a pre-defined address if
required.
A common 8-bit binary counter with parallel load is used
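A counter with parallel load can be modelled in a few lines. This is a behavioural sketch only; the class and parameter names are hypothetical, and the wrap-around from 255 to 0 is the usual behaviour of an 8-bit binary counter rather than something stated in the notes.

```python
class ProgramCounter:
    """8-bit counter with parallel load (behavioural sketch)."""

    def __init__(self):
        self.value = 0

    def tick(self, load=False, address=0):
        if load:
            # Parallel load: used by jumps and branches.
            self.value = address & 0xFF
        else:
            # Normal operation: point at the next instruction.
            self.value = (self.value + 1) & 0xFF
        return self.value


pc = ProgramCounter()
first = pc.tick()                          # sequential step
jumped = pc.tick(load=True, address=0xFE)  # jump to 0xFE
```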
Instruction Register
o Divided into two 8 bit registers
Instruction Register (IR)
Stores the 8 most significant bits, which contain the OPCODE and
a register parameter (INSTRUCTION PARAMETERS)
Instruction Data Register (IDR)
Stores the 8 least significant bits, which contain the constant or
immediate used by the instruction (Instruction data)
o Made up of parallel array of D-Type Flip-Flops
o An 8-bit register (ADDRESS REGISTER) is used to store the PC while the instruction is
executed.
Instruction decoder
o In charge of decoding the data stored in the instruction registers
Splits the 8 MSBs and 8 LSBs of the instruction into the OPCODE, RS, RD,
and the constant K.
The split data is sent to the units that require it
o The decoder is made up of buffers inside a block to sort the signals to separate
buses
General Purpose Registers
o Registers Used to store and save operands or results during the program
execution.
o Can share data directly with the ALU and data memory (high speed calculations)
o Control unit controls read and write
o Internally
Consists of eight 8-bit registers
Pair of 8-bit multiplexers
8 bit output decoder to control which register is read or written
Reads two registers at a time but only writes one register at a time
Arithmetic-Logic Unit (ALU)
o Executes arithmetic and logical calculations
o Executes simple operations (these can be used to make more complex operations)
Control Unit
o State machine that synchronizes the operation of all the other functional blocks. It
sets the functioning order according to the OPCODE of the instruction. The state
diagram used to design the Control Unit must take into account the following
considerations:
Must have a reset state present at start up (considering initial conditions
of blocks)
Second state must be the FETCH state (retrieved from program memory
and loaded into IR and IDR register)
Instruction decoding happens right away once the instruction is fetched
Next state is the execute state
The final stage involves storage of the result into the respective functional
unit, then the PC is incremented
o Design of the control unit is the most challenging part of the processor design;
it must take every functional block into account.
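The state sequence described for the control unit can be sketched as a simple state machine. The state names follow the notes; treating decode as its own state, and incrementing the PC inside the final (store) state, are modelling assumptions for this example.

```python
from enum import Enum


class State(Enum):
    RESET = 0
    FETCH = 1
    DECODE = 2
    EXECUTE = 3
    STORE = 4


# Fixed transition order; after STORE the machine fetches again.
NEXT = {
    State.RESET: State.FETCH,
    State.FETCH: State.DECODE,
    State.DECODE: State.EXECUTE,
    State.EXECUTE: State.STORE,
    State.STORE: State.FETCH,
}


def run(cycles):
    """Step the state machine and return (visited state names, final PC)."""
    state, pc, trace = State.RESET, 0, []
    for _ in range(cycles):
        trace.append(state.name)
        if state is State.STORE:
            pc = (pc + 1) & 0xFF  # PC is incremented once the result is stored
        state = NEXT[state]
    return trace, pc


trace, pc = run(9)  # reset plus two complete instructions
```

A real control unit would also drive each block's control signals in every state and choose the execute behaviour from the OPCODE; the sketch shows only the ordering of the states.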
MULTI CYCLE VS SINGLE CYCLE
o This processor is a multi cycle processor
o Multi Cycle:
Can be 1.27 times faster than single cycle
Higher clock speed
Less hardware
o Single Cycle: All instructions have the same clock cycle length (which means wasted time
for shorter instructions)
Clock cycle is determined by the longest path
More hardware
PIPELINED 8 BIT RISC PROCESSOR DESIGN USING VERILOG HDL ON FPGA 07808194
Introduction
Due to the betterment of field programmable gate arrays, we have reached a point where
the architecture of processors can be modified by programming in HDL
The main difference between traditional processors and FPGA-based processors is that with
FPGA processors one can make significant changes to the datapath itself
In RISC processors, load and store are the only operations used to access memory
o The rest are performed on a register-to-register basis
Clock Gating: A method to reduce clock power by dynamically disabling the clock signals to
unused modules of the hardware
Universal Asynchronous Receiver Transmitter (UART): Type of serial communication
protocol, which is mostly used for short-distance, low-speed, low-cost data exchange
between computers and peripherals
Asynchronous serial communication: high reliability, fewer transmission lines and long
distance transmission; extensively used as a mode of communication between computers
and peripherals. This is usually implemented by a UART
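To make the idea of asynchronous serial framing concrete, here is a small sketch of the common 8N1 frame (one start bit, eight data bits sent least significant bit first, no parity, one stop bit). The notes do not say which frame format the design uses, so 8N1 is an assumption, and the function names are hypothetical.

```python
def uart_frame(byte):
    """Return the 10 line levels for one byte in 8N1 format."""
    data_bits = [(byte >> i) & 1 for i in range(8)]  # LSB first
    return [0] + data_bits + [1]  # start bit (0), data, stop bit (1)


def uart_decode(bits):
    """Recover the byte from a 10-bit 8N1 frame, checking start/stop bits."""
    if len(bits) != 10 or bits[0] != 0 or bits[9] != 1:
        raise ValueError("framing error")
    return sum(bit << i for i, bit in enumerate(bits[1:9]))


frame = uart_frame(0x41)
```

Because there is no shared clock, the receiver relies on the falling edge of the start bit to find the beginning of each frame, which is why both ends must agree on the bit rate beforehand.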
Pipelining
o Designed to improve performance; provides a way to reduce the average
execution time per instruction (by decreasing the number of clock cycles per
instruction)
o While executing one instruction, the next instruction is fetched
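The benefit of overlapping fetch and execute can be estimated with the standard ideal-pipeline cycle count. The sketch assumes a two-stage pipeline (fetch and execute, as suggested by the overlap described above) with no stalls or branch penalties; a real design would fall somewhat short of this.

```python
def cycles_unpipelined(n_instructions, stages):
    # Each instruction passes through every stage before the next starts.
    return n_instructions * stages


def cycles_pipelined(n_instructions, stages):
    # The first instruction fills the pipeline; after that, one
    # instruction completes per cycle (ideal case, no stalls).
    if n_instructions == 0:
        return 0
    return stages + (n_instructions - 1)


plain = cycles_unpipelined(100, 2)
piped = cycles_pipelined(100, 2)
speedup = plain / piped
```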
The pipelining architecture used for the processor is:
FPGAs provide the highest degree of flexibility and are almost fully application neutral
o Sacrifices: Higher Usage of basic Logic gates
o Decrease in circuit operation frequency
Caused by the use of switched interconnect fabric
o Soft-CPUs allow
Improve or fully replace the CPU architecture in the field
Implement hard logic solutions with maximum reliability
Implement functions that can be efficiently implemented only with hard
logic (Example: Coding/Decoding)
Implement functions that require hard deterministic and fixed timing
Full control over technologies used
Applicability of Moore's law for general purpose FPGA devices
o MAIN METRICS
Metrics used to compare ASIC and FPGA-Based soft CPUS
Number of gates (Transistors)
Maximum Internal Clock Frequency
Maximum clock frequency for an FPGA-based soft-CPU is usually 3.5 times lower
than for ASIC implementations
The clock signal has to pass through several statically controlled
FET switches with higher resistance than an ASIC's metal
interconnects
Economics of soft-CPU adoption
o ASIC development and NRE costs are much higher
Unit cost for huge production volumes is much lower
o FPGA development and NRE costs are much lower
Final unit cost is high even for large volumes
o In I development, the cost for different approaches should be considered. There
are 3 types of costs
Unit cost for final product (Cprod)
The cost of production for one device unit; it also includes the cost of
preparing for production, which is a one-time non-recurring engineering
(NRE) cost
o NRE: one-time cost to research, design, develop and test a new
product
Development cost (Cdevel)
Cost of time to market delay (Cttm)
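The trade-off between high NRE with low unit cost (ASIC) and low NRE with high unit cost (FPGA) can be illustrated with a simple linear cost model. All figures below are hypothetical and chosen only to make the arithmetic easy to follow; the model also ignores the time-to-market cost (Cttm).

```python
def total_cost(nre, unit_cost, volume):
    """Linear cost model: one-time cost plus per-unit cost times volume."""
    return nre + unit_cost * volume


def break_even_volume(asic_nre, asic_unit, fpga_nre, fpga_unit):
    """Volume at which the ASIC and FPGA totals are equal."""
    return (asic_nre - fpga_nre) / (fpga_unit - asic_unit)


# Hypothetical figures, in arbitrary currency units.
ASIC_NRE, ASIC_UNIT = 1_000_000, 5
FPGA_NRE, FPGA_UNIT = 50_000, 30

volume = break_even_volume(ASIC_NRE, ASIC_UNIT, FPGA_NRE, FPGA_UNIT)
```

Below the break-even volume the FPGA route is cheaper overall; above it the ASIC's lower unit cost outweighs its higher NRE.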
INTRODUCTION
THE PROCESSOR
Low power
Low cost
8-bit microprocessor
Fully static core
o The main processing unit can be stopped by stopping the system clock oscillator
that is driving it
It maintains this state until the clock is introduced again, and then
processing resumes where it stopped without a problem
While stopped, the core consumes very little power
Useful in designs where the MPU remains in standby mode until
needed
Features:
o 8 bit data bus
o 16 bit address bus
o 8 bit ALU
o 16 bit PC
o 69 Instructions
o 16 addressing modes
o 212 operation codes (OpCodes)
o Variable length instructions
Provides for lower power and smaller code optimization over fixed length
instruction set processors
FUNCTIONAL DESCRIPTION
o The organization of the core is divided into two parts: the register section and the
control section
Instructions from program memory are executed within the register
section
The signals that cause the data transfers are generated within the control
section
Instruction Register (IR) and Decode
o The OPCODE portion of the instruction is loaded into the instruction register and is
latched during the OpCode fetch cycle
o The instruction is then decoded to generate various control signals for program
execution
Timing Control unit (TCU)
o Provides timing for each instruction cycle that is executed. It is set to zero for each
instruction fetch, and it is advanced until the instruction is completed
o The data transfers between the registers depend on decoding the contents of
both the IR and the TCU
Arithmetic and Logic Unit (ALU)
o All arithmetic and logical operations take place within the ALU
o Also calculates the effective address for relative and indexed addressing modes
o The result of the operation can be stored in memory or registers
o The flags (Carry, Negative, Overflow and zero) are updated after the ALU has done
the operation
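How the four flags follow from an ALU result can be shown with an 8-bit addition. This is a generic sketch of the usual flag definitions, not the exact logic of this core; the function name add8 and the dictionary layout are assumptions.

```python
def add8(a, b, carry_in=0):
    """Add two 8-bit values and derive the C, Z, N and V flags."""
    total = a + b + carry_in
    result = total & 0xFF
    flags = {
        "C": total > 0xFF,         # carry out of bit 7
        "Z": result == 0,          # result is zero
        "N": bool(result & 0x80),  # bit 7 of the result is set
        # Signed overflow: operands share a sign that the result lacks.
        "V": bool(~(a ^ b) & (a ^ result) & 0x80),
    }
    return result, flags


r1, f1 = add8(0x50, 0x50)  # two positive values whose sum exceeds +127
r2, f2 = add8(0xFF, 0x01)  # unsigned wrap-around to zero
```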
Accumulator Register (A)
o 8 bit general purpose register that holds one of the operands and the result of the
ALU
Index Registers (X and Y)
o Two 8 bit index registers
Can be used as general purpose registers or to provide an index value for
calculation of the effective address
When executing an instruction with indexed addressing
The processor fetches the OPCODE and a base address
o The address is modified by taking the contents of the
index register and adding them to the address prior to
performing the desired operation
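The indexed address calculation described above amounts to one addition. In the sketch below the base address and index value are made-up examples, and wrapping the result to 16 bits is an assumption based on the 16-bit address bus.

```python
def effective_address(base, index):
    """Indexed addressing: 16-bit base plus 8-bit index register."""
    return (base + (index & 0xFF)) & 0xFFFF


# Hypothetical values: base address 0x20F0, index register X = 0x20.
addr = effective_address(0x20F0, 0x20)
```

Stepping the index register through 0, 1, 2, ... with the same base address is what makes this mode convenient for walking through tables and arrays.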
Processor Status Register (P)
o Contains status flags that report the status of ALU operations
o On top of the status flags, the status register also contains mode bits for user
input
Program Counter Register (PC)
o A 16 bit register which provides the addresses that are used to execute a program
Every time an instruction or operand is fetched from the program
memory, the register is incremented
Stack Pointer Register (S)
o 8 bit register that indicates the next available address in the stack memory
16 addressing modes
o An aspect of the instruction set architecture
o The addressing modes define how the instruction is read (what bits are the
OPCODE, What bits are the registers, constants, etc.)
BUILDING EMBEDDED SYSTEMS USING SOFT IP CORES “HANDS-ON EXPERIENCE WITH ALTERA
FPGA DEVELOPMENT BOARDS”
The NIOS II is the most widely used soft processor in the FPGA industry
Soft IP (synthesizable IP) provides customers with a lot of design flexibility because it
allows the customer to alter the design at the functional level
Soft Core Processors for Embedded Systems
In the mid to late 1990s, soft-core processors on FPGAs were mainly used for research because they were
expensive and had low performance. They also occupied most of the space within the FPGA,
meaning that the FPGA could be used for little else
o Nowadays a soft core processor does not occupy a lot of space within the FPGA
Efficiency is a ratio of performance to cost
CONFIGURABLE PROCESSORS
o FPGA processors and ASIC processors support generation-time-configuration
This allows designers to trade off cost and performance as needed
Examples of this are: Pipelining, cache size, multiplier implementation
o FPGA designers have an advantage over ASIC designers due to the configurable
nature of the FPGA. FPGA designers can test their designs on real hardware without
difficulty, whereas ASIC designers test their designs in simulation, which is not as accurate.
o FPGA designers can also tune their designs much more easily to meet requirements
CHALLENGES OF FPGA PROCESSOR DESIGN
o When working with FPGAs, designers need to develop solutions appropriate for FPGAs
rather than adopting solutions that work for other forms of implementation
o Some techniques used by ASIC processors to increase performance might not work in
FPGAs because of differences in how the two technologies work.
o Designers also need to accommodate the low efficiency of FPGA resources relative to
ASICs.
An efficient soft processor needs to have a simple instruction set on a simple
pipeline.
Higher levels of application performance are available by using
multiple FPGA processors, adding custom instructions, and/or adding
custom accelerators
OPPORTUNITIES OF FPGA PROCESSOR DESIGN
o Despite the disadvantages relative to ASIC, the flexibility offered provides unique
opportunities in processor design.
An FPGA designer can change their processor configuration whenever needed
They can make their own custom peripherals
When using ASICs if the system requirements change then the entire ASIC
needs to be changed
ASICs are usually made to provide more performance than required
o This tends to increase the cost and size of the processor
o The end user is the one that pays for the increased
performance that is not being used. THERE ALSO EXISTS THE
POTENTIAL THAT THE REQUIREMENTS SURPASS EVEN THE
EXTRA PERFORMANCE GAP.
o FPGAS CAN AVOID ADDITIONAL COST BY BEING CONFIGURED
WITH MINIMAL OR NO PERFORMANCE MARGIN
FPGAs also offer debug facilities, which allow software
developers to control the processor and observe its state
The extent of the debug facilities for ASIC processors is fixed once it is
produced
o Some debug facilities include:
Stepping, breakpointing, watchpointing, tracing, and
examining/modifying memory and registers
o IN AN FPGA DEBUG FACILITIES CAN BE ADDED AS NEEDED
AND REMOVED WHEN THE DESIGNER IS DONE USING THEM
FPGA LOGIC OVERVIEW
In order to make good designs, the processor designer must have a
good understanding of FPGA devices
o FPGAs are composed of logic elements, RAM blocks,
multiplier blocks, and routing
Routing occupies most of the die area
The resources are configured through SRAM blocks
that get loaded with configuration information every
time power is applied to the FPGA. The configuration
file is typically held in non-volatile memory
LOGIC ELEMENTS
o Typical Logic element
o Consists of a 4-input lookup table (LUT), carry chain logic and a
flip-flop.
SRAM is used to hold the contents of the look up
table, the inputs are connected to the address of the
SRAM and the lookup table computes the result based
on the inputs
The flip-flop stores the output of the carry logic or the
lookup table
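The way a lookup table computes logic can be sketched directly: the four inputs form an address into a 16-entry memory whose contents define the function. The helper names below are hypothetical, and the input-to-address bit ordering is an arbitrary choice for the example.

```python
def make_lut4(func):
    """Build the 16 configuration bits for any 4-input boolean function."""
    return [
        func((i >> 3) & 1, (i >> 2) & 1, (i >> 1) & 1, i & 1) & 1
        for i in range(16)
    ]


def lut4_read(table, a, b, c, d):
    """Evaluate the LUT: the inputs act as the SRAM address."""
    return table[(a << 3) | (b << 2) | (c << 1) | d]


# Configure the LUT as a 4-input XOR (parity) function.
xor4 = make_lut4(lambda a, b, c, d: a ^ b ^ c ^ d)
bit = lut4_read(xor4, 1, 0, 1, 1)
```

Changing the 16 stored bits changes the logic function without changing any wiring, which is what makes the logic element reconfigurable.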
o RAM BLOCKS
FPGAs usually require dedicated RAM blocks
These blocks typically support simple dual
port (one read and one write port)
The RAM blocks are typically only available as
synchronous SRAMs (synchronous refers to
the fact that the data transfer is controlled by
a clock, on either the falling edge or the rising edge of
the clock)
o MULTIPLIER BLOCKS
Composed of several small multipliers
Some FPGAs offer dedicated circuitry to
combine smaller multipliers
o The ones that offer dedicated circuitry
usually offer higher frequency
On top of multiplication, these blocks can also provide
other features, like:
Saturated arithmetic, accumulators, or barrel
shifters
o FPGA ROUTING
2.1.1 PROCESSOR
2.1.2 Memory
There are 3 types of memory that are found within an embedded system
o RAM: A hardware component within the system that is used to temporarily store
data during the execution of the program
o ROM: Also a hardware component; it stores the information needed for the
system to work (the program)
o Cache: Used to store information from slower memory to speed up processing
times
The clock is used to synchronize all of the computer's operations. There are two ways of
doing this: an instruction or operation is carried out either on the rising edge of the clock or
on the falling edge of the clock
2.1.4 Peripherals
Peripherals are devices that are connected to the CPU; they are not part of the computer
itself but supply the CPU with information that is needed to execute the processes
It is quite difficult to define standards for embedded systems because the application
dictates the design choices
A good embedded design will optimize between various metrics of design (again the
importance of the metrics is dictated by the application)
o Unit cost and NRE cost
o Size and weight
o Performance and power consumption
o Flexibility and maintainability
o Time-to-market
o Correctness
o Security of the system
The platform on which an embedded system can be developed varies; it depends on
complexity and cost (in practice, the metrics mentioned above)