Project Report Arithmetic Logic Unit (ALU)

This document provides an introduction to VHDL (VHSIC Hardware Description Language). It describes VHDL as a hardware description language that can model digital systems at various levels of abstraction. The key constructs of VHDL that allow it to model digital systems include sequential languages, concurrent languages, net lists, timing specifications, and waveform generation. VHDL supports structural, dataflow, and behavioral modeling styles and allows hierarchical and mixed modeling. It also describes entity declaration, architecture declaration, configuration declaration, and packages as the main design units in VHDL.

Uploaded by

Hemant Sharma
PROJECT REPORT ON

ALU
(ARITHMETIC & LOGICAL UNIT)

BY
NARESH SINGH DOBAL
9540493245
nsdobal@gmail.com

1. INTRODUCTION TO VHDL

Introduction

VHDL is an acronym for VHSIC Hardware Description Language (VHSIC stands for Very High Speed Integrated Circuits). It is a hardware description language that can be used to model a digital system at many levels of abstraction, ranging from the algorithmic level to the gate level. The complexity of the digital system being modeled could vary from that of a simple gate to a complete digital electronic system, or anything in between. The VHDL language can be regarded as an integrated amalgamation of the following languages:
Sequential languages +
Concurrent languages +
Net list languages +
Timing specification +
Waveform generation language

=> VHDL

Therefore the language has constructs that make it possible to express the concurrent or sequential behavior of a digital system with or without timing. It also allows modeling the system as an interconnection of components. Test waveforms can also be generated using the same constructs. All the above constructs can be combined to provide a comprehensive description of the system in a single model.

1.1 Advantages of VHDL over other hardware description languages

1. The language can be used as a communication medium between different CAD and CAE tools.
2. The language supports hierarchy; that is, a digital system can be modeled as a set of interconnected components, and each component in turn can be modeled as a set of interconnected subcomponents.
3. The language supports flexible design methodologies: top-down, bottom-up or mixed.
4. It supports both synchronous and asynchronous timing models.
5. Various digital modeling techniques, such as finite state machine descriptions, algorithmic descriptions and Boolean equations, can be modeled using this language.
6. The language is publicly available, human readable, machine readable and not proprietary.
7. The language supports three basic description styles: structural, dataflow and behavioral.
8. Arbitrarily large designs can be modeled using the language; the language imposes no limitations on the size of a design.
9. The model can not only describe the functionality of a design, but also contain information about the design itself in terms of user-defined attributes, such as total area and speed.
10. The capability of defining new data types provides the power to describe and simulate a new design technology at a very high level of abstraction without any concern about the implementation details.

1.2 VHDL: The language

VHDL is a hardware description language that can be used to model a digital system. The digital system can be as simple as a logic gate or as complex as a complete electronic system.
To describe an entity, VHDL provides five different types of primary constructs, called design units. They are:
1. Entity declaration.
2. Architecture declaration.
3. Configuration declaration.
4. Package.
5. Package body.

1.2.1 Entity declaration

An entity is modeled using an entity declaration and at least one architecture body. The entity declaration describes the external view of an entity. It specifies the name of the entity being modeled and lists the set of interface ports. Ports are signals (wires) through which the entity communicates with the other models in its external environment. An example for a four-input AND gate is given below.

Figure: AND_GATE

This entity, called AND_GATE, has four input ports, In1, In2, In3 and In4, and one output port, Out1. std_logic is a type defined in the IEEE std_logic_1164 package.
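
Since the figure itself is not reproduced here, the following is a minimal sketch of what an entity declaration for AND_GATE might look like. It assumes only the port names given in the text (In1 to In4, Out1) and the std_logic type from the ieee.std_logic_1164 package; the layout is illustrative, not the report's original listing.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- External view only: the entity name and its interface ports.
entity AND_GATE is
  port (
    In1, In2, In3, In4 : in  std_logic;  -- four input ports
    Out1               : out std_logic   -- one output port
  );
end entity AND_GATE;
```

The declaration says nothing about how Out1 is computed; that is the job of an architecture body.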

1.2.2 Architecture body


The second important part of a VHDL source file is
the architecture declaration. Every entity declaration you
write must be accompanied by at least one corresponding
architecture. An architecture declaration is a statement
that describes the underlying function and/or structure of
a circuit. Each architecture in your design must be
associated by name with one entity in the design. The
architecture body contains the internal description of the
entity. The internal structure can be specified by any of
the following modeling styles.
a) As a set of interconnected components.
b) As a set of concurrent assignment statements.
c) As a set of sequential assignment statements.
d) As a combination of the above three.

VHDL architectures can be classified as:

Behavioral
Defines the function of the design as a sequentially executed description.
Structural
Defines the interconnections between previously defined components.
Dataflow
Defines the flow of data through the design using concurrent signal assignment statements.

The different modeling styles are explained below.

A. Structural style of modeling

This is modeled as a set of interconnected components. Such a model for an AND4 is shown. The name of the architecture body is AND4. The architecture body is composed of two parts: the declarative part (before the keyword begin) and the statement part (after the keyword begin). Two component declarations are present in the declarative part of the architecture body.
The declared components are instantiated in the statement part of the architecture body using component instantiation statements. U1, U2, and U3 are the component labels for these component instantiations. The first component instantiation statement, labeled U1, shows that signals A and B are connected to input ports In1 and In2 of component AND2, and that TEMP1 is connected to the output port of the AND2 entity. Similarly, in the second and third component instantiation statements, signals are connected to the respective ports of the AND2 entities.
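
The original listing is not reproduced here, so the following is a hedged sketch of a structural model consistent with the description above. The labels U1 to U3, the signals A, B and TEMP1, the component AND2 with ports In1 and In2, and the architecture name AND4 come from the text. The entity name AND4_UNIT, the ports C, D and Z, the signal TEMP2, the output port name Out1 of AND2 and the AND2 implementation are assumptions made for illustration. The sketch needs only one component declaration, whereas the text mentions two.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Two-input AND gate used as the building block (assumed implementation).
entity AND2 is
  port (In1, In2 : in std_logic; Out1 : out std_logic);
end entity AND2;

architecture DATAFLOW of AND2 is
begin
  Out1 <= In1 and In2;
end architecture DATAFLOW;

library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical top-level entity for the four-input AND.
entity AND4_UNIT is
  port (A, B, C, D : in std_logic; Z : out std_logic);
end entity AND4_UNIT;

architecture AND4 of AND4_UNIT is
  -- Declarative part: component and internal signal declarations.
  component AND2
    port (In1, In2 : in std_logic; Out1 : out std_logic);
  end component;
  signal TEMP1, TEMP2 : std_logic;
begin
  -- Statement part: three component instantiation statements.
  U1 : AND2 port map (In1 => A,     In2 => B,     Out1 => TEMP1);
  U2 : AND2 port map (In1 => C,     In2 => D,     Out1 => TEMP2);
  U3 : AND2 port map (In1 => TEMP1, In2 => TEMP2, Out1 => Z);
end architecture AND4;
```

With no configuration given, each AND2 component instance is bound by default to the entity of the same name in the working library.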

B. Data flow style of modeling


In this modeling style, the flow of data through the
entity is expressed primarily using concurrent signal
assignment statements. The structure of the entity is not
explicitly specified in this modeling style, but it can be
implicitly deduced. The data flow model of the AND4 entity
is given below.

The dataflow is described using two concurrent signal assignment statements (as opposed to sequential signal assignment statements). In a signal assignment statement, the symbol <= implies an assignment of a value to a signal. The value of the expression on the right-hand side of the statement is computed and is assigned to the signal on the left-hand side, called the target signal. A concurrent signal assignment statement is executed only when any signal used in the expression on the right-hand side has an event on it, that is, the value of the signal changes. Delay information is also included in the signal assignment statements using after clauses.
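
As the dataflow listing is missing from this copy, here is a minimal sketch of how it might look with two concurrent signal assignment statements and after clauses. The intermediate signal TEMP and the 2 ns delays are assumptions chosen for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity AND_GATE is
  port (In1, In2, In3, In4 : in std_logic; Out1 : out std_logic);
end entity AND_GATE;

architecture DATAFLOW of AND_GATE is
  signal TEMP : std_logic;  -- hypothetical intermediate signal
begin
  -- Executed whenever In1 or In2 has an event.
  TEMP <= In1 and In2 after 2 ns;
  -- Executed whenever TEMP, In3 or In4 has an event.
  Out1 <= TEMP and In3 and In4 after 2 ns;
end architecture DATAFLOW;
```

The order of the two statements does not matter: each one is triggered by events on its own right-hand-side signals.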

C. Behavioral style of modeling

The behavioral style of modeling specifies the behavior of an entity as a set of statements that are executed sequentially in the specified process statement. They do not explicitly specify the structure of the entity but merely its functionality. A process statement is a concurrent statement that can appear within an architecture body. For example, consider the following behavioral model for the same AND4.

A process statement also has a declarative part (before the keyword begin) and a statement part (between the keywords begin and end process). The statements appearing within the statement part are executed sequentially. The list of signals specified within the parentheses after the keyword process constitutes a sensitivity list, and the process statement is invoked whenever there is an event on any signal in the list. In the example, when an event occurs on In1, In2, In3 or In4, the statements appearing within the process statement are executed sequentially. However, all the processes that appear in a design are executed concurrently. Signal assignment statements appearing within a process are called sequential signal assignment statements. Sequential signal assignment statements, including variable assignment statements, are executed sequentially, independent of whether an event occurs on any signals in their right-hand side expressions.
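
The behavioral listing is also missing here. The sketch below shows one way such a process could be written, assuming the AND_GATE ports from the entity section. The architecture name BEHAVIOR and the variable RESULT are hypothetical.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity AND_GATE is
  port (In1, In2, In3, In4 : in std_logic; Out1 : out std_logic);
end entity AND_GATE;

architecture BEHAVIOR of AND_GATE is
begin
  process (In1, In2, In3, In4)       -- sensitivity list
    variable RESULT : std_logic;     -- declarative part of the process
  begin
    -- Statement part: executed in order on every event in the list.
    RESULT := In1 and In2;           -- variable assignment, takes effect at once
    RESULT := RESULT and In3 and In4;
    Out1   <= RESULT;                -- sequential signal assignment
  end process;
end architecture BEHAVIOR;
```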

D. Mixed style of modeling

It is possible to mix the three modeling styles that were described before in a single architecture body. That is, within an architecture body, we could use component instantiation statements and concurrent statements; therefore their order of appearance within the architecture body is not important. Note that a process statement itself is a concurrent statement; however, statements within a process statement are always executed sequentially.
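
To illustrate, here is a hedged sketch of a single architecture body that mixes the three styles for the same four-input AND function. The architecture name MIXED, the signals T1 and T2, and the AND2 building block are assumptions, not part of the original report.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Assumed two-input AND gate used by the structural part.
entity AND2 is
  port (In1, In2 : in std_logic; Out1 : out std_logic);
end entity AND2;

architecture DATAFLOW of AND2 is
begin
  Out1 <= In1 and In2;
end architecture DATAFLOW;

library ieee;
use ieee.std_logic_1164.all;

entity AND_GATE is
  port (In1, In2, In3, In4 : in std_logic; Out1 : out std_logic);
end entity AND_GATE;

architecture MIXED of AND_GATE is
  component AND2
    port (In1, In2 : in std_logic; Out1 : out std_logic);
  end component;
  signal T1, T2 : std_logic;
begin
  -- Structural: a component instantiation statement.
  U1 : AND2 port map (In1 => In1, In2 => In2, Out1 => T1);

  -- Dataflow: a concurrent signal assignment statement.
  T2 <= In3 and In4;

  -- Behavioral: a process whose statements run sequentially.
  process (T1, T2)
  begin
    Out1 <= T1 and T2;
  end process;
end architecture MIXED;
```

Reordering the three concurrent statements inside the architecture body would not change the behavior.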

1.2.3 Configuration declaration

A configuration declaration is used to select one of the possibly many architecture bodies that an entity may have, and to bind component instances to entities. For structural models, configurations can be thought of as the parts list for the model. For component instances, the configuration specifies, from many architectures for an entity, which architecture to use for a specific instance. When the configuration for an entity-architecture combination is compiled into the library, a simulatable object is created. An example of the configuration declaration for the half-adder entity is given below.

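library CMOS_LIB, MY_LIB;

configuration CONFIG of HALF_ADDER is
  for HA_STRUCTURE                      -- architecture being configured
    for X1 : XOR2                       -- bind instance X1 to an entity/architecture pair
      use entity CMOS_LIB.XOR_GATE(DATAFLOW);
    end for;
    for A1 : AND2                       -- bind instance A1 through an existing configuration
      use configuration MY_LIB.AND_CONFIG;
    end for;
  end for;
end CONFIG;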

1.2.4 Package

The primary purpose of a package is to encapsulate elements that can be shared (globally) among two or more design units. A package is a common storage area used to hold data to be shared among a number of entities. Declaring data inside of a package allows the data to be referenced by other entities; thus, the data can be shared. A package consists of two parts: a package declaration section and a package body. The package declaration defines the interface for the package, much the same way that the entity defines the interface for a model. The package body specifies the actual behavior of the package in the same way that the architecture statement does for a model.

A package is a collection of commonly used subprograms, data types and constants.

Packages save coding effort and promote code reuse.

The packages STANDARD and TEXTIO are provided in the STD library and define useful data types and utilities.

A 'use' clause makes the contents of a package visible to a design unit.

A package consists of two parts:

Package header:
This defines the contents of a package that are made visible after the statement 'use library_name.package_name.all'.

Package body:
This provides the implementation details of subprograms. Items declared in the body are not visible to the user of the package.
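
As an illustration of the two parts, the following sketch declares a small package and its body. The package name GATE_PKG, the constant GATE_DELAY and the function AND4 are hypothetical names introduced only for this example.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

-- Package header: the interface visible to users of the package.
package GATE_PKG is
  constant GATE_DELAY : time := 2 ns;
  function AND4 (A, B, C, D : std_logic) return std_logic;
end package GATE_PKG;

-- Package body: implementation details, hidden from users.
package body GATE_PKG is
  function AND4 (A, B, C, D : std_logic) return std_logic is
  begin
    return A and B and C and D;
  end function AND4;
end package body GATE_PKG;
```

Assuming the package is compiled into the working library, a design unit could make its contents visible with the clause use work.GATE_PKG.all; placed before the entity or architecture.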

1.2.5 Test bench

A test bench is used to verify the functionality of a design. The test bench allows the designer to verify the functionality of the design at each step in the HDL synthesis-based methodology. When the designer makes a small change to fix an error, the change can be tested to make sure that it did not affect other parts of the design. New versions of the design can be verified against known good results to verify compatibility. A test bench is at the highest level in the hierarchy of the design. The test bench instantiates the design under test (DUT). It provides the necessary input stimulus to the DUT and examines the output from the DUT.
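
A minimal sketch of such a test bench for the AND_GATE entity is given below. The test bench name AND_GATE_TB, the signal names, the 10 ns wait times and the one-line DATAFLOW architecture used as the DUT are assumptions for illustration.

```vhdl
library ieee;
use ieee.std_logic_1164.all;

entity AND_GATE is
  port (In1, In2, In3, In4 : in std_logic; Out1 : out std_logic);
end entity AND_GATE;

architecture DATAFLOW of AND_GATE is
begin
  Out1 <= In1 and In2 and In3 and In4;
end architecture DATAFLOW;

library ieee;
use ieee.std_logic_1164.all;

-- The test bench is the top of the hierarchy and has no ports.
entity AND_GATE_TB is
end entity AND_GATE_TB;

architecture TEST of AND_GATE_TB is
  signal I1, I2, I3, I4 : std_logic := '0';
  signal O1             : std_logic;
begin
  -- Instantiate the design under test (DUT).
  DUT : entity work.AND_GATE(DATAFLOW)
    port map (In1 => I1, In2 => I2, In3 => I3, In4 => I4, Out1 => O1);

  -- Drive the inputs and examine the output.
  STIMULUS : process
  begin
    I1 <= '1'; I2 <= '1'; I3 <= '1'; I4 <= '0';
    wait for 10 ns;
    assert O1 = '0' report "expected 0 when one input is low" severity error;

    I4 <= '1';
    wait for 10 ns;
    assert O1 = '1' report "expected 1 when all inputs are high" severity error;

    wait;  -- suspend forever: end of stimulus
  end process STIMULUS;
end architecture TEST;
```

Only two input combinations are checked here; a fuller test bench would cover all sixteen.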

2. HIGH LEVEL DESIGN FLOW FOR XILINX SOFTWARE TOOLS

The high level design flow is illustrated in the figure. Each step is explained below.

2.1 HDL Capture

After the specification has been completed, the designer can begin the process of implementation. The designer creates the VHDL description that describes the clock-by-clock behavior of the design. The VHDL codes for entities of the design are entered. The designer then checks the design for any syntax errors. After all syntax errors are removed, the VHDL code is verified for correctness by simulating it.

2.2 VHDL Synthesis

The goal of the VHDL synthesis step is to create a design that implements the required functionality and matches the designer's constraints in speed, area, or power. The VHDL synthesis tools convert the VHDL description into a netlist in the target FPGA or ASIC technology. For the VHDL synthesis tool to perform this step properly, the VHDL code must be written in a particular style.

The designer reads the VHDL design into the VHDL synthesis tool. The tool reports syntax errors and synthesis errors. Synthesis errors usually result from the designer using constructs that are not synthesizable. In such cases, the code has to be modified and simulated again. The synthesizer produces an output netlist in the target technology and a number of report files. The designer looks at the report files to determine the quality of the synthesis output. The most common output files are the timing report and the area report.

Most synthesis tools produce a number of other reports such as hierarchy reports, instance reports, net reports, power reports, and others. The most useful reports initially are the timing and area reports, because these are usually the most critical factors.

The area report shows the designer how much of the resources of the chip the design has consumed. The designer can tell if the design is too big for a particular chip and needs a larger one, if the design should go into a smaller chip, or if the current chip will work fine. The designer can also get a relative size of the design to use in later stages of the design process.

The timing report shows the timing of critical paths or specified paths of the design. The designer examines the timing of the critical paths closely because these paths ultimately determine how fast the design can run. If the longest path is a timing-critical part of the design and is not meeting the speed requirements of the designer, then the designer may have to modify the VHDL code or try new timing constraints to make the path meet timing.

The most important type of output data is the netlist for the design in the target technology. This output is a gate or macro level output in a format compatible with the place and route tools that are used to implement the design in the target chip. For instance, most place and route tools for FPGA technologies take in an EDIF netlist as an input format. The primitives used in the netlist are those used in the synthesis library to describe the technology. The place and route tools understand what to do with these primitives in terms of how to place a primitive and how to route wires to it.

2.3 RTL Simulation

In RTL simulation, the designer uses stimulus that represents the design environment to drive the design and checks to make sure that the results are correct. A standard VHDL simulator can be used to read the RTL VHDL description and verify the correctness of the design. The VHDL simulator reads the VHDL description, compiles it into an internal format, and then executes the compiled format using test vectors. The designer can look at the output of the simulation and determine whether or not the design is working properly. The designer has a number of ways to analyze the output. The most common are waveform output and tabular output.

2.4 Functional Gate Level Verification

Some designers might want to do a quick check on the output of the synthesis tool to make sure that the synthesis tool produced a design that is functionally correct. To do this, the designer runs a functional gate level verification. The designer reads the output VHDL netlist from the synthesis tool plus a library of the synthesis primitives into the VHDL simulator and runs the simulation using the RTL verification vectors. If the design matches, then the synthesis tool did not produce logic mismatches; if it does not match, the designer needs to debug the VHDL RTL description to see what is wrong.

2.5 Implementation

Implementation (place and route) tools are used to take the design netlist and implement the design in the target technology device. The place and route tools place each primitive from the netlist into an appropriate location on the target device and then route signals between the primitives to connect the devices according to the netlist.

One input to the place and route tools is the netlist in EDIF or another netlist format. Another input to some place and route tools is the timing constraints, which tell the place and route tools which signals have critical timing associated with them so that these nets can be routed in the most timing-efficient manner. These nets are typically identified during the static timing analysis process during synthesis. These constraints tell the place and route tool to place the primitives in close proximity to one another and to use the fastest routing. The closer the cells are, the shorter the routed signals will be and the shorter the time delay.
Some place and route tools allow the designer to
specify the placement of large parts of the design. This
process is also known as floor planning. Floor planning
allows the user to pick locations on the chip for large
blocks of the design so that routing wires are as short as
possible. The designer lays out blocks on the chip as
general areas. The floor planner feeds this information to
the place and route tools so that these blocks are placed
properly. After the cells are placed, the router makes the
appropriate connections.
After all the cells are placed and routed, the output of the place and route tools consists of data files that can be used to implement the chip. In the case of FPGAs, these files describe all of the connections needed to fuse the FPGA macrocells to implement the functionality required. Antifuse FPGAs use this information to burn the appropriate fuses, while reprogrammable devices download this information to the device to turn on the appropriate transistor connections. The other output from the place and route software is a file used to generate the timing file. This file describes the actual timing of the programmed FPGA device or the final ASIC device. This timing file, as much as possible, describes the timing extracted from the device when it is plugged into the system for testing. The most common format of this file for most simulators is the SDF (Standard Delay Format).

2.6 Post Layout Timing Verification

After the place and route process has completed, the designer will want to verify the results of the place and route process. There are a number of methods to accomplish this task, but the most common is to use post-route gate level simulation. This simulation combines the netlist used for place and route with the timing file from the place and route process into a simulation that checks both functionality and timing of the design. The designer can run the simulation and generate accurate output waveforms that show whether or not the device is operating properly and if the timing is being met. For VHDL simulations this requires a VITAL-compliant VHDL simulator (VITAL is a standard way of describing designs that allows SDF timing back-annotation).

3. DESIGNING STEPS

1. STARTING OF SOFTWARE XILINX PROJECT NAVIGATOR
2. CREATING A NEW PROJECT
3. NEW PROJECT WIZARD
4. NEW PROJECT WIZARD (DEVICE PROPERTIES WINDOW)
5. NEW PROJECT WIZARD (PROJECT SUMMARY)
6. CREATE NEW SOURCE WIZARD
7. CREATE NEW SOURCE WIZARD (PORT DEFINE)
8. CREATE NEW SOURCE WIZARD (MODULE SUMMARY)
9. ARCHITECTURE CODING

INTRODUCTION TO ARITHMETIC LOGIC UNIT (ALU)

What is ALU (Arithmetic Logic Unit)

ALU is an abbreviation of arithmetic logic unit, the part of a computer that performs all arithmetic computations, such as addition and multiplication, and all comparison operations. The ALU is one component of the CPU (central processing unit).

The Arithmetic Logic Unit (ALU) is essentially the heart of a CPU. This is what
allows the computer to add, subtract, and to perform basic logical operations such
as AND/OR. Since every computer needs to be able to do these simple functions,
they are always included in a CPU. How a company designs their ALU has a
significant impact on the overall performance of their CPU. In this article I will
give a brief introduction to some basics of ALU design; you will quickly see how
complicated these things can get.

An arithmetic-logic unit (ALU) is the part of a computer processor (CPU) that
carries out arithmetic and logic operations on the operands in computer instruction
words. In some processors, the ALU is divided into two units, an arithmetic unit
(AU) and a logic unit (LU). Some processors contain more than one AU - for
example, one for fixed-point operations and another for floating-point operations.
(In personal computers floating point operations are sometimes done by a floating
point unit on a separate chip called a numeric coprocessor.)
Typically, the ALU has direct input and output access to the processor controller,
main memory (random access memory or RAM in a personal computer), and
input/output devices. Inputs and outputs flow along an electronic path that is called
a bus. The input consists of an instruction word (sometimes called a machine
instruction word) that contains an operation code (sometimes called an "op code"),
one or more operands, and sometimes a format code. The operation code tells the
ALU what operation to perform and the operands are used in the operation. (For
example, two operands might be added together or compared logically.) The
format may be combined with the op code and tells, for example, whether this is a
fixed-point or a floating-point instruction. The output consists of a result that is
placed in a storage register and settings that indicate whether the operation was
performed successfully. (If it isn't, some sort of status will be stored in a permanent
place that is sometimes called the machine status word.)
In general, the ALU includes storage places for input operands, operands that are
being added, the accumulated result (stored in an accumulator), and shifted results.
The flow of bits and the operations performed on them in the subunits of the ALU
is controlled by gated circuits. The gates in these circuits are controlled by a
sequence logic unit that uses a particular algorithm or sequence for each operation
code. In the arithmetic unit, multiplication and division are done by a series of
adding or subtracting and shifting operations. There are several ways to represent
negative numbers. In the logic unit, one of 16 possible logic operations can be
performed - such as comparing two operands and identifying where bits don't
match.
The design of the ALU is obviously a critical part of the processor and new
approaches to speeding up instruction handling are continually being developed.
In computing, an arithmetic logic unit (ALU) is a digital circuit that performs
arithmetic and logical operations. The ALU is a fundamental building block of the
central processing unit (CPU) of a computer, and even the simplest
microprocessors contain one for purposes such as maintaining timers. The
processors found inside modern CPUs and graphics processing units (GPUs)
accommodate very powerful and very complex ALUs; a single component may
contain a number of ALUs.

BASIC BUILDING BLOCKS OF AN ALU


CONTROL UNIT
The control unit maintains order within the computer system and directs the flow of traffic (operations) and data. The flow of control is indicated by the dotted arrows on figure 1-1. The control unit selects one program statement at a time from the program storage area, interprets the statement, and sends the appropriate electronic impulses to the arithmetic-logic unit and storage section to cause them to carry out the instruction. The control unit does not perform the actual processing operations on the data. Specifically, the control unit manages the operations of the CPU, be it a single-chip microprocessor or a full-size mainframe. Like a traffic director, it decides when to start and stop (control and timing), what to do (program instructions), where to keep information (memory), and with what devices to communicate (I/O). It controls the flow of all data entering and leaving the computer. It accomplishes this by communicating or interfacing with the arithmetic-logic unit, memory, and I/O areas. It provides the computer with the ability to function under program control. Depending on the design of the computer, the CPU can also have the capability to function under manual control through man/machine interfacing.

The control unit consists of several basic logically defined areas. These logically defined areas work closely with each other. Timing in a computer regulates the flow of signals that control the operation of the computer. The instruction and control portion makes up the decision-making and memory-type functions. Addressing is the process of locating the operand (specific information) for a given operation. An interrupt is a break in the normal flow of operation of a computer (e.g., CTRL + ALT + DEL). Control memory is a random-access memory (RAM) consisting of addressable storage registers. Cache memory is a small, high-speed RAM buffer located between the CPU and main memory; it can increase the speed of the PC. Read-only memory (ROM) chips have a set of software instructions, supplied by the manufacturer, built into them that enables the computer to perform its I/O operations. The control unit is also capable of shutting down the computer when the power supply detects abnormal conditions.
ARITHMETIC-LOGIC UNIT
The arithmetic-logic unit (ALU) performs all arithmetic operations (addition, subtraction, multiplication, and division) and logic operations. Logic operations test various conditions encountered during processing and allow for different actions to be taken based on the results. The data required to perform the arithmetic and logical functions are inputs from the designated CPU registers and operands. The ALU relies on basic items to perform its operations. These include number systems, data routing circuits (adders/subtracters), timing, instructions, operands, and registers. Figure 1-2 shows a representative block diagram of an ALU of a microcomputer.

PRIMARY STORAGE (MAIN MEMORY)
The primary storage section (also called internal storage, main storage, main memory, or just memory) serves four purposes, including the following:
1. To hold data transferred from an I/O device to the input storage area, where it remains until the computer is ready to process it. This is indicated by the solid arrow on figure 1-1.
2. To hold both the data being processed and the intermediate results of the arithmetic-logic operations. This is a working storage area within the storage section. It is sometimes referred to as a scratch pad memory.
3. To hold the processing results in an output storage area for transfer to an I/O device.

CPU BUILDING BLOCKS

Registers (IR, PC, ACC)
Control Unit (CU)
Arithmetic Logic Unit (ALU)

ARITHMETIC LOGIC UNIT STRUCTURES

ARITHMETIC LOGIC UNIT SCHEMATIC SYMBOL

A and B: the inputs to the ALU (also known as operands)
R: Output or result
F: Code or instruction from the Control Unit (also known as the op-code)
D: Output status; it indicates cases such as carry-in, carry-out, overflow, division-by-zero, and so on.

Numerical systems

An ALU must process numbers using the same format as the rest of the digital
circuit. The format of modern processors is almost always the two's complement
binary number representation. Early computers used a wide variety of number
systems, including one's complement, sign-magnitude format, and even true
decimal systems, with ten tubes per digit.
ALUs for each one of these numeric systems had different designs, and that
influenced the current preference for two's complement, as this is the
representation that makes it easier for the ALUs to calculate additions and
subtractions.
The two's-complement number system allows for subtraction to be accomplished
by adding the negative of a number in a very simple way which negates the need
for specialized circuits to do subtraction.
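
As a small worked illustration of this point, the sketch below computes A - B as A + (not B) + 1 using only an adder. The entity name SUB8 and the 8-bit width are assumptions; the arithmetic relies on the unsigned type and the "+" operator from the ieee.numeric_std package.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical 8-bit subtractor built from an adder.
entity SUB8 is
  port (
    A, B : in  std_logic_vector(7 downto 0);
    R    : out std_logic_vector(7 downto 0)
  );
end entity SUB8;

architecture DATAFLOW of SUB8 is
begin
  -- Two's-complement negation of B is (not B) + 1, so A - B = A + (not B) + 1.
  -- The sum wraps modulo 2**8; the carry out of bit 7 is discarded.
  R <= std_logic_vector(unsigned(A) + unsigned(not B) + 1);
end architecture DATAFLOW;
```

For example, with A = 5 and B = 3, not B is 252, and 5 + 252 + 1 = 258, which wraps to 2 in eight bits.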
Practical overview
Most of a processor's operations are performed by one or more ALUs. An ALU loads data from input registers; an external Control Unit then tells the ALU what operation to perform on that data, and the ALU stores its result into an output register. Other mechanisms move data between these registers and memory.

Simple operation
Figure: A simple example arithmetic logic unit (2-bit ALU) that does AND, OR, XOR, and addition

Most ALUs can perform the following operations:

1. Integer arithmetic operations (addition, subtraction, and sometimes multiplication and division, though this is more expensive)
2. Bitwise logic operations (AND, NOT, OR, XOR)
3. Bit-shifting operations (shifting or rotating a word by a specified number of bits to the left or right, with or without sign extension). Shifts can be interpreted as multiplications by 2 and divisions by 2.
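
The relation between shifts and multiplication or division by 2 can be sketched with the shift_left and shift_right functions of the ieee.numeric_std package. The entity name SHIFT_DEMO and its port names are hypothetical, and overflow out of the top bit is ignored in this sketch.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical demonstration of one-bit shifts on an 8-bit word.
entity SHIFT_DEMO is
  port (
    X       : in  std_logic_vector(7 downto 0);
    TIMES_2 : out std_logic_vector(7 downto 0);
    DIV_2_U : out std_logic_vector(7 downto 0);
    DIV_2_S : out std_logic_vector(7 downto 0)
  );
end entity SHIFT_DEMO;

architecture DATAFLOW of SHIFT_DEMO is
begin
  -- Left shift by one: multiplication by 2 (the top bit is lost).
  TIMES_2 <= std_logic_vector(shift_left(unsigned(X), 1));
  -- Logical right shift: unsigned division by 2, fills with '0'.
  DIV_2_U <= std_logic_vector(shift_right(unsigned(X), 1));
  -- Arithmetic right shift: the sign bit is replicated (sign extension).
  DIV_2_S <= std_logic_vector(shift_right(signed(X), 1));
end architecture DATAFLOW;
```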

Complex operations
Engineers can design an Arithmetic Logic Unit to calculate any operation. The more complex the operation, the more expensive the ALU is, the more space it uses in the processor, and the more power it dissipates. Therefore, engineers compromise. They make the ALU powerful enough to make the processor fast, yet not so complex as to become prohibitive. For example, computing the square root of a number might use:
1. Calculation in a single clock: Design an extraordinarily complex ALU that calculates the square root of any number in a single step.
2. Calculation pipeline: Design a very complex ALU that calculates the square root of any number in several steps. The intermediate results go through a series of circuits arranged like a factory production line. The ALU can accept new numbers to calculate even before having finished the previous ones. The ALU can now produce numbers as fast as a single-clock ALU, although the results start to flow out of the ALU only after an initial delay.
3. Iterative calculation: Design a complex ALU that calculates the square root through several steps. This usually relies on control from a complex control unit with built-in microcode.
4. Co-processor: Design a simple ALU in the processor, and sell a separate specialized and costly processor that the customer can install just beside this one and that implements one of the options above.
5. Software libraries: Tell the programmers that there is no co-processor and there is no emulation, so they will have to write their own algorithms to calculate square roots in software.
6. Software emulation: Emulate the existence of the co-processor; that is, whenever a program attempts to perform the square root calculation, make the processor check if there is a co-processor present and use it if there is one; if there isn't one, interrupt the processing of the program and invoke the operating system to perform the square root calculation through some software algorithm.
The options above go from the fastest and most expensive to the slowest and
least expensive. Therefore, while even the simplest computer can calculate the
most complicated formula, it will usually take a long time to do so because of
the many steps the calculation requires.
Powerful processors such as the Intel Core family and AMD64 processors implement
option #1 for several simple operations, #2 for the most common complex
operations, and #3 for the extremely complex operations.
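To make the multi-step options more concrete, the sketch below computes an integer square root one result bit per step, which is roughly the kind of loop an iterative design (option #3) or a software library (option #5) might use. It is an assumption-laden illustration: the function name, the 16-bit default width, and the use of a multiplication inside each step are choices made for readability, not a description of any particular processor.

```python
def isqrt_iterative(n: int, width: int = 16) -> int:
    """Floor of sqrt(n) for 0 <= n < 2**width, found one result bit per step."""
    root = 0
    bit = 1 << (width // 2 - 1)    # highest bit the root can possibly have
    while bit:
        trial = root | bit         # tentatively set the next bit of the root
        if trial * trial <= n:     # keep it only if the square still fits
            root = trial
        bit >>= 1                  # move on to the next lower bit
    return root


print(isqrt_iterative(144))    # 12
print(isqrt_iterative(2))      # 1
print(isqrt_iterative(65535))  # 255
```

A 16-bit input needs eight such steps, which shows why the iterative approach is cheaper in hardware but slower than a single-clock design.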
Inputs and outputs

The inputs to the ALU are the data to be operated on (called operands) and a code
from the control unit indicating which operation to perform. Its output is the result
of the computation.
In many designs the ALU also exchanges a set of condition codes with a status
register, reading some of them as inputs and writing others as outputs. These
codes indicate conditions such as carry-in or carry-out, overflow, and
divide-by-zero.
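A minimal sketch of how an addition might produce such condition codes is shown below. The flag names (carry, zero, negative, overflow), the 4-bit width, and the function alu_add are assumptions for illustration; actual status registers differ between processors.

```python
from typing import NamedTuple, Tuple

WIDTH = 4                       # assumed word size for the illustration
MASK = (1 << WIDTH) - 1
SIGN = 1 << (WIDTH - 1)         # position of the sign bit


class Flags(NamedTuple):
    carry: bool      # carry out of the most significant bit
    zero: bool       # result is all zeros
    negative: bool   # sign bit of the result is set
    overflow: bool   # signed result does not fit in WIDTH bits


def alu_add(a: int, b: int, carry_in: int = 0) -> Tuple[int, Flags]:
    """Add two WIDTH-bit operands plus a carry-in and report condition codes."""
    a &= MASK
    b &= MASK
    total = a + b + carry_in
    result = total & MASK
    # Signed overflow: both operands share a sign that the result does not.
    overflow = bool(~(a ^ b) & (a ^ result) & SIGN)
    flags = Flags(total > MASK, result == 0, bool(result & SIGN), overflow)
    return result, flags


print(alu_add(0b0111, 0b0001))  # 7 + 1: signed overflow, no carry out
print(alu_add(0b1111, 0b0001))  # 15 + 1: carry out, result wraps to zero
```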
ALUs vs. FPUs
A Floating Point Unit (FPU) also performs arithmetic operations between two
values, but it does so for numbers in floating point representation, which is
much more complicated than the two's complement representation used in a typical
ALU. To do these calculations, an FPU has several complex circuits built in,
including some internal ALUs.
In modern practice, engineers typically refer to the ALU as the circuit that
performs arithmetic on integer formats (such as two's complement and BCD).
Circuits that calculate with more complex formats such as floating point or
complex numbers usually receive a more specific name, such as FPU.
Logic Gates

Before we get to the overall design of an ALU, we first have to understand the
basics of logic gates. Figure 1 shows the basic logic gates in their graphical
representations. Keep in mind that each of these can be made by combining
transistors in different ways. The types of transistors used and how they are
arranged affect the performance of the gate.

Figure 1: Basic logic gates: 'AND', 'OR', 'XOR', 'NOT', 'NAND', 'NOR', and
'XNOR' (courtesy of wikipedia.com)

These logic gates work by taking two inputs (one input for the 'NOT' gate) and
producing an output. For the 'AND' gate, the output is true, or '1' (a high
voltage), only if input #1 and input #2 are both true; it is false, or '0' (a
low voltage), if one or both inputs are false. Likewise, the 'OR' gate's output
is true if input #1 or input #2 (or both) is true. The 'XOR' gate's output is
true if exactly one input is true and false if both inputs are true or both are
false; this implements the exclusive 'OR' logic operation. The 'NOT' gate
outputs the opposite of its input, so if the input is true the output is false.
The 'NAND', 'NOR', and 'XNOR' gates are the 'AND', 'OR', and 'XOR' gates
respectively with a 'NOT' gate placed just before the output, so a 'NAND' gate
returns the inverse of what an 'AND' gate returns.
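The behaviour described above can be checked with a small truth-table generator. This is a software illustration only; the function names simply mirror the gate names, and each input is assumed to be the integer 0 or 1.

```python
def AND(a: int, b: int) -> int:
    return a & b


def OR(a: int, b: int) -> int:
    return a | b


def XOR(a: int, b: int) -> int:
    return a ^ b


def NOT(a: int) -> int:
    return a ^ 1          # flips a single bit: 0 -> 1, 1 -> 0


def NAND(a: int, b: int) -> int:
    return NOT(AND(a, b))


def NOR(a: int, b: int) -> int:
    return NOT(OR(a, b))


def XNOR(a: int, b: int) -> int:
    return NOT(XOR(a, b))


# Print the truth table for every two-input gate.
print("A B  AND OR XOR NAND NOR XNOR")
for a in (0, 1):
    for b in (0, 1):
        print(a, b, "", AND(a, b), " ", OR(a, b), " ", XOR(a, b), " ",
              NAND(a, b), "  ", NOR(a, b), "  ", XNOR(a, b))
```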
These logic functions are by themselves an important part of a CPU's
functionality, but logic operations on only two inputs are of limited use. By
combining these gates we can build devices with more inputs. For example, in
Figure 2 I have combined three 'AND' gates. These three 'AND' gates produce an
output that is true only when all four inputs are true. In essence, this is a
4-input 'AND' gate (labeled a 4 bit 'AND' device in Figure 2). You can
extrapolate from this and form an 8-input 'AND' gate by combining two 4-input
'AND's and one 2-input 'AND'.

Figure 2: A 4 bit 'AND' device
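The composition can be written out directly. The names and2, and4, and and8 are hypothetical, chosen only to mirror the number of inputs; the structure follows the description of three 2-input gates forming a 4-input gate, and two 4-input gates plus one 2-input gate forming an 8-input gate.

```python
def and2(a: int, b: int) -> int:
    """A single 2-input AND gate."""
    return a & b


def and4(a: int, b: int, c: int, d: int) -> int:
    """Three 2-input gates: two combine the input pairs, one combines their outputs."""
    return and2(and2(a, b), and2(c, d))


def and8(bits) -> int:
    """Two 4-input ANDs feeding one 2-input AND; bits is a sequence of eight 0/1 values."""
    low = and4(*bits[:4])
    high = and4(*bits[4:])
    return and2(low, high)


print(and4(1, 1, 1, 1))                    # 1: all four inputs are true
print(and4(1, 1, 0, 1))                    # 0: one input is false
print(and8([1, 1, 1, 1, 1, 1, 1, 0]))      # 0: one of eight inputs is false
```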

Arithmetic

By combining these gates in more clever configurations we can perform other
useful functions, such as addition. Figure 3 shows a typical configuration
referred to as a half-adder. To understand how this adder works we have to think
of the inputs not as true or false but as '1' or '0'. The output of this adder
is the sum of the inputs together with a carry bit. If the inputs are '1' and
'1' we are adding 1 plus 1. The output labeled 'SUM' is just an 'XOR' of the
inputs, which will be '0'. The output labeled 'CARRY' is an 'AND' of the inputs,
which will be '1'. The result is therefore binary 10 (decimal 2), the sum of '1'
and '1'. If the inputs are '1' and '0', the 'SUM' will be '1' and the 'CARRY'
will be '0', giving an answer of 01, or just 1.

Figure 3: A half-adder
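A half-adder is small enough to model in two operations. The sketch below is illustrative only, and the function name half_adder is an assumption; it returns the pair (SUM, CARRY) described above.

```python
def half_adder(a: int, b: int):
    """Add two 1-bit inputs; returns (sum, carry)."""
    total = a ^ b      # SUM is the XOR of the inputs
    carry = a & b      # CARRY is the AND of the inputs
    return total, carry


# Print all four input combinations.
for a in (0, 1):
    for b in (0, 1):
        s, c = half_adder(a, b)
        print(f"{a} + {b} -> carry={c} sum={s}")
```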
So, if this is performing binary addition, why is it called a half-adder? The
reason is that, to add binary numbers more than one bit wide, the adder at each
bit position must accept a carry bit along with the two input bits. This
full-adder is shown in Figure 4. You can see that the full-adder is two
half-adders with one additional 'OR' gate. To use a full-adder to add two binary
numbers of arbitrary size, begin with the rightmost bit of each number, called
the least significant bit (LSB), with a carry-in bit of '0'. Add the two bits,
record the sum, and use the carry-out bit as the carry-in bit when adding the
next two bits, moving towards the most significant bits (MSB). By repeating this
process you can add two binary numbers of any length. This process is known as a
ripple carry.

Figure 4: A full-adder
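The construction of a full-adder from two half-adders and an 'OR' gate, and the ripple-carry procedure, can be sketched as follows. The function names and the default 4-bit width are assumptions for the example.

```python
def half_adder(a: int, b: int):
    """Returns (sum, carry) for two 1-bit inputs."""
    return a ^ b, a & b


def full_adder(a: int, b: int, carry_in: int):
    """Two half-adders plus an OR gate; returns (sum, carry_out)."""
    s1, c1 = half_adder(a, b)
    s2, c2 = half_adder(s1, carry_in)
    return s2, c1 | c2


def ripple_carry_add(a: int, b: int, width: int = 4):
    """Add two width-bit numbers bit by bit, starting at the LSB."""
    carry = 0                       # carry-in of the LSB is 0
    result = 0
    for i in range(width):
        bit_a = (a >> i) & 1
        bit_b = (b >> i) & 1
        s, carry = full_adder(bit_a, bit_b, carry)
        result |= s << i            # record the sum bit at position i
    return result, carry            # final carry is the carry-out of the MSB


print(ripple_carry_add(0b0110, 0b0011))  # (9, 0): 6 + 3
print(ripple_carry_add(0b1111, 0b0001))  # (0, 1): 15 + 1 wraps with carry out
```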
Figure 5 shows a half-subtractor (half-sub). In this scenario, if input 'A' is
'1' and input 'B' is '0', we want to subtract 0 from 1. You can see that the
'DIFF' output will be '1' and the 'BORROW' output will be '0'. Like the
half-adder, the half-sub can be used to implement a full-subtractor (full-sub),
shown in Figure 6.

Figure 5: A half-sub

Figure 6: A full-sub
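A software model of the two subtractors is sketched below. It assumes the usual gate-level definitions (DIFF = A XOR B, BORROW = (NOT A) AND B), which the figures are not available here to confirm; the function names are likewise hypothetical.

```python
def half_subtractor(a: int, b: int):
    """Compute a - b for 1-bit inputs; returns (diff, borrow)."""
    diff = a ^ b
    borrow = (a ^ 1) & b        # a borrow is needed only when a = 0 and b = 1
    return diff, borrow


def full_subtractor(a: int, b: int, borrow_in: int):
    """Two half-subtractors plus an OR gate; returns (diff, borrow_out)."""
    d1, b1 = half_subtractor(a, b)
    d2, b2 = half_subtractor(d1, borrow_in)
    return d2, b1 | b2


def ripple_borrow_subtract(a: int, b: int, width: int = 4):
    """Subtract b from a bit by bit, starting at the LSB."""
    borrow = 0
    result = 0
    for i in range(width):
        d, borrow = full_subtractor((a >> i) & 1, (b >> i) & 1, borrow)
        result |= d << i
    return result, borrow       # a final borrow of 1 means a < b


print(half_subtractor(1, 0))                    # (1, 0), as in the text
print(ripple_borrow_subtract(0b0101, 0b0011))   # (2, 0): 5 - 3
print(ripple_borrow_subtract(0b0011, 0b0101))   # (14, 1): 3 - 5 wraps around
```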
Arithmetic units are usually grouped together into an ALU that has inputs,
outputs, and control bits that tell the ALU which type of operation to perform.
Figure 7 shows a typical diagram of an ALU. In this diagram A and B are the data
inputs, F is the control input that chooses the function, R is the result of the
function applied to A and B, and D is the status of the output, so that you know
when the function is done.
Ripple carry addition is an effective method of adding binary numbers, but it
does not scale well. Let us extrapolate a little and imagine a 32-bit adder. To
add two 32-bit numbers we can start at the LSB and move left as we calculate the
carry bits. If each bit position adds a couple of gate delays while its carry is
computed, then the total time taken for the carry to ripple through all 32 bits
is significant. Thankfully this problem has already been solved. The solution:
a carry look ahead adder.
In a carry look ahead adder the binary numbers are split into sections, perhaps
of 4 bits each. Each section can begin calculating its carry bits at the same
time, beginning with the section's LSB and moving towards the section's MSB.
Once a carry bit reaches the MSB of its section, it can jump ahead 4 bits at a
time instead of continuing towards the number's MSB one bit at a time. The logic
involved in keeping all of the carries straight is quite complex and becomes
more complex as the size of the sections increases; larger sections therefore
spend more time calculating the carries inside each section and gain less from
the faster propagation of the carry bits between sections. However, if the
section size is too small, there are so many sections for the carries to
propagate through that not much time is saved versus the ripple carry adder.
Deciding upon a section size involves a detailed analysis of the gate and
propagation delays, which vary depending on the technology used within the logic
gates.
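One common way to formulate carry look ahead, not spelled out in the text above, uses a "generate" signal (g = a AND b) and a "propagate" signal (p = a XOR b) for every bit, so that each carry in a section is a direct function of the inputs and the section's carry-in. The sketch below models a 4-bit section and chains sections for wider words; the function names and the choice of chaining sections with a simple carry hand-off are assumptions for illustration.

```python
def cla_4bit(a: int, b: int, c0: int = 0):
    """4-bit carry look ahead section; returns (sum, carry_out)."""
    g = [(a >> i) & (b >> i) & 1 for i in range(4)]      # bit i generates a carry
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(4)]    # bit i propagates a carry
    # Every carry is written in terms of g, p and c0 only, never an earlier carry.
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    c4 = (g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1])
          | (p[3] & p[2] & p[1] & g[0]) | (p[3] & p[2] & p[1] & p[0] & c0))
    carries = [c0, c1, c2, c3]
    total = 0
    for i in range(4):
        total |= (p[i] ^ carries[i]) << i                # sum bit = p XOR carry-in
    return total, c4


def cla_add(a: int, b: int, width: int = 16):
    """Chain 4-bit sections; the carry hops from one section to the next."""
    carry = 0
    result = 0
    for base in range(0, width, 4):
        s, carry = cla_4bit((a >> base) & 0xF, (b >> base) & 0xF, carry)
        result |= s << base
    return result, carry


print(cla_4bit(0b1011, 0b0110))    # (1, 1): 11 + 6 = 17
print(cla_add(0x0FFF, 0x0001))     # (4096, 0)
```

The expanded expressions for c1 to c4 show the trade-off described above: each additional bit in a section adds more terms to the carry logic.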
The carry look ahead technique is one optimization that engineers can make to
the ALU. There are many others. For instance, the most direct way to multiply is
to add a number to itself over and over, but there are optimization techniques
that speed this process up. Such optimizations are some of the differences
between the ALUs of various processors and a major reason why some processors
are better at certain types of operations than others. For instance, a GPU will
have an ALU optimized for the arithmetic often performed for the display of
graphics, while a CPU will have an ALU optimized for the operations most
commonly performed by general-purpose programs.
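As one example of such an optimization, the sketch below compares repeated addition with the shift-and-add method, which needs one step per bit of the multiplier instead of one step per unit of its value. The text does not name a specific technique, so shift-and-add is an assumption chosen for illustration, as are the function names and the 8-bit multiplier width.

```python
def multiply_repeated_add(a: int, b: int):
    """Multiply by adding a to an accumulator b times; returns (product, steps)."""
    product = 0
    steps = 0
    for _ in range(b):
        product += a
        steps += 1
    return product, steps


def multiply_shift_add(a: int, b: int, width: int = 8):
    """Multiply by adding shifted copies of a, one step per bit of b."""
    product = 0
    steps = 0
    for i in range(width):
        if (b >> i) & 1:          # bit i of the multiplier is set
            product += a << i     # add a * 2**i
        steps += 1
    return product, steps


print(multiply_repeated_add(200, 250))  # (50000, 250)
print(multiply_shift_add(200, 250))     # (50000, 8)
```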

PROGRAM FOR ALU IN VHDL

-------------------------------------------------------------------------------
--
-- Title       : alu
-- Design      : arithmetic logic unit
-- Author      : NARESH DOBAL
-- Company     : NSD
--
-------------------------------------------------------------------------------
--
-- File        : alu.vhd
-- Generated   : Fri Nov 19 12:41:19 2010
-- From        : interface description file
-- By          : Itf2Vhdl ver. 1.20
--
-------------------------------------------------------------------------------
--
-- Description :
--
-------------------------------------------------------------------------------

--{{ Section below this comment is automatically maintained
--   and may be overwritten
--{entity {alu} architecture {alu_arc}}

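library IEEE;
use IEEE.STD_LOGIC_1164.all;
-- std_logic_arith and std_logic_unsigned are Synopsys packages, not part of the
-- IEEE standard; std_logic_unsigned supplies "+" and "-" on STD_LOGIC_VECTOR.
use ieee.std_logic_arith.all;
use ieee.std_logic_unsigned.all;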

entity alu is
    port(
        din1 : in  STD_LOGIC_VECTOR(3 downto 0);  -- first 4-bit operand
        din2 : in  STD_LOGIC_VECTOR(3 downto 0);  -- second 4-bit operand
        sel  : in  STD_LOGIC_VECTOR(3 downto 0);  -- operation select code
        dout : out STD_LOGIC_VECTOR(3 downto 0)   -- 4-bit result
    );
end alu;

--}} End of automatically maintained section

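architecture alu_arc of alu is
begin
    -- Combinational ALU: sel chooses which function of din1 and din2 drives dout.
    -- Arithmetic results wrap around modulo 16 because dout is 4 bits wide.
    with sel select
        dout <= din1 and din2  when "0000",  -- bitwise AND
                din1 or din2   when "0001",  -- bitwise OR
                din1 nand din2 when "0010",  -- bitwise NAND
                din1 nor din2  when "0011",  -- bitwise NOR
                din1 xor din2  when "0100",  -- bitwise XOR
                din1 xnor din2 when "0101",  -- bitwise XNOR
                not din1       when "0110",  -- invert din1
                not din2       when "0111",  -- invert din2
                din1 + "0001"  when "1000",  -- increment din1
                din2 + "0001"  when "1001",  -- increment din2
                din1 + din2    when "1010",  -- addition
                din1 - din2    when "1011",  -- subtraction
                din2 - "0010"  when "1100",  -- din2 minus 2
                din1 - "0010"  when "1101",  -- din1 minus 2
                din1 + "0010"  when "1110",  -- din1 plus 2
                din2 + "0010"  when others;  -- "1111" and any other value: din2 plus 2
end alu_arc;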
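When checking simulation waveforms for the design above, a small software reference model can be useful for predicting dout. The following Python sketch mirrors the select table of the VHDL code; it is an illustrative aid, not part of the original project, and the names OPERATIONS and alu are assumptions. Unlike the VHDL "when others" branch, it only handles the sixteen binary select codes, not undefined signal values.

```python
MASK = 0xF  # din1, din2, sel and dout are all 4 bits wide

# One entry per select code, in the same order as the VHDL selected assignment.
OPERATIONS = {
    0b0000: lambda a, b: a & b,       # and
    0b0001: lambda a, b: a | b,       # or
    0b0010: lambda a, b: ~(a & b),    # nand
    0b0011: lambda a, b: ~(a | b),    # nor
    0b0100: lambda a, b: a ^ b,       # xor
    0b0101: lambda a, b: ~(a ^ b),    # xnor
    0b0110: lambda a, b: ~a,          # not din1
    0b0111: lambda a, b: ~b,          # not din2
    0b1000: lambda a, b: a + 1,       # din1 + 1
    0b1001: lambda a, b: b + 1,       # din2 + 1
    0b1010: lambda a, b: a + b,       # din1 + din2
    0b1011: lambda a, b: a - b,       # din1 - din2
    0b1100: lambda a, b: b - 2,       # din2 - 2
    0b1101: lambda a, b: a - 2,       # din1 - 2
    0b1110: lambda a, b: a + 2,       # din1 + 2
    0b1111: lambda a, b: b + 2,       # din2 + 2
}


def alu(din1: int, din2: int, sel: int) -> int:
    """Return the 4-bit dout value for the given 4-bit inputs."""
    result = OPERATIONS[sel & MASK](din1 & MASK, din2 & MASK)
    return result & MASK              # truncate to 4 bits, as the hardware does


print(format(alu(0b1100, 0b1010, 0b0000), "04b"))  # and  -> 1000
print(format(alu(0b1100, 0b1010, 0b0010), "04b"))  # nand -> 0111
print(format(alu(0b0011, 0b0101, 0b1011), "04b"))  # 3 - 5 wraps to 1110
```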

OUTPUT WAVEFORM OF ALU

CIRCUIT DIAGRAM FOR ABOVE CODE GENERATED BY SYNTHESIS TOOL
(RTL LAYOUT DESIGN)
