International Journal of Intelligent Information Technology Application, 2009, 2(6):273-278

Algorithm & Design of an Efficient Floating Point ADD/SUB Unit for an Experimental CPU

A.Joshi
The University of the West Indies
Department of Electrical and Computer Engg
St. Augustine, Trinidad and Tobago
ajoshi@eng.uwi.tt

S.L. Lam and Y.Y. Chan

Multimedia University, Faculty of Engg.
Cyberjaya, Malaysia.

1999-2459 /09/$25.00 ©2009 Engineering Technology Press

Abstract— An 8-bit CPU is designed at gate level from scratch using a
custom chip approach. The CPU has an 8-bit integer unit and a 16-bit
floating-point unit. The instruction set includes shift, logic, integer
and floating-point arithmetic instructions. The circuits are optimized
by using a more efficient algorithm. The algorithm discussed in this
paper was applied to an 8-bit CPU design; however, there is no reason
it could not be used for more powerful and serious CPU development.
Currently no attempt has been made to include any special support or
design for parallel MUL/ADD/SUB operations [1][2]. An attempt has been
made to improve the conventional [6] algorithm. This paper discusses
the design of the FP ADD/SUB unit, with respect to algorithm and VHDL
implementation, as all the functional units cannot be discussed in
this paper.
The project was implemented using VHDL and simulated using Altera
MaxPlus II simulation software, which can map the design into an
Altera CPLD.

Index Terms—CPU, simulation, algorithm, floating point unit, VHDL.

I. INTRODUCTION

This paper focuses on one functional unit of the 16-bit FPU, which is
part of a CPU with an 8-bit integer unit. The CPU has 4x16-bit FPU
registers, 16-bit data and address buses and a 16-bit program counter.
The data path is where most operations are carried out, under the
direction of the processor's control unit. There are seven functional
units, three of which belong to the FPU. This paper discusses one
functional unit, the floating-point adder. The logic of the algorithm
is discussed in detail, and the implementation block diagram is
presented along with the VHDL code and simulation results. The goal is
not to address any issues of clock rate or IPC [3]. The main focus is
on the improved algorithm and the relevant design.

A. 3 Functional units:

Barrel shifter:
This unit has been designed to use five control signals: one enable
signal, one shift-direction (left/right) signal and three signals that
determine the number of shifts. This unit accepts one operand from the
ALU registers.

FP add/subtractor:
Its operands, FA and FB, are sourced from the floating-point register
file. It requires one control signal to indicate the start and another
control signal to decide whether addition or subtraction is to be
performed. The result is 16 bits wide.

FP multiplier:
As with the FP add unit, the operands are sourced from the
floating-point register file, but only one control signal is required,
to indicate the start.

B. Specifications of Floating-point add/sub unit:

Specifications in short are 2x 16-bit FP registers for operands, 1x
16-bit FP register for the final result, 1x 8-bit register counter, 1x
adder, 1x 4to1 selector, 1x 2to1 selector, 6x 8-bit registers, 6x 2to1
multiplexers, 6x 2to1 multiplexers, 3x 3to1 multiplexers, left barrel
shifter, right barrel shifter, 1 zero counter, 1x 7-bit register and
an output signal logic. All simulations of VHDL code were done using
device family MAX7000 in Altera Max-plus II. It is impossible to
discuss every unit in this paper; only the floating-point add unit is
discussed. The logic of the algorithm is discussed in detail, and an
implementation block diagram of the add unit is shown with the
detailed design of the 16-bit floating-point register. It is worth
mentioning that excellent work has been done on improving FP
arithmetic [7][8].

C. The Algorithm:

- Initially, the operands are loaded into 2 temporary registers.
- The 2 biased exponents are compared.
- The difference is stored.
- The mantissa with the smaller biased exponent is shifted by the
difference.
- Then it is subtracted from the other mantissa and the result is
stored.
- Round up or down.
- The result is normalized and stored.

There are some exceptions to this algorithm:
- If the difference between the two biased exponents is greater than
7, which is the length of the mantissa, then the operand with the
higher exponent value is stored without going through the following
steps.
- If the result of subtracting the mantissas is zero, then zero is
stored as the result and normalization is skipped.

The conventional algorithm uses the following steps:
- Zero checking.
- Significand adjustment.
- Addition/Subtraction.
- Normalization.
- Rounding.

D. Rationale:

With respect to the above algorithm, we have a slightly different
method of obtaining the result. The differences are as follows:

- There is no zero checking on operands in this method. Since operands
with zero value do not occur very often, we expect no significant
degradation in the performance of floating-point calculations.
Further, one clock cycle is saved for every FADD/SUB instruction with
non-zero operands, and fewer gates are used.

- Our method does not compare the exponents and test the significand
for zero on every clock cycle in order to make the exponents equal.
Instead, we find the difference between the two exponents and store
the difference (which is positive). The larger exponent value is
stored, and the significand with the smaller exponent is shifted by
the difference using a barrel shifter (with the exception that the
difference must not be larger than 7).

- The significand is not checked for zero after adding the signed
significands. Since most additions do not produce zero, we feel that
it is not necessary to introduce an extra cycle just to check this.

- If significand overflow occurs after adding both significands,
exponent overflow is not checked immediately by this algorithm. The
maximum biased exponent value that can be stored is
1111111 - 1 = 1111110; the value 1111111 indicates an overflow. In the
worst case, the maximum value of the biased exponent after being
incremented is 1111111. However, since the result is normalized later,
which can decrement the biased exponent back into the permissible
range, we check this after normalization and rounding.

- Round to nearest: the representable significand value nearest to the
result is stored. If the result is exactly halfway between two
representable values, the current least significant bit determines
whether to round up or down, so as to force the result to be even. For
example, if the current least significant bit is zero, the result is
rounded down, and if it is one, the result is rounded up.

The algorithm also handles exceptions such as exponent
underflow/overflow and significand overflow. The algorithm is faster,
as it takes a maximum of 8 clock cycles to complete, compared with the
conventional algorithm, which takes 13 clock cycles. There are
different ways to improve performance [4][5]; our approach is
different. There is a small drawback: this algorithm requires more
components, and as a result the block diagram may look complicated and
confusing. However, this is far outweighed by the benefit.

Overall Block diagram Figure 2. can be found on page 5 of the paper.

II. DESIGN: ADD/SUB UNIT

We have tried quite a few different designs, such as the Carry
Look-ahead adder (CLA) and the Ripple Carry adder (RCA). The CLA
provided good speed, but its size and power consumption were much
larger than those of the RCA. The RCA, on the other hand, is compact
but slower than the CLA. Hence we decided to design hybrid adders to
take advantage of both. We have tried quite a few different designs
for hybrid adders as well, and here we discuss the type 1 hybrid adder
(HA-1). HA-1 is very fast but high on power consumption; it is used
for FADD/FSUB.

A. VHDL Code and Simulation

Code for CLA (3) with normal carry input

LIBRARY ieee;
USE ieee.std_logic_1164.all;

ENTITY add_cla3_n IS
PORT ( a0, a1, a2 : IN STD_LOGIC;
       b0, b1, b2 : IN STD_LOGIC;
       ci         : IN STD_LOGIC;
       o0, o1, o2 : OUT STD_LOGIC;
       co         : OUT STD_LOGIC);
END ENTITY;

ARCHITECTURE a OF add_cla3_n IS
SIGNAL g0, g1, g2 : STD_LOGIC; -- intermediate signals for G (generate)
SIGNAL p0, p1, p2 : STD_LOGIC; -- intermediate signals for P (propagate)
SIGNAL c1, c2, c3 : STD_LOGIC; -- intermediate signals for carry out

BEGIN
g0 <= a0 AND b0; g1 <= a1 AND b1; g2 <= a2 AND b2;
p0 <= a0 OR b0; p1 <= a1 OR b1; p2 <= a2 OR b2;

c1 <= g0 OR (p0 AND ci); -- carry generation for bit 1

c2 <= g1 OR (p1 AND c1); -- carry generation for bit 2
c3 <= g2 OR (p2 AND c2); -- carry generation for carry out

o0 <= (a0 XOR b0) XOR ci; -- sum output bit0
o1 <= (a1 XOR b1) XOR c1; -- sum output bit1
o2 <= (a2 XOR b2) XOR c2; -- sum output bit2
co <= c3; -- carry output
END a;

Code for Hybrid Adder 1:

LIBRARY ieee;
USE ieee.std_logic_1164.all;

ENTITY add_ha1 IS
PORT ( a, b : IN STD_LOGIC_VECTOR (0 to 7);
       ci   : IN STD_LOGIC;
       o    : OUT STD_LOGIC_VECTOR (0 to 7);
       co   : OUT STD_LOGIC);
END ENTITY;

ARCHITECTURE a OF add_ha1 IS
COMPONENT add_full1 IS -- declare full adder1
PORT ( a, b, ci : IN STD_LOGIC;
       o, co    : OUT STD_LOGIC); END COMPONENT;

COMPONENT add_full2 IS -- declare full adder2
PORT ( a, b, ci : IN STD_LOGIC;
       o, co    : OUT STD_LOGIC); END COMPONENT;

COMPONENT add_cla3_0 IS -- declare cla3 with '0' carry in
PORT ( a0, a1, a2 : IN STD_LOGIC;
       b0, b1, b2 : IN STD_LOGIC;
       o0, o1, o2 : OUT STD_LOGIC;
       co         : OUT STD_LOGIC); END COMPONENT;

COMPONENT add_cla3_1 IS -- declare cla3 with '1' carry in
PORT ( a0, a1, a2 : IN STD_LOGIC;
       b0, b1, b2 : IN STD_LOGIC;
       o0, o1, o2 : OUT STD_LOGIC;
       co         : OUT STD_LOGIC); END COMPONENT;

COMPONENT add_cla3_n IS -- declare cla3 with normal carry in
PORT ( a0, a1, a2 : IN STD_LOGIC;
       b0, b1, b2 : IN STD_LOGIC;
       ci         : IN STD_LOGIC;
       o0, o1, o2 : OUT STD_LOGIC;
       co         : OUT STD_LOGIC); END COMPONENT;

SIGNAL x0,x1,x2,x3,x4,x5,x6,x7 : STD_LOGIC; -- intermediate signals for XOR2
SIGNAL s0,s1,s2,s3,s40,s41 : STD_LOGIC; -- intermediate signals for sum
SIGNAL s50,s51,s60,s61,s70,s71 : STD_LOGIC; -- intermediate signals for sum
SIGNAL c1,c4,c4n,c70,c71,c80,c81 : STD_LOGIC; -- intermediate signals for carry

BEGIN
-- XOR2 gates at the B input for ADD/SUB function
x0 <= b(0) XOR ci; x1 <= b(1) XOR ci; x2 <= b(2) XOR ci;
x3 <= b(3) XOR ci; x4 <= b(4) XOR ci; x5 <= b(5) XOR ci;
x6 <= b(6) XOR ci; x7 <= b(7) XOR ci;

-- connecting the different adders together in the way shown in
-- the logic circuit diagram of HA1.
g0 : add_full1 PORT MAP (a(0),x0,ci,s0,c1);
g1 : add_cla3_n PORT MAP (a(1),a(2),a(3),x1,x2,x3,c1,s1,s2,s3,c4);
g2 : add_cla3_0 PORT MAP (a(4),a(5),a(6),x4,x5,x6,s40,s50,s60,c70);
g3 : add_full2 PORT MAP (a(7),x7,c70,s70,c80);
g4 : add_cla3_1 PORT MAP (a(4),a(5),a(6),x4,x5,x6,s41,s51,s61,c71);
g5 : add_full2 PORT MAP (a(7),x7,c71,s71,c81);

-- The inverted signal from carry out C4.
c4n <= NOT c4;

o(0) <= s0; -- sum bit0 (from full adder1)
o(1) <= s1; -- sum bit1 (From CLA3)
o(2) <= s2; -- sum bit2 (From CLA3)
o(3) <= s3; -- sum bit3 (From CLA3)

o(4) <= (s40 AND c4n) OR (s41 AND c4); -- sum bit4 (From CSA)
o(5) <= (s50 AND c4n) OR (s51 AND c4); -- sum bit5 (From CSA)
o(6) <= (s60 AND c4n) OR (s61 AND c4); -- sum bit6 (From CSA)
o(7) <= (s70 AND c4n) OR (s71 AND c4); -- sum bit7 (From CSA)

co <= (c80 AND c4n) OR (c81 AND c4); -- carry out
END a;

The logic diagram Figure 3. gives an idea of the circuit of HA-1.
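
The structure of HA-1 can be cross-checked with a small behavioural model. The following Python sketch mirrors the instances g0 to g5 and the carry-select multiplexers on bits 4 to 7. It is only an illustration: the internals of add_full1, add_full2, add_cla3_0 and add_cla3_1 are not listed in this paper, so they are assumed here to be ordinary full adders and CLA blocks with a constant carry input, and the function names are hypothetical.

```python
def full_adder(a, b, cin):
    """One-bit full adder: returns (sum, carry_out)."""
    return a ^ b ^ cin, (a & b) | (cin & (a ^ b))


def cla3(a_bits, b_bits, cin):
    """3-bit carry look-ahead block using the g/p equations of add_cla3_n."""
    g = [a & b for a, b in zip(a_bits, b_bits)]  # generate
    p = [a | b for a, b in zip(a_bits, b_bits)]  # propagate
    carries = [cin]
    for i in range(3):
        carries.append(g[i] | (p[i] & carries[i]))
    sums = [a_bits[i] ^ b_bits[i] ^ carries[i] for i in range(3)]
    return sums, carries[3]


def ha1(a, b, ci):
    """8-bit HA-1 model. ci = 0 adds, ci = 1 subtracts (a + NOT b + 1)."""
    a_bits = [(a >> i) & 1 for i in range(8)]      # index 0 is the LSB
    x = [((b >> i) & 1) ^ ci for i in range(8)]    # XOR2 gates on the B input
    s0, c1 = full_adder(a_bits[0], x[0], ci)       # g0
    s_low, c4 = cla3(a_bits[1:4], x[1:4], c1)      # g1
    s_up0, c70 = cla3(a_bits[4:7], x[4:7], 0)      # g2: assumes carry in = 0
    s70, c80 = full_adder(a_bits[7], x[7], c70)    # g3
    s_up1, c71 = cla3(a_bits[4:7], x[4:7], 1)      # g4: assumes carry in = 1
    s71, c81 = full_adder(a_bits[7], x[7], c71)    # g5
    # Carry select: c4 picks which precomputed upper half is used.
    upper = s_up1 + [s71] if c4 else s_up0 + [s70]
    co = c81 if c4 else c80
    bits = [s0] + s_low + upper
    return sum(bit << i for i, bit in enumerate(bits)), co
```

Because both versions of the upper four bits are computed in parallel with the lower four, the only delay added after c4 settles is the final selection, which is the motivation for combining CLA and carry-select blocks.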

Figure 3. HA-1 Logic diagram

B. Simulation results:

As seen in the Simulation diagram Figure 1. on page 4 of this paper:

A=01100100B = 100D
B=00110010B = 50D

- The results of the addition simulation are correct: when control is
'0', the add function is selected. So, A+B = 150D = 10010110B. The
output result as seen in the diagram shows 10010110, which confirms
proper operation. Co = '0' indicates that the result does not
overflow.
- The results of the subtraction simulation are correct: when control
is '1', the sub function is selected. So, A-B = 50D = 00110010B. The
output result (o0 to o7) as seen in the diagram matches this value.
Co = '1' indicates that the result is not a negative number after
subtraction.
- The MAX7000 CPLD device used in this simulation has 8.1 ns of output
delay.
- Glitches appeared during the simulation of the sum output (from
200 ns to 208.1 ns). This means that a CPLD is not suitable for an
HA-1 type of implementation, though the results were good.

III. CONCLUSION

The results obtained are very encouraging. We successfully optimized
the design chosen from many versions of HA. HA-1 is not perfect yet:
although it is very fast, it consumes a lot of power and its size is
still a bit large. A more streamlined design can be implemented with
some effort.
As mentioned earlier, in this paper we have discussed the design of
just the Hybrid Adder (FP ADD/SUB unit) along with the VHDL code. We
have managed to simulate almost all the functional units required by
the CPU, though the whole lot cannot be discussed here. We are also
happy that the functional units are flexible enough to be used as a
base for a more powerful CPU, given sufficient effort and time.

REFERENCES
[1] A. Akkas, M.J. Schulte, “Dual-mode floating-point multiplier
architectures with parallel operations,” Journal of Systems
Architecture, vol. 52, pp. 549-562, October 2006.
[2] A. Akkas, “Dual-Mode Quadruple Precision Floating-Point Adder,”
Proceedings of the 9th EUROMICRO Conference on Digital System Design,
2006, pp. 211-220, ISBN:0-7695-2609-8.
[3] V. Agarwal, M.S. Hrishikesh, S.W. Keckler and D. Burger, “Clock
rate versus IPC: the end of the road for conventional
microarchitectures,” Proceedings of the 27th annual international
symposium on Computer architecture, vol. 28, May 2000, pp. 248-259,
ISSN:0163-5964.
[4] A. Beaumont-Smith, N. Burgess, S. Lefrere and C. C. Lim, “Reduced
Latency IEEE Floating-Point Standard Adder Architectures,” Proceedings
of the 14th IEEE Symposium on Computer Arithmetic, pp. 35, 1999,
ISBN:0-7695-0116-8.
[5] G. Even, S. M. Mueller and P.-M. Seidel, “A dual precision IEEE
floating-point multiplier,” Integration, the VLSI Journal, vol. 29
issue 2, 2000, pp. 167-180, ISSN:0167-9260.
[6] W. Stallings, Computer Organization and Architecture, sixth
edition, Pearson & Prentice-Hall, 2003.

[7] P.-M. Seidel and G. Even, “Delay-Optimized Implementation of IEEE
Floating-Point Addition,” IEEE Transactions on Computers, vol. 53
issue 2, February 2004, pp. 97-113, ISSN:0018-9340.
[8] Y. Hida, X. S. Li, and D. H. Bailey, “Algorithms for Quad-Double
Precision Floating Point Arithmetic,” Proceedings of the 15th IEEE
Symposium on Computer Arithmetic, pg. 155, 2001.

A. Joshi holds a Ph.D. with specialization in parallel architecture
from the University of Mumbai, India. He is a member of the Faculty of
Engg., the Department of Electrical and Computer Engg., The University
of the West Indies, Trinidad and Tobago. Dr. Joshi is a member of IEEE
and leads the Computer Systems group at the IEEE Trinidad chapter.

S.L. Lam graduated from Multimedia University, Cyberjaya, Malaysia.
He later joined Xilinx in Malaysia as an Engineer.

Y.Y. Chan graduated from Multimedia University, Cyberjaya, Malaysia.
He later joined Xilinx in Malaysia as an Engineer.

Figure 1. Simulation result for HA-1
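
The values reported in the simulation results can be reproduced with plain two's-complement arithmetic. The short Python sketch below is only a cross-check of the numbers, not a model of the circuit timing; the function name add_sub_8bit is hypothetical. It also shows why a carry out of 1 after subtraction corresponds to a non-negative result.

```python
def add_sub_8bit(a, b, control):
    """control = 0 adds; control = 1 subtracts as a + NOT b + 1."""
    operand = (b ^ 0xFF) if control else b   # invert B for subtraction
    total = a + operand + control            # control doubles as carry in
    return total & 0xFF, total >> 8          # (8-bit result, carry out)


A = 0b01100100  # 100 decimal
B = 0b00110010  # 50 decimal
for control in (0, 1):
    result, co = add_sub_8bit(A, B, control)
    print(f"control={control} result={result:08b} ({result}) Co={co}")
# control=0 result=10010110 (150) Co=0
# control=1 result=00110010 (50) Co=1
```

For subtraction the adder computes A + (255 - B) + 1 = A - B + 256, so the ninth bit (Co) is set exactly when A is greater than or equal to B.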

Figure 2. FP ADD/SUB Unit Block diagram.
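
The data flow through the block diagram follows the steps of Section C: compare exponents, align with a barrel shift, add the signed significands, then normalize and round. The Python sketch below walks through those steps under stated assumptions. The paper does not spell out the 16-bit format, so the hidden leading bit, the bias of 63, the guard bits and the treatment of exponents below zero as underflow are all assumptions, as are the names fp_add, round_nearest_even and to_value. For simplicity the sketch normalizes before rounding, which may not match the cycle ordering of the hardware.

```python
SIG_BITS = 8          # assumed: hidden leading 1 plus 7 stored mantissa bits
GUARD = 8             # assumed guard bits so that alignment loses nothing
BIAS = 63             # assumed bias for a 7-bit exponent field
EXP_MAX = 0b1111110   # largest biased exponent that may be stored (Section D)


def round_nearest_even(value, drop):
    """Discard `drop` low bits of a non-negative integer, ties to even."""
    if drop <= 0:
        return value << -drop
    kept, rem = value >> drop, value & ((1 << drop) - 1)
    half = 1 << (drop - 1)
    if rem > half or (rem == half and kept & 1):
        kept += 1
    return kept


def fp_add(a, b, subtract=False):
    """Add or subtract operands given as (sign, biased_exp, significand).

    Significands are assumed normalized: 2**7 <= significand < 2**8.
    """
    (sa, ea, ma), (sb, eb, mb) = a, b
    if subtract:
        sb ^= 1
    if ea < eb:                                # keep the larger exponent in `a`
        (sa, ea, ma), (sb, eb, mb) = (sb, eb, mb), (sa, ea, ma)
    diff = ea - eb                             # stored difference, non-negative
    if diff > 7:                               # exception: store larger operand
        return (sa, ea, ma)
    big = ma << GUARD
    small = (mb << GUARD) >> diff              # barrel shift by the difference
    total = (-big if sa else big) + (-small if sb else small)
    if total == 0:                             # exception: skip normalization
        return (0, 0, 0)
    sign, mag = int(total < 0), abs(total)
    length = mag.bit_length()
    sig = round_nearest_even(mag, length - SIG_BITS)
    exp = ea + (length - (SIG_BITS + GUARD))   # +1 on overflow, <0 on cancellation
    if sig == 1 << SIG_BITS:                   # rounding carried out of the top bit
        sig, exp = sig >> 1, exp + 1
    if not 0 <= exp <= EXP_MAX:                # range check after normalize/round
        raise OverflowError("exponent out of range")
    return (sign, exp, sig)


def to_value(x):
    """Decode (sign, biased_exp, significand) to a Python float."""
    sign, exp, sig = x
    return (-1) ** sign * sig * 2.0 ** (exp - BIAS - (SIG_BITS - 1))
```

With these assumptions, 100 is encoded as (0, BIAS + 6, 200) and 50 as (0, BIAS + 5, 200); fp_add on these operands returns the encoding of 150, and with subtract=True the encoding of 50.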

