Design Issues in Division and Other Floating-Point Operations

Authors: Stuart F. Oberman, Michael J. Flynn

IEEE Transactions on Computers, Volume 46, Issue 2

Pages 154 - 161

https://doi.org/10.1109/12.565590

Published: 01 February 1997

Abstract

Floating-point division is generally regarded as a low-frequency, high-latency operation in typical floating-point applications. However, in the worst case, a high-latency hardware floating-point divider can contribute an additional 0.50 CPI to a system executing SPECfp92 applications. This paper presents the system performance impact of floating-point division latency for varying instruction issue rates. It also examines the performance implications of shared multiplication hardware, shared square root, on-the-fly rounding and conversion, and fused functional units. Using a system-level study as a basis, it is shown how typical floating-point applications can guide the designer in making implementation decisions and trade-offs.
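
As a rough illustration of how a low-frequency operation can still add measurably to CPI, the sketch below uses a simple first-order stall model. The model is an assumption made for illustration, not the paper's methodology, and the function name `excess_cpi`, its parameters, and all numeric inputs are hypothetical rather than taken from the paper.

```python
def excess_cpi(div_fraction, div_latency, hidden_cycles=0):
    """Extra cycles per instruction attributed to divide stalls.

    Assumed first-order model (not from the paper):
      div_fraction  - fraction of dynamic instructions that are divides
      div_latency   - divider latency in cycles
      hidden_cycles - cycles of latency overlapped with independent work
    """
    # Only the part of the latency that is not overlapped stalls the pipeline.
    exposed = max(div_latency - hidden_cycles, 0)
    return div_fraction * exposed


if __name__ == "__main__":
    # Hypothetical sweep: divides are 1% of instructions, nothing is hidden.
    for latency in (10, 20, 50):
        print(f"latency={latency:3d} cycles  excess CPI={excess_cpi(0.01, latency):.2f}")
```

Under this model, a divide fraction of 1% with 50 exposed stall cycles per divide would account for 0.50 CPI, which shows how a penalty of the magnitude quoted in the abstract could arise even though divides are rare. The instruction mixes and latencies actually measured in the paper are not reproduced here.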

ISSN: 0018-9340

Editor: Jane W. S. Liu, Univ. of Illinois, Urbana

Copyright © 1997 IEEE. All Rights Reserved.

Publisher: IEEE Computer Society, United States
