
Design Issues in Division and Other Floating-Point Operations

Published: 01 February 1997

Abstract

    Floating-point division is generally regarded as a low-frequency, high-latency operation in typical floating-point applications. However, in the worst case, a high-latency hardware floating-point divider can contribute an additional 0.50 CPI to a system executing SPECfp92 applications. This paper presents the system performance impact of floating-point division latency for varying instruction issue rates. It also examines the performance implications of shared multiplication hardware, shared square root, on-the-fly rounding and conversion, and fused functional units. Using a system-level study as a basis, it shows how typical floating-point applications can guide the designer in making implementation decisions and trade-offs.
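
One way to read a CPI penalty like the one quoted in the abstract is as the product of how often divides occur and how many cycles each one stalls the pipeline. The following is a minimal sketch of that first-order model only, not the paper's simulation methodology. The function name, its parameters, and the sample values are illustrative assumptions and are not taken from the paper.

```python
def division_cpi_penalty(div_fraction, div_latency, overlap_cycles=0):
    """Estimate the extra cycles per instruction caused by divide stalls.

    First-order model (an assumption, not the paper's method):
        penalty = div_fraction * max(div_latency - overlap_cycles, 0)

    div_fraction   -- fraction of dynamic instructions that are divides (0..1)
    div_latency    -- divider latency in cycles
    overlap_cycles -- cycles of each divide hidden by independent work
    """
    if not 0.0 <= div_fraction <= 1.0:
        raise ValueError("div_fraction must lie in [0, 1]")
    if div_latency < 0 or overlap_cycles < 0:
        raise ValueError("cycle counts must be non-negative")
    # Only the part of the latency that is not overlapped stalls the pipeline.
    exposed_stall = max(div_latency - overlap_cycles, 0)
    return div_fraction * exposed_stall


if __name__ == "__main__":
    # Hypothetical workload: 0.5% of instructions are divides.
    for latency in (10, 20, 40):
        penalty = division_cpi_penalty(0.005, latency)
        print(f"latency={latency:2d} cycles -> extra CPI ~ {penalty:.3f}")
```

Under this model the penalty grows linearly with latency, which is why a rarely executed operation can still matter when its latency is large and little of it can be overlapped.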



    Published In

    IEEE Transactions on Computers  Volume 46, Issue 2
    February 1997
    128 pages
    ISSN:0018-9340

    Publisher

    IEEE Computer Society

    United States

    Author Tags

    1. benchmarks
    2. computer arithmetic
    3. division
    4. floating-point
    5. multiplication
    6. square root
    7. system performance

    Qualifiers

    • Research-article
