DOI: 10.1145/3583781.3590269

Low-Cost Multiple-Precision Multiplication Unit Design For Deep Learning

Published: 05 June 2023

Abstract

Low-precision formats have been proposed and applied to deep learning algorithms to speed up training and inference. This paper proposes a novel multiple-precision multiplication unit (MU) for deep learning. The proposed MU supports four floating-point (FP) precisions (FP8-E4M3, FP8-E5M2, FP16, and FP32) as well as 8-bit fixed-point (FIX8) numbers. The MU can execute four parallel FP8 and eight parallel FIX8 multiplications simultaneously in one cycle, four parallel FP16 multiplications fully pipelined with a latency of one cycle, or one FP32 multiplication with a latency of one cycle. The simultaneous execution of FIX8 and FP8 meets the requirements of specific deep learning algorithms. Thanks to the low-precision-combination (LPC) and vectorization design method, multiplication at every supported precision achieves 100% utilization of the multiplier resources, and the MU can operate with a shorter clock period, yielding better performance for all data types. Compared with existing multiple-precision units designed for deep learning, this MU supports more low-precision formats with lower area overhead and delivers at least 8× higher FIX8 throughput.
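As a concrete illustration of one of the supported formats, the short C sketch below decodes FP8-E4M3 operands in software and multiplies them. It is only a minimal software reference model, not the paper's hardware design; the bit layout assumed here (1 sign, 4 exponent, and 3 mantissa bits, exponent bias 7, subnormals when the exponent field is zero) follows the commonly used E4M3 convention, and the helper name fp8_e4m3_decode is hypothetical.

#include <stdio.h>
#include <stdint.h>
#include <math.h>

/* Illustrative software reference only, not the hardware multiplier from the
 * paper. Decodes FP8-E4M3 values (1 sign, 4 exponent, 3 mantissa bits,
 * bias 7) and multiplies two of them in double precision. The E4M3 NaN
 * encoding (exponent and mantissa fields all ones) is not handled here. */

static double fp8_e4m3_decode(uint8_t x) {
    int s = (x >> 7) & 0x1;      /* sign bit       */
    int e = (x >> 3) & 0xF;      /* exponent field */
    int m = x & 0x7;             /* mantissa field */
    double sign = s ? -1.0 : 1.0;
    if (e == 0)                  /* subnormal: no implicit leading one */
        return sign * ldexp(m / 8.0, 1 - 7);
    return sign * ldexp(1.0 + m / 8.0, e - 7);   /* normal number */
}

int main(void) {
    uint8_t a = 0x40;            /* 0 1000 000 -> +2.0  */
    uint8_t b = 0xB4;            /* 1 0110 100 -> -0.75 */
    double pa = fp8_e4m3_decode(a);
    double pb = fp8_e4m3_decode(b);
    printf("a = %g, b = %g, a*b = %g\n", pa, pb, pa * pb);
    return 0;
}

Compiled with a standard C compiler (e.g., cc fp8_mul.c -lm), this prints a = 2, b = -0.75, a*b = -1.5, matching the intended encodings.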



      Published In

      GLSVLSI '23: Proceedings of the Great Lakes Symposium on VLSI 2023
      June 2023
      731 pages
ISBN: 9798400701252
DOI: 10.1145/3583781
      Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Author Tags

      1. deep learning
      2. low-precision
      3. lpc
      4. multiple-precision
      5. multiplication unit

      Qualifiers

      • Research-article

      Funding Sources

      • NSFC
      • HNNSFC

      Conference

      GLSVLSI '23
      Sponsor:
      GLSVLSI '23: Great Lakes Symposium on VLSI 2023
June 5-7, 2023
Knoxville, TN, USA

      Acceptance Rates

      Overall Acceptance Rate 312 of 1,156 submissions, 27%
