A Reconfigurable Multiplier for Signed Multiplications with Asymmetric Bit-Widths

Authors: Grace Li Zhang, Xunzhao Yin, and Cheng Zhuo

ACM Journal on Emerging Technologies in Computing Systems (JETC), Volume 17, Issue 4

Article No.: 48, Pages 1 - 16

https://doi.org/10.1145/3446213

Published: 30 June 2021

Abstract

Multiplications are common in quantized CNNs, filters, reconfigurable cores, and similar designs that are widely deployed in mobile and embedded applications. Most multipliers are designed for symmetric bit-widths, i.e., n-bit by n-bit multiplication. When m-bit by n-bit multiplications (m > n) run on the same hardware, this symmetry causes extra area overhead and performance loss, making the multiplications inefficient. A reconfigurable multiplier that accommodates operands with both symmetric and asymmetric bit-widths is therefore highly desirable, yet challenging to design. In this work, we propose a reconfigurable approximate multiplier that supports multiplications at various precisions, i.e., bit-widths. Prior work on approximate adders assumes a uniform weight distribution with bit-wise independence; in contrast, a scenario such as a quantized CNN may have a centralized weight distribution that follows a Gaussian-like distribution with correlated adjacent bits. We therefore also propose a new block-based approximate adder, as part of the multiplier, that accounts for this bit-wise correlation to ensure energy-efficient operation. Experimental results show that the proposed approximate adder reduces the error rate by 76% to 98% relative to a state-of-the-art approximate adder in Gaussian-like distribution scenarios. Evaluation results show that the proposed multiplier is 19% faster and saves 22% in power relative to a Xilinx multiplier IP at the same bit precision. When deployed in a Gaussian filter for image processing tasks, it achieves a peak signal-to-noise ratio of 23.94 dB, comparable to the 24.10 dB of the accurate multiplier.
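
To make the asymmetric case concrete, the following minimal Python sketch shows one generic way an m-bit by n-bit signed product can be assembled from narrower partial products. It is an illustration under stated assumptions, not the architecture proposed in this work: the function names `split_signed` and `mul_asymmetric`, the default widths of 16 and 8 bits, and the high/low decomposition are all chosen here for the example, and the arithmetic is exact rather than approximate.

```python
# Illustrative only: a generic decomposition of a signed m-bit by n-bit
# product into two narrower partial products. The names and default widths
# are assumptions for this example and are not taken from the paper.


def split_signed(a: int, m: int, n: int) -> tuple[int, int]:
    """Split an m-bit two's-complement value into a signed high part
    (m - n bits) and an unsigned low part (n bits), with m > n."""
    assert m > n > 0
    assert -(1 << (m - 1)) <= a < (1 << (m - 1)), "a does not fit in m bits"
    low = a & ((1 << n) - 1)  # unsigned low n bits
    high = a >> n             # arithmetic shift preserves the sign
    return high, low


def mul_asymmetric(a: int, b: int, m: int = 16, n: int = 8) -> int:
    """Multiply a signed m-bit value by a signed n-bit value using the
    identity a = high * 2**n + low."""
    assert -(1 << (n - 1)) <= b < (1 << (n - 1)), "b does not fit in n bits"
    high, low = split_signed(a, m, n)
    upper = high * b          # signed (m - n)-bit by signed n-bit product
    lower = low * b           # unsigned n-bit by signed n-bit product
    return (upper << n) + lower
```

For example, `mul_asymmetric(1234, -7)` returns the same value as `1234 * -7`. The low partial product mixes an unsigned and a signed operand, which hints at why signed asymmetric multiplication needs more care than simply chaining symmetric signed multipliers.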

Index Terms

A Reconfigurable Multiplier for Signed Multiplications with Asymmetric Bit-Widths
1. Computer systems organization
  1. Dependable and fault-tolerant systems and networks
    1. Redundancy
  2. Embedded and cyber-physical systems
    1. Embedded systems
    2. Robotics
2. Networks
  1. Network properties
    1. Network reliability

Published In

ACM Journal on Emerging Technologies in Computing Systems Volume 17, Issue 4

October 2021

446 pages

ISSN:1550-4832

EISSN:1550-4840

DOI:10.1145/3472280

Editor:
Ramesh Karri
Polytechnic Institute of New York University, USA

Copyright © 2021 Association for Computing Machinery.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Journal Family

ACM Journals for the Design of Smart and Connected Systems

Publication History

Published: 30 June 2021

Accepted: 01 December 2020

Revised: 01 August 2020

Received: 01 April 2020

Published in JETC Volume 17, Issue 4

Qualifiers

Research-article
Refereed

Funding Sources

National Key Research and Development Project
National Natural Science Foundation of China
Natural Science Foundation of Zhejiang Province

Bibliometrics

Total Citations: 4
Total Downloads: 266
Downloads (Last 12 months): 67
Downloads (Last 6 weeks): 5

Cited By

Jiang A, Du L, and Du Y. 2024. GroupQ: Group-Wise Quantization With Multi-Objective Optimization for CNN Accelerators. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 43, 7 (2024), 2071–2083. Online publication date: Jul-2024. https://doi.org/10.1109/TCAD.2024.3363073

Chen C, Qian W, Imani M, Yin X, and Zhuo C. 2022. PAM: A Piecewise-Linearly-Approximated Floating-Point Multiplier With Unbiasedness and Configurability. IEEE Transactions on Computers 71, 10 (2022), 2473–2486. Online publication date: 1-Oct-2022. https://dl.acm.org/doi/10.1109/TC.2021.3131850

Zheng H, Zhou C, Shi Z, and Yan C. 2022. Joint Coprime Weights Optimization for Sub-Nyquist Tensor Beamforming. In 2022 IEEE Radar Conference (RadarConf22). 1–6. Online publication date: 21-Mar-2022. https://doi.org/10.1109/RadarConf2248738.2022.9764278

Sun S, Xu Z, Chen X, and Yin X. 2022. An Approximating Twiddle Factor Coefficient Based Multiplier for Fixed-Point FFT. In 2022 China Semiconductor Technology International Conference (CSTIC). 1–4. Online publication date: 20-Jun-2022. https://doi.org/10.1109/CSTIC55103.2022.9856879
