research-article

A Reconfigurable Multiplier for Signed Multiplications with Asymmetric Bit-Widths

Published: 30 June 2021
  • Abstract

    Multiplication is common in quantized CNNs, filters, and reconfigurable cores, which are widely deployed in mobile and embedded applications. Most multipliers are designed for symmetric bit-widths, i.e., n- by n-bit multiplication. Such designs incur extra area overhead and performance loss when m- by n-bit multiplications (m > n) are mapped onto the same hardware, resulting in inefficient multiplication. A reconfigurable multiplier that accommodates operands with both symmetric and asymmetric bit-widths is therefore highly desirable, but challenging to design. In this work, we propose a reconfigurable approximate multiplier that supports multiplications at various precisions, i.e., bit-widths. Prior works on approximate adders assume a uniform weight distribution with bit-wise independence; in scenarios such as a quantized CNN, however, the weights may be concentrated and follow a Gaussian-like distribution with correlated adjacent bits. We therefore also propose a new block-based approximate adder as part of the multiplier, which accounts for this bit-wise correlation to ensure energy-efficient operation. Our experimental results show that the proposed approximate adder reduces the error rate by 76% to 98% relative to a state-of-the-art approximate adder in Gaussian-like distribution scenarios. Evaluation results show that the proposed multiplier is 19% faster and saves 22% more power than a Xilinx multiplier IP at the same bit precision. When deployed in a Gaussian filter for image processing tasks, it achieves a peak signal-to-noise ratio of 23.94 dB, comparable to the 24.10 dB of the accurate multiplier.
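
    To make the point about correlated bits concrete, the following is a minimal sketch of a generic carry-cut adder, assuming a 16-bit two's-complement word split into 4-bit blocks. It is not the block-based adder proposed in the paper; the names block_add, to_signed, and error_rate and the chosen widths are hypothetical. In two's complement, the upper bits of a small-magnitude value are copies of the sign bit, so adding a small positive and a small negative value produces a carry that must ripple through every upper block. An adder that drops carries between blocks therefore fails on exactly the operands a zero-centered distribution produces most often.

```python
def to_signed(value: int, width: int) -> int:
    """Interpret the low `width` bits of `value` as a two's-complement integer."""
    value &= (1 << width) - 1
    return value - (1 << width) if value >> (width - 1) else value


def block_add(a: int, b: int, width: int = 16, block: int = 4) -> int:
    """Add two signed integers block by block, dropping every inter-block carry.

    Illustrative only: a generic carry-cut scheme, not the paper's adder.
    """
    mask = (1 << block) - 1
    result = 0
    for shift in range(0, width, block):
        # Each block sees only its own operand bits; its carry-in is always 0.
        partial = ((a >> shift) & mask) + ((b >> shift) & mask)
        result |= (partial & mask) << shift
    return to_signed(result, width)


def error_rate(pairs, width: int = 16, block: int = 4) -> float:
    """Fraction of operand pairs whose approximate sum differs from the exact sum."""
    pairs = list(pairs)
    wrong = sum(
        block_add(a, b, width, block) != to_signed(a + b, width)
        for a, b in pairs
    )
    return wrong / len(pairs)


if __name__ == "__main__":
    # Small same-sign operands: no carry leaves the lowest block.
    print(block_add(3, 4))
    # Small mixed-sign operands: the dropped carry corrupts every upper block.
    print(block_add(3, -2))
    # A single block spanning the whole word behaves like an exact adder.
    print(block_add(3, -2, block=16))
```

    Setting block equal to width recovers exact wrap-around addition, which gives a convenient baseline when comparing error rates across operand distributions.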

    Published In

    ACM Journal on Emerging Technologies in Computing Systems, Volume 17, Issue 4
    October 2021
    446 pages
    ISSN:1550-4832
    EISSN:1550-4840
    DOI:10.1145/3472280
    Editor: Ramesh Karri
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 30 June 2021
    Accepted: 01 December 2020
    Revised: 01 August 2020
    Received: 01 April 2020
    Published in JETC Volume 17, Issue 4

    Author Tags

    1. Reconfigurable multiplier
    2. approximate computing
    3. approximate adder

    Qualifiers

    • Research-article
    • Refereed

    Funding Sources

    • National Key Research and Development Project
    • National Natural Science Foundation of China
    • Natural Science Foundation of Zhejiang Province
