A Survey on Approximate Multiplier Designs for Energy Efficiency: From Algorithms to Circuits

Authors:

Chuangtao Chen,

Cheng Zhuo

ACM Transactions on Design Automation of Electronic Systems, Volume 29, Issue 1

Article No.: 23, Pages 1 - 37

https://doi.org/10.1145/3610291

Published: 13 January 2024

Abstract

Given the stringent requirements of energy efficiency for Internet-of-Things edge devices, approximate multipliers, as a basic component of many processors and accelerators, have been constantly proposed and studied for decades, especially in error-resilient applications. The computation error and energy efficiency largely depend on how and where the approximation is introduced into a design. Thus, this article aims to provide a comprehensive review of the approximation techniques in multiplier designs ranging from algorithms and architectures to circuits. We have implemented representative approximate multiplier designs in each category to understand the impact of the design techniques on accuracy and efficiency. The designs can then be effectively deployed in high-level applications, such as machine learning, to gain energy efficiency at the cost of slight accuracy loss.

References

[1]

2019. IEEE standard for floating-point arithmetic. IEEE Std 754-2019 (Revision of IEEE 754-2008) (2019), 1–84.

[2]

Mohammad Ahmadinejad, Mohammad H. Moaiyeri, and Farnaz Sabetzadeh. 2019. Energy and area efficient imprecise compressors for approximate multiplication at nanoscale. AEU-International Journal of Electronics and Communications 110 (2019), 152859.

[3]

Syed E. Ahmed, Sanket Kadam, and M. B. Srinivas. 2016. An iterative logarithmic multiplier with improved precision. In 2016 IEEE 23rd Symposium on Computer Arithmetic (ARITH). IEEE, 104–111.

[4]

Syed Ershad Ahmed and M. B. Srinivas. 2019. An improved logarithmic multiplier for media processing. Journal of Signal Processing Systems 91 (2019), 561–574.

Digital Library

[5]

Omid Akbari, Mehdi Kamal, Ali Afzali-Kusha, and Massoud Pedram. 2017. Dual-quality 4:2 compressors for utilizing in dynamic accuracy configurable multipliers. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 25, 4 (2017), 1352–1361.

Digital Library

[6]

Omid Akbari, Mehdi Kamal, Ali Afzali-Kusha, Massoud Pedram, and Muhammad Shafique. 2018. PX-CGRA: Polymorphic approximate coarse-grained reconfigurable architecture. In 2018 Design, Automation & Test in Europe Conference & Exhibition (DATE). IEEE, 413–418.

[7]

Ihsen Alouani, Hamzeh Ahangari, Ozcan Ozturk, and Smail Niar. 2017. A novel heterogeneous approximate multiplier for low power and high performance. IEEE Embedded Systems Letters 10, 2 (2017), 45–48.

[8]

Mohammad S. Ansari, Bruce F. Cockburn, and Jie Han. 2019. A hardware-efficient logarithmic multiplier with improved accuracy. In 2019 Design, Automation & Test in Europe Conference & Exhibition (DATE). IEEE, 928–931.

[9]

Mohammad S. Ansari, Bruce F. Cockburn, and Jie Han. 2020. An improved logarithmic multiplier for energy-efficient neural computing. IEEE Trans. Comput. 70, 4 (2020), 614–625.

[10]

Mohammad S. Ansari, Vojtech Mrazek, Bruce F. Cockburn, Lukas Sekanina, Zdenek Vasicek, and Jie Han. 2019. Improving the accuracy and hardware efficiency of neural networks using approximate multipliers. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 28, 2 (2019), 317–328.

[11]

Armineh Arasteh, Mohammad H. Moaiyeri, MohammadReza Taheri, Keivan Navi, and Nader Bagherzadeh. 2018. An energy and area efficient 4:2 compressor based on FinFETs. Integration 60 (2018), 224–231.

Digital Library

[12]

Sunil Ashtaputre, Carla D. Savage, and Wesley E. Snyder. 1985. Using an Approximate Multiplier in a One-dimensional Array Architecture for Real-time Convolution. Technical Report. North Carolina State University. Center for Communications and Signal Processing.

[13]

Luigi Atzori, Antonio Iera, and Giacomo Morabito. 2010. The Internet of Things: A survey. Computer Networks 54, 15 (2010), 2787–2805.

Digital Library

[14]

Zdenka Babić, Aleksej Avramović, and Patricio Bulić. 2011. An iterative logarithmic multiplier. Microprocessors and Microsystems 35, 1 (2011), 23–33.

Digital Library

[15]

Dursun Baran, Mustafa Aktan, and Vojin G. Oklobdzija. 2010. Energy efficient implementation of parallel CMOS multipliers with improved compressors. In Proceedings of the 16th ACM/IEEE International Symposium on Low Power Electronics and Design. 147–152.

Digital Library

[16]

Kartikeya Bhardwaj, Pravin S. Mane, and Jörg Henkel. 2014. Power- and area-efficient Approximate Wallace Tree Multiplier for error-resilient systems. In Fifteenth International Symposium on Quality Electronic Design. 263–269.

[17]

Marcelo Brandalero, Antonio Carlos S. Beck, Luigi Carro, and Muhammad Shafique. 2018. Approximate on-the-fly coarse-grained reconfigurable acceleration for general-purpose applications. In 2018 55th ACM/ESDA/IEEE Design Automation Conference (DAC). IEEE, 1–6.

Digital Library

[18]

Vincent Camus, Jeremy Schlachter, Christian Enz, Michael Gautschi, and Frank K. Gurkaynak. 2016. Approximate 32-bit floating-point unit design with 53% power-area product reduction. In ESSCIRC Conference 2016: 42nd European Solid-State Circuits Conference. 465–468.

[19]

Anantha P. Chandrakasan and Robert W. Brodersen. 1995. Minimizing power consumption in digital CMOS circuits. Proc. IEEE 83, 4 (1995), 498–523.

[20]

Chip-Hong Chang, Jiangmin Gu, and Mingyan Zhang. 2004. Ultra low-voltage low-power CMOS 4-2 and 5-2 compressors for fast arithmetic circuits. IEEE Transactions on Circuits and Systems I: Regular Papers 51, 10 (2004), 1985–1997.

[21]

Chuangtao Chen, Weikang Qian, Mohsen Imani, Xunzhao Yin, and Cheng Zhuo. 2021. PAM: A piecewise-linearly-approximated floating-point multiplier with unbiasedness and configurability. IEEE Trans. Comput. 71, 10 (2021), 2473–2486.

Digital Library

[22]

Chuangtao Chen, Sen Yang, Weikang Qian, Mohsen Imani, Xunzhao Yin, and Cheng Zhuo. 2020. Optimally approximated and unbiased floating-point multiplier with runtime configurability. In Proceedings of the 39th International Conference on Computer-Aided Design. 1–9.

Digital Library

[23]

Jienan Chen and Jianhao Hu. 2012. Energy-efficient digital signal processing via voltage-overscaling-based residue number system. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 21, 7 (2012), 1322–1332.

Digital Library

[24]

Yuan-Ho Chen and Tsin-Yuan Chang. 2011. A high-accuracy adaptive conditional-probability estimator for fixed-width Booth multipliers. IEEE Transactions on Circuits and Systems I: Regular Papers 59, 3 (2011), 594–603.

[25]

Kyung-Ju Cho, Kwang-Chul Lee, Jin-Gyun Chung, and Keshab K. Parhi. 2004. Design of low-error fixed-width modified Booth multiplier. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 12, 5 (2004), 522–531.

Digital Library

[26]

Matthieu Courbariaux, Yoshua Bengio, and Jean-Pierre David. 2014. Training deep neural networks with low precision multiplications. arXiv preprint arXiv:1412.7024 (2014).

[27]

Luigi Dadda. 1965. Some schemes for parallel multipliers. Alta Frequenza 34 (1965), 349–356.

[28]

Edwin de Angel and E. E. Swartzlander. 1996. Low power para llel multipliers. In VLSI Signal Processing, Ix. IEEE, 199–208.

[29]

Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. 2009. ImageNet: A large-scale hierarchical image database. In 2009 IEEE Conference on Computer Vision and Pattern Recognition. IEEE, 248–255.

[30]

Jianing Deng, Zhiguo Shi, and Cheng Zhuo. 2019. Energy-efficient real-time UAV object detection on embedded platforms. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 39, 10 (2019), 3123–3127.

[31]

Li Deng. 2012. The MNIST database of handwritten digit images for machine learning research. IEEE Signal Processing Magazine 29, 6 (2012), 141–142.

[32]

Zidong Du, Krishna Palem, Avinash Lingamneni, Olivier Temam, Yunji Chen, and Chengyong Wu. 2014. Leveraging the error resilience of machine-learning applications for designing highly energy efficient accelerators. In 2014 19th Asia and South Pacific Design Automation Conference (ASP-DAC). IEEE, 201–206.

[33]

Darjn Esposito, Antonio Giuseppe Maria Strollo, Ettore Napoli, Davide De Caro, and Nicola Petra. 2018. Approximate multipliers based on new approximate compressors. IEEE Transactions on Circuits and Systems I: Regular Papers 65, 12 (2018), 4169–4182.

[34]

Farzad Farshchi, Muhammad S. Abrishami, and Sied M. Fakhraie. 2013. New approximate multiplier for low power digital signal processing. In The 17th CSI International Symposium on Computer Architecture & Digital Systems (CADS 2013). IEEE, 25–30.

[35]

Christopher Fritz and Adly T. Fam. 2017. Fast binary counters based on symmetric stacking. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 25, 10 (2017), 2971–2975.

Digital Library

[36]

Matt W. Gardner and S. R. Dorling. 1998. Artificial neural networks (the multilayer perceptron)–a review of applications in the atmospheric sciences. Atmospheric Environment 32, 14-15 (1998), 2627–2636.

[37]

Gokul Govindu, Ling Zhuo, Seonil Choi, and Viktor Prasanna. 2004. Analysis of high-performance floating-point arithmetic on FPGAs. In 18th International Parallel and Distributed Processing Symposium, 2004. Proceedings.149.

[38]

Chuliang Guo, Li Zhang, Xian Zhou, Weikang Qian, and Cheng Zhuo. 2020. A reconfigurable approximate multiplier for quantized CNN applications. In 2020 25th Asia and South Pacific Design Automation Conference (ASP-DAC). IEEE, 235–240.

Digital Library

[39]

Yi Guo, Heming Sun, and Shinji Kimura. 2020. Small-area and low-power FPGA-based multipliers using approximate elementary modules. In 2020 25th Asia and South Pacific Design Automation Conference (ASP-DAC). IEEE, 599–604.

Digital Library

[40]

Vaibhav Gupta, Debabrata Mohapatra, Sang Phill Park, Anand Raghunathan, and Kaushik Roy. 2011. IMPACT: IMPrecise adders for low-power approximate computing. In IEEE/ACM International Symposium on Low Power Electronics and Design. IEEE, 409–414.

[41]

Minho Ha and Sunggu Lee. 2017. Multipliers with approximate 4–2 compressors and error recovery modules. IEEE Embedded Systems Letters 10, 1 (2017), 6–9.

[42]

Winston Haaswijk, Mathias Soeken, Alan Mishchenko, and Giovanni De Micheli. 2020. SAT-based exact synthesis: Encodings, topology families, and parallelism. IEEE TCAD 39, 4 (2020), 871–884.

[43]

Issam Hammad and Kamal El-Sankary. 2018. Impact of approximate multipliers on VGG deep learning network. IEEE Access 6 (2018), 60438–60444.

[44]

Issam Hammad, Kamal El-Sankary, and Jason Gu. 2019. Deep learning training with simulated approximate multipliers. In 2019 IEEE International Conference on Robotics and Biomimetics (ROBIO). IEEE, 47–51.

Digital Library

[45]

Issam Hammad, Ling Li, Kamal El-Sankary, and W. Martin Snelgrove. 2021. CNN inference using a preprocessing precision controller and approximate multipliers with various precisions. IEEE Access 9 (2021), 7220–7232.

[46]

Jie Han and Michael Orshansky. 2013. Approximate computing: An emerging paradigm for energy-efficient design. In 2013 18th IEEE European Test Symposium (ETS). 1–6.

[47]

Soheil Hashemi, R. Iris Bahar, and Sherief Reda. 2015. DRUM: A dynamic range unbiased multiplier for approximate applications. In 2015 IEEE/ACM International Conference on Computer-Aided Design (ICCAD). IEEE, 418–425.

Digital Library

[48]

Ku He, Andreas Gerstlauer, and Michael Orshansky. 2013. Circuit-level timing-error acceptance for design of energy-efficient DCT/IDCT-based systems. Circuits and Systems for Video Technology, IEEE Transactions on 23 (062013), 961–974.

Digital Library

[49]

Radek Hrbacek, Vojtech Mrazek, and Zdenek Vasicek. 2016. Automatic design of approximate circuits by means of multi-objective evolutionary algorithms. In 2016 International Conference on Design and Technology of Integrated Systems in Nanoscale Era (DTIS). IEEE, 1–6.

[50]

Forrest N. Iandola, Song Han, Matthew W. Moskewicz, Khalid Ashraf, William J. Dally, and Kurt Keutzer. 2016. SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and< 0.5 MB model size. arXiv preprint arXiv:1602.07360 (2016).

[51]

Mohsen Imani, Ricardo Garcia, Saransh Gupta, and Tajana Rosing. 2018. RMAC: Runtime configurable floating point multiplier for approximate computing. In Proceedings of the International Symposium on Low Power Electronics and Design. 1–6.

Digital Library

[52]

Mohsen Imani, Yeseong Kim, Abbas Rahimi, and Tajana Rosing. 2016. ACAM: Approximate computing based on adaptive associative memory with online learning. In Proceedings of the 2016 International Symposium on Low Power Electronics and Design (San Francisco, CA, USA) (ISLPED’16). Association for Computing Machinery, New York, NY, USA, 162–167.

Digital Library

[53]

Mohsen Imani, Shruti Patil, and Tajana S. Rosing. 2016. MASC: Ultra-low energy multiple-access single-charge TCAM for approximate computing. In Proceedings of the 2016 Conference on Design, Automation & Test in Europe (Dresden, Germany) (DATE’16). EDA Consortium, San Jose, CA, USA, 373—378.

[54]

Mohsen Imani, Daniel Peroni, and Tajana Rosing. 2017. CFPU: Configurable floating point multiplier for energy-efficient computing. In 2017 54th ACM/EDAC/IEEE Design Automation Conference (DAC). IEEE, 1–6.

Digital Library

[55]

Mohsen Imani, Abbas Rahimi, and Tajana S. Rosing. 2016. Resistive configurable associative memory for approximate computing. In Proceedings of the 2016 Conference on Design, Automation & Test in Europe (Dresden, Germany) (DATE’16). EDA Consortium, San Jose, CA, USA, 1327—1332.

[56]

Mohsen Imani, Mohammad Samragh, Yeseong Kim, Saransh Gupta, Farinaz Koushanfar, and Tajana Rosing. 2018. RAPIDNN: In-memory deep neural network acceleration framework. arXiv preprint arXiv:1806.05794 (2018). http://arxiv.org/abs/1806.05794

[57]

Mohsen Imani, Alice Sokolova, Ricardo Garcia, Andrew Huang, Fan Wu, Baris Aksanli, and Tajana Rosing. 2019. ApproxLP: Approximate multiplication with linearization and iterative error control. In Proceedings of the 56th Annual Design Automation Conference 2019. 1–6.

Digital Library

[58]

Mohsen Imani, Xunzhao Yin, John Messerly, Saransh Gupta, Michael Niemier, Xiaobo Sharon Hu, and Tajana Rosing. 2019. SearcHD: A memory-centric hyperdimensional computing with stochastic training. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 39, 10 (2019), 2422–2433.

[59]

Benoit Jacob, Skirmantas Kligys, Bo Chen, Menglong Zhu, Matthew Tang, Andrew Howard, Hartwig Adam, and Dmitry Kalenichenko. 2018. Quantization and training of neural networks for efficient integer-arithmetic-only inference. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2704–2713.

[60]

Honglan Jiang, Jie Han, Fei Qiao, and Fabrizio Lombardi. 2015. Approximate radix-8 Booth multipliers for low-power and high-performance operation. IEEE Trans. Comput. 65, 8 (2015), 2638–2644.

Digital Library

[61]

Honglan Jiang, Cong Liu, Fabrizio Lombardi, and Jie Han. 2018. Low-power approximate unsigned multipliers with configurable error recovery. IEEE Transactions on Circuits and Systems I: Regular Papers 66, 1 (2018), 189–202.

[62]

Honglan Jiang, Francisco Javier Hernandez Santiago, Hai Mo, Leibo Liu, and Jie Han. 2020. Approximate arithmetic circuits: A survey, characterization, and recent applications. Proc. IEEE 108, 12 (2020), 2108–2135.

[63]

Shyh-Jye Jon and Hui-Hsuan Wang. 2000. Fixed-width multiplier for DSP application. In Proceedings 2000 International Conference on Computer Design. IEEE, 318–322.

[64]

Jer Min Jou, Shiann Rong Kuang, and Ren Der Chen. 1999. Design of low-error fixed-width multipliers for DSP applications. IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing 46, 6 (1999), 836–842.

[65]

HyunJin Kim, Min Soo Kim, Alberto A. Del Barrio, and Nader Bagherzadeh. 2019. A cost-efficient iterative truncated logarithmic multiplication for convolutional neural networks. In 2019 IEEE 26th Symposium on Computer Arithmetic (ARITH). IEEE, 108–111.

[66]

Alex Krizhevsky. 2009. Learning multiple layers of features from tiny images. University of Toronto, Toronto, ON, Canada. Retrieved from http://www.cs.utoronto.ca/kriz/learning-features-2009-TR.pdf

[67]

Alex Krizhevsky, Ilya Sutskever, and Geoffrey E. Hinton. 2017. ImageNet classification with deep convolutional neural networks. Commun. ACM 60, 6 (2017), 84–90.

Digital Library

[68]

Parag Kulkarni, Puneet Gupta, and Milos Ercegovac. 2011. Trading accuracy for power with an underdesigned multiplier architecture. In 2011 24th International Conference on VLSI Design. IEEE, 346–351.

Digital Library

[69]

Mark S. K. Lau, Keck-Voon Ling, and Yun-Chung Chu. 2009. Energy-aware probabilistic multiplier: Design and analysis. In Proceedings of the 2009 International Conference on Compilers, Architecture, and Synthesis for Embedded Systems. 281–290.

Digital Library

[70]

Vasileios Leon, Konstantinos Asimakopoulos, Sotirios Xydis, Dimitrios Soudris, and Kiamal Pekmestzi. 2019. Cooperative arithmetic-aware approximation techniques for energy-efficient multipliers. In Proceedings of the 56th Annual Design Automation Conference 2019. 1–6.

Digital Library

[71]

Yong Ching Lim. 1992. Single-precision multiplier with reduced circuit complexity for signal processing applications. IEEE Transactions on Computers 41, 10 (1992), 1333–1336.

Digital Library

[72]

Chia-Hao Lin and Chao Lin. 2013. High accuracy approximate multiplier with error correction. In International Conference on Computer Design. 33–38.

[73]

Hsin-Lei Lin, Robert C. Chang, and Ming-Tsai Chan. 2004. Design of a novel radix-4 Booth multiplier. In The 2004 IEEE Asia-Pacific Conference on Circuits and Systems, Vol. 2. Citeseer, 837–840.

[74]

Avinash Lingamneni, Arindam Basu, Christian Enz, Krishna V. Palem, and Christian Piguet. 2013. Improving energy gains of inexact DSP hardware through reciprocative error compensation. In 2013 50th ACM/EDAC/IEEE Design Automation Conference (DAC). IEEE, 1–8.

Digital Library

[75]

Avinash Lingamneni, Christian Enz, Jean-Luc Nagel, Krishna Palem, and Christian Piguet. 2011. Energy parsimonious circuit design through probabilistic pruning. In 2011 Design, Automation & Test in Europe. IEEE, 1–6.

[76]

Cong Liu, Jie Han, and Fabrizio Lombardi. 2014. A low-power, high-performance approximate multiplier with configurable partial error recovery. In 2014 Design, Automation & Test in Europe Conference & Exhibition (DATE). IEEE, 1–4.

[77]

Weiqiang Liu, Fabrizio Lombardi, and Michael Shulte. 2020. A retrospective and prospective view of approximate computing. Proc. IEEE 108, 3 (2020), 394–399.

[78]

Weiqiang Liu, Liangyu Qian, Chenghua Wang, Honglan Jiang, Jie Han, and Fabrizio Lombardi. 2017. Design of approximate radix-4 Booth multipliers for error-tolerant computing. IEEE Trans. Comput. 66, 8 (2017), 1435–1441.

Digital Library

[79]

Weiqiang Liu, Jiahua Xu, Danye Wang, and Fabrizio Lombardi. 2017. Design of approximate logarithmic multipliers. In Proceedings of the on Great Lakes Symposium on VLSI 2017. 47–52.

Digital Library

[80]

Weiqiang Liu, Jiahua Xu, Danye Wang, Chenghua Wang, Paolo Montuschi, and Fabrizio Lombardi. 2018. Design and evaluation of approximate logarithmic multipliers for low power error-tolerant applications. IEEE Transactions on Circuits and Systems I: Regular Papers 65, 9 (2018), 2856–2868.

[81]

Yang Liu, Tong Zhang, and Keshab K. Parhi. 2009. Computation error analysis in digital signal processing systems with overscaled supply voltage. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 18, 4 (2009), 517–526.

Digital Library

[82]

Hamid Reza Mahdiani, Ali Ahmadi, Sied Mehdi Fakhraie, and Caro Lucas. 2009. Bio-inspired imprecise computational blocks for efficient VLSI implementation of soft-computing applications. IEEE Transactions on Circuits and Systems I: Regular Papers 57, 4 (2009), 850–862.

Digital Library

[83]

R. Marimuthu, Y. Elsie Rezinold, and Partha Sharathi Mallick. 2016. Design and analysis of multiplier using approximate 15-4 compressor. IEEE Access 5 (2016), 1027–1036.

[84]

Julian Francis Miller and Simon L. Harding. 2008. Cartesian genetic programming. In Proceedings of the 10th Annual Conference Companion on Genetic and Evolutionary Computation. 2701–2726.

Digital Library

[85]

John N. Mitchell. 1962. Computer multiplication and division using binary logarithms. IRE Transactions on Electronic Computers EC-11, 4 (1962), 512–517.

[86]

Debabrata Mohapatra, Vinay K. Chippa, Anand Raghunathan, and Kaushik Roy. 2011. Design of voltage-scalable meta-functions for approximate computing. In 2011 Design, Automation & Test in Europe. IEEE, 1–6.

[87]

Amir Momeni, Jie Han, Paolo Montuschi, and Fabrizio Lombardi. 2014. Design and analysis of approximate compressors for multiplication. IEEE Trans. Comput. 64, 4 (2014), 984–994.

Digital Library

[88]

Vojtech Mrazek, Radek Hrbacek, Zdenek Vasicek, and Lukas Sekanina. 2017. EvoApprox8B: Library of approximate adders and multipliers for circuit design and benchmarking of approximation methods. In Design, Automation & Test in Europe Conference & Exhibition (DATE), 2017. IEEE, 258–261.

[89]

Vojtech Mrazek, Syed Shakib Sarwar, Lukas Sekanina, Zdenek Vasicek, and Kaushik Roy. 2016. Design of power-efficient approximate multipliers for approximate artificial neural networks. In Proceedings of the 35th International Conference on Computer-Aided Design. 1–7.

Digital Library

[90]

Srinivasan Narayanamoorthy, Hadi Asghari Moghaddam, Zhenhong Liu, Taejoon Park, and Nam Sung Kim. 2014. Energy-efficient approximate multiplication for digital signal processing and classification applications. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 23, 6 (2014), 1180–1184.

Digital Library

[91]

Kai Ni, Xunzhao Yin, Ann Franchesca Laguna, Siddharth Joshi, Stefan Dünkel, Martin Trentzsch, Johannes Müller, Sven Beyer, Michael Niemier, Xiaobo Sharon Hu, and Suman Datta. 2019. Ferroelectric ternary content-addressable memory for one-shot learning. Nature Electronics 2, 11 (2019), 521–529.

[92]

S. Pabithra and S. Nageswari. 2018. Analysis of approximate multiplier using 15–4 compressor for error tolerant application. In 2018 International Conference on Control, Power, Communication and Computing Technologies (ICCPCCT). IEEE, 410–415.

[93]

Abdoreza Pishvaie, Ghassem Jaberipur, and Ali Jahanian. 2012. Improved CMOS (4; 2) compressor designs for parallel multipliers. Computers & Electrical Engineering 38, 6 (2012), 1703–1716.

Digital Library

[94]

Bharath Srinivas Prabakaran, Vojtech Mrazek, Zdenek Vasicek, Lukas Sekanina, and Muhammad Shafique. 2020. ApproxFPGAs: Embracing ASIC-based approximate arithmetic components for FPGA-based systems. In 2020 57th ACM/IEEE Design Automation Conference (DAC). IEEE, 1–6.

[95]

Liangyu Qian, Chenghua Wang, Weiqiang Liu, Fabrizio Lombardi, and Jie Han. 2016. Design and evaluation of an approximate Wallace-Booth multiplier. In 2016 IEEE International Symposium on Circuits and Systems (ISCAS). IEEE, 1974–1977.

Digital Library

[96]

Semeen Rehman, Walaa El-Harouni, Muhammad Shafique, Akash Kumar, and Jörg Henkel. 2016. Architectural-space exploration of approximate multipliers. In 2016 IEEE/ACM International Conference on Computer-Aided Design (ICCAD). IEEE, 1–8.

Digital Library

[97]

Hassaan Saadat, Haseeb Bokhari, and Sri Parameswaran. 2018. Minimally biased multipliers for approximate integer and floating-point multiplication. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 37, 11 (2018), 2623–2635.

[98]

Syed Shakib Sarwar, Swagath Venkataramani, Anand Raghunathan, and Kaushik Roy. 2016. Multiplier-less artificial neurons exploiting error resiliency for energy-efficient neural computing. In 2016 Design, Automation & Test in Europe Conference & Exhibition (DATE). IEEE, 145–150.

Digital Library

[99]

Jeremy Schlachter, Vincent Camus, Christian Enz, and Krishna V. Palem. 2015. Automatic generation of inexact digital circuits by gate-level pruning. In 2015 IEEE International Symposium on Circuits and Systems (ISCAS). IEEE, 173–176.

[100]

Michael J. Schulte and Earl E. Swartzlander. 1993. Truncated multiplication with correction constant [for DSP]. In Proceedings of IEEE Workshop on VLSI Signal Processing. IEEE, 388–396.

[101]

Abu Sebastian, Manuel Le Gallo, Riduan Khaddam-Aljameh, and Evangelos Eleftheriou. 2020. Memory devices and applications for in-memory computing. Nature Nanotechnology 15, 7 (2020), 529–544.

[102]

Lukas Sekanina and Zdenek Vasicek. 2013. Approximate circuit design by means of evolvable hardware. In 2013 IEEE International Conference on Evolvable Systems (ICES). IEEE, 21–28.

[103]

Muhammad Shafique, Rehan Hafiz, Semeen Rehman, Walaa El-Harouni, and Jörg Henkel. 2016. Invited: Cross-layer approximate computing: From logic to architectures. In 2016 53rd ACM/EDAC/IEEE Design Automation Conference (DAC). 1–6.

Digital Library

[104]

Farhana Sharmin Snigdha, Deepashree Sengupta, Jiang Hu, and Sachin S. Sapatnekar. 2016. Optimal design of JPEG hardware under the approximate computing paradigm. In 2016 53rd ACM/EDAC/IEEE Design Automation Conference (DAC). IEEE, 1–6.

Digital Library

[105]

Wilson Snyder. 2003-2022. Verilator. https://github.com/verilator/verilator

[106]

Min-An Song, Lan-Da Van, and Sy-Yen Kuo. 2007. Adaptive low-error fixed-width Booth multipliers. IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences 90, 6 (2007), 1180–1187.

Digital Library

[107]

Antonio G. M. Strollo, Ettore Napoli, Davide De Caro, Nicola Petra, and Gennaro Di Meo. 2020. Comparison and extension of approximate 4-2 compressors for low-power approximate multipliers. IEEE Transactions on Circuits and Systems I: Regular Papers 67, 9 (2020), 3021–3034.

[108]

Alexander Suhre, Furkan Keskin, Tulin Ersahin, Rengul Cetin-Atalay, Rashid Ansari, and A. Enis Cetin. 2013. A multiplication-free framework for signal processing and applications in biomedical image analysis. In 2013 IEEE International Conference on Acoustics, Speech and Signal Processing. 1123–1127.

[109]

Synopsys. 2022. Design Compiler. https://www.synopsys.com/

[110]

Synopsys. 2022. DesignWare. https://www.synopsys.com/designware-ip.html

[111]

Che-Wei Tung and Shih-Hsu Huang. 2019. Low-power high-accuracy approximate multiplier using approximate high-order compressors. In 2019 2nd International Conference on Communication Engineering and Technology (ICCET). IEEE, 163–167.

[112]

Salim Ullah, Sanjeev Sripadraj Murthy, and Akash Kumar. 2018. SMApproxLib: Library of FPGA-based approximate multipliers. In Proceedings of the 55th Annual Design Automation Conference. 1–6.

Digital Library

[113]

Salim Ullah, Semeen Rehman, Bharath Srinivas Prabakaran, Florian Kriebel, Muhammad Abdullah Hanif, Muhammad Shafique, and Akash Kumar. 2018. Area-optimized low-latency approximate multipliers for FPGA-based hardware accelerators. In Proceedings of the 55th Annual Design Automation Conference. 1–6.

Digital Library

[114]

Salim Ullah, Semeen Rehman, Muhammad Shafique, and Akash Kumar. 2021. High-performance accurate and approximate multipliers for FPGA-based hardware accelerators. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 41, 2 (2021), 211–224.

[115]

Salim Ullah, Siva Satyendra Sahoo, Nemath Ahmed, Debabrata Chaudhury, and Akash Kumar. 2022. AppAxO: Designing application-specific approximate operators for FPGA-based embedded systems. ACM Transactions on Embedded Computing Systems (TECS) 21, 3 (2022), 1–31.

Digital Library

[116]

UMC. 2022. UMC40. https://www.umc.com

[117]

Shaghayegh Vahdat, Mehdi Kamal, Ali Afzali-Kusha, and Massoud Pedram. 2017. LETAM: A low energy truncation-based approximate multiplier. Computers & Electrical Engineering 63 (2017), 1–17.

Digital Library

[118]

Shaghayegh Vahdat, Mehdi Kamal, Ali Afzali-Kusha, and Massoud Pedram. 2019. TOSAM: An energy-efficient truncation-and rounding-based scalable approximate multiplier. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 27, 5 (2019), 1161–1173.

Digital Library

[119]

Shaghayegh Vahdat, Mehdi Kamal, Ali Afzali-Kusha, Massoud Pedram, and Zainalabedin Navabi. 2017. TruncApp: A truncation-based approximate divider for energy efficient DSP applications. In Design, Automation Test in Europe Conference Exhibition (DATE), 2017. 1635–1638.

[120]

Lan-Da Van, Shuenn-Shyang Wang, and Wu-Shiung Feng. 2000. Design of the lower error fixed-width multiplier and its application. IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing 47, 10 (2000), 1112–1118.

[121]

L.-D. Van and Chih-Chyau Yang. 2005. Generalized low-error area-efficient fixed-width multipliers. IEEE Transactions on Circuits and Systems I: Regular Papers 52, 8 (2005), 1608–1619.

[122]

Nguyen Van Toan and Jeong-Gun Lee. 2020. FPGA-based multi-level approximate multipliers for high-performance error-resilient applications. IEEE Access 8 (2020), 25481–25497.

[123]

Zdenek Vasicek and Lukas Sekanina. 2014. Evolutionary design of approximate multipliers under different error metrics. In 17th International Symposium on Design and Diagnostics of Electronic Circuits & Systems. IEEE, 135–140.

[124]

Sreehari Veeramachaneni, Kirthi M. Krishna, Lingamneni Avinash, Sreekanth Reddy Puppala, and MB Srinivas. 2007. Novel architectures for high-speed and low-power 3-2, 4-2 and 5-2 compressors. In 20th International Conference on VLSI Design held jointly with 6th International Conference on Embedded Systems (VLSID’07). 324–329.

Digital Library

[125]

Suganthi Venkatachalam and Seok-Bum Ko. 2017. Design of power and area efficient approximate multipliers. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 25, 5 (2017), 1782–1786.

Digital Library

[126]

Swagath Venkataramani, Vinay K. Chippa, Srimat T. Chakradhar, Kaushik Roy, and Anand Raghunathan. 2013. Quality programmable vector processors for approximate computing. In Proceedings of the 46th Annual IEEE/ACM International Symposium on Microarchitecture (Davis, California) (MICRO-46). Association for Computing Machinery, New York, NY, USA, 1—12.

Digital Library

[127]

Christopher S. Wallace. 1964. A suggestion for a fast multiplier. IEEE Transactions on Electronic Computers EC-13, 1 (1964), 14–17.

[128]

Jiun-Ping Wang, Shiann-Rong Kuang, and Shish-Chang Liang. 2009. High-accuracy fixed-width modified Booth multipliers for lossy applications. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 19, 1 (2009), 52–60.

Digital Library

[129]

Manzhen Wang, Yuanyong Luo, Mengyu An, Yuou Qiu, Muhan Zheng, Zhongfeng Wang, and Hongbing Pan. 2020. An optimized compression strategy for compressor-based approximate multiplier. In 2020 IEEE International Symposium on Circuits and Systems (ISCAS).

[130]

Xuan Wang and Weikang Qian. 2022. MinAC: Minimal-area approximate compressor design based on exact synthesis for approximate multipliers. In 2022 IEEE International Symposium on Circuits and Systems (ISCAS), to be published.

[131]

Haroon Waris, Chenghua Wang, Weiqiang Liu, and Fabrizio Lombardi. 2021. AxBMs: Approximate radix-8 booth multipliers for high-performance FPGA-based accelerators. IEEE Transactions on Circuits and Systems II: Express Briefs 68, 5 (2021), 1566–1570.

[132]

Ying Wu and Cheng Zhuo. 2022. Verilog Implementation of Approximate Multipliers. https://github.com/skycrapers/AM-Lib

[133]

Weihua Xiao, Cheng Zhuo, and Weikang Qian. 2022. OPACT: Optimization of approximate compressor tree for approximate multiplier. In 2022 Design, Automation, and Test in Europe Conference (DATE).

[134]

Tongxin Yang, Tomoaki Ukezono, and Toshinori Sato. 2017. Low-power and high-speed approximate multiplier design with a tree compressor. In 2017 IEEE International Conference on Computer Design (ICCD). IEEE, 89–96.

[135]

Tongxin Yang, Tomoaki Ukezono, and Toshinori Sato. 2018. A low-power high-speed accuracy-controllable approximate multiplier design. In 2018 23rd Asia and South Pacific Design Automation Conference (ASP-DAC). IEEE, 605–610.

[136]

Zhixi Yang, Jie Han, and Fabrizio Lombardi. 2015. Approximate compressors for error-resilient multiplier design. In 2015 IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFTS). IEEE, 183–186.

[137]

Peipei Yin, Chenghua Wang, Weiqiang Liu, Earl E. Swartzlander, and Fabrizio Lombardi. 2018. Designs of approximate floating-point multipliers with variable accuracy for error-tolerant applications. Journal of Signal Processing Systems 90, 4 (2018), 641–654.

[138]

Peipei Yin, Chenghua Wang, Haroon Waris, Weiqiang Liu, Yinhe Han, and Fabrizio Lombardi. 2020. Design and analysis of energy-efficient dynamic range approximate logarithmic multipliers for machine learning. IEEE Transactions on Sustainable Computing 6, 4 (2020), 612–625.

[139]

Byoung-Joo Yoo, Dong-Hyuk Lim, Hyonguk Pang, June-Hee Lee, Seung-Yeob Baek, Naxin Kim, Dong-Ho Choi, Young-Ho Choi, Hyeyeon Yang, Taehun Yoon, Sang-Hyeok Chu, Kangjik Kim, Woochul Jung, Bong-Kyu Kim, Jaechol Lee, Gunil Kang, Sang-Hune Park, Michael Choi, and Jongshin Shin. 2020. 6.4 A 56Gb/s 7.7 mW/Gb/s PAM-4 wireline transceiver in 10nm FinFET using MM-CDR-based ADC timing skew control and low-power DSP with approximate multiplier. In 2020 IEEE International Solid-State Circuits Conference (ISSCC). IEEE, 122–124.

[140]

Robert K. Yu and Gregory B. Zyner. 1995. 167 MHz radix-4 floating point multiplier. In Proceedings of the 12th Symposium on Computer Arithmetic. 149–154.

[141]

Reza Zendegani, Mehdi Kamal, Milad Bahadori, Ali Afzali-Kusha, and Massoud Pedram. 2016. RoBA multiplier: A rounding-based approximate multiplier for high-speed yet energy-efficient digital signal processing. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 25, 2 (2016), 393–401.

[142]

Qian Zhang, Ting Wang, Ye Tian, Feng Yuan, and Qiang Xu. 2015. ApproxANN: An approximate computing framework for artificial neural network. In 2015 Design, Automation & Test in Europe Conference & Exhibition (DATE). IEEE, 701–706.

[143]

Xian Zhou, Li Zhang, Chuliang Guo, Xunzhao Yin, and Cheng Zhuo. 2020. A convolutional neural network accelerator architecture with fine-granular mixed precision configurability. In 2020 IEEE International Symposium on Circuits and Systems (ISCAS). IEEE, 1–5.

[144]

Cheng Zhuo, Kassan Unda, Yiyu Shi, and Wei-Kai Shih. 2018. From layout to system: Early stage power delivery and architecture co-exploration. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems 38, 7 (2018), 1291–1304.

Index Terms

A Survey on Approximate Multiplier Designs for Energy Efficiency: From Algorithms to Circuits
1. General and reference
  1. Document types
    1. Surveys and overviews
2. Hardware
  1. Integrated circuits
    1. Logic circuits
      1. Arithmetic and datapath circuits
  2. Robustness
    1. Fault tolerance
      1. System-level fault tolerance

Published In

ACM Transactions on Design Automation of Electronic Systems Volume 29, Issue 1

January 2024

521 pages

EISSN: 1557-7309

DOI: 10.1145/3613510

Editor:
X. Sharon Hu
University of Notre Dame, USA

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Journal Family

ACM Journals for the Design of Smart and Connected Systems

Publication History

Published: 13 January 2024

Online AM: 24 July 2023

Accepted: 20 June 2023

Revised: 11 May 2023

Received: 08 October 2022

Published in TODAES Volume 29, Issue 1

Qualifiers

Survey

Funding Sources

National Key R&D Program of China
National Natural Science Foundation of China
SGC Cooperation Project

Article Metrics

Total Citations: 11
Total Downloads: 928
Downloads (Last 12 months): 882
Downloads (Last 6 weeks): 125

Reflects downloads up to 12 Sep 2024

Citations

Cited By

Wen C, Du H, Chen Z, Zhang L, Sun Q, Zhuo C. (2024). PACE: A Piece-Wise Approximate and Configurable Floating-Point Divider for Energy-Efficient Computing. 2024 Design, Automation & Test in Europe Conference & Exhibition (DATE), 1–6. Online publication date: 25-Mar-2024.
https://doi.org/10.23919/DATE58400.2024.10546711
Vakili S, Vaziri M, Zarei A, Langlois J. (2024). DyRecMul: Fast and Low-Cost Approximate Multiplier for FPGAs using Dynamic Reconfiguration. ACM Transactions on Reconfigurable Technology and Systems. Online publication date: 1-May-2024.
https://dl.acm.org/doi/10.1145/3663480
Yu T, Wu B, Chen K, Yan C, Liu W. (2024). Toward Efficient Retraining: A Large-Scale Approximate Neural Network Framework With Cross-Layer Optimization. IEEE Transactions on Very Large Scale Integration (VLSI) Systems 32:6, 1004–1017. Online publication date: Jun-2024.
https://doi.org/10.1109/TVLSI.2024.3386900
Zhou Y, Yan J, Zhou Y, Shao Z, Chen J. (2024). Stochastic-Binary Hybrid Spatial Coding Multiplier for Convolutional Neural Network Accelerator. IEEE Transactions on Nanotechnology 23, 600–605. Online publication date: 2024.
https://doi.org/10.1109/TNANO.2024.3444278
Jha C, Hassan M, Drechsler R. (2024). cecApprox: Enabling Automated Combinational Equivalence Checking for Approximate Circuits. IEEE Transactions on Circuits and Systems I: Regular Papers 71:7, 3282–3293. Online publication date: Jul-2024.
https://doi.org/10.1109/TCSI.2024.3388256
Napoli E, Strollo A, Zacharelos E, Di Meo G. (2024). Comprehensive Analysis of Input Order Invariant Approximate 4-2 Compressors for Binary Multipliers. 2024 IEEE International Symposium on Circuits and Systems (ISCAS), 1–5. Online publication date: 19-May-2024.
https://doi.org/10.1109/ISCAS58744.2024.10558503
Vakili S. (2024). A Cost-Effective Baugh-Wooley Approximate Multiplier for FPGA-based Machine Learning Computing. 2024 IEEE 6th International Conference on AI Circuits and Systems (AICAS), 367–371. Online publication date: 22-Apr-2024.
https://doi.org/10.1109/AICAS59952.2024.10595892
Rehman A, Vakili S. (2023). A Cost-Effective FPGA-Based Approximate Multiplier for Machine Learning Acceleration. 2023 IEEE 14th International Symposium on Parallel Architectures, Algorithms and Programming (PAAP), 1–6. Online publication date: 24-Nov-2023.
https://doi.org/10.1109/PAAP60200.2023.10391619
Damsgaard H, Ometov A, Nurmi J. (2023). Verification of Approximate Hardware Designs with ChiselVerify. 2023 IEEE Nordic Circuits and Systems Conference (NorCAS), 1–7. Online publication date: 31-Oct-2023.
https://doi.org/10.1109/NorCAS58970.2023.10305474
Choudhary P, Bhargava L, Suhag A. (2023). Designing of Energy-Efficient Approximate Multiplier Circuit for Processing Unit of IoT Devices. SN Computer Science 4:5. Online publication date: 30-Jun-2023.
https://dl.acm.org/doi/10.1007/s42979-023-01864-4
