
Double-Shift: A Low-Power DNN Weights Storage and Access Framework based on Approximate Decomposition and Quantization

Published: 02 November 2021

Abstract

One major challenge in deploying Deep Neural Networks (DNNs) in resource-constrained applications, such as edge nodes, mobile embedded systems, and IoT devices, is their high energy cost. The emerging approximate computing methodology can effectively reduce the energy consumed by DNN computation. However, a recent study shows that weight storage and access operations can dominate a DNN's energy consumption, because the large volume of DNN weights must be stored in high-energy-cost DRAM. In this paper, we propose Double-Shift, a low-power DNN weight storage and access framework, to solve this problem. Enabled by approximate decomposition and quantization, Double-Shift effectively reduces the data size of the weights. By designing a novel weight storage allocation strategy, Double-Shift boosts energy efficiency by trading energy-consuming weight storage and access operations for low-energy-cost computations. Our experimental results show that Double-Shift can reduce DNN weights to 3.96%–6.38% of their original size and achieve an energy saving of 86.47%–93.62%, while introducing a DNN classification error within 2%.
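The abstract does not spell out how Double-Shift combines approximate decomposition and quantization, but the general idea of trading weight storage for cheap computation can be illustrated with a minimal sketch: approximate each weight matrix by a truncated low-rank factorization and round the factor entries to signed powers of two, so that multiplications during inference reduce to bit shifts. Everything below (the function names quantize_pow2 and decompose_and_quantize, the chosen rank, and the 4-bit exponent range) is an illustrative assumption, not the paper's actual algorithm.

```python
import numpy as np

def quantize_pow2(x, n_bits=4):
    """Round each nonzero entry to the nearest signed power of two,
    with the exponent clipped to an n_bits signed range (assumed format)."""
    out = np.zeros_like(x)
    nz = x != 0
    exp = np.round(np.log2(np.abs(x[nz])))
    exp = np.clip(exp, -(2 ** (n_bits - 1)), 2 ** (n_bits - 1) - 1)
    out[nz] = np.sign(x[nz]) * np.exp2(exp)
    return out

def decompose_and_quantize(W, rank=16, n_bits=4):
    """Rank-`rank` SVD approximation of W whose two factors hold only
    signed powers of two, so W is stored as two small shift-friendly matrices."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * s[:rank]   # absorb singular values into the left factor
    B = Vt[:rank, :]
    return quantize_pow2(A, n_bits), quantize_pow2(B, n_bits)

if __name__ == "__main__":
    # Random matrix used only as a stand-in; real DNN weight matrices are
    # typically much closer to low rank, so their reconstruction error is lower.
    rng = np.random.default_rng(0)
    W = rng.standard_normal((256, 256)).astype(np.float32)
    A_q, B_q = decompose_and_quantize(W, rank=32)
    err = np.linalg.norm(W - A_q @ B_q) / np.linalg.norm(W)
    print(f"relative reconstruction error: {err:.3f}")
    print(f"stored entries: {A_q.size + B_q.size} vs. {W.size}")
```

Because every stored entry is a signed power of two, multiplying a factor entry with an activation can be realized as a shift and an add, which is one concrete way to trade high-cost DRAM storage and access for low-cost computation in the spirit the abstract describes; the actual Double-Shift decomposition, quantization, and storage allocation strategy are detailed in the full text.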

    Published In

    ACM Transactions on Design Automation of Electronic Systems, Volume 27, Issue 2
    March 2022
    217 pages
    ISSN: 1084-4309
    EISSN: 1557-7309
    DOI: 10.1145/3494074

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 02 November 2021
    Accepted: 01 July 2021
    Revised: 01 May 2021
    Received: 01 February 2021
    Published in TODAES Volume 27, Issue 2

    Author Tags

    1. Deep neural network
    2. approximate computing
    3. matrix compression

    Qualifiers

    • Research-article
    • Refereed
