
Double-Shift: A Low-Power DNN Weights Storage and Access Framework based on Approximate Decomposition and Quantization

Published: 02 November 2021

Abstract

One major challenge in deploying Deep Neural Networks (DNNs) in resource-constrained applications, such as edge nodes, mobile embedded systems, and IoT devices, is their high energy cost. The emerging approximate computing methodology can effectively reduce the energy consumed by DNN computation. However, a recent study shows that weight storage and access operations can dominate a DNN's energy consumption, because the large volume of DNN weights must be stored in high-energy-cost DRAM. In this paper, we propose Double-Shift, a low-power DNN weight storage and access framework, to solve this problem. Enabled by approximate decomposition and quantization, Double-Shift effectively reduces the data size of the weights. By designing a novel weight storage allocation strategy, Double-Shift boosts energy efficiency by trading energy-consuming weight storage and access operations for low-energy-cost computations. Our experimental results show that Double-Shift can reduce DNN weights to 3.96%–6.38% of their original size and achieve an energy saving of 86.47%–93.62%, while introducing a DNN classification error within 2%.
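The abstract does not spell out how Double-Shift combines approximate decomposition and quantization, but the general idea of trading weight storage for cheap computation can be illustrated with a minimal sketch: approximate each weight matrix by a truncated low-rank factorization and round the factor entries to signed powers of two, so that multiplications during inference reduce to bit shifts. Everything below (the function names quantize_pow2 and decompose_and_quantize, the chosen rank, and the 4-bit exponent range) is an illustrative assumption, not the paper's actual algorithm.

```python
import numpy as np

def quantize_pow2(x, n_bits=4):
    """Round each nonzero entry to the nearest signed power of two,
    with the exponent clipped to an n_bits signed range (assumed format)."""
    out = np.zeros_like(x)
    nz = x != 0
    exp = np.round(np.log2(np.abs(x[nz])))
    exp = np.clip(exp, -(2 ** (n_bits - 1)), 2 ** (n_bits - 1) - 1)
    out[nz] = np.sign(x[nz]) * np.exp2(exp)
    return out

def decompose_and_quantize(W, rank=16, n_bits=4):
    """Rank-`rank` SVD approximation of W whose two factors hold only
    signed powers of two, so W is stored as two small shift-friendly matrices."""
    U, s, Vt = np.linalg.svd(W, full_matrices=False)
    A = U[:, :rank] * s[:rank]   # absorb singular values into the left factor
    B = Vt[:rank, :]
    return quantize_pow2(A, n_bits), quantize_pow2(B, n_bits)

if __name__ == "__main__":
    # Random matrix used only as a stand-in; real DNN weight matrices are
    # typically much closer to low rank, so their reconstruction error is lower.
    rng = np.random.default_rng(0)
    W = rng.standard_normal((256, 256)).astype(np.float32)
    A_q, B_q = decompose_and_quantize(W, rank=32)
    err = np.linalg.norm(W - A_q @ B_q) / np.linalg.norm(W)
    print(f"relative reconstruction error: {err:.3f}")
    print(f"stored entries: {A_q.size + B_q.size} vs. {W.size}")
```

Because every stored entry is a signed power of two, multiplying a factor entry with an activation can be realized as a shift and an add, which is one concrete way to trade high-cost DRAM storage and access for low-cost computation in the spirit the abstract describes; the actual Double-Shift decomposition, quantization, and storage allocation strategy are detailed in the full text.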

    Published In

    ACM Transactions on Design Automation of Electronic Systems, Volume 27, Issue 2
    March 2022
    217 pages
    ISSN: 1084-4309
    EISSN: 1557-7309
    DOI: 10.1145/3494074

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 02 November 2021
    Accepted: 01 July 2021
    Revised: 01 May 2021
    Received: 01 February 2021
    Published in TODAES Volume 27, Issue 2

    Author Tags

    1. Deep neural network
    2. approximate computing
    3. matrix compression

    Qualifiers

    • Research-article
    • Refereed
