HW-FlowQ: A Multi-Abstraction Level HW-CNN Co-design Quantization Methodology

Published: 17 September 2021

Abstract

Model compression through quantization is commonly applied to convolutional neural networks (CNNs) deployed on compute- and memory-constrained embedded platforms. Different layers of the CNN can have varying degrees of numerical precision for both weights and activations, resulting in a large search space. Combined with the hardware (HW) design space, this makes finding the globally optimal HW-CNN combination for a given application a daunting challenge. To this end, we propose HW-FlowQ, a systematic approach that enables the co-design of the target hardware platform and the compressed CNN model through quantization. The search space is viewed at three levels of abstraction, allowing for an iterative approach that narrows down the solution space before reaching a high-fidelity CNN hardware modeling tool, capable of capturing the effects of mixed-precision quantization strategies on different hardware architectures (processing unit counts, memory levels, cost models, dataflows) and two types of computation engines (bit-parallel vectorized, bit-serial). To combine both worlds, a multi-objective non-dominated sorting genetic algorithm (NSGA-II) is leveraged to establish a Pareto-optimal set of quantization strategies for the target HW metrics at each abstraction level. HW-FlowQ detects optima in a discrete search space, maximizing the task-related accuracy of the underlying CNN while minimizing hardware-related costs. The Pareto-front approach keeps the design space open to a range of non-dominated solutions before refining the design to a more detailed level of abstraction. At equivalent prediction accuracy, we improve energy and latency by 20% and 45%, respectively, for ResNet56, compared to existing mixed-precision search methods.
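The search the abstract describes can be made concrete with a small sketch. The following is a minimal, illustrative Python example, not the authors' implementation: it encodes a mixed-precision quantization strategy as one bitwidth per layer and keeps the non-dominated (Pareto) set under two competing objectives. The layer count, candidate bitwidths, and the accuracy/cost proxies are stand-in assumptions.

```python
# Minimal sketch (assumptions throughout): a quantization strategy is a
# tuple of per-layer bitwidths; we keep the Pareto set of strategies
# under two objectives, a task-accuracy proxy (maximize) and a
# hardware-cost proxy (minimize), in the spirit of the NSGA-II search
# the paper describes. Real values would come from CNN fine-tuning and
# the hardware modeling tool.
import random

random.seed(0)
NUM_LAYERS = 8          # assumed network depth
BITWIDTHS = (2, 4, 8)   # assumed candidate precisions per layer

def random_strategy():
    return tuple(random.choice(BITWIDTHS) for _ in range(NUM_LAYERS))

def accuracy_proxy(s):
    # Stand-in for task accuracy: more bits -> higher score.
    return sum(s) / (max(BITWIDTHS) * NUM_LAYERS)

def cost_proxy(s):
    # Stand-in for energy/latency from a HW model; e.g., a bit-serial
    # engine needs cycle counts that grow with operand bitwidths.
    return sum(b * b for b in s)

def dominates(a, b):
    # a dominates b: no worse in both objectives, strictly better in one.
    return (accuracy_proxy(a) >= accuracy_proxy(b)
            and cost_proxy(a) <= cost_proxy(b)
            and (accuracy_proxy(a) > accuracy_proxy(b)
                 or cost_proxy(a) < cost_proxy(b)))

def pareto_front(population):
    return [p for p in population
            if not any(dominates(q, p) for q in population if q != p)]

population = [random_strategy() for _ in range(64)]
for s in sorted(set(pareto_front(population)), key=cost_proxy):
    print(s, f"acc~{accuracy_proxy(s):.2f} cost~{cost_proxy(s)}")
```

In the full flow, NSGA-II would evolve such a population through selection, crossover, and mutation; the proxies would be replaced by measured accuracy and by hardware estimates at the current abstraction level; and the surviving Pareto front would be carried forward to the next, more detailed level.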

Published In

ACM Transactions on Embedded Computing Systems, Volume 20, Issue 5s
Special Issue ESWEEK 2021, CASES 2021, CODES+ISSS 2021 and EMSOFT 2021
October 2021
1367 pages
ISSN: 1539-9087
EISSN: 1558-3465
DOI: 10.1145/3481713
Editor: Tulika Mitra

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 17 September 2021
Accepted: 01 July 2021
Revised: 01 June 2021
Received: 01 April 2021
Published in TECS Volume 20, Issue 5s


Author Tags

  1. Convolutional neural networks
  2. Multi-objective optimization
  3. Hardware modeling
  4. Genetic algorithms
  5. Quantization

Qualifiers

  • Research-article
  • Refereed


Article Metrics

  • Downloads (last 12 months): 50
  • Downloads (last 6 weeks): 4
Reflects downloads up to 10 Nov 2024

Cited By

  • (2024) MARLIN: A Co-Design Methodology for Approximate ReconfigurabLe Inference of Neural Networks at the Edge. IEEE Transactions on Circuits and Systems I: Regular Papers 71(5), 2105-2118. https://doi.org/10.1109/TCSI.2024.3365952
  • (2024) A 28-nm 50.1-TOPS/W P-8T SRAM Compute-In-Memory Macro Design With BL Charge-Sharing-Based In-SRAM DAC/ADC Operations. IEEE Journal of Solid-State Circuits 59(6), 1926-1937. https://doi.org/10.1109/JSSC.2023.3334566
  • (2024) Toward Efficient Co-Design of CNN Quantization and HW Architecture on FPGA Hybrid-Accelerator. 2024 2nd International Symposium of Electronics Design Automation (ISEDA), 678-683. https://doi.org/10.1109/ISEDA62518.2024.10617620
  • (2024) TEMET: Truncated REconfigurable Multiplier with Error Tuning. Applications in Electronics Pervading Industry, Environment and Society, 370-377. https://doi.org/10.1007/978-3-031-48121-5_53
  • (2023) The ZuSE-KI-Mobil AI Accelerator SoC: Overview and a Functional Safety Perspective. 2023 Design, Automation & Test in Europe Conference & Exhibition (DATE), 1-6. https://doi.org/10.23919/DATE56975.2023.10137257
  • (2022) HW-Flow-Fusion: Inter-Layer Scheduling for Convolutional Neural Network Accelerators with Dataflow Architectures. Electronics 11(18), 2933. https://doi.org/10.3390/electronics11182933
  • (2022) AnaCoNGA: Analytical HW-CNN Co-Design Using Nested Genetic Algorithms. 2022 Design, Automation & Test in Europe Conference & Exhibition (DATE), 238-243. https://doi.org/10.23919/DATE54114.2022.9774574
  • (2022) MOHAQ. Journal of Systems Architecture: the EUROMICRO Journal 133(C). https://doi.org/10.1016/j.sysarc.2022.102778
