HW-FlowQ: A Multi-Abstraction Level HW-CNN Co-design Quantization Methodology

Published: 17 September 2021

Abstract

Model compression through quantization is commonly applied to convolutional neural networks (CNNs) deployed on compute- and memory-constrained embedded platforms. Different layers of the CNN can have varying degrees of numerical precision for both weights and activations, resulting in a large search space. Combined with the hardware (HW) design space, the challenge of finding the globally optimal HW-CNN combination for a given application becomes daunting. To this end, we propose HW-FlowQ, a systematic approach that enables the co-design of the target hardware platform and the compressed CNN model through quantization. The search space is viewed at three levels of abstraction, allowing for an iterative approach that narrows down the solution space before reaching a high-fidelity CNN hardware modeling tool, capable of capturing the effects of mixed-precision quantization strategies on different hardware architectures (processing unit counts, memory levels, cost models, dataflows) and two types of computation engines (bit-parallel vectorized, bit-serial). To combine both worlds, a multi-objective non-dominated sorting genetic algorithm (NSGA-II) is leveraged to establish a Pareto-optimal set of quantization strategies for the target HW metrics at each abstraction level. HW-FlowQ detects optima in a discrete search space and maximizes the task-related accuracy of the underlying CNN while minimizing hardware-related costs. The Pareto-front approach keeps the design space open to a range of non-dominated solutions before refining the design to a more detailed level of abstraction. With equivalent prediction accuracy, we improve energy and latency by 20% and 45%, respectively, for ResNet56 compared to existing mixed-precision search methods.
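
To make the search loop concrete, the sketch below is a minimal, self-contained Python illustration of the kind of Pareto-based mixed-precision search the abstract describes, not the paper's implementation. A chromosome assigns one bitwidth per layer; a toy accuracy proxy (quantization error on random stand-in weights) and a toy energy proxy stand in for the paper's trained-CNN evaluation and hardware cost model; an elitist loop carries the non-dominated set forward. All names and constants (NUM_LAYERS, BITWIDTHS, the mutation rate) are illustrative assumptions, and full NSGA-II additionally ranks dominated individuals into successive fronts and breaks ties by crowding distance.

```python
import random

NUM_LAYERS, POP, GENS = 8, 24, 30
BITWIDTHS = [2, 4, 8]            # candidate per-layer precisions (assumed)

random.seed(0)
# Stand-in "weights": random values in [-1, 1]; a real flow would use a trained CNN.
WEIGHTS = [[random.uniform(-1, 1) for _ in range(64)] for _ in range(NUM_LAYERS)]

def quantize(x, bits):
    # Uniform quantization of x in [-1, 1] onto 2**bits - 1 steps.
    levels = 2 ** bits - 1
    return round((x + 1) / 2 * levels) / levels * 2 - 1

def evaluate(strategy):
    # Objective 1 (maximize): negative total quantization error, an accuracy proxy.
    # Objective 2 (minimize): sum of squared bitwidths, a crude energy proxy
    # standing in for a real hardware cost model.
    err = sum(abs(w - quantize(w, b))
              for layer, b in zip(WEIGHTS, strategy) for w in layer)
    return -err, sum(b * b for b in strategy)

def dominates(f, g):
    # f dominates g: no worse in both objectives and strictly better in one.
    return f[0] >= g[0] and f[1] <= g[1] and f != g

def pareto_front(pop):
    fits = [evaluate(s) for s in pop]
    return [s for s, f in zip(pop, fits) if not any(dominates(g, f) for g in fits)]

def offspring(front):
    p1, p2 = random.choice(front), random.choice(front)
    cut = random.randrange(1, NUM_LAYERS)                  # one-point crossover
    child = p1[:cut] + p2[cut:]
    return [random.choice(BITWIDTHS) if random.random() < 0.1 else b for b in child]

pop = [[random.choice(BITWIDTHS) for _ in range(NUM_LAYERS)] for _ in range(POP)]
for _ in range(GENS):
    front = pareto_front(pop)                              # elitist survival
    pop = front + [offspring(front) for _ in range(POP - len(front))]

for s in pareto_front(pop):
    print(s, evaluate(s))
```

Keeping the whole non-dominated set alive, rather than a single best individual, is what leaves the design space open to a range of accuracy/cost trade-offs before refinement at the next abstraction level.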


Published In

ACM Transactions on Embedded Computing Systems, Volume 20, Issue 5s
Special Issue ESWEEK 2021, CASES 2021, CODES+ISSS 2021 and EMSOFT 2021
October 2021, 1367 pages
ISSN: 1539-9087
EISSN: 1558-3465
DOI: 10.1145/3481713
Editor: Tulika Mitra

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 17 September 2021
Accepted: 01 July 2021
Revised: 01 June 2021
Received: 01 April 2021
Published in TECS Volume 20, Issue 5s


Author Tags

  1. Convolutional neural networks
  2. multi-objective optimization
  3. hardware modeling
  4. genetic algorithms
  5. quantization

Qualifiers

  • Research-article
  • Refereed


Cited By

  • (2024) MARLIN: A Co-Design Methodology for Approximate ReconfigurabLe Inference of Neural Networks at the Edge. IEEE Transactions on Circuits and Systems I: Regular Papers 71, 5 (May 2024), 2105–2118. https://doi.org/10.1109/TCSI.2024.3365952
  • (2024) A 28-nm 50.1-TOPS/W P-8T SRAM Compute-In-Memory Macro Design With BL Charge-Sharing-Based In-SRAM DAC/ADC Operations. IEEE Journal of Solid-State Circuits 59, 6 (June 2024), 1926–1937. https://doi.org/10.1109/JSSC.2023.3334566
  • (2024) Toward Efficient Co-Design of CNN Quantization and HW Architecture on FPGA Hybrid-Accelerator. In 2024 2nd International Symposium of Electronics Design Automation (ISEDA), 678–683. https://doi.org/10.1109/ISEDA62518.2024.10617620
  • (2024) TEMET: Truncated REconfigurable Multiplier with Error Tuning. In Applications in Electronics Pervading Industry, Environment and Society, 370–377. https://doi.org/10.1007/978-3-031-48121-5_53
  • (2023) The ZuSE-KI-Mobil AI Accelerator SoC: Overview and a Functional Safety Perspective. In 2023 Design, Automation & Test in Europe Conference & Exhibition (DATE), 1–6. https://doi.org/10.23919/DATE56975.2023.10137257
  • (2022) HW-Flow-Fusion: Inter-Layer Scheduling for Convolutional Neural Network Accelerators with Dataflow Architectures. Electronics 11, 18 (Sept. 2022), 2933. https://doi.org/10.3390/electronics11182933
  • (2022) AnaCoNGA: Analytical HW-CNN Co-Design Using Nested Genetic Algorithms. In 2022 Design, Automation & Test in Europe Conference & Exhibition (DATE), 238–243. https://doi.org/10.23919/DATE54114.2022.9774574
  • (2022) MOHAQ. Journal of Systems Architecture 133 (Dec. 2022), 102778. https://doi.org/10.1016/j.sysarc.2022.102778
