
Mobile or FPGA? A Comprehensive Evaluation on Energy Efficiency and a Unified Optimization Framework

Published: 09 December 2022

Abstract

Efficient deployment of Deep Neural Networks (DNNs) on edge devices (i.e., FPGAs and mobile platforms) is very challenging, especially given the recent growth in DNN model size and complexity. Model compression strategies, including weight quantization and pruning, are widely recognized as effective ways to significantly reduce computation and memory intensity, and have been applied to many DNNs on edge devices. However, most state-of-the-art works focus on ad hoc optimizations, and a thorough study that comprehensively reveals the potential and constraints of different edge devices under different compression strategies is still lacking. In this article, we qualitatively and quantitatively compare the energy efficiency of FPGA-based and mobile-GPU-based DNN executions and provide a detailed analysis. Based on the observations obtained from this analysis, we propose a unified optimization framework that uses block-based pruning to reduce weight storage and accelerate inference on both mobile devices and FPGAs, achieving high hardware performance and energy-efficiency gains while maintaining accuracy.
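
The block-based pruning mentioned in the abstract can be made concrete with a minimal sketch. The following is an illustrative approximation under assumed choices (4x4 blocks, L2-norm block scoring, a fixed pruning ratio), not the unified framework described in the article; block_prune, its parameters, and the NumPy-based masking are hypothetical and only demonstrate the general idea of removing whole blocks of weights rather than individual elements.

# Illustrative sketch of block-based weight pruning (not the paper's exact algorithm).
# Assumptions: blocks are ranked by L2 norm; block shape and pruning ratio are
# hypothetical parameters chosen for illustration.
import numpy as np

def block_prune(weights: np.ndarray, block_shape=(4, 4), prune_ratio=0.5) -> np.ndarray:
    """Zero out the lowest-magnitude blocks of a 2D weight matrix."""
    rows, cols = weights.shape
    br, bc = block_shape
    assert rows % br == 0 and cols % bc == 0, "matrix must tile evenly into blocks"

    # View the matrix as a grid of (br x bc) blocks and score each block by its L2 norm.
    blocks = weights.reshape(rows // br, br, cols // bc, bc)
    scores = np.sqrt((blocks ** 2).sum(axis=(1, 3)))     # shape: (rows//br, cols//bc)

    # Keep only the highest-scoring blocks; zero the rest.
    k = int(scores.size * (1.0 - prune_ratio))           # number of blocks to keep
    threshold = np.sort(scores, axis=None)[-k] if k > 0 else np.inf
    mask = (scores >= threshold).astype(weights.dtype)   # 1 = keep block, 0 = prune block

    # Broadcast the block mask back to element granularity and apply it.
    elem_mask = np.kron(mask, np.ones(block_shape, dtype=weights.dtype))
    return weights * elem_mask

# Example: prune half of the 4x4 blocks in a 16x16 weight matrix.
w = np.random.randn(16, 16).astype(np.float32)
w_pruned = block_prune(w, block_shape=(4, 4), prune_ratio=0.5)
print("sparsity:", 1.0 - np.count_nonzero(w_pruned) / w_pruned.size)

Because entire blocks are zeroed, the surviving weights stay in a regular layout that maps naturally onto FPGA processing elements and mobile SIMD/GPU kernels, which is the property that block-based (as opposed to fully unstructured) pruning exploits.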


Cited By

  • (2023) A Scalable Real-time Semantic Segmentation Network for Autonomous Driving. Proceedings of the 2023 Workshop on Advanced Multimedia Computing for Smart Manufacturing and Engineering, 3–12. https://doi.org/10.1145/3606042.3616451. Online publication date: 29-Oct-2023.


    Published In

    ACM Transactions on Embedded Computing Systems, Volume 21, Issue 5
    September 2022
    526 pages
    ISSN: 1539-9087
    EISSN: 1558-3465
    DOI: 10.1145/3561947
    Editor: Tulika Mitra

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 09 December 2022
    Online AM: 20 April 2022
    Accepted: 23 March 2022
    Revised: 10 February 2022
    Received: 15 July 2021
    Published in TECS Volume 21, Issue 5


    Author Tags

    1. DNN model compression
    2. edge device
    3. efficient deep learning

    Qualifiers

    • Research-article
    • Refereed

    Article Metrics

    • Downloads (last 12 months): 91
    • Downloads (last 6 weeks): 4
    Reflects downloads up to 26 Jan 2025

