
Optimizing Deep Learning Inference on Embedded Systems Through Adaptive Model Selection

Published: 06 February 2020
    Abstract

    Deep neural networks (DNNs) are becoming a key enabling technique for many application domains. However, on-device inference on battery-powered, resource-constrained embedded systems is often infeasible due to the prohibitively long inference times and resource requirements of many DNNs. Offloading computation to the cloud is often unacceptable due to privacy concerns, high latency, or lack of connectivity. Although compression algorithms often succeed in reducing inference times, they come at the cost of reduced accuracy.
    This article presents a new, alternative approach to enable efficient execution of DNNs on embedded devices. Our approach dynamically determines which DNN to use for a given input by considering the desired accuracy and inference time. It employs machine learning to develop a low-cost predictive model that quickly selects a pre-trained DNN to use for a given input under a given optimization constraint. We achieve this by first training a predictive model offline, then using the learned model to select a DNN for new, unseen inputs. We apply our approach to two representative DNN domains: image classification and machine translation. We evaluate our approach on a Jetson TX2 embedded deep learning platform and consider a range of influential DNN models, including convolutional and recurrent neural networks. For image classification, we achieve a 1.8x reduction in inference time with a 7.52% improvement in accuracy over the most capable single DNN model. For machine translation, we achieve a 1.34x reduction in inference time over the most capable single model with little impact on the quality of translation.
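    To make the two-stage scheme concrete, here is a minimal, hypothetical sketch (not the authors' implementation): a cheap k-NN "premodel" is trained offline on pairs of input features and the cheapest adequate DNN, then consulted at runtime to pick a pre-trained model for each unseen input. The feature extractor, candidate model names, costs, and training pairs below are all invented for illustration.

    ```python
    # Illustrative sketch of adaptive model selection (assumptions throughout:
    # the feature extractor, candidate models, and profiling data are made up).
    import math
    from collections import Counter

    # Candidate pre-trained DNNs, cheapest to most capable
    # (name, hypothetical on-device inference cost in ms).
    CANDIDATES = [("mobilenet", 20.0), ("resnet50", 60.0), ("inception_v4", 110.0)]

    def extract_features(x):
        """Cheap per-input features: mean and variance of the raw values."""
        mean = sum(x) / len(x)
        var = sum((v - mean) ** 2 for v in x) / len(x)
        return (mean, var)

    class KNNPremodel:
        """Low-cost k-NN predictor mapping input features to the cheapest
        candidate DNN expected to handle that input well."""

        def __init__(self, k=3):
            self.k = k
            self.points = []  # list of (feature_tuple, model_name)

        def fit(self, pairs):
            self.points = list(pairs)

        def predict(self, feats):
            # Majority vote among the k nearest profiling points.
            nearest = sorted(self.points,
                             key=lambda p: math.dist(feats, p[0]))[:self.k]
            votes = Counter(name for _, name in nearest)
            return votes.most_common(1)[0][0]

    # Offline stage: label profiling inputs with the cheapest adequate model
    # (in practice, by running every candidate DNN on a profiling set).
    train = [((0.10, 0.01), "mobilenet"), ((0.20, 0.02), "mobilenet"),
             ((0.60, 0.30), "resnet50"), ((0.70, 0.25), "resnet50"),
             ((0.90, 0.50), "inception_v4"), ((0.95, 0.45), "inception_v4")]
    premodel = KNNPremodel(k=3)
    premodel.fit(train)

    # Online stage: pick a DNN for a new, unseen input.
    new_input = [0.65, 0.70, 0.60, 0.75]
    choice = premodel.predict(extract_features(new_input))
    print(choice, "- estimated cost:", dict(CANDIDATES)[choice], "ms")
    ```

    The labeling policy in the offline stage is where the accuracy/latency trade-off lives: labeling each profiling input with the cheapest model that still meets the accuracy constraint is what lets the premodel beat any single DNN on average inference time.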




    Published In

    ACM Transactions on Embedded Computing Systems, Volume 19, Issue 1 (January 2020), 185 pages
    ISSN: 1539-9087
    EISSN: 1558-3465
    DOI: 10.1145/3382497
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

    Publisher

    Association for Computing Machinery, New York, NY, United States


    Publication History

    Published: 06 February 2020
    Accepted: 01 October 2019
    Revised: 01 August 2019
    Received: 01 April 2019
    Published in TECS Volume 19, Issue 1


    Author Tags

    1. Deep learning
    2. adaptive computing
    3. embedded systems

    Qualifiers

    • Research-article
    • Research
    • Refereed


    Article Metrics

    • Downloads (Last 12 months)176
    • Downloads (Last 6 weeks)21
    Reflects downloads up to 26 Jul 2024


    Cited By

    • (2024) Adapting Neural Networks at Runtime: Current Trends in At-Runtime Optimizations for Deep Learning. ACM Computing Surveys 56, 10 (2024), 1-40. DOI: 10.1145/3657283
    • (2024) Artificial intelligence and edge computing for machine maintenance: review. Artificial Intelligence Review 57, 5 (2024). DOI: 10.1007/s10462-024-10748-9
    • (2023) Evaluation and Prediction of Resource Usage for multi-parametric Deep Learning training and inference. In Proceedings of the 27th Pan-Hellenic Conference on Progress in Computing and Informatics, 67-73. DOI: 10.1145/3635059.3635070
    • (2023) Fine-grained Hardware Acceleration for Efficient Batteryless Intermittent Inference on the Edge. ACM Transactions on Embedded Computing Systems 22, 5 (2023), 1-19. DOI: 10.1145/3608475
    • (2023) Transforming Large-Size to Lightweight Deep Neural Networks for IoT Applications. ACM Computing Surveys 55, 11 (2023), 1-35. DOI: 10.1145/3570955
    • (2023) Exercise Pose Recognition and Counting System using Robust Topological Landmarks. In 2023 18th International Joint Symposium on Artificial Intelligence and Natural Language Processing (iSAI-NLP), 1-6. DOI: 10.1109/iSAI-NLP60301.2023.10354974
    • (2023) ROMA: Run-Time Object Detection To Maximize Real-Time Accuracy. In 2023 IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 6394-6403. DOI: 10.1109/WACV56688.2023.00634
    • (2023) REDRESS: Generating Compressed Models for Edge Inference Using Tsetlin Machines. IEEE Transactions on Pattern Analysis and Machine Intelligence 45, 9 (2023), 11152-11168. DOI: 10.1109/TPAMI.2023.3268415
    • (2023) Edge AI as a Service: Configurable Model Deployment and Delay-Energy Optimization With Result Quality Constraints. IEEE Transactions on Cloud Computing 11, 2 (2023), 1954-1969. DOI: 10.1109/TCC.2022.3175725
    • (2023) ECADA: An Edge Computing Assisted Delay-Aware Anomaly Detection Scheme for ICS. In 2023 19th International Conference on Mobility, Sensing and Networking (MSN), 215-222. DOI: 10.1109/MSN60784.2023.00042
