research-article

Open access

DNNTune: Automatic Benchmarking DNN Models for Mobile-cloud Computing

Authors:

Jingling Xue

ACM Transactions on Architecture and Code Optimization (TACO), Volume 16, Issue 4

Article No.: 49, Pages 1 - 26

https://doi.org/10.1145/3368305

Published: 26 December 2019

Abstract

Deep Neural Networks (DNNs) are increasingly adopted in a variety of Artificial Intelligence (AI) applications. Meanwhile, more and more DNNs are moving from the cloud to mobile devices as emerging AI chips are integrated into mobile devices. DNN models can therefore be deployed in the cloud, on mobile devices, or split across both through mobile-cloud coordinated processing, which makes it challenging to select an optimal deployment strategy for a given objective.

This article proposes a DNN tuning framework, DNNTune, that provides layer-wise behavior analysis across a number of platforms. Using DNNTune, the article selects 13 representative DNN models, including CNNs, LSTMs, and MLPs, and characterizes them on three mobile devices, ranging from low-end to high-end, and two AI accelerator chips, to help users find opportunities for mobile-cloud coordinated computing. The experimental results demonstrate that DNNTune can find a coordinated deployment that achieves up to 1.66× speedup and 15× energy saving compared with mobile-only and cloud-only deployment.
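To make the idea of a coordinated deployment concrete, the following is a minimal sketch of how per-layer measurements could be used to pick a split point between a mobile device and the cloud. It is not DNNTune's algorithm, which the abstract does not describe; the `LayerProfile` type, the `best_split` function, the latency-only cost model, and all numbers are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class LayerProfile:
    """Hypothetical per-layer measurements (illustrative values only)."""
    name: str
    mobile_ms: float   # latency of this layer on the mobile device
    cloud_ms: float    # latency of this layer in the cloud
    output_kb: float   # size of the layer's output tensor


def best_split(layers: List[LayerProfile],
               input_kb: float,
               bandwidth_kb_per_ms: float) -> Tuple[int, float]:
    """Return (k, latency_ms): layers[:k] run on mobile, layers[k:] in the cloud.

    k == 0 is cloud-only and k == len(layers) is mobile-only.
    Assumption: only the upload at the split point is modelled; energy and
    the cost of returning the result to the device are ignored.
    """
    n = len(layers)
    best_k, best_latency = 0, float("inf")
    for k in range(n + 1):
        mobile = sum(layer.mobile_ms for layer in layers[:k])
        cloud = sum(layer.cloud_ms for layer in layers[k:])
        if k == n:
            transfer = 0.0  # mobile-only: nothing is uploaded
        else:
            # Upload the raw input (k == 0) or the last mobile layer's output.
            payload_kb = input_kb if k == 0 else layers[k - 1].output_kb
            transfer = payload_kb / bandwidth_kb_per_ms
        total = mobile + transfer + cloud
        if total < best_latency:
            best_k, best_latency = k, total
    return best_k, best_latency


# Made-up profile of a three-layer model, for demonstration only.
DEMO_LAYERS = [
    LayerProfile("conv1", mobile_ms=10.0, cloud_ms=1.0, output_kb=400.0),
    LayerProfile("conv2", mobile_ms=20.0, cloud_ms=2.0, output_kb=100.0),
    LayerProfile("fc", mobile_ms=30.0, cloud_ms=3.0, output_kb=1.0),
]

if __name__ == "__main__":
    k, latency = best_split(DEMO_LAYERS, input_kb=600.0, bandwidth_kb_per_ms=10.0)
    print(f"run the first {k} layers on mobile, estimated latency {latency} ms")
```

With these made-up numbers, splitting after the second layer beats both the cloud-only and the mobile-only option, because the intermediate tensor is much smaller than the raw input. A real study would also need energy measurements, which this sketch leaves out.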

Index Terms

DNNTune: Automatic Benchmarking DNN Models for Mobile-cloud Computing
1. Computer systems organization
  1. Embedded and cyber-physical systems
    1. Embedded systems
      1. Embedded software

Information & Contributors

Information

Published In

ACM Transactions on Architecture and Code Optimization Volume 16, Issue 4

December 2019

572 pages

ISSN:1544-3566

EISSN:1544-3973

DOI:10.1145/3366460

Editor:
Koen De Bosschere
Ghent University, Belgium

Copyright © 2019 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 26 December 2019

Accepted: 01 October 2019

Revised: 01 October 2019

Received: 01 June 2019

Published in TACO Volume 16, Issue 4

Qualifiers

Research-article
Research
Refereed

Funding Sources

Australian Research Council grant
National Natural Science Foundation of China
National Key R&D Program of China
CCF-Tencent Open Research Fund

Bibliometrics

Article Metrics

Total Citations: 22
Total Downloads: 3,481
Downloads (Last 12 months): 648
Downloads (Last 6 weeks): 73

Reflects downloads up to 09 Nov 2024

Citations

Cited By

Fu Q, Deng F, Xue X, Zeng J, Wei B. (2024). DNN Adaptive Partitioning Strategy for Heterogeneous Online Inspection Systems of Substations. Electronics 13:17 (3383). Online publication date: 26-Aug-2024.
https://doi.org/10.3390/electronics13173383
Ghosh S, Raha A, Raghunathan V, Raghunathan A. (2024). PArtNNer: Platform-Agnostic Adaptive Edge-Cloud DNN Partitioning for Minimizing End-to-End Latency. ACM Transactions on Embedded Computing Systems 23:1 (1-38). Online publication date: 10-Jan-2024.
https://dl.acm.org/doi/10.1145/3630266
Kim S, Jung S, Lee H. (2024). Distributed Computation of DNN via DRL With Spatiotemporal State Embedding. IEEE Internet of Things Journal 11:7 (12686-12701). Online publication date: 1-Apr-2024.
https://doi.org/10.1109/JIOT.2023.3336695
Daou A, Pothin J, Honeine P, Bensrhair A. (2023). Indoor Scene Recognition Mechanism Based on Direction-Driven Convolutional Neural Networks. Sensors 23:12 (5672). Online publication date: 17-Jun-2023.
https://doi.org/10.3390/s23125672
Diaz-de-Arcaya J, Torre-Bastida A, Zárate G, Miñón R, Almeida A. (2023). A Joint Study of the Challenges, Opportunities, and Roadmap of MLOps and AIOps: A Systematic Survey. ACM Computing Surveys 56:4 (1-30). Online publication date: 21-Oct-2023.
https://dl.acm.org/doi/10.1145/3625289
Cozzolino V, Tonetto L, Mohan N, Ding A, Ott J. (2023). Nimbus: Towards Latency-Energy Efficient Task Offloading for AR Services. IEEE Transactions on Cloud Computing 11:2 (1530-1545). Online publication date: 1-Apr-2023.
https://doi.org/10.1109/TCC.2022.3146615
Yang Y, Wei H. (2023). Edge–IoT Computing and Networking Resource Allocation for Decomposable Deep Learning Inference. IEEE Internet of Things Journal 10:6 (5178-5193). Online publication date: 15-Mar-2023.
https://doi.org/10.1109/JIOT.2022.3222461
Chan K, Abu-Salih B, Qaddoura R, Al-Zoubi A, Palade V, Pham D, Ser J, Muhammad K. (2023). Deep neural networks in the cloud: Review, applications, challenges and research directions. Neurocomputing 545 (126327). Online publication date: Aug-2023.
https://doi.org/10.1016/j.neucom.2023.126327
Tuli S, Mirhakimi F, Pallewatta S, Zawad S, Casale G, Javadi B, Yan F, Buyya R, Jennings N. (2023). AI augmented Edge and Fog computing: Trends and challenges. Journal of Network and Computer Applications 216 (103648). Online publication date: Jul-2023.
https://doi.org/10.1016/j.jnca.2023.103648
(2023). Bibliography. IoT for Smart Operations in the Oil and Gas Industry (225-237). Online publication date: 2023.
https://doi.org/10.1016/B978-0-32-391151-1.00018-6
