research-article
Open access

DNNTune: Automatic Benchmarking DNN Models for Mobile-cloud Computing

Published: 26 December 2019

Abstract

Deep Neural Networks (DNNs) are increasingly adopted in a variety of Artificial Intelligence (AI) applications. Meanwhile, more and more DNNs are moving from the cloud to mobile devices as emerging AI chips are integrated into mobile hardware. DNN models can therefore be deployed in the cloud, on mobile devices, or split across both through mobile-cloud coordinated processing, which makes it a big challenge to select an optimal deployment strategy under specific objectives.
This article proposes a DNN tuning framework, DNNTune, that provides layer-wise behavior analysis across a number of platforms. Using DNNTune, the article selects 13 representative DNN models, including CNN, LSTM, and MLP models, and characterizes them on three mobile devices ranging from low-end to high-end and on two AI accelerator chips, to help users find opportunities for mobile-cloud coordinated computing. The experimental results demonstrate that DNNTune can find a coordinated deployment that achieves up to 1.66× speedup and 15× energy saving compared with mobile-only and cloud-only deployment.
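To make the idea of a coordinated deployment concrete, the following is a minimal sketch of how layer-wise measurements could be used to pick a split point between mobile and cloud. It is not DNNTune's actual algorithm or API: the `LayerProfile` type, the `best_split` function, and the simple latency model (per-layer compute time plus one upload of the intermediate tensor, ignoring the cost of returning the result and ignoring energy) are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class LayerProfile:
    """Hypothetical per-layer measurements, as a profiler might report them."""
    name: str
    mobile_ms: float   # latency of this layer on the mobile device
    cloud_ms: float    # latency of this layer on the cloud server
    output_kb: float   # size of this layer's output tensor


def best_split(layers: List[LayerProfile], input_kb: float,
               bandwidth_kb_per_ms: float) -> Tuple[int, float]:
    """Return (k, latency_ms): layers[:k] run on mobile, layers[k:] in the cloud.

    k == 0 means cloud-only; k == len(layers) means mobile-only.
    Assumed model: total = mobile compute + upload of one tensor + cloud compute.
    """
    n = len(layers)
    best_k, best_ms = 0, float("inf")
    for k in range(n + 1):
        mobile = sum(layer.mobile_ms for layer in layers[:k])
        cloud = sum(layer.cloud_ms for layer in layers[k:])
        if k == n:
            transfer = 0.0  # mobile-only: nothing is uploaded
        else:
            # Upload the raw input (k == 0) or the output of the last mobile layer.
            sent_kb = input_kb if k == 0 else layers[k - 1].output_kb
            transfer = sent_kb / bandwidth_kb_per_ms
        total = mobile + cloud + transfer
        if total < best_ms:
            best_k, best_ms = k, total
    return best_k, best_ms
```

Under this model, a split tends to pay off when an early layer shrinks the tensor enough that uploading it costs less than running the remaining layers on the device. An energy objective could be handled the same way by replacing the latency fields with measured energy.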

    Published In

    ACM Transactions on Architecture and Code Optimization  Volume 16, Issue 4
    December 2019
    572 pages
    ISSN:1544-3566
    EISSN:1544-3973
    DOI:10.1145/3366460

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 26 December 2019
    Accepted: 01 October 2019
    Revised: 01 October 2019
    Received: 01 June 2019
    Published in TACO Volume 16, Issue 4

    Author Tags

    1. DNN
    2. heterogeneous computing
    3. mobile-cloud computing

    Qualifiers

    • Research-article
    • Research
    • Refereed
