
Neurosurgeon: Collaborative Intelligence Between the Cloud and Mobile Edge

Published: 04 April 2017

Abstract

The computation for today's intelligent personal assistants, such as Apple Siri, Google Now, and Microsoft Cortana, is performed in the cloud. This cloud-only approach requires significant amounts of data to be sent to the cloud over the wireless network and puts substantial computational pressure on the datacenter. However, as the computational resources in mobile devices become more powerful and energy efficient, questions arise as to whether this cloud-only processing is desirable moving forward, and what the implications are of pushing some or all of this compute to the mobile devices on the edge.
In this paper, we examine the status quo approach of cloud-only processing and investigate computation partitioning strategies that effectively leverage both the cycles in the cloud and on the mobile device to achieve low latency, low energy consumption, and high datacenter throughput for this class of intelligent applications. Our study uses 8 intelligent applications spanning computer vision, speech, and natural language domains, all employing state-of-the-art Deep Neural Networks (DNNs) as the core machine learning technique. We find that given the characteristics of DNN algorithms, a fine-grained, layer-level computation partitioning strategy based on the data and computation variations of each layer within a DNN has significant latency and energy advantages over the status quo approach.
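One way to make the layer-level trade-off concrete (the notation here is ours, introduced for illustration rather than taken from the paper): for an N-layer DNN partitioned after layer p, with layers 1..p executed on the mobile device and layers p+1..N in the cloud, the predicted end-to-end latency is roughly

    T(p) = \sum_{i=1}^{p} t_i^{\text{mobile}} + \frac{d_p}{B} + \sum_{i=p+1}^{N} t_i^{\text{cloud}}

where t_i^mobile and t_i^cloud are the per-layer execution times on each side, d_p is the size of layer p's output activations, and B is the effective wireless uplink bandwidth. Because activation sizes typically shrink after the early convolution and pooling layers while per-layer compute varies widely, the p that minimizes T(p) is often neither 0 (cloud-only) nor N (mobile-only). Mobile energy can be modeled analogously by weighting the compute and transfer terms with the corresponding power draws.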
Using this insight, we design Neurosurgeon, a lightweight scheduler to automatically partition DNN computation between mobile devices and datacenters at the granularity of neural network layers. Neurosurgeon does not require per-application profiling. It adapts to various DNN architectures, hardware platforms, wireless networks, and server load levels, intelligently partitioning computation for best latency or best mobile energy. We evaluate Neurosurgeon on a state-of-the-art mobile development platform and show that it improves end-to-end latency by 3.1X on average and up to 40.7X, reduces mobile energy consumption by 59.5% on average and up to 94.7%, and improves datacenter throughput by 1.5X on average and up to 6.7X.
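To make the scheduler's core decision concrete, below is a minimal Python sketch of latency-optimal partition-point selection under the simple additive model above. The function name, its inputs, and the model itself are our illustrative assumptions; Neurosurgeon derives its per-layer latency and energy estimates from regression-based prediction models and also handles the energy objective and server-load variation, none of which is reproduced here.

    # Hypothetical sketch: choose the DNN split point that minimizes predicted
    # end-to-end latency. Layers [0, p) run on the mobile device, [p, n) in the
    # cloud; p == 0 is cloud-only (the status quo) and p == n is mobile-only.
    def best_partition(mobile_lat, cloud_lat, out_bytes, input_bytes, uplink_bps):
        n = len(mobile_lat)
        best_p, best_t = 0, float("inf")
        for p in range(n + 1):
            if p == 0:
                transfer = input_bytes        # raw input crosses the wireless link
            elif p == n:
                transfer = 0                  # nothing leaves the device
            else:
                transfer = out_bytes[p - 1]   # activations of the last mobile-side layer
            latency = (sum(mobile_lat[:p])            # mobile-side compute
                       + transfer * 8 / uplink_bps    # wireless transfer (bits / bps)
                       + sum(cloud_lat[p:]))          # cloud-side compute
            if latency < best_t:
                best_p, best_t = p, latency
        return best_p, best_t

An energy-optimal partition would run the same search but minimize the device's compute energy plus the radio energy spent on the transfer instead of the total latency.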


Published In

ACM SIGARCH Computer Architecture News, Volume 45, Issue 1 (ASPLOS '17)
March 2017
812 pages
ISSN: 0163-5964
DOI: 10.1145/3093337
  • ASPLOS '17: Proceedings of the Twenty-Second International Conference on Architectural Support for Programming Languages and Operating Systems
    April 2017
    856 pages
    ISBN: 9781450344654
    DOI: 10.1145/3037697

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 04 April 2017
Published in SIGARCH Volume 45, Issue 1


Author Tags

  1. cloud computing
  2. deep neural networks
  3. intelligent applications
  4. mobile computing

Qualifiers

  • Research-article


