Accelerating Mobile Audio Sensing Algorithms through On-Chip GPU Offloading

Authors:

Petko Georgiev,

Nicholas D. Lane,

Cecilia Mascolo,

David Chu

MobiSys '17: Proceedings of the 15th Annual International Conference on Mobile Systems, Applications, and Services

Pages 306 - 318

https://doi.org/10.1145/3081333.3081358

Published: 16 June 2017

Abstract

GPUs have recently enjoyed increased popularity as general-purpose software accelerators in multiple application domains, including computer vision and natural language processing. However, there has been little exploration of the performance and energy trade-offs mobile GPUs can deliver for the increasingly popular workload of deep-inference audio sensing tasks, such as spoken keyword spotting on energy-constrained smartphones and wearables. In this paper, we study these trade-offs and introduce an optimization engine that leverages a series of structural and memory access optimization techniques, allowing audio algorithm performance to be automatically tuned as a function of GPU device specifications and model semantics. We find that parameter-optimized audio routines obtain inferences an order of magnitude faster than sequential CPU implementations, and up to 6.5x faster than cloud offloading with good connectivity, while critically consuming 3-4x less energy than the CPU. With our optimized GPU, conventional wisdom about how to use the cloud and low-power chips no longer holds. With only about 10 to 20 seconds of audio data buffered for batched execution, the optimized GPU audio sensing apps begin to consume less energy than cloud offloading, unless the network offers a throughput of at least 20 Mbps and an RTT of 25 ms or less. Under such conditions, we find that the optimized GPU can provide energy benefits comparable to those of low-power reference DSP implementations with some preliminary level of optimization, while the GPU always delivers lower latency.
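
The batching trade-off described in the abstract can be illustrated with a toy break-even calculation. The sketch below is not the paper's energy model or optimization engine. It assumes, purely for illustration, that the energy of each option is a fixed per-batch cost plus a cost proportional to the seconds of buffered audio. The names `LinearEnergy`, `cloud_model`, and `break_even_seconds`, and every parameter, are hypothetical placeholders rather than measurements.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LinearEnergy:
    """Hypothetical model: energy = fixed_j + per_audio_second_j * seconds."""
    fixed_j: float
    per_audio_second_j: float

    def energy(self, audio_seconds: float) -> float:
        return self.fixed_j + self.per_audio_second_j * audio_seconds


def cloud_model(throughput_mbps: float, rtt_ms: float, radio_power_w: float,
                audio_kbps: float, radio_overhead_s: float) -> LinearEnergy:
    """Radio energy for uploading one batch (all inputs are assumptions).

    Fixed cost: the radio stays on for a per-batch overhead plus one RTT.
    Variable cost: upload time grows with audio length and shrinks with
    link throughput.
    """
    fixed_j = radio_power_w * (radio_overhead_s + rtt_ms / 1000.0)
    upload_s_per_audio_s = (audio_kbps * 1e3) / (throughput_mbps * 1e6)
    return LinearEnergy(fixed_j, radio_power_w * upload_s_per_audio_s)


def break_even_seconds(gpu: LinearEnergy,
                       cloud: LinearEnergy) -> Optional[float]:
    """Batch length beyond which the GPU uses less energy than the cloud.

    Returns None when the GPU does not come out ahead for long batches,
    for example because the link is fast enough that uploading each
    second of audio costs less than processing it locally.
    """
    slope_gap = cloud.per_audio_second_j - gpu.per_audio_second_j
    fixed_gap = gpu.fixed_j - cloud.fixed_j
    if slope_gap <= 0:
        # Equal slopes: the GPU wins everywhere only if its fixed cost is lower.
        return 0.0 if (slope_gap == 0 and fixed_gap < 0) else None
    return max(0.0, fixed_gap / slope_gap)
```

Under these assumptions, a higher fixed GPU cost is amortized over longer batches, while a faster link lowers the per-second cost of cloud offloading and can remove the crossover entirely. Real devices would need measured power figures in place of the placeholder parameters.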

Index Terms

1. Computer systems organization
  1. Embedded and cyber-physical systems
    1. Embedded systems
      1. Embedded software
2. Human-centered computing
  1. Ubiquitous and mobile computing
    1. Ubiquitous and mobile computing theory, concepts and paradigms
      1. Ubiquitous computing

Published In

MobiSys '17: Proceedings of the 15th Annual International Conference on Mobile Systems, Applications, and Services

June 2017

520 pages

ISBN:9781450349284

DOI:10.1145/3081333

General Chairs: Tanzeem Choudhury (Cornell University, USA), Steve Ko (University at Buffalo, USA)

Program Chairs: Andrew Campbell (Dartmouth College, USA), Deepak Ganesan (University of Massachusetts, USA)

Copyright © 2017 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Sponsors

SIGMOBILE: ACM Special Interest Group on Mobility of Systems, Users, Data and Computing

In-Cooperation

SIGOPS: ACM Special Interest Group on Operating Systems

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article

Funding Sources

Microsoft Research

Conference

MobiSys'17: The 15th Annual International Conference on Mobile Systems, Applications, and Services

Sponsor: SIGMOBILE

June 19 - 23, 2017

Niagara Falls, New York, USA

Acceptance Rates

MobiSys '17 Paper Acceptance Rate 34 of 188 submissions, 18%

Overall Acceptance Rate 274 of 1,679 submissions, 16%


Bibliometrics

Total Citations: 18

Total Downloads: 431

Downloads (Last 12 months): 26

Downloads (Last 6 weeks): 0

Reflects downloads up to 03 Feb 2025

Cited By

Cao C, Dong W, Zhang W, Gao Y (2023). WiEdge: Edge Computing for Audio Sensing Applications With Accurate Wireless Link Prediction. IEEE Internet of Things Journal 10:5 (3982-3994). Online publication date: 1-Mar-2023.
https://doi.org/10.1109/JIOT.2022.3173668

Xu D, Xu M, Wang Q, Wang S, Ma Y, Huang K, Huang G, Jin X, Liu X (2022). Mandheling. Proceedings of the 28th Annual International Conference on Mobile Computing And Networking (214-227). Online publication date: 14-Oct-2022.
https://dl.acm.org/doi/10.1145/3495243.3560545

Chen Y, Mascolo C (2022). Women in Networks: Professor Cecilia Mascolo. IEEE Network 36:4 (4-5). Online publication date: Jul-2022.
https://doi.org/10.1109/MNET.2022.9919778

Distler T (2021). Byzantine Fault-tolerant State-machine Replication from a Systems Perspective. ACM Computing Surveys 54:1 (1-38). Online publication date: 11-Feb-2021.
https://dl.acm.org/doi/10.1145/3436728

Mirsky Y, Lee W (2021). The Creation and Detection of Deepfakes. ACM Computing Surveys 54:1 (1-41). Online publication date: 2-Jan-2021.
https://dl.acm.org/doi/10.1145/3425780

Gu R, Niu C, Wu F, Chen G, Hu C, Lyu C, Wu Z (2021). From Server-Based to Client-Based Machine Learning. ACM Computing Surveys 54:1 (1-36). Online publication date: 2-Jan-2021.
https://dl.acm.org/doi/10.1145/3424660

Jiang H, Li J, Zhao P, Zeng F, Xiao Z, Iyengar A (2021). Location Privacy-preserving Mechanisms in Location-based Services. ACM Computing Surveys 54:1 (1-36). Online publication date: 2-Jan-2021.
https://dl.acm.org/doi/10.1145/3423165

Xu D, Li T, Li Y, Su X, Tarkoma S, Jiang T, Crowcroft J, Hui P (2021). Edge Intelligence: Empowering Intelligence to the Edge of Network. Proceedings of the IEEE 109:11 (1778-1837). Online publication date: Nov-2021.
https://doi.org/10.1109/JPROC.2021.3119950

Li L, Han J, Zheng W, Huang R, Lai Y (2020). Improved Environment-Aware–Based Noise Reduction System for Cochlear Implant Users Based on Knowledge Transfer Approach: A Development and Usability Study (Preprint). Journal of Medical Internet Research. Online publication date: 9-Nov-2020.
https://doi.org/10.2196/25460

Gao D, He X, Zhou Z, Tong Y, Xu K, Thiele L, Gupta R, Liu Y, Shah M, Rajan S, Tang J, Prakash B (2020). Rethinking Pruning for Accelerating Deep Inference At the Edge. Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining (155-164). Online publication date: 23-Aug-2020.
https://dl.acm.org/doi/10.1145/3394486.3403058
