DOI: 10.1145/3081333.3081358
research-article

Accelerating Mobile Audio Sensing Algorithms through On-Chip GPU Offloading

Published: 16 June 2017

Abstract

GPUs have recently enjoyed increased popularity as general-purpose software accelerators in multiple application domains, including computer vision and natural language processing. However, there has been little exploration of the performance and energy trade-offs mobile GPUs can deliver for the increasingly popular workload of deep-inference audio sensing tasks, such as spoken keyword spotting on energy-constrained smartphones and wearables. In this paper, we study these trade-offs and introduce an optimization engine that leverages a series of structural and memory access optimization techniques, allowing audio algorithm performance to be tuned automatically as a function of GPU device specifications and model semantics. We find that parameter-optimized audio routines obtain inferences an order of magnitude faster than sequential CPU implementations, and up to 6.5x faster than cloud offloading with good connectivity, while critically consuming 3-4x less energy than the CPU. With our optimized GPU implementation, conventional wisdom about how to use the cloud and low-power chips no longer holds. Unless the network offers a throughput of at least 20 Mbps (and an RTT of 25 ms or less), the optimized GPU audio sensing apps begin to consume less energy than cloud offloading once only about 10 to 20 seconds of audio data are buffered for batched execution. Under such conditions, we find that the optimized GPU can provide energy benefits comparable to low-power reference DSP implementations with some preliminary level of optimization. In addition, the GPU always wins on latency.
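
The batching argument in the abstract can be illustrated with a toy linear energy model. The sketch below is not the paper's model or its measurements: it assumes that GPU execution pays a fixed per-batch setup cost plus a marginal cost per second of audio, and that cloud offloading pays radio power for one round trip plus the upload time. All class names, function names, and parameters (`GpuCost`, `CloudCost`, `break_even_batch_sec`) are hypothetical, and any values passed in are placeholders rather than measured figures.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GpuCost:
    """Hypothetical on-device GPU energy parameters (not from the paper)."""
    setup_joules: float          # fixed cost paid once per batch
    joules_per_audio_sec: float  # marginal cost per second of buffered audio


@dataclass
class CloudCost:
    """Hypothetical cloud-offloading energy parameters (not from the paper)."""
    radio_watts: float           # radio power while waiting or transmitting
    rtt_sec: float               # network round-trip time
    throughput_bps: float        # uplink throughput in bits per second
    audio_bytes_per_sec: float   # size of one second of raw audio

    def energy(self, batch_sec: float) -> float:
        # Radio stays on for one round trip plus the upload time.
        upload_sec = 8 * self.audio_bytes_per_sec * batch_sec / self.throughput_bps
        return self.radio_watts * (self.rtt_sec + upload_sec)


def gpu_energy(gpu: GpuCost, batch_sec: float) -> float:
    """Energy for one batched GPU run over batch_sec seconds of audio."""
    return gpu.setup_joules + gpu.joules_per_audio_sec * batch_sec


def break_even_batch_sec(gpu: GpuCost, cloud: CloudCost) -> Optional[float]:
    """Buffer length beyond which the GPU uses less energy than the cloud.

    Returns None when buffering more audio cannot tip the balance toward
    the GPU, i.e. the cloud's marginal cost per second is not higher.
    """
    cloud_fixed = cloud.radio_watts * cloud.rtt_sec
    cloud_slope = (cloud.radio_watts * 8 * cloud.audio_bytes_per_sec
                   / cloud.throughput_bps)
    slope_gap = cloud_slope - gpu.joules_per_audio_sec
    if slope_gap <= 0:
        return None
    return max(0.0, (gpu.setup_joules - cloud_fixed) / slope_gap)


if __name__ == "__main__":
    # Placeholder values chosen only to exercise the model.
    gpu = GpuCost(setup_joules=2.0, joules_per_audio_sec=0.05)
    for mbps in (1, 20):
        cloud = CloudCost(radio_watts=1.0, rtt_sec=0.025,
                          throughput_bps=mbps * 1e6,
                          audio_bytes_per_sec=32000.0)
        print(mbps, "Mbps ->", break_even_batch_sec(gpu, cloud))
```

In this model a faster link lowers the cloud's marginal cost per second of audio, so the break-even buffer length grows and eventually disappears, which mirrors the qualitative shape of the abstract's claim without reproducing its numbers.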

Cited By

  • (2023) WiEdge: Edge Computing for Audio Sensing Applications With Accurate Wireless Link Prediction. IEEE Internet of Things Journal, 10:5 (3982-3994). DOI: 10.1109/JIOT.2022.3173668. Online publication date: 1-Mar-2023
  • (2022) Mandheling. Proceedings of the 28th Annual International Conference on Mobile Computing And Networking (214-227). DOI: 10.1145/3495243.3560545. Online publication date: 14-Oct-2022
  • (2022) Women in Networks: Professor Cecilia Mascolo. IEEE Network, 36:4 (4-5). DOI: 10.1109/MNET.2022.9919778. Online publication date: Jul-2022

Published In

MobiSys '17: Proceedings of the 15th Annual International Conference on Mobile Systems, Applications, and Services
June 2017
520 pages
ISBN:9781450349284
DOI:10.1145/3081333
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. audio sensing
  2. mobile GPU offloading

Qualifiers

  • Research-article

Funding Sources

  • Microsoft Research

Conference

MobiSys'17
Acceptance Rates

MobiSys '17 Paper Acceptance Rate 34 of 188 submissions, 18%;
Overall Acceptance Rate 274 of 1,679 submissions, 16%
