DOI: 10.1145/3242587.3242609
Research Article, UIST Conference Proceedings

Ubicoustics: Plug-and-Play Acoustic Activity Recognition

Published: 11 October 2018

Abstract

Despite sound being a rich source of information, computing devices with microphones do not leverage audio to glean useful insights about their physical and social context. For example, a smart speaker sitting on a kitchen countertop cannot figure out if it is in a kitchen, let alone know what a user is doing in a kitchen - a missed opportunity. In this work, we describe a novel, real-time, sound-based activity recognition system. We start by taking an existing, state-of-the-art sound labeling model, which we then tune to classes of interest by drawing data from professional sound effect libraries traditionally used in the entertainment industry. These well-labeled and high-quality sounds are the perfect atomic unit for data augmentation, including amplitude, reverb, and mixing, allowing us to exponentially grow our tuning data in realistic ways. We quantify the performance of our approach across a range of environments and device categories and show that microphone-equipped computing devices already have the requisite capability to unlock real-time activity recognition comparable to human accuracy.
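
The augmentation step the abstract names can be pictured concretely. Below is a minimal, illustrative Python sketch (NumPy only) of the three transformations mentioned: amplitude scaling, reverb via impulse-response convolution, and mixing with background audio at a target signal-to-noise ratio. All function names, parameters, and toy data here are assumptions for illustration, not the authors' released pipeline.

import numpy as np

def scale_amplitude(wav, gain_db):
    """Amplitude augmentation: simulate nearer/farther sound sources."""
    return wav * (10.0 ** (gain_db / 20.0))

def add_reverb(wav, impulse_response):
    """Reverb augmentation: convolve with a room impulse response."""
    out = np.convolve(wav, impulse_response)[: len(wav)]
    peak = np.max(np.abs(out))
    return out / peak if peak > 0 else out  # renormalize to avoid clipping

def mix(fg, bg, snr_db):
    """Mixing augmentation: overlay a background track at a target SNR (dB)."""
    n = min(len(fg), len(bg))
    fg, bg = fg[:n], bg[:n]
    fg_rms = np.sqrt(np.mean(fg ** 2))
    bg_rms = np.sqrt(np.mean(bg ** 2)) + 1e-12  # guard against silent tracks
    bg = bg * (fg_rms / bg_rms) / (10.0 ** (snr_db / 20.0))
    return fg + bg

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clip = rng.standard_normal(16000)          # stand-in for a 1 s, 16 kHz effect clip
    ir = np.exp(-np.linspace(0.0, 8.0, 4000))  # toy exponential-decay "room"
    bg = rng.standard_normal(16000)            # stand-in background recording
    variants = [
        mix(add_reverb(scale_amplitude(clip, g), ir), bg, snr)
        for g in (-12.0, -6.0, 0.0)            # 3 gains
        for snr in (0.0, 10.0, 20.0)           # 3 signal-to-noise ratios
    ]                                          # 3 x 3 = 9 variants from one clip

Crossing every source clip with multiple gains, impulse responses, and background tracks multiplies the dataset combinatorially, which is the sense in which the abstract describes growing the tuning data "exponentially" while keeping each variant acoustically plausible.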

Supplementary Material

• suppl.mov (ufp1140.mp4): supplemental video
• suppl.mov (ufp1140p.mp4): supplemental video
• MP4 File (p213-laput.mp4)




    Information

    Published In

    UIST '18: Proceedings of the 31st Annual ACM Symposium on User Interface Software and Technology
    October 2018
    1016 pages
    ISBN: 9781450359481
    DOI: 10.1145/3242587
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.


    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. acoustics
    2. internet-of-things
    3. iot
    4. microphones
    5. smart environments
    6. sound sensing
    7. ubiquitous sensing

    Qualifiers

    • Research-article

    Conference

    UIST '18

    Acceptance Rates

    UIST '18 paper acceptance rate: 80 of 375 submissions (21%)
    Overall acceptance rate: 561 of 2,567 submissions (22%)



    Article Metrics

    • Downloads (last 12 months): 232
    • Downloads (last 6 weeks): 32

    Reflects downloads up to 08 Feb 2025

    Cited By
    • (2025) Few-Shot Adaptation to Unseen Conditions for Wireless-Based Human Activity Recognition Without Fine-Tuning. IEEE Transactions on Mobile Computing 24(2): 585-599. DOI: 10.1109/TMC.2024.3462466
    • (2024) PrISM-Q&A: Step-Aware Voice Assistant on a Smartwatch Enabled by Multimodal Procedure Tracking and Large Language Models. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 8(4): 1-26. DOI: 10.1145/3699759
    • (2024) ActSonic: Recognizing Everyday Activities from Inaudible Acoustic Wave Around the Body. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 8(4): 1-32. DOI: 10.1145/3699752
    • (2024) The Grain No. 12/11 W4 102 1 From the City: Integrating a 3D Model with Audio Data as an Experimental Creative Method. Proceedings of the 17th International Symposium on Visual Information Communication and Interaction: 1-8. DOI: 10.1145/3678698.3687179
    • (2024) Unified Framework for Procedural Task Assistants powered by Human Activity Recognition. Companion of the 2024 ACM International Joint Conference on Pervasive and Ubiquitous Computing: 513-518. DOI: 10.1145/3675094.3678448
    • (2024) Interactive Design with Autistic Children using LLM and IoT for Personalized Training: The Good, The Bad and The Challenging. Companion of the 2024 ACM International Joint Conference on Pervasive and Ubiquitous Computing: 1000-1003. DOI: 10.1145/3675094.3677573
    • (2024) AeroSense: Sensing Aerosol Emissions from Indoor Human Activities. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 8(2): 1-30. DOI: 10.1145/3659593
    • (2024) PrISM-Observer: Intervention Agent to Help Users Perform Everyday Procedures Sensed using a Smartwatch. Proceedings of the 37th Annual ACM Symposium on User Interface Software and Technology: 1-16. DOI: 10.1145/3654777.3676350
    • (2024) SoundShift: Exploring Sound Manipulations for Accessible Mixed-Reality Awareness. Proceedings of the 2024 ACM Designing Interactive Systems Conference: 116-132. DOI: 10.1145/3643834.3661556
    • (2024) Kirigami. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 8(1): 1-28. DOI: 10.1145/3643502
