DOI: 10.1145/3242587.3242609
Research Article, UIST Conference Proceedings

Ubicoustics: Plug-and-Play Acoustic Activity Recognition

Published: 11 October 2018

Abstract

Despite sound being a rich source of information, computing devices with microphones do not leverage audio to glean useful insights about their physical and social context. For example, a smart speaker sitting on a kitchen countertop cannot figure out if it is in a kitchen, let alone know what a user is doing in a kitchen - a missed opportunity. In this work, we describe a novel, real-time, sound-based activity recognition system. We start by taking an existing, state-of-the-art sound labeling model, which we then tune to classes of interest by drawing data from professional sound effect libraries traditionally used in the entertainment industry. These well-labeled and high-quality sounds are the perfect atomic unit for data augmentation, including amplitude, reverb, and mixing, allowing us to exponentially grow our tuning data in realistic ways. We quantify the performance of our approach across a range of environments and device categories and show that microphone-equipped computing devices already have the requisite capability to unlock real-time activity recognition comparable to human accuracy.
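
The augmentation step the abstract names can be pictured concretely. Below is a minimal, illustrative Python sketch (NumPy only) of the three transformations mentioned: amplitude scaling, reverb via impulse-response convolution, and mixing with background audio at a target signal-to-noise ratio. All function names, parameters, and toy data here are assumptions for illustration, not the authors' released pipeline.

import numpy as np

def scale_amplitude(wav, gain_db):
    """Amplitude augmentation: simulate nearer/farther sound sources."""
    return wav * (10.0 ** (gain_db / 20.0))

def add_reverb(wav, impulse_response):
    """Reverb augmentation: convolve with a room impulse response."""
    out = np.convolve(wav, impulse_response)[: len(wav)]
    peak = np.max(np.abs(out))
    return out / peak if peak > 0 else out  # renormalize to avoid clipping

def mix(fg, bg, snr_db):
    """Mixing augmentation: overlay a background track at a target SNR (dB)."""
    n = min(len(fg), len(bg))
    fg, bg = fg[:n], bg[:n]
    fg_rms = np.sqrt(np.mean(fg ** 2))
    bg_rms = np.sqrt(np.mean(bg ** 2)) + 1e-12  # guard against silent tracks
    bg = bg * (fg_rms / bg_rms) / (10.0 ** (snr_db / 20.0))
    return fg + bg

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clip = rng.standard_normal(16000)          # stand-in for a 1 s, 16 kHz effect clip
    ir = np.exp(-np.linspace(0.0, 8.0, 4000))  # toy exponential-decay "room"
    bg = rng.standard_normal(16000)            # stand-in background recording
    variants = [
        mix(add_reverb(scale_amplitude(clip, g), ir), bg, snr)
        for g in (-12.0, -6.0, 0.0)            # 3 gains
        for snr in (0.0, 10.0, 20.0)           # 3 signal-to-noise ratios
    ]                                          # 3 x 3 = 9 variants from one clip

Crossing every source clip with multiple gains, impulse responses, and background tracks multiplies the dataset combinatorially, which is the sense in which the abstract describes growing the tuning data "exponentially" while keeping each variant acoustically plausible.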

Supplementary Material

• suppl.mov (ufp1140.mp4): supplemental video
• suppl.mov (ufp1140p.mp4): supplemental video
• MP4 File (p213-laput.mp4)




    Information

    Published In

    UIST '18: Proceedings of the 31st Annual ACM Symposium on User Interface Software and Technology
    October 2018
    1016 pages
    ISBN: 9781450359481
    DOI: 10.1145/3242587
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.


    Publisher

    Association for Computing Machinery

    New York, NY, United States


    Author Tags

    1. acoustics
    2. internet-of-things
    3. iot
    4. microphones
    5. smart environments
    6. sound sensing
    7. ubiquitous sensing

    Qualifiers

    • Research-article

    Conference

    UIST '18

    Acceptance Rates

    UIST '18 paper acceptance rate: 80 of 375 submissions (21%)
    Overall acceptance rate: 561 of 2,567 submissions (22%)



    Article Metrics

    • Downloads (last 12 months): 232
    • Downloads (last 6 weeks): 32

    Reflects downloads up to 08 Feb 2025

    Cited By
    • (2025) Few-Shot Adaptation to Unseen Conditions for Wireless-Based Human Activity Recognition Without Fine-Tuning. IEEE Transactions on Mobile Computing 24(2): 585-599. DOI: 10.1109/TMC.2024.3462466
    • (2024) PrISM-Q&A: Step-Aware Voice Assistant on a Smartwatch Enabled by Multimodal Procedure Tracking and Large Language Models. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 8(4): 1-26. DOI: 10.1145/3699759
    • (2024) ActSonic: Recognizing Everyday Activities from Inaudible Acoustic Wave Around the Body. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 8(4): 1-32. DOI: 10.1145/3699752
    • (2024) The Grain No. 12/11 W4 102 1 From the City: Integrating a 3D Model with Audio Data as an Experimental Creative Method. Proceedings of the 17th International Symposium on Visual Information Communication and Interaction: 1-8. DOI: 10.1145/3678698.3687179
    • (2024) Unified Framework for Procedural Task Assistants powered by Human Activity Recognition. Companion of the 2024 ACM International Joint Conference on Pervasive and Ubiquitous Computing: 513-518. DOI: 10.1145/3675094.3678448
    • (2024) Interactive Design with Autistic Children using LLM and IoT for Personalized Training: The Good, The Bad and The Challenging. Companion of the 2024 ACM International Joint Conference on Pervasive and Ubiquitous Computing: 1000-1003. DOI: 10.1145/3675094.3677573
    • (2024) AeroSense: Sensing Aerosol Emissions from Indoor Human Activities. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 8(2): 1-30. DOI: 10.1145/3659593
    • (2024) PrISM-Observer: Intervention Agent to Help Users Perform Everyday Procedures Sensed using a Smartwatch. Proceedings of the 37th Annual ACM Symposium on User Interface Software and Technology: 1-16. DOI: 10.1145/3654777.3676350
    • (2024) SoundShift: Exploring Sound Manipulations for Accessible Mixed-Reality Awareness. Proceedings of the 2024 ACM Designing Interactive Systems Conference: 116-132. DOI: 10.1145/3643834.3661556
    • (2024) Kirigami. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 8(1): 1-28. DOI: 10.1145/3643502
