
Audio-Based Activities of Daily Living (ADL) Recognition with Large-Scale Acoustic Embeddings from Online Videos

Published: 29 March 2019

Abstract

Over the years, activity sensing and recognition have been shown to play a key enabling role in a wide range of applications, from sustainability and human-computer interaction to health care. While many recognition tasks have traditionally employed inertial sensors, acoustic-based methods offer the benefit of capturing rich contextual information, which can be useful when discriminating complex activities. Given the emergence of deep learning techniques and leveraging new, large-scale multimedia datasets, this paper revisits the opportunity of training audio-based classifiers without the onerous and time-consuming task of annotating audio data. We propose a framework for audio-based activity recognition that can make use of millions of embedding features from public online video sound clips. Based on the combination of oversampling and deep learning approaches, our framework does not require further feature processing or outlier filtering as in prior work. We evaluated our approach in the context of Activities of Daily Living (ADL) by recognizing 15 everyday activities with 14 participants in their own homes, achieving 64.2% and 83.6% averaged within-subject accuracy for top-1 and top-3 classification, respectively. We also examined individual class performance to study the co-occurrence characteristics of the activities and the robustness of the framework.
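To make the pipeline described in the abstract concrete, the sketch below shows one way such a framework could be assembled in Python: fixed-length acoustic embeddings (here assumed to be 128-dimensional, AudioSet/VGGish-style), SMOTE oversampling via imbalanced-learn, and a small Keras classifier evaluated on top-1 and top-3 accuracy. The placeholder data, network size, and hyperparameters are illustrative assumptions, not the authors' exact configuration.

```python
import numpy as np
from imblearn.over_sampling import SMOTE
from sklearn.model_selection import train_test_split
from tensorflow import keras

NUM_CLASSES = 15       # everyday activities (ADL) recognized in the paper
EMBEDDING_DIM = 128    # AudioSet/VGGish-style embedding size (assumption)

# Placeholder data standing in for per-clip acoustic embeddings and labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, EMBEDDING_DIM)).astype("float32")
y = rng.integers(0, NUM_CLASSES, size=2000)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0)

# Rebalance the training classes with synthetic minority oversampling (SMOTE).
X_bal, y_bal = SMOTE(random_state=0).fit_resample(X_train, y_train)

# Small fully connected classifier over the fixed-length embeddings.
model = keras.Sequential([
    keras.layers.Input(shape=(EMBEDDING_DIM,)),
    keras.layers.Dense(256, activation="relu"),
    keras.layers.Dropout(0.5),
    keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(
    optimizer="adam",
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy",
             keras.metrics.SparseTopKCategoricalAccuracy(k=3, name="top3")])
model.fit(X_bal, y_bal, epochs=20, batch_size=64, verbose=0)

# Report top-1 and top-3 accuracy, mirroring the paper's evaluation metrics.
_, top1, top3 = model.evaluate(X_test, y_test, verbose=0)
print(f"top-1 accuracy: {top1:.3f}   top-3 accuracy: {top3:.3f}")
```

This sketch evaluates a single train/test split; the paper reports averaged within-subject accuracy across 14 participants, which would correspond to repeating such a loop per participant's data.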




    Published In

    Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, Volume 3, Issue 1
    March 2019
    786 pages
    EISSN: 2474-9567
    DOI: 10.1145/3323054
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Publication History

    Published: 29 March 2019
    Accepted: 01 January 2019
    Received: 01 November 2018
    Published in IMWUT Volume 3, Issue 1


    Author Tags

    1. Activity Recognition
    2. Audio Processing
    3. Deep Learning
    4. Multi-Class Classification

    Qualifiers

    • Research-article
    • Research
    • Refereed


    Article Metrics

    • Downloads (Last 12 months): 88
    • Downloads (Last 6 weeks): 10
    Reflects downloads up to 13 Sep 2024


    Cited By

    • (2024) Effect of Timing of Rehabilitation Nursing Intervention on Children with Acute Viral Encephalitis. Iranian Journal of Pediatrics 34:3. DOI: 10.5812/ijp-142906. Online publication date: 17-May-2024.
    • (2024) Motion Capture Technology in Sports Scenarios: A Survey. Sensors 24:9 (2947). DOI: 10.3390/s24092947. Online publication date: 6-May-2024.
    • (2024) Midas++: Generating Training Data of mmWave Radars From Videos for Privacy-Preserving Human Sensing With Mobility. IEEE Transactions on Mobile Computing 23:6 (6650-6666). DOI: 10.1109/TMC.2023.3325399. Online publication date: Jun-2024.
    • (2024) AcouDL: Context-Aware Daily Activity Recognition from Natural Acoustic Signals. 2024 IEEE International Conference on Smart Computing (SMARTCOMP) (332-337). DOI: 10.1109/SMARTCOMP61445.2024.00077. Online publication date: 29-Jun-2024.
    • (2024) Multimodal Vehicle Classification Based on Radar and Acoustic Sensors Using Hybrid Shallow CNN. 2024 IEEE Symposium on Wireless Technology & Applications (ISWTA) (264-268). DOI: 10.1109/ISWTA62130.2024.10652023. Online publication date: 20-Jul-2024.
    • (2024) Classification of Nursing Care Activities Using Smartwatches. 2024 International Conference on Activity and Behavior Computing (ABC) (1-10). DOI: 10.1109/ABC61795.2024.10652176. Online publication date: 29-May-2024.
    • (2024) Full-coverage unobtrusive health monitoring of elders at homes. Internet of Things 26 (101182). DOI: 10.1016/j.iot.2024.101182. Online publication date: Jul-2024.
    • (2024) Transfer learning and its extensive appositeness in human activity recognition: A survey. Expert Systems with Applications 240 (122538). DOI: 10.1016/j.eswa.2023.122538. Online publication date: Apr-2024.
    • (2024) Digital human and embodied intelligence for sports science: advancements, opportunities and prospects. The Visual Computer. DOI: 10.1007/s00371-024-03547-4. Online publication date: 21-Jun-2024.
    • (2023) Multi-dimensional task recognition for human-robot teaming: literature review. Frontiers in Robotics and AI 10. DOI: 10.3389/frobt.2023.1123374. Online publication date: 7-Aug-2023.
    • Show More Cited By
