research-article

Ok Google, What Am I Doing?: Acoustic Activity Recognition Bounded by Conversational Assistant Interactions

Published: 30 March 2021

Abstract

Conversational assistants in the form of stand-alone devices such as Amazon Echo and Google Home have become popular and have been embraced by millions of people. By serving as a natural interface to services ranging from home automation to media players, conversational assistants help people perform many tasks with ease, such as setting timers, playing music, and managing to-do lists. While these systems offer useful capabilities, they are largely passive and unaware of the human behavioral context in which they are used. In this work, we explore how off-the-shelf conversational assistants can be enhanced with acoustic-based human activity recognition by leveraging the short interval after a voice command is given to the device. Since always-on audio recording can pose privacy concerns, our method is unique in that it does not require capturing and analyzing any audio other than the speech-based interactions between people and their conversational assistants. In particular, we leverage background environmental sounds present in these short-duration voice-based interactions to recognize activities of daily living. We conducted a study with 14 participants in 3 different locations in their own homes. We showed that our method can recognize 19 different activities of daily living with an average precision of 84.85% and an average recall of 85.67% in a leave-one-participant-out performance evaluation with 30-second audio clips bounded by the voice interactions.
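
The leave-one-participant-out protocol named in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the abstract does not describe the classifier, the features, or how the averages were computed. A nearest-centroid classifier stands in for the actual model, each clip is represented by a made-up feature vector, and precision and recall are macro-averaged over classes after pooling the predictions from all folds.

```python
from collections import defaultdict


def leave_one_participant_out(samples):
    """Yield (held_out_id, train, test), one split per participant.

    `samples` is a list of (participant_id, features, label) tuples.
    """
    participants = sorted({pid for pid, _, _ in samples})
    for held_out in participants:
        train = [s for s in samples if s[0] != held_out]
        test = [s for s in samples if s[0] == held_out]
        yield held_out, train, test


def fit_centroids(train):
    """Placeholder model (an assumption): one mean feature vector per label."""
    sums = {}
    counts = defaultdict(int)
    for _, feats, label in train:
        if label not in sums:
            sums[label] = [0.0] * len(feats)
        sums[label] = [a + b for a, b in zip(sums[label], feats)]
        counts[label] += 1
    return {lab: [v / counts[lab] for v in vec] for lab, vec in sums.items()}


def predict(centroids, feats):
    """Return the label whose centroid is closest in squared distance."""
    def sq_dist(center):
        return sum((a - b) ** 2 for a, b in zip(center, feats))
    return min(centroids, key=lambda lab: sq_dist(centroids[lab]))


def macro_precision_recall(y_true, y_pred):
    """Average per-class precision and recall with equal class weight."""
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls = [], []
    for lab in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == lab and p == lab)
        fp = sum(1 for t, p in pairs if t != lab and p == lab)
        fn = sum(1 for t, p in pairs if t == lab and p != lab)
        precisions.append(tp / (tp + fp) if tp + fp else 0.0)
        recalls.append(tp / (tp + fn) if tp + fn else 0.0)
    return sum(precisions) / len(labels), sum(recalls) / len(labels)


def evaluate(samples):
    """Train on all but one participant, test on that participant, repeat."""
    y_true, y_pred = [], []
    for _, train, test in leave_one_participant_out(samples):
        centroids = fit_centroids(train)
        for _, feats, label in test:
            y_true.append(label)
            y_pred.append(predict(centroids, feats))
    return macro_precision_recall(y_true, y_pred)


if __name__ == "__main__":
    # Toy data (hypothetical): two participants, two activities.
    toy = [
        ("p1", [0.0, 0.1], "cooking"), ("p1", [1.0, 0.9], "vacuuming"),
        ("p2", [0.1, 0.0], "cooking"), ("p2", [0.9, 1.0], "vacuuming"),
    ]
    print(evaluate(toy))
```

Splitting by participant rather than by clip means no person contributes data to both training and testing, so the scores reflect how well a model generalizes to someone it has never heard.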



    Published In

    Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, Volume 5, Issue 1
    March 2021
    1272 pages
    EISSN:2474-9567
    DOI:10.1145/3459088
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Author Tags

    1. Activities of Daily Living
    2. Audio Processing
    3. Conversational Assistants
    4. Deep Learning
    5. Environmental Sounds
    6. Google Home
    7. Human Activity Recognition
    8. Smart Environment
    9. Smart Speaker
    10. Voice Assistants

    Qualifiers

    • Research-article
    • Research
    • Refereed

    Cited By

    • (2024) Sensor event sequence prediction for proactive smart home. Journal of Ambient Intelligence and Smart Environments 16:3 (275-308). DOI: 10.3233/AIS-230429. Online publication date: 24-Sep-2024
    • (2024) Collecting Self-reported Physical Activity and Posture Data Using Audio-based Ecological Momentary Assessment. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 8:3 (1-35). DOI: 10.1145/3678584. Online publication date: 9-Sep-2024
    • (2024) Re-evaluating the Command-and-Control Paradigm in Conversational Search Interactions. Proceedings of the 33rd ACM International Conference on Information and Knowledge Management (2260-2270). DOI: 10.1145/3627673.3679588. Online publication date: 21-Oct-2024
    • (2024) Midas++: Generating Training Data of mmWave Radars From Videos for Privacy-Preserving Human Sensing With Mobility. IEEE Transactions on Mobile Computing 23:6 (6650-6666). DOI: 10.1109/TMC.2023.3325399. Online publication date: Jun-2024
    • (2024) Exploiting Voice-Controlled Devices to Infer the Layouts of Private Spaces. 2024 16th International Conference on Human System Interaction (HSI) (1-8). DOI: 10.1109/HSI61632.2024.10613549. Online publication date: 8-Jul-2024
    • (2024) Human Acoustic Events Detection as Anomalies in Industrial Environments Using Shallow Unsupervised Techniques. The 19th International Conference on Soft Computing Models in Industrial and Environmental Applications SOCO 2024 (98-107). DOI: 10.1007/978-3-031-75013-7_10. Online publication date: 16-Nov-2024
    • (2023) Identification of Solid and Liquid Materials Using Acoustic Signals and Frequency-Graph Features. Entropy 25:8 (1170). DOI: 10.3390/e25081170. Online publication date: 5-Aug-2023
    • (2023) Automated Face-To-Face Conversation Detection on a Commodity Smartwatch with Acoustic Sensing. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 7:3 (1-29). DOI: 10.1145/3610882. Online publication date: 27-Sep-2023
    • (2023) CubeSense++: Smart Environment Sensing with Interaction-Powered Corner Reflector Mechanisms. Proceedings of the 36th Annual ACM Symposium on User Interface Software and Technology (1-12). DOI: 10.1145/3586183.3606744. Online publication date: 29-Oct-2023
    • (2023) Real-time Context-Aware Multimodal Network for Activity and Activity-Stage Recognition from Team Communication in Dynamic Clinical Settings. Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies 7:1 (1-28). DOI: 10.1145/3580798. Online publication date: 28-Mar-2023
