DOI: 10.1145/3626705.3631787

Poster

Investigating Opportunities for Active Smart Assistants to Initiate Interactions With Users

Published: 03 December 2023

Abstract

    Passive voice assistants such as Alexa are widespread, responding to user requests. However, with the rise of domestic robots, we envision active smart assistants that initiate interactions seamlessly, weave themselves into the user’s context, and enable more suitable interaction. While robots already deliver the hardware, only recently have advances in artificial intelligence enabled assistants to understand the human and the environment well enough to support this vision. We combined hardware with artificial intelligence to build an attentive robot. Here, we present a robotic head prototype that discovers and follows users in a room using video and sound. We contribute (1) the design and implementation of a prototype system for an active smart assistant and (2) a discussion of design principles for systems engaging in human conversations. This work aims to provide foundations for future research on active smart assistants.
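    The abstract describes a robotic head that locates and attends to users by combining video and sound. As an illustrative sketch only (not the authors' implementation), one common pattern is to fuse a visual bearing from face detection with an acoustic bearing from sound-source localization via a confidence-weighted circular mean, and to gate when the assistant may initiate contact. All function names, parameters, and the toy initiation policy below are hypothetical:

    ```python
    import math

    def fuse_bearings(visual_deg, visual_conf, audio_deg, audio_conf):
        """Fuse visual and acoustic bearing estimates (degrees) using a
        confidence-weighted circular mean, so the head turns toward the
        most likely user position. Returns None if neither cue is available."""
        if visual_conf + audio_conf == 0:
            return None
        x = (visual_conf * math.cos(math.radians(visual_deg))
             + audio_conf * math.cos(math.radians(audio_deg)))
        y = (visual_conf * math.sin(math.radians(visual_deg))
             + audio_conf * math.sin(math.radians(audio_deg)))
        return math.degrees(math.atan2(y, x)) % 360

    def should_initiate(face_visible, speech_detected, user_idle_s,
                        idle_threshold_s=30):
        """Toy policy: initiate an interaction only when a user is visible,
        no one is currently speaking, and the user has been idle long enough
        that an interruption seems acceptable."""
        return face_visible and not speech_detected and user_idle_s >= idle_threshold_s
    ```

    With equal confidence in both cues, a face at 0° and a voice at 90° yield a fused bearing of 45°; as one cue's confidence drops, the fused bearing shifts toward the other. A circular mean is used rather than a plain average so that bearings near the 0°/360° wrap-around fuse correctly.
    
    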



      Information

      Published In

      MUM '23: Proceedings of the 22nd International Conference on Mobile and Ubiquitous Multimedia
      December 2023
      607 pages
      ISBN:9798400709210
      DOI:10.1145/3626705
      Permission to make digital or hard copies of part or all of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for third-party components of this work must be honored. For all other uses, contact the Owner/Author.

      Publisher

      Association for Computing Machinery

      New York, NY, United States


      Author Tags

      1. conversational agents
      2. human computer interaction
      3. human robot interaction
      4. voice assistants

      Qualifiers

      • Poster
      • Research
      • Refereed limited

      Conference

      MUM '23

      Acceptance Rates

      Overall acceptance rate: 190 of 465 submissions (41%)
