DOI: 10.1145/3242969.3243029
Research Article | Open Access

Multimodal Dialogue Management for Multiparty Interaction with Infants

Published: 02 October 2018

Abstract

We present dialogue management routines for a system to engage in multiparty agent-infant interaction. The ultimate purpose of this research is to help infants learn a visual sign language by engaging them in naturalistic and socially contingent conversations during an early-life critical period for language development (ages 6 to 12 months) as initiated by an artificial agent. As a first step, we focus on creating and maintaining agent-infant engagement that elicits appropriate and socially contingent responses from the baby. Our system includes two agents, a physical robot and an animated virtual human. The system's multimodal perception includes an eye-tracker (measures attention) and a thermal infrared imaging camera (measures patterns of emotional arousal). A dialogue policy is presented that selects individual actions and planned multiparty sequences based on perceptual inputs about the baby's internal changing states of emotional engagement. The present version of the system was evaluated in interaction with 8 babies. All babies demonstrated spontaneous and sustained engagement with the agents for several minutes, with patterns of conversationally relevant and socially contingent behaviors. We further performed a detailed case-study analysis with annotation of all agent and baby behaviors. Results show that the baby's behaviors were generally relevant to agent conversations and contained direct evidence for socially contingent responses by the baby to specific linguistic samples produced by the avatar. This work demonstrates the potential for language learning from agents in very young babies and has especially broad implications regarding the use of artificial agents with babies who have minimal language exposure in early life.
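
The dialogue policy described above maps perceptual estimates of the infant's attention (from the eye-tracker) and emotional arousal (from the thermal infrared camera) to individual agent actions and multiparty sequences. The sketch below illustrates one way such a rule-based action selector could be structured; it is not the paper's implementation, and every name, threshold, and action label in it (EngagementState, select_action, attention_getter, and so on) is a hypothetical placeholder.

```python
# Hypothetical sketch of a rule-based multiparty dialogue policy.
# The perceptual inputs (gaze target, arousal trend) and action labels are
# illustrative assumptions; the paper's actual policy and thresholds differ.
from dataclasses import dataclass
from enum import Enum, auto


class GazeTarget(Enum):
    ROBOT = auto()
    AVATAR = auto()
    AWAY = auto()


@dataclass
class EngagementState:
    gaze: GazeTarget      # which agent (if any) the infant is attending to
    gaze_dwell_s: float   # how long the current gaze target has been held
    arousal_trend: float  # signed change in a facial-temperature proxy (+ = rising)


def select_action(state: EngagementState) -> tuple[str, str]:
    """Return (agent, action) for the next conversational move.

    Simple contingency rules: reward sustained attention with linguistic
    input from the attended agent, use the robot to recapture attention,
    and hand off between agents to keep the interaction multiparty.
    """
    if state.gaze is GazeTarget.AWAY:
        # Attention lost: the physically embodied robot bids for attention.
        return ("robot", "attention_getter")
    if state.arousal_trend < -0.5:
        # Arousal dropping while the infant still looks on: switch speakers.
        other = "avatar" if state.gaze is GazeTarget.ROBOT else "robot"
        return (other, "greeting_sequence")
    if state.gaze is GazeTarget.AVATAR and state.gaze_dwell_s > 2.0:
        # Sustained attention to the avatar: produce a sign-language sample.
        return ("avatar", "nursery_rhyme_sign")
    # Default: the attended agent produces a short contingent response.
    attended = "robot" if state.gaze is GazeTarget.ROBOT else "avatar"
    return (attended, "contingent_acknowledgement")


if __name__ == "__main__":
    demo = EngagementState(gaze=GazeTarget.AVATAR, gaze_dwell_s=3.2, arousal_trend=0.1)
    print(select_action(demo))  # -> ('avatar', 'nursery_rhyme_sign')
```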




Published In

ICMI '18: Proceedings of the 20th ACM International Conference on Multimodal Interaction
October 2018
687 pages
ISBN:9781450356923
DOI:10.1145/3242969
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Sponsors

  • SIGCHI: Special Interest Group on Computer-Human Interaction of the ACM

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 02 October 2018


Author Tags

  1. american sign language
  2. augmentative learning aids
  3. eye-tracking
  4. human-computer interaction
  5. multi-agent interaction
  6. multimodal interaction design
  7. thermal infrared (ir) imaging

Qualifiers

  • Research-article


Conference

ICMI '18
Sponsor:
  • SIGCHI

Acceptance Rates

ICMI '18 Paper Acceptance Rate: 63 of 149 submissions, 42%
Overall Acceptance Rate: 453 of 1,080 submissions, 42%


Cited By

  • (2024) Adaptive Interview Strategy Based on Interviewees’ Speaking Willingness Recognition for Interview Robots. IEEE Transactions on Affective Computing 15(3), 942-957. DOI: 10.1109/TAFFC.2023.3309640. Online publication date: Jul-2024.
  • (2023) Worldwide Overview and Country Differences in Metaverse Research: A Bibliometric Analysis. Sustainability 15(4), 3541. DOI: 10.3390/su15043541. Online publication date: 15-Feb-2023.
  • (2023) Identifying the Focus of Attention in Human-Robot Conversational Groups. Proceedings of the 11th International Conference on Human-Agent Interaction, 3-12. DOI: 10.1145/3623809.3623866. Online publication date: 4-Dec-2023.
  • (2023) Where Should I Stand? Robot Positioning in Human-Robot Conversational Groups. Social Robotics, 182-192. DOI: 10.1007/978-981-99-8718-4_16. Online publication date: 3-Dec-2023.
  • (2022) Exploratory Analysis of Usage Statistics of Dialogue Systems by Visualization. Information and Technology in Education and Learning 2(1), Dev-p001. DOI: 10.12937/itel.2.1.Dev.p001. Online publication date: 2022.
  • (2022) Socially Interactive Agent Dialogue. The Handbook on Socially Interactive Agents, 45-76. DOI: 10.1145/3563659.3563663. Online publication date: 27-Oct-2022.
  • (2022) Predicting Positions of People in Human-Robot Conversational Groups. 2022 17th ACM/IEEE International Conference on Human-Robot Interaction (HRI), 402-411. DOI: 10.1109/HRI53351.2022.9889628. Online publication date: 7-Mar-2022.
  • (2022) A Review of Plan-Based Approaches for Dialogue Management. Cognitive Computation 14(3), 1019-1038. DOI: 10.1007/s12559-022-09996-0. Online publication date: 31-Jan-2022.
  • (2022) The Handbook on Socially Interactive Agents. Online publication date: 27-Oct-2022.
  • (2021) Recognizing Social Signals with Weakly Supervised Multitask Learning for Multimodal Dialogue Systems. Proceedings of the 2021 International Conference on Multimodal Interaction, 141-149. DOI: 10.1145/3462244.3479927. Online publication date: 18-Oct-2021.
