Multimodal conversational interaction with robots

Published: 01 July 2019

Published In

The Handbook of Multimodal-Multisensor Interfaces: Language Processing, Software, Commercialization, and Emerging Directions
July 2019
813 pages
ISBN: 9781970001754
DOI: 10.1145/3233795

Publisher

Association for Computing Machinery and Morgan & Claypool
