DOI: 10.1145/1891903.1891912
Research article

Focusing computational visual attention in multi-modal human-robot interaction

Published: 08 November 2010

Abstract

Identifying verbally and non-verbally referred-to objects is an important aspect of human-robot interaction. Most importantly, it is essential for achieving a joint focus of attention and, thus, a natural interaction behavior. In this contribution, we introduce a saliency-based model that reflects how multi-modal referring acts influence visual search, i.e., the task of finding a specific object in a scene. To this end, we combine positional information obtained from pointing gestures with contextual knowledge about the visual appearance of the referred-to object obtained from language. The available information is then integrated into a biologically motivated saliency model that forms the basis for visual search. We demonstrate the feasibility of the proposed approach with the results of an experimental evaluation.
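
To make the combination step described above more concrete, the following minimal sketch shows one way a bottom-up saliency map can be modulated by a spatial prior around the pointed-at location and a language-derived color cue. This is not the authors' implementation: the Gaussian pointing prior, the crude center-surround saliency, the color-similarity weighting, and all function names are illustrative assumptions, not details taken from the paper.

import numpy as np
from scipy.ndimage import gaussian_filter

def bottom_up_saliency(image):
    # Crude center-surround contrast on intensity; a stand-in for a full
    # Itti/Koch-style multi-channel saliency model.
    intensity = image.mean(axis=2)
    return np.abs(gaussian_filter(intensity, 2) - gaussian_filter(intensity, 8))

def pointing_prior(shape, target_xy, sigma=40.0):
    # Gaussian spatial prior around the image location indicated by the
    # pointing gesture (target_xy is assumed to come from gesture recognition).
    h, w = shape
    ys, xs = np.mgrid[0:h, 0:w]
    x0, y0 = target_xy
    return np.exp(-((xs - x0) ** 2 + (ys - y0) ** 2) / (2.0 * sigma ** 2))

def color_cue(image, named_rgb):
    # Top-down cue: per-pixel similarity to the color named in the verbal
    # referring expression, e.g. "the red cup" -> (255, 0, 0).
    dist = np.linalg.norm(image.astype(float) - np.asarray(named_rgb, float), axis=2)
    return 1.0 - dist / (dist.max() + 1e-9)

def focus_of_attention(image, target_xy, named_rgb):
    # Multiplicatively combine the three maps and return the peak location,
    # i.e. the presumed referred-to object.
    combined = (bottom_up_saliency(image)
                * pointing_prior(image.shape[:2], target_xy)
                * color_cue(image, named_rgb))
    return np.unravel_index(np.argmax(combined), combined.shape)

# Toy usage: a synthetic 240x320 RGB scene, a pointing ray hitting pixel
# (x=200, y=120), and the word "red" mapped to pure red.
scene = np.random.randint(0, 256, (240, 320, 3), dtype=np.uint8)
print(focus_of_attention(scene, target_xy=(200, 120), named_rgb=(255, 0, 0)))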

Published In

ICMI-MLMI '10: International Conference on Multimodal Interfaces and the Workshop on Machine Learning for Multimodal Interaction
November 2010
311 pages
ISBN:9781450304146
DOI:10.1145/1891903

Publisher

Association for Computing Machinery

New York, NY, United States

Author Tags

  1. attention
  2. color
  3. deictic interaction
  4. gestures
  5. human-robot interaction
  6. joint attention
  7. language
  8. multi-modal
  9. objects
  10. pointing
  11. saliency
  12. shared attention
  13. visual search

Qualifiers

  • Research-article

Conference

ICMI-MLMI '10

Acceptance Rates

ICMI-MLMI '10 paper acceptance rate: 41 of 100 submissions (41%)
Overall acceptance rate: 453 of 1,080 submissions (42%)

Cited By

  • (2024) Anthropomorphic Human-Robot Interaction Framework: Attention Based Approach. RoboCup 2023: Robot World Cup XXVI, pp. 262-274. DOI: 10.1007/978-3-031-55015-7_22. Online: 14 Mar 2024
  • (2023) A Methodology for Evaluating Multimodal Referring Expression Generation for Embodied Virtual Agents. Companion Publication of the 25th International Conference on Multimodal Interaction, pp. 164-173. DOI: 10.1145/3610661.3616548. Online: 9 Oct 2023
  • (2023) Multimodal Error Correction with Natural Language and Pointing Gestures. 2023 IEEE/CVF International Conference on Computer Vision Workshops (ICCVW), pp. 1968-1978. DOI: 10.1109/ICCVW60793.2023.00212. Online: 2 Oct 2023
  • (2023) Interactive Multimodal Robot Dialog Using Pointing Gesture Recognition. Computer Vision - ECCV 2022 Workshops, pp. 640-657. DOI: 10.1007/978-3-031-25075-0_43. Online: 19 Feb 2023
  • (2021) Coordinating Entrainment Phenomena: Robot Conversation Strategy for Object Recognition. Applied Sciences 11(5), 2358. DOI: 10.3390/app11052358. Online: 7 Mar 2021
  • (2021) What Robots Need From Clothing. Proceedings of the 2021 ACM Designing Interactive Systems Conference, pp. 1345-1355. DOI: 10.1145/3461778.3462045. Online: 28 Jun 2021
  • (2021) YouRefIt: Embodied Reference Understanding with Language and Gesture. 2021 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 1365-1375. DOI: 10.1109/ICCV48922.2021.00142. Online: Oct 2021
  • (2019) Vision-Based Attentiveness Determination Using Scalable HMM Based on Relevance Theory. Sensors 19(23), 5331. DOI: 10.3390/s19235331. Online: 3 Dec 2019
  • (2016) Alignment Approach Comparison between Implicit and Explicit Suggestions in Object Reference Conversations. Proceedings of the Fourth International Conference on Human Agent Interaction, pp. 193-200. DOI: 10.1145/2974804.2974814. Online: 4 Oct 2016
  • (2016) Modeling communicative behaviors for object references in human-robot interaction. 2016 IEEE International Conference on Robotics and Automation (ICRA), pp. 3352-3359. DOI: 10.1109/ICRA.2016.7487510. Online: May 2016