DOI: 10.5555/2484920.2484929

Incrementally biasing visual search using natural language input

Published: 06 May 2013

Abstract

Humans expect interlocutors, both human and robot, to resolve spoken references to visually perceivable objects incrementally, as the referents are verbally described. For this reason, tight integration of visual search with natural language processing, and real-time operation of both, are requirements for natural interaction between humans and robots. In this paper, we present an integrated robotic architecture with novel incremental vision and natural language processing. We demonstrate that incrementally refining attentional focus using linguistic constraints yields significantly better vision-system performance than non-incremental visual processing.
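The mechanism the abstract describes, pruning a set of visual object hypotheses as each word of a referring expression arrives rather than after the full utterance is complete, can be sketched roughly as follows. This Python sketch is an illustration only, not the paper's architecture: the ObjectHypothesis type, the word-to-predicate table, and the example scene are all hypothetical stand-ins.

from dataclasses import dataclass

@dataclass
class ObjectHypothesis:
    """A candidate object produced by bottom-up visual segmentation."""
    label: str            # e.g. "mug" (hypothetical category name)
    color: str            # dominant color of the segment
    position: tuple       # (x, y) centroid in image coordinates

# Hypothetical mapping from incrementally recognized words to visual
# predicates; a real system would derive such constraints from the
# parser's partial semantic representation as the utterance unfolds.
CONSTRAINTS = {
    "red":  lambda h: h.color == "red",
    "blue": lambda h: h.color == "blue",
    "mug":  lambda h: h.label == "mug",
    "left": lambda h: h.position[0] < 320,   # left half of a 640-px image
}

def incremental_search(words, hypotheses):
    """Narrow the attentional focus word by word, instead of running
    visual search once over the completed utterance."""
    candidates = list(hypotheses)
    for word in words:                 # words arrive one at a time
        predicate = CONSTRAINTS.get(word)
        if predicate is not None:      # skip words with no visual content
            candidates = [h for h in candidates if predicate(h)]
        yield word, candidates         # focus after hearing this word

scene = [
    ObjectHypothesis("mug", "red",  (100, 200)),
    ObjectHypothesis("mug", "blue", (500, 210)),
    ObjectHypothesis("box", "red",  (150,  90)),
]
for word, focus in incremental_search(["the", "red", "mug"], scene):
    print(word, "->", [(h.label, h.color) for h in focus])

In this toy run, hearing "red" already halves the candidate set before "mug" is spoken, which is the kind of early narrowing of attentional focus the abstract attributes to incremental biasing.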

Cited By

  • (2019) Dempster-Shafer theoretic resolution of referential ambiguity. Autonomous Robots 43(2): 389-414. https://doi.org/10.1007/s10514-018-9795-5. Online publication date: 16-Mar-2019.

Published In

AAMAS '13: Proceedings of the 2013 international conference on Autonomous agents and multi-agent systems
May 2013
1500 pages
ISBN: 9781450319935

Sponsors

  • IFAAMAS

Publisher

International Foundation for Autonomous Agents and Multiagent Systems, Richland, SC

Author Tags

  1. incremental natural language processing
  2. object detection and recognition
  3. visual search

Qualifiers

  • Research-article

Conference

AAMAS '13

Acceptance Rates

AAMAS '13 paper acceptance rate: 140 of 599 submissions, 23%.
Overall acceptance rate: 1,155 of 5,036 submissions, 23%.
