DOI: 10.5555/2484920.2484929

Incrementally biasing visual search using natural language input

Published: 06 May 2013

Abstract

Humans expect interlocutors, both human and robot, to resolve spoken references to visually perceivable objects incrementally, as the referents are verbally described. For this reason, tight integration of visual search with natural language processing, and real-time operation of both, are requirements for natural interaction between humans and robots. In this paper, we present an integrated robotic architecture with novel incremental vision and natural language processing. We demonstrate that incrementally refining attentional focus using linguistic constraints yields significantly better vision-system performance than non-incremental visual processing.
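The mechanism the abstract describes, pruning a set of visual object hypotheses as each word of a referring expression arrives rather than after the full utterance is complete, can be sketched roughly as follows. This Python sketch is an illustration only, not the paper's architecture: the ObjectHypothesis type, the word-to-predicate table, and the example scene are all hypothetical stand-ins.

from dataclasses import dataclass

@dataclass
class ObjectHypothesis:
    """A candidate object produced by bottom-up visual segmentation."""
    label: str            # e.g. "mug" (hypothetical category name)
    color: str            # dominant color of the segment
    position: tuple       # (x, y) centroid in image coordinates

# Hypothetical mapping from incrementally recognized words to visual
# predicates; a real system would derive such constraints from the
# parser's partial semantic representation as the utterance unfolds.
CONSTRAINTS = {
    "red":  lambda h: h.color == "red",
    "blue": lambda h: h.color == "blue",
    "mug":  lambda h: h.label == "mug",
    "left": lambda h: h.position[0] < 320,   # left half of a 640-px image
}

def incremental_search(words, hypotheses):
    """Narrow the attentional focus word by word, instead of running
    visual search once over the completed utterance."""
    candidates = list(hypotheses)
    for word in words:                 # words arrive one at a time
        predicate = CONSTRAINTS.get(word)
        if predicate is not None:      # skip words with no visual content
            candidates = [h for h in candidates if predicate(h)]
        yield word, candidates         # focus after hearing this word

scene = [
    ObjectHypothesis("mug", "red",  (100, 200)),
    ObjectHypothesis("mug", "blue", (500, 210)),
    ObjectHypothesis("box", "red",  (150,  90)),
]
for word, focus in incremental_search(["the", "red", "mug"], scene):
    print(word, "->", [(h.label, h.color) for h in focus])

In this toy run, hearing "red" already halves the candidate set before "mug" is spoken, which is the kind of early narrowing of attentional focus the abstract attributes to incremental biasing.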

Cited By

  • (2019) Dempster-Shafer theoretic resolution of referential ambiguity. Autonomous Robots 43(2): 389-414. https://doi.org/10.1007/s10514-018-9795-5. Online publication date: 16-Mar-2019.

Published In

AAMAS '13: Proceedings of the 2013 international conference on Autonomous agents and multi-agent systems
May 2013
1500 pages
ISBN: 9781450319935

Sponsors

  • IFAAMAS

Publisher

International Foundation for Autonomous Agents and Multiagent Systems, Richland, SC

Author Tags

  1. incremental natural language processing
  2. object detection and recognition
  3. visual search

Qualifiers

  • Research-article

Conference

AAMAS '13

Acceptance Rates

AAMAS '13 paper acceptance rate: 140 of 599 submissions, 23%.
Overall acceptance rate: 1,155 of 5,036 submissions, 23%.
