DOI: 10.5555/2969033.2969073

Recurrent models of visual attention

Published: 08 December 2014

Abstract

Applying convolutional neural networks to large images is computationally expensive because the amount of computation scales linearly with the number of image pixels. We present a novel recurrent neural network model that is capable of extracting information from an image or video by adaptively selecting a sequence of regions or locations and only processing the selected regions at high resolution. Like convolutional neural networks, the proposed model has a degree of translation invariance built-in, but the amount of computation it performs can be controlled independently of the input image size. While the model is non-differentiable, it can be trained using reinforcement learning methods to learn task-specific policies. We evaluate our model on several image classification tasks, where it significantly outperforms a convolutional neural network baseline on cluttered images, and on a dynamic visual control problem, where it learns to track a simple object without an explicit training signal for doing so.
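
The idea of processing only a selected region at high resolution can be illustrated with a small sketch. This is not the authors' implementation: the function name `extract_glimpse`, the normalised coordinate convention, and the zero padding at the image border are all assumptions made for illustration. The point it shows is that the work done per glimpse depends on the patch size, not on the size of the input image.

```python
from typing import List, Sequence, Tuple


def extract_glimpse(image: Sequence[Sequence[float]],
                    center: Tuple[float, float],
                    size: int) -> List[List[float]]:
    """Return a size x size patch of `image` centred at `center`.

    `center` is (row, col) in normalised coordinates, where (-1, -1) maps to
    the top-left pixel and (1, 1) to the bottom-right pixel. This convention
    is an assumption of the sketch. Pixels outside the image are set to 0.0.
    """
    height, width = len(image), len(image[0])
    # Map normalised coordinates to integer pixel indices.
    centre_row = round((center[0] + 1.0) / 2.0 * (height - 1))
    centre_col = round((center[1] + 1.0) / 2.0 * (width - 1))
    top = centre_row - size // 2
    left = centre_col - size // 2
    patch = []
    # The loops run size * size times regardless of the image dimensions.
    for r in range(top, top + size):
        row = []
        for c in range(left, left + size):
            inside = 0 <= r < height and 0 <= c < width
            row.append(image[r][c] if inside else 0.0)
        patch.append(row)
    return patch


# Usage: a 5 x 5 image whose pixel value encodes its position.
image = [[r * 5 + c for c in range(5)] for r in range(5)]
print(extract_glimpse(image, (0.0, 0.0), 3))
# prints [[6, 7, 8], [11, 12, 13], [16, 17, 18]]
```

In the model described in the abstract, a learned policy would choose `center` at each step; here it is simply passed in by the caller.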

Published In

NIPS'14: Proceedings of the 28th International Conference on Neural Information Processing Systems - Volume 2
December 2014
3697 pages

Publisher

MIT Press

Cambridge, MA, United States

Cited By

  • (2024) A Unified Framework for Multi-Domain CTR Prediction via Large Language Models. ACM Transactions on Information Systems. DOI: 10.1145/3698878. Online publication date: 14-Oct-2024
  • (2024) Multi-Scale Feature Attention Fusion for Image Splicing Forgery Detection. ACM Transactions on Multimedia Computing, Communications, and Applications, 21:1 (1-20). DOI: 10.1145/3698770. Online publication date: 21-Dec-2024
  • (2024) Adapting Neural Networks at Runtime: Current Trends in At-Runtime Optimizations for Deep Learning. ACM Computing Surveys, 56:10 (1-40). DOI: 10.1145/3657283. Online publication date: 14-May-2024
  • (2024) Neural Click Models for Recommender Systems. Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval (2553-2558). DOI: 10.1145/3626772.3657939. Online publication date: 10-Jul-2024
  • (2024) Multi‐view stereo for weakly textured indoor 3D reconstruction. Computer-Aided Civil and Infrastructure Engineering, 39:10 (1469-1489). DOI: 10.1111/mice.13149. Online publication date: 6-Jan-2024
  • (2024) A Fuzzy Multigranularity Convolutional Neural Network With Double Attention Mechanisms for Measuring Semantic Textual Similarity. IEEE Transactions on Fuzzy Systems, 32:10 (5762-5776). DOI: 10.1109/TFUZZ.2024.3427801. Online publication date: 15-Jul-2024
  • (2024) RGAM. IET Computer Vision, 18:8 (1362-1375). DOI: 10.1049/cvi2.12323. Online publication date: 28-Dec-2024
  • (2023) Advanced Attention for Causality Classification of Verb Nodes of Knowledge Graph. Proceedings of the 7th International Conference on Algorithms, Computing and Systems (140-144). DOI: 10.1145/3631908.3631928. Online publication date: 19-Oct-2023
  • (2023) Attention-guided Adversarial Attack for Video Object Segmentation. ACM Transactions on Intelligent Systems and Technology, 14:6 (1-22). DOI: 10.1145/3617067. Online publication date: 14-Nov-2023
  • (2023) Dual Scene Graph Convolutional Network for Motivation Prediction. ACM Transactions on Multimedia Computing, Communications, and Applications, 19:3s (1-23). DOI: 10.1145/3572914. Online publication date: 14-Mar-2023