Article

Recurrent models of visual attention

Authors:

Volodymyr Mnih,

Koray Kavukcuoglu

NIPS'14: Proceedings of the 28th International Conference on Neural Information Processing Systems - Volume 2

Pages 2204 - 2212

Published: 08 December 2014

Abstract

Applying convolutional neural networks to large images is computationally expensive because the amount of computation scales linearly with the number of image pixels. We present a novel recurrent neural network model that is capable of extracting information from an image or video by adaptively selecting a sequence of regions or locations and only processing the selected regions at high resolution. Like convolutional neural networks, the proposed model has a degree of translation invariance built-in, but the amount of computation it performs can be controlled independently of the input image size. While the model is non-differentiable, it can be trained using reinforcement learning methods to learn task-specific policies. We evaluate our model on several image classification tasks, where it significantly outperforms a convolutional neural network baseline on cluttered images, and on a dynamic visual control problem, where it learns to track a simple object without an explicit training signal for doing so.
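The two ideas in the abstract, processing only a small selected region per step and training the non-differentiable location choice with reinforcement learning, can be illustrated with a toy sketch. It is not the authors' implementation: the function names, the hard square crop, and the isotropic Gaussian location policy with a score-function (REINFORCE-style) update are all assumptions made for illustration, and the recurrent network itself is omitted.

```python
import random


def extract_glimpse(image, center, size):
    """Crop a size x size patch centred at (row, col).

    Pixels outside the image read as 0.0, so the work done is always
    size * size lookups, regardless of how large the image is.
    """
    rows, cols = len(image), len(image[0])
    r0 = center[0] - size // 2
    c0 = center[1] - size // 2
    patch = []
    for r in range(r0, r0 + size):
        row = []
        for c in range(c0, c0 + size):
            inside = 0 <= r < rows and 0 <= c < cols
            row.append(image[r][c] if inside else 0.0)
        patch.append(row)
    return patch


def sample_location(mean, sigma, rng):
    """Sample a (row, col) location from an isotropic Gaussian policy (assumed)."""
    return (rng.gauss(mean[0], sigma), rng.gauss(mean[1], sigma))


def grad_log_prob_mean(location, mean, sigma):
    """Gradient of log N(location; mean, sigma^2 I) with respect to the mean."""
    return tuple((l - m) / sigma ** 2 for l, m in zip(location, mean))


def reinforce_step(mean, location, reward, baseline, sigma, lr):
    """One score-function update: mean += lr * (reward - baseline) * grad log pi."""
    grad = grad_log_prob_mean(location, mean, sigma)
    return tuple(m + lr * (reward - baseline) * g for m, g in zip(mean, grad))


if __name__ == "__main__":
    # Hypothetical 6 x 6 image with a single bright pixel at (4, 4).
    image = [[0.0] * 6 for _ in range(6)]
    image[4][4] = 1.0

    rng = random.Random(0)
    mean = (2.0, 2.0)
    loc = sample_location(mean, sigma=1.0, rng=rng)
    patch = extract_glimpse(image, (round(loc[0]), round(loc[1])), size=3)
    # Toy reward: 1 if the glimpse contains the bright pixel, else 0.
    reward = 1.0 if any(1.0 in row for row in patch) else 0.0
    mean = reinforce_step(mean, loc, reward, baseline=0.0, sigma=1.0, lr=0.1)
    print(mean)
```

Because the crop has a fixed size, the per-step cost depends on the glimpse size and the number of glimpses rather than on the image dimensions. The sampling step has no useful gradient with respect to the pixels, which is why the sketch falls back on a reward-weighted log-probability gradient.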

Index Terms

Recurrent models of visual attention
1. Computer systems organization
  1. Architectures
    1. Other architectures
      1. Neural networks
2. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision problems
      2. Computer vision tasks
        1. Scene understanding
  2. Machine learning
    1. Machine learning approaches
      1. Neural networks

Index terms have been assigned to the content through auto-classification.

Published In

NIPS'14: Proceedings of the 28th International Conference on Neural Information Processing Systems - Volume 2

December 2014

3697 pages

Publisher

MIT Press

Cambridge, MA, United States
