
Perceptual modeling in the problem of active object recognition in visual scenes

Published: 01 August 2016
    Abstract

    Incorporating models of human perception into scene interpretation and object recognition is a strong trend in computer vision. In this paper we model visual perception through automatic visual saliency maps for object recognition. Visual saliency provides an efficient way to drive scene analysis towards areas considered of interest to a viewer, and a computationally cheaper alternative to intensive sliding-window methods for object recognition. Using saliency maps, we consider biologically inspired, independent pathways of central and peripheral vision and apply them to fundamental steps of the Bag-of-Words (BoW) paradigm: feature sampling, encoding and pooling. Our proposal has been evaluated on the challenging task of active object recognition; the results show that our method not only improves on the baselines but also achieves state-of-the-art performance on several datasets at very competitive computational times.

    Highlights

    • A perceptual model that incorporates visual attention into active object recognition.
    • Modeling of foveal and peripheral pathways in the retina.
    • Saliency-based non-uniform feature sampling in a variable-resolution space.
    • Saliency-sensitive coding of features.
    • Saliency-based pooling of features.
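    A minimal Python sketch of two of the saliency-driven BoW steps described above, non-uniform feature sampling and saliency-weighted pooling, may help make the idea concrete. It is an illustration under stated assumptions, not the authors' implementation: the patch descriptor, the random codebook and all function names are hypothetical stand-ins for the SURF/SIFT features and learned vocabularies used in the paper.

    # Illustrative sketch (not the paper's implementation): keypoints are drawn
    # with probability proportional to saliency, and the BoW histogram is pooled
    # with saliency weights. Descriptor and codebook are toy stand-ins.
    import numpy as np

    rng = np.random.default_rng(0)

    def sample_keypoints(saliency, n_points=200):
        """Draw pixel locations with probability proportional to saliency."""
        probs = saliency.ravel() / saliency.sum()
        idx = rng.choice(saliency.size, size=n_points, p=probs)
        rows, cols = np.unravel_index(idx, saliency.shape)
        return np.stack([rows, cols], axis=1)

    def toy_descriptors(image, keypoints, half=8):
        """Stand-in for SURF/SIFT: flattened local patches around each keypoint."""
        padded = np.pad(image, half, mode="edge")
        return np.asarray([padded[r:r + 2 * half, c:c + 2 * half].ravel()
                           for r, c in keypoints], dtype=np.float32)

    def saliency_weighted_bow(descs, keypoints, saliency, codebook):
        """Hard-assign each descriptor to its nearest visual word, then pool
        the word counts weighted by the saliency at the keypoint location."""
        d2 = ((descs[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
        words = d2.argmin(axis=1)
        weights = saliency[keypoints[:, 0], keypoints[:, 1]]
        hist = np.zeros(len(codebook))
        np.add.at(hist, words, weights)        # saliency-weighted pooling
        return hist / (hist.sum() + 1e-12)     # L1 normalisation

    # Toy usage: a random image with a centred Gaussian saliency map.
    image = rng.random((64, 64)).astype(np.float32)
    yx = np.indices((64, 64))
    saliency = np.exp(-((yx - 32) ** 2).sum(0) / (2 * 12.0 ** 2))
    kps = sample_keypoints(saliency)
    descs = toy_descriptors(image, kps)
    codebook = rng.random((32, descs.shape[1])).astype(np.float32)
    print(saliency_weighted_bow(descs, kps, saliency, codebook))

    In the actual system, real descriptors (e.g. SURF or SIFT) and a learned codebook would replace the toy components, and, per the highlights, the saliency map would also modulate the coding step and the resolution at which features are extracted.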





        Information

        Published In

        Pattern Recognition, Volume 56, Issue C
        August 2016
        184 pages

        Publisher

        Elsevier Science Inc.

        United States

        Publication History

        Published: 01 August 2016

        Author Tags

        1. Active object recognition
        2. Foveal and peripheral pathways
        3. Perceptual modeling
        4. Visual saliency

        Qualifiers

        • Research-article


        Cited By

        • (2022) Visual vs internal attention mechanisms in deep neural networks for image classification and object detection. Pattern Recognition, 123:C. DOI: 10.1016/j.patcog.2021.108411. Online publication date: 1-Mar-2022.
        • (2020) Rapid Autonomous Semantic Mapping. 2020 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pp. 6156-6163. DOI: 10.1109/IROS45743.2020.9341564. Online publication date: 24-Oct-2020.
        • (2019) Saliency-based selection of visual content for deep convolutional neural networks. Multimedia Tools and Applications, 78:8, pp. 9553-9576. DOI: 10.1007/s11042-018-6515-2. Online publication date: 1-Apr-2019.
        • (2018) First-Person Daily Activity Recognition With Manipulated Object Proposals and Non-Linear Feature Fusion. IEEE Transactions on Circuits and Systems for Video Technology, 28:10, pp. 2946-2955. DOI: 10.1109/TCSVT.2017.2716819. Online publication date: 1-Oct-2018.
        • (2017) Connoisseur. Proceedings of the 15th International Workshop on Content-Based Multimedia Indexing, pp. 1-7. DOI: 10.1145/3095713.3095730. Online publication date: 19-Jun-2017.
        • (2017) Individual trait oriented scanpath prediction for visual attention analysis. 2017 IEEE International Conference on Image Processing (ICIP), pp. 3745-3749. DOI: 10.1109/ICIP.2017.8296982. Online publication date: 17-Sep-2017.
        • (2017) TextProposals. Pattern Recognition, 70:C, pp. 60-74. DOI: 10.1016/j.patcog.2017.04.027. Online publication date: 1-Oct-2017.
