Saliency-based selection of visual content for deep convolutional neural networks

Authors: A. Montoya Obeso, J. Benois-Pineau, M. S. Vázquez, A. A. Acosta

Multimedia Tools and Applications, Volume 78, Issue 8

Pages 9553 - 9576

https://doi.org/10.1007/s11042-018-6515-2

Published: 01 April 2019

Abstract

The automatic description of digital multimedia content was mainly developed for classification tasks, retrieval systems and the massive ordering of data. Preservation of cultural heritage is an important field of application for these methods. We address a classification problem in cultural heritage: the classification of architectural styles in digital photographs of Mexican cultural heritage. In general, selecting the relevant content of a scene for training classification models makes the models more efficient in terms of accuracy and training time. Here we use a saliency-driven approach to predict visual attention in images and use the predictions to train a Deep Convolutional Neural Network. We also analyze the behavior of models trained with state-of-the-art image cropping and with saliency maps. Training models that are invariant to rotations requires data augmentation of the training set. This raises the problem of how to fill the empty regions when normalizing the crops, so we study different padding techniques and find an optimal solution. The results are compared with the state of the art in terms of accuracy and training time. Furthermore, we study saliency-based cropping in training and its generalization to another classical task: weak labeling of massive collections of images containing objects of interest. For this task, the experiments are conducted on a large subset of the ImageNet database. This work extends preliminary research with the study of image padding methods and generalization to a large-scale generic database.


Cited By

Yebda T, Benois-Pineau J, Pech M, Amièva H (2020) Detection of semantic risk situations in lifelog data for improving life of frail people. In: Gurrin C, Þór Jónsson B, Kando N, Schoeffmann K, Chen P, O'Connor N (eds) Proceedings of the 2020 International Conference on Multimedia Retrieval, pp 402-406. doi:10.1145/3372278.3391931 (online publication date: 8 June 2020)
https://dl.acm.org/doi/10.1145/3372278.3391931

Index Terms

1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
  2. Machine learning
    1. Learning paradigms
      1. Supervised learning
        1. Supervised learning by classification
    2. Machine learning approaches
      1. Neural networks

Index terms have been assigned to the content through auto-classification.



Published In

Multimedia Tools and Applications, Volume 78, Issue 8, Apr 2019, 1542 pages

ISSN: 1380-7501

Copyright © 2019 Springer Science+Business Media, LLC, part of Springer Nature.

Publisher

Kluwer Academic Publishers

United States


Qualifiers

Article


Bibliometrics & Citations

Total citations: 1

Total downloads: 0 (last 12 months: 0; last 6 weeks: 0; reflects downloads up to 15 Oct 2024)
