Article

Learning deep features for scene recognition using places database

Authors:

Agata Lapedriza,

Jianxiong Xiao,

Antonio Torralba,

Aude Oliva

NIPS'14: Proceedings of the 27th International Conference on Neural Information Processing Systems - Volume 1

Pages 487 - 495

Published: 08 December 2014

Abstract

Scene recognition is one of the hallmark tasks of computer vision, allowing definition of a context for object recognition. Whereas the tremendous recent progress in object recognition tasks is due to the availability of large datasets like ImageNet and the rise of Convolutional Neural Networks (CNNs) for learning high-level features, performance at scene recognition has not attained the same level of success. This may be because current deep features trained from ImageNet are not competitive enough for such tasks. Here, we introduce a new scene-centric database called Places with over 7 million labeled pictures of scenes. We propose new methods to compare the density and diversity of image datasets and show that Places is as dense as other scene datasets and has more diversity. Using CNN, we learn deep features for scene recognition tasks, and establish new state-of-the-art results on several scene-centric datasets. A visualization of the CNN layers' responses allows us to show differences in the internal representations of object-centric and scene-centric networks.

Published In

NIPS'14: Proceedings of the 27th International Conference on Neural Information Processing Systems - Volume 1

December 2014

3697 pages

Publisher

MIT Press

Cambridge, MA, United States
