DOI: 10.5555/2968826.2968881

Learning deep features for scene recognition using places database

Published: 08 December 2014

Abstract

Scene recognition is one of the hallmark tasks of computer vision, allowing definition of a context for object recognition. Whereas the tremendous recent progress in object recognition tasks is due to the availability of large datasets like ImageNet and the rise of Convolutional Neural Networks (CNNs) for learning high-level features, performance at scene recognition has not attained the same level of success. This may be because current deep features trained from ImageNet are not competitive enough for such tasks. Here, we introduce a new scene-centric database called Places with over 7 million labeled pictures of scenes. We propose new methods to compare the density and diversity of image datasets and show that Places is as dense as other scene datasets and has more diversity. Using a CNN, we learn deep features for scene recognition tasks, and establish new state-of-the-art results on several scene-centric datasets. A visualization of the CNN layers' responses allows us to show differences in the internal representations of object-centric and scene-centric networks.


    Published In

    NIPS'14: Proceedings of the 27th International Conference on Neural Information Processing Systems - Volume 1
    December 2014
    3697 pages

    Publisher

    MIT Press

    Cambridge, MA, United States

    Qualifiers

    • Article

    Cited By

    • (2024) Bridging the Domain Gap between Synthetic and Real-World Data for Autonomous Driving. ACM Journal on Autonomous Transportation Systems 1:2 (1-15). DOI: 10.1145/3633463. Online publication date: 8-Apr-2024
    • (2023) Image Scenario classification using Machine learning. Proceedings of the 2023 6th International Conference on Computational Intelligence and Intelligent Systems (130-138). DOI: 10.1145/3638209.3638229. Online publication date: 25-Nov-2023
    • (2023) Deep Long-Tailed Learning: A Survey. IEEE Transactions on Pattern Analysis and Machine Intelligence 45:9 (10795-10816). DOI: 10.1109/TPAMI.2023.3268118. Online publication date: 1-Sep-2023
    • (2022) Image-text interaction graph neural network for image-text sentiment analysis. Applied Intelligence 52:10 (11184-11198). DOI: 10.1007/s10489-021-02936-9. Online publication date: 1-Aug-2022
    • (2021) HDR-cGAN. Proceedings of the Twelfth Indian Conference on Computer Vision, Graphics and Image Processing (1-9). DOI: 10.1145/3490035.3490275. Online publication date: 19-Dec-2021
    • (2021) Focusing on Persons. Proceedings of the 29th ACM International Conference on Multimedia (1176-1184). DOI: 10.1145/3474085.3481544. Online publication date: 17-Oct-2021
    • (2021) Learning Multi-context Aware Location Representations from Large-scale Geotagged Images. Proceedings of the 29th ACM International Conference on Multimedia (899-907). DOI: 10.1145/3474085.3475268. Online publication date: 17-Oct-2021
    • (2021) Tailored Reality: Perception-aware Scene Restructuring for Adaptive VR Navigation. ACM Transactions on Graphics 40:5 (1-15). DOI: 10.1145/3470847. Online publication date: 8-Oct-2021
    • (2021) Is the Reign of Interactive Search Eternal? Findings from the Video Browser Showdown 2020. ACM Transactions on Multimedia Computing, Communications, and Applications 17:3 (1-26). DOI: 10.1145/3445031. Online publication date: 22-Jul-2021
    • (2021) Visual Semantic-Based Representation Learning Using Deep CNNs for Scene Recognition. ACM Transactions on Multimedia Computing, Communications, and Applications 17:2 (1-24). DOI: 10.1145/3436494. Online publication date: 11-May-2021
