Using the forest to see the trees: exploiting context for visual object detection and localization

Published: 01 March 2010

    Abstract

    Recognizing objects in images is an active area of research in computer vision. In the last two decades, there has been much progress and there are already object recognition systems operating in commercial products. However, most of the algorithms for detecting objects perform an exhaustive search across all locations and scales in the image comparing local image regions with an object model. That approach ignores the semantic structure of scenes and tries to solve the recognition problem by brute force. In the real world, objects tend to covary with other objects, providing a rich collection of contextual associations. These contextual associations can be used to reduce the search space by looking only in places in which the object is expected to be; this also increases performance, by rejecting patterns that look like the target but appear in unlikely places.
    Most modeling attempts so far have defined the context of an object in terms of other previously recognized objects. The drawback of this approach is that inferring the context becomes as difficult as detecting each object. An alternative view of context relies on using the entire scene information holistically. This approach is algorithmically attractive since it dispenses with the need for a prior step of individual object recognition. In this paper, we use a probabilistic framework for encoding the relationships between context and object properties and we show how an integrated system provides improved performance. We view this as a significant step toward general purpose machine vision systems.
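
    The abstract's central idea, that scene context can both shrink the search space and reject look-alike patterns in unlikely places, can be illustrated with a small sketch. This is not the paper's implementation: the Gaussian row prior, the function names `context_prior` and `detect_with_context`, the `min_prior` threshold, and the toy score grid are all assumptions made for illustration. The sketch only shows the general pattern of multiplying a local detector score by a context-derived location prior and skipping locations the prior considers implausible.

```python
import math

def context_prior(rows, expected_row, spread):
    """Hypothetical context prior: a Gaussian over image rows, normalized to sum to 1."""
    weights = [math.exp(-0.5 * ((r - expected_row) / spread) ** 2) for r in range(rows)]
    total = sum(weights)
    return [w / total for w in weights]

def detect_with_context(local_scores, prior, min_prior=0.02):
    """Rank grid cells by local score times a row-wise context prior.

    local_scores[r][c] stands in for the output of a local object detector.
    Rows whose prior is below min_prior are never evaluated (the search-space
    reduction); the remaining cells are ranked by score * prior.
    Returns ((combined_score, row, col), number_of_cells_evaluated).
    """
    best = None
    evaluated = 0
    for r, row in enumerate(local_scores):
        if prior[r] < min_prior:
            continue  # context says the object is very unlikely here
        for c, score in enumerate(row):
            evaluated += 1
            combined = score * prior[r]
            if best is None or combined > best[0]:
                best = (combined, r, c)
    return best, evaluated

# Toy 5x4 grid of local detector scores (assumed values).
scores = [[0.1] * 4 for _ in range(5)]
scores[0][1] = 0.9  # distractor: looks like the target but sits in an unlikely row
scores[3][2] = 0.8  # target: slightly weaker local evidence, plausible row

# Assume the scene context predicts the object near row 3.
prior = context_prior(rows=5, expected_row=3, spread=1.0)
best, evaluated = detect_with_context(scores, prior)

# Exhaustive search on local scores alone, for comparison.
brute_force = max((s, r, c) for r, row in enumerate(scores) for c, s in enumerate(row))
```

    In this toy setup the exhaustive search picks the distractor at row 0, while the context-weighted search skips row 0 entirely, evaluates 16 of the 20 cells, and selects the cell at row 3, column 2.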

    Published In

    Communications of the ACM, Volume 53, Issue 3
    March 2010
    152 pages
    ISSN: 0001-0782
    EISSN: 1557-7317
    DOI: 10.1145/1666420
    Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

    Publisher

    Association for Computing Machinery

    New York, NY, United States

    Qualifiers

    • Research-article
    • Popular
    • Refereed
