Using the forest to see the trees: exploiting context for visual object detection and localization

Authors:

K. P. Murphy and W. T. Freeman

Communications of the ACM, Volume 53, Issue 3

Pages 107 - 114

https://doi.org/10.1145/1666420.1666446

Published: 01 March 2010

Abstract

Recognizing objects in images is an active area of research in computer vision. In the last two decades, there has been much progress and there are already object recognition systems operating in commercial products. However, most of the algorithms for detecting objects perform an exhaustive search across all locations and scales in the image comparing local image regions with an object model. That approach ignores the semantic structure of scenes and tries to solve the recognition problem by brute force. In the real world, objects tend to covary with other objects, providing a rich collection of contextual associations. These contextual associations can be used to reduce the search space by looking only in places in which the object is expected to be; this also increases performance, by rejecting patterns that look like the target but appear in unlikely places.

Most modeling attempts so far have defined the context of an object in terms of other previously recognized objects. The drawback of this approach is that inferring the context becomes as difficult as detecting each object. An alternative view of context relies on using the entire scene information holistically. This approach is algorithmically attractive since it dispenses with the need for a prior step of individual object recognition. In this paper, we use a probabilistic framework for encoding the relationships between context and object properties and we show how an integrated system provides improved performance. We view this as a significant step toward general purpose machine vision systems.
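
The following is a minimal sketch of the general idea described in the abstract, not the paper's actual model. It assumes a local detector has already produced a score for a few candidate windows and that some holistic scene analysis has produced a prior over image rows; the function names, the row-based prior, the threshold, and all numbers are hypothetical. The prior is used twice: to skip unlikely locations (reducing the search space) and to down-weight detections in implausible places.

```python
# Illustrative sketch only: every name and number here is an assumption,
# not something taken from the paper.

def combine_with_context(local_scores, row_prior, prior_threshold=0.05):
    """Rescore candidate windows using a context prior over image rows.

    local_scores: dict mapping (row, col) -> local detector score in [0, 1]
    row_prior: list where row_prior[r] is the assumed probability that the
        object appears in row r, given holistic scene features
    prior_threshold: rows whose prior falls below this value are skipped,
        which is the "reduce the search space" step
    """
    rescored = {}
    for (row, col), p_local in local_scores.items():
        p_context = row_prior[row]
        if p_context < prior_threshold:
            continue  # unlikely location: dropped before any rescoring
        # Unnormalized combined score: local evidence times context prior.
        rescored[(row, col)] = p_local * p_context
    return rescored


def best_window(rescored):
    """Return the location with the highest combined score, or None."""
    if not rescored:
        return None
    return max(rescored, key=rescored.get)


if __name__ == "__main__":
    # Toy 4-row image: a car-like pattern near the top (row 0) scores highest
    # locally, but the assumed scene prior places cars in the lower rows.
    local_scores = {(0, 1): 0.9, (2, 3): 0.7, (3, 0): 0.4}
    row_prior = [0.01, 0.09, 0.5, 0.4]
    rescored = combine_with_context(local_scores, row_prior)
    print(best_window(rescored))  # prints (2, 3)
```

In this toy run the window at row 0 has the strongest local score but is rejected because the prior for that row is below the threshold, so the selected window is the one at row 2.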

Published In

Communications of the ACM Volume 53, Issue 3

March 2010

152 pages

ISSN:0001-0782

EISSN:1557-7317

DOI:10.1145/1666420

Copyright © 2010 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States
