research-article

Scene understanding by labeling pixels

Authors: Stephen Gould, Xuming He

Communications of the ACM, Volume 57, Issue 11

Pages 68 - 77

https://doi.org/10.1145/2629637

Published: 27 October 2014

Abstract

Pixels labeled with a scene's semantics and geometry let computers describe what they see.

Published In

Communications of the ACM, Volume 57, Issue 11 (November 2014), 95 pages

ISSN: 0001-0782

EISSN: 1557-7317

Issue DOI: 10.1145/2684442

Editor: Moshe Y. Vardi

Association for Computing Machinery, New York, NY

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Popular
Refereed

Bibliometrics

Total citations: 17

Total downloads: 34,513 (139 in the last 12 months; 26 in the last 6 weeks)

Download figures reflect data up to 26 Sep 2024.