From RGB-D Images to RGB Images: Single Labeling for Mining Visual Models

Authors:

Ryosuke Shibasaki

ACM Transactions on Intelligent Systems and Technology (TIST), Volume 6, Issue 2

Article No.: 16, Pages 1 - 29

https://doi.org/10.1145/2629701

Published: 31 March 2015

Abstract

Mining object-level knowledge, that is, building a comprehensive category model base, from a large set of cluttered scenes presents a considerable challenge to the field of artificial intelligence. How to initiate model learning with the least human supervision (i.e., manual labeling) and how to encode the structural knowledge are two elements of this challenge, as they largely determine the scalability and applicability of any solution. In this article, we propose a model-learning method that starts from a single-labeled object for each category, and mines further model knowledge from a number of informally captured, cluttered scenes. However, in these scenes, target objects are relatively small and have large variations in texture, scale, and rotation. Thus, to reduce the model bias normally associated with less supervised learning methods, we use the robust 3D shape in RGB-D images to guide our model learning, then apply the properly trained category models to both object detection and recognition in more conventional RGB images. In addition to model training for their own categories, the knowledge extracted from the RGB-D images can also be transferred to guide model learning for a new category, in which only RGB images without depth information in the new category are provided for training. Preliminary testing shows that the proposed method performs as well as fully supervised learning methods.

Index Terms

From RGB-D Images to RGB Images: Single Labeling for Mining Visual Models
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision problems
      2. Computer vision tasks
        Scene understanding
2. Information systems
  1. Information systems applications

Information

Published In

ACM Transactions on Intelligent Systems and Technology Volume 6, Issue 2

Special Section on Visual Understanding with RGB-D Sensors

May 2015

381 pages

ISSN:2157-6904

EISSN:2157-6912

DOI:10.1145/2753829

Editor:
Huan Liu
Arizona State University

Copyright © 2015 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected]

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 31 March 2015

Accepted: 01 May 2014

Revised: 01 April 2014

Received: 01 February 2014

Published in TIST Volume 6, Issue 2

Qualifiers

Research-article
Research
Refereed

Funding Sources

Grant-in-Aid for Young Scientists (23700192) of Japan's Ministry of Education, Culture, Sports, Science, and Technology (MEXT)
Japan's Ministry of Land, Infrastructure, Transport and Tourism (MLIT)
Microsoft Research

Cited By

Sager C, Janiesch C, Zschech P (2021) A survey of image labelling for computer vision applications. Journal of Business Analytics, pp. 1-20. Online publication date: 18-Apr-2021
https://doi.org/10.1080/2573234X.2021.1908861
Isaza C, Anaya K, Fuentes-Silva C, Paz J, Rizzo A, Garcia-Moreno A (2019) Dynamic set point model for driver alert state using digital image processing. Multimedia Tools and Applications 78:14, pp. 19543-19563. Online publication date: 2-Aug-2019
https://dl.acm.org/doi/10.1007/s11042-019-7218-z
Yan Y, Tao Y, Xu J, Ren S, Lin H (2018) Visual analytics of bike-sharing data based on tensor factorization. Journal of Visualization 21:3, pp. 495-509. Online publication date: 1-Jun-2018
https://dl.acm.org/doi/10.1007/s12650-017-0463-1
Li T, Zeng C, Zhou W, Xue W, Huang Y, Liu Z, Zhou Q, Xia B, Wang Q, Wang W, Zhu X (2017) FIU-Miner (a fast, integrated, and user-friendly system for data mining) and its applications. Knowledge and Information Systems 52:2, pp. 411-443. Online publication date: 1-Aug-2017
https://dl.acm.org/doi/10.1007/s10115-016-1014-0
Zhang Q, Song X, Shao X, Zhao H, Shibasaki R (2016) Object Discovery: Soft Attributed Graph Mining. IEEE Transactions on Pattern Analysis and Machine Intelligence 38:3, pp. 532-545. Online publication date: 1-Mar-2016
https://doi.org/10.1109/TPAMI.2015.2456892
Zhang Q, Wu Y, Zhu S (2015) Mining And-Or Graphs for Graph Matching and Object Discovery. Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), pp. 55-63. Online publication date: 7-Dec-2015
https://dl.acm.org/doi/10.1109/ICCV.2015.15
