From RGB-D Images to RGB Images: Single Labeling for Mining Visual Models

Published: 31 March 2015

Abstract

Mining object-level knowledge, that is, building a comprehensive category model base, from a large set of cluttered scenes presents a considerable challenge to the field of artificial intelligence. How to initiate model learning with the least human supervision (i.e., manual labeling) and how to encode the structural knowledge are two elements of this challenge, as they largely determine the scalability and applicability of any solution. In this article, we propose a model-learning method that starts from a single-labeled object for each category, and mines further model knowledge from a number of informally captured, cluttered scenes. However, in these scenes, target objects are relatively small and have large variations in texture, scale, and rotation. Thus, to reduce the model bias normally associated with less supervised learning methods, we use the robust 3D shape in RGB-D images to guide our model learning, then apply the properly trained category models to both object detection and recognition in more conventional RGB images. In addition to model training for their own categories, the knowledge extracted from the RGB-D images can also be transferred to guide model learning for a new category, in which only RGB images without depth information in the new category are provided for training. Preliminary testing shows that the proposed method performs as well as fully supervised learning methods.
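The idea in the abstract, letting depth-derived 3D shape decide which scene regions belong to the category and then keeping only appearance cues for later use on plain RGB images, can be illustrated with a toy sketch. This is not the authors' algorithm: the `Candidate` type, the fixed-length descriptors, the Euclidean nearest-match rule, the `max_shape_dist` threshold, and the mean-descriptor "model" are all assumptions made for illustration only.

```python
from dataclasses import dataclass
from math import dist
from typing import List, Sequence


@dataclass
class Candidate:
    """A candidate region in a scene (hypothetical representation)."""
    rgb: Sequence[float]    # appearance descriptor, available in any image
    shape: Sequence[float]  # 3D shape descriptor, available only with depth


def mean(vectors: List[Sequence[float]]) -> List[float]:
    """Component-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def mine_rgb_model(seed: Candidate,
                   scenes: List[List[Candidate]],
                   max_shape_dist: float) -> List[float]:
    """Build an RGB-only model from one labeled seed and unlabeled RGB-D scenes.

    In each scene, the candidate whose 3D shape is closest to the seed is
    accepted if it lies within max_shape_dist (an assumed threshold). Only the
    RGB descriptors of accepted candidates are averaged into the model.
    """
    matched_rgb = [list(seed.rgb)]
    for scene in scenes:
        best = min(scene, key=lambda c: dist(c.shape, seed.shape))
        if dist(best.shape, seed.shape) <= max_shape_dist:
            matched_rgb.append(list(best.rgb))
    return mean(matched_rgb)


def detect(rgb_model: Sequence[float],
           rgb_candidates: List[Sequence[float]]) -> int:
    """Return the index of the RGB-only candidate closest to the model."""
    return min(range(len(rgb_candidates)),
               key=lambda i: dist(rgb_candidates[i], rgb_model))
```

In this sketch, shape acts as the selection cue during mining because the abstract describes it as more robust than texture in cluttered scenes, while `detect` uses appearance alone, mirroring the move from RGB-D training data to conventional RGB test images.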

Published In

ACM Transactions on Intelligent Systems and Technology, Volume 6, Issue 2
Special Section on Visual Understanding with RGB-D Sensors
May 2015
381 pages
ISSN: 2157-6904
EISSN: 2157-6912
DOI: 10.1145/2753829
Editor: Huan Liu

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 31 March 2015
Accepted: 01 May 2014
Revised: 01 April 2014
Received: 01 February 2014
Published in TIST Volume 6, Issue 2

Author Tags

  1. Data mining
  2. RGB-D sensor
  3. big visual data
  4. computer vision
  5. transfer learning
  6. visual knowledge base
  7. visual mining

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

  • Grant-in-Aid for Young Scientists (23700192) of Japan's Ministry of Education, Culture, Sports, Science, and Technology (MEXT)
  • Japan's Ministry of Land, Infrastructure, Transport and Tourism (MLIT)
  • Microsoft Research

Cited By

  • (2021) A survey of image labelling for computer vision applications. Journal of Business Analytics, pp. 1-20. DOI: 10.1080/2573234X.2021.1908861. Online publication date: 18-Apr-2021
  • (2019) Dynamic set point model for driver alert state using digital image processing. Multimedia Tools and Applications 78:14, pp. 19543-19563. DOI: 10.1007/s11042-019-7218-z. Online publication date: 2-Aug-2019
  • (2018) Visual analytics of bike-sharing data based on tensor factorization. Journal of Visualization 21:3, pp. 495-509. DOI: 10.1007/s12650-017-0463-1. Online publication date: 1-Jun-2018
  • (2017) FIU-Miner (a fast, integrated, and user-friendly system for data mining) and its applications. Knowledge and Information Systems 52:2, pp. 411-443. DOI: 10.1007/s10115-016-1014-0. Online publication date: 1-Aug-2017
  • (2016) Object Discovery: Soft Attributed Graph Mining. IEEE Transactions on Pattern Analysis and Machine Intelligence 38:3, pp. 532-545. DOI: 10.1109/TPAMI.2015.2456892. Online publication date: 1-Mar-2016
  • (2015) Mining And-Or Graphs for Graph Matching and Object Discovery. Proceedings of the 2015 IEEE International Conference on Computer Vision (ICCV), pp. 55-63. DOI: 10.1109/ICCV.2015.15. Online publication date: 7-Dec-2015
