research-article

Exploring Deep Learning for View-Based 3D Model Retrieval

Published: 17 February 2020

Abstract

In recent years, view-based 3D model retrieval has become a research focus in computer vision and machine learning. A 3D model retrieval algorithm consists of two stages, feature extraction and similarity measurement, and robust features play a decisive role in the similarity measurement. Although deep learning has achieved broad success in computer vision, only a small number of works have used deep learning features for 3D model retrieval, and, to the best of our knowledge, no benchmark exists for evaluating these features. To address this problem, we systematically evaluate the performance of deep learning features for view-based 3D model retrieval on four popular datasets (ETH, NTU60, PSB, and MVRED) using different kinds of similarity measures. Specifically, we first compare the performance of hand-crafted features and deep learning features, then assess the robustness of deep learning features, and finally evaluate the difference between single-view and multi-view deep learning features. Quantitative analysis across the datasets shows that the deep learning features consistently outperform all of the evaluated hand-crafted features and are more robust than them when different degrees of noise are added to the images. Moreover, because multi-view deep learning architectures explore the latent relationships among different views, multi-view deep learning features outperform single-view deep learning features while keeping computational complexity low.
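The two-stage pipeline summarized above (feature extraction followed by similarity measurement) can be illustrated with a minimal sketch. It assumes each 3D model is already represented by a list of per-view feature vectors, for example CNN activations computed on rendered views; the feature extractor itself is not shown, and the toy two-dimensional vectors are placeholders. Two illustrative similarity strategies are contrasted: matching the sets of single-view features with the modified Hausdorff distance of Dubuisson and Jain, and comparing one descriptor per model obtained by MVCNN-style element-wise max pooling over views. These are assumed choices for illustration, not necessarily the similarity measures or networks evaluated in the paper, and all function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence

# A view feature is a fixed-length vector, e.g. a CNN activation for one rendered view.
Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean (L2) distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def modified_hausdorff(set_a: List[Vector], set_b: List[Vector]) -> float:
    """Modified Hausdorff distance (Dubuisson and Jain) between two sets of view features.

    Each directed term averages, over the views of one model, the distance to the
    closest view of the other model; the final distance is the larger of the two.
    """
    d_ab = sum(min(euclidean(a, b) for b in set_b) for a in set_a) / len(set_a)
    d_ba = sum(min(euclidean(b, a) for a in set_a) for b in set_b) / len(set_b)
    return max(d_ab, d_ba)


def max_view_pool(views: List[Vector]) -> List[float]:
    """Element-wise max over all views, giving one descriptor per model (MVCNN-style pooling)."""
    return [max(column) for column in zip(*views)]


def rank_gallery(query_views: List[Vector],
                 gallery: Dict[str, List[Vector]],
                 mode: str = "set") -> List[str]:
    """Return gallery model names sorted from most to least similar to the query.

    mode="set":    compare the per-view feature sets with the modified Hausdorff distance.
    mode="pooled": compare max-pooled multi-view descriptors with the Euclidean distance.
    """
    if mode == "set":
        scores = {name: modified_hausdorff(query_views, views)
                  for name, views in gallery.items()}
    elif mode == "pooled":
        query_descriptor = max_view_pool(query_views)
        scores = {name: euclidean(query_descriptor, max_view_pool(views))
                  for name, views in gallery.items()}
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    # Smaller distance means more similar, so sort ascending.
    return sorted(scores, key=scores.get)


if __name__ == "__main__":
    # Toy 2-D "features" standing in for real CNN view descriptors (hypothetical data).
    query = [[1.0, 0.0], [0.9, 0.1]]
    gallery = {
        "chair": [[1.0, 0.0], [0.8, 0.2]],
        "plane": [[0.0, 1.0], [0.1, 0.9]],
    }
    print(rank_gallery(query, gallery, mode="set"))     # ['chair', 'plane']
    print(rank_gallery(query, gallery, mode="pooled"))  # ['chair', 'plane']
```

Swapping in another set-to-set measure (for example, the mean of all pairwise view distances) only requires replacing `modified_hausdorff`, while the `pooled` branch mirrors how a multi-view network condenses all views into a single descriptor before matching.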

Cited By

  • (2024) Shell stand: Stable thin shell models for 3D fabrication. Computational Visual Media. DOI: 10.1007/s41095-024-0402-8. Online publication date: 24-Jun-2024.
  • (2023) Self-supervised Multi-view Learning via Auto-encoding 3D Transformations. ACM Transactions on Multimedia Computing, Communications, and Applications 20, 1 (1-23). DOI: 10.1145/3597613. Online publication date: 18-Sep-2023.
  • (2023) Context-Aware 3D Points of Interest Detection via Spatial Attention Mechanism. ACM Transactions on Multimedia Computing, Communications, and Applications 19, 6 (1-19). DOI: 10.1145/3597026. Online publication date: 12-Jul-2023.

Information

Published In

ACM Transactions on Multimedia Computing, Communications, and Applications, Volume 16, Issue 1
February 2020
363 pages
ISSN: 1551-6857
EISSN: 1551-6865
DOI: 10.1145/3384216
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 17 February 2020
Accepted: 01 January 2020
Revised: 01 November 2019
Received: 01 August 2019
Published in TOMM Volume 16, Issue 1

Author Tags

  1. 3D model retrieval
  2. benchmark
  3. deep learning features
  4. handcrafted features

Qualifiers

  • Research-article
  • Research
  • Refereed

Funding Sources

  • National Natural Science Foundation of China
  • National Key R&D Program of China
  • Jinan's innovation team

Bibliometrics

Article Metrics

  • Downloads (last 12 months): 157
  • Downloads (last 6 weeks): 11
Reflects downloads up to 01 Sep 2024
