research-article

Exploring Deep Learning for View-Based 3D Model Retrieval

Authors:

Shaohua Wan

ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM), Volume 16, Issue 1

Article No.: 18, Pages 1 - 21

https://doi.org/10.1145/3377876

Published: 17 February 2020

Abstract

In recent years, view-based 3D model retrieval has become a research focus in computer vision and machine learning. A 3D model retrieval algorithm consists of feature extraction and similarity measurement, and robust features play a decisive role in similarity measurement. Although deep learning has achieved broad success in computer vision, only a small number of works use deep learning features for 3D model retrieval, and, to the best of our knowledge, there is no benchmark for evaluating these features. To tackle this problem, in this work we systematically evaluate the performance of deep learning features for view-based 3D model retrieval on four popular datasets (ETH, NTU60, PSB, and MVRED) using different similarity measures. Specifically, we compare the performance of hand-crafted features and deep learning features, assess the robustness of deep learning features, and evaluate the difference between single-view and multi-view deep learning features. Quantitative analysis across the datasets shows that these deep learning features consistently outperform all of the hand-crafted features and are more robust than the hand-crafted features when different levels of noise are added to the images. Moreover, multi-view deep learning network architectures, which exploit latent relationships among different views, outperform single-view deep learning features with low computational complexity.
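The abstract splits retrieval into feature extraction and similarity measurement, and contrasts single-view with multi-view deep features. The sketch below illustrates that split under stated assumptions; it is not the paper's method. It assumes per-view feature vectors have already been extracted by some CNN (not shown). In the "single" mode a model is kept as a set of per-view vectors and two models are compared with the modified Hausdorff distance, one of several possible set-to-set measures. In the "multi" mode the views are fused into one descriptor by element-wise max pooling (an MVCNN-style choice assumed here) and compared by cosine distance. The function names (`modified_hausdorff`, `view_pool`, `rank_gallery`, `toy_views`) and the toy data are hypothetical.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]
ViewSet = List[Vector]  # one feature vector per rendered 2D view of a 3D model


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def modified_hausdorff(views_a: ViewSet, views_b: ViewSet) -> float:
    """Single-view features: compare two models as sets of per-view vectors.

    directed(A, B) is the mean, over views in A, of the distance to the
    nearest view in B; the modified Hausdorff distance keeps the larger
    of the two directed values, so the result is symmetric.
    """
    def directed(src: ViewSet, dst: ViewSet) -> float:
        return sum(min(euclidean(s, t) for t in dst) for s in src) / len(src)

    return max(directed(views_a, views_b), directed(views_b, views_a))


def view_pool(views: ViewSet) -> Vector:
    """Multi-view features: fuse all views into one descriptor (element-wise max)."""
    return [max(column) for column in zip(*views)]


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def rank_gallery(query: ViewSet, gallery: List[ViewSet], mode: str = "multi") -> List[int]:
    """Return gallery indices ordered from most to least similar to the query."""
    if mode == "multi":
        pooled_query = view_pool(query)
        dists = [cosine_distance(pooled_query, view_pool(g)) for g in gallery]
    elif mode == "single":
        dists = [modified_hausdorff(query, g) for g in gallery]
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    return sorted(range(len(gallery)), key=lambda i: dists[i])


# Toy usage with made-up features (a real pipeline would take them from a CNN).
rng = random.Random(0)
DIM, N_VIEWS = 8, 12


def toy_views(model_id: int) -> ViewSet:
    """Hypothetical views of model `model_id`: noisy copies of 5 * e_{model_id}."""
    return [[(5.0 if k == model_id else 0.0) + rng.gauss(0.0, 0.1) for k in range(DIM)]
            for _ in range(N_VIEWS)]


gallery = [toy_views(i) for i in range(3)]
query = toy_views(1)  # a fresh set of views of model 1
print(rank_gallery(query, gallery, "multi"), rank_gallery(query, gallery, "single"))
```

In both modes the query's own model should rank first. Max pooling is only one fusion choice; average pooling, for example, would slot into `view_pool` without changing the rest of the pipeline, and the multi-view mode needs a single vector comparison per gallery model instead of an all-pairs comparison over views.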


Index Terms

Exploring Deep Learning for View-Based 3D Model Retrieval
1. Information systems
  1. Information retrieval
    1. Evaluation of retrieval results
      1. Presentation of retrieval results
    2. Retrieval tasks and goals
      1. Information extraction
2. Networks
  1. Network performance evaluation
    1. Network performance analysis


Information & Contributors

Information

Published In


ACM Transactions on Multimedia Computing, Communications, and Applications Volume 16, Issue 1

February 2020

363 pages

ISSN:1551-6857

EISSN:1551-6865

DOI:10.1145/3384216

Editor:
Alberto Del Bimbo
University of Firenze, Italy


Copyright © 2020 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 17 February 2020

Accepted: 01 January 2020

Revised: 01 November 2019

Received: 01 August 2019

Published in TOMM Volume 16, Issue 1


Qualifiers

Research-article
Research
Refereed

Funding Sources

National Natural Science Foundation of China
National Key R&D Program of China
Jinan's innovation team

Article Metrics

Total Citations: 86

Total Downloads: 1,082

Downloads (Last 12 months): 157

Downloads (Last 6 weeks): 11

Reflects downloads up to 01 Sep 2024
