Abstract
Fast and accurate detection of 3D shapes is a fundamental task for robotic systems in intelligent tracking and automatic control. View-based 3D shape recognition has attracted increasing attention because human perception of 3D objects relies mainly on multiple 2D observations from different viewpoints. However, most existing multi-view cognitive computation methods perform straightforward pairwise comparisons among the projected images, followed by a weak aggregation mechanism, which leads to heavy computational cost and low recognition accuracy. To address these problems, a novel network structure named MCEA is proposed, which combines multi-view convolutional neural networks (M-CNNs), an extreme learning machine auto-encoder (ELM-AE), and an ELM classifier for comprehensive feature learning, effective feature aggregation, and efficient classification of 3D shapes. The framework couples the advantages of the deep CNN architecture with the robust ELM-AE feature representation and the fast ELM classifier for 3D model recognition. In contrast to existing set-to-set image comparison methods, the proposed shape-to-shape matching strategy converts each highly informative 3D model into a single compact feature descriptor via cognitive computation. As a result, the proposed method runs much faster and strikes a good balance between classification accuracy and computational efficiency. Experimental results on the benchmark Princeton ModelNet, ShapeNet Core 55, and PSB datasets show that the proposed framework achieves higher classification and retrieval accuracy in much shorter time than state-of-the-art methods.
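The pipeline summarized in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: per-view CNN features are stubbed with random arrays, the view-aggregation step is shown as simple max-pooling, and the hidden sizes and ridge regularization parameters are illustrative assumptions. It only shows the MCEA-style flow of multi-view features into an ELM auto-encoder descriptor and an ELM classifier.

```python
import numpy as np

def elm_ae(X, n_hidden, reg=1e-3, seed=0):
    """ELM auto-encoder sketch: random orthogonal hidden mapping, closed-form
    output weights that reconstruct X; the transpose of those weights is then
    reused as a compact feature projection."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = np.linalg.qr(rng.standard_normal((d, n_hidden)))[0]   # orthogonal random input weights
    b = rng.standard_normal(n_hidden)
    H = np.tanh(X @ W + b)                                     # random hidden activations
    beta = np.linalg.solve(H.T @ H + reg * np.eye(n_hidden), H.T @ X)  # ridge solution
    return lambda Z: Z @ beta.T                                # projection to the compact space

def elm_classifier(X, y, n_hidden=512, reg=1e-3, seed=1):
    """Single-hidden-layer ELM classifier trained by ridge regression."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_hidden))
    b = rng.standard_normal(n_hidden)
    H = np.tanh(X @ W + b)
    T = np.eye(int(y.max()) + 1)[y]                            # one-hot class targets
    beta = np.linalg.solve(H.T @ H + reg * np.eye(n_hidden), H.T @ T)
    return lambda Z: np.argmax(np.tanh(Z @ W + b) @ beta, axis=1)

# Toy usage: 100 shapes, 12 rendered views each, 4096-D CNN features per view.
rng = np.random.default_rng(42)
view_feats = rng.standard_normal((100, 12, 4096))              # placeholder for per-view CNN features
shape_desc = view_feats.max(axis=1)                            # view aggregation (max-pooling here)
encode = elm_ae(shape_desc, n_hidden=256)                      # learn the compact descriptor mapping
codes = encode(shape_desc)                                     # one compact descriptor per shape
labels = rng.integers(0, 10, size=100)                         # placeholder class labels
predict = elm_classifier(codes, labels)
print("training accuracy:", (predict(codes) == labels).mean())
```

Both ELM stages are trained in closed form (a single regularized least-squares solve), which is what makes the shape-to-shape matching stage fast compared with iterative set-to-set image comparison.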
Funding
This study was supported in part by the Science and Technology Development Fund of Macao S.A.R. (FDCT) under grant FDCT/121/2016/A3 and the MoST-FDCT Joint Grant 015/2015/AMJ, and in part by the University of Macau under grant MYRG2016-00160-FST.
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Ethical Approval
This article does not contain any studies with human participants or animals performed by any of the authors.
About this article
Cite this article
Yang, ZX., Tang, L., Zhang, K. et al. Multi-View CNN Feature Aggregation with ELM Auto-Encoder for 3D Shape Recognition. Cogn Comput 10, 908–921 (2018). https://doi.org/10.1007/s12559-018-9598-1