Boosting Multi-view Convolutional Neural Networks for 3D Object Recognition via View Saliency

Ma, Yanxin; Zheng, Bin; Guo, Yulan; Lei, Yinjie; Zhang, Jun

doi:10.1007/978-981-10-7389-2_20


Part of the book series: Communications in Computer and Information Science (CCIS, volume 757)

Included in the following conference series:

Chinese Conference on Image and Graphics Technologies


Abstract

2D views of objects play an important role in 3D object recognition. In this paper, we focus on 3D object recognition using 2D projective views. The discriminativeness of each view of an object is first investigated through a view saliency measure based on 2D Zernike moments. The proposed view saliency is then used to boost a multi-view convolutional neural network for 3D object recognition. The proposed method is compared with several state-of-the-art methods on the ModelNet dataset. Experimental results show that our method significantly outperforms existing multi-view based 3D object recognition methods.
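
The abstract does not give the saliency formula or the way it enters the network, so the following is only a minimal sketch of the general idea, under stated assumptions. It computes rotation-invariant Zernike moment magnitudes for each rendered view, scores a view by how far its descriptor lies from the mean descriptor over all views, and uses the normalized scores to weight per-view features before pooling. The function names (`zernike_magnitudes`, `view_weights`, `weighted_pool`), the distance-from-mean score, and the softmax weighting are all hypothetical choices made for illustration, not the authors' method.

```python
import numpy as np
from math import factorial


def radial_poly(n, m, rho):
    """Zernike radial polynomial R_nm evaluated on an array of radii."""
    m = abs(m)
    out = np.zeros_like(rho)
    for s in range((n - m) // 2 + 1):
        coeff = ((-1) ** s * factorial(n - s)
                 / (factorial(s)
                    * factorial((n + m) // 2 - s)
                    * factorial((n - m) // 2 - s)))
        out += coeff * rho ** (n - 2 * s)
    return out


def zernike_magnitudes(img, max_order=4):
    """Magnitudes |Z_nm| of a 2D grayscale image mapped onto the unit disk."""
    h, w = img.shape
    ys, xs = np.mgrid[0:h, 0:w]
    # Map pixel centers to [-1, 1] x [-1, 1]; pixels outside the disk are ignored.
    x = (2 * xs - (w - 1)) / (w - 1)
    y = (2 * ys - (h - 1)) / (h - 1)
    rho = np.sqrt(x ** 2 + y ** 2)
    theta = np.arctan2(y, x)
    mask = rho <= 1.0
    pixel_area = (2 / (w - 1)) * (2 / (h - 1))
    feats = []
    for n in range(max_order + 1):
        for m in range(n % 2, n + 1, 2):  # n - m must be even
            basis = radial_poly(n, m, rho) * np.exp(-1j * m * theta)
            z = (n + 1) / np.pi * np.sum(img[mask] * basis[mask]) * pixel_area
            feats.append(abs(z))
    return np.array(feats)


def view_weights(views, max_order=4):
    """Hypothetical saliency: distance of each view's descriptor from the mean."""
    desc = np.stack([zernike_magnitudes(v, max_order) for v in views])
    dist = np.linalg.norm(desc - desc.mean(axis=0), axis=1)
    w = np.exp(dist - dist.max())  # softmax, shifted for numerical stability
    return w / w.sum()


def weighted_pool(features, weights):
    """Saliency-weighted sum of per-view feature vectors (views x dims)."""
    return (weights[:, None] * features).sum(axis=0)
```

In a multi-view pipeline, `features` would be the per-view CNN descriptors and `weighted_pool` would stand in for a uniform view-pooling step. Using magnitudes rather than the complex moments keeps the descriptor invariant to in-plane rotation of a view.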



Acknowledgments

This work was supported by the National Natural Science Foundation of China (Nos. 61602499, 61601488, 61471371, 61403265), the National Postdoctoral Program for Innovative Talents (No. BX201600172), the Science and Technology Plan of Sichuan Province (No. 2015SZ0226), and China Postdoctoral Science Foundation.

Author information

Authors and Affiliations

College of Electronic Science and Engineering, National University of Defense Technology, Changsha, China
Yanxin Ma, Yulan Guo & Jun Zhang
Henan Information and Engineering College, Zhengzhou, China
Bin Zheng
Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
Yulan Guo
College of Electronics and Information Engineering, Sichuan University, Chengdu, China
Yinjie Lei


Corresponding author

Correspondence to Yulan Guo.

Editor information

Editors and Affiliations

Beijing Institute of Technology, Beijing, China
Yongtian Wang
Tsinghua University, Beijing, China
Shengjin Wang
Beijing Institute of Technology, Beijing, China
Yue Liu
Beijing Institute of Technology, Beijing, China
Jian Yang
School of EECS, Center for Information Science, Peking University, Beijing, China
Xiaoru Yuan
Institute of Automation, Chinese Academy of Sciences, Beijing, China
Ran He
La Trobe University, Melbourne, Victoria, Australia
Henry Been-Lirn Duh


About this paper

Cite this paper

Ma, Y., Zheng, B., Guo, Y., Lei, Y., Zhang, J. (2018). Boosting Multi-view Convolutional Neural Networks for 3D Object Recognition via View Saliency. In: Wang, Y., et al. Advances in Image and Graphics Technologies. IGTA 2017. Communications in Computer and Information Science, vol 757. Springer, Singapore. https://doi.org/10.1007/978-981-10-7389-2_20


DOI: https://doi.org/10.1007/978-981-10-7389-2_20
Published: 26 November 2017
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-7388-5
Online ISBN: 978-981-10-7389-2
eBook Packages: Computer Science, Computer Science (R0)
