
Deep Learning Advances in Computer Vision with 3D Data: A Survey

Authors:

Anastasia Ioannidou,

Elisavet Chatzilari,

Spiros Nikolopoulos, and

Ioannis Kompatsiaris

ACM Computing Surveys (CSUR), Volume 50, Issue 2

Article No.: 20, Pages 1 - 38

https://doi.org/10.1145/3042064

Published: 06 April 2017

Abstract

Deep learning has recently gained popularity, achieving state-of-the-art performance in tasks involving text, sound, or image processing. Because of this performance, efforts have been made to apply it in more challenging scenarios, for example, 3D data processing. This article surveys methods that apply deep learning to 3D data and classifies them according to how they exploit that data. From the results of the examined works, we conclude that systems employing 2D views of 3D data typically surpass voxel-based (3D) deep models, which, however, can perform better with more layers and severe data augmentation. Therefore, larger-scale datasets and increased resolutions are required.

Supplementary Material

a20-ioannidou-apndx.pdf (ioannidou.zip)

Supplemental movie, appendix, image, and software files for Deep Learning Advances in Computer Vision with 3D Data: A Survey


References

[1]

M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. Goodfellow, A. Harp, G. Irving, M. Isard, R. Jozefowicz, Y. Jia, L. Kaiser, M. Kudlur, J. Levenberg, D. Man, M. Schuster, R. Monga, S. Moore, D. Murray, C. Olah, J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. Tucker, V. Vanhoucke, V. Vasudevan, F. Vigas, O. Vinyals, P. Warden, M. Wattenberg, M. Wicke, Y. Yu, and X. Zheng. 2015. TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems. (2015). http://tensorflow.org/ Software available from tensorflow.org.

[2]

R. Achanta, A. Shaji, K. Smith, A. Lucchi, P. Fua, and S. Susstrunk. 2012. SLIC superpixels compared to state-of-the-art superpixel methods. IEEE Transactions on Pattern Analysis and Machine Intelligence 34, 11 (2012), 2274--2282.

Digital Library

[3]

A. Agarwal, E. Akchurin, C. Basoglu, G. Chen, S. Cyphers, J. Droppo, A. Eversole, B. Guenter, M. Hillebrand, R. Hoens, X. Huang, Z. Huang, V. Ivanov, A. Kamenev, P. Kranen, O. Kuchaiev, W. Manousek, A. May, B. Mitra, O. Nano, G. Navarro, A. Orlov, M. Padmilac, H. Parthasarathi, B. Peng, A. Reznichenko, F. Seide, M. L. Seltzer, M. Slaney, A. Stolcke, Y. Wang, H. Wang, K. Yao, D. Yu, Y. Zhang, and G. Zweig. 2014. An Introduction to Computational Networks and the Computational Network Toolkit. Technical Report MSR-TR-2014-112. Microsoft Research.

[4]

A. K. Aijazi, P. Checchin, and L. Trassoudaine. 2013. Segmentation based classification of 3D urban point clouds: A super-voxel based approach with evaluation. Remote Sensing 5, 4 (2013), 1624--1650.

[5]

A. Aldoma, F. Tombari, L. Di Stefano, and M. Vincze. 2012a. A global hypotheses verification method for 3D object recognition. In Proceedings of the 12th European Conference on Computer Vision. 511--524.

Digital Library

[6]

A. Aldoma, F. Tombari, R. B. Rusu, and M. Vincze. 2012b. Pattern Recognition: Joint 34th DAGM and 36th OAGM Symposium. Chapter OUR-CVFH -- Oriented, Unique and Repeatable Clustered Viewpoint Feature Histogram for Object Recognition and 6DOF Pose Estimation, 113--122.

[7]

A. Aldoma, M. Vincze, N. Blodow, D. Gossow, S. Gedikli, R. B. Rusu, and G. Bradski. 2011. CAD-model recognition and 6DOF pose estimation using 3D cues. In IEEE ICCV Workshops. 585--592.

[8]

L. A. Alexandre. 2012. 3D descriptors for object and category recognition: A comparative evaluation. In Workshop on Color-Depth Camera Fusion in Robotics at the IEEE/RSJ IROS.

[9]

L. A. Alexandre. 2014. 3D Object recognition using convolutional neural networks with transfer learning between input channels. In 13th International Conference on Intelligent Autonomous Systems, Vol. 301.

[10]

S. Bahrampour, N. Ramakrishnan, L. Schott, and M. Shah. 2015. Comparative study of Caffe, Neon, Theano, and Torch for deep learning. CoRR abs/1511.06435 (2015).

[11]

S. Bai, X. Bai, Z. Zhou, Z. Zhang, and L. Jan Latecki. 2016. GIFT: A real-time and scalable 3D shape search engine. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[12]

P. Baldi and P. J. Sadowski. 2013. Understanding dropout. In Advances in Neural Information Processing Systems 26. 2814--2822.

[13]

F. Bastien, P. Lamblin, R. Pascanu, J. Bergstra, I. J. Goodfellow, A. Bergeron, N. Bouchard, D. Warde-Farley, and Y. Bengio. 2012. Theano: New features and speed improvements. Deep Learning and Unsupervised Feature Learning NIPS 2012 Workshop.

[14]

S. Bell, C. L. Zitnick, K. Bala, and R. B. Girshick. 2015. Inside-outside net: Detecting objects in context with skip pooling and recurrent neural networks. CoRR abs/1512.04143 (2015).

[15]

J. A. Benediktsson, J. A. Palmason, and J. Sveinsson. 2005. Classification of hyperspectral data from urban areas based on extended morphological profiles. IEEE TGRS 43, 3 (2005), 480--491.

[16]

Y. Bengio. 2012. Neural Networks: Tricks of the Trade: Second Edition. Chapter: Practical Recommendations for Gradient-Based Training of Deep Architectures, 437--478.

[17]

Y. Bengio, P. Lamblin, D. Popovici, and H. Larochelle. 2007. Greedy layer-wise training of deep networks. In Advances in Neural Information Processing Systems 19. 153--160.

[18]

J. Bergstra, R. Bardenet, Y. Bengio, and B. Kégl. 2011. Algorithms for hyper-parameter optimization. In 25th Annual Conference on Neural Information Processing Systems (NIPS 2011), Vol. 24.

[19]

J. Bergstra and Y. Bengio. 2012. Random search for hyper-parameter optimization. The Journal of Machine Learning Research 13 (2012), 281--305.

Digital Library

[20]

J. Bergstra, O. Breuleux, F. Bastien, P. Lamblin, R. Pascanu, G. Desjardins, J. Turian, D. Warde-Farley, and Y. Bengio. 2010. Theano: A CPU and GPU math expression compiler. In Proceedings of the Python for Scientific Computing Conference (SciPy). Oral Presentation.

[21]

P. J. Besl and N. D. McKay. 1992. A method for registration of 3-D shapes. IEEE Transactions on Pattern Analysis and Machine Intelligence 14, 2 (1992), 239--256.

Digital Library

[22]

J. M. Bioucas-Dias, A. Plaza, G. Camps-Valls, P. Scheunders, N. Nasrabadi, and J. Chanussot. 2013. Hyperspectral remote sensing data analysis and future challenges. IEEE Geoscience and Remote Sensing Magazine 1, 2 (2013), 6--36.

[23]

L. Bo, X. Ren, and D. Fox. 2013. Unsupervised feature learning for RGB-D based object recognition. In Experimental Robotics: The 13th International Symposium on Experimental Robotics. 387--402.

[24]

D. Borrmann, J. Elseberg, K. Lingemann, and A. Nüchter. 2011. The 3D Hough transform for plane detection in point clouds: A review and a new accumulator design. 3D Research 2, 2 (2011), 1--13.

[25]

F. Bosche, Y. Turkan, C. Haas, and R. Haas. 2010. Fusing 4D modeling and laser scanning for automated construction progress control. 26th ARCOM Annual Conference and Annual General Meeting (2010).

[26]

Y.-L. Boureau, J. Ponce, and Y. LeCun. 2010. A theoretical analysis of feature pooling in vision algorithms. In Proceedings of the International Conference on Machine learning (ICML’10).

[27]

A. Brock, Th. Lim, J. M. Ritchie, and N. Weston. 2016. Generative and discriminative voxel modeling with convolutional neural networks. CoRR abs/1608.04236 (2016).

[28]

M. M. Bronstein and I. Kokkinos. 2010. Scale-invariant heat kernel signatures for non-rigid shape recognition. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR’10). 1704--1711.

[29]

S. Bu, P. Han, Z. Liu, J. Han, and H. Lin. 2015. Local deep feature learning framework for 3D shape. Computers 8 Graphics 46 (2015), 117--129. Shape Modeling International 2014.

[30]

S. Bu, Z. Liu, J. Han, J. Wu, and R. Ji. 2014. Learning high-level feature by deep belief networks for 3-D model retrieval and recognition. IEEE Transactions on Multimedia 16, 8 (2014), 2154--2167.

[31]

B. Bustos, D. Keim, D. Saupe, and T. Schreck. 2007. Content-based 3D object retrieval. IEEE Computer Graphics and Applications 27, 4 (2007), 22--27.

Digital Library

[32]

W. Byeon, T. M. Breuel, F. Raue, and M. Liwicki. 2015. Scene labeling with LSTM recurrent neural networks. In IEEE Conference on Computer Vision and Pattern Recognition (CVPR’15). 3547--3555.

[33]

Z. Cai, J. Han, L. Liu, and L. Shao. 2016. RGB-D datasets using microsoft kinect or similar sensors: A survey. Multimedia Tools and Applications (2016), 1--43.

[34]

N. Charbonneau, J. Burgess, and L. Robichaud. 2015. Using 4D modelling in a university-museum research partnership. In 2015 Digital Heritage, Vol. 2. 603--610.

[35]

K. Chatfield, K. Simonyan, A. Vedaldi, and A. Zisserman. 2014. Return of the devil in the details: Delving deep into convolutional nets. In British Machine Vision Conference.

[36]

D.-Y. Chen, X. P. Tian, Y.-T. Shen, and M. Ouhyoung. 2003. On visual similarity based 3D model retrieval. Computer Graphics Forum (EUROGRAPHICS’03) 22, 3 (2003), 223--232.

[37]

H. Chen and B. Bhanu. 2007. 3D free-form object recognition in range images using local surface patches. Pattern Recognition Letters 28, 10 (2007), 1252--1262.

Digital Library

[38]

W. Chen, J. T. Wilson, S. Tyree, K. Q. Weinberger, and Y. Chen. 2015a. Compressing neural networks with the hashing trick. CoRR abs/1504.04788 (2015).

[39]

Y. Chen, H. Jiang, C. Li, X. Jia, and P. Ghamisi. 2016. Deep feature extraction and classification of hyperspectral images based on convolutional neural networks. IEEE TGRS 54, 10 (2016), 6232--6251.

[40]

Y. Chen, Z. Lin, X. Zhao, G. Wang, and Y. Gu. 2014. Deep learning-based classification of hyperspectral data. IEEE J-STARS 7, 6 (2014), 2094--2107.

[41]

Y. Chen, X. Zhao, and X. Jia. 2015b. Spectral-spatial classification of hyperspectral data based on deep belief network. IEEE J-STARS 8, 6 (2015), 2381--2392.

[42]

R. Collobert, K. Kavukcuoglu, and C. Farabet. 2011. Torch7: A matlab-like environment for machine learning. In BigLearn, NIPS Workshop.

[43]

R. Collobert, K. Kavukcuoglu, and C. Farabet. 2012. Neural Networks: Tricks of the Trade: Second Edition. Chapter: Implementing Neural Networks Efficiently, 537--557.

[44]

C. Cortes and V. Vapnik. 1995. Support-vector networks. Machine Learning 20, 3 (1995), 273--297.

[45]

C. Couprie, C. Farabet, L. Najman, and Y. Lecun. 2013. Indoor semantic segmentation using depth information. CoRR abs/1301.3572 (2013).

[46]

P. Daras and A. Axenopoulos. 2010. A 3D shape retrieval framework supporting multimodal queries. International Journal of Computer Vision 89, 2 (2010), 229--247.

Digital Library

[47]

J. Deng, W. Dong, R. Socher, L.-J. Li, K. Li, and L. Fei-Fei. 2009. ImageNet: A large-scale hierarchical image database. In IEEE Computer Vision and Pattern Recognition (CVPR’09).

[48]

L. Deng. 2014. A tutorial survey of architectures, algorithms, and applications for deep learning. APSIPA Transactions on Signal and Information Processing 3 (2014), e5.

[49]

M. Denil, B. Shakibi, L. Dinh, M. A. Ranzato, and N. de Freitas. 2013. Predicting parameters in deep learning. CoRR abs/1306.0543 (2013).

[50]

E. Denton, E. Zaremba, J. Bruna, Y. LeCun, and R. Fergus. 2014. Exploiting linear structure within convolutional networks for efficient evaluation. CoRR abs/1404.0736 (2014).

[51]

B. Douillard, J. Underwood, N. Kuntz, V. Vlaskine, A. Quadros, P. Morton, and A. Frenkel. 2011. On the segmentation of 3D LIDAR point clouds. In IEEE ICRA. 2798--2805.

[52]

A. Doulamis, M. Ioannides, N. Doulamis, A. Hadjiprocopis, D. Fritsch, O. Balet, M. Julien, E. Protopapadakis, and others. 2013. 4D reconstruction of the past. Proceedings of SPIE 8795 (2013), 87950J-1--87950J-11.

[53]

A. Doulamis, S. Soile, N. Doulamis, C. Chrisouli, N. Grammalidis, K. Dimitropoulos, C. Manesis, C. Potsiou, and C. Ioannidis. 2015. Selective 4D modelling framework for spatial-temporal land information management system. Proceedings of SPIE 9535, 3rd RSCy (2015).

[54]

N. Doulamis and A. Doulamis. 2012. Fast and adaptive deep fusion learning for detecting visual objects. In Proceedings of ECCV 2012. Workshops and Demonstrations. 345--354.

Digital Library

[55]

A. Eitel, J. T. Springenberg, L. Spinello, M. Riedmiller, and W. Burgard. 2015. Multimodal deep learning for robust RGB-D object recognition. In IEEE/RSJ International Conference on IROS.

[56]

Y. Fang, J. Xie, G. Dai, M. Wang, F. Zhu, T. Xu, and E. Wong. 2015. 3D deep shape descriptor. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR’15). 2319--2328.

[57]

M. Fauvel, J. Chanussot, and J. A. Benediktsson. 2012. A spatial--spectral kernel-based approach for the classification of remote-sensing images. Pattern Recognition 45, 1 (2012), 381--392.

Digital Library

[58]

J. Feng, Y. Wang, and S.-F. Chang. 2016. 3D shape retrieval using single depth image from low-cost sensors. In IEEE Winter Conference on Applications of Computer Vision (WACV’16).

[59]

S. Filipe and L. A. Alexandre. 2014. A comparative evaluation of 3D keypoint detectors in a RGB-D object dataset. In 9th International Conference on Computer Vision Theory and Applications. 476--483.

[60]

S. Filipe, L. Itti, and L. A. Alexandre. 2015. BIK-BUS: Biologically motivated 3D keypoint based on bottom-up saliency. IEEE Transactions on Image Processing 24, 1 (2015), 163--175.

[61]

A. Frome, D. Huber, R. Kolluri, T. Bulow, and J. Malik. 2004. Recognizing objects in range data using regional point descriptors. In ECCV 2004. Lecture Notes in Computer Science, Vol. 3023. 224--237.

[62]

Y. Gao and Q. Dai. 2014. View-based 3D object retrieval: Challenges and approaches. IEEE MultiMedia 21, 3 (2014), 52--57.

[63]

Y. Gao, M. Wang, D. Tao, R. Ji, and Q. Dai. 2012. 3-D object retrieval and recognition with hypergraph analysis. IEEE Transactions on Image Processing 21, 9 (2012), 4290--4303.

Digital Library

[64]

Y. Gao, M. Wang, Z. J. Zha, Q. Tian, Q. Dai, and N. Zhang. 2011. Less is more: Efficient 3-D object retrieval with query view selection. IEEE Transactions on Multimedia 13, 5 (2011), 1007--1018.

Digital Library

[65]

D. Giorgi, S. Biasotti, and L. Paraboschi. 2007. Shape retrieval contest 2007: Watertight models track. SHREC Competition 8 (2007).

[66]

A. Godil, H. Dutagaci, C. Akgul, A. Axenopoulos, B. Bustos, M. Chaouch, P. Daras, and others. 2009. SHREC’09 track: Generic shape retrieval. In Proceedings of the Eurographics Workshop on 3D Object Retrieval. 61--68.

[67]

Y. Gong, L. Liu, M. Yang, and L. D. Bourdev. 2014. Compressing deep convolutional networks using vector quantization. CoRR abs/1412.6115 (2014).

[68]

I. J. Goodfellow, D. Warde-Farley, M. Mirza, A. Courville, and Y. Bengio. 2013. Maxout networks. In Proceedings of the 30th International Conference on Machine Learning (ICML’13). 1319--1327.

[69]

A. Graves, A. Mohamed, and G. E. Hinton. 2013. Speech recognition with deep recurrent neural networks. In IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP’13). 6645--6649.

[70]

K. Gregor, I. Danihelka, A. Graves, and D. Wierstra. 2015. DRAW: A recurrent neural network for image generation. CoRR abs/1502.04623 (2015).

[71]

Y. Guo, M. Bennamoun, F. Sohel, M. Lu, and J. Wan. 2014. 3D object recognition in cluttered scenes with local surface features: A survey. IEEE TPAMI 36, 11 (2014), 2270--2287.

[72]

Y. Guo, M. Bennamoun, F. Sohel, M. Lu, J. Wan, and N. Kwok. 2016a. A comprehensive performance evaluation of 3D local feature descriptors. IJCV 116, 1 (2016), 66--89.

Digital Library

[73]

Y. Guo, Y. Liu, A. Oerlemans, S. Lao, S. Wu, and M. S. Lew. 2016b. Deep learning for visual understanding: A review. Neurocomputing 187 (2016), 27--48. Recent Developments on Deep Big Vision.

Digital Library

[74]

Y. Guo, F. A. Sohel, M. Bennamoun, M. Lu, and J. Wan. 2013. Rotational projection statistics for 3D local surface description and object recognition. CoRR abs/1304.3192 (2013).

[75]

Y. Guo, F. A. Sohel, M. Bennamoun, J. Wan, and M. Lu. 2015. A novel local surface feature for 3D object recognition under clutter and occlusion. Information Sciences 293 (2015), 196--213.

[76]

Y. Guo, J. Zhang, M. Lu, J. Wan, and Y. Ma. 2014. Benchmark datasets for 3D computer vision. In 9th IEEE Conference on Industrial Electronics and Applications (ICIEA’14). 1846--1851.

[77]

S. Gupta, R. Girshick, P. Arbelaez, and J. Malik. 2014. Learning rich features from RGB-D images for object detection and segmentation. In Proceedings of the 13th European Conference on Computer Vision.

[78]

Z. Han, Z. Liu, J. Han, C. M. Vong, S. Bu, and C. L. P. Chen. 2016. Mesh convolutional restricted Boltzmann machines for unsupervised learning of features with structure preservation on 3-D meshes. IEEE Transactions on Neural Networks and Learning Systems PP, 99 (2016), 1--14.

[79]

K. He, X. Zhang, S. Ren, and J. Sun. 2014. Spatial pyramid pooling in deep convolutional networks for visual recognition. CoRR abs/1406.4729 (2014).

[80]

K. He, X. Zhang, S. Ren, and J. Sun. 2015a. Deep residual learning for image recognition. CoRR abs/1512.03385 (2015).

[81]

K. He, X. Zhang, S. Ren, and J. Sun. 2015b. Delving deep into rectifiers: Surpassing human-level performance on imagenet classification. CoRR abs/1502.01852 (2015).

[82]

K. He, X. Zhang, S. Ren, and J. Sun. 2016. Identity mappings in deep residual networks. CoRR abs/1603.05027 (2016).

[83]

V. Hegde and R. Zadeh. 2016. FusionNet: 3D object classification using multiple data representations. CoRR abs/1607.05695 (2016).

[84]

M. Hilaga, Y. Shinagawa, T. Kohmura, and T. L. Kunii. 2001. Topology matching for fully automatic similarity estimation of 3D shapes. In Proceedings of the 28th SIGGRAPH. 203--212.

Digital Library

[85]

G. E. Hinton. 2002. Training products of experts by minimizing contrastive divergence. Neural Computation 14, 8 (2002), 1771--1800.

Digital Library

[86]

G. E. Hinton, P. Dayan, B. Frey, and R. M. Neal. 1995. The wake-sleep algorithm for self-organizing neural networks. Science 268, 5124 (1995), 1158--1161.

[87]

G. E. Hinton, S. Osindero, and Y.-W. Teh. 2006. A fast learning algorithm for deep belief nets. Neural Computation 18, 7 (2006), 1527--1554.

Digital Library

[88]

G. E. Hinton and R. R. Salakhutdinov. 2006. Reducing the dimensionality of data with neural networks. Science 313, 5786 (2006), 504--507.

[89]

G. E. Hinton, N. Srivastava, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov. 2012. Improving neural networks by preventing co-adaptation of feature detectors. CoRR abs/1207.0580 (2012).

[90]

G. E. Hinton, O. Vinyals, and J. Dean. 2015. Distilling the knowledge in a neural network. CoRR abs/1503.02531 (2015).

[91]

S. Hochreiter. 1991. Untersuchungen Zu Dynamischen Neuronalen Netzen. Diploma thesis. Technical University Munich, Institute of Computer Science.

[92]

S. Hochreiter. 1998. The vanishing gradient problem during learning recurrent neural nets and problem solutions. IJUFKS 6, 2 (1998), 107--116.

Digital Library

[93]

S. Hochreiter and J. Schmidhuber. 1997. Long short-term memory. Neural Computation 9, 8 (1997), 1735--1780.

Digital Library

[94]

W. Hu, Y. Huang, L. Wei, F. Zhang, and H. Li. 2015. Deep convolutional neural networks for hyperspectral image classification. Journal of Sensors 2015, Article 258619 (2015).

[95]

G. Huang, Y. Sun, Z. Liu, D. Sedra, and K. Q. Weinberger. 2016. Deep networks with stochastic depth. CoRR abs/1603.09382 (2016).

[96]

G. B. Huang, H. Zhou, X. Ding, and R. Zhang. 2012. Extreme learning machine for regression and multiclass classification. IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics) 42, 2 (2012), 513--529.

Digital Library

[97]

G. B. Huang, Q.-Y. Zhu, and C.-K. Siew. 2006. Extreme learning machine: Theory and applications. Neurocomputing 70, 13 (2006), 489--501.

[98]

M. Ioannides, A. Hadjiprocopis, N. Doulamis, A. Doulamis, E. Protopapadakis, K. Makantasis, and others. 2013. Online 4D reconstruction using multi-images available under open access. ISPRS Annals of Photogrammetry, Remote Sensing and Spatial Information Sciences 1 (2013), 169--174.

[99]

S. Ioffe and C. Szegedy. 2015. Batch normalization: Accelerating deep network training by reducing internal covariate shift. CoRR abs/1502.03167 (2015).

[100]

M. Jaderberg, A. Vedaldi, and A. Zisserman. 2014. Speeding up convolutional neural networks with low rank expansions. CoRR abs/1405.3866 (2014).

[101]

K. Jarrett, K. Kavukcuoglu, M. Ranzato, and Y. LeCun. 2009. What is the best multi-stage architecture for object recognition? In 12th IEEE International Conference on Computer Vision. 2146--2153.

[102]

S. Jayanti, Y. Kalyanaraman, N. Iyer, and K. Ramani. 2006. Developing an engineering shape benchmark for CAD models. Computer-Aided Design 38, 9 (2006), 939--953.

[103]

H. Jegou, M. Douze, and C. Schmid. 2011. Product quantization for nearest neighbor search. IEEE Transactions on Pattern Analysis and Machine Intelligence 33, 1 (2011), 117--128.

Digital Library

[104]

Y. Jia, E. Shelhamer, J. Donahue, S. Karayev, J. Long, R. Girshick, S. Guadarrama, and T. Darrell. 2014. Caffe: Convolutional architecture for fast feature embedding. arXiv preprint arXiv:1408.5093 (2014).

[105]

E. Johns, S. Leutenegger, and A. J. Davison. 2016. Pairwise decomposition of image sequences for active multi-view recognition. In Proceedings of the IEEE Conference on CVPR. 3183--3822.

[106]

A. E. Johnson and M. Hebert. 1999. Using spin images for efficient object recognition in cluttered 3D scenes. IEEE Transactions on Pattern Analysis and Machine Intelligence 21, 5 (1999), 433--449.

Digital Library

[107]

N. Kalchbrenner, E. Grefenstette, and P. Blunsom. 2014. A convolutional neural network for modelling sentences. CoRR abs/1404.2188 (2014).

[108]

L. L. C. Kasun, H. Zhou, G.-B. Huang, and C. M. Vong. 2013. Representational learning with extreme learning machine for big data. IEEE Intelligent Systems 28, 6 (2013), 31--34.

Digital Library

[109]

M. Kazhdan, Th. Funkhouser, and S. Rusinkiewicz. 2003. Rotation invariant spherical harmonic representation of 3D shape descriptors. In Symposium on Geometry Processing.

Digital Library

[110]

J. M. Khatib, N. Chileshe, and S. Sloan. 2007. Antecedents and benefits of 3D and 4D modelling for construction planners. Journal of Engineering, Design and Technology 5, 2 (2007), 159--172.

[111]

A. Krizhevsky, I. Sutskever, and G. E. Hinton. 2012. ImageNet classification with deep convolutional neural networks. In Advances in Neural Information Processing Systems 25. 1097--1105.

[112]

A. Krogh and J. A. Hertz. 1992. A simple weight decay can improve generalization. In Advances in Neural Information Processing Systems, Vol. 4. 950--957.

[113]

G. Kyriakaki, A. Doulamis, N. Doulamis, M. Ioannides, K. Makantasis, E. Protopapadakis, A. Hadjiprocopis, K. Wenzel, and others. 2014. 4D reconstruction of tangible cultural heritage objects from web-retrieved images. International Journal of Heritage in the Digital Era 3, 2 (2014), 431--451.

[114]

L. Ladicky, C. Russell, P. Kohli, and P. H. S. Torr. 2009. Associative hierarchical CRFs for object class image segmentation. Proceedings of the IEEE 12th International Conference on Computer Vision (2009).

[115]

K. Lai, L. Bo, X. Ren, and D. Fox. 2011. A large-scale hierarchical multi-view RGB-D object dataset. In IEEE International Conference on on Robotics and Automation.

[116]

G. Lavoué. 2012. Combination of bag-of-words descriptors for robust partial shape retrieval. The Visual Computer 28, 9 (2012), 931--942.

Digital Library

[117]

V. Lebedev, Y. Ganin, M. Rakhuba, I. V. Oseledets, and V. S. Lempitsky. 2014. Speeding-up convolutional neural networks using fine-tuned CP-decomposition. CoRR abs/1412.6553 (2014).

[118]

Y. LeCun, Y. Bengio, and G. E. Hinton. 2015. Deep learning. Nature 521 (2015), 436--444.

[119]

Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. 1998. Gradient-based learning applied to document recognition. Proceedings of IEEE 86, 11 (1998), 2278--2324.

[120]

Y. LeCun, K. Kavukcuoglu, and C. Farabet. 2010. Convolutional networks and applications in vision. In Proceedings of the 2010 IEEE International Symposium on Circuits and Systems (ISCAS’10). 253--256.

[121]

H. Lee, E. Chaitanya, and A. Y. Ng. 2008. Sparse deep belief net model for visual area V2. In Advances in Neural Information Processing Systems 20. 873--880.

[122]

B. Leng, S. Guo, X. Zhang, and Z. Xiong. 2015. 3D object retrieval with stacked local convolutional autoencoder. Signal Processing 112, C (2015), 119--128.

Digital Library

[123]

B. Leng, Y. Liu, K. Yu, X. Zhang, and Z. Xiong. 2016. 3D object understanding with 3D convolutional neural networks. Information Sciences 336, C (Oct. 2016), 188--201.

Digital Library

[124]

B. Leng, X. Zhang, M. Yao, and Z. Xiong. 2014. MultiMedia Modeling: 20th Anniversary International Conference, Part II. Chapter: 3D Object Classification Using Deep Belief Networks, 128--139.

[125]

B. Li, Y. Lu, A. Godil, T. Schreck, B. Bustos, A. Ferreira, and others. 2014a. A comparison of methods for sketch-based 3D shape retrieval. Computer Vision and Image Understanding 119 (2014), 57--80.

Digital Library

[126]

B. Li, Y. Lu, C. Li, and others. 2015. A comparison of 3D shape retrieval methods based on a large-scale benchmark supporting multimodal queries. Computer Vision and Image Understanding 131 (2015).

Digital Library

[127]

B. Li, E. Zhou, B. Huang, J. Duan, Y. Wang, N. Xu, J. Zhang, and H. Yang. 2014b. Large scale recurrent neural network on GPU. In International Joint Conference on Neural Networks (IJCNN’14). 4062--4069.

[128]

Z. Lian, A. Godil, B. Bustos, M. Daoudi, and others. 2011. SHREC’11 track: Shape retrieval on non-rigid 3D watertight meshes. In Proceedings of the 4th Eurographics Conference on 3D Object Retrieval. 79--88.

[129]

M. Lin, Q. Chen, and S. Yan. 2013. Network in network. CoRR abs/1312.4400 (2013).

[130]

Q. Liu. 2012. A survey of recent view-based 3D model retrieval methods. CoRR abs/1208.3670 (2012).

[131]

Z. Liu, S. Chen, S. Bu, and K. Li. 2014. High-level semantic feature for 3D shape based on deep belief networks. In IEEE International Conference on Multimedia and Expo (ICME’14). 1--6.

[132]

D. G. Lowe. 1999. Object recognition from local scale-invariant features. In Proceedings of the International Conference on Computer Vision (ICCV’99), Vol. 2. 1150--1157.

[133]

A. Maas, A. Hannun, and A. Ng. 2013. Rectifier nonlinearities improve neural network acoustic models. In ICML Workshop on Deep Learning for Audio, Speech, and Language Processing.

[134]

A. L. Maas, R. E. Daly, P. T. Pham, D. Huang, A. Y. Ng, and C. Potts. 2011. Learning word vectors for sentiment analysis. In Proceedings of the 49th Annual Meeting of the ACL. 142--150.

[135]

A. Mademlis, P. Daras, D. Tzovaras, and M. G. Strintzis. 2009. 3D object retrieval using the 3D shape impact descriptor. Pattern Recognition 42, 11 (2009), 2447--2459.

Digital Library

[136]

K. Makantasis, A. Doulamis, N. Doulamis, and M. Ioannides. 2016. In the wild image retrieval and clustering for 3D cultural heritage landmarks reconstruction. MTAP 75, 7 (2016), 3593--3629.

Digital Library

[137]

K. Makantasis, K. Karantzalos, A. Doulamis, and N. Doulamis. 2015. Deep supervised learning for hyperspectral data classification through convolutional neural networks. In IEEE IGARSS. 4959--4962.

[138]

J. Martens and I. Sutskever. 2011. Learning recurrent neural networks with Hessian-free optimization. In Proceedings of the 28th International Conference on Machine Learning (ICML’11). 1033--1040.

[139]

H. P. Martínez and G. N. Yannakakis. 2014. Deep multimodal fusion: Combining discrete events and continuous signals. In Proceedings of the 16th International Conference on Multimodal Interaction. 34--41.

Digital Library

[140]

M. Mathieu, M. Henaff, and Y. LeCun. 2013. Fast training of convolutional networks through FFTs. CoRR abs/1312.5851 (2013).

[141]

D. Maturana and S. Scherer. 2015. VoxNet: A 3D convolutional neural network for real-time object recognition. In IEEE/RSJ International Conference on Intelligent Robots and Systems. 922--928.

[142]

W. McCulloch and W. Pitts. 1943. A logical calculus of the ideas immanent in nervous activity. The Bulletin of Mathematical Biophysics 5, 4 (1943), 115--133.

[143]

A. Merentitis and C. Debes. 2015. Automatic fusion and classification using random forests and features extracted with deep learning. In International Geoscience and Remote Sensing Symposium. 2943--2946.

[144]

A. Mian, M. Bennamoun, and R. Owens. 2010. On the repeatability and quality of keypoints for local feature-based 3D object retrieval from cluttered scenes. IJCV 89, 2--3 (2010), 348--361.

Digital Library

[145]

K. Mikolajczyk and C. Schmid. 2005. A performance evaluation of local descriptors. IEEE Transactions on Pattern Analysis and Machine Intelligence 27, 10 (2005), 1615--1630.

Digital Library

[146]

K. Mikolajczyk, T. Tuytelaars, C. Schmid, A. Zisserman, J. Matas, F. Schaffalitzky, T. Kadir, and L. Van Gool. 2005. A comparison of affine region detectors. IJCV 65, 1 (2005), 43--72.

Digital Library

[147]

M. Muja and D. G. Lowe. 2009. Fast approximate nearest neighbors with automatic algorithm configuration. In International Conference on Computer Vision Theory and Application (VISSAPP’09). 331--340.

[148]

V. Nair and G. E. Hinton. 2010. Rectified linear units improve restricted Boltzmann machines. In Proceedings of the 27th International Conference on Machine Learning (ICML’10). 807--814.

Digital Library

[149]

A. Nguyen and B. Le. 2013. 3D point cloud segmentation: A survey. In Proceedings of the 6th IEEE Conference on Robotics, Automation and Mechatronics (RAM). 225--230.

[150]

M. Niepert, M. Ahmed, and K. Kutzkov. 2016. Learning convolutional neural networks for graphs. CoRR abs/1605.05273 (2016).

[151]

W. Ouyang, P. Luo, X. Zeng, S. Qiu, Y. Tian, H. Li, S. Yang, and others. 2014. DeepID-Net: Multi-stage and deformable deep convolutional neural networks for object detection. CoRR abs/1409.3505 (2014).

[152]

J. Papon, A. Abramov, M. Schoeler, and F. Worgotter. 2013. Voxel cloud connectivity segmentation—Supervoxels for point clouds. In IEEE Conference on Computer Vision and Pattern Recognition. 2027--2034.

Digital Library

[153]

R. Pascanu, C. Gülçehre, K. Cho, and Y. Bengio. 2013a. How to construct deep recurrent neural networks. CoRR abs/1312.6026 (2013).

[154]

R. Pascanu, T. Mikolov, and Y. Bengio. 2013b. On the difficulty of training recurrent neural networks. In Proceedings of the 30th International Conference on Machine Learning (ICML’13). 1310--1318.

[155]

C. R. Qi, H. Su, M. Niessner, A. Dai, M. Yan, and L. J. Guibas. 2016. Volumetric and multi-view CNNs for object classification on 3D data. arXiv preprint arXiv:1604.03265v2 (2016).

[156]

T. Rabbani, F. Van Den Heuvel, and G. Vosselmann. 2006. Segmentation of point clouds using smoothness constraint. ISPRS Archives 36, 5 (2006), 248--253.

[157]

M. Ranzato, Y. Boureau, and Y. LeCun. 2008. Sparse feature learning for deep belief networks. In Advances in Neural Information Processing Systems 20. 1185--1192.

[158]

S. Ren, K. He, R. Girshick, and J. Sun. 2015. Faster R-CNN: Towards real-time object detection with region proposal networks. In Advances in Neural Information Processing Systems. 91--99.

[159]

S. Rifai, P. Vincent, X. Muller, X. Glorot, and Y. Bengio. 2011. Contracting auto-encoders: Explicit invariance during feature extraction. In Proceedings of the 28th ICML. 833--840.

[160]

A. Romero, N. Ballas, S. E. Kahou, A. Chassang, C. Gatta, and Y. Bengio. 2014. FitNets: Hints for thin deep nets. CoRR abs/1412.6550 (2014).

[161]

F. Rosenblatt. 1958. The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review 65, 6 (1958), 386--408.

[162]

D. E. Rumelhart, G. E. Hinton, and R. J. Williams. 1986. Learning representations by back-propagating errors. Nature 323 (1986), 533--536.

[163]

R. B. Rusu, N. Blodow, Z. C. Marton, and M. Beetz. 2008. Aligning point cloud views using persistent feature histograms. In IEEE/RSJ International Conference on Intelligent Robots and Systems. 3384--3391.

[164]

R. B. Rusu, G. Bradski, R. Thibaux, and J. Hsu. 2010. Fast 3D recognition and pose using the viewpoint feature histogram. In IEEE/RSJ International Conference on IROS. 2155--2162.

[165]

R. B. Rusu and S. Cousins. 2011. 3D is here: Point Cloud Library (PCL). In IEEE International Conference on Robotics and Automation (ICRA’11). 1--4.

[166]

S. Salti, A. Petrelli, F. Tombari, and L. Di Stefano. 2012. On the affinity between 3D detectors and descriptors. In Proceedings of the 2nd International Conference on 3D Imaging, Modeling, Processing, Visualization and Transmission. 424--431.

[167]

J. Sanchez-Riera, K.-L. Hua, Y.-S. Hsiao, T. Lim, S. C. Hidayati, and W.-H. Cheng. 2016. A comparative study of data fusion for RGB-D based visual recognition. Pattern Recognition Letters 73 (2016), 1--6.

[168]

D. Scherer, A. Muller, and S. Behnke. 2010. Evaluation of pooling operations in convolutional architectures for object recognition. In 20th ICANN. Vol. 6354. 92--101.

[169]

G. Schindler and F. Dellaert. 2012. 4D cities: Analyzing, visualizing, and interacting with historical urban photo collections. Journal of Multimedia (2012).

[170]

C. Schmid, R. Mohr, and C. Bauckhage. 2000. Evaluation of interest point detectors. International Journal of Computer Vision 37, 2 (2000), 151--172.

[171]

J. Schmidhuber. 1992. Learning complex, extended sequences using the principle of history compression. Neural Computation 4, 2 (1992), 234--242.

[172]

J. Schmidhuber. 2015. Deep learning in neural networks: An overview. Neural Networks 61 (2015), 85--117.

[173]

R. Schnabel, R. Wahl, and R. Klein. 2007. Efficient RANSAC for point-cloud shape detection. Computer Graphics Forum 26, 2 (2007), 214--226.

[174]

M. Schwarz, H. Schulz, and S. Behnke. 2015. RGB-D object recognition and pose estimation based on pre-trained convolutional neural network features. In IEEE ICRA. 1329--1335.

[175]

N. Sedaghat, M. Zolfaghari, and Th. Brox. 2016. Orientation-boosted voxel nets for 3D object recognition. CoRR abs/1604.03351 (2016).

[176]

P. Shilane, P. Min, M. Kazhdan, and T. Funkhouser. 2004. The Princeton shape benchmark. In Shape Modeling International.

[177]

K. Siddiqi, J. Zhang, D. Macrini, A. Shokoufandeh, S. Bouix, and S. Dickinson. 2008. Retrieving articulated 3-D models using medial surfaces. Machine Vision and Applications 19, 4 (2008), 261--275.

[178]

N. Silberman, D. Hoiem, P. Kohli, and R. Fergus. 2012. Indoor segmentation and support inference from RGBD images. In ECCV.

[179]

M.-C. Sima and A. Nuchter. 2013. An extension of the Felzenszwalb-Huttenlocher segmentation to 3D point clouds. 5th ICMV: Computer Vision, Image Analysis and Processing 8783 (2013).

[180]

D. Smeets, Th. Fabry, J. Hermans, D. Vandermeulen, and P. Suetens. 2009. Isometric deformation modelling for object recognition. In Proceedings of the 13th International Conference on Computer Analysis of Images and Patterns. 757--765.

[181]

R. Socher, B. Huval, B. Bhat, C. D. Manning, and A. Y. Ng. 2012. Convolutional-recursive deep learning for 3D object classification. In Advances in Neural Information Processing Systems 25. 656--664.

[182]

S. Song and J. Xiao. 2014. Sliding shapes for 3D object detection in depth images. In Proceedings of the 13th European Conference on Computer Vision (ECCV’14). 634--651.

[183]

S. Song and J. Xiao. 2016. Deep sliding shapes for amodal 3D object detection in RGB-D images. In Proceedings of the 29th IEEE Conference on Computer Vision and Pattern Recognition.

[184]

H. Su, S. Maji, E. Kalogerakis, and E. Learned-Miller. 2015. Multi-view convolutional neural networks for 3D shape recognition. In Proceedings of the International Conference on Computer Vision (ICCV’15).

[185]

I. Sutskever. 2012. Training Recurrent Neural Networks. Ph.D. dissertation. University of Toronto.

[186]

I. Sutskever, J. Martens, and G. E. Hinton. 2011. Generating text with recurrent neural networks. In Proceedings of the 28th International Conference on Machine Learning (ICML’11). 1017--1024.

[187]

C. Szegedy, S. Ioffe, and V. Vanhoucke. 2016. Inception-v4, inception-ResNet and the impact of residual connections on learning. CoRR abs/1602.07261 (2016).

[188]

C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, and A. Rabinovich. 2014. Going deeper with convolutions. CoRR abs/1409.4842 (2014).

[189]

H. Tabia, H. Laga, D. Picard, and P.-H. Gosselin. 2014. Covariance descriptors for 3D shape matching and retrieval. In IEEE Conference on Computer Vision and Pattern Recognition. 4185--4192.

[190]

J. Tang, S. Miller, A. Singh, and P. Abbeel. 2012. A textured object recognition pipeline for color and depth image data. In Proceedings of the IEEE International Conference on Robotics and Automation (ICRA’12).

[191]

J. W. H. Tangelder and R. C. Veltkamp. 2007. A survey of content based 3D shape retrieval methods. Multimedia Tools and Applications 39, 3 (2007), 441.

[192]

L. Theis and M. Bethge. 2015. Generative image modeling using spatial LSTMs. In Advances in Neural Information Processing Systems 28.

[193]

F. Tombari and L. Di Stefano. 2012. Hough voting for 3D object recognition under occlusion and clutter. IPSJ Transactions on Computer Vision and Applications 4 (2012), 20--29.

[194]

F. Tombari, S. Salti, and L. Di Stefano. 2010. Unique signatures of histograms for local surface description. In Proceedings of the 11th European Conference on Computer Vision: Part III (ECCV’10). 356--369.

[195]

F. Tombari, S. Salti, and L. Di Stefano. 2013. Performance evaluation of 3D keypoint detectors. International Journal of Computer Vision 102, 1--3 (2013), 198--220.

[196]

J. R. R. Uijlings, K. E. A. van de Sande, T. Gevers, and A. W. M. Smeulders. 2013. Selective search for object recognition. International Journal of Computer Vision (2013).

[197]

J. P. C. Valentin, S. Sengupta, J. Warrell, A. Shahrokni, and P. H. S. Torr. 2013. Mesh based semantic modelling for indoor and outdoor scenes. In IEEE CVPR. 2067--2074.

[198]

V. Vanhoucke, A. Senior, and M. Z. Mao. 2011. Improving the speed of neural networks on CPUs. In Deep Learning and Unsupervised Feature Learning Workshop, NIPS 2011.

[199]

P. Vincent, H. Larochelle, I. Lajoie, Y. Bengio, and P.-A. Manzagol. 2010. Stacked denoising autoencoders: Learning useful representations in a deep network with a local denoising criterion. Journal of Machine Learning Research 11 (2010), 3371--3408.

[200]

F. Visin, K. Kastner, K. Cho, M. Matteucci, A. C. Courville, and Y. Bengio. 2015. ReNet: A recurrent neural network based alternative to convolutional networks. CoRR abs/1505.00393 (2015).

[201]

A.-V. Vo, L. Truong-Hong, D. F. Laefer, and M. Bertolotto. 2015. Octree-based region growing for point cloud segmentation. ISPRS Journal of Photogrammetry and Remote Sensing 104 (2015), 88--100.

[202]

L. Wan, M. Zeiler, S. Zhang, Y. LeCun, and R. Fergus. 2013. Regularization of neural networks using dropconnect. In Proceedings of the 30th ICML, Vol. 28. 1058--1066.

[203]

F. Wang, L. Kang, and Y. Li. 2015. Sketch-based 3D shape retrieval using convolutional neural networks. In IEEE Conference on Computer Vision and Pattern Recognition.

[204]

W. Wang, L. Chen, Z. Liu, K. Kühnlenz, and D. Burschka. 2013. Textured/textureless object recognition and pose estimation using RGB-D image. Journal of Real-Time Image Processing 10, 4 (2013), 667--682.

[205]

Y. Wang, Z. Xie, K. Xu, Y. Dou, and Y. Lei. 2016. An efficient and effective convolutional auto-encoder extreme learning machine network for 3D feature learning. Neurocomputing 174 (2016), 988--998.

[206]

D. Weikersdorfer, D. Gossow, and M. Beetz. 2012. Depth-adaptive superpixels. In 21st International Conference on Pattern Recognition (ICPR’12). 2087--2090.

[207]

P. J. Werbos. 1990. Backpropagation through time: What it does and how to do it. Proceedings of the IEEE 78, 10 (1990), 1550--1560.

[208]

W. Wohlkinger and M. Vincze. 2011. Ensemble of shape functions for 3D object classification. In IEEE International Conference on Robotics and Biomimetics (ROBIO’11). 2987--2992.

[209]

H. Wu and X. Gu. 2015. Towards dropout training for convolutional neural networks. Neural Networks 71 (2015), 1--10.

[210]

Z. Wu, S. Song, A. Khosla, F. Yu, L. Zhang, X. Tang, and J. Xiao. 2015. 3D shapenets: A deep representation for volumetric shapes. In IEEE Conference on Computer Vision and Pattern Recognition. 1912--1920.

[211]

J. Xie, Y. Fang, F. Zhu, and E. Wong. 2015a. DeepShape: Deep learned shape descriptor for 3D shape matching and retrieval. In Proceedings of the IEEE Conference on CVPR. 1275--1283.

[212]

Z. Xie, K. Xu, W. Shan, L. Liu, Y. Xiong, and H. Huang. 2015b. Projective feature learning for 3D shapes with multi-view depth images. Computer Graphics Forum (Proceedings of Pacific Graphics 2015) 34, 6 (2015).

[213]

B. Xu, N. Wang, T. Chen, and M. Li. 2015b. Empirical evaluation of rectified activations in convolutional network. CoRR abs/1505.00853 (2015).

[214]

Q. Xu, S. Jiang, W. Huang, F. Ye, and S. Xu. 2015a. Feature fusion based image retrieval using deep learning. Journal of Information and Computational Science 12, 6 (2015), 2361--2373.

[215]

Z. Yan, H. Zhang, Y. Jia, Th. Breuel, and Y. Yu. 2016. Combining the best of convolutional layers and recurrent layers: A hybrid network for semantic segmentation. CoRR abs/1603.04871 (2016).

[216]

J. Yue, S. Mao, and M. Li. 2016. A deep learning framework for hyperspectral image classification using spatial pyramid pooling. Remote Sensing Letters 7, 9 (2016), 875--884.

[217]

A. Zaharescu, E. Boyer, K. Varanasi, and R. Horaud. 2009. Surface feature detection and description with applications to mesh matching. In IEEE Conference on CVPR. 373--380.

[218]

H. F. M. Zaki, F. Shafait, and A. Mian. 2016. Convolutional hypercube pyramid for accurate RGB-D object category and instance recognition. In IEEE ICRA. 1685--1692.

[219]

D. Zarpalas, P. Daras, A. Axenopoulos, D. Tzovaras, and M. G. Strintzis. 2006. 3D model search and retrieval using the spherical trace transform. EURASIP Journal on Advances in Signal Processing 2007 (2006).

[220]

M. D. Zeiler and R. Fergus. 2013. Stochastic pooling for regularization of deep convolutional neural networks. CoRR abs/1301.3557 (2013).

[221]

A. Zelener. 2015. Survey of object classification in 3D range scans. (2015).

[222]

L. Zhang, L. Zhang, and B. Du. 2016a. Deep learning for remote sensing data: A technical tutorial on the state of the art. IEEE Geoscience and Remote Sensing Magazine 4, 2 (2016), 22--40.

[223]

X. Zhang, H. Zhang, Y. Zhang, Y. Yang, M. Wang, H. Luan, J. Li, and T. S. Chua. 2016b. Deep fusion of multiple semantic cues for complex event recognition. IEEE TIP 25, 3 (2016), 1033--1046.

[224]

X. Zhang, J. Zou, X. Ming, K. He, and J. Sun. 2014. Efficient and accurate approximations of nonlinear convolutional networks. CoRR abs/1411.4229 (2014).

[225]

W. Zhao and S. Du. 2016. Learning multiscale and deep representations for classifying remotely sensed imagery. ISPRS Journal of Photogrammetry and Remote Sensing 113 (2016), 155--165.

[226]

Y. Zhong. 2009. Intrinsic shape signatures: A shape descriptor for 3D object recognition. In 12th IEEE International Conference on Computer Vision Workshops (ICCV Workshops). 689--696.

[227]

Y. Zhou and Y. Wei. 2016. Learning hierarchical spectral-spatial features for hyperspectral image classification. IEEE Transactions on Cybernetics 46, 7 (2016), 1667--1678.

[228]

Z. Zhu, X. Wang, S. Bai, C. Yao, and X. Bai. 2014. Deep learning representation using autoencoder for 3D shape retrieval. CoRR abs/1409.7164 (2014).

Index Terms

Deep Learning Advances in Computer Vision with 3D Data: A Survey
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision
      1. Computer vision tasks
        Scene understanding
      2. Image and video acquisition
        3D imaging
  2. Machine learning
    1. Machine learning approaches
      1. Neural networks
2. General and reference
  1. Document types
    1. Surveys and overviews

Published In

ACM Computing Surveys Volume 50, Issue 2

March 2018

567 pages

ISSN:0360-0300

EISSN:1557-7341

DOI:10.1145/3071073

Editor:
Sartaj Sahni
Department of Computer and Information Science and Engineering / University of Florida / Gainesville, FL 32611

Copyright © 2017 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States

Publication History

Published: 06 April 2017

Accepted: 01 January 2017

Revised: 01 December 2016

Received: 01 June 2016

Published in CSUR Volume 50, Issue 2

Qualifiers

Survey
Research
Refereed

Funding Sources

EU Horizon 2020 Programme: DigiArt
