Fast and accurate detection of 3D shapes is a fundamental task for robotic systems in intelligent tracking and automatic control. View-based 3D shape recognition has attracted increasing attention because human perception of 3D objects relies mainly on multiple 2D observations from different viewpoints. However, most existing multi-view cognitive computation methods use straightforward pairwise comparisons among the projected images, followed by a weak aggregation mechanism, which results in heavy computational cost and low recognition accuracy. To address these problems, a novel network structure named MCEA, which combines multi-view convolutional neural networks (M-CNNs), an extreme learning machine auto-encoder (ELM-AE), and an ELM classifier, is proposed for comprehensive feature learning, effective feature aggregation, and efficient classification of 3D shapes. The framework combines the advantages of a deep CNN architecture with the robust feature representation of the ELM-AE and the fast ELM classifier for 3D model recognition. In contrast to existing set-to-set image comparison methods, the proposed shape-to-shape matching strategy converts each highly informative 3D model into a single compact feature descriptor via cognitive computation. Moreover, the proposed method runs much faster and achieves a good balance between classification accuracy and computational efficiency. Experimental results on the benchmark Princeton ModelNet, ShapeNet Core 55, and PSB datasets show that the proposed framework achieves higher classification and retrieval accuracy in much shorter time than state-of-the-art methods.
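The aggregation and classification stages summarized above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not specify how the views are combined, so concatenating the per-view features before the ELM-AE is an assumption, random vectors stand in for the M-CNN outputs, and all function names, layer sizes, and the regularization constant `C` are hypothetical. The sketch only shows the general ELM idea of fixing random hidden weights and solving for the output weights in closed form.

```python
import numpy as np


def _ridge_solve(H, T, C):
    """Solve (I/C + H^T H) beta = H^T T, i.e. regularized least squares."""
    n_hidden = H.shape[1]
    return np.linalg.solve(np.eye(n_hidden) / C + H.T @ H, H.T @ T)


def elm_ae_fit(X, n_hidden, C=1.0, seed=0):
    """Fit a simplified ELM auto-encoder; returns beta of shape (n_hidden, n_features)."""
    rng = np.random.default_rng(seed)
    # Random, untrained hidden layer (scaled to limit tanh saturation).
    W = rng.standard_normal((X.shape[1], n_hidden)) / np.sqrt(X.shape[1])
    b = rng.standard_normal(n_hidden)
    H = np.tanh(X @ W + b)
    # The auto-encoder target is the input itself.
    return _ridge_solve(H, X, C)


def elm_ae_encode(X, beta):
    """Project inputs onto the learned output weights to get compact descriptors."""
    return X @ beta.T


def elm_fit(X, y, n_classes, n_hidden, C=1.0, seed=1):
    """Fit an ELM classifier: random hidden layer plus closed-form output weights."""
    rng = np.random.default_rng(seed)
    W = rng.standard_normal((X.shape[1], n_hidden)) / np.sqrt(X.shape[1])
    b = rng.standard_normal(n_hidden)
    H = np.tanh(X @ W + b)
    T = np.eye(n_classes)[y]  # one-hot targets
    return W, b, _ridge_solve(H, T, C)


def elm_predict(X, model):
    """Return the class index with the largest output score."""
    W, b, beta = model
    return np.argmax(np.tanh(X @ W + b) @ beta, axis=1)


def run_demo(n_shapes=60, n_views=12, feat_dim=16, seed=42):
    """Toy run on synthetic data; random vectors stand in for per-view CNN features."""
    rng = np.random.default_rng(seed)
    y = np.arange(n_shapes) % 2  # two synthetic classes
    views = rng.standard_normal((n_shapes, n_views, feat_dim)) + 1.5 * y[:, None, None]
    # Assumption: views are aggregated by concatenation before the ELM-AE.
    X = views.reshape(n_shapes, -1)
    beta = elm_ae_fit(X, n_hidden=32)
    descriptors = elm_ae_encode(X, beta)  # one compact descriptor per shape
    model = elm_fit(descriptors, y, n_classes=2, n_hidden=64)
    train_acc = float(np.mean(elm_predict(descriptors, model) == y))
    return descriptors, train_acc
```

Calling `run_demo()` returns one 32-dimensional descriptor per synthetic shape together with the training accuracy of the ELM classifier on those descriptors. Because neither stage uses iterative back-propagation, training reduces to two linear solves, which is the kind of speed advantage the abstract attributes to the ELM components.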

This study was supported in part by the Science and Technology Development Fund of Macao S.A.R (FDCT) under grant FDCT/121/2016/A3 and MoST-FDCT Joint Grant 015/2015/AMJ, and in part by the University of Macau under grant MYRG2016-00160-FST.
The authors declare that they have no conflict of interest.
This article does not contain any studies with human participants or animals performed by any of the authors.
Yang, ZX., Tang, L., Zhang, K. et al. Multi-View CNN Feature Aggregation with ELM Auto-Encoder for 3D Shape Recognition. Cogn Comput 10, 908–921 (2018). https://doi.org/10.1007/s12559-018-9598-1