Abstract
Transformers have been widely applied to vision tasks on diverse data modalities, such as images, videos, and point clouds. However, their use in 3D mesh analysis remains largely unexplored. To address this gap, we propose a mesh transformer (MeT) that applies local self-attention to mesh edges. MeT is built on a transformer layer that uses vector attention for edges, an attention operator that adaptively modulates both whole feature vectors and individual feature channels. On top of this transformer block, we build a lightweight mesh transformer network consisting of an encoder and a decoder, which serves as a general backbone for downstream 3D mesh analysis tasks. To evaluate the effectiveness of MeT, we conduct experiments on two classic mesh analysis tasks: shape classification and shape segmentation. MeT achieves state-of-the-art performance on multiple datasets for both tasks. We also conduct ablation studies to demonstrate the effectiveness of the key designs in our network.
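The vector attention described above differs from standard scalar dot-product attention in that it produces one attention weight per feature channel, allowing per-channel modulation of neighbor features. The following PyTorch sketch illustrates the general idea over a fixed edge neighborhood, in the spirit of the vector attention of Point Transformer [Zhao et al.]; it is a minimal illustration under assumed shapes and layer choices, not the authors' exact MeT layer.

```python
import torch
import torch.nn as nn


class VectorAttention(nn.Module):
    """Illustrative vector attention over a fixed-size neighborhood.

    Unlike scalar attention, the weight MLP outputs one weight per
    channel, so each feature channel of each neighbor is modulated
    independently before aggregation.
    """

    def __init__(self, dim: int):
        super().__init__()
        self.to_q = nn.Linear(dim, dim)
        self.to_k = nn.Linear(dim, dim)
        self.to_v = nn.Linear(dim, dim)
        # MLP mapping the query-key relation to a per-channel weight
        # (a scalar-attention layer would output a single value here)
        self.weight_mlp = nn.Sequential(
            nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim)
        )

    def forward(self, x: torch.Tensor, neighbors: torch.Tensor) -> torch.Tensor:
        # x: (N, dim) edge features; neighbors: (N, K) indices of
        # the K neighboring edges of each edge
        q = self.to_q(x)             # (N, dim)
        k = self.to_k(x)[neighbors]  # (N, K, dim) gathered neighbor keys
        v = self.to_v(x)[neighbors]  # (N, K, dim) gathered neighbor values
        # per-channel relation between each query and its neighbor keys
        w = self.weight_mlp(q.unsqueeze(1) - k)  # (N, K, dim)
        a = torch.softmax(w, dim=1)              # normalize over neighbors
        return (a * v).sum(dim=1)                # (N, dim) aggregated output
```

In a mesh setting, the neighbor indices would come from edge adjacency (e.g., the edges sharing a face with a given edge); here they are simply taken as input.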
Data availability
Data sharing is not applicable to this article as no datasets were generated during the current study.
Funding
This work was supported by National Natural Science Foundation of China (61972327, 62272402), Natural Science Foundation of Fujian Province (2022J01001), Fundamental Research Funds for the Central Universities (20720220037), and Start-up Fund from BNU-HKBU United International College (UICR0700052-23).
Ethics declarations
Conflict of interest
The authors declare that they have no conflict of interest.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Zhou, P., Dong, X., Cao, J. et al. MeT: mesh transformer with an edge. Vis Comput 39, 3235–3246 (2023). https://doi.org/10.1007/s00371-023-02966-z