
MeT: mesh transformer with an edge

Original article · The Visual Computer

Abstract

Transformers have been widely applied to vision tasks on various kinds of data, such as images, videos, and point clouds. However, their use in 3D mesh analysis remains largely unexplored. To address this gap, we propose a mesh transformer (MeT) that applies local self-attention to mesh edges. MeT is built on a transformer layer that uses vector attention for edges, an attention operator that adaptively modulates both whole feature vectors and individual feature channels. On top of this transformer block, we build a lightweight mesh transformer network consisting of an encoder and a decoder, which provides a general backbone for downstream 3D mesh analysis tasks. To evaluate the effectiveness of MeT, we conduct experiments on two classic mesh analysis tasks: shape classification and shape segmentation. MeT achieves state-of-the-art performance on multiple datasets for both tasks. We also conduct ablation studies to demonstrate the effectiveness of the key designs in our network.
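To make the edge-based vector attention concrete, below is a minimal NumPy sketch of per-channel attention over an edge's neighborhood, in the spirit of the operator the abstract describes. This is an illustration, not the authors' implementation: the projection matrices W_q, W_k, W_v, the relation function gamma, and the neighbor index array nbr_idx are assumptions introduced for the example. The key point is that the softmax produces a separate weight for every feature channel, rather than a single scalar weight per neighbor as in dot-product attention.

```python
import numpy as np

def softmax(x, axis):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def edge_vector_attention(feats, nbr_idx, W_q, W_k, W_v, gamma):
    """feats: (E, C) edge features; nbr_idx: (E, K) indices of each
    edge's K neighboring edges. Returns updated (E, C) features."""
    q = feats @ W_q                    # (E, C) query per edge
    k = (feats @ W_k)[nbr_idx]         # (E, K, C) keys of neighbors
    v = (feats @ W_v)[nbr_idx]         # (E, K, C) values of neighbors
    rel = q[:, None, :] - k            # (E, K, C) query-key relations
    logits = gamma(rel)                # (E, K, C): one logit per channel,
                                       # unlike scalar dot-product attention
    attn = softmax(logits, axis=1)     # normalize over the K neighbors
    return (attn * v).sum(axis=1)      # channel-wise weighted aggregation

# Toy usage: 6 edges, 4 channels, 3 neighbors per edge.
rng = np.random.default_rng(0)
E, C, K = 6, 4, 3
feats = rng.standard_normal((E, C))
nbr_idx = rng.integers(0, E, size=(E, K))
W = [rng.standard_normal((C, C)) * 0.1 for _ in range(3)]
out = edge_vector_attention(feats, nbr_idx, *W, gamma=np.tanh)
print(out.shape)  # (6, 4)
```

In a real network, gamma would be a small learned MLP and the output would pass through normalization and residual connections; the sketch only shows how per-channel attention weights modulate neighboring edge features.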




Data availability

Data sharing is not applicable to this article as no datasets were generated during the current study.


Funding

This work was supported by the National Natural Science Foundation of China (61972327, 62272402), the Natural Science Foundation of Fujian Province (2022J01001), the Fundamental Research Funds for the Central Universities (20720220037), and a Start-up Fund from BNU-HKBU United International College (UICR0700052-23).

Author information


Corresponding author

Correspondence to Zhonggui Chen.

Ethics declarations

Conflict of interest

The authors declare that they have no conflict of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Zhou, P., Dong, X., Cao, J. et al. MeT: mesh transformer with an edge. Vis Comput 39, 3235–3246 (2023). https://doi.org/10.1007/s00371-023-02966-z

