DOI: 10.1145/3650400.3650475

An Efficient Multi-view Stereo Reconstruction Method Based On MA-MVSNet

Published: 17 April 2024

Abstract

To address the problems of low completeness and poor accuracy in multi-view stereo reconstruction, we propose MA-MVSNet, a novel multi-view reconstruction network based on an improved attention mechanism. The network takes the basic MVSNet as its backbone and introduces Local-grouped Self-attention (LGSA) and Global Adaptive Average-pooling Attention (GAAA) into the reconstruction framework, giving the network both long-range dependency modeling and a local receptive field. This resolves the inability of existing convolutional neural network-based methods to efficiently model the global contextual information of images, and it improves reconstruction quality. Experiments show that the proposed network achieves excellent performance on the DTU dataset, especially in terms of reconstruction completeness: compared with the benchmark network MVSNet, it improves reconstruction accuracy by 5% and reconstruction completeness by 50%.
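
The page carries no implementation details beyond this abstract, so the following Python (PyTorch) sketch is only one plausible reading of the two attention blocks it names: LGSA rendered as self-attention restricted to non-overlapping spatial windows (the local receptive field), and GAAA rendered as a squeeze-and-excitation-style channel gate driven by global adaptive average pooling (the global context). The module names, window size, head count, and reduction ratio below are illustrative assumptions, not the authors' code.

# Hypothetical sketch of the two attention blocks described in the abstract.
# LGSA: self-attention computed only inside local windows; GAAA: channel
# attention derived from global adaptive average pooling. All names and
# hyperparameters are assumptions, not the authors' implementation.
import torch
import torch.nn as nn


class LocalGroupedSelfAttention(nn.Module):
    """Multi-head self-attention within non-overlapping ws x ws windows."""

    def __init__(self, dim: int, num_heads: int = 4, ws: int = 8):
        super().__init__()
        self.ws = ws
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, C, H, W)
        b, c, h, w = x.shape
        ws = self.ws
        assert h % ws == 0 and w % ws == 0, "H and W must be divisible by ws"
        # Partition the feature map into (B * num_windows, ws*ws, C) tokens.
        t = x.view(b, c, h // ws, ws, w // ws, ws)
        t = t.permute(0, 2, 4, 3, 5, 1).reshape(-1, ws * ws, c)
        out, _ = self.attn(t, t, t)  # attention only inside each window
        # Reverse the window partition back to (B, C, H, W).
        out = out.view(b, h // ws, w // ws, ws, ws, c)
        out = out.permute(0, 5, 1, 3, 2, 4).reshape(b, c, h, w)
        return x + out  # residual connection


class GlobalAvgPoolAttention(nn.Module):
    """Squeeze-and-excitation-style channel gate from global average pooling."""

    def __init__(self, dim: int, reduction: int = 4):
        super().__init__()
        self.gate = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),              # global context: (B, C, 1, 1)
            nn.Conv2d(dim, dim // reduction, 1),  # squeeze
            nn.ReLU(inplace=True),
            nn.Conv2d(dim // reduction, dim, 1),  # excite
            nn.Sigmoid(),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, C, H, W)
        return x * self.gate(x)  # reweight channels by global statistics


if __name__ == "__main__":
    feat = torch.randn(1, 32, 64, 80)  # e.g. a feature map from the backbone
    feat = LocalGroupedSelfAttention(32)(feat)
    feat = GlobalAvgPoolAttention(32)(feat)
    print(feat.shape)  # torch.Size([1, 32, 64, 80])

In an MVSNet-style pipeline, such blocks would most naturally sit in the 2D feature extractor before cost-volume construction, which is consistent with the abstract's claim that the attention improves global context modeling ahead of depth inference.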

References

[1]
Fangjinhua Wang, Silvano Galliani, et al. 2021. PatchmatchNet: Learned multi-view patchmatch stereo. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 14194–14203.
[2]
Jingyang Zhang, et al. 2022. Visibility-aware multi-view stereo network. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 768–779.
[3]
Yao Yao, Zixin Luo, Shiwei Li, et al. 2018. MVSNet: Depth inference for unstructured multi-view stereo. In Proceedings of the European Conference on Computer Vision, pages 767–783.
[4]
Yao Yao, Zixin Luo, et al. 2019. Recurrent MVSNet for high-resolution multi-view stereo depth inference. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 5525–5534.
[5]
Xiaodong Gu, Zhiwen Fan, et al. 2020. Cascade cost volume for high-resolution multi-view stereo and stereo matching. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2495–2504.
[6]
Rui Chen, Songfang Han, et al. 2019. Point-based multi-view stereo network. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 1538–1547.
[7]
Shuo Cheng, et al. 2020. Deep stereo using adaptive thin volume representation with uncertainty awareness. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2524–2534.
[8]
Youze Xue, Jiansheng Chen, et al. 2019. MVSCRF: Learning multi-view stereo with conditional random fields. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 4312–4321.
[9]
Keyang Luo, Tao Guan, et al. 2019. P-MVSNet: Learning patch-wise matching confidence aggregation for multi-view stereo. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 10452–10461.
[10]
Xinjun Ma, Yue Gong, et al. 2021. EPP-MVSNet: Epipolar-assembling based depth prediction for multi-view stereo. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 5732–5740.
[11]
Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, et al. 2020. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929.
[12]
Jianfeng Yan, Zizhuang Wei, et al. 2020. Dense hybrid recurrent multi-view stereo net with dynamic consistency checking. In European Conference on Computer Vision, pages 674–689. Springer.
[13]
Ashish Vaswani, et al. 2017. Attention is all you need. In Advances in Neural Information Processing Systems, pages 5998–6008.
[14]
Jifeng Dai, et al. 2017. Deformable convolutional networks. In Proceedings of the IEEE International Conference on Computer Vision, pages 764–773.
[15]
Henrik Aanæs, Rasmus Ramsbøl Jensen, et al. 2016. Large-scale data for multiple-view stereopsis. International Journal of Computer Vision, 120(2):153–168.
[16]
Paul Merrell, Amir Akbarzadeh, et al. 2007. Real-time visibility-based fusion of depth maps. In Proceedings of the IEEE International Conference on Computer Vision.
[17]
Yikang Ding, et al. 2022. TransMVSNet: Global context-aware multi-view stereo network with transformers. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8585–8594.
[18]
Silvano Galliani, Katrin Lasinger, and Konrad Schindler. 2015. Massively parallel multiview stereopsis by surface normal diffusion. In Proceedings of the IEEE International Conference on Computer Vision, pages 873–881.
[19]
Johannes L. Schönberger, Enliang Zheng, et al. 2016. Pixelwise view selection for unstructured multi-view stereo. In European Conference on Computer Vision, pages 501–518. Springer.
[20]
Wenhai Wang, Enze Xie, et al. 2021. Pyramid vision transformer: A versatile backbone for dense prediction without convolutions. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 568–578.
[21]
Wenhai Wang, Enze Xie, et al. 2022. PVT v2: Improved baselines with pyramid vision transformer. Computational Visual Media, 8(3):415–424.
[22]
Xiangxiang Chu, Zhi Tian, et al. 2021. Conditional positional encodings for vision transformers. arXiv preprint arXiv:2102.10882.
[23]
Ze Liu, Yutong Lin, Yue Cao, et al. 2021. Swin Transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 10012–10022.
[24]
Paul-Edouard Sarlin, et al. 2020. SuperGlue: Learning feature matching with graph neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4938–4947.


Published In

EITCE '23: Proceedings of the 2023 7th International Conference on Electronic Information Technology and Computer Engineering
October 2023
1809 pages
ISBN:9798400708305
DOI:10.1145/3650400
Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].

Publisher

Association for Computing Machinery

New York, NY, United States


Qualifiers

  • Research-article
  • Research
  • Refereed limited

Conference

EITCE 2023

Acceptance Rates

Overall Acceptance Rate 508 of 972 submissions, 52%

