PVT-Unet: Road Extraction in Remote Sensing Imagery Based on U-shaped Pyramid Vision Transformer Neural Network

Authors: Youqiang Xiong, Shubo Zhang

ICIGP '24: Proceedings of the 2024 7th International Conference on Image and Graphics Processing

Pages 199 - 204

https://doi.org/10.1145/3647649.3647682

Published: 03 May 2024

Abstract

Road extraction from remote sensing images has become a prominent research topic in autonomous driving and smart city construction. In recent years, as computing power has grown, deep learning has been widely adopted in this field, and convolutional neural networks are commonly used to extract roads. However, because roads in remote sensing images are easily occluded by trees and buildings, the roads extracted by these methods are often fragmented. This paper presents PVT-Unet, a U-shaped neural network based on the Pyramid Vision Transformer. The network combines the Transformer's long-term learning capability with the multi-scale feature extraction capability of a U-shaped network to predict roads accurately. Experimental results show that PVT-Unet outperforms state-of-the-art methods on all evaluation metrics on the Istanbul City Road Dataset. The source code is publicly available at https://github.com/XYQ1517/PVT-Unet.
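To illustrate what "multi-scale" means in a U-shaped network built on a pyramid encoder, the following minimal sketch traces feature-map shapes only. It is not the authors' implementation. The strides (4, 8, 16, 32) and channel widths (64, 128, 320, 512) are assumptions borrowed from the original Pyramid Vision Transformer design, and the names `FeatureMap`, `encoder_pyramid`, and `decoder_path` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureMap:
    name: str
    channels: int
    height: int
    width: int


def encoder_pyramid(height, width,
                    strides=(4, 8, 16, 32),          # assumed, as in the original PVT
                    channels=(64, 128, 320, 512)):   # assumed, not taken from this paper
    """Return the shape of the feature map produced by each encoder stage."""
    maps = []
    for i, (stride, ch) in enumerate(zip(strides, channels), start=1):
        if height % stride or width % stride:
            raise ValueError("input size must be divisible by every stride")
        maps.append(FeatureMap(f"enc{i}", ch, height // stride, width // stride))
    return maps


def decoder_path(pyramid):
    """Walk from the coarsest stage to the finest one.

    At each step the current map is upsampled by 2x and fused with the
    encoder map of the same resolution (the skip connection of a U-shape).
    Only shapes are tracked; no learned layers are modeled here.
    """
    steps = []
    current = pyramid[-1]
    for skip in reversed(pyramid[:-1]):
        up_h, up_w = current.height * 2, current.width * 2
        if (up_h, up_w) != (skip.height, skip.width):
            raise ValueError("upsampled map does not match skip resolution")
        current = FeatureMap(f"dec_{skip.name}", skip.channels, up_h, up_w)
        steps.append((skip.name, current))
    return steps


if __name__ == "__main__":
    pyramid = encoder_pyramid(512, 512)
    for fmap in pyramid:
        print(fmap)
    for skip_name, fused in decoder_path(pyramid):
        print(f"fuse with {skip_name} -> {fused}")
```

Under these assumptions, a 512 x 512 input yields encoder maps at four resolutions, and the decoder climbs back up one resolution at a time, reusing the encoder map at each level.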

Index Terms

PVT-Unet: Road Extraction in Remote Sensing Imagery Based on U-shaped Pyramid Vision Transformer Neural Network
1. Computing methodologies
  1. Artificial intelligence
    1. Computer vision

Published In

ICIGP '24: Proceedings of the 2024 7th International Conference on Image and Graphics Processing

January 2024

480 pages

ISBN: 9798400716720

DOI: 10.1145/3647649

Copyright © 2024 ACM.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

Publisher

Association for Computing Machinery

New York, NY, United States

Qualifiers

Research-article
Research
Refereed limited

Funding Sources

Beijing Information Science and Technology University

Conference

ICIGP 2024

ICIGP 2024: 2024 the 7th International Conference on Image and Graphics Processing

January 19 - 21, 2024

Beijing, China
