Article

RIRNet: A Direction-Guided Post-Processing Network for Road Information Reasoning

1 Jiangxi Institute of Land Space Survey and Planning, Nanchang 330029, China
2 School of Geosciences and Info-Physics, Central South University, Changsha 410083, China
3 Faculty of Geography, Yunnan Normal University, Kunming 650500, China
* Author to whom correspondence should be addressed.
Remote Sens. 2024, 16(14), 2666; https://doi.org/10.3390/rs16142666
Submission received: 26 May 2024 / Revised: 10 July 2024 / Accepted: 11 July 2024 / Published: 21 July 2024
(This article belongs to the Special Issue AI-Driven Mapping Using Remote Sensing Data)

Abstract

Road extraction from high-resolution remote sensing images (HRSIs) is a fundamental task in image analysis. Deep convolutional neural networks have become the primary method for road extraction due to their powerful feature representation capability. However, roads are often obscured by vegetation, buildings, and shadows in HRSIs, resulting in incomplete and discontinuous road extraction results. To address this issue, we propose a lightweight post-processing network called RIRNet in this study, which includes an information inference module and a road direction inference task branch. The information inference module can infer spatial relationships between different rows or columns of feature maps from different directions, effectively inferring and repairing road fractures. The road direction inference task branch performs the road direction prediction task, which can constrain and promote the road extraction task, thereby indirectly enhancing the inference ability of the post-processing model and optimizing the initial road extraction results. Experimental results demonstrate that the RIRNet model can achieve an excellent post-processing effect, which is manifested in the effective repair of broken road segments, as well as the handling of errors such as omission, misclassification, and noise, proving the effectiveness and generalization of the model in post-processing optimization.

Graphical Abstract

1. Introduction

In recent years, remote sensing imagery has exhibited a trend toward massive, multi-source, and high-resolution data, providing convenient, reliable, and high-quality data support for the task of high-precision road extraction [1,2]. Currently, road information extracted from remote sensing imagery has been widely applied in various fields, including urban planning, land management, traffic management, automated navigation, route analysis, and emergency response [3,4,5,6].
Traditional road extraction methods manually design effective features for the spectral, geometric, color, texture, and topological attributes of roads, and then machine learning algorithms such as clustering and classification are used to distinguish roads from the background [6,7]. Depending on the scale of the analysis unit, traditional methods can be categorized into pixel-based and object-based methods. However, these methods generally suffer from low automation and poor generalization, as manually designed road features are easily over-designed and incomplete [8]. In recent years, with the development of deep learning technology, deep convolutional neural networks (DCNNs) have shown excellent performance in road extraction tasks and have gradually become the mainstream road extraction technique. A significant portion of deep learning-based methods is built on the encoder–decoder design, which extracts road semantic features well and handles complex and varied roads [1,2].
Although deep learning techniques provide a reliable way to extract roads from high-resolution remote sensing images (HRSIs), occlusion remains one of the major issues affecting road extraction performance. Occlusions are caused by trees, vehicles, buildings, and shadows, which degrade the spectral, color, and texture consistency of roads to different degrees, directly leading to incomplete and discontinuous road extraction results [9,10,11,12,13,14,15]. Therefore, it is crucial in road extraction tasks to effectively maintain the integrity and continuity of roads and to enhance the model’s resistance to occlusions during the extraction process. Some researchers utilize context information to enhance the road semantic features of the occluded part, such as multi-scale representations that model the dependency between geo-objects and the background environment, and attention mechanisms that model the correlation between similar geo-objects [15,16,17]. However, in existing research, the multi-scale feature modules are underutilized and not coupled tightly enough with the feature learning process, and the number of parameters and the computation of the self-attention mechanism are too large for it to be applied to high-resolution feature maps. Other researchers directly optimize the initial road extraction results by post-processing [18], aiming to repair incomplete and discontinuous road segments. However, it is difficult for existing post-processing models to deal effectively with complete occlusion, and their computational complexity makes efficient post-processing optimization hard to achieve.
To this end, this study proposes a lightweight post-processing model named the Road Information Reasoning Network (RIRNet). It aims to effectively reason about the information in the initial road prediction results, repair broken road areas, and eliminate omissions and misclassifications. The model has a small number of parameters and can effectively optimize the extracted results. The contributions of this study are as follows: (1) A lightweight post-processing network is proposed to optimize the prediction results of road extraction and improve road network connectivity; (2) The proposed direction-guided information inference module can effectively model the spatial relationships of roads; (3) By setting up multi-task learning that includes a road direction inference task, RIRNet can be guided to repair broken roads.
The structure of this article is organized as follows. Section 2 provides a brief overview of relevant studies. Section 3 describes the proposed method in detail. Section 4.1, Section 4.2 and Section 4.3 describe the datasets, experimental settings and evaluation metrics. Section 4.4 and Section 4.5 analyze the results and perform ablation experiments. Section 5 contains conclusions and future work.

2. Related Works

2.1. Deep Learning-Based Road Extraction Methods

In recent years, DCNNs have rapidly attracted attention in the field of road extraction due to their strong nonlinear feature-learning ability. Some studies have improved the accuracy of road extraction by optimizing existing DCNN model structures. For example, Zhang et al. [19] combined the benefits of residual learning with UNet to engineer a residual UNet network for road recognition. Yang et al. [20] proposed a new recursive convolutional neural network UNet (RCNN-UNet), which effectively integrates spatial–semantic information and rich visual features to mitigate challenges posed by noisy and complex road backgrounds. Gao et al. [21] proposed an end-to-end multi-feature pyramid network framework, MFPN, which effectively leverages multi-level semantic features of HRSIs. Li et al. [22] proposed a new end-to-end deep learning model in which road features are learned at three different levels (pixel, boundary, and region), which can eliminate background interference and enhance feature representation. Chen et al. [23] introduced CR-HR-RoadNet, a network that integrates local and global contextual reasoning. This model features a road-adaptive high-resolution network as a feature encoder, which effectively preserves detailed and spatial information specific to roads.
To further enhance the accuracy of road extraction, other studies introduced multi-task learning strategies covering road surface, road centerline, and road boundary extraction. These related tasks are able to constrain and improve each other. For example, road surface extraction and road centerline extraction are interrelated to some extent: the extraction of road surfaces influences the emergence of the road centerline, while the road centerline reinforces the linear features in road surface extraction. Cheng et al. [24] introduced CasNet, a cascaded end-to-end convolutional neural network designed to simultaneously extract both the road surface and the centerline. CasNet comprises two networks: one dedicated to road surface extraction, leveraging its robust expressive capabilities to effectively handle complex backgrounds, while the other is cascaded with the first to fully utilize its feature maps and obtain smooth, complete, and single-pixel-width road centerline results. Liu et al. [25] proposed RoadNet, a road network extraction model for complex urban scenes based on HRSIs. This network simultaneously extracts the road surface, boundary and centerline; is able to automatically learn multi-scale and multi-level features to cope with road scenes in various scenarios and scales; and can generate approximately single-pixel-width road boundaries or centerlines without the need for non-maximum suppression post-processing. Yang et al. [20] designed two tasks, road surface extraction and centerline extraction, to improve the learning effect and efficiency; meanwhile, their RCNN module can better utilize the spatial context and rich underlying visual features, mitigating the problems of noise and complex road backgrounds. Lu et al. [26] proposed a novel multi-scale multi-task deep learning framework named MSMT-RE, which is capable of completing both the road and centerline extraction tasks, taking into account the relationship between the road surface and the road centerline. Li et al. [18] introduced a multi-level post-processing strategy based on a linear region growth algorithm to connect road breakpoints.

2.2. Road Connectivity Modeling

Encoder–decoder road extraction models have achieved relatively good results, but they still have some limitations. Since most road extraction models extract information within a local receptive field, it is difficult for them to effectively establish the topological relationships among road segments separated by occlusions [12,15]. Currently, there are two main solutions to the occlusion problem: one is to model the prior information of the road environment and improve the accuracy of road extraction by enhancing the feature learning of the model; the other is to post-process the road extraction results to fix incompleteness, discontinuity, misclassification, omission, and noise in the initial predictions.
For the first solution, background information, as one important aspect of prior knowledge, mainly includes the dependence between foreground objects and the background environment as well as the correlation between similar foreground objects. As effective auxiliary information, context can greatly enhance the anti-occlusion ability of a road extraction model. Currently, most of the relevant studies focus on utilizing multi-scale features and attention mechanisms to obtain effective road environment information. For example, D-LinkNet [27] is a UNet-like network that contains multi-scale dilated convolution modules. Gao et al. [21] proposed a customized pyramid pooling structure for strip-like roads based on UNet and atrous spatial pyramid pooling. Wu et al. [28] proposed a dense-global residual network that reduces the loss of spatial information and enhances context awareness; it constructs a dense-global spatial pyramid pooling model, based on atrous spatial pyramid pooling, to perceive and aggregate contextual information. Luo et al. [29] proposed the Bidirectional Transformer Network (BDTNet), a hybrid encoder–decoder approach that uses self-attention to capture long-distance feature dependencies and constructs a Bidirectional Transformer Module (BDTM) to capture contextual road information in feature maps at different scales. Liu et al. [30] used the self-attention mechanism of the Swin Transformer to capture contextual road information and used spatially and channel-separable convolutions to obtain fine-grained and global features. Ding et al. [31] proposed NFSNet, a non-local feature search network designed to improve the segmentation accuracy of roads. Zhu et al. [16] incorporated a global context module within an encoder–decoder model to efficiently integrate the global contextual features of roads using a self-attention mechanism, thereby enhancing the completeness of the generated road regions. Zhao et al. [32] proposed RFE-LinkNet, which includes a receptive field enhancement module to strengthen spatial information perception and capture long-range dependencies; additionally, they employed a dual-attention module to refine multi-scale features from different levels of feature maps.
Studies on this first road connectivity modeling approach mainly rely on multi-scale representations and attention mechanisms to capture road context information, but they suffer from insufficient utilization of multi-scale features and insufficiently tight coupling with feature learning, and the computational cost of attention mechanisms is typically very large.
As for the second, post-processing optimization approach, the road extraction accuracy is improved, and the integrity and continuity of the road network are enhanced, by post-processing the initial extraction results. For instance, Gao et al. [33] adopted a two-stage road extraction method: first, a refined deep residual convolutional neural network, consisting of a residual connection module and an extended perception module, was proposed and utilized to obtain pixel-level road segmentation results; second, post-processing connections for broken roads were implemented using mathematical morphology and tensor voting algorithms. Zhou et al. [15] proposed a boundary- and topology-aware road recognition network, BT-RoadNet. The network is a coarse-to-fine architecture consisting of a coarse prediction submodel (CMPM) and a fine prediction submodel (FMPM): the CMPM learns to predict coarse road extraction maps, while the FMPM refines these rough road maps by learning the differences between the initial road recognition results and the ground truth. Ding et al. [34] proposed a direction-aware residual network, DiResNet, which contains two sub-networks, DiResSeg and DiResRef. DiResSeg, through pixel-level local direction supervision, can effectively enhance the embedding of linear road features. DiResRef is a refinement sub-network designed to optimize segmentation results, addressing issues such as interruptions and errors in the initial segmentation. Wang et al. [11] proposed an inner-convolution-integrated encoder–decoder network and used a directional conditional random field to post-process the initial segmentation results. The inner convolution propagates information inside the feature map and enhances the learning of linear features; meanwhile, the road direction is added to the energy function of the fully connected conditional random field to enhance the accuracy and effectiveness of the post-processing. Wei et al. [13] introduced a multi-stage deep learning framework comprising three steps: segmentation, tracking, and fusion. Initially, an augmented segmentation network provides the initial result, and multiple tracking starting points are established using corner detection. Subsequently, road tracking is performed with an iterative search strategy in a convolutional neural network. Finally, the segmentation and tracking results are fused to generate the final road results.
The second line of road connectivity modeling uses post-processing strategies, such as graph models, line-tracing models, and conditional random fields, to optimize the road extraction results. Although these can effectively improve the accuracy of road extraction, their lack of information inference at road interruptions makes it difficult for existing post-processing models to handle complete occlusion effectively.

3. Methodology

The post-processing model RIRNet proposed in this study adopts a UNet-like encoder–decoder structure. In addition, we replace the convolutional layers in UNet with residual learning units, which stitch together the inputs and outputs via skip connections to improve the performance of feature representation and prevent network degradation. It is worth noting that the input of the proposed post-processing model includes the original remote sensing image data as well as the initial extraction results of the previous stage, which together constitute a four-channel input. Its structure is shown in Figure 1.
RIRNet consists of two main parts: the direction-guided information inference module and the road direction inference task branch. The former makes use of the structural and directional characteristics of roads to reason the spatial information of broken roads from different directions, which prompts the model to effectively build the spatial relationships between different rows or columns in feature images. The road inference task branch, on the other hand, uses the road direction inference as an auxiliary task, where the model outputs the prediction results of both the road surface and road direction and then utilizes the labeling information for loss supervision. The network loss is the sum of the losses of the road segmentation task and the road direction extraction task. Due to the strong correlation between road direction and road surface, the two tasks can constrain and promote each other to enhance the inference ability of the post-processing model and improve the extraction accuracy of the road surface.
At the same time, since the input of RIRNet contains the initial road extraction results, it is able to explicitly guide the learning and inference. Therefore, the use of the encoder–decoder structure and skip connections in the post-processing model does not lead to the loss of narrow road information or the introduction of irrelevant noise. Moreover, in order to ensure that the model does not lose the initial road information, this study adds the initial extraction results to the final skip connection operation.
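As an illustration, a minimal PyTorch sketch of how this four-channel input can be assembled is given below; the tensor names and the 256 × 256 patch size are illustrative assumptions, not fixed by the paper.

```python
import torch

# Illustrative shapes: one 256 x 256 patch with batch size 1.
image = torch.rand(1, 3, 256, 256)         # RGB remote sensing patch (B, 3, H, W)
initial_pred = torch.rand(1, 1, 256, 256)  # initial road extraction result (B, 1, H, W)

# RIRNet stacks both along the channel axis, so the post-processing
# network sees the first-stage prediction explicitly.
x = torch.cat([image, initial_pred], dim=1)  # four-channel input (B, 4, H, W)
```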

3.1. Direction-Guided Information Inference Module

Roads have very distinctive structural features: they are mostly narrow and long, with clear directions. Therefore, the roads in HRSIs usually show linear features with spatial continuity and a strong visual prior structure. In contrast, although DCNNs have strong feature extraction capabilities, their ability to explore the spatial relationships between rows and columns in feature maps is limited. Yet these relationships are important for learning objects with prior shapes, especially those with weak visual coherence; e.g., roads are prone to being completely obscured by trees, buildings, and other features, resulting in discontinuity. Therefore, maintaining the continuity of roads is an urgent problem in road extraction.
Inspired by [35,36], this study uses a direction-guided information inference module to explicitly guide the post-processing model to repair broken road segments. This module converts the traditional “layer-by-layer” convolution of convolutional layers into a “slice-by-slice” convolution of feature maps, which allows pixels in the feature maps to transfer information between rows and columns. Specifically, the information inference module herein considers each row or column of the feature map as a separate object, performs convolution operations on each row or column of the feature map in turn, and uses the processing result of the current row or column as auxiliary information for processing the next row or column, so as to realize information inference in the spatial dimension. This allows spatial information to be propagated within the same feature map, thus enhancing spatial information. This is suitable for recognizing long-distance continuous shape objects with strong spatial relationships but poor visual cues, such as roads.
The specific structure of the direction-guided information inference module is shown in Figure 2, and it is mainly divided into four stages of convolution operations. The module slices the feature map by rows or columns and then sequentially performs convolution in a given direction order (up, down, left, or right) to cooperatively extract road semantic features in different directions (rows or columns). Specifically, the feature map F ∈ R^{C×H×W} is first sliced by rows to obtain the multi-row independent feature tensors F_r ∈ R^{C×1×W}, and the sliced feature map passes sequentially through two inference processes, top-to-bottom and bottom-to-top. Each inference process performs a convolution operation on each row in turn, and the convolution result of the current row is summed element-by-element with the data of the next row to obtain the input for processing the next row, realizing information transfer between rows. After the two convolution passes in different directions, all row tensors are recombined to obtain the feature map F′ ∈ R^{C×H×W}. Then, the feature map F′ is sliced by columns to obtain the multi-column independent feature tensors F_c ∈ R^{C×H×1}, which pass through two inference processes, left-to-right and right-to-left, analogous to the first two. Finally, all column tensors are recombined to obtain the final output G ∈ R^{C×H×W}.
The specific calculations for the top-to-bottom reasoning stage are as follows, and so on for the remaining three directions:
$$F'_{r_i} = \begin{cases} \mathrm{ReLU}\big(\mathrm{Conv}(F_{r_i})\big), & i = 1 \\ \mathrm{ReLU}\big(\mathrm{Conv}(F_{r_i} + F'_{r_{i-1}})\big), & i = 2, 3, \ldots, H \end{cases}$$
where $F_{r_i}$ denotes the feature tensor of the $i$-th row after slicing by rows, $F'_{r_i}$ denotes the feature tensor of the $i$-th row after information inference, $\mathrm{Conv}(\cdot)$ denotes the convolution operation, and $\mathrm{ReLU}(\cdot)$ denotes the nonlinear activation operation.
The spatial relationship between rows and columns in the feature map can be effectively modeled after information inference by this module. The spatial structure information can be augmented by propagation between rows or columns, enabling the model to effectively use the road information on both sides of the occlusion for inference, and to supplement the information at the breaks to repair discontinuous roads for effective post-processing optimization. The module can be easily integrated into any part of the convolutional neural network, but the feature map in the top layer of the neural network model contains rich road semantic information. Therefore, in the road information inference network, this study applies the information inference module to feature maps in the topmost layer of the encoder, and it carries out the transfer of road semantic information through the spatial relation inference in four directions.
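To make the mechanism concrete, the following PyTorch sketch implements the slice-by-slice propagation in the four directions according to the formula above; the kernel width and the use of a separate convolution per direction are our assumptions rather than the paper's exact configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DirectionGuidedInference(nn.Module):
    # Sketch of the direction-guided information inference module in the
    # spirit of SCNN [35]: four sequential passes (top-to-bottom,
    # bottom-to-top, left-to-right, right-to-left) let each row or column
    # receive information from its predecessor.

    def __init__(self, channels: int, kernel: int = 9):
        super().__init__()
        pad = kernel // 2
        # Row passes slide a (1, k) kernel along the width; column passes
        # slide a (k, 1) kernel along the height.
        self.conv_down = nn.Conv2d(channels, channels, (1, kernel), padding=(0, pad))
        self.conv_up = nn.Conv2d(channels, channels, (1, kernel), padding=(0, pad))
        self.conv_right = nn.Conv2d(channels, channels, (kernel, 1), padding=(pad, 0))
        self.conv_left = nn.Conv2d(channels, channels, (kernel, 1), padding=(pad, 0))

    @staticmethod
    def _pass(x, conv, dim, reverse):
        # Slice along `dim` (2 = rows, 3 = columns) and propagate:
        # slice_i' = ReLU(Conv(slice_i + slice_{i-1}')), as in the formula above.
        slices = list(torch.unbind(x, dim=dim))
        order = reversed(range(len(slices))) if reverse else range(len(slices))
        prev = None
        for i in order:
            s = slices[i].unsqueeze(dim)
            if prev is not None:
                s = s + prev
            prev = F.relu(conv(s))
            slices[i] = prev.squeeze(dim)
        return torch.stack(slices, dim=dim)

    def forward(self, x):
        x = self._pass(x, self.conv_down, dim=2, reverse=False)   # top-to-bottom
        x = self._pass(x, self.conv_up, dim=2, reverse=True)      # bottom-to-top
        x = self._pass(x, self.conv_right, dim=3, reverse=False)  # left-to-right
        x = self._pass(x, self.conv_left, dim=3, reverse=True)    # right-to-left
        return x
```

In RIRNet, as noted above, this operation is applied to the topmost encoder feature map, where road semantic information is richest.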

3.2. Road Direction Inference Task Branch

Directionality is an important attribute of roads and is key prior knowledge for road extraction. Some existing studies [11,34] found that there is a close correlation between road direction inference and road surface extraction, and these two tasks can constrain and promote each other. Learning the road direction is conducive to improving the connectivity of the road extraction results. Therefore, in addition to adopting the direction-guided information reasoning module inside the model, this study takes road direction reasoning as an important auxiliary task. The post-processing model is guided to learn the direction of the road through task supervision so as to effectively enhance the reasoning ability of the post-processing model and improve the road repair effect. In this study, based on the binary road labels, a generation algorithm is used to obtain road direction labels [34] for multi-task learning, as shown in Algorithm 1. The main idea of the algorithm is to count the road pixels along different directions in a certain neighborhood of a pixel; the direction with the most road pixels is taken as the road direction of that pixel. The algorithm takes the real binary road label as input and outputs the road direction labeling data. Figure 3 shows an example of road direction labeling data.
This algorithm needs two key parameters: the detection radius and the angular step. The detection radius determines the neighborhood range for the calculation, and the angular step determines the number of directions considered. The specific values of these two parameters are determined by the minimum and maximum road widths in the dataset. Specifically, this study first sets the detection radius and angular step for each dataset and generates valid road direction labels. Then, the directions of all road pixels are normalized to four main directions (east–west, south–north, southwest–northeast, and southeast–northwest). In this study, RIRNet is trained in a multi-task learning setting, including a road extraction task and a road direction inference task. In particular, the road direction inference task is used as an additional auxiliary task in the training process of the post-processing model. This task enables RIRNet to reason more accurately and efficiently about occluded and complex road areas, repair feature representations at breaks, and eliminate mispredictions such as road omissions, misclassifications, and noise in the initial extraction results.
Algorithm 1. Road direction label generation
Input: the real binary road label map T
Parameters: detection radius r, angular step Δθ
Output: the road direction label map T_d
for each pixel T(i, j) in T do
  if T(i, j) = 1 then
    for θ = 0 to π step Δθ do
      d_θ(i, j) = Σ_{ρ=1}^{r} [ T(i + ρ sin θ, j + ρ cos θ) + T(i − ρ sin θ, j − ρ cos θ) ]
    end for
    find θ_max such that d_{θ_max}(i, j) = max{ d_θ(i, j) }, θ ∈ [0, π)
    T_d(i, j) = θ_max
  else
    T_d(i, j) = invalid
  end if
end for
return T_d
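A minimal NumPy sketch of this label generation, including the subsequent normalization to the four main directions, is given below; the radius, the number of sampled angles, and the value −1 for "invalid" are illustrative assumptions.

```python
import numpy as np

def direction_labels(T: np.ndarray, r: int = 5, n_angles: int = 8) -> np.ndarray:
    # Sketch of Algorithm 1: for each road pixel, count road pixels along
    # lines through it at angles in [0, pi); the densest angle gives the
    # direction, which is then bucketed into the four main directions.
    H, W = T.shape
    Td = np.full((H, W), -1, dtype=np.int32)  # -1 marks "invalid" (non-road)
    thetas = np.arange(n_angles) * np.pi / n_angles
    for i, j in zip(*np.nonzero(T)):
        counts = np.zeros(n_angles)
        for k, theta in enumerate(thetas):
            for rho in range(1, r + 1):
                di = int(round(rho * np.sin(theta)))
                dj = int(round(rho * np.cos(theta)))
                # Count road pixels on both sides of (i, j) along theta.
                for ii, jj in ((i + di, j + dj), (i - di, j - dj)):
                    if 0 <= ii < H and 0 <= jj < W:
                        counts[k] += T[ii, jj]
        # Bucket theta_max into E-W, NE-SW, N-S, NW-SE (ids 0..3).
        Td[i, j] = (int(np.argmax(counts)) * 4) // n_angles
    return Td
```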

3.3. Loss Function

As RIRNet is trained with both the road segmentation task and the road direction inference task, the overall loss function is the sum of the two task losses:
$$\mathrm{Loss} = \mathrm{Loss}_{seg} + \mathrm{Loss}_{dir}$$
For the road segmentation task, two loss functions commonly used in semantic segmentation, the Binary Cross-Entropy Loss (BCE Loss) and the Dice Loss, are adopted in this study to realize pixel-level supervised constraints. BCE Loss is one of the most commonly used loss functions in road extraction tasks and measures the discrepancy between two probability distributions. It computes the pixel-level difference between the prediction results and the real label values, which is used for training the model parameters. The specific calculation formula is as follows:
$$\mathrm{Loss}_{BCE} = -\frac{1}{N} \sum_{i=1}^{N} \left[ y^{(i)} \log \hat{y}^{(i)} + \left(1 - y^{(i)}\right) \log\left(1 - \hat{y}^{(i)}\right) \right]$$
where $y^{(i)}$ represents the true label value of pixel $i$, which can only take the value 0 or 1; $\hat{y}^{(i)}$ represents the probability that pixel $i$ is predicted as road; and $N$ represents the number of pixels in all samples.
The Dice coefficient is usually used as a similarity measure function to calculate the similarity of two samples, taking values ranging from 0 to 1. The Dice loss function is a commonly used classification loss function based on the Dice coefficient, which is calculated as follows:
$$\mathrm{Loss}_{Dice} = 1 - \frac{2\,|X \cap Y|}{|X| + |Y|}$$

$$\mathrm{Loss}_{seg} = \alpha\, \mathrm{Loss}_{BCE} + (1 - \alpha)\, \mathrm{Loss}_{Dice}$$
where $X$ represents the model's predicted result image and $Y$ represents the real label image; $|X \cap Y|$ is the intersection of $X$ and $Y$, approximated as the sum of the element-wise product of the predicted result image and the real label image; and $|X|$ and $|Y|$ denote the sums of the elements of $X$ and $Y$, respectively. The factor of 2 in the numerator compensates for the common elements of $X$ and $Y$ being counted twice in the denominator. The segmentation loss combines BCE Loss and Dice Loss, with $\alpha$ a hyperparameter that balances the two loss functions.
Meanwhile, since the road direction inference task distinguishes four directions, Cross-Entropy Loss (CE Loss) is selected as the loss function for the auxiliary task in this study to measure the difference between the probability distribution of the model outputs and the true labels in this multi-class problem. The specific formula is as follows:
$$\mathrm{Loss}_{dir} = \mathrm{Loss}_{CE} = -\sum_{i=1}^{N} y^{(i)} \log \hat{y}^{(i)}$$
where $y^{(i)}$ represents the true direction label of pixel $i$; $\hat{y}^{(i)}$ represents the predicted probability of the corresponding direction class for pixel $i$; and $N$ represents the number of pixels in all samples.
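Putting the pieces together, a sketch of the combined objective in PyTorch might look as follows; the value of α, the Dice smoothing term, and the use of −1 as the ignored "invalid" direction index are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def rirnet_loss(seg_logits, dir_logits, seg_gt, dir_gt, alpha=0.5, eps=1e-6):
    # seg_logits: (B, 1, H, W); seg_gt: (B, 1, H, W) binary road mask.
    # dir_logits: (B, 4, H, W); dir_gt: (B, H, W) direction ids, -1 = invalid.
    prob = torch.sigmoid(seg_logits)

    # Road segmentation: weighted sum of BCE Loss and Dice Loss.
    bce = F.binary_cross_entropy(prob, seg_gt.float())
    inter = (prob * seg_gt).sum()
    dice = 1 - (2 * inter + eps) / (prob.sum() + seg_gt.sum() + eps)
    loss_seg = alpha * bce + (1 - alpha) * dice

    # Road direction inference: cross-entropy over the four direction
    # classes, ignoring non-road ("invalid") pixels.
    loss_dir = F.cross_entropy(dir_logits, dir_gt.long(), ignore_index=-1)

    return loss_seg + loss_dir
```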

4. Experiments and Result Analysis

4.1. Datasets

In order to validate the effect and performance of the proposed road extraction methodology, three open-source road extraction datasets are selected for the model accuracy evaluation in this study: the Massachusetts road dataset [37], the DeepGlobe road dataset [38], and the CHN6-CUG road dataset [16]. In addition, in order to verify the performance and generalization ability of the model, we construct a dataset based on remote sensing imagery and road labels in Anyi County, Jiangxi Province, China to fine-tune and evaluate the proposed model. Examples from the datasets are shown in Figure 4.

4.1.1. Massachusetts Dataset

The Massachusetts road dataset [37] comprises airborne remote sensing images collected in Massachusetts, USA, with each image covering approximately 2.25 square kilometers. The roads in these images are not large-scale and are relatively homogeneous, yet they feature numerous occlusions, providing a robust testing ground for road extraction models. The dataset consists of 1171 images, with 1108 images allocated for model training, 14 images for validation, and 49 images for testing. Each remote sensing image in the dataset has a spatial resolution of 1 m and dimensions of 1500 × 1500 pixels.
Considering hardware limitations, the 1500 × 1500 remote sensing images in the training and validation sets are randomly cropped into 256 × 256 image blocks. This process yields a total of 20,000 images for model training and 500 images for model validation. Meanwhile, during model training, this study randomly applies data augmentations such as rotation and flipping to the training set to enhance the generalization ability of the model.
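A minimal sketch of this patch sampling and augmentation is given below; the exact augmentation set is an assumption.

```python
import random
import numpy as np

def random_patch(image: np.ndarray, label: np.ndarray, size: int = 256):
    # Randomly crop a size x size training patch from a (C, H, W) image
    # and its (H, W) label, then apply random rotation/flip augmentation.
    H, W = label.shape
    y = random.randint(0, H - size)
    x = random.randint(0, W - size)
    img = image[:, y:y + size, x:x + size]
    lbl = label[y:y + size, x:x + size]
    k = random.randint(0, 3)  # random multiple of a 90-degree rotation
    img = np.rot90(img, k, axes=(1, 2))
    lbl = np.rot90(lbl, k)
    if random.random() < 0.5:  # random horizontal flip
        img = img[:, :, ::-1]
        lbl = lbl[:, ::-1]
    return np.ascontiguousarray(img), np.ascontiguousarray(lbl)
```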

4.1.2. DeepGlobe Dataset

DeepGlobe [38] is a satellite remote sensing image dataset for road extraction tasks. It includes geographic scenes such as urban and suburban areas, and the images are rich in road types, with a complex background environment and numerous road occlusions. These characteristics present significant challenges for road extraction, allowing for a comprehensive evaluation of model performance. The original dataset contains 8570 three-channel satellite remote sensing images with 6226 images having corresponding ground truth labels. Each image is 1024 × 1024 pixels in size with a spatial resolution of 50 cm.
In this study, the remote sensing images containing real labels are proportionally divided into training, validation and test sets. As a result, the training set comprises 5000 images, the validation set comprises 226 images, and the test set comprises 1000 images. Likewise, the 1024 × 1024 remote sensing images in the training and validation sets are randomly cropped into several 256 × 256 image blocks, resulting in 25,000 images for model training and 1130 images for model validation. In the same way, during the model training process, data augmentations are randomly performed on the training images to expand the dataset and enhance the model’s generalization ability.

4.1.3. CHN6-CUG Dataset

The CHN6-CUG dataset [16] is a large satellite image dataset covering cities in China including Beijing, Shanghai, Wuhan, Shenzhen, Hong Kong, and Macao. The remote sensing images are sourced from Google Earth and include various types of roads, such as railroads, highways, urban roads, and rural roads. Moreover, the dataset features diverse road scales, complex backgrounds, and significant differences between road types, providing a comprehensive test for road extraction models. The CHN6-CUG road dataset contains a total of 4511 remote sensing images, each sized 512 × 512 pixels, along with corresponding ground truth labels. Among them, 3608 images are used for model training and 903 images for testing and accuracy evaluation, and the spatial resolution of the images is 50 cm. All the 512 × 512 images in the training set are randomly cropped into several 256 × 256 image blocks, resulting in a total of 23,000 images for model training. Meanwhile, in the process of model training, data augmentations are randomly applied to the training set images to expand the dataset and enhance the model’s generalization ability.

4.1.4. Anyi County Dataset

Anyi is a county-level administrative region belonging to Jiangxi Province, China, and the types of roads in this region mainly include county-level roads and rural roads, which are relatively narrow, smaller in scale and more uniform in remote sensing images, as shown in Figure 5. There are many road interruptions in this dataset, so it can be used to test the ability of the proposed model to maintain road connectivity. The spatial resolution of the remote sensing images in this dataset is 0.8 m. The road results were manually corrected based on the land survey in Anyi County to obtain the corresponding road labels. By cropping the remote sensing images and road labels to a size of 512 × 512, we finally obtained a total of 537 remote sensing image–road label image pairs. This study uses the strategy of fine-tuning the pre-trained model to test the performance of the proposed model in road extraction, where 430 image–label pairs are used for model fine-tuning, and 107 image–label pairs are used for testing.

4.2. Experimental Settings

All experiments in this paper are implemented with the PyTorch framework. In order to validate the performance of the road information inference network, several related DCNNs are selected as pre-processing segmentation models, including UNet [39], DeepLabV3+ [40], OCNet [41], DANet [42], ResUNet [19], and RFE-LinkNet [32], and the proposed road information inference network RIRNet is used as a post-processing model to optimize the prediction results of all of these pre-processing models. The chosen pre-processing models encompass classical encoder–decoder architectures like UNet and ResUNet, as well as DANet and OCNet, which capture contextual information using attention mechanisms. Additionally, RFE-LinkNet, which integrates an FCN with an attention module, is included. The experimental settings are designed to evaluate the performance of RIRNet’s post-processing optimization across different models. All of these road extraction models are individually trained and tested on each of the three open-source road datasets. The specific experimental setup is as follows: the Adam optimizer with momentum parameters (β1, β2) of 0.5 and 0.999 is adopted for training, the parameter weights are set via random initialization, all network learning rates are initialized to 1 × 10−4, the batch size is set to 16, and training runs for 100 epochs. A poly learning strategy is used to dynamically adjust the learning rate during training.
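For illustration, the optimizer and poly learning-rate schedule can be set up as follows; the stand-in model, our reading of the 0.5/0.999 values as Adam's beta parameters, and the poly exponent of 0.9 are assumptions.

```python
import torch
import torch.nn as nn

# Stand-in module; in practice this would be RIRNet with its 4-channel input.
model = nn.Conv2d(4, 1, kernel_size=3, padding=1)

optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, betas=(0.5, 0.999))

# Poly decay over 100 epochs: lr(t) = lr0 * (1 - t/T)^0.9.
epochs = 100
scheduler = torch.optim.lr_scheduler.LambdaLR(
    optimizer, lr_lambda=lambda e: (1 - e / epochs) ** 0.9
)

for epoch in range(epochs):
    # ... one pass over the 256 x 256 training patches with batch size 16 ...
    scheduler.step()
```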
In addition, this study uses the Anyi County dataset to validate and evaluate the proposed model. Due to the limited number of samples, this paper adopts the strategy of fine-tuning a pre-trained model for road extraction on the Anyi County dataset. The large-scale DeepGlobe dataset is first used to pre-train the pre-processing models and RIRNet, and then the Anyi County dataset is used to fine-tune the models for 10 epochs, with the other experimental parameters unchanged. The comparative experiments are designed to validate the road optimization effect of the post-processing model on different road extraction models and different study areas.

4.3. Evaluation Indicators

Two commonly used and effective metrics, F1-score and Intersection over Union (IoU), are adopted to evaluate the performance and accuracy of the proposed methodology; the higher their values, the better the performance of the road extraction model. Both metrics are calculated from the confusion matrix. The F1-score combines Precision and Recall, aiming to balance the two, and is usually used for model evaluation when the data categories are unbalanced. The formulas are as follows:
$$\mathrm{Precision} = \frac{TP}{TP + FP}$$

$$\mathrm{Recall} = \frac{TP}{TP + FN}$$

$$F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
where $TP$ stands for True Positives, $FN$ stands for False Negatives, $FP$ stands for False Positives, and $TN$ stands for True Negatives.
The IoU represents the ratio between the intersection and the union of the real and predicted values, which evaluates the performance of the model in a more comprehensive way. The formula is as follows:
$$\mathrm{IoU} = \frac{TP}{TP + FP + FN}$$
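Both metrics follow directly from the confusion-matrix counts; a minimal sketch is given below (the `eps` guard against empty masks is an implementation convenience).

```python
import numpy as np

def f1_iou(pred: np.ndarray, gt: np.ndarray, eps: float = 1e-9):
    # Compute F1 and IoU for binary road masks from TP/FP/FN counts,
    # following the formulas above.
    pred = pred.astype(bool)
    gt = gt.astype(bool)
    tp = np.count_nonzero(pred & gt)   # True Positives
    fp = np.count_nonzero(pred & ~gt)  # False Positives
    fn = np.count_nonzero(~pred & gt)  # False Negatives
    precision = tp / (tp + fp + eps)
    recall = tp / (tp + fn + eps)
    f1 = 2 * precision * recall / (precision + recall + eps)
    iou = tp / (tp + fp + fn + eps)
    return f1, iou
```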

4.4. Evaluation of Results

Because the above road extraction models perform differently, their prediction results differ, and accordingly, the post-processing optimization effect of the proposed road information inference network varies across them. Therefore, by analyzing the optimization effect of the post-processing model on different road extraction models, the effectiveness of the post-processing model and its generalization across different road extraction models can be objectively and comprehensively verified.
The quantitative analysis of the post-processing model on the Massachusetts dataset is shown in Table 1. After the post-processing of RIRNet, the road extraction accuracies of the comparative experiments were all improved. The road extraction model with the best post-processing optimization effect is the DANet, which improves the accuracy by 9.04% on F1 and 10.95% on IoU; the road extraction model with the worst post-processing optimization effect is the DeepLabV3+, but it is also able to improve the accuracy by 0.34% on F1 and 0.43% on IoU. This proves the effectiveness of the RIRNet model on post-processing optimization.
Figure 6 shows the results of the post-processing model on the Massachusetts dataset, with the ResUNet and RFE-LinkNet models selected for qualitative analysis. The visualization results clearly show that the proposed RIRNet model has an excellent post-processing optimization effect on the Massachusetts dataset: not only are the integrity and continuity of the roads greatly improved, but the road boundaries are also smoother, and the noise is significantly reduced. Several regions in the visualization results are marked by red circles; these are mainly regions with complex backgrounds or severely obscured roads, which lead to omission, misclassification, incompleteness, and discontinuity in the initial road extraction results, and RIRNet can effectively correct these erroneous predictions. Specifically, in the first row of Figure 6, there are serious omissions in the dense road areas of the ResUNet results and a small number of road discontinuities in the RFE-LinkNet results; the RIRNet model is able to effectively optimize these mispredictions, which is mainly due to the information inference module and the road direction inference task. In the second and fourth rows, there are road discontinuities due to complete occlusion in the initial results of both ResUNet and RFE-LinkNet, and RIRNet can reason well about the information at these discontinuities and thus effectively repair the interrupted road segments. In the fourth row, the RIRNet model not only effectively handles road discontinuity in the dense urban road area but also effectively addresses road omission in the initial extraction results.
The quantitative analysis of the post-processing model on the DeepGlobe dataset is shown in Table 2. After post-processing by RIRNet, the road extraction accuracies of all comparative experiments improved. The road extraction model with the best post-processing optimization effect is ResUNet, which improves the accuracy by 5.02% on F1 and 6.49% on IoU; the road extraction model with the worst post-processing optimization effect is UNet, but it is still able to improve the accuracy by 1.15% on F1 and 1.45% on IoU.
Figure 7 shows the results of the post-processing model on the DeepGlobe dataset, in which the ResUNet and RFE-LinkNet models are selected as the analysis targets. The visualization results clearly show that the RIRNet model has excellent post-processing optimization performance: after the initial extraction results of the above two models are optimized by post-processing, the completeness and continuity of the roads are markedly improved. At the same time, the optimized road boundaries are smoother, and extraneous noise is significantly reduced. Specifically, multiple regions marked by red arrows contain errors in the initial extraction results, such as omission, misclassification, and breakage; the RIRNet model can effectively correct the erroneous predictions in these regions, thereby improving the road extraction accuracy. The visualization results in the first to third rows show the post-processing optimization effect of the RIRNet model in suburban road regions: the initial road extraction results of the ResUNet and RFE-LinkNet models have road breaks caused by vegetation or building obstructions, while RIRNet can integrate the information of these discontinuous road regions and repair the breaks. The results in the fourth row show that the RIRNet model can also achieve excellent post-processing results in complex background regions, effectively repairing broken roads and recovering missing road segments. The visualization results in the fifth row show the post-processing effect of RIRNet in dense urban road areas: due to the interference of vegetation and buildings, a large number of road breaks and missed and misdetected roads appear in the extraction results, whereas the RIRNet model can effectively focus on the breaks and repair the broken road segments.
Table 3 shows the quantitative analysis of the post-processing model on the CHN6-CUG dataset. After post-processing by RIRNet, the road extraction accuracies in all comparative experiments improved. The road extraction model with the best post-processing optimization effect is ResUNet, which improves the accuracy by 6.53% on F1 and 8.33% on IoU; the road extraction model with the worst post-processing optimization effect is DeepLabV3+, but it can still improve the accuracy by 0.94% on F1 and 0.71% on IoU. This also proves the effectiveness of the proposed RIRNet in post-processing optimization.
Figure 8 shows the qualitative analysis results of the post-processing model on the CHN6-CUG dataset, and the ResUNet and RFE-LinkNet models are selected as the main qualitative analysis targets in this study. From the visualization results, it is obvious that the RIRNet model obtains excellent post-processing results, effectively improving the integrity and continuity of the roads, maintaining the smoothness of the road boundaries, and accurately handling incorrect predictions such as omissions and misclassifications. Specifically, most of the roads extracted with the ResUNet model in the visualization results are not complete enough, and the road boundaries are rough, as shown in the third column of Figure 8. It is worth noting that the RIRNet model can effectively optimize the extraction results of the ResUNet model: most of the processed roads are more complete and continuous, the road boundaries are smoother, and the noise is significantly reduced, thanks to the effectiveness of the information inference module and the road direction inference task in the RIRNet model. Meanwhile, there are some road discontinuities caused by occlusions such as aircraft, trees and building shadows in the initial road extraction results of the RFE-LinkNet model, as shown in the fifth column of Figure 8. The RIRNet model can effectively connect the broken roads by means of information reasoning, and the boundaries of the processed roads are smoother and more accurate, achieving significant improvements. At the same time, the region marked by red arrows in the fifth row of the results contains some misclassification, which the RIRNet model is also able to eliminate by means of effective information reasoning.
Table 4 shows the quantitative analysis of the post-processing model on the Anyi County dataset. After post-processing by RIRNet, the road extraction accuracies in all comparative experiments improved. The road extraction model with the best post-processing optimization effect is DANet, which improves the accuracy by 9.56% on F1 and 6.55% on IoU; the road extraction model with the worst post-processing optimization effect is RFE-LinkNet, but it is still able to improve the accuracy by 0.93% on F1 and 0.69% on IoU.
Figure 9 shows the qualitative analysis results of the post-processing model on the Anyi County road dataset, with ResUNet and RFE-LinkNet analyzed qualitatively. From the visualization results, it can be clearly seen that, due to the characteristics of the roads in Anyi County and the limited number of samples, the pre-processing models produce discontinuous roads when extracting the narrow roads of Anyi County, whereas after post-processing by the RIRNet model, the completeness and continuity of the roads are effectively enhanced, and incorrect predictions such as omissions and misclassifications are reduced. Specifically, most of the roads in the ResUNet and RFE-LinkNet extraction results in the first to fifth rows are not complete enough, the road boundaries are rough, and there is irrelevant noise in the background, whereas most of the roads after processing are more complete and continuous, and the noise is significantly reduced, which indicates that the information inference module and the road direction inference task in the RIRNet model are effective at optimizing and repairing erroneous predictions. At the same time, there are more omissions in the results of the RFE-LinkNet model, i.e., the area marked by red arrows in the fifth column of the results; likewise, the RIRNet model is able to effectively supplement these neglected regions, increasing the completeness of the road extraction results. Meanwhile, there are many cases of road misclassification in the region marked by red arrows in the fifth row of the ResUNet extraction results, and the RIRNet model can eliminate these wrong predictions through effective information reasoning.
In addition, the number of parameters and the computational cost of the RIRNet model are analyzed. Table 5 shows the efficiency analysis of the different models. The Params metric measures the spatial complexity of the model; the higher the Params, the higher the storage requirement. FLOPs, on the other hand, measure the computational complexity of the model; the higher the FLOPs, the more computational resources the model requires for inference or training. The Params and FLOPs of the RIRNet model are 2.85 M and 42.88 GFLOPs, respectively; the Params are much smaller than those of other mainstream DCNNs, and the FLOPs are only higher than those of RFE-LinkNet, placing RIRNet among lightweight models. Therefore, the RIRNet model is easy to train and run, and it can achieve efficient post-processing optimization.
Combining all the above quantitative and qualitative analysis results, we can see that the lightweight RIRNet model proposed in this study has excellent post-processing optimization effects on different road datasets and achieves accuracy improvements for all road extraction models. The RIRNet model is not only able to perform effective information inference and repair at road occlusions; it also achieves optimization in regions with complex backgrounds, and at the same time, it can efficiently handle omissions, misclassifications, and irrelevant noise. This proves the effectiveness and generalization of the post-processing model RIRNet.

4.5. Ablation Experiments

The effectiveness and generalization of the post-processing model RIRNet have been demonstrated through both quantitative and qualitative analysis. In order to further validate the usefulness of the information inference module and road direction inference task branch, the corresponding ablation experiments are designed in this study. Specifically, the road information reasoning and road direction reasoning are taken as ablation variables, and the RFE-LinkNet model is taken as the road extraction model to explore the roles played by different components in the post-processing model.
Table 6 shows the results of the ablation post-processing experiments with the RFE-LinkNet model on the DeepGlobe dataset. The quantitative results show that when neither the information inference module nor the road direction inference task is present, the post-processing effect is the worst, with decreases in both F1 and IoU, which suggests that without taking into account the occlusion problems in road extraction and the structural characteristics of roads, it is difficult to obtain a stable and reliable optimization and accuracy improvement. When both the information inference module and the road direction inference task are used, the post-processing model obtains the best effect, with an accuracy improvement of 2.20% in the F1-score and 2.75% in the IoU. Specifically, when only the road direction inference task is present, the model improves the accuracy by 0.54% in the F1-score and 0.73% in the IoU, which proves the importance of multi-task learning for model performance. When only the information inference module is present, the model improves the accuracy by 1.63% in the F1-score and 1.77% in the IoU, proving the importance of spatial information reasoning for road information restoration. It is worth noting that when both components are present, the accuracy improvements on F1 and IoU are higher than the sum of the individual improvements, indicating that the two components promote and complement each other, yielding the best post-processing effect. Meanwhile, the information inference module contributes more than the road direction inference task, which may be because the multi-task approach only supervises the model externally, whereas the information inference module explicitly performs spatial reasoning about road information during the internal feature learning process, fully utilizing prior knowledge and enhancing the inference ability of the model itself, resulting in a more obvious optimization of the post-processing model.
Figure 10 shows the results of the qualitative ablation experiments of the RIRNet model on the DeepGlobe dataset. It is obvious from the visualization results that the RIRNet model achieves the best post-processing results when it contains both the information inference module and the road direction inference task branch (i.e., Model C). Compared to Models A and B, Model C produces post-processing results with better road integrity and continuity, smoother road boundaries, and better handling of road omission, misclassification, and noise. Specifically, Model C obtains better post-processing optimization results in all the areas marked by red circles in the visualization results. The results in the first and second rows show that both Model A and Model B fail to completely repair roads broken by complete vegetation occlusion, but Model C, which combines the information inference module and the road direction inference task, can effectively deal with this problem and thus ensure the continuity of the road extraction results. The results in the third and fifth rows show that Model C can effectively deal with road omission and noise in the initial extraction results; its optimized results contain far less noise, more complete road structures, and more accurate road boundaries. The results in the fourth and fifth rows show that in urban areas with dense roads and complex backgrounds, there are a large number of omissions in the initial road extraction results, and Model C can effectively deal with the omission and incompleteness in the initial segmentation results, supplementing the omitted regions and ensuring the completeness of the road network extraction results. In summary, the qualitative ablation experiment results prove the effectiveness and necessity of the information reasoning module and the road direction reasoning task.

5. Conclusions

In this paper, we focus on the problems of road discontinuity and misclassification caused by complete occlusion and complex backgrounds, and accordingly, we propose a lightweight post-processing network, RIRNet, for optimizing and repairing the prediction results of road extraction models to obtain better road extraction accuracy and results. The RIRNet model is an encoder–decoder structure that contains two main parts: the direction-guided information inference module and the road direction inference task branch. The direction-guided information inference module can model the spatial relationships between pixels in different rows or columns of the feature map and has better spatial inference ability, while the road direction inference task serves as an auxiliary task to improve road extraction accuracy and guide the post-processing model in repairing broken roads. The qualitative and quantitative analyses on the four remote sensing road extraction datasets show that the RIRNet model has excellent post-processing capability and generalization, effectively improving the accuracy of road extraction models and obtaining more complete and continuous road extraction results. Moreover, the results of the ablation experiments fully prove the effectiveness and generalizability of the direction-guided information inference module and the road direction inference task.
The proposed RIRNet model relies mainly on a lightweight convolutional neural network, which still requires laborious training to learn road features. Other post-processing paradigms also exist, such as graph convolutional networks and conditional random fields. In future work, we will therefore investigate different paradigms of lightweight post-processing algorithms, enrich the family of road extraction post-processing methods, and explore the trade-off between the cost and benefit of post-processing algorithms.

Author Contributions

Conceptualization, J.C.; methodology, C.H. and H.W.; software, C.H. and Q.X.; validation, C.H., Q.X. and Q.C.; formal analysis, Q.C. and L.H.; investigation, Q.X. and Q.C.; resources, G.Z.; data curation, Q.X. and Q.C.; writing—original draft preparation, G.Z., C.H. and H.W.; writing—review and editing, J.C.; visualization, C.H., Q.C. and L.H.; supervision, J.C.; project administration, J.C.; funding acquisition, G.Z. and L.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the major scientific and technological projects of Yunnan Province (Grant No. 202202AD080010).

Data Availability Statement

The Massachusetts dataset can be downloaded from https://www.cs.toronto.edu/~vmnih/data/, the DeepGlobe dataset from http://deepglobe.org/challenge.html, and the CHN6-CUG dataset from http://cugurs5477.mikecrm.com/ZtMn5tR. The Anyi County dataset is part of an ongoing study and is not readily available.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Lian, R.; Wang, W.; Mustafa, N.; Huang, L. Road Extraction Methods in High-Resolution Remote Sensing Images: A Comprehensive Review. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2020, 13, 5489–5507.
  2. Abdollahi, A.; Pradhan, B.; Shukla, N.; Chakraborty, S.; Alamri, A. Deep Learning Approaches Applied to Remote Sensing Datasets for Road Extraction: A State-of-the-Art Review. Remote Sens. 2020, 12, 1444.
  3. Chen, Z.; Wang, C.; Li, J.; Fan, W.; Du, J.; Zhong, B. Adaboost-Like End-to-End Multiple Lightweight U-Nets for Road Extraction from Optical Remote Sensing Images. Int. J. Appl. Earth Obs. Geoinf. 2021, 100, 102341.
  4. Shan, B.; Fang, Y. A Cross Entropy Based Deep Neural Network Model for Road Extraction from Satellite Images. Entropy 2020, 22, 535.
  5. Chen, S.-B.; Ji, Y.-X.; Tang, J.; Luo, B.; Wang, W.-Q.; Lv, K. DBRANet: Road Extraction by Dual-Branch Encoder and Regional Attention Decoder. IEEE Geosci. Remote Sens. Lett. 2021, 19, 1–5.
  6. Yang, K.; Yi, J.; Chen, A.; Liu, J.; Chen, W. ConDinet++: Full-Scale Fusion Network Based on Conditional Dilated Convolution to Extract Roads from Remote Sensing Images. IEEE Geosci. Remote Sens. Lett. 2021, 19, 1–5.
  7. Pan, H.; Jia, Y.; Lv, Z. An Adaptive Multifeature Method for Semiautomatic Road Extraction from High-Resolution Stereo Mapping Satellite Images. IEEE Geosci. Remote Sens. Lett. 2018, 16, 201–205.
  8. Chen, Z.; Deng, L.; Luo, Y.; Li, D.; Junior, J.M.; Gonçalves, W.N.; Nurunnabi, A.A.M.; Li, J.; Wang, C.; Li, D. Road Extraction in Remote Sensing Data: A Survey. Int. J. Appl. Earth Obs. Geoinf. 2022, 112, 102833.
  9. Wang, Y.; Seo, J.; Jeon, T. NL-LinkNet: Toward Lighter but More Accurate Road Extraction with Nonlocal Operations. IEEE Geosci. Remote Sens. Lett. 2021, 19, 3000105.
  10. Xu, Y.; Chen, H.; Du, C.; Li, J. MSACon: Mining Spatial Attention-Based Contextual Information for Road Extraction. IEEE Trans. Geosci. Remote Sens. 2021, 60, 5604317.
  11. Wang, S.; Mu, X.; Yang, D.; He, H.; Zhao, P. Road Extraction from Remote Sensing Images Using the Inner Convolution Integrated Encoder-Decoder Network and Directional Conditional Random Fields. Remote Sens. 2021, 13, 465.
  12. Lu, X.; Zhong, Y.; Zheng, Z.; Zhang, L. GAMSNet: Globally Aware Road Detection Network with Multi-Scale Residual Learning. ISPRS J. Photogramm. Remote Sens. 2021, 175, 340–352.
  13. Cira, C.-I.; Alcarria, R.; Manso-Callejo, M.-Á.; Serradilla, F. A Deep Learning-Based Solution for Large-Scale Extraction of the Secondary Road Network from High-Resolution Aerial Orthoimagery. Appl. Sci. 2020, 10, 7272.
  14. Wei, Y.; Zhang, K.; Ji, S. Simultaneous Road Surface and Centerline Extraction from Large-Scale Remote Sensing Images Using CNN-Based Segmentation and Tracing. IEEE Trans. Geosci. Remote Sens. 2020, 58, 8919–8931.
  15. Zhou, M.; Sui, H.; Chen, S.; Wang, J.; Chen, X. BT-RoadNet: A Boundary and Topologically-Aware Neural Network for Road Extraction from High-Resolution Remote Sensing Imagery. ISPRS J. Photogramm. Remote Sens. 2020, 168, 288–306.
  16. Zhu, Q.; Zhang, Y.; Wang, L.; Zhong, Y.; Guan, Q.; Lu, X.; Zhang, L.; Li, D. A Global Context-Aware and Batch-Independent Network for Road Extraction from VHR Satellite Imagery. ISPRS J. Photogramm. Remote Sens. 2021, 175, 353–365.
  17. Jing, R.; Gong, Z.; Zhu, W.; Guan, H.; Zhao, W. Island Road Centerline Extraction Based on a Multiscale United Feature. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2018, 11, 3940–3953.
  18. Li, J.; Meng, Y.; Dorjee, D.; Wei, X.; Zhang, Z.; Zhang, W. Automatic Road Extraction from Remote Sensing Imagery Using Ensemble Learning and Postprocessing. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2021, 14, 10535–10547.
  19. Zhang, Z.; Liu, Q.; Wang, Y. Road Extraction by Deep Residual U-Net. IEEE Geosci. Remote Sens. Lett. 2018, 15, 749–753.
  20. Yang, X.; Li, X.; Ye, Y.; Lau, R.Y.K.; Zhang, X.; Huang, X. Road Detection and Centerline Extraction via Deep Recurrent Convolutional Neural Network U-Net. IEEE Trans. Geosci. Remote Sens. 2019, 57, 7209–7220.
  21. Gao, X.; Sun, X.; Zhang, Y.; Yan, M.; Xu, G.; Sun, H.; Jiao, J.; Fu, K. An End-to-End Neural Network for Road Extraction from Remote Sensing Imagery by Multiple Feature Pyramid Network. IEEE Access 2018, 6, 39401–39414.
  22. Li, X.; Wang, Y.; Zhang, L.; Liu, S.; Mei, J.; Li, Y. Topology-Enhanced Urban Road Extraction via a Geographic Feature-Enhanced Network. IEEE Trans. Geosci. Remote Sens. 2020, 58, 8819–8830.
  23. Chen, J.; Yang, L.; Wang, H.; Zhu, J.; Sun, G.; Dai, X.; Deng, M.; Shi, Y. Road Extraction from High-Resolution Remote Sensing Images via Local and Global Context Reasoning. Remote Sens. 2023, 15, 4177.
  24. Cheng, G.; Wang, Y.; Xu, S.; Wang, H.; Xiang, S.; Pan, C. Automatic Road Detection and Centerline Extraction via Cascaded End-to-End Convolutional Neural Network. IEEE Trans. Geosci. Remote Sens. 2017, 55, 3322–3337.
  25. Liu, Y.; Yao, J.; Lu, X.; Xia, M.; Wang, X.; Liu, Y. RoadNet: Learning to Comprehensively Analyze Road Networks in Complex Urban Scenes from High-Resolution Remotely Sensed Images. IEEE Trans. Geosci. Remote Sens. 2018, 57, 2043–2056.
  26. Lu, X.; Zhong, Y.; Zheng, Z.; Liu, Y.; Zhao, J.; Ma, A.; Yang, J. Multi-Scale and Multi-Task Deep Learning Framework for Automatic Road Extraction. IEEE Trans. Geosci. Remote Sens. 2019, 57, 9362–9377.
  27. Zhou, L.; Zhang, C.; Wu, M. D-LinkNet: LinkNet with Pretrained Encoder and Dilated Convolution for High Resolution Satellite Imagery Road Extraction. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Salt Lake City, UT, USA, 18–22 June 2018.
  28. Wu, Q.; Luo, F.; Wu, P.; Wang, B.; Yang, H.; Wu, Y. Automatic Road Extraction from High-Resolution Remote Sensing Images Using a Method Based on Densely Connected Spatial Feature-Enhanced Pyramid. IEEE J. Sel. Top. Appl. Earth Obs. Remote Sens. 2020, 14, 3–17.
  29. Luo, L.; Wang, J.-X.; Chen, S.-B.; Tang, J.; Luo, B. BDTNet: Road Extraction by Bi-Direction Transformer from Remote Sensing Images. IEEE Geosci. Remote Sens. Lett. 2022, 19, 2505605.
  30. Liu, X.; Wang, Z.; Wan, J.; Zhang, J.; Xi, Y.; Liu, R.; Miao, Q. RoadFormer: Road Extraction Using a Swin Transformer Combined with a Spatial and Channel Separable Convolution. Remote Sens. 2023, 15, 1049.
  31. Ding, C.; Weng, L.; Xia, M.; Lin, H. Non-Local Feature Search Network for Building and Road Segmentation of Remote Sensing Image. ISPRS Int. J. Geo-Inf. 2021, 10, 245.
  32. Zhao, H.; Zhang, H.; Zheng, X. RFE-LinkNet: LinkNet with Receptive Field Enhancement for Road Extraction from High Spatial Resolution Imagery. IEEE Access 2023, 11, 106412–106422.
  33. Gao, L.; Song, W.; Dai, J.; Chen, Y. Road Extraction from High-Resolution Remote Sensing Imagery Using Refined Deep Residual Convolutional Neural Network. Remote Sens. 2019, 11, 552.
  34. Ding, L.; Bruzzone, L. DiResNet: Direction-Aware Residual Network for Road Extraction in VHR Remote Sensing Images. IEEE Trans. Geosci. Remote Sens. 2020, 59, 10243–10254.
  35. Tao, C.; Qi, J.; Li, Y.; Wang, H.; Li, H. Spatial Information Inference Net: Road Extraction Using Road-Specific Contextual Information. ISPRS J. Photogramm. Remote Sens. 2019, 158, 155–166.
  36. Pan, X.; Shi, J.; Luo, P.; Wang, X.; Tang, X. Spatial as Deep: Spatial CNN for Traffic Scene Understanding. In Proceedings of the AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018.
  37. Mnih, V. Machine Learning for Aerial Image Labeling; University of Toronto: Toronto, ON, Canada, 2013.
  38. Demir, I.; Koperski, K.; Lindenbaum, D.; Pang, G.; Huang, J.; Basu, S.; Hughes, F.; Tuia, D.; Raskar, R. DeepGlobe 2018: A Challenge to Parse the Earth through Satellite Images. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Salt Lake City, UT, USA, 18–22 June 2018.
  39. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, 5–9 October 2015.
  40. Chen, L.C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018.
  41. Yuan, Y.; Huang, L.; Guo, J.; Zhang, C.; Chen, X.; Wang, J. OCNet: Object Context Network for Scene Parsing. arXiv 2018, arXiv:1809.00916.
  42. Fu, J.; Liu, J.; Tian, H.; Li, Y.; Bao, Y.; Fang, Z.; Lu, H. Dual Attention Network for Scene Segmentation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019.
Figure 1. Overview of the RIRNet.
Figure 2. Structure of the direction-guided information inference module.
Figure 3. Example of road direction labeling data.
Figure 4. Example of road datasets.
Figure 5. The location and road samples of the study area.
Figure 6. Qualitative evaluation results of the RIRNet model on the Massachusetts dataset.
Figure 7. Qualitative evaluation results of the RIRNet model on the DeepGlobe dataset.
Figure 8. Qualitative evaluation results of the RIRNet model on the CHN6-CUG dataset.
Figure 9. Qualitative evaluation results of the RIRNet model on the Anyi County dataset.
Figure 10. Results of qualitative ablation experiments of the RIRNet model on the RFE-LinkNet model and the DeepGlobe dataset. Model A represents the RIRNet model with only the road direction inference task, Model B represents the RIRNet model with only the information inference module, and Model C represents the RIRNet model with both modules.
Table 1. Quantitative evaluation results of the RIRNet model on the Massachusetts dataset.
Post-Processing Method | Road Extraction Method | F1 (%) | IoU (%)
RIRNet | UNet [39] | 77.66 (+0.77) | 63.48 (+1.02)
RIRNet | DeepLabV3+ [40] | 75.60 (+0.34) | 60.77 (+0.43)
RIRNet | OCNet [41] | 77.91 (+6.23) | 62.96 (+7.10)
RIRNet | DANet [42] | 75.99 (+9.04) | 61.27 (+10.95)
RIRNet | ResUNet [19] | 78.00 (+1.11) | 63.94 (+1.49)
RIRNet | RFE-LinkNet [32] | 80.38 (+3.81) | 70.86 (+2.59)
Note: All values represent the accuracy after post-processing optimization; the values in parentheses represent the improvement in accuracy over the initial extraction results.
Table 2. Quantitative evaluation results of the RIRNet model on the DeepGlobe dataset.
Post-Processing Method | Road Extraction Method | F1 (%) | IoU (%)
RIRNet | UNet [39] | 74.63 (+1.15) | 59.52 (+1.45)
RIRNet | DeepLabV3+ [40] | 75.09 (+2.63) | 60.11 (+3.29)
RIRNet | OCNet [41] | 76.98 (+4.83) | 62.57 (+6.14)
RIRNet | DANet [42] | 74.42 (+3.96) | 59.26 (+4.86)
RIRNet | ResUNet [19] | 78.12 (+5.02) | 64.09 (+6.49)
RIRNet | RFE-LinkNet [32] | 83.08 (+2.20) | 74.23 (+2.75)
Note: All values represent the accuracy after post-processing optimization; the values in parentheses represent the improvement in accuracy over the initial extraction results.
Table 3. Quantitative evaluation results of the RIRNet model on the CHN6-CUG dataset.
Post-Processing Method | Road Extraction Method | F1 (%) | IoU (%)
RIRNet | UNet [39] | 75.34 (+2.74) | 60.43 (+3.45)
RIRNet | DeepLabV3+ [40] | 76.52 (+0.94) | 61.98 (+0.71)
RIRNet | OCNet [41] | 76.72 (+0.83) | 62.23 (+1.09)
RIRNet | DANet [42] | 76.50 (+1.27) | 61.95 (+1.65)
RIRNet | ResUNet [19] | 77.93 (+6.53) | 63.85 (+8.33)
RIRNet | RFE-LinkNet [32] | 76.38 (+3.34) | 65.96 (+3.83)
Note: All values represent the accuracy after post-processing optimization; the values in parentheses represent the improvement in accuracy over the initial extraction results.
Table 4. Quantitative evaluation results of the RIRNet model on the Anyi County dataset.
Post-Processing Method | Road Extraction Method | F1 (%) | IoU (%)
RIRNet | UNet [39] | 67.87 (+3.16) | 58.74 (+2.19)
RIRNet | DeepLabV3+ [40] | 74.84 (+1.47) | 64.44 (+1.16)
RIRNet | OCNet [41] | 69.95 (+4.96) | 60.77 (+4.66)
RIRNet | DANet [42] | 71.86 (+9.56) | 61.60 (+6.55)
RIRNet | ResUNet [19] | 72.56 (+1.74) | 62.86 (+1.84)
RIRNet | RFE-LinkNet [32] | 74.16 (+0.93) | 64.61 (+0.69)
Note: All values represent the accuracy after post-processing optimization; the values in parentheses represent the improvement in accuracy over the initial extraction results.
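For reference, the F1 and IoU scores reported in Tables 1–4 follow the standard pixel-wise definitions in terms of true positives (TP), false positives (FP), and false negatives (FN); we restate them here under the conventional formulation:

\[
\mathrm{Precision}=\frac{TP}{TP+FP},\qquad
\mathrm{Recall}=\frac{TP}{TP+FN},\qquad
F1=\frac{2\cdot\mathrm{Precision}\cdot\mathrm{Recall}}{\mathrm{Precision}+\mathrm{Recall}},\qquad
\mathrm{IoU}=\frac{TP}{TP+FP+FN}
\]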
Table 5. Results of efficiency analysis of selected models.
Model | UNet | DeepLabV3+ | DANet | ResUNet | RFE-LinkNet | RIRNet
Params (M) | 13.40 | 59.44 | 54.36 | 13.04 | 34.29 | 2.85
FLOPs (G) ¹ | 124.36 | 90.35 | 313.70 | 323.73 | 40.84 | 42.88
¹ Note: The input image size for calculating FLOPs is 512 × 512 × 3.
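As an illustration of how the quantities in Table 5 are conventionally measured (a sketch of standard practice, not the paper's evaluation script), the parameter count can be read directly from a PyTorch model, while FLOPs are typically estimated by a profiling tool on a fixed-size input:

```python
import torch
import torch.nn as nn

def count_params_millions(model: nn.Module) -> float:
    """Trainable parameter count in millions (the 'Params' row in Table 5)."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad) / 1e6

# FLOPs are usually estimated with a profiling tool on a fixed input;
# Table 5 assumes a 512 x 512 x 3 image, i.e. in PyTorch layout:
dummy_input = torch.randn(1, 3, 512, 512)

# Quick check on a toy layer: Conv2d(3, 16, 3) has 16*3*3*3 + 16 = 448 params.
toy = nn.Conv2d(3, 16, kernel_size=3, padding=1)
print(f"{count_params_millions(toy):.6f} M")   # -> 0.000448 M
```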
Table 6. Results of quantitative ablation experiments of the RIRNet model on the RFE-LinkNet model and the DeepGlobe dataset.
Road Direction Inference Task | Information Inference Module | F1 (%) | IoU (%)
× | × | 77.11 (−3.77) | 67.45 (−4.20)
✓ | × | 81.42 (+0.54) | 72.20 (+0.73)
× | ✓ | 82.51 (+1.63) | 73.24 (+1.77)
✓ | ✓ | 83.08 (+2.20) | 74.23 (+2.75)
Note: The values represent the accuracy of the road extraction results after post-processing optimization, and the values in parentheses represent the increase or decrease in accuracy.
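The second row of Table 6 shows that the auxiliary direction task alone already improves F1 by 0.54 points, consistent with the conclusion that it constrains and promotes the road extraction task. As a rough illustration of how such an auxiliary branch is commonly supervised jointly with segmentation (the loss functions and the weight lambda_dir below are our assumptions; the paper's exact formulation is not restated here):

```python
import torch
import torch.nn as nn

# Hypothetical joint objective: a road segmentation head plus an auxiliary
# road-direction classification head sharing one backbone.
seg_loss_fn = nn.BCEWithLogitsLoss()   # binary road vs. background
dir_loss_fn = nn.CrossEntropyLoss()    # K discretized direction classes
lambda_dir = 0.5                       # assumed auxiliary-task weight

def joint_loss(seg_logits, seg_gt, dir_logits, dir_gt):
    """Weighted sum of the segmentation and direction losses."""
    return seg_loss_fn(seg_logits, seg_gt) + lambda_dir * dir_loss_fn(dir_logits, dir_gt)

# Shapes: seg_logits/seg_gt are (B, 1, H, W); dir_logits is (B, K, H, W)
# and dir_gt is (B, H, W) with integer direction-class labels.
B, K, H, W = 2, 4, 64, 64
loss = joint_loss(torch.randn(B, 1, H, W),
                  torch.randint(0, 2, (B, 1, H, W)).float(),
                  torch.randn(B, K, H, W),
                  torch.randint(0, K, (B, H, W)))
```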
