3.4. Experimental Results
To evaluate the performance of the AS-Unet++ semantic segmentation method proposed in this paper, we conducted three sets of experiments. The first was an ablation experiment, in which AS-Unet++, Unet++, A-Unet++ (with only ASPP added), and S-Unet++ (with only the SE model added) were compared on the test set to verify the effectiveness of the two modules, ASPP and the SE model. The second compared AS-Unet++ with the Unet and AS-Unet models on the training and test sets, which makes the performance gains of the network visible. The third compared AS-Unet++ with two other configurations on the training and test sets, CE Loss and conventional data enhancement, to further evaluate the method of this paper.
- (1) The ablation experiment
AS-Unet++ and Unet++ were compared with A-Unet++ (only ASPP added) and S-Unet++ (only the SE model added) to validate the effectiveness of the two modules, ASPP and the SE model.
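The four networks in the ablation form a two-by-two grid over the two modules. A minimal sketch of that grid follows; the `Variant` class and its `use_aspp` and `use_se` flags are hypothetical names introduced only for illustration and are not taken from the paper's implementation.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class Variant:
    """One cell of the ablation grid (hypothetical helper, not the authors' code)."""
    use_aspp: bool
    use_se: bool

    @property
    def name(self) -> str:
        # Prefix letters follow the naming used in the text:
        # "A" marks ASPP, "S" marks the SE model, no prefix means plain Unet++.
        prefix = ("A" if self.use_aspp else "") + ("S" if self.use_se else "")
        return f"{prefix}-Unet++" if prefix else "Unet++"


def ablation_grid():
    """Enumerate every on/off combination of the two modules."""
    return [Variant(aspp, se) for aspp, se in product([False, True], repeat=2)]


names = [variant.name for variant in ablation_grid()]
print(names)
```

Enumerating the grid this way makes it explicit that each single-module network differs from Unet++ and from AS-Unet++ by exactly one module, which is what allows the effect of each module to be read off separately.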
A comparison of the predicted segmentation maps for houses, roads, forests, and lakes produced by the various networks is shown in Figure 12.
In the recognition of house elements, Unet++ misses some houses because of differences in lighting and color, and its segmentation of house edges is poor. In A-Unet++, with only ASPP added, the edge segmentation of houses improves, but the missed recognition does not. In S-Unet++, with only the SE model added, the missed recognition improves, but the edge segmentation does not. AS-Unet++, with both modules added, improves both missed recognition and edge segmentation, and the improvement is more obvious than in A-Unet++ and S-Unet++, each of which adds a single module.
In remote sensing images, house elements account for a relatively small proportion of pixels, and a network without the SE model captures the semantic features of houses poorly, which leads to missed recognition. Without ASPP, target spatial information and detail are lost through downsampling operations, such as the strided convolution and pooling in the original structure of the Unet. This loss has little effect on house information such as lighting and color differences, but it removes edge feature information, resulting in unsatisfactory edge segmentation.
In the recognition of road elements, Unet++ misses some roads because of their small widths, and its edge segmentation is also poor. A-Unet++ improves the edge segmentation of roads, but the missed recognition remains. S-Unet++ reduces the missed recognition, but the edge segmentation is not improved. AS-Unet++ improves both aspects. Like the house element, the road element occupies a relatively small proportion of remote sensing images, and the lack of the SE model results in poor capture of road information, which leads to missed recognition. The lack of ASPP results in the loss of edge feature information, which leads to unsatisfactory edge segmentation.
In the recognition of forest elements, Unet++ recognizes the surrounding forest pixels poorly because of interference from the wire pixels in the lower right corner. S-Unet++ classifies the wire pixels entirely as forest pixels and shows no significant improvement over Unet++. A-Unet++ separates the wire pixels from the forest pixels better than Unet++, and its result is closer to that of AS-Unet++. Because forest elements account for a large proportion of the remote sensing image, the lack of the SE model does not affect the network’s ability to capture forest information. The power lines crossing the forest, however, are an interfering element that occupies only a small proportion of the image; without ASPP, the information about this interference is lost, which in turn leads to poor anti-interference ability.
In the recognition of lake elements, Unet++ and S-Unet++ perform similarly, and neither recognizes the edge regions satisfactorily. A-Unet++ and AS-Unet++ also perform similarly, and both improve the recognition of edges. Like the forest element, the lake element occupies a relatively large proportion of the remote sensing image, so the lack of the SE model does not affect the network’s ability to capture lake information. The lack of ASPP, however, leads to the loss of edge feature information, which in turn leads to unsatisfactory edge segmentation.
The Precision, Recall, and IoU of the various networks for house, road, forest, and lake predictions on the test set are shown in Table 4.
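For readers who want to reproduce such per-element figures, the sketch below computes Precision, Recall, and IoU from a predicted and a ground-truth binary mask using the standard pixel-count definitions. The function names and the toy masks are illustrative assumptions; the paper's own evaluation code and data are not shown here. In particular, which classes are averaged into the MIoU (for example, whether background is included) is not stated in this section, so `mean_iou` simply averages whatever list it is given.

```python
def segmentation_metrics(pred, truth):
    """Return (precision, recall, iou) for two same-shaped binary masks.

    Masks are lists of rows holding 0/1 values; 1 marks the target element.
    """
    tp = fp = fn = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            if p and t:
                tp += 1          # predicted element, truly element
            elif p and not t:
                fp += 1          # predicted element, actually background
            elif t and not p:
                fn += 1          # missed element pixel
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    return precision, recall, iou


def mean_iou(per_class_iou):
    """Average a list of per-class IoU values (class set is an assumption)."""
    return sum(per_class_iou) / len(per_class_iou)


# Toy 3x4 masks, made up purely to exercise the functions.
truth = [[1, 1, 0, 0],
         [1, 1, 0, 0],
         [0, 0, 0, 0]]
pred = [[1, 1, 1, 0],
        [1, 0, 0, 0],
        [0, 0, 0, 0]]

precision, recall, iou = segmentation_metrics(pred, truth)
print(precision, recall, iou)  # 0.75 0.75 0.6
```

In this toy case one false-positive pixel lowers Precision and one missed pixel lowers Recall, and both count against IoU, which is why IoU is the lowest of the three values.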
The MIoU of AS-Unet++ on the test set was 90.2%, whereas Unet++, A-Unet++, and S-Unet++ reached 83.2%, 86.6%, and 86.2%, respectively. Compared with Unet++, A-Unet++, and S-Unet++, the MIoU of AS-Unet++ was improved by 7.0%, 3.6%, and 4.0%, respectively.
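The reported test-set MIoU values can also be read as a two-by-two ablation: the gain from each module alone, the gain from both together, and the remainder. The sketch below only does arithmetic on the four figures quoted above; the variable names are illustrative, and since the figures come from single reported values without variance estimates, the small remainder should be read with caution.

```python
# Test-set MIoU values (in percent) as reported in the text.
miou = {"Unet++": 83.2, "A-Unet++": 86.6, "S-Unet++": 86.2, "AS-Unet++": 90.2}

base = miou["Unet++"]

# Gain from each module alone, and from both together, over plain Unet++.
# Rounding to one decimal matches the precision of the reported values.
gain_aspp = round(miou["A-Unet++"] - base, 1)
gain_se = round(miou["S-Unet++"] - base, 1)
gain_both = round(miou["AS-Unet++"] - base, 1)

# What is left after subtracting the two single-module gains.
interaction = round(gain_both - gain_aspp - gain_se, 1)

print(gain_aspp, gain_se, gain_both, interaction)  # 3.4 3.0 7.0 0.6
```

On these numbers the combined gain (7.0) is slightly larger than the sum of the single-module gains (3.4 + 3.0), which is consistent with the two modules addressing different failure modes rather than overlapping.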
In the identification of house elements, AS-Unet++ improved Precision, Recall, and IoU by 5.8%, 6.4%, and 6.3%, respectively, compared to Unet++; by 4.9%, 4.4%, and 4.5%, respectively, compared to A-Unet++; and by 2.9%, 2.3%, and 2.8%, respectively, compared to S-Unet++.
In the identification of road elements, AS-Unet++ improved 6.4%, 7.0%, and 6.9% in the three metrics compared to Unet++; 3.3%, 2.1%, and 2.9% in the three metrics compared to A-Unet++; and 2.4%, 1.3%, and 2.0% in the three metrics compared to S-Unet++.
In the identification of forest elements, AS-Unet++ improved 9.7%, 9.3%, and 9.5% in the three metrics compared to Unet++; 5.4%, 4.6%, and 5.2% in the three metrics compared to A-Unet++; and 7.6%, 5.7%, and 6.7% in the three metrics compared to S-Unet++.
In the identification of lake elements, AS-Unet++ improved the three metrics by 5.5%, 5.4%, and 5.4%, respectively, compared to Unet++; improved the three metrics by 2.5%, 1.2%, and 1.8%, respectively, compared to A-Unet++; and improved the three metrics by 5.0%, 3.9%, and 4.3%, respectively, compared to S-Unet++.
In the recognition of house elements and road elements, S-Unet++ scored higher than A-Unet++ on the three metrics, which shows that the SE model more significantly improves the recognition of elements with smaller pixel occupancy. In the recognition of forest elements and lake elements, A-Unet++ scored higher than S-Unet++ on the three metrics: ASPP gives better recognition of elements with large pixel occupancy because of better edge segmentation and better resistance to interference from elements with small occupancy.
- (2) Comparison of AS-Unet++, Unet, and AS-Unet
Comparing AS-Unet++ with Unet and the AS-Unet model in the training sets and test sets allows for visualization of the performance optimization of the network.
The MIoU curves of the three networks during training on houses, roads, forests, and lakes are shown in Figure 13.
It can be seen that after training, the MIoU of AS-Unet++ on the verification set reached 88.9%, whereas the MIoU of Unet and AS-Unet on the verification set was 80.8% and 85.8%, respectively.
The Precision, Recall, and IoU of the verification sets of each network for road elements, forest elements, and lake elements are shown in Table 5.
It can be seen from the above data that the AS-Unet network is superior to the Unet network in all indicators, and the AS-Unet++ network, as a further optimization of the AS-Unet network, has improved in all aspects of accuracy compared with the AS-Unet network.
Compared with the AS-Unet and Unet networks, the MIoU of the AS-Unet++ network increased by 3.1% and 8.1%, respectively. In Figure 13, the differences in overall convergence speed among the three networks were small; only in the training for road element recognition did AS-Unet++ converge slightly faster than the other two networks. In addition, AS-Unet++ oscillated less during training than the other two networks, especially in the training for road, forest, and lake element recognition. In the identification of house elements, the Precision index increased by 2.7% and 6.4%, the Recall index increased by 3.2% and 7.4%, and the IoU index increased by 3.1% and 7.3%, respectively. In the recognition of road elements, the Precision index increased by 2.5% and 9.3%, Recall increased by 2.1% and 9.6%, and IoU increased by 2.2% and 9.3%, respectively. In the recognition of forest elements, the Precision index increased by 6.8% and 14.5%, Recall increased by 6.8% and 13.8%, and IoU increased by 6.7% and 14.0%, respectively. In the identification of lake elements, the Precision index increased by 4.5% and 6.0%, Recall increased by 4.6% and 5.8%, and IoU increased by 4.3% and 5.7%, respectively. The improvement of forest identification accuracy was particularly obvious.
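The statement that one training curve oscillates less than another is read off the figure by eye. One possible way to put a number on it, offered here only as an assumption and not as the measure used in the paper, is the mean absolute epoch-to-epoch change of the MIoU curve over its later part. The function name `oscillation`, the `tail` parameter, and both example curves are made up for illustration; they are not data from Figure 13.

```python
from statistics import mean


def oscillation(curve, tail=None):
    """Mean absolute change between consecutive points of a metric curve.

    `tail` restricts the measure to the last `tail` points, which is useful
    because the early rise of a curve is learning progress, not oscillation.
    """
    values = curve[-tail:] if tail else curve
    if len(values) < 2:
        raise ValueError("need at least two points to measure change")
    return mean(abs(later - earlier) for earlier, later in zip(values, values[1:]))


# Synthetic curves (NOT data from the paper): one smooth, one jittery.
smooth = [0.60, 0.70, 0.78, 0.83, 0.86, 0.88, 0.89]
jittery = [0.60, 0.74, 0.70, 0.82, 0.77, 0.86, 0.81]

print(oscillation(smooth, tail=4) < oscillation(jittery, tail=4))  # True
```

A measure like this is only meaningful once the curves have roughly plateaued; on the rising part of a curve it mixes convergence speed with oscillation.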
Figure 14 shows a comparison of the predicted segmentation images of houses, roads, forests, and lakes achieved by the three networks.
As can be seen from Figure 14, although Unet is able to recognize the corresponding elements, there is still some misrecognition and omission. In the recognition of houses, a small number of roof pixels are incompletely recognized because different surfaces of the roof receive different amounts of light. In the recognition of roads, roads with small widths are missed. In the recognition of forests, interference from the power lines at the lower right causes the surrounding pixels to be missed. In the recognition of lakes, the curved part of the lake edge is missed.
Compared with Unet, AS-Unet significantly improved the recognition and segmentation of various elements. In the recognition of houses, the missed recognition seen with Unet is reduced, but a small number of pixels are still missed where the lighting on a house varies greatly, so not all house pixels are recognized. In the identification of roads, the misidentification of banded wasteland that resembles roads is significantly reduced, but roads with small widths are still missed. In the recognition of forests, missed recognition is clearly reduced, but some grassland is misidentified as forest. In lake recognition, edges with complex shapes are segmented correctly, and the performance is clearly improved.
The segmentation effect of AS-Unet++ is improved compared with both Unet and AS-Unet. In the recognition of houses, AS-Unet++ identifies the houses in the figure more accurately and, unlike Unet, does not miss parts of a single house because of differences in lighting. In road identification, roads with small widths are no longer missed, and banded wasteland that resembles roads is not misidentified. In forest identification, the missed recognition caused by the power lines at the lower right is resolved, so the identified area is larger. In the recognition of lakes, edges with complex shapes are also segmented correctly.
The Precision, Recall, and IoU of the test sets of each network for road elements, forest elements, and lake elements are shown in Table 6.
The MIoUs of AS-Unet++, Unet, and AS-Unet in the test set were 90.2%, 80.5% and 85.5%, respectively.
It can be seen from the above data that AS-Unet++ is superior to Unet and AS-Unet in each index of the test sets. Compared with AS-Unet and Unet, the MIoU of AS-Unet++ increased by 4.7% and 9.7%, respectively. In the identification of house elements, the Precision index increased by 3.3% and 7.0%, the Recall index increased by 3.5% and 7.9%, and the IoU index increased by 3.4% and 7.5%, respectively. In the recognition of road elements, the Precision index increased by 2.0% and 9.0%, the Recall index increased by 2.6% and 9.5%, and the IoU index increased by 2.6% and 9.8%, respectively. In the recognition of forest elements, the Precision index increased by 7.7% and 14.9%, the Recall index increased by 7.5% and 14.5%, and the IoU index increased by 7.4% and 14.9%, respectively. In the recognition of lake elements, the Precision index increased by 5.1% and 6.6%, the Recall index increased by 5.4% and 6.5%, and the IoU index increased by 5.3% and 6.6%, respectively.
- (3) Comparison of AS-Unet++, CE Loss, and Conventional Data Enhancement (CDE)
AS-Unet++, CE Loss, and CDE are compared in the training sets and test sets to evaluate the method of this paper by further comparative experiments.
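As background for the CE Loss baseline, the sketch below computes a pixel-wise binary cross-entropy between predicted foreground probabilities and a 0/1 label mask. This is the textbook form of the loss, written under the assumption that each element is segmented as a binary mask; this section does not state whether the baseline uses a binary or multi-class formulation, and the function name `pixel_cross_entropy` and the `eps` clamp are illustrative choices.

```python
import math


def pixel_cross_entropy(probs, labels, eps=1e-7):
    """Mean binary cross-entropy over all pixels.

    probs  : rows of predicted foreground probabilities in [0, 1]
    labels : rows of 0/1 ground-truth values, same shape as probs
    eps    : clamp that keeps log() away from 0 (illustrative choice)
    """
    total = 0.0
    count = 0
    for prob_row, label_row in zip(probs, labels):
        for p, y in zip(prob_row, label_row):
            p = min(max(p, eps), 1.0 - eps)
            total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
            count += 1
    return total / count


labels = [[1, 0], [0, 0]]
confident_right = [[0.9, 0.1], [0.1, 0.1]]
confident_wrong = [[0.1, 0.9], [0.9, 0.9]]

# A prediction that agrees with the labels yields the lower loss.
print(pixel_cross_entropy(confident_right, labels)
      < pixel_cross_entropy(confident_wrong, labels))  # True
```

Because every pixel contributes equally to the mean, a loss of this form is driven mostly by whichever class covers the most pixels in the image.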
The IoU curves of the three networks during training on houses, roads, forests, and lakes are shown in Figure 15.
It can be seen that after training, the MIoU of AS-Unet++, CE Loss, and CDE on verification sets is 88.9%, 78.4%, and 78.7%, respectively.
The Precision, Recall, and IoU of the verification set of each network for road elements, forest elements, and lake elements are shown in Table 7.
From the above data, it can be seen that AS-Unet++ outperforms the other two networks in all accuracy metrics. Compared with CE Loss and CDE, the MIoU of AS-Unet++ is improved by 10.5% and 10.2%, respectively. In Figure 15, AS-Unet++ has the fastest convergence speed and is much faster than the other two networks in the training of roads, forests, and lakes. During training, CE Loss and CDE oscillate more strongly, and the oscillations remain obvious even in the second half of the training iterations; by comparison, the oscillations of AS-Unet++ are smaller and its performance is better.
In the identification of house elements, the Precision index increased by 18.5% and 20.2%, the Recall index increased by 18.7% and 19.6%, and the IoU index increased by 18.7% and 20.1%, respectively. In the recognition of road elements, the Precision index increased by 10.3% and 11.2%, the Recall index increased by 9.7% and 9.6%, and the IoU index increased by 10.0% and 10.3%, respectively. In the recognition of forest elements, the Precision index increased by 6.1% and 5.9%, the Recall index increased by 5.2% and 4.2%, and the IoU index increased by 5.8% and 4.8%, respectively. In the identification of lake elements, the Precision index increased by 12.1% and 10.6%, the Recall index increased by 10.5% and 8.8%, and the IoU index increased by 11.3% and 9.8%. The improvement of house, road, and lake identification accuracy is particularly obvious.
Figure 16 shows a comparison of the predicted segmentation images of houses, roads, forests, and lakes achieved by the three networks.
As can be seen in Figure 16, all of these networks can essentially recognize the corresponding elements, but there are significant differences in performance.
Although CE Loss can basically recognize the corresponding elements, it seriously misrecognizes non-element regions. In the case of houses, since houses are generally rectangular in the image, the network recognizes vehicles and other rectangular objects as houses and also recognizes non-house pixels around the houses as houses. In the recognition of road elements, roads are distributed in bands, and the network recognizes banded barren land in the image as roads. In the recognition of forest elements, it recognizes some grass pixels with a color similar to forest as forest. Lake element identification also suffers from a large number of non-lake elements being misidentified as lake.
In the recognition based on CDE, some pixels that belong to the corresponding element are not recognized. For example, in the house element, some houses are not recognized because the color of different houses varies greatly. In the road element, a road with a small width is not recognized. In the forest element, most of the pixels around the lower right side are not recognized because of the interference of power lines. In the lake element, some waters are not identified because of their color differences.
AS-Unet++ overcomes the misidentification of similar elements seen with CE Loss, such as rectangular non-house elements in house recognition, other banded elements in road recognition, grasslands similar to forests in forest recognition, and similar non-lake elements in lake recognition. AS-Unet++ also overcomes the missed recognition seen with CDE: houses with large color differences are recognized, roads with smaller widths are recognized, the interference of the wire at the lower right is overcome so that the pixels belonging to the forest are identified, and differently colored waters are not omitted in the lake element. Compared with the other two networks, AS-Unet++ has better performance.
The Precision, Recall, and IoU of the test sets of each network for road elements, forest elements, and lake elements are shown in Table 8.
The MIoUs of AS-Unet++, CE Loss, and CDE on the test set are 90.2%, 77.9%, and 78.1%, respectively.
It can be seen from the above data that AS-Unet++ is superior to CE Loss and CDE in each index of the test sets. Compared with CE Loss and CDE, the MIoU of AS-Unet++ increases by 12.3% and 12.1%, respectively. In the identification of house elements, the Precision index increased by 20.1% and 21.4%, the Recall index increased by 19.2% and 20.5%, and the IoU index increased by 19.7% and 21.3%, respectively. In the recognition of road elements, the Precision index increased by 10.1% and 8.8%, the Recall index increased by 10.1% and 9.4%, and the IoU index increased by 10.3% and 10.5%, respectively. In the recognition of forest elements, the Precision index increased by 7.8% and 7.0%, the Recall index increased by 5.5% and 4.8%, and the IoU index increased by 6.6% and 5.8%, respectively. In the recognition of lake elements, the Precision index increased by 14.0% and 11.4%, the Recall index increased by 11.9% and 10.0%, and the IoU index increased by 12.9% and 10.8%, respectively.
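The test-set MIoU values quoted across the three experiments can be collected in one place to cross-check the stated improvements and to rank the networks. The sketch below uses only figures that appear in the text of this section; the dictionary names and the helper `gains_over_baselines` are illustrative.

```python
# Test-set MIoU (percent) for every network, as quoted in this section.
test_miou = {
    "AS-Unet++": 90.2,
    "Unet++": 83.2, "A-Unet++": 86.6, "S-Unet++": 86.2,   # experiment (1)
    "Unet": 80.5, "AS-Unet": 85.5,                         # experiment (2)
    "CE Loss": 77.9, "CDE": 78.1,                          # experiment (3)
}

# Improvements of AS-Unet++ over each baseline, as stated in the text.
stated_gain = {
    "Unet++": 7.0, "A-Unet++": 3.6, "S-Unet++": 4.0,
    "Unet": 9.7, "AS-Unet": 4.7,
    "CE Loss": 12.3, "CDE": 12.1,
}


def gains_over_baselines(scores, proposed="AS-Unet++"):
    """Difference between the proposed network and every other entry."""
    return {
        name: round(scores[proposed] - value, 1)
        for name, value in scores.items()
        if name != proposed
    }


computed = gains_over_baselines(test_miou)
mismatches = {
    name: (computed[name], stated_gain[name])
    for name in stated_gain
    if computed[name] != stated_gain[name]
}
ranking = sorted(test_miou, key=test_miou.get, reverse=True)

print(mismatches)  # {} means every stated gain matches the quoted MIoU values
print(ranking)
```

The stated gains are plain differences between MIoU percentages, so they are best read as percentage-point improvements rather than relative changes.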