Synthetic Image Augmentation for Damage Region Segmentation using
Conditional GAN with Structure Edge
Takato Yasuno*1, Michihiro Nakajima*1, Seiji Sekiguchi*1,
Kazuhiro Noda*1, Kiyoshi Aoyanagi*1, Sakura Kato*1
*1 Yachiyo Engineering, Co., Ltd.
In recent years, social infrastructure has been aging, and its predictive maintenance has become an important issue. To monitor
the state of infrastructure, bridges are inspected by human eye or by drone. For diagnosis, the primary damage regions are
recognized as repair targets. However, severe degradation rarely occurs, and the damage regions of interest are often narrow,
so they occupy an extremely small share of the pixels in each image, 0.6 to 1.5 percent in our experience. Both the scarcity
and the imbalance of the damage region of interest limit damage detection performance. If an additional dataset of damaged
images can be generated, it may improve the accuracy of damage region segmentation algorithms. We propose a synthetic
augmentation procedure that generates damaged images using an image-to-image translation mapping from a tri-categorical
label, which consists of both the semantic label and the structure edge, to the real damage image. We use the Sobel gradient
operator to enhance the structure edge. For bridge inspection, we apply the procedure to reinforced concrete (RC) structures
using 208 eye-inspection photos in which rebar exposure has occurred, from which we prepared 840 block images of size 224
by 224. We applied popular per-pixel segmentation algorithms such as FCN-8s, SegNet, and DeepLabv3+Xception-v2.
We demonstrate that re-training on a dataset extended with the synthetic augmentation procedure yields higher accuracy on
test images in terms of the mean IoU, damage region of interest IoU, precision, recall, and BF score.
dataset that contains the damage region of interest takes a long
time to build. This is one of the hurdles to overcome when
applying supervised learning and data mining to the detection of
aging social infrastructure. Instead, we propose a synthetic
augmentation procedure that generates inspection images
containing the damage of interest with an unsupervised approach.
Since its publication in 2014, the original generative adversarial
network (GAN) paper has been cited more than 9,000 times (as
of July 2019). Since then, the field of GANs has grown
exponentially to more than 360 papers [Hindupur 2017]. GANs
are used for many applications, not only fighting breast cancer
or generating human faces, but also 62 other medical GAN
applications published through the end of July 2018
[Kazeminia 2018]. Using DAGAN for data augmentation, a
significant improvement in classification accuracy was achieved
compared with the baseline of standard data augmentation only
[Frid-Adar 2018]. When synthetic data produced by their
DCGAN was added, the classification performance improved from
around 80% to 85%, demonstrating the usefulness of GANs.
To overcome the scarcity of rare-class images that include the
damage region of interest, we expect that adding synthetically
generated rare-class images will be useful. We call this
“Synthetic Augmentation”. However, for semantic segmentation
in social infrastructure monitoring, it is not clear whether
synthetic augmentation can improve the segmentation accuracy.
We report several training and test experiments with
synthetic augmentation using the L1-Conditional GAN.
1. Introduction
1.1 Related GAN Studies for Accuracy
In recent years, social infrastructure has been aging, and its
predictive maintenance has become an important issue. To monitor
the state of infrastructure, inspection is performed manually
by human eye or automatically by drone. Damage locations are
detected for screening, and the primary damage regions targeted
for repair are segmented per pixel. After these tasks, critical
repair targets are selected for predictive maintenance, so they
must be inspected accurately. However, deterioration rarely
occurs, and the damage regions are often narrow, so their share
of one image is extremely small; we have experienced cases of
0.6 and 1.5 percent. Such an imbalance between the damage-ROI
class and the background class constrains improvements in
accuracy. If this sparse damaged region can be duplicated and an
additional dataset of images and labels can be generated, it will
be possible to achieve a stable training process and improved
accuracy in the damage region segmentation task.
In the field of social infrastructure inspection, related work on
damage detection includes object detection
[Gopalakrishnan 2018] and semantic segmentation
[Guillanmon 2018]. The damaged class is a rare event and
datasets that include it are always imbalanced, so the number of
rare-class images is very small. The more severe the damage, the
less often it occurs and the fewer images can be collected.
Because of this scarcity of damage data, it is difficult to improve
the accuracy for the damage of interest in social infrastructure
monitoring and inspection. In particular, images of deteriorated
damage of interest are scarce, so a useful
1.2 Synthetic Image Augmentation using cGAN
We think that approaches for generating a damage image
include 1) reproducing the already acquired damage image
(similar augmentation), 2) generating a future image
degraded from the current damage grade (what-if degradation),
Contact: Takato Yasuno, RIIPS on 5-20-8, Asakusabashi, Taitoku, Tokyo, 111-8648, tk-yasuno@yachiyo-eng.co.jp.
SegNet [Badrinarayanan 2016], and dense convolutional
networks such as DeepLabv3+ with ResNet18, ResNet50, and
Xception-v2 backbones [Chen 2018]. We compare the segmentation
accuracy trained on the initial dataset with the accuracy
re-trained on the dataset extended with synthetically generated
images. We evaluate both by computing similarity indices
between the ground truth damage region of interest (ROI) and
the predicted region. Specifically, we compute the mean
Intersection over Union (mIoU) and the class IoU for the ROI
and the background. To analyze the properties of synthetic
augmentation, we also compute the precision, recall, and BF score.
Using these existing segmentation architectures, we thereby
examine whether our synthetic augmentation method can improve
their segmentation accuracy.
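The indices named above can be sketched for the two-class case (ROI versus background) as follows. This is a minimal pure-Python illustration, not the evaluation code used in the paper; the function names and the toy 4 by 4 masks are assumptions, and the BF score is omitted because it needs boundary matching with a distance tolerance.

```python
def confusion_counts(truth, pred, positive=1):
    """Count TP, FP, FN, TN for the ROI class over two equally sized label grids."""
    tp = fp = fn = tn = 0
    for truth_row, pred_row in zip(truth, pred):
        for t, p in zip(truth_row, pred_row):
            if p == positive and t == positive:
                tp += 1
            elif p == positive:
                fp += 1
            elif t == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn


def segmentation_scores(truth, pred):
    """Return class IoU, mean IoU, and ROI precision and recall."""
    tp, fp, fn, tn = confusion_counts(truth, pred)
    roi_iou = tp / (tp + fp + fn) if tp + fp + fn else 0.0
    # For the background class the roles of FP and FN are swapped.
    background_iou = tn / (tn + fp + fn) if tn + fp + fn else 0.0
    return {
        "roi_iou": roi_iou,
        "background_iou": background_iou,
        "mean_iou": (roi_iou + background_iou) / 2,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
    }


# Toy masks (hypothetical): 1 = damage ROI, 0 = background.
truth = [[0, 0, 0, 0],
         [0, 1, 1, 0],
         [0, 1, 1, 0],
         [0, 0, 0, 0]]
pred = [[0, 0, 0, 0],
        [0, 1, 1, 1],
        [0, 1, 0, 0],
        [0, 0, 0, 0]]
scores = segmentation_scores(truth, pred)
# TP = 3, FP = 1, FN = 1, TN = 11, so ROI IoU = 3/5 and precision = recall = 3/4.
```

Because the ROI is small, the background IoU stays high even when the ROI IoU is low, which is why the class IoU is reported alongside the mean IoU.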
Figure 1: Synthetic augmentation method using image-to-image
translation, mapping the semantic label with structure edge to an image.
2.2 Synthetic Augmentation from Semantic Label
with Structure Edge to Image
and 3) generating newer damage that does not yet exist (what-if
newer). Here, 1) is close to the data augmentation that is
standard in supervised learning, such as rotation, X/Y
translation, and scaling. This is useful for adding variants of
the features of the acquired images, increasing variation,
accelerating learning, and improving generality.
In case 2), it is possible to simulate the situation where
deterioration has progressed several years beyond the current
state. Generating an image of a degraded state that has not yet
been experienced, such as deterioration that has progressed by
one rank or the worst case under a low management level, goes
beyond the scope of the existing supervised data. This is
an attempt to eliminate blind spots in supervised learning.
Case 3) is an approach that is not possible with supervised
learning based on experience data. Even for social
infrastructure that has not deteriorated, a new damaged image
can be generated to prepare for future deterioration, which
makes it possible to imagine a degraded future state that has
not yet been experienced.
However, the generated damage must be realistic about where and
how much damage occurs in the infrastructure. This means that,
after acquiring images of potential damage throughout the life
cycle of social infrastructure, it is necessary to design a new
plausible damage scenario and place the damage at a plausible
position. New damaged images must be generated ethically so
that unrealistic damaged images do not make the manager
uneasy. This paper tries to expand the set of damaged images
using the most basic method, 1) reproducing the current
damaged images.
Training a DCGAN requires more than 500 images, which should
also be taken from a stable angle. In the infrastructure
deterioration process, advanced damage is a rare event, and it is
not easy to collect even several hundred images of it.
Eye-inspection photos are taken from various angles depending on
the site being monitored. In contrast, image-to-image
translation can be trained on a paired image dataset even when
the inspection angles vary. This paper proposes a synthetic
augmentation method using the L1-Conditional GAN (pix2pix).
The original pix2pix paper translated input edge images to shoe
images [Isola 2018], and, using the CamVid dataset, translated
semantic labels to photos. However, for damage images such a
naive translation did not succeed. As shown in Figure 1,
this paper proposes using the semantic label with structure edge
as a tri-categorical input label. This augmented label consists of
the damage ROI, the enhanced structure edge, and the background.
We tried several edge detection methods such as gradient
operators (Roberts, Prewitt, Sobel), Laplacian of Gaussian (LoG),
zero crossing, and Canny [Gonzalez 2018]. We selected the Sobel
gradient operator, a finite-difference method that estimates the
gradient at each pixel from the differences between neighboring
pixel values. This operator is robust to noise. We use Sobel
detection to extract the background edge from the eye-inspection
photo, which captures the structural features of the concrete
parts that make up social infrastructure such as bridges. We
combine the semantic label and the structure edge produced by
Sobel edge detection into a three-class categorical label, and
train the mapping from this semantic label with structure edge
to the damaged image.
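The construction of the tri-categorical label can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions, not the authors' implementation: the class ids, the edge threshold, and the rule that the ROI takes priority over an edge at the same pixel are all hypothetical choices, and a practical version would use an image library instead of nested loops.

```python
BACKGROUND, EDGE, ROI = 0, 1, 2  # hypothetical class ids

SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))


def sobel_magnitude(gray):
    """Return the Sobel gradient magnitude; border pixels are left at 0."""
    h, w = len(gray), len(gray[0])
    mag = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = gy = 0.0
            for dy in range(3):
                for dx in range(3):
                    v = gray[y + dy - 1][x + dx - 1]
                    gx += SOBEL_X[dy][dx] * v
                    gy += SOBEL_Y[dy][dx] * v
            mag[y][x] = (gx * gx + gy * gy) ** 0.5
    return mag


def tri_categorical_label(gray, roi_mask, edge_threshold):
    """Combine a binary ROI mask and thresholded Sobel edges into one label map."""
    mag = sobel_magnitude(gray)
    h, w = len(gray), len(gray[0])
    label = [[BACKGROUND] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if roi_mask[y][x]:
                label[y][x] = ROI  # assumed: ROI overrides edge
            elif mag[y][x] >= edge_threshold:
                label[y][x] = EDGE
    return label


# Toy photo with a vertical step edge between columns 1 and 2, plus one ROI pixel.
gray = [[0, 0, 100, 100, 100] for _ in range(5)]
roi_mask = [[0] * 5 for _ in range(5)]
roi_mask[2][2] = 1
label = tri_categorical_label(gray, roi_mask, edge_threshold=200)
```

In this toy case the interior pixels on both sides of the step receive the edge class, the annotated pixel keeps the ROI class, and everything else stays background.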
We summarize the synthetic augmentation steps as follows.
First, we train a semantic segmentation network on the initial
dataset of eye-inspection images and semantic labels.
Second, we apply the synthetic augmentation method, generating
fake images with the L1-Conditional GAN from the semantic
labels combined with structure edges. Third, we re-train the
semantic segmentation network on both the initial dataset and
the synthetic dataset of generated inspection images. The
resulting dataset is twice the size of the initial dataset, which
extends the opportunity to learn damage features from both real
inspection photos and synthetic images.
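The second and third steps can be sketched as a small dataset-building routine. This is an illustrative outline only: `make_label_with_edge` and `generate_image` are hypothetical stand-ins for the label construction and the trained L1-Conditional GAN generator, and pairing each generated image with its original semantic label is an assumption about how the re-training set is assembled.

```python
def synthetic_augmentation(initial_pairs, make_label_with_edge, generate_image):
    """Return the initial (image, label) pairs plus one synthetic pair per original."""
    augmented = list(initial_pairs)
    for image, label in initial_pairs:
        # Conditioning input: semantic label combined with the structure edge.
        conditioning = make_label_with_edge(image, label)
        fake_image = generate_image(conditioning)
        # Assumed: the generated image reuses the original semantic label.
        augmented.append((fake_image, label))
    return augmented


# Placeholder data and callables, only to show the doubling of the dataset.
initial = [(f"photo_{i}", f"label_{i}") for i in range(840)]
augmented = synthetic_augmentation(
    initial,
    make_label_with_edge=lambda image, label: (label, "edge"),
    generate_image=lambda conditioning: "generated",
)
print(len(augmented))  # 1680
```

With 840 initial block images this yields the 1,680 images used for re-training.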
2. Generative Damage Augmentation
2.1 Damage Segmentation Architectures
Semantic segmentation algorithms are useful for recognizing the
damage region of interest in social infrastructure. We
propose a synthetic augmentation method that generates fake
images and labels using the L1-conditional GAN (pix2pix),
translating a label image with structure edge to a damaged image.
We apply several existing per-pixel segmentation networks based
on transfer learning: the Fully Convolutional Network (FCN)
[Long 2015] based on AlexNet and on VGG16 with different skip
connections, which we call types 8s, 16s, and 32s; furthermore
with a train-to-test ratio of 95 : 5, giving 798 and 42 images,
respectively. Next, we trained on the synthetically augmented
dataset of 1,680 images and compared it with the initial result
under the same settings: 3 epochs, a 95 : 5 partition, and a
mini-batch size of 16 to 32, since FCN-8s and SegNet need much
memory.
Table 1 shows the training results: running time, mean IoU, and
class IoU (rebar exposure, background). The running time was
around twice that of the initial dataset, because the generated
images were added to the initial dataset. FCN-AlexNet with
synthetic augmentation outperforms the initially trained network
in mean IoU and rebar exposure IoU. FCN-8s, FCN-16s, and
SegNet-VGG16 also improved in mean IoU and rebar exposure IoU
over the initially trained networks. Furthermore, two dense
convolutional networks, DeepLabv3+ResNet50 and
DeepLabv3+Xception, showed high performance. Therefore, we
demonstrated that synthetic augmentation using the
L1-Conditional GAN can improve segmentation accuracy, although
not in every case.
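The size of these gains can be tabulated directly from the values reported in Table 1. The sketch below only subtracts the initial scores from the synthetic-augmentation scores; the dictionary layout and function name are illustrative choices, and the numbers are copied from the table as read from the source.

```python
# (mean IoU, ROI IoU) for the initial and the synthetically augmented dataset.
TABLE1 = {
    "FCN-AlexNet": ((0.5376, 0.1346), (0.5874, 0.2162)),
    "FCN-8s": ((0.6289, 0.2778), (0.7367, 0.4851)),
    "FCN-16s": ((0.6175, 0.2612), (0.6759, 0.3720)),
    "FCN-32s": ((0.5796, 0.1963), (0.5723, 0.1999)),
    "SegNet-VGG16": ((0.6574, 0.3263), (0.8135, 0.6344)),
    "DeepLabv3+ResNet18": ((0.6951, 0.4044), (0.7137, 0.4447)),
    "DeepLabv3+ResNet50": ((0.7289, 0.4686), (0.8005, 0.6082)),
    "DeepLabv3+Xception": ((0.6531, 0.3255), (0.7902, 0.5886)),
}


def iou_gains(table):
    """Return (mean IoU gain, ROI IoU gain) per architecture, rounded to 4 places."""
    return {
        name: (round(synthetic[0] - initial[0], 4), round(synthetic[1] - initial[1], 4))
        for name, (initial, synthetic) in table.items()
    }


gains = iou_gains(TABLE1)
no_mean_iou_gain = [name for name, (d_mean, _) in gains.items() if d_mean <= 0]
print(no_mean_iou_gain)  # ['FCN-32s']
```

FCN-32s is the one case where the mean IoU does not improve, which is the sense in which the augmentation is not always better; SegNet-VGG16 shows the largest gain in both indices.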
3. Applied Results
3.1 Bridge inspection dataset
This paper focuses on one type of social infrastructure
inspection, namely a concrete bridge eye-inspection dataset.
Bridge eye-inspection photos have low resolution and
heterogeneous sizes ranging from 360 to 1,500 pixels. We
selected the ROI-rich images from around 20 thousand
eye-inspection photos, obtaining 208 inspection photos of RC
structures in which rebar exposure has occurred. This extraction
rate is only one percent, so rebar exposure is also a rare event.
We annotated ROI and background class labels on each raw image
for the semantic segmentation task. To keep the pixel features
without loss of resolution, we extracted 998 block images of
uniform size 224 by 224. We then deleted small block images of
less than 128 pixels and unusable images without a damage ROI.
After this cleansing, we have a dataset of 840 images. We
computed class weights normalized by the median pixel count: the
damage-ROI weight is 16.07 and the background weight is 0.51.
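One common way to obtain weights of this kind is median frequency balancing, where each class weight is the median class frequency divided by that class's frequency. The sketch below assumes this is the scheme behind the reported values, which the text does not state explicitly; the pixel counts are hypothetical and chosen only so that the ROI share is about 3.1 percent.

```python
from statistics import median


def median_frequency_weights(pixel_counts):
    """Return median(frequency) / frequency for each class."""
    total = sum(pixel_counts.values())
    freqs = {name: count / total for name, count in pixel_counts.items()}
    med = median(freqs.values())
    return {name: med / freq for name, freq in freqs.items()}


# Hypothetical counts per 10,000 pixels, not taken from the paper.
weights = median_frequency_weights({"roi": 311, "background": 9689})
# weights["roi"] is roughly 16.08 and weights["background"] roughly 0.516,
# close to the reported 16.07 and 0.51.
```

With only two classes the median frequency is 0.5, so the rare class receives a weight well above 1 and the dominant class a weight of about one half.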
Table 1: Training and test prediction results for intersection over union (IoU).
architecture         dataset             running time  mean IoU  ROI-IoU  background-IoU
FCN-AlexNet          initial             49m           0.5376    0.1346   0.9405
FCN-AlexNet          synthetic augment.  100m          0.5874    0.2162   0.9585
FCN-8s               initial             210m          0.6289    0.2778   0.9801
FCN-8s               synthetic augment.  336m          0.7367    0.4851   0.9883
FCN-16s              initial             190m          0.6175    0.2612   0.9738
FCN-16s              synthetic augment.  332m          0.6759    0.3720   0.9797
FCN-32s              initial             167m          0.5796    0.1963   0.9629
FCN-32s              synthetic augment.  336m          0.5723    0.1999   0.9446
SegNet-VGG16         initial             274m          0.6574    0.3263   0.9884
SegNet-VGG16         synthetic augment.  480m          0.8135    0.6344   0.9926
DeepLabv3+ResNet18   initial             66m           0.6951    0.4044   0.9857
DeepLabv3+ResNet18   synthetic augment.  182m          0.7137    0.4447   0.9826
DeepLabv3+ResNet50   initial             170m          0.7289    0.4686   0.9892
DeepLabv3+ResNet50   synthetic augment.  324m          0.8005    0.6082   0.9928
DeepLabv3+Xception   initial             275m          0.6531    0.3255   0.9807
DeepLabv3+Xception   synthetic augment.  549m          0.7902    0.5886   0.9917
Figure 2: Images generated by synthetic augmentation using the L1-Conditional GAN, for rebar exposure on concrete bridges.
3.2 Synthetic Augmentation and Generated Images
We applied the L1-Conditional GAN, which carries out
image-to-image translation from the tri-categorical labels,
combining the semantic label and the structure edge from Sobel
detection, to the real training dataset of 840 block images. We
trained for 200 epochs, which took 13 hours. The L1 penalty
coefficient in the loss function is 100. Figure 2 shows the
images generated by synthetic augmentation.
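For reference, the L1-Conditional GAN objective with the penalty coefficient of 100 used in Section 3.2 can be written as below. The notation follows the pix2pix formulation [Isola 2018] rather than notation defined in this paper: x is taken to be the tri-categorical label, y the real block image, z the generator noise, G the generator, and D the discriminator.

```latex
\begin{align}
\mathcal{L}_{cGAN}(G, D) &= \mathbb{E}_{x,y}\big[\log D(x, y)\big]
  + \mathbb{E}_{x,z}\big[\log\big(1 - D(x, G(x, z))\big)\big] \\
\mathcal{L}_{L1}(G) &= \mathbb{E}_{x,y,z}\big[\lVert y - G(x, z) \rVert_{1}\big] \\
G^{*} &= \arg\min_{G}\max_{D}\; \mathcal{L}_{cGAN}(G, D) + \lambda\, \mathcal{L}_{L1}(G),
  \qquad \lambda = 100
\end{align}
```

The L1 term pulls the generated image toward the paired real photo, while the adversarial term encourages realistic texture; a large coefficient weights the reconstruction term heavily.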
3.4 Predict Test Images beyond Initial Dataset
To evaluate more general accuracy, we searched for and
downloaded other rebar exposure images of concrete
infrastructure such as bridges and buildings. We predicted
these 40 newer test images, each prepared by a center crop
located on the rebar exposure. Table 2 shows the prediction
results for the initially trained networks and the networks
trained with synthetic augmentation.
In particular, our synthetic augmentation procedure can achieve
3.3 Accuracy Comparison Without-With Deep Fake
We trained on the initial dataset using the RMSProp optimizer for
3 epochs with a mini-batch size of 16 to 32, around 10 to 20
thousand iterations. As standard augmentation, we applied random
crop extraction, multiplying the data by 64 crops of unit size
224 by 224. We partition the dataset
consist of the ROI, the structure edge, and the background, to the
real damaged image. We use Sobel edge detection to extract the
structure edge features from eye-inspection photos. We
demonstrated that our synthetic augmentation procedure using the
L1-Conditional GAN can improve segmentation accuracy, although
not in every case. In particular, it achieves higher precision
for both rebar exposure and background. With our synthetic
augmentation procedure, the predicted region of interest
approached the ground truth of rebar exposure. Specifically, we
demonstrated this for architectures such as FCN-AlexNet,
FCN-8s, SegNet-VGG16, and DeepLabv3+Xception.
In future work, we will take on the challenge of developing a
pioneering architecture for social infrastructure health
monitoring and asset management. This paper focused on road
bridges; in the future we would like to extend the application
to dams and rivers. Synthetic augmentation of newly occurring,
not yet experienced damage and of what-if degradation, using
CycleGAN or StyleGAN, remains an issue for future work toward
more general-purpose application, while maintaining realism and
paying ethical attention to the practical concerns of
infrastructure managers.
[Acknowledgments] We would like to thank Mr. S. Kuramoto and Mr. T.
Fukumoto for providing information on GAN frameworks.
Table 2: Test prediction beyond the initial dataset: precision, recall, and BF score.
                                         precision           recall              BF score
architecture         dataset             ROI     background  ROI     background  ROI     background
FCN-AlexNet          initial             0.1252  0.5937      0.1546  0.5494      0.1296  0.5573
FCN-AlexNet          synthetic augment.  0.1861  0.6975      0.1506  0.5721      0.1532  0.6176
FCN-8s               initial             0.2892  0.6347      0.3787  0.7085      0.2951  0.6575
FCN-8s               synthetic augment.  0.3757  0.7182      0.3870  0.7089      0.3600  0.7031
FCN-16s              initial             0.1742  0.6374      0.1844  0.6179      0.1688  0.6207
FCN-16s              synthetic augment.  0.2426  0.7031      0.2163  0.6238      0.2134  0.6524
FCN-32s              initial             0.1622  0.7099      0.1497  0.5527      0.1465  0.6094
FCN-32s              synthetic augment.  0.1298  0.6509      0.1254  0.5298      0.1187  0.5738
SegNet-VGG16         initial             0.2979  0.6560      0.4181  0.7202      0.3267  0.6798
SegNet-VGG16         synthetic augment.  0.3912  0.7063      0.4854  0.7457      0.4124  0.7186
DeepLabv3+ResNet18   initial             0.3391  0.7562      0.2804  0.7021      0.2916  0.7203
DeepLabv3+ResNet18   synthetic augment.  0.1958  0.5798      0.2580  0.6361      0.2090  0.5973
DeepLabv3+ResNet50   initial             0.3981  0.7598      0.3352  0.7018      0.3451  0.7227
DeepLabv3+ResNet50   synthetic augment.  0.3937  0.7918      0.2587  0.6768      0.2849  0.7189
DeepLabv3+Xception   initial             0.2906  0.6491      0.2988  0.6807      0.2714  0.6509
DeepLabv3+Xception   synthetic augment.  0.3982  0.7327      0.3197  0.6701      0.3130  0.6839
References
[Gopalakrishnan 2018] Gopalakrishnan, K., Gholami, H. et al.:
Crack Damage Detection in Unmanned Aerial Vehicle
Images of Civil Infrastructure using Pre-trained Deep
Learning Model, International Journal for Traffic and
Transport Engineering, 8(1), pp.1-14, 2018.
[Ricard 2018] Ricard, W., Silva, L. et al.: Concrete Cracks
Detection based on Deep Learning Image Classification,
MDPI Proceedings, 2, 489, pp.1-6, 2018.
[Guillanmon 2018] Guillamon, J.R.: Bridge Structural Damage
Segmentation using Fully Convolutional Networks,
Universitat Politecnica de Catalunya, 2018.
[Yasuno 2019] Yasuno, T.: Sparse Damage Per-pixel Prognosis
Indices via Semantic Segmentation, 33rd Journal of Society
for Artificial Intelligence, 3B3-E-2-05, 2019.
[Hindupur 2017] Hindupur, A.: The GAN Zoo,
https://github.com/hindupuravinash/the-gan-zoo.
[Kazeminia 2018] Kazeminia, S. et al.: GANs for Medical
Image Analysis, https://arxiv.org/pdf/1809.06222.pdf.
[Frid-Adar 2018] Frid-Adar, M., Diamant, I. et al.: GAN-based
Synthetic Medical Image Augmentation for increased CNN
Performance in Lesion Classification, CVPR, 2018.
[Long 2015] Long, J. et al.: Fully Convolutional Networks for
Semantic Segmentation, CVPR, pp.3431-3440, 2015.
[Badrinarayanan 2016] Badrinarayanan, V., Kendall, A. et al.:
SegNet: Deep Convolutional Encoder-Decoder Architecture
for Image Segmentation, arXiv:1511.00561v3, 2016.
[Chen 2018] Chen, L-C., Zhu, Y., Papandreou, G. et al.:
Encoder-Decoder with Atrous Separable Convolution for
Semantic Image Segmentation, arXiv:1802.02611v3.
[Isola 2018] Isola, P. et al.: Image-to-Image Translation with
Conditional Adversarial Networks, CVPR, 2017.
[Gonzalez 2018] Gonzalez, R.C., Woods, R.E.: Digital Image
Processing, 4th Global Edition, Pearson.
(2020.March 4)
Figure 3: Overlay of the ground truth and the predicted mask, comparing
the initial segmentation (top) with the synthetic augmentation (bottom).
From left to right, the prediction results are for FCN-AlexNet, FCN-8s, SegNet-VGG16, and DeepLabv3+Xception.
higher precision for both rebar exposure and background.
Figure 3 shows the overlay of the ground truth damage region of
interest and the region predicted by the segmentation network.
The top images show predictions trained on the initial dataset,
whereas the bottom images show predictions trained with
synthetic augmentation. With our synthetic augmentation
procedure, the predicted region of interest approaches the
ground truth of rebar exposure. Specifically, five architectures
improved: FCN-AlexNet, FCN-8s, FCN-16s, SegNet-VGG16, and
DeepLabv3+Xception. On the other hand, the recall improved only
slightly in some cases. As a result, our synthetic augmentation
procedure can improve precision in the semantic
segmentation task.
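The count of five improved architectures can be checked against the precision columns of Table 2. The sketch below lists the architectures whose precision rises for both the ROI and the background; the dictionary layout and function name are illustrative, and the values are copied from the table as read from the source.

```python
# (ROI precision, background precision) for the initial and augmented datasets.
TABLE2_PRECISION = {
    "FCN-AlexNet": ((0.1252, 0.5937), (0.1861, 0.6975)),
    "FCN-8s": ((0.2892, 0.6347), (0.3757, 0.7182)),
    "FCN-16s": ((0.1742, 0.6374), (0.2426, 0.7031)),
    "FCN-32s": ((0.1622, 0.7099), (0.1298, 0.6509)),
    "SegNet-VGG16": ((0.2979, 0.6560), (0.3912, 0.7063)),
    "DeepLabv3+ResNet18": ((0.3391, 0.7562), (0.1958, 0.5798)),
    "DeepLabv3+ResNet50": ((0.3981, 0.7598), (0.3937, 0.7918)),
    "DeepLabv3+Xception": ((0.2906, 0.6491), (0.3982, 0.7327)),
}


def improved_on_both(table):
    """Return architectures whose ROI and background precision both increase."""
    return [
        name
        for name, (initial, synthetic) in table.items()
        if all(after > before for before, after in zip(initial, synthetic))
    ]


improved = improved_on_both(TABLE2_PRECISION)
print(len(improved))  # 5
```

The five names returned are FCN-AlexNet, FCN-8s, FCN-16s, SegNet-VGG16, and DeepLabv3+Xception, matching the list in the text; FCN-32s and the two ResNet backbones lose ROI precision on these test images.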
4. Concluding Remarks
This paper proposes a synthetic augmentation procedure using the
L1-Conditional GAN. This is an image-to-image translation
algorithm that maps from tri-categorical labels that