Road Crack Detection Using Deep Neural Network Based on Attention Mechanism and Residual Structure
This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2022.3233072
ABSTRACT Intelligent detection of road cracks is crucial for road maintenance and safety. Due to the
interference of illumination and different background factors, the road crack extraction results of existing
deep learning methods are incomplete, and the extraction accuracy is low. We designed a new network
model, called AR-UNet, which introduces a convolutional block attention module (CBAM) in the encoder
and decoder of U-Net to effectively extract global and local detail information. The input and output
features of each CBAM are connected to increase the transmission path of features. BasicBlock is adopted
to replace the convolutional layers of the original network to avoid the network degradation caused by
vanishing gradients as the number of layers grows. We tested our method on DeepCrack, the Crack Forest Dataset, and
our own labeled road image dataset (RID). The experimental results show that our method focuses more
on crack feature information and extracts cracks with higher integrity. The comparison with existing deep
learning methods also demonstrates the effectiveness of our proposed method. The code is available at:
https://github.com/18435398440/ARUnet.
INDEX TERMS Residual structure, attention mechanism, deep learning, crack detection
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
This article has been accepted for publication in IEEE Access. This is the author's version which has not been fully edited and
content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2022.3233072
P JING et al.: Road Crack Detection Using Deep Neural Network Based on Attention Mechanism and Residual Structure
connections in the encoder. The ResNet34 residual network [16] was used, the original convolution of the
residual network was replaced with an expanded (dilated) convolution [17] to extract crack information,
and an attention mechanism was introduced to obtain the final crack detection results. These methods
have poor detection accuracy in the presence of many background disturbing factors.
The U-Net neural network is an encoder-decoder structure that can be trained end-to-end with relatively
few images to detect road cracks quickly. However, road images contain many distracting factors, and
U-Net alone is insufficient to extract the fine cracks in the images. After CBAM is introduced into the
U-Net neural network, the structure of the network grows and the number of network layers increases,
and the network model shows network degradation. To solve the above problems, the work in this paper
focuses on the following aspects:
1) We design a new network model called AR-UNet by introducing the convolutional block attention module
(CBAM) into the U-Net neural network. CBAM applies both global average pooling and global max pooling
along the channel and spatial dimensions of the input features to focus on more global and local detail
information, which improves the performance of the neural network in detecting fine cracks.
2) CBAM's input and output features are combined using shortcut connections to increase the transmission
path of crack features, so the network model can learn more about crack features.
3) BasicBlock replaces the convolutional layers of the U-Net network to avoid network degradation due to
the increase in the number of network layers, which further improves the accuracy of crack extraction.

II. RELATED WORK
Traditional road pavement crack detection mainly falls into the following categories: 1) manual detection,
2) threshold method, 3) wavelet transform, 4) morphological image processing and classification, 5) path
method, and 6) edge detection method. In manual detection, a pavement investigator drives along the road
and records the location of cracks, the degree of damage, and their number. Such a method is detailed and
comprehensive, but it is inefficient and consumes large amounts of human and material resources.
Thresholding-based image segmentation methods have an early origin and are widely used. The thresholding
method detects cracks by exploiting the fact that the gray value of crack pixels is lower than that of the
background [18]. Kirschke et al. [19] proposed a histogram-based threshold segmentation method, which can
only be used to identify relatively obvious cracks. Removal algorithms [20] that use binary segmentation,
morphological operations, and removal of isolated points and regions tend to leave gaps in the detected
cracks. Segmentation using an improved adaptive iterative thresholding algorithm [21] can also yield crack
images. Zhang et al. [22] took advantage of the significant difference between cracks and background to
mark contours using FAST feature point recognition and used PYNQ for crack identification. However, the
accuracy of these methods is poor when there is a lot of noise in the background.
Algorithms such as wavelet pavement crack detection [23][24] use the wavelet transform to convert cracks
and noise into different wavelet coefficients. These methods place high demands on equipment and are prone
to over-segmentation and to interference from external factors.
Histogram statistics and shape analysis algorithms [25], morphological image processing and logistic
regression statistical classification [26], and free-form path calculation methods [27] combine brightness
and connectivity to detect cracks. Such detection is impractical under complex backgrounds with many
interfering factors. The median filtering algorithm [28] enhances grayscale pavement images using four
structural element reconstructions and combines the morphological gradient operator and morphological
closure operator to extract crack edges. However, this method can only identify crack pixels with
noticeable contrast changes in the crack image, and its crack extraction accuracy is poor for cracks with
inconspicuous features.
Shah [29] and Wang et al. [30] studied crack segmentation based on edge detection. Still, the natural
properties of road distress were not considered, and the algorithms' applicability was less than ideal.
Segmentation by edge detection generally relies on local grayscale and gradient information to identify
crack edges, so it is only applicable to cracks with complete edge information, and background regions
with strong edge information are easily misjudged as crack points. When there is more noise, the effect of
edge detection is poor.

III. METHOD
A. OVERALL NETWORK STRUCTURE
The U-Net neural network is divided into three parts: encoder, decoder, and prediction module. The encoder
reduces the image size and extracts the initial image features by convolution and max pooling. The decoder
obtains the deep features of the image by convolution (a ReLU function follows each convolution). Finally,
pixel classification is done by a 1×1 convolution.
The established network structure is shown in Figure 1. It mainly consists of a feature extraction network,
a residual module, and a CBAM module. The BasicBlock module replaces the convolutional layers of the U-Net
network and can effectively mitigate network degradation and vanishing gradients as the number of network
layers increases. The network introduces CBAM and then sums the input and output of CBAM; the resulting
module is called Res-CBAM. Res-CBAM makes the network pay more attention to crack information along the
channel and spatial dimensions and assign more weight to the corresponding network coefficients.
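As a rough illustration of the Res-CBAM idea (summing the input and output of an attention module), the following NumPy sketch wraps an arbitrary attention function with a shortcut connection. The names `res_attention` and `toy_channel_gate` are hypothetical and are not taken from the authors' repository; the toy gate is only a stand-in for CBAM, not an implementation of it.

```python
import numpy as np


def toy_channel_gate(features):
    """Hypothetical stand-in for CBAM: scale each channel by a sigmoid gate."""
    # Global average pooling over the spatial axes gives one value per channel.
    pooled = features.mean(axis=(1, 2), keepdims=True)   # shape (C, 1, 1)
    gate = 1.0 / (1.0 + np.exp(-pooled))                 # values in (0, 1)
    return gate * features


def res_attention(features, attention_fn):
    """Shortcut connection around an attention module: input + attention(input)."""
    refined = attention_fn(features)
    if refined.shape != features.shape:
        raise ValueError("attention output must keep the input shape")
    return features + refined


x = np.ones((4, 8, 8))                    # (channels, height, width), illustrative
y = res_attention(x, toy_channel_gate)
print(y.shape)                            # (4, 8, 8)
```

Because the input is added back unchanged, the features still pass through even when the attention branch contributes little, which is the extra transmission path the text refers to.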
B. CONVOLUTIONAL BLOCK ATTENTION MODULE (CBAM)
CBAM is a lightweight module that contains spatial attention and channel attention. The module derives
attention weights sequentially along two independent dimensions, channel and space, and then multiplies
the output attention map with the input feature map for adaptive feature refinement. Since CBAM is a
lightweight, general-purpose module, it can be seamlessly integrated into any CNN architecture and trained
end-to-end with the underlying CNN. Compared to attention modules focusing on only one dimension, CBAM
attends to both and extracts more information about the target.
As shown in Figure 2, given an input feature map $F \in \mathbb{R}^{C \times H \times W}$, the CBAM module
computes the one-dimensional channel attention feature map $M_c \in \mathbb{R}^{C \times 1 \times 1}$ and the
two-dimensional spatial attention feature map $M_s \in \mathbb{R}^{1 \times H \times W}$ in turn, and finally
outputs features weighted along both channel and space. The overall attention is calculated as follows:
$$F' = M_c(F) \times F \tag{1}$$
$$F'' = M_s(F') \times F' \tag{2}$$
where $F'$ denotes the input features after the channel attention operation and $F''$ is the final refined
output.
[Figure 2: input feature (C × H × W), channel attention Mc (C × 1 × 1), spatial attention Ms (1 × H × W), output feature (C × H × W)]

C. CHANNEL ATTENTION MODULE (CAM)
The structure of the channel attention module is shown in Figure 3. Two 1 × 1 × C feature maps are obtained
by feeding the input features into global max pooling and global average pooling, respectively. Each is
then passed through a shared two-layer fully connected neural network: the first layer has C/r neurons (r
is the compression rate) with ReLU as the activation function, and the second layer has C neurons. The two
outputs of the fully connected network are then summed and passed through the sigmoid activation function
to generate the channel attention features (Mc). The channel attention is calculated as follows:
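$$M_c(F) = \sigma\left(W_1(W_0(F^c_{avg})) + W_1(W_0(F^c_{max}))\right) \tag{3}$$
where $\sigma$ denotes the sigmoid function, $W_0 \in \mathbb{R}^{C/r \times C}$, and $W_1 \in \mathbb{R}^{C \times C/r}$.
[Figure 3 (channel attention module): input feature (C × H × W), MaxPool, AvgPool, MLP, Mc (C × 1 × 1)]
[Spatial attention figure: F′ (C × H × W), MaxPool/AvgPool, convolution layer, Ms (1 × H × W)]
FIGURE 5. Partial structure of the encoder. [Input feature (C × H × W), channel attention, spatial attention, 2 × 2 MaxPool, output (C × H/2 × W/2)]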
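Equation (3) can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions rather than the authors' implementation: the function name `channel_attention`, the tensor sizes, and the random weights are all hypothetical, biases are omitted, and a single (C, H, W) feature map is used instead of a batch.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(feat, w0, w1):
    """Channel attention map Mc for one feature map, following Eq. (3).

    feat: (C, H, W) input features F
    w0:   (C // r, C) weights of the first fully connected layer
    w1:   (C, C // r) weights of the second fully connected layer
    """
    f_avg = feat.mean(axis=(1, 2))        # global average pooling -> (C,)
    f_max = feat.max(axis=(1, 2))         # global max pooling     -> (C,)

    def mlp(v):
        # Shared two-layer network with ReLU after the first layer.
        return w1 @ np.maximum(w0 @ v, 0.0)

    mc = sigmoid(mlp(f_avg) + mlp(f_max))  # (C,)
    return mc.reshape(-1, 1, 1)            # (C, 1, 1) so it broadcasts over H, W


rng = np.random.default_rng(0)
channels, height, width, r = 8, 5, 5, 2    # illustrative sizes only
feat = rng.standard_normal((channels, height, width))
w0 = 0.1 * rng.standard_normal((channels // r, channels))
w1 = 0.1 * rng.standard_normal((channels, channels // r))

mc = channel_attention(feat, w0, w1)
feat_refined = mc * feat                   # attention map multiplied with the input
```

Each entry of `mc` lies strictly between 0 and 1, so the multiplication rescales whole channels without changing the spatial layout of the feature map.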
The residual block is divided into two parts: the direct mapping part and the residual part. $h(x_l)$ is
the direct mapping, shown as the curve on the right in Figure 7; $f(x_l, w_l)$ is the residual part, which
consists of two convolution operations and is the part containing the convolutions on the left in Figure 7.
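The two-part structure described above can be sketched as follows. This is an illustrative NumPy version under several assumptions that this excerpt does not confirm: the direct mapping is the identity, $h(x_l) = x_l$; both convolutions are 3 × 3 with stride 1 and zero padding; batch normalization is omitted; and the ReLU placement follows the common ResNet BasicBlock layout. The helper names `conv3x3` and `basic_block` are hypothetical.

```python
import numpy as np


def conv3x3(x, w):
    """3x3 convolution (cross-correlation), stride 1, zero padding 1.

    x: (C_in, H, W) input, w: (C_out, C_in, 3, 3) kernel.
    """
    c_out = w.shape[0]
    _, h, wd = x.shape
    xp = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros((c_out, h, wd))
    for i in range(3):
        for j in range(3):
            # Contribution of kernel tap (i, j), summed over input channels.
            out += np.einsum("oc,chw->ohw", w[:, :, i, j], xp[:, i:i + h, j:j + wd])
    return out


def basic_block(x, w1, w2):
    """Residual block: direct mapping h(x) = x plus residual f(x, w)."""
    residual = conv3x3(np.maximum(conv3x3(x, w1), 0.0), w2)   # f(x_l, w_l)
    return np.maximum(x + residual, 0.0)                       # h(x_l) + f(x_l, w_l), then ReLU


rng = np.random.default_rng(0)
x = rng.standard_normal((4, 6, 6))                 # (channels, height, width)
w1 = 0.1 * rng.standard_normal((4, 4, 3, 3))
w2 = 0.1 * rng.standard_normal((4, 4, 3, 3))
out = basic_block(x, w1, w2)
```

If the residual weights are all zero, the block reduces to `ReLU(x)`, which is why stacking such blocks is less prone to degradation than stacking plain convolutional layers.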
The shortcut connections between the input and output
2148 and 708 images were obtained from the DeepCrack and Crack Forest datasets, respectively. Then, we made
a dataset with 548 images from the road images acquired by a mobile LiDAR mapping system. The labeled images
in these three datasets were manually labeled. To validate the established
FIGURE 8. Statistical results of training loss values with different learning rates. [Panel (b): loss values using the SGD optimizer; x-axis: learning rate]
FIGURE 9. Experimental visualization results on the three datasets (where "Res-CBAM" means only Res-CBAM is introduced, "BasicBlock" means only BasicBlock is
introduced, and "Res-CBAM+BasicBlock" means both structures are introduced).
TABLE 1. Results on different datasets (where "+" means the structure is introduced and "-" means the structure is not introduced).
in each of the three datasets. Figure 9 shows the visualization results of the experiments. Rows 1-2 show
the detection results on the DeepCrack dataset: the crack extraction of the original neural network is
incomplete and its accuracy is poor, whereas after the introduction of Res-CBAM and BasicBlock the network
model focuses more on the crack region and the extracted cracks are more complete. Rows 3-4 show the
results on the Crack Forest dataset, where the extracted cracks are more realistic. Rows 5-6 show the
results on RID, where the fine cracks are extracted more completely.

E. RESULTS OF ABLATION EXPERIMENTS
Results on DeepCrack. We explored the contribution of each component on DeepCrack's test set. As shown in
Table 1, introducing Res-CBAM improved DICE from 65.39% to 68.72% and the F1-score from 67.26% to 75.64%.
We then integrated BasicBlock into the original network and found that DICE and the F1-score improved
further to 83.91% and 83.67%. When Res-CBAM and BasicBlock were added to the neural network together, DICE
and the F1-score reached 84.09% and 85.82%, respectively. Improving the structure of the encoder and
decoder thus yields higher extraction accuracy than U-Net.
Results on the Crack Forest dataset. DICE and the F1-score improve to 67.2% and 68.85%, respectively, after
the introduction of Res-CBAM and BasicBlock in U-Net. The precision of the neural network is better after
introducing Res-CBAM alone, and the recall is better after introducing BasicBlock alone, but their
F1-scores are not as good as that of the network with both introduced. The experimental results on the
Crack Forest dataset show that the simultaneous introduction of Res-CBAM and BasicBlock can effectively
improve the crack detection ability of U-Net.
Results on RID. The network achieves the best performance when both attention and residual structure are
introduced: DICE and the F1-score reach 50.39% and 55.47%, respectively. However, this performance is lower
than on the other datasets because the road image dataset (RID) has uneven illumination and skewed shooting
angles. In addition, the ground-truth labels of this dataset are only one or a few pixels wide, which is
one of the reasons for the low detection results.

V. DISCUSSION
A. EFFECTIVENESS OF SHORTCUT CONNECTIONS
We further verified through ablation experiments whether adding shortcut connections in CBAM positively
affects the extraction of cracks. The experimental results are shown in Table 2. Adding shortcut
connections improved the crack extraction accuracy of the network because the shortcut connections
increased the paths for feature information propagation. The neural network learned more global and local
crack information, which supports the feasibility of our method.

TABLE 2. Test results for CBAM on the RID dataset with or without residual connections.
Methods     DICE (%)   Precision (%)   Recall (%)   F1-score (%)
CBAM        48.78      56.53           51.34        53.81
Res-CBAM    50.39      58.97           52.36        55.47

Since Res-CBAM plays an essential role in the network structure, the position of Res-CBAM may affect the
neural network performance. We compare two placements of Res-CBAM in the decoder, as shown in Figure 10 (a)
and (b), and discuss the effects of introducing Res-CBAM in convolution and in deconvolution on the neural
network. In the same experimental environment, the neural networks with
from the encoder, which makes the input information richer. When Res-CBAM is introduced at the position
shown in Fig. 10(b), the DICE and F1-scores are lower because some feature information is lost after the
input features are subjected to two convolution operations, resulting in a degradation of the network
detection performance.
[Figure 10 panels: (a) Introducing Res-CBAM in convolution; (b) Introducing Res-CBAM in up-convolution]
[Training loss plots: x-axis Epoch (0 to 50), y-axis Train loss (0.0 to 0.5); legend: U-Net, Res-CBAM, Res-CBAM+Bas]
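The F1-scores reported in Table 2 can be cross-checked against the reported precision and recall. The sketch below assumes the standard definition of the F1-score as the harmonic mean of precision and recall, F1 = 2PR / (P + R); the paper's own metric definitions are not part of this excerpt, and the function name `f1_score` is only illustrative.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (both in the same units, here %)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision %, recall %, reported F1 %) copied from Table 2 (RID dataset).
table2 = {
    "CBAM": (56.53, 51.34, 53.81),
    "Res-CBAM": (58.97, 52.36, 55.47),
}

for name, (p, r, reported) in table2.items():
    print(f"{name}: computed F1 = {f1_score(p, r):.2f}, reported F1 = {reported:.2f}")
```

Under this assumption, the computed values agree with the reported F1-scores to two decimal places for both rows.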
was slightly worse than the original network, and network degradation occurred. So we connected the input
and output features of CBAM and replaced the convolutional layers of the original network with BasicBlock.
The improved neural network converged faster and with higher accuracy.

C. COMPARISON WITH TRADITIONAL DEEP LEARNING ALGORITHMS
The comparison results with other commonly used methods are shown in Table 4. Our method has better
accuracy than SegNet [34], RCF [35], and DeepCrack [32]. On the DeepCrack dataset, the F1-score is 10.2%
better than that of SegNet, and the precision and recall are 15.7% and 4.5% better, respectively. On the
Crack Forest dataset, the F1-score is improved by 18.1% compared to DeepCrack, and the precision and recall
are improved by 16.5% and 19.7%, respectively. On the RID dataset, our network outperforms the other
networks, with a 10.7% improvement in F1-score compared to RCF and improvements of 18.3% and 2.5% in
precision and recall, respectively. The experimental results show that integrating CBAM and residual
structure in the U-Net network can improve its crack detection performance and increase detection accuracy.

D. COMPARISON WITH TRANSFORMER ALGORITHM
To further demonstrate the advantages of the method proposed in this study, we also compare it with the
recently published Vision Transformer (ViT) [36], Swin-UNet [37], and TransUNet [38] algorithms, over which
our method also has some advantages. The comparison results are shown in Table 4. For the DeepCrack
dataset, our method's overall accuracy is 87.2%, and the precision and recall are 88.9% and 85.7%,
respectively. For the Crack Forest dataset, the precision of our method is 0.6% lower than that of
TransUNet, but our overall accuracy is 0.2% higher. For the RID dataset, our method also outperforms the
other algorithms, with an overall precision of 55.4%. Compared with the Transformer-based methods, our
method integrates the channel and spatial location information of cracks in the feature extraction stage,
and the attention weights are biased toward cracks. Transformers focus more on global information and
ignore local information. Because crack pixels make up only a small proportion of the image, ignoring
local information leads to lower detection accuracy.

VI. CONCLUSION
We introduced Res-CBAM and BasicBlock into the U-Net network to establish a neural network model for crack
detection. The experimental results show that the introduction of CBAM enhances the attention of the neural
network to the crack region, improves its ability to extract fine cracks, and suppresses the interference
of background factors. Meanwhile, the shortcut connections of Res-CBAM and the replacement of the
convolutional layers in the network structure by BasicBlock ensure that crucial information is transmitted
as efficiently as possible and effectively suppress network degradation. The constructed neural network
learns more features about cracks and improves the ability of the model to detect fine cracks. Compared
with several other neural network methods, the neural network built in this study has a significantly
enhanced ability to extract cracks. The accuracy and robustness of the neural network were verified through
extensive experiments on different datasets.

REFERENCES
[1] R, Stefania C., and I Brilakis, “Synthetic structure of industrial plastics,” Journal of Computing in Civil Engineering, vol. 31, no. 2, Mar. 2017.
[2] J Jong-Hyun, H Jo, and G Ditzler, “Convolutional neural networks for pavement roughness assessment using calibration-free vehicle dynamics,” Computer-Aided Civil and Infrastructure Engineering, vol. 35, no. 11, pp. 1209-1229, Mar. 2020, doi: 10.1111/mice.12546.
[3] H Y Ju, W Li, S Tighe, Z C Xu, J Z Zhai, “CrackU-net: a novel deep convolutional neural network for pixelwise pavement crack detection,”
Structural Control and Health Monitoring, vol. 27, no. 8, Mar. 2020, doi: 10.1002/stc.2551.
[4] E. H. Miller, “Crack detection and segmentation using deep learning with 3D reality mesh model for quantitative assessment and integrated visualization,” Journal of Computing in Civil Engineering, vol. 34, no. 3, May 2020, doi: 10.1061/(ASCE)CP.1943-5487.0000890.
[5] L Zhang, F Yang, Y M Zhang, Daniel, “Road crack detection using deep convolutional neural network,” in IEEE International Conference on Image Processing, Phoenix, AZ, USA, 2016, doi: 10.1109/ICIP.2016.7533052.
[6] Y Shi, L M Cui, Z Q Qi, F Meng and Z S Chen, “Automatic road crack detection using random structured forests,” IEEE Transactions on Intelligent Transportation Systems, vol. 17, no. 12, pp. 3434-3445, May 2016, doi: 10.1109/TITS.2016.2552248.
[7] G YAO, F J WEI, J Y QIAN and Z G WU, “Crack Detection Of Concrete Surface Based On Newline Convolutional Neural Networks,” in 2018 International Conference on Machine Learning and Cybernetics, Chengdu, China, 2018, doi: 10.1109/ICMLC.2018.8527035.
[8] Z Liu, Y Cao, Y Wang and W Wang, “Computer vision-based concrete crack detection using U-net fully convolutional networks,” Automation in Construction, vol. 104, pp. 129-139, Aug. 2019, doi: 10.1016/j.autcon.2019.04.005.
[9] S Dorafshan, R J Thomas and M Maguire, “Comparison of deep convolutional neural networks and edge detectors for image-based crack detection in concrete,” Construction and Building Materials, vol. 186, pp. 1031-1045, Oct. 2018, doi: 10.1016/j.conbuildmat.2018.08.011.
[10] H F Li, J P Zong, J J Nie, Z L Wu and H Y Han, “Pavement crack detection algorithm based on densely connected and deeply supervised network,” IEEE Access, vol. 9, pp. 11835-11842, Jan. 2021, doi: 10.1109/ACCESS.2021.3050401.
[11] O Ronneberger, P Fischer and T Brox, “U-net: Convolutional networks for biomedical image segmentation,” in International Conference on Medical Image Computing and Computer-Assisted Intervention, vol. 9351, Springer, Cham, pp. 234-241, Nov. 2015.
[12] Z W Zhou, M M R Siddiquee, N Tajbakhsh and J M Liang, “Unet++: A nested u-net architecture for medical image segmentation,” in Deep Learning in Medical Image Analysis and Multimodal Learning for Clinical Decision Support, vol. 11045, Springer, Cham, pp. 3-11, Sept. 2018.
[13] J Cheng, W Xiong, W Chen, Y Gu and Y S Li, “Pixel-level crack detection using U-net,” in TENCON 2018-2018 IEEE Region 10 Conference, pp. 0462-0466, Oct. 2018.
[14] Z Fan, C Li, Y Chen, J H Wei, G Loprencipe, X P Chen and P D Mascio, “Automatic crack detection on road pavements using encoder-decoder architecture,” Materials, vol. 13, no. 13, May 2020, doi: 10.3390/ma13132960.
[15] Z W Zhou, M M R Siddiquee, N Tajbakhsh and J M Liang, “The importance of skip connections in biomedical image segmentation,” in Deep Learning and Data Labeling for Medical Applications, vol. 10008, pp. 179-187, Springer, Cham, Sept. 2016.
[16] G Xu, C Liao and J Chen, “Extraction of apparent crack information of concrete based on HU-ResNet,” Computer Engineering, vol. 46, no. 11, pp. 279-285, 2020.
[17] L F Li, N Wang, B Wu and X Zhang, “Segmentation algorithm of bridge crack image based on modified pspnet,” Advances in Lasers and Optoelectronics, vol. 58, no. 22, pp. 101-109, 2021, doi: 10.3788/LOP202158.2210001.
[18] S H Hanzaei, A Afshar and F Barazandeh, “Automatic detection and classification of the ceramic tiles’ surface defects,” Pattern Recognition, vol. 66, pp. 174-189, Jun. 2017, doi: 10.1016/j.patcog.2016.11.021.
[19] K R Kirschke and S A Velinsky, “Histogram-based approach for automated pavement-crack sensing,” Journal of Transportation Engineering, vol. 118, no. 5, pp. 700-710, Sept. 1992, doi: 10.1061/(ASCE)0733-947X(1992)118:5(700).
[20] W Huang and N Zhang, “A novel road crack detection and identification method using digital image processing techniques,” in 2012 7th International Conference on Computing and Convergence Technology (ICCCT), pp. 397-400, Seoul, Korea, Dec. 2012.
[21] Z W Zhou, M M R Siddiquee, N Tajbakhsh and J M Liang, “Research on crack detection method of airport runway based on twice-threshold segmentation,” in 2015 Fifth International Conference on Instrumentation and Measurement, Computer, Communication and Control (IMCCC), pp. 1716-1720, Qinhuangdao, China, Sept. 2015, doi: 10.1109/IMCCC.2015.364.
[22] Y H Zhang, J Qin, Z L Guo, K C Jiang and S Y Cai, “Detection of road surface crack based on PYNQ,” in 2020 IEEE International Conference on Mechatronics and Automation (ICMA), vol. 13, no. 16, pp. 1150-1154, Beijing, China, Sept. 2020.
[23] B C SUN, Y J QIU and S Q LIANG, “Research on wavelet-based pavement crack identification,” Journal of Chongqing Jiaotong University (Natural Science Edition), vol. 29, no. 01, pp. 69-72, 2010.
[24] P. Subirats, J. Dumoulin, V. Legeay, and D. Barba, “Automation of pavement surface crack detection using the continuous wavelet transform,” in Proc. Int. Conf. Image Process. (ICIP), pp. 3037-3040, Oct. 2006.
[25] Z G XU, X M ZHAO and H S SONG, “Crack identification algorithm for asphalt pavement based on histogram estimation and shape analysis,” Journal of Instrumentation, vol. 31, no. 10, pp. 2260-2266, Oct. 2010.
[26] A Landstrom and M J Thurley, “Morphology-based crack detection for steel slabs,” IEEE Journal of Selected Topics in Signal Processing, vol. 6, no. 7, pp. 866-875, Aug. 2012, doi: 10.1109/JSTSP.2012.2212416.
[27] T S Nguyen, S Begot, F Duculty and M Avila, “Free-form anisotropy: A new method for crack detection on pavement surface images,” in 2011 18th IEEE International Conference on Image Processing, IEEE, pp. 1069-1072, Sept. 2011, doi: 10.1109/ICIP.2011.6115610.
[28] Y Maode, B Shaobo, X Kun and Y Y He, “Pavement crack detection and analysis for high-grade highway,” in 2007 8th International Conference on Electronic Measurement and Instruments, IEEE, Aug. 2007, doi: 10.1109/ICEMI.2007.4351202.
[29] S Shah, “Automatic cell segmentation using a shape-classification model in immunohistochemically stained cytological images,” IEICE Transactions on Information and Systems, vol. E91-D, no. 7, pp. 1955-1962, Jul. 2008, doi: 10.1093/ietisy/e91-d.7.1955.
[30] H Wang, N Zhu and Q Wang, “Segmentation of pavement cracks using differential box-counting approach,” Journal of Harbin Institute of Technology, vol. 39, no. 1, pp. 142-144, 2007.
[31] K He, X Zhang, S Ren, J Sun and M Research, “Deep residual learning for image recognition,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, IEEE, pp. 770-778, 2016.
[32] Y Liu, J Yao, X Lu, R P Xie and L Li, “DeepCrack: A deep hierarchical feature learning architecture for crack segmentation,” Neurocomputing, vol. 338, pp. 139-153, Apr. 2019.
[33] Shi Y, Cui L, Qi Z, M Fan and Z S Chen, “Automatic road crack detection using random structured forests,” IEEE Transactions on Intelligent Transportation Systems, vol. 17, no. 12, pp. 3434-3445, May 2016, doi: 10.1109/TITS.2016.2552248.
[34] V Badrinarayanan, A Kendall and R Cipolla, “Segnet: A deep convolutional encoder-decoder architecture for image segmentation,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 39, no. 12, pp. 2481-2495, Dec. 2017, doi: 10.1109/TPAMI.2016.2644615.
[35] Y Liu, M M Cheng, X Hu, J W Bian, L Zhang, X Bai and J H Tang, “Richer convolutional features for edge detection,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, IEEE, pp. 3000-3009, Oct. 2018, doi: 10.1109/TPAMI.2018.2878849.
[36] A Dosovitskiy, L Beyer, A Kolesnikov, D Weissenborn, X H Zhai, T Unterthiner, M Dehghani, M Minderer, G Heigold, S Gelly, J Uszkoreit and N Houlsby, “An image is worth 16x16 words: Transformers for image recognition at scale,” arXiv preprint, Jun. 2021, doi: 10.48550/arXiv.2010.11929.
[37] H Cao, Y Y Wang, J Chen, D S Jiang, X P Zhang, Q Tian and M N Wang, “Swin-Unet: Unet-like Pure Transformer for Medical Image Segmentation,” arXiv preprint, May 2021, doi: 10.48550/arXiv.2105.05537.
[38] V Badrinarayanan, A Kendall and R Cipolla, “TransUNet: Transformers Make Strong Encoders for Medical Image Segmentation,” arXiv preprint, Feb. 2021, doi: 10.48550/arXiv.2102.04306.

PENG JING was born in Datong, Shanxi Province, China in 1994. He is currently studying for a master’s degree
in the School of Surveying and Land Information Engineering of Henan Polytechnic University, Jiaozuo. His
current research interests mainly include deep learning object detection and semantic segmentation.