Pavement Crack Detection Algorithm Based on Densely Connected and Deeply Supervised Network
ABSTRACT To improve the accuracy and robustness of existing automated crack detection methods, this paper proposes a fully convolutional neural network for pixel-level crack detection based on a densely connected and deeply supervised network. First, densely connected layers are applied to enhance the propagation and reuse of crack features. Then, deep supervision modules are designed so that the network extracts more significant features at multiple scales. Finally, the feature maps from different scales are fused so that the different levels complement each other. In addition, a class-balanced cross-entropy loss function is designed to balance background and crack pixels by increasing the weight of the crack pixel loss. The proposed method is tested on three public datasets, and the experiments show that it outperforms state-of-the-art methods in accuracy, speed and robustness.
INDEX TERMS Crack detection, deep learning, densely connected network, deeply supervised network.
The associate editor coordinating the review of this manuscript and approving it for publication was Tomasz Trzcinski.
This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
VOLUME 9, 2021 11835
H. Li et al.: Pavement Crack Detection Algorithm Based on Densely Connected and Deeply Supervised Network

I. INTRODUCTION
In recent years, highway and airport construction has been booming all over the world, especially in developing countries. To keep infrastructure in good condition, prompt and efficient maintenance of pavement surfaces has become an important issue in the transportation industry. Cracks are the earliest form of most pavement distresses, and prompt, accurate crack detection can minimize maintenance costs and improve efficiency. However, manual inspection suffers from poor accuracy, high subjectivity and low efficiency, and cannot satisfy the needs of rapid highway construction. Efficient, automated crack detection has therefore become a research hotspot.

Numerous efforts have applied traditional digital image processing techniques to crack detection, such as threshold segmentation, feature extraction, edge detection, filtering and minimal path methods. Oliveira and Correia [1] extract crack features by combining connected components with automatic threshold segmentation. Li et al. [2] use an improved Otsu threshold and an adaptive iterative threshold to detect cracks on airport runway surfaces. Wei [3] adopts gray-level differences and the Hough transform to detect small cracks automatically. Kapela et al. [4] utilize the Hough transform feature (HTF) and local binary patterns (LBP) to extract the edge direction and texture features of cracks, respectively. Qu et al. [5] employ a structured forest edge detector to extract crack edges and a percolation model for denoising. Amhaz et al. [6] propose an automatic detection algorithm for two-dimensional pavement cracks based on minimal path selection. Crack detection algorithms based on traditional digital image processing transform or map the original image into a specific space and obtain the final detection result by learning the structure of shallow crack features. However, real pavement conditions are complex and environmental effects are uncertain: textures are diverse, noise interference is strong, crack directions are irregular, and so on. These algorithms are therefore easily disturbed by environmental factors and cannot meet accuracy and speed requirements at the same time. Efficient and robust crack detection algorithms thus still need to be studied.

Since cracks and edges have similar characteristics in shape, structure and thickness, it is practicable to apply edge detection methods to detect cracks. Based on structured
forest [7], Shi et al. [8] propose the CrackForest algorithm, which detects pavement cracks by combining complementary crack features; its results are more accurate than those of Free-Form Anisotropy (FFA) [9] and Minimal Path Selection (MPS) [6]. However, the algorithm still relies on hand-crafted crack features, which adapt poorly and lack robustness against complex backgrounds. Richer Convolutional Features (RCF) [10], one of the most advanced edge detection algorithms, produces high-quality edges efficiently by combining multiscale and multilevel information of objects. However, the backbone of RCF consists only of stacked convolution layers, and each high-level convolution layer uses only the feature map passed on from the previous layer. As a result, the high-level layers neglect many crack features even though the final fusion combines the results of all scales. RCF is therefore not fully applicable to crack detection.

Deep learning has been widely used in computer vision, and several studies have applied it to the detection and recognition of pavement surface cracks. Eisenbach et al. [11] propose a road distress dataset for training deep learning networks and, for the first time, evaluate the state of pavement distress detection technology. Zhang et al. [12] apply a convolutional neural network to classify crack and non-crack image patches, demonstrating the advantage of deep learning in crack detection. Li et al. [13] propose Deep Bridge Crack Classify (DBCC), a classification model based on a convolutional neural network, and use an optimized sliding-window algorithm to detect bridge cracks. These methods treat crack detection as image-block classification with a deep neural network. Moreover, they neglect the spatial relationship between crack pixels, so global crack features are lost. Inspired by Fully Convolutional Networks (FCN) [14], some studies have applied semantic segmentation to crack detection. Schmugge et al. [15] propose a remote video crack detection method based on a semantic segmentation network. Wei [16] applies semantic segmentation to automatically learn the linear, directional and edge features of cracks for pixel classification. Li et al. [17] develop a lightweight semantic segmentation model based on crack characteristics and obtain the average crack width using an axis-skeleton algorithm. However, since the features generated by deep layers are abstract semantic features, general CNN-based semantic segmentation methods may miss crack details, leading to inaccurate detection results. In addition, as networks grow deeper and the number of layers increases, crack features become harder to extract and the gradients tend to vanish.

In 2017, Huang et al. [20] proposed DenseNet, a classification network that strengthens feature propagation and alleviates the vanishing-gradient problem. In DenseNet, each layer has direct access to the gradients from the loss function and to the original input signal, which leads to implicit deep supervision. By densely connecting the feature maps, DenseNet provides an efficient way to extract features. However, although DenseNet-based algorithms achieve superior feature extraction performance, the semantic feature distribution of cracks and the imbalance between foreground and background pixels in crack detection make it necessary to supervise and fuse features from different scales when adopting DenseNet, which motivates the work in this paper. Since the deeply-supervised nets (DSN) method minimizes the classification error while making the learning process of the hidden layers direct and transparent, it offers a way to supervise DenseNet feature extraction in crack detection.

To overcome the difficulties that the very thin shape and the semantic feature distribution of cracks pose for detection, we propose a fully convolutional neural network for pixel-level crack detection based on a densely connected and deeply supervised network. The main contributions are as follows.
1) A dense connection module is designed to extract feature maps from the image at various scales. Densely connected convolutions extract crack features more thoroughly.
2) A deep supervision module constrains multiple hidden layers and extracts multiscale detail features of cracks.
3) A fusion module fuses the multiscale crack information generated by all the deep supervision modules to obtain the final crack detection results.
4) To deal with the imbalance between crack and non-crack pixels, a class-balanced cross-entropy loss function is designed that obtains more stable training results by dynamically adjusting the weight of the crack pixel loss.
The proposed method is tested on three public datasets: AEL [16], Crack500 [18] and Cracktree200 [19]. The experimental results validate our method.

II. OVERVIEW OF METHODS
The main structure of the proposed network is shown in Fig. 1. The network is composed of convolution modules, dense connection modules, conversion modules, deep supervision modules, deconvolution layers and a fusion module. The input of the network is a road surface image, and the output is a crack prediction map of the same size as the input, in which crack pixels have a higher probability than non-crack pixels.
Given an input image, the convolution modules and dense connection modules first extract multiscale feature maps. The dense connection modules are connected by conversion modules, which mainly compress the dense features from the previous module to alleviate feature redundancy. A deep supervision module follows each convolution module and each dense connection module: each of these modules passes a feature map to its deep supervision module, and each deep supervision module generates a prediction map with its own loss function. During training, the losses on the feature maps generated by the deep supervision
modules or deconvolution layers are calculated. Since the feature maps extracted by the deep supervision modules differ in size, deconvolution is used after each deep supervision module to restore its feature map to the original image size. Finally, the deconvolved feature maps are fed into the fusion module to obtain the final crack prediction map.

A. DENSE CONNECTION MODULE
Inspired by the DenseNet structure, the dense connection module is designed to extract crack features and ensure effective propagation of the gradient. Fig. 2 shows the dense connection mechanism of the module. In the dense connection module, each layer takes the concatenation of the feature maps produced by all previous layers as its input, which means the feature map produced by a layer is one of the inputs of all following layers. Denote by $D_{n-1}$ and $D_n$ the input and output of the $n$-th dense connection module, respectively. The output of the $l$-th layer in the module is defined as

$$D_{n,l} = H_l\left(\left[D_{n-1}, D_{n,1}, \ldots, D_{n,l-1}\right]\right), \tag{1}$$

where $D_{n,l}$ denotes the output of the $l$-th layer in dense connection module $n$, and $[D_{n-1}, D_{n,1}, \ldots, D_{n,l-1}]$ refers to the concatenation of the module input and the feature maps from layers $1, \ldots, l-1$. The nonlinear transformation $H_l(\cdot)$ is a composite function of a 3 × 3 convolution followed by the rectified linear unit (ReLU) activation. By establishing dense connections between features in different layers, the modules extract crack features more thoroughly and alleviate the vanishing-gradient problem. Besides, dense connectivity can reduce the number of network parameters and the computational cost.

FIGURE 2. The connection mechanism of dense block.

B. CONVERSION MODULE
Because the dense features extracted by a dense connection module should be compressed to reduce redundancy, a conversion module connects each pair of adjacent dense connection modules. A conversion module consists of a 1 × 1 convolution layer and a 2 × 2 max pooling layer: the 1 × 1 convolution fuses the features of different levels from the dense connection module and preserves the more useful information, and the max pooling layer reduces the amount of computation.

C. DEEP SUPERVISION MODULE
The dense connection module strengthens the extraction of crack features, but the network as a whole is still a single-stream supervised structure. As the network is deepened, the backpropagated gradient gradually shrinks and the model learns more slowly during training. In addition, as the number of feature layers increases, supervising only the output layer of the network cannot effectively train the extraction of low- and mid-level features, which leads to poor performance. Inspired by Deeply-Supervised Nets (DSN) [21], the deep supervision module is designed to speed up model convergence and improve the feature extraction capability of both the low-level and the high-level layers. A deep supervision module is connected to each dense connection module. Besides, the deep supervision modules extract feature maps from different levels, which solves the problem of losing crack details when only high-level semantic features are used for crack segmentation.
The deep supervision modules of the proposed network are designed as follows: the dense connection module is considered as a unit, and the feature maps from each convolutional
TABLE 1. The parameters of model backbone network.

III. EXPERIMENTS AND RESULTS
The proposed method is implemented and trained with the PyTorch framework. It is tested on a computer with 64 GB RAM, an 11 GB GeForce GTX 1080 Ti GPU and an i7-8700 CPU @ 3.2 GHz.

A. DATASET
We evaluate the proposed method on three public datasets: AEL, Crack500 and Cracktree200. Their details are shown in Table 2.
AEL is composed of three datasets, Aigle-RN, ESAR and LCMS, with 58 crack images in total. Crack500 is a pavement crack dataset of 3368 images captured with a cell phone on the main roads of Temple University, with a size of 1440 × 2560 or 2560 × 1440 pixels. Cracktree200 is a visible-light dataset containing various kinds of cracks in complex environments with interference such as shadow, occlusion, low contrast and noise; it contains 206 crack images of size 800 × 600 pixels.
Crack pixels have been manually labeled in all three datasets. We train the proposed method on the Crack500 training data, and the test data consists of the Crack500 test set, AEL and Cracktree200. Since the images in AEL and Cracktree200 come in several different sizes around 800 × 800 pixels, to keep the image size the same for training and validation with as little information loss as possible, we first crop the Crack500 images to 800 × 800 pixels and then resize the AEL and Cracktree200 images to the same size.

B. NETWORK TRAINING PARAMETERS SETTING
The Crack500 training set contains only 1896 crack images, and such a small quantity may lead to poor training results. Therefore, data augmentation (rotation and cropping) is used to enlarge the training data, which finally contains 13272 crack images.
Stochastic gradient descent (SGD) with momentum is adopted to optimize the network parameters. The mini-batch size is set to 10, the momentum to 0.9 and the weight decay coefficient to 0.0002. Each layer is initialized from a zero-mean Gaussian with standard deviation 0.01. The learning rate is set to 1e-6 and is divided by 10 every 10000 iterations. The network is trained for a total of 50000 iterations.

C. COUNTERPARTS
We compare our algorithm with four existing methods: CrackForest [8], FCN [14], RCF [10] and FC-DenseNet [22]. CrackForest is a road crack detection framework based on random structured forests that learns the inherent structural information of cracks. FCN is a general semantic segmentation network. RCF is an accurate edge detector that uses richer convolutional features. FC-DenseNet applies Densely Connected Convolutional Networks to semantic segmentation.

D. EVALUATION CRITERIA
Given an input image, our method produces a crack prediction map, and a threshold is needed to obtain the final detection result. Because crack detection is similar to edge detection, the proposed method is evaluated with the two thresholding strategies used in edge detection: optimal dataset scale (ODS) and optimal image scale (OIS). ODS employs a fixed threshold for the whole dataset, while OIS employs the best threshold for each image. The best F-measures under ODS and OIS are defined as

$$F_{\mathrm{ODS}} = \max\left\{ \frac{1}{N}\sum_{i=1}^{N} \frac{2 P_i^t R_i^t}{P_i^t + R_i^t} : t = 0.01, 0.02, \ldots, 0.99 \right\} \tag{6}$$

$$F_{\mathrm{OIS}} = \frac{1}{N}\sum_{i=1}^{N} \max\left\{ \frac{2 P_i^t R_i^t}{P_i^t + R_i^t} : t = 0.01, 0.02, \ldots, 0.99 \right\} \tag{7}$$

where $t$ denotes the threshold, $N$ is the total number of images in the dataset, and $P_i^t$ and $R_i^t$ are the precision and recall of the $i$-th image at threshold $t$.
Since the ground-truth annotations of edge detection are binary boundary images while those of crack detection are binary segmentation images, both the detection result and the ground truth are processed by non-maximum suppression, and the foreground is thinned to single-pixel width before the calculation.

E. EXPERIMENTAL RESULT
Following the above experimental settings, we complete the comparison experiments on the three datasets Crack500, AEL and Cracktree200; the test results under the evaluation criteria are shown in Tables 3–5, and the results on Crack500, Cracktree200 and AEL with standard deviations are listed in Table 6. The visualization results of each model on the three datasets are shown in Fig. 4,

TABLE 3. Crack detection results on Crack500 test dataset.
TABLE 6. Crack detection results on AEL, Cracktree200, and Crack500.
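The optimization settings in Section III-B can be summarized in a few lines. The plain-Python sketch below assumes the update rule that PyTorch's `torch.optim.SGD` applies with momentum and weight decay (decay added to the gradient, no dampening, no Nesterov) and a step schedule evaluated per iteration; the function names are hypothetical. In PyTorch the same configuration would presumably be `torch.optim.SGD(params, lr=1e-6, momentum=0.9, weight_decay=0.0002)` together with `torch.optim.lr_scheduler.StepLR(optimizer, step_size=10000, gamma=0.1)` stepped once per iteration, but the authors' training script is not shown.

```python
def learning_rate(iteration, base_lr=1e-6, step=10000, gamma=0.1):
    """Step schedule: base_lr divided by 10 after every 10000 iterations."""
    return base_lr * gamma ** (iteration // step)

def sgd_momentum_step(params, grads, velocity, lr, momentum=0.9, weight_decay=0.0002):
    """One SGD-with-momentum update in the torch.optim.SGD convention.

    params, grads, velocity: equal-length lists of floats (flattened parameters).
    Returns the updated parameters and momentum buffers.
    """
    new_params, new_velocity = [], []
    for p, g, v in zip(params, grads, velocity):
        g = g + weight_decay * p              # L2 weight decay folded into the gradient
        v = momentum * v + g                  # momentum buffer
        new_params.append(p - lr * v)
        new_velocity.append(v)
    return new_params, new_velocity

TOTAL_ITERS = 50000                           # mini-batches of 10 images each
for it in range(0, TOTAL_ITERS, 10000):
    # prints lr = 1e-06, 1e-07, 1e-08, 1e-09, 1e-10 for the five stages
    print(f"iterations {it}-{it + 9999}: lr = {learning_rate(it):.0e}")
```

Over 50000 iterations the schedule therefore passes through five stages and ends four orders of magnitude below the initial rate.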
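Eqs. (6) and (7) differ only in where the maximum over thresholds is taken: ODS picks one threshold for the whole dataset, OIS picks the best threshold for each image, so OIS is never lower than ODS. The sketch below computes both from per-image precision and recall over the 99 thresholds. For simplicity, the hypothetical helper `precision_recall` compares thresholded pixels one to one; it omits the non-maximum suppression and thinning described above, as well as any matching tolerance an edge-detection benchmark may allow, so it illustrates the formulas rather than reproducing the paper's evaluation code.

```python
THRESHOLDS = [k / 100 for k in range(1, 100)]   # t = 0.01, 0.02, ..., 0.99

def f_measure(p, r):
    return 0.0 if p + r == 0 else 2 * p * r / (p + r)

def precision_recall(prob, gt, t):
    """Pixel-wise P and R of one flattened map at threshold t (no NMS or thinning)."""
    tp = fp = fn = 0
    for score, label in zip(prob, gt):
        pred = score >= t
        if pred and label:
            tp += 1
        elif pred:
            fp += 1
        elif label:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def ods_ois(P, R):
    """Eqs. (6) and (7). P[i][j], R[i][j]: precision and recall of image i at threshold j."""
    n, n_t = len(P), len(P[0])
    F = [[f_measure(P[i][j], R[i][j]) for j in range(n_t)] for i in range(n)]
    ods = max(sum(F[i][j] for i in range(n)) / n for j in range(n_t))  # one t for all
    ois = sum(max(row) for row in F) / n                                # best t per image
    return ods, ois

def evaluate(probs, gts, thresholds=THRESHOLDS):
    """probs, gts: per-image flattened probability maps and binary ground truths."""
    P, R = [], []
    for prob, gt in zip(probs, gts):
        pr = [precision_recall(prob, gt, t) for t in thresholds]
        P.append([p for p, _ in pr])
        R.append([r for _, r in pr])
    return ods_ois(P, R)

print(ods_ois([[1.0, 0.5], [0.5, 1.0]], [[1.0, 0.5], [0.5, 1.0]]))  # (0.75, 1.0)
```

In the toy example each image reaches F = 1 at a different threshold, so OIS = 1.0 while the best common threshold gives only ODS = 0.75. Note that Eq. (6) averages per-image F-measures; benchmark code for edge detection may instead accumulate pixel counts over all images before computing F, so numbers from the two conventions need not match exactly.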
crack detection, which benefits from the larger contribution of crack pixels.

IV. CONCLUSION
In this work, we propose a pavement crack detection algorithm based on a densely connected and deeply supervised network, which improves the accuracy and efficiency of pavement crack detection. First, the dense connection module is designed to enhance the continuous propagation of crack features, reuse features, and ensure effective propagation of the gradient. Then, feature information in multiscale space is extracted, and the convergence of the model is accelerated by deep supervision of multiple hidden layers. Finally, the crack feature maps output by the multilevel layers are fused to obtain more accurate detection results. Besides, a class-balanced cross-entropy loss function increases the weight of the crack pixel loss. Tests on several public crack datasets show that, compared with RCF, FCN, FC-DenseNet and CrackForest, the proposed method achieves higher accuracy with far fewer false positive detections, stronger robustness and faster detection speed. The method can provide technical support for the rapid and accurate detection of pavement cracks in practical engineering.

REFERENCES
[1] H. Oliveira and P. L. Correia, “CrackIT-an image processing toolbox for crack detection and characterization,” in Proc. IEEE Int. Conf. Image Process. (ICIP), Piscataway, NJ, USA, May 2014, pp. 798–802.
[2] L. Peng, W. Chao, L. Shuangmiao, and F. Baocai, “Research on crack detection method of airport runway based on twice-threshold segmentation,” in Proc. 5th Int. Conf. Instrum. Meas., Comput., Commun. Control (IMCCC), Piscataway, NJ, USA, Sep. 2015, pp. 1716–1720.
[3] W. Chuntao, “Automatic crack detection method based on adaptive threshold for small cracks and micro-gray difference,” J. China Foreign Highway, vol. 39, no. 1, pp. 58–63, 2019.
[4] R. Kapela, P. Sniatala, A. Turkot, A. Rybarczyk, A. Pozarycki, P. Rydzewski, M. Wyczalek, and A. Bloch, “Asphalt surfaced pavement cracks detection based on histograms of oriented gradients,” in Proc. 22nd Int. Conf. Mixed Design Integr. Circuits Syst. (MIXDES), Piscataway, NJ, USA, Jun. 2015, pp. 579–584.
[5] Q. Zhong and J. F. C. Siqi, “Concrete surface cracks detection combining structured forest edge detection and percolation model,” Comput. Sci., vol. 45, no. 11, pp. 288–291 and 311, 2018.
[6] R. Amhaz, S. Chambon, J. Idier, and V. Baltazart, “Automatic crack detection on 2D pavement images: An algorithm based on minimal path selection,” IEEE Trans. Intell. Transp. Syst., vol. 17, no. 10, pp. 2718–2729, Sep. 2016.
[7] P. Dollar and C. L. Zitnick, “Structured forests for fast edge detection,” in Proc. IEEE Int. Conf. Comput. Vis., Piscataway, NJ, USA, Dec. 2013, pp. 1841–1848.
[8] Y. Shi, L. Cui, Z. Qi, F. Meng, and Z. Chen, “Automatic road crack detection using random structured forests,” IEEE Trans. Intell. Transp. Syst., vol. 17, no. 12, pp. 3434–3445, Dec. 2016.
[9] T. S. Nguyen, S. Begot, F. Duculty, and M. Avila, “Free-form anisotropy: A new method for crack detection on pavement surface images,” in Proc. 18th IEEE Int. Conf. Image Process., Piscataway, NJ, USA, Sep. 2011, pp. 1069–1072.
[10] Y. Liu, M.-M. Cheng, X. Hu, K. Wang, and X. Bai, “Richer convolutional features for edge detection,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Piscataway, NJ, USA, Jul. 2017, pp. 5872–5881.
[11] M. Eisenbach, R. Stricker, D. Seichter, K. Amende, K. Debes, M. Sesselmann, D. Ebersbach, U. Stoeckert, and H.-M. Gross, “How to get pavement distress detection ready for deep learning? A systematic approach,” in Proc. Int. Joint Conf. Neural Netw. (IJCNN), Piscataway, NJ, USA, May 2017, pp. 2039–2047.
[12] L. Zhang, F. Yang, Y. Daniel Zhang, and Y. J. Zhu, “Road crack detection using deep convolutional neural network,” in Proc. IEEE Int. Conf. Image Process. (ICIP), Piscataway, NJ, USA, Sep. 2016, pp. 3708–3712.
[13] L. Liangfu, W. Ma, L. Li, and C. Lu, “Research on detection algorithm for bridge cracks based on deep learning,” Acta Autom. Sinica, vol. 45, no. 9, pp. 1727–1742, 2019.
[14] E. Shelhamer, J. Long, and T. Darrell, “Fully convolutional networks for semantic segmentation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 39, no. 4, pp. 640–651, Apr. 2017.
[15] S. J. Schmugge, L. Rice, J. Lindberg, R. Grizziy, C. Joffey, and M. C. Shin, “Crack segmentation by leveraging multiple frames of varying illumination,” in Proc. IEEE Winter Conf. Appl. Comput. Vis. (WACV), Piscataway, NJ, USA, Mar. 2017, pp. 1045–1053.
[16] W. Fang, “Research of vehicle-mounted automatic pavement crack identification technology based on semantic segmentation,” Changan Univ., Xi’an, China, Tech. Rep., 2019, doi: 10710-2016122011.
[17] L. Gang, “Study on improved global convolutional network for pavement crack detection,” Laser Optoelectron. Prog., vol. 57, no. 8, 2020, Art. no. 081011.
[18] F. Yang, L. Zhang, S. Yu, D. Prokhorov, X. Mei, and H. Ling, “Feature pyramid and hierarchical boosting network for pavement crack detection,” IEEE Trans. Intell. Transp. Syst., vol. 21, no. 4, pp. 1525–1535, Apr. 2020.
[19] Q. Zou, Y. Cao, Q. Li, Q. Mao, and S. Wang, “CrackTree: Automatic crack detection from pavement images,” Pattern Recognit. Lett., vol. 33, no. 3, pp. 227–238, Feb. 2012.
[20] G. Huang, Z. Liu, L. Van Der Maaten, and K. Q. Weinberger, “Densely connected convolutional networks,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), Piscataway, NJ, USA, Jul. 2017, pp. 2261–2269.
[21] C. Y. Lee, S. Xie, P. Gallagher, Z. Zhang, and Z. Tu, “Deeply-supervised nets,” in Proc. 18th Int. Conf. Artif. Intell. Statist., San Diego, CA, USA, 2015, pp. 562–570.
[22] S. Jegou, M. Drozdzal, D. Vazquez, A. Romero, and Y. Bengio, “The one hundred layers tiramisu: Fully convolutional densenets for semantic segmentation,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. Workshops (CVPRW), Jul. 2017, pp. 11–19.

HAIFENG LI was born in Tongliao, Inner Mongolia, China, in 1984. He received the B.S. degree in computer science and technology and the Ph.D. degree in control theory and control engineering from Nankai University, Tianjin, China, in 2007 and 2012, respectively. He is currently an Associate Professor with the College of Computer Science and Technology, Civil Aviation University of China, Tianjin. He has authored or coauthored more than 30 technical articles. His research interests include computer vision, image processing, robotic sensing, multisensor fusion, and robot localization and navigation.

JIANPING ZONG received the bachelor’s degree from Central South University, Changsha, China. He is currently pursuing the master’s degree in computer technology with the Civil Aviation University of China. His research interests include image processing and deep learning.

JINGJING NIE received the B.S. degree from the Chongqing University of Posts and Telecommunications, Chongqing, China, and the master’s degree in computer technology from the Civil Aviation University of China, in 2020. Her research interests include image processing and deep learning.

ZHILONG WU received the B.S. degree from the Chongqing University of Posts and Telecommunications, Chongqing, China, and the master’s degree in computer technology from the Civil Aviation University of China, in 2020. His research interests include image processing and deep learning.

HONGYANG HAN received the B.S. degree from the Zhengzhou University of Aeronautics, Zhengzhou, China. He is currently pursuing the master’s degree in computer technology with the Civil Aviation University of China. His research interests include image processing and deep learning.