Road_damage_detection_and_Classification
Abstract—Roads are essential to our everyday transportation. Damaged roads cost time and, sometimes, lives, so detecting and repairing damage properly saves both. Detecting road damage is not as simple as it may seem. In recent years, road damage detection has relied on costly high-performance sensors, and detecting road damage manually takes a lot of effort and time. With the advancement of technologies such as image processing and computer vision, we can now detect and categorize various types of road damage efficiently. By introducing deep learning and artificial intelligence, we can reduce human labor and increase human satisfaction.
In this research article, we present a unified model for detecting road damage. Our model is based on YOLO (“You Only Look Once”). We used two versions of this model, version 5 and version 7, to detect road damage. Our model is capable of achieving high accuracy at a reasonable computational complexity. We used two datasets to train the version 5 and version 7 models: the RDD2020 dataset and a custom dataset that we named the Bangladeshi Dataset.
Index Terms—Transfer learning, Road damage detection, Deep learning, Computer Vision, Image Recognition, Object Detection, Dataset Augmentation.

I. INTRODUCTION
Road damage detection has become a major concern globally, especially in developing countries, because clear and smooth roads save a lot of time and play a vital role in day-to-day public life. Because damaged roads increase the operational cost of vehicle repair, a smooth road also saves a lot of money. We use public transport to go from one place to another for our daily needs. For safe driving and transportation, the road structure has to be maintained properly. Rainstorms and poor maintenance are the main causes of damaged roads. Vehicles heavier than the road can bear also cause damage. Damaged roads are a major source of accidents. Countries like Bangladesh in particular face several road accidents daily, and many lives are lost as a result. As Bangladesh has a large population in a very limited area, it is no wonder that the roads and streets of the country are under immense pressure every day. To keep Bangladeshi roads safe, we need to detect damage as early as possible.
Human-based road damage detection takes a lot of time and energy. It is also difficult to carry out under varying circumstances such as different weather, vehicle speed, and complexity of the road structure. Many models, such as CNN, RetinaNet, ResNet, DenseNet, VGG, and YOLO, have been used to detect road damage, and some of them perform very impressively. Deep learning techniques are a good way to detect cracks in roads. Detecting them manually would take a lot of time, but deep learning has made the task much easier. It can be done by mounting a smartphone at the front of a car and simply driving over the damaged road. The system automatically detects the damage and classifies whether it needs to be repaired or not.
In this paper, we explore how deep learning techniques can be utilized to detect road damage. Our main objectives are:
• Detecting road surface damage from images using deep learning techniques.
• As detecting road surface damage alone is not enough, classifying the detected damage into multiple classes.
• Increasing the accuracy of road surface damage detection compared to previous works on this topic.
• Lastly, building an app that interacts with users to collect images of damaged roads and take the necessary steps.

II. LITERATURE REVIEW
As roads are a very important part of a country's transport system, a large amount of work has been done in recent years on road surface damage detection. The approaches are most commonly divided into two groups.
One of the older approaches is the LBP (Local Binary Pattern) cascade classifier. In the LBP method, a 3×3 window is used on the image to extract the LBP code. The processing involves thresholding the pixels of that window, using the window mean, the window median, or the actual center pixel as the threshold. As it is an old method, recent work on it is scarce, but in [1] an LBP cascade classifier is used for comparison with other methods. Their average accuracy with the LBP classifier was 0.734075, which is modest. An LBP cascade classifier is also used in [2]. Three OpenCV cascade classifiers were employed there, using three sets of positive data, one
for each paved distress, and a set of negative samples for each training session. In another paper [3], Dempster-Shafer Theory (DST) is used to detect damage from very high resolution (VHR) satellite images.
In unsupervised learning, the model used for detection is not trained on labeled data. The model discovers patterns and information that were previously undetected. In [4], an unsupervised two-step pattern recognition system is used for crack detection. First, it recognizes image blocks that contain crack pixels. Cracks are then characterized according to the Portuguese Catalogue of Distress. They used CrackIT (a crack detection tool of Instituto de Telecomunicações) to detect the cracks. In the results, CrackIT was able to detect multiple cracks in the same image, taking about 2 min to process the 56 images of the dataset for which ground truth data are available (including crack detection, type characterization, and severity level assignment), but cracks less than 2 mm wide were not included. Another work based on unsupervised learning is [5], in which unsupervised disparity map segmentation is used for road damage detection.
Next is supervised learning. In recent years there has been a large amount of work on road damage detection using supervised learning, in which the SVM (support vector machine) is quite popular. In [6], an SVM is used to detect road damage. Their SVM was trained with LIBSVM using the RBF kernel, with the parameters C and γ determined by 5-fold cross-validation. Their F1 score for the SVM was 0.7359, with a precision of 0.8112 and a recall of 0.6734.
In deep learning, the DNN (deep neural network) is a popular method for crack detection. In [7], a RetinaNet model is used, which is based on a DNN with VGG19 (a convolutional neural network model) as the backbone network. Their RetinaNet model size was 125.5 MB, with an inference time of 0.5 s, and their best mAP (mean average precision) was 0.91522. A D-CNN (Deep Convolutional Neural Network), which is quite similar to a DNN, is used in [6]. The CNN (convolutional neural network) is a well-known and widely used method for road damage detection and characterization. In [8], F-CNN is used. F-RCNN (Faster Region-Based Convolutional Neural Network), a region-based extension of the CNN, is used in [10].
Regression-based object detection models have also been widely used in recent years for road damage detection. In [10], two SSD-based models, MobileNet V2 and Inception V2, are used to detect road damage. Another SSD-based work is [9], in which Inception V2 and MobileNet V2 models are used, both based on the SSD method.
Another popular regression-based model is the YOLO (You-Only-Look-Once) model. In YOLO, each image is divided into an S × S grid, and each grid cell predicts N bounding boxes and their confidence. In [11], YOLO v3 is used for road damage detection and classification. YOLO v3 uses a variant of Darknet, which originally has a 53-layer network trained on ImageNet. In another paper [12], YOLO v4 was used to detect road damage. In YOLO v4, CSPDarknet53 is used as the backbone network. They used the dataset of the IEEE Global Road Damage Detection Challenge (GRDDC) 2020 [14]. YOLO v5, a successor to YOLO v4, is applied to the same dataset [14] in [13], achieving an F1 score of 0.58. In [15], YOLO v5 is used to detect road damage on a dataset of 3039 images collected from streets in Rwanda. In [16], YOLO v5 is applied to the GRDDC2020 dataset for road damage detection. Another work based on YOLO v5 is [17], which uses the IEEE Big Data Cup Challenge 2020 dataset for road damage detection. From the works described above, we can see that YOLO is quite efficient for object detection. We therefore use YOLO v5 and v7 for road damage detection and classification.

III. USED TOOLS
The main task of our model is to detect damage from images. It requires both classification and localization. In this regard, convolutional neural networks (CNN) are well suited to the job. We used a deep learning algorithm named You-Only-Look-Once (YOLO) to detect road damage from an input image. In recent times YOLO has become one of the most popular object detection algorithms, as it requires only one forward propagation pass through the neural network to perform object detection.
Unlike many other object detection methods, YOLO applies a single neural network to the entire image at once, rather than going through the image thousands of times. This is the core reason why YOLO can detect objects at real-time speed while maintaining high accuracy.
YOLO first divides the image into an S × S grid. For each grid cell it generates some bounding boxes and predicts a confidence score for each of those bounding boxes. The confidence score represents how sure the model is that the bounding box contains an object.
YOLO generates a bounding box with the help of four parameters: x, y, w, h. Here:
• (x, y) = coordinates of the center point of the box
• w = width of the box
• h = height of the box
Every object in the image has a center point of its own. If the center of an object falls into a grid cell, then that particular cell is used to detect that object. If no object exists in that cell, then the confidence score of that bounding box is very low or zero. Otherwise, YOLO predicts the confidence score with the help of the IOU between the predicted box and any ground-truth box.
IOU (Intersection-Over-Union) is a way to evaluate whether an object localization algorithm is accurate or not. IOU is a measure of the overlap between two bounding boxes. If we have two bounding boxes, we can compute their
intersection, and we can also compute their union. The ratio of these two areas is called the IOU.
Each grid cell also predicts class probabilities, which are conditional probabilities with respect to an object. That means, if there is an object, the probability of each class for that object is represented by the class probability.
The model multiplies the class probabilities by the confidence scores to get the bounding box's weight, which is the probability that the bounding box contains an object of that class.
YOLO thresholds the predictions to eliminate the bounding boxes that have a very low confidence score. YOLO also applies non-max suppression, so that each object is detected only once. After doing all this, YOLO outputs the final detections for the image.

A. YOLO v5 architecture
The YOLO v5 architecture can be divided into three main parts (backbone, neck, and head), complemented by its loss function, activation function, and optimizer. These are described below:
• Model Backbone: CSPNet is used in the backbone of YOLO v5. CSP stands for Cross Stage Partial. The Cross Stage Partial Network (CSPNet) is meant to address the problem of redundant gradient information in network optimization, reducing complexity while preserving reliability. Its main task is to extract features from the given input. As in CSPDarknet, CSPNet separates the feature map into two parts. One part passes through the dense block and transition layer. The other part is integrated with the transmitted feature map before proceeding to the next step. The gradients from the dense layers are combined independently. The subset of features that did not pass through the dense layers is likewise incorporated individually. In terms of gradient information for updating weights, neither side contains redundant gradient information that belongs to the other side.
• Model Neck: The model neck is usually used for generating feature pyramids. Feature pyramids help the model generalize well across object scales. In YOLO v5, PANet is used as the neck to obtain feature pyramids. PANet stands for Path Aggregation Network. Bottom-up path augmentation is used here to shorten the information path between the bottom layer and the uppermost feature. To connect feature grids at all feature levels, adaptive feature pooling is employed. To enhance mask prediction, fully-connected fusion is employed.
• Model Head: The model head is the part that performs the detection task. In this part the anchor boxes are applied, and the output vectors with class probabilities, objectness scores, and bounding boxes are calculated.
• Loss Function: The loss function in YOLO v5 is a combination of several components, each corresponding to a different aspect of the model's predictions. The main components of the loss function typically include:
– Objectness Loss (Binary Cross-Entropy): This component measures how well the model predicts whether an object is present in a given grid cell or not. It penalizes the model for false positives and false negatives in terms of object detection.
– Classification Loss (Cross-Entropy): This component is concerned with the accuracy of the predicted class probabilities for each bounding box. It penalizes the model for misclassifying the detected objects.
– Localization Loss (Smooth L1 Loss): This component measures the accuracy of the predicted bounding box coordinates (x, y, width, height). The smooth L1 loss is commonly used for this purpose, as it is less sensitive to outliers than the mean squared error loss.
• Activation function: YOLO v5 uses the same activation functions as YOLO v3. In the inner network it uses the Leaky ReLU activation function, while in the detection layer it uses the sigmoid activation function. The sigmoid is a logistic function that takes an input and scales the output between 0.0 and 1.0, so the curve of the function has a shape similar to the letter S. The equation of the sigmoid activation function is:

\[ S(x) = \frac{1}{1 + e^{-x}} \tag{1} \]

where e is Euler's number.
• Optimizer: An optimizer is an algorithm that minimizes the error function (loss function) or maximizes the efficiency of the model. Optimization techniques are mathematical tools that depend on the model's learnable parameters, such as weights and biases. Optimizers help determine how to adjust the weights and learning rate of a neural network in order to minimize losses. There are various types of optimizers. SGD (Stochastic Gradient Descent), which is used in YOLO v5, is one of them. SGD is a variant of GD (Gradient Descent).

B. YOLO v7 architecture
The YOLO v7 architecture can be divided into three parts. It was derived from YOLOv4, Scaled YOLOv4, and YOLO-R. These parts are described below:
• Extended Efficient Layer Aggregation: In YOLO v7, the authors designed the layer aggregation with attention to the amount of memory it takes to keep layers in memory and the distance a gradient must travel to back-propagate through the layers. The shorter the gradient path, the more powerfully the network is able to learn. The final layer aggregation they chose is E-ELAN, an extended version of the ELAN computational block.
• Model Scaling Techniques: Typically, object detection models consider the depth of the network, the width of the network, and the resolution that the network is trained on. In YOLO v7 the authors scale the network depth and width in concert while concatenating layers together.
• Re-parameterization Planning: Re-parameterization techniques involve averaging a set of model weights to
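The IOU measure, the class-score product, and the thresholding and non-max suppression steps described in Section III can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not the YOLO implementation: boxes are given in the (x, y, w, h) center format from the text, the function names `iou` and `non_max_suppression` are hypothetical, the threshold values 0.25 and 0.5 are illustrative defaults, and suppression is applied across all boxes rather than per class.

```python
from typing import List, Sequence, Tuple

# A box is (x, y, w, h): center coordinates, width, and height, as in the text.
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-Union of two center-format boxes."""
    # Convert both boxes from center format to corner coordinates.
    ax1, ay1, ax2, ay2 = a[0] - a[2] / 2, a[1] - a[3] / 2, a[0] + a[2] / 2, a[1] + a[3] / 2
    bx1, by1, bx2, by2 = b[0] - b[2] / 2, b[1] - b[3] / 2, b[0] + b[2] / 2, b[1] + b[3] / 2
    # Width and height of the overlap rectangle (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(
    boxes: Sequence[Box],
    scores: Sequence[float],
    score_threshold: float = 0.25,  # assumed value, for illustration only
    iou_threshold: float = 0.5,     # assumed value, for illustration only
) -> List[int]:
    """Return the indices of the boxes that survive, highest score first."""
    # Step 1: drop boxes whose score is below the threshold.
    candidates = [i for i, s in enumerate(scores) if s >= score_threshold]
    # Step 2: visit the remaining boxes from the highest score to the lowest.
    candidates.sort(key=lambda i: scores[i], reverse=True)
    kept: List[int] = []
    for i in candidates:
        # Keep a box only if it does not overlap too much with a kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in kept):
            kept.append(i)
    return kept


# Two heavily overlapping boxes on one object, plus one low-score box.
boxes = [(50, 50, 40, 40), (52, 52, 40, 40), (150, 150, 30, 30)]
confidences = [0.9, 0.8, 0.7]
class_probs = [0.9, 0.9, 0.1]
# Final score of a box = confidence score x conditional class probability.
scores = [c * p for c, p in zip(confidences, class_probs)]
kept = non_max_suppression(boxes, scores)
print(kept)
```

In this toy input the second box overlaps the first with an IOU above the threshold and is suppressed, while the third box is removed by the score threshold, so only the first box remains.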
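The sigmoid activation of Equation (1) and two of the loss terms named in the YOLO v5 description, binary cross-entropy and smooth L1, can be written out as plain functions. The sketch below uses the standard textbook formulas and is not taken from the YOLO v5 code base; the function names, the clamping constant `eps`, and the transition point `beta` are assumptions chosen for illustration.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function S(x) = 1 / (1 + e^(-x)), written to avoid overflow."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def binary_cross_entropy(p: float, y: float, eps: float = 1e-7) -> float:
    """Binary cross-entropy for a predicted probability p and a label y in {0, 1}."""
    p = min(max(p, eps), 1.0 - eps)  # clamp so that log() stays finite
    return -(y * math.log(p) + (1.0 - y) * math.log(1.0 - p))


def smooth_l1(pred: float, target: float, beta: float = 1.0) -> float:
    """Smooth L1: quadratic for small errors, linear for large ones."""
    diff = abs(pred - target)
    if diff < beta:
        return 0.5 * diff * diff / beta
    return diff - 0.5 * beta


# A raw objectness output of 0 maps to a probability of 0.5.
print(sigmoid(0.0))
# A large coordinate error of 10 costs 9.5 under smooth L1 but 100 under squared error.
print(smooth_l1(10.0, 0.0), (10.0 - 0.0) ** 2)
```

The last line illustrates why the text describes smooth L1 as less sensitive to outliers than the mean squared error: the penalty grows linearly rather than quadratically once the error exceeds `beta`.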
Fig. 1. Compound scaling in YOLOv7 model sizes
Fig. 2. Coarse-to-fine auxiliary head supervision in the YOLOv7 network

C. Transfer Learning
Here a transfer-learning based approach was followed. Transfer learning is a modern deep learning technique in which, instead of building the whole model from scratch, we take a pre-trained model and fine-tune it for our purpose. Transfer learning is becoming more and more popular among researchers and developers, as it saves time and the pre-trained models have already been trained on millions of images. This gives researchers and developers a large head start, as deep learning models are notoriously data hungry, and having a pre-trained model can make their task considerably easier. In general, when neural networks are used for object detection, the earlier layers detect edges, the middle layers recognize shapes, and the later layers learn task-specific features. Transfer learning follows a slightly different approach.

IV. METHODOLOGY
A. Data Acquisition: Global dataset
As data plays an important role in any type of research, we put our first emphasis on data collection. As we are working on road damage detection and classification, we needed images of road surfaces. For images, we had several options before us. These are:
1) RDD-2018: In this paper [citation hbe] they introduced a Road Damage Dataset which had 9053 images collected by a smartphone. They collected these images
TABLE I
DATA DISTRIBUTION
Fig. 5. Annotation pipeline for multiple classes: (a) original image, (b) image with bounding boxes, (c) final annotated image containing bounding boxes and class labels
TABLE II
DAMAGE CATEGORIES FOR MULTIPLE CLASS
TABLE IV
OVERALL DATA DISTRIBUTION
Fig. 9. YOLO v5 mechanism

2) YOLO v7 Mechanism: YOLO v7, the latest version of YOLO, has several improvements over the previous versions.
– Anchor boxes are a set of predefined boxes with different aspect ratios that are used to detect objects of different shapes. YOLO v7 uses nine anchor boxes, which allows it to detect a wider range of object shapes and sizes compared to previous versions, thus helping to reduce the number of false positives.
– A key improvement in YOLO v7 is the use of a new loss function called “focal loss.” Previous versions of

V. RESULT ANALYSIS
We used several performance metrics to evaluate our model. These performance metrics are:
1) Precision and Recall: Precision is the fraction of the data predicted as relevant by the system that is actually relevant. Recall is the capacity of the system to identify all relevant items in the provided dataset. The equations for precision and recall are:

\[ \text{Precision} = \frac{\text{True Positive}}{\text{True Positive} + \text{False Positive}} \]

\[ \text{Recall} = \frac{\text{True Positive}}{\text{True Positive} + \text{False Negative}} \]

2) Mean Average Precision: Mean average precision is a statistic used to evaluate multiple object
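As a small worked example of the precision and recall definitions in this section, the sketch below computes both from raw detection counts, together with the F1 score reported elsewhere in this paper. The F1 formula used here is the standard harmonic mean of precision and recall; the function names and the example counts are hypothetical and are not results from our experiments.

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of predicted detections that are correct."""
    return tp / (tp + fp) if (tp + fp) > 0 else 0.0


def recall(tp: int, fn: int) -> float:
    """Fraction of ground-truth objects that were detected."""
    return tp / (tp + fn) if (tp + fn) > 0 else 0.0


def f1_score(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if (p + r) > 0 else 0.0


# Hypothetical counts: 80 correct detections, 20 false alarms, 40 missed damages.
tp, fp, fn = 80, 20, 40
p = precision(tp, fp)  # 80 / 100
r = recall(tp, fn)     # 80 / 120
print(p, r, f1_score(p, r))
```

With these counts the detector is right in 80% of its predictions but finds only two thirds of the damages, which is why the two metrics are reported together.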
TABLE V
DAMAGE CATEGORIES FOR INDIAN DATASET
Fig. 14. YOLO v7 Binary Class Detection India
Fig. 16. Iteration vs F1 (Multi-Class) India
have to run our models for at least 50 epochs. This is the number of epochs commonly followed for binary classification using YOLO algorithms. Figure 19 shows the mAP graph, and Figure 20 shows the F1 score.
Figure 21 and Figure 22 show some binary-class road damage detections obtained with the YOLO v5 and YOLO v7 algorithms.

Fig. 17. YOLOv5 Multi-Class Detection India
Fig. 18. YOLOv7 Multi-Class Detection India
Fig. 19. Epoch vs mAP (Binary) Bangladesh
Fig. 20. Epoch vs f1 score (Binary) Bangladesh

VI. FUTURE WORK
Here, we have developed an object detection model using the YOLO algorithm that can detect road damage from images. Though we have tried our best to build a robust model, the model lacks accuracy. We suggest that researchers use a powerful workstation. All of our models were trained using the free version of Google Colab. We believe a powerful workstation built with a powerful CPU, such as the Ryzen 7 or 9 series, along with Nvidia Titan GPUs will save researchers a lot of time. Beyond a powerful workstation, the first piece of advice we want to provide is to use a more balanced dataset. A dataset with a balanced number of images from all classes will help the model achieve robustness. The Bangladeshi dataset needs to be resized and compressed. We believe this would reduce the training time considerably. We would then suggest ensembling multiple models to build a hybrid one. Here we used YOLO throughout, but ensembling YOLO with another model may increase the model's overall accuracy. Moreover, we would suggest applying multi-class classification to the Bangladeshi dataset too.
We have also started working on an application to connect the public with the concerned division, so that these road damage issues can be addressed and resolved quickly. Our app workflow is shown in Figure 23.

VII. CONCLUSION
Safe roads play an important role for any country. Many things depend on safe roads. From the economic growth of a country to our valuable lives, it all depends on how safe and good our roads are. But keeping them safe is not an easy task. There may be many reasons for a damaged road. With the advancement of technology, however, it is becoming easier to detect damage on roads and repair it for the safety of our valuable lives and time. Computer vision, image recognition, and machine learning play a vital role here. In this paper, a unified model was proposed for road damage detection. So far we have used the state-of-the-art YOLO v5 and YOLO v7 object detectors for this model and obtained satisfactory results. We trained our YOLO v5 model on two datasets, one Indian and one Czech. We also created our own Bangladeshi dataset, which was quite a tough task, and tested it with the latest YOLO v7 model, which is reported to be more accurate and faster than earlier versions of YOLO. We chose the YOLO model because it is easy to build and can be trained and tested directly on whole images. Unlike region-based detectors such as RCNN and Faster RCNN, YOLO performs detection in a single pass, which made it the most suitable object detection model for our purpose. YOLO is easy to use, and it can be trained on a conventional GPU. A large number of features were evaluated, and we selected among them to improve the accuracy.

REFERENCES
[1] A. Angulo, J. A. Vega-Fernández, L. M. Aguilar-Lobo, S. Natraj, and G. Ochoa-Ruiz, “Road damage detection acquisition system
Fig. 21. YOLO v5 Binary-Class Detection Bangladesh
Fig. 23. Application Workflow