
Road Damage Detection with Deep Neural Network using Transfer Learning Based Approaches

Dewan Tanjil Hossain
Department of Computer Science and Engineering
Bangladesh University of Engineering and Technology
Dhaka, Bangladesh
0423052014@grad.cse.buet.ac.bd

Nazmus Sakib
Department of Computer Science and Engineering
Bangladesh University of Engineering and Technology
Dhaka, Bangladesh
0423052015@grad.cse.buet.ac.bd

Abstract—Roads are very important for our day-to-day transportation. Damaged roads cause loss of time and, sometimes, loss of life, so detecting and repairing them promptly saves both. Detecting road damage is not as simple as it may seem. In recent years, road damage detection has relied on costly, high-performance sensors, but with the advancement of technologies such as image processing and computer vision we can now detect and categorize various types of road damage efficiently. Detecting road damage manually also takes a great deal of effort and time, so introducing deep learning and artificial intelligence reduces human labor and increases user satisfaction.
In this research article, we present a unified model for detecting road damage. Our model is based on YOLO ("You Only Look Once"). We have used two versions of this model, version 5 and version 7, for detecting road damage. Our model achieves high accuracy at a reasonable computational complexity. We used two datasets for training the version 5 and version 7 models: the RDD2020 dataset and a custom dataset that we call the Bangladeshi dataset.
Index Terms—Transfer learning, Road damage detection, Deep learning, Computer Vision, Image Recognition, Object Detection, Dataset Augmentation.
I. INTRODUCTION

Nowadays road damage detection has become a major concern globally, especially for third-world countries, because clear and smooth roads save a lot of time and play a vital role in day-to-day public life. A smooth road also saves money by reducing vehicle repair costs. We use public transport to go from one place to another for our daily needs, and for safe driving and transportation the road structure has to be maintained properly. Rainstorms and poor maintenance are the main causes of damaged roads; vehicles heavier than the road can bear also cause damage. Damaged roads are a major source of accidents. Countries like Bangladesh face several road accidents daily, and many lives are lost as a result. As Bangladesh has a large population in a very limited area, it is no wonder that the roads and streets of the country are under immense pressure every day. To keep Bangladeshi roads safe, we need to detect damage as early as possible.
Human-based road damage detection takes a lot of time and energy. It is also difficult to carry out under many circumstances, such as varying weather, vehicle speed and the complexity of the road structure. There are many models, such as CNN, RetinaNet, ResNet, DenseNet, VGG and YOLO, for detecting road damage, and some of them perform very impressively. Deep learning techniques are a good way to detect road cracks: manual inspection takes a lot of time, but deep learning has made the process much easier. It can be done by mounting a smartphone at the front of a car and simply driving along the damaged road; the system automatically detects the damage and classifies whether it needs to be repaired or not.
In this paper, we explore how deep learning techniques can be utilized to detect road damage. Our main objectives are:
• Detecting road surface damage from images using deep learning techniques.
• Since detecting road surface damage alone is not enough, classifying the detected damage into multiple classes.
• Trying to increase the accuracy of road surface damage detection compared to previous works on this topic.
• Building an app that can interact with users to collect damaged-road images and take the necessary steps.

II. LITERATURE REVIEW

Roads are a very important part of a country's transport system, and in recent years there has been a huge amount of work on road surface damage detection. Most commonly it is divided into two approaches.
One of the older approaches is the LBP (Local Binary Pattern) cascade classifier. In the LBP method a 3x3 window is slid over the image to extract the LBP code. The processing involves thresholding the center pixel of that window against its surrounding pixels, using the window mean, the window median or the actual center pixel as the threshold. As it is an old method, recent work on it is not very noticeable, but in [1] an LBP cascade classifier is used for comparison with other methods; its average accuracy was 0.734075, which was not very good. In [2] an LBP cascade classifier is also used: three OpenCV cascade classifiers were employed to take three sets of positive data, one for each paved distress, and a set of negative samples for each training session.
In another paper [3], Dempster–Shafer theory (DST) is used to detect damage from very-high-resolution (VHR) satellite images.
In unsupervised learning, the model used for detection is not pre-trained; it discovers patterns and information that were previously undetected. In [4] an unsupervised two-step pattern recognition system for crack detection is used. First, image blocks containing crack pixels are recognized; the cracks are then described according to the Portuguese Catalogue of Distress. They used CrackIT (the crack detection tool of Instituto de Telecomunicações) for detecting the cracks. In their results, CrackIT was able to detect multiple cracks in the same image, taking about 2 minutes to process the 56 images of the dataset for which ground-truth data are available (including crack detection, type characterization and severity level assignment), but cracks of less than 2 mm width were not detected. Another work based on unsupervised learning is [5], where unsupervised disparity map segmentation is used for road damage detection.
Then there is supervised learning, where in recent years there has been a huge amount of road damage detection work. In supervised learning the SVM (support vector machine) is quite popular. In [6] an SVM is used to detect road damage: it was trained with LIBSVM using the RBF kernel, with the parameters selected by 5-fold cross-validation. Their F1 score for the SVM was 0.7359, with a precision of 0.8112 and a recall of 0.6734.
In deep learning, the DNN (deep neural network) is a popular method for crack detection. In [7] a RetinaNet model is used, which is based on a DNN with VGG19 (a convolutional neural network) as the backbone. Their RetinaNet model size was 125.5 MB, its inference time was 0.5 s and its best mAP (mean average precision) was 0.91522. A D-CNN (deep convolutional neural network), which is quite similar to a DNN, is used in [6]. The CNN (convolutional neural network) is a well-known and widely used method for road damage detection and characterization; in [8] an F-CNN is used. Faster R-CNN (Faster Region-Based Convolutional Neural Network), an updated version of the CNN family, is used in [10].
There are also regression-based object detection models, which are widely used for road damage detection nowadays. The single-shot detector (SSD) is quite popular: in [10] two SSD-based models, MobileNet V2 and Inception V2, are used to detect road damage, and another SSD-based work is [9], where Inception V2 and MobileNet V2 models are again used.
Another popular regression-based model is YOLO (You-Only-Look-Once). In YOLO, each image is divided into S x S grids, and each grid forecasts N bounding boxes and confidences. In [11] YOLO V3 is used for road damage detection and classification; YOLO V3 uses a variant of Darknet, which originally has a 53-layer network trained on ImageNet. In another YOLO paper [12], YOLO V4 is used to detect road damage; YOLO V4 uses CSPDarknet53 as its backbone network, and the IEEE Global Road Damage Detection Challenge (GRDDC) 2020 dataset [14] is used as the dataset. An updated version of YOLO V4, named YOLO V5, is applied to the same dataset [14] in [13], achieving an F1 score of 0.58. In [15] YOLO V5 is used to detect road damage on a dataset of 3039 images collected from the streets of Rwanda. Another YOLO V5 work is [16], where YOLO V5 is applied to the GRDDC 2020 dataset, and a further YOLO v5-based work on road damage detection is [17], which also uses the IEEE Big Data Cup Challenge 2020 dataset. From the above description we can see that YOLO is quite efficient for object detection, so we use YOLO V5 and V7 for road damage detection and classification.

III. USED TOOLS

The main task of our model is to detect damage from images, which requires both classification and localization. In this regard, convolutional neural networks (CNNs) are well suited to the job. We used a deep learning algorithm named You-Only-Look-Once (YOLO) to detect road damage in an input image. In recent times YOLO has become one of the most popular object detection algorithms, as it requires only one forward propagation pass through the neural network to perform object detection.
Unlike many other object detection methods, YOLO applies a single neural network to the entire image at once, rather than going over the whole image thousands of times. This is the core reason why YOLO can detect objects at real-time speed with high accuracy.
YOLO first divides the image into an S x S grid. For each grid cell it generates some bounding boxes and predicts a confidence score for each of those bounding boxes. The confidence score represents how sure the model is that the bounding box contains an object.
YOLO generates a bounding box with the help of four parameters, x, y, w and h, where:
• (x, y) = coordinates of the center point of the box
• w = width of the box
• h = height of the box
Every object in the image has a center point. If the center of an object falls into a grid cell, then that particular cell is responsible for detecting the object. If no object exists in the cell, the confidence score of that bounding box is very low or zero. Otherwise, YOLO predicts the confidence score with the help of the IOU between the predicted box and any ground-truth box.
IOU (Intersection-Over-Union) is a way to evaluate whether an object localization algorithm is accurate or not. IOU is a measure of the overlapping area between two bounding boxes: if we have two bounding boxes, we can compute their intersection and their union, and the ratio of these two areas is the IOU.
Each grid cell also predicts some class probabilities, which are conditional probabilities with respect to an object: if there is an object, the probability of each class for that object is represented by the class probability.
The model multiplies the class probabilities by the confidence scores to get the bounding-box weights, which give the actual probability that the box contains an object of that class.
YOLO thresholds the predictions to eliminate bounding boxes that have a very low confidence score, and also applies non-max suppression so that each object is detected only once. After doing all this, YOLO outputs the final detections for the image.
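To make the box, IOU and non-max suppression ideas above concrete, the following Python sketch shows one minimal way to compute them. It is not taken from the paper's code; the function names, the corner-box convention and the 0.5 suppression threshold are illustrative assumptions.

import numpy as np

def xywh_to_corners(box):
    # Convert a YOLO-style (x_center, y_center, w, h) box to corner form [x1, y1, x2, y2].
    x, y, w, h = box
    return np.array([x - w / 2, y - h / 2, x + w / 2, y + h / 2])

def iou(box_a, box_b):
    # Intersection-Over-Union of two corner-form boxes.
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = ((box_a[2] - box_a[0]) * (box_a[3] - box_a[1]) +
             (box_b[2] - box_b[0]) * (box_b[3] - box_b[1]) - inter)
    return inter / union if union > 0 else 0.0

def non_max_suppression(boxes, scores, iou_thresh=0.5):
    # Greedy NMS: keep the highest-scoring box, drop boxes that overlap it too much,
    # repeat on the remainder, so each object is detected only once.
    order = np.argsort(scores)[::-1]
    keep = []
    while len(order) > 0:
        best = order[0]
        keep.append(best)
        rest = order[1:]
        order = np.array([i for i in rest if iou(boxes[best], boxes[i]) < iou_thresh])
    return keep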
A. YOLO v5 architecture

The YOLO v5 architecture can be divided into several parts, described below:
• Model Backbone: The backbone of YOLO v5 is CSPNet, which stands for Cross Stage Partial Network. CSPNet is meant to address the problem of redundant gradient information in network optimization, reducing complexity while preserving reliability. Its main task is to extract features from the given input. Like CSPDarknet, CSPNet separates the feature map into two parts. One part passes through the dense block and transition layer, and the other part is merged with the transmitted feature map before proceeding to the next stage. The gradients from the dense layers are combined independently, and the subset of features that did not pass through the dense layers is likewise incorporated individually. In terms of the gradient information used for updating weights, neither side carries gradient information that is redundant with the other side.
• Model Neck: The model neck is used for generating feature pyramids, which help the model generalize successfully over different object scales. In YOLO v5, PANet (Path Aggregation Network) is used as the neck to build feature pyramids. Bottom-up path augmentation is used here to shorten the information path between the bottom layers and the topmost features. Adaptive feature pooling is employed to connect feature grids at all feature levels, and fully connected fusion is employed to enhance mask prediction.
• Model Head: The model head performs the detection task. In this part the anchor boxes are applied, and the output vectors with class probabilities, objectness scores and bounding boxes are calculated.
• Loss Function: The loss function in YOLO v5 is a combination of several components, each corresponding to a different aspect of the model's predictions. The main components typically include:
– Objectness Loss (Binary Cross-Entropy): This component measures how well the model predicts whether an object is present in a given grid cell or not. It penalizes the model for false positives and false negatives in terms of object detection.
– Classification Loss (Cross-Entropy): This component is concerned with the accuracy of the predicted class probabilities for each bounding box. It penalizes the model for misclassifying the detected objects.
– Localization Loss (Smooth L1 Loss): This component measures the accuracy of the predicted bounding box coordinates (x, y, width, height). The smooth L1 loss is commonly used for this purpose, as it is less sensitive to outliers than the mean squared error loss.
• Activation function: YOLO v5 uses the same activation functions as YOLO v3: the inner network uses the Leaky ReLU activation function, while the detection layer uses the sigmoid activation function. The sigmoid is a logistic function that takes an input and scales the output between 0.0 and 1.0, so the output curve has an S-like shape. The sigmoid activation function is

S(x) = 1 / (1 + e^(-x))    (1)

where e is Euler's number.
• Optimizer: An optimizer is an algorithm that minimizes the error (loss) function or maximizes the efficiency of the model. Optimization techniques adjust the model's learnable parameters, such as weights and biases, and help determine how to update the weights and the learning rate of a neural network in order to minimize the loss. There are various types of optimizers. SGD, which stands for Stochastic Gradient Descent and is mainly a variant of GD (Gradient Descent), is the one used in YOLO v5.
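As an illustration of the sigmoid output activation in Eq. (1) and the SGD optimizer mentioned above, the following PyTorch snippet is a minimal sketch; the tiny linear model, learning rate and momentum values are illustrative assumptions, not the paper's exact settings.

import torch
import torch.nn as nn

# Sigmoid squashes raw detection-layer outputs into the (0, 1) range,
# matching S(x) = 1 / (1 + e^(-x)) from Eq. (1).
x = torch.tensor([-2.0, 0.0, 2.0])
print(torch.sigmoid(x))           # approximately [0.1192, 0.5000, 0.8808]

# A toy model standing in for the real network, optimized with SGD.
model = nn.Linear(10, 4)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01, momentum=0.9)

loss_fn = nn.BCEWithLogitsLoss()  # binary cross-entropy on logits (objectness-style loss)
inputs = torch.randn(8, 10)
targets = torch.randint(0, 2, (8, 4)).float()

optimizer.zero_grad()
loss = loss_fn(model(inputs), targets)
loss.backward()
optimizer.step()                  # one SGD weight update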
B. YOLO v7 architecture

The YOLO v7 architecture can also be divided into several parts. It was derived from YOLOv4, Scaled YOLOv4 and YOLO-R. These parts are described below:
• Extended Efficient Layer Aggregation: YOLOv7 keeps in mind the amount of memory it takes to keep layers in memory, along with the distance a gradient has to back-propagate through the layers: the shorter the gradient path, the more powerfully the network can learn. The final layer aggregation they chose is E-ELAN, an extended version of the ELAN computational block.
• Model Scaling Techniques: Typically, object detection models consider the depth of the network, the width of the network and the resolution the network is trained on. In YOLOv7 the authors scale the network depth and width in concert while concatenating layers together (a small sketch of this idea is given after this list).
• Re-parameterization Planning: Re-parameterization techniques involve averaging a set of model weights to create a model that is more robust to the general patterns it is trying to model. In recent research there has been a focus on module-level re-parameterization, where pieces of the network have their own re-parameterization strategies. In YOLOv7, gradient flow propagation paths are used to decide which modules in the network should use re-parameterization strategies and which should not.
• Auxiliary Head Coarse-to-Fine: The YOLO network head makes the final predictions for the network, but since it is so far downstream, it can be advantageous to add an auxiliary head that lies somewhere in the middle of the network. While training, this auxiliary detection head is supervised as well as the head that actually makes the predictions. The auxiliary head does not train as efficiently as the final head, because there is less network between it and the prediction, so the YOLOv7 authors experimented with different levels of supervision for this head, settling on a coarse-to-fine definition where supervision is passed back from the lead head at different granularities.

Fig. 1. Compound scaling in YOLOv7 model sizes
Fig. 2. Coarse-to-fine auxiliary head supervision in the YOLOv7 network
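The compound scaling idea behind Figure 1 can be illustrated with the small sketch below, which applies depth and width multipliers to a base stage configuration. The base configuration, multiplier values and helper name are illustrative assumptions, not the actual YOLOv7 scaling constants.

import math

def scale_model(base_depths, base_widths, depth_mult, width_mult):
    # Scale how many blocks each stage repeats (depth) and how many
    # channels each stage uses (width), keeping channels divisible by 8.
    depths = [max(1, round(d * depth_mult)) for d in base_depths]
    widths = [int(math.ceil(w * width_mult / 8) * 8) for w in base_widths]
    return depths, widths

# Hypothetical base configuration of a small detector backbone.
base_depths = [1, 2, 8, 8, 4]
base_widths = [32, 64, 128, 256, 512]

# A "larger" variant scales depth and width together (compound scaling).
print(scale_model(base_depths, base_widths, depth_mult=1.33, width_mult=1.25))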
C. Transfer Learning

Here a transfer-learning based approach was followed. Transfer learning is a modern deep learning technique where, instead of building the whole model from scratch, we take a pre-trained model and fine-tune it for our purpose. Transfer learning is becoming more and more popular among researchers and developers, as it saves time and the pre-trained models have already been trained on millions of images. This gives researchers and developers a huge head start, because deep learning models are extremely data hungry, and having a pre-trained model can make the task much easier. In a generic way, when neural networks are used for object detection, 'edges' are detected in the earlier layers, 'shapes' are recognized in the middle layers, and the later layers perform more task-specific functions. In transfer learning, slightly different approaches are followed.

Fig. 3. Transfer learning

Here the early and middle layers are kept as they are, but the later layers are re-trained. This retraining is done according to the objective of the researcher. One of our main intentions while performing transfer learning is to bring as much knowledge as possible from the previous model to the currently re-trained model, depending on the objective of our model. This kind of knowledge sharing plays a huge role in making our model more accurate, powerful and sustainable.
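A minimal PyTorch sketch of this transfer-learning recipe is shown below: the early and middle layers of a pre-trained backbone are frozen, and only the later, task-specific layer is replaced and re-trained. The use of a torchvision ResNet here is purely illustrative; the paper itself fine-tunes YOLO v5/v7 checkpoints.

import torch.nn as nn
from torchvision import models

# Load a backbone pre-trained on ImageNet (illustrative stand-in for a YOLO checkpoint).
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze the early/middle layers so the generic 'edge' and 'shape' features are kept.
for param in backbone.parameters():
    param.requires_grad = False

# Replace and re-train only the final layer for our own classes
# (e.g. Damage / No Damage in the binary Bangladeshi setup).
backbone.fc = nn.Linear(backbone.fc.in_features, 2)

# Only the new head's parameters will receive gradient updates.
trainable = [name for name, p in backbone.named_parameters() if p.requires_grad]
print(trainable)   # ['fc.weight', 'fc.bias']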
IV. METHODOLOGY

A. Data Acquisition: Global dataset

As we know, data plays an important role in any type of research, so we put our first emphasis on data collection. As we are working on road damage detection and classification, we needed images of road surfaces. For these images we had several options before us:
1) RDD-2018: In this paper [9] a road damage dataset of 9053 images collected with a smartphone was introduced. The images were collected in Japan, and the dataset was classified into multiple classes.

Fig. 4. Damage categories from RDD-2018

2) RDD-2019: In 2019 the dataset used in 2018 [3] was updated with 4082 additional road surface images, which were also collected from the streets of Japan [28].
3) IEEE Big Data Challenge Cup for RDD-2020: In 2020 the dataset was updated again [2]. As the previous dataset only contained street images from Japan, 3595 images collected from Czech streets and 9892 road surface images from India were added, so the dataset grew to a total of 26620 images collected from multiple countries. This dataset [2] was used in the IEEE Big Data Challenge Cup for Road Damage Detection (RDD) 2020 [23].
4) Data Selection And Categorization: The RDD-2020 dataset contains images from three countries: Japan, India and the Czech Republic. We selected the images from the Czech Republic and India for our dataset. As mentioned before, the Indian subset contains 9892 images and the Czech subset contains 3595 images; we rechecked these images and selected 2829 images from the Czech Republic and 7706 images from India for training and testing our model. Our data distribution is shown in Table I.

TABLE I
DATA DISTRIBUTION

Dataset   Total Images   Image resolution   Training      Testing
Indian    7706           720x720            6551 (85%)    1155 (15%)
Czech     2829           600x600            2405 (85%)    424 (15%)

As the images selected here were previously used by many research teams, they were pre-labeled; Figure 5 shows how an image was labeled. We found the images were labeled in four categories: D00 for longitudinal/parallel cracks, D10 for transverse/perpendicular cracks, D20 for alligator/complex cracks and D40 for potholes. The categories are presented in Table II.

Fig. 5. Annotation pipeline for multiple classes: (a) original image, (b) image with bounding boxes, (c) final annotated image containing bounding boxes and class labels

TABLE II
DAMAGE CATEGORIES FOR MULTIPLE CLASS

Damage code   Damage Type
D00           Longitudinal/Parallel cracks
D10           Transverse/Perpendicular cracks
D20           Alligator/Complex crack
D40           Potholes
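The 85%/15% train/test split reported in Table I can be reproduced with a simple sketch like the following; the directory path, file extension and random seed are illustrative assumptions.

import random
from pathlib import Path

def split_dataset(image_dir, train_ratio=0.85, seed=42):
    # Shuffle the image list reproducibly and split it 85% / 15%,
    # as done for the Indian and Czech subsets in Table I.
    images = sorted(Path(image_dir).glob("*.jpg"))
    random.Random(seed).shuffle(images)
    cut = int(len(images) * train_ratio)
    return images[:cut], images[cut:]

train_files, test_files = split_dataset("RDD2020/India/images")  # hypothetical path
print(len(train_files), len(test_files))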
5) Data Acquisition: Bangladeshi dataset: For evaluation of the model we also collected images from across the country. We collected around 3000 images from Jessore, Dhaka and Kushtia using smartphone cameras. An overview of the collected images is given in Table III, and some samples of the collected images are shown in Figures 6 and 7. All images can be found in this link.

TABLE III
COLLECTED IMAGE OVERVIEW

Place     No of images
Jessore   1500 (50%)
Dhaka     750 (25%)
Kushtia   750 (25%)

Fig. 6. Collected image sample 1
Fig. 7. Collected image sample 2

6) Data pre-processing: As we use YOLO v5 and YOLO v7 as our models, we cannot simply feed the raw images as input; some pre-processing of the collected images is needed. The pre-processing steps are described below:
– Data Augmentation: After collecting the images, data augmentation is performed. In data augmentation, multiple new images are generated that differ from the source image. For augmentation of the collected images, rotation, shear, zooming and horizontal flipping are applied, and the brightness of the images is increased (a sketch of such a pipeline is given after this list).
– Image labeling: The collected and augmented images were labeled by us by hand; the labeled data can be seen in Figure 8. First, the collected and augmented images were uploaded to a site named MakeSense.Ai. Then the region of interest (ROI) is selected in each image, and after selecting the ROI the label is assigned to it. In this Bangladeshi dataset the images were labeled with a single Damage class.

Fig. 8. Labeling collected images: (a) original image, (b) image after ROI selection and (c) final input
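The augmentation steps listed above (rotation, shear, zoom, horizontal flip and brightness increase) can be sketched with torchvision transforms as below; the parameter ranges, file names and output folder are illustrative assumptions, not the exact values used for the Bangladeshi dataset.

from pathlib import Path
from PIL import Image
from torchvision import transforms

# Rotation, shear and zoom (scale) via a random affine transform,
# plus horizontal flipping and a brightness increase.
augment = transforms.Compose([
    transforms.RandomAffine(degrees=15, shear=10, scale=(0.8, 1.2)),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.ColorJitter(brightness=(1.0, 1.3)),  # only brighten, never darken
])

Path("augmented").mkdir(exist_ok=True)
image = Image.open("collected/jessore_0001.jpg")     # hypothetical file name
for i in range(4):                                   # several variants per source image
    augment(image).save(f"augmented/jessore_0001_{i}.jpg")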
B. Data Distribution

After data augmentation we get 4515 images. After labeling these images, the data distribution is done; the whole data distribution can be seen in Table IV.

TABLE IV
OVERALL DATA DISTRIBUTION

Dataset       Damage Type   Training      Testing      Total (training)   Total (testing)
Bangladeshi   Damage        2574 (80%)    644 (20%)    3612 (80%)         903 (20%)
              No Damage     1038 (80%)    259 (20%)

1) Training: For the Indian, Czech and Bangladeshi datasets, six models are trained with the images described above. These six models are:
– YOLO v5 binary classification India
– YOLO v7 binary classification India
– YOLO v5 multi-class classification India
– YOLO v7 multi-class classification India
– YOLO v5 binary classification Bangladesh
– YOLO v7 binary classification Bangladesh
In binary classification, for both YOLOv5 and YOLOv7 the Indian and Czech datasets were each split into two parts: 85% for training and 15% for testing. The iteration number was 6000, the batch size was 64 and the learning rate was 0.001. For multi-class classification, for both YOLOv5 and YOLOv7 only the iteration number was changed, to 8000; all other parameters were the same as for binary classification. Google Colab was used for the whole process: the dataset was uploaded to Google Drive and then mounted in Google Colab. The Tesla T4 GPU offered by Google Colab, which has 15 GB of video RAM, was used here.
The training on the Bangladeshi dataset was done with YOLOv5 for binary classification. The dataset had 4515 images and the train-test split was 80-20 percent. The number of epochs was 50 and the batch size was 4, with a learning rate of 0.01. We used Google Colab to train our model; it provides 12 GB of RAM and a T4 GPU with 15 GB of video RAM. Due to this, we were obliged to keep our batch size very low. At first we tried a batch size of 64, which is considered optimal for YOLOv5, but the memory allocation for this batch size became so large that it exceeded the GPU's allocated CUDA memory. We then tried to train our model with batch sizes of 32 and 16, but in both cases the program still crashed due to the poor RAM management of Google Colab's free GPU tier. Thus we had to settle on a batch size of 4.
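For reference, the Bangladeshi training setup described above can be captured in a small YOLOv5-style dataset configuration; the directory layout, file names and class list below are illustrative assumptions (only the epoch count, batch size and split come from the text).

from pathlib import Path

# YOLO-format labels: one .txt file per image, each line holding
# "class x_center y_center width height" with coordinates normalised to [0, 1].
data_yaml = """\
path: bangladeshi_dataset          # hypothetical dataset root
train: images/train                # 80% of the 4515 images
val: images/test                   # remaining 20%
nc: 1                              # single 'Damage' class for the binary setup
names: ['Damage']
"""
Path("data.yaml").write_text(data_yaml)

# With the ultralytics/yolov5 repository checked out, training would then look roughly like:
#   python train.py --data data.yaml --weights yolov5s.pt --epochs 50 --batch-size 4
# (50 epochs and batch size 4 as reported; the starting weights file is an assumption.)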
C. Detection Of Damage

From the above we know that two main models with different architectures, YOLO V5 and YOLO V7, are used here for both binary and multi-class classification. How these models detect road damage from images is described below:
1) YOLO v5 Mechanism: The mechanism of YOLO v5 is quite similar to YOLO v3 and YOLO v4, but there are changes in how damage is detected from images. The overall process is as follows:
– Like YOLOv4, YOLOv5 has a backbone network, which is CSPNet. The main feature extraction is done by the backbone network, and these features are used in the detection part.
– Then comes the next part, in which PANet is used as the feature pyramid network (FPN). Here the feature fusion is done.
– In the head part the main detection is done. This part is similar to YOLO v3 and YOLO v4.
The total process of road damage detection from an image is shown in Figure 9.

Fig. 9. YOLO v5 mechanism
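As a usage illustration of the detection pipeline just described, a trained YOLO v5 checkpoint can be loaded through torch.hub and run on a road image roughly as follows; the weight-file path, image name and confidence threshold are illustrative assumptions.

import torch

# Load a custom-trained YOLOv5 model via the ultralytics/yolov5 hub entry point.
model = torch.hub.load("ultralytics/yolov5", "custom", path="runs/train/exp/weights/best.pt")
model.conf = 0.25   # confidence threshold below which boxes are discarded

# Run detection on a single road image; results hold boxes, scores and class labels.
results = model("road_sample.jpg")
results.print()                        # summary of detections
detections = results.pandas().xyxy[0]  # DataFrame: xmin, ymin, xmax, ymax, confidence, class, name
print(detections)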

2) YOLO v7 Mechanism: YOLO v7, the latest version of YOLO, has several improvements over the previous versions:
– Anchor boxes are a set of predefined boxes with different aspect ratios that are used to detect objects of different shapes. YOLO v7 uses nine anchor boxes, which allows it to detect a wider range of object shapes and sizes compared to previous versions, thus helping to reduce the number of false positives.
– A key improvement in YOLO v7 is the use of a new loss function called "focal loss". Previous versions of YOLO used a standard cross-entropy loss function, which is known to be less effective at detecting small objects. Focal loss battles this issue by down-weighting the loss for well-classified examples and focusing on the hard examples, the objects that are hard to detect (a generic sketch of focal loss is given after this list).
– YOLO v7 also works at a higher resolution than the previous versions: it processes images at a resolution of 608 by 608 pixels, which is higher than the 416 by 416 resolution used in YOLO v3. This higher resolution allows YOLO v7 to detect smaller objects and to have higher accuracy overall.
– One of the main advantages of YOLO v7 is its speed: it can process images at a rate of 155 frames per second, much faster than other state-of-the-art object detection algorithms.
The total process of road damage detection from an image is shown in Figure 10.

Fig. 10. YOLO v7 mechanism
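The focal-loss idea mentioned in the list above can be sketched as follows. This is the generic binary focal-loss formulation, with alpha and gamma values that are common defaults rather than YOLO v7's exact configuration.

import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    # Standard binary focal loss: down-weight easy, well-classified examples
    # so training focuses on the hard-to-detect objects.
    bce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p = torch.sigmoid(logits)
    p_t = p * targets + (1 - p) * (1 - targets)              # probability of the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * (1 - p_t) ** gamma * bce).mean()

logits = torch.tensor([2.5, -1.0, 0.3])    # raw objectness scores
targets = torch.tensor([1.0, 0.0, 1.0])    # ground-truth object / no-object labels
print(focal_loss(logits, targets))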
V. RESULT ANALYSIS

We have used several performance metrics to evaluate our models. These performance metrics are:
1) Precision and Recall: Precision is the fraction of the detections predicted as relevant by the system that are actually relevant. The capacity of the system to identify all relevant instances in the provided dataset is referred to as recall. The equations for precision and recall are:

Precision = True Positive / (True Positive + False Positive)

Recall = True Positive / (True Positive + False Negative)

2) Mean Average Precision: Mean average precision is a statistic used to evaluate object detectors across multiple classes and datasets. To calculate mAP we take, for each class, the highest precision value at each recall level to obtain its average precision, and then average over the classes:

mAP = (1/N) * sum_{i=1}^{N} AP_i    (2)

In this equation:
– N is the total number of classes or categories.
– AP_i is the average precision for class i.
3) F1 Score: The F1 score is the harmonic mean of precision and recall; the higher the score, the better the model is considered to perform. The equation for the F1 score is:

F1 = (2 * Precision * Recall) / (Precision + Recall)    (3)
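These metrics can be computed directly from detection counts; the following sketch mirrors the precision and recall equations, Eq. (2) and Eq. (3). The counts and per-class AP values used in the example calls are illustrative numbers, not results from this paper.

def precision_recall_f1(tp, fp, fn):
    # Precision, recall and their harmonic mean (F1), as in the equations above and Eq. (3).
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if precision + recall else 0.0
    return precision, recall, f1

def mean_average_precision(ap_per_class):
    # Eq. (2): mAP is the mean of the per-class average precisions.
    return sum(ap_per_class) / len(ap_per_class)

print(precision_recall_f1(tp=80, fp=20, fn=25))          # (0.8, 0.7619..., 0.7804...)
print(mean_average_precision([0.62, 0.55, 0.71, 0.48]))  # 0.59 over four damage classes (illustrative)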
1) Result On Indian Dataset: We have done both binary and multi-class classification for our Indian dataset. In binary classification our models detect the road damage with a bounding box labeled as "Damage". In multi-class classification, our models detect the damage with a bounding box labeled as one of the classes listed in Table V.

TABLE V
DAMAGE CATEGORIES FOR INDIAN DATASET

Class name   Damage Type
D00          Linear longitudinal cracks
D10          Linear lateral cracks
D20          Alligator crack
D40          Rutting, bump, pothole etc.

Both binary and multi-class classification were trained and tested with the YOLO v5 and YOLO v7 algorithms, so in total we trained and tested our Indian dataset on 4 models:
– YOLO v5 binary classification
– YOLO v7 binary classification
– YOLO v5 Multi-Class classification
– YOLO v7 Multi-Class classification
1) Binary classification: In binary classification we have to run our models for at least 6000 iterations, which is the standard iteration count we followed for binary classification with the YOLO algorithms. We compared YOLO v5 and YOLO v7 in terms of both mAP (mean average precision) and F1 score. As the number of iterations rises, YOLO v7 starts taking the lead compared to YOLO v5, and the same is seen for the F1 score; in both cases YOLO v7 performs substantially better than YOLO v5. In Figure 13 we show some road damage detections using the YOLO v5 algorithm, and in Figure 14 some road damage detections using the YOLO v7 algorithm.

Fig. 11. Iteration vs mAP (Binary) India
Fig. 12. Iteration vs F1 (Binary) India
Fig. 13. YOLO v5 Binary Class Detection India
Fig. 14. YOLO v7 Binary Class Detection India

2) Multi-Class Classification: As we are detecting 4 types of classes, the total number of iterations is 8000. In terms of both mAP (Figure 15) and F1 score (Figure 16), YOLO v7 obtained better results than YOLO v5. YOLO v7 and YOLO v5 were almost the same at the beginning of the iterations, but by the end of the 8000 iterations YOLO v7 clearly obtained better results than YOLO v5. In Figure 17 we show some multi-class road damage detections using the YOLO v5 algorithm, and in Figure 18 some multi-class road damage detections using the YOLO v7 algorithm.

Fig. 15. Iteration vs mAP (Multi-Class) India
Fig. 16. Iteration vs F1 (Multi-Class) India
Fig. 17. YOLOv5 Multi-Class Detection India
Fig. 18. YOLOv7 Multi-Class Detection India

2) Result on Bangladeshi Dataset: On the Bangladeshi dataset, YOLO v5 and YOLO v7 were applied to evaluate the dataset, and only binary classification was done.
1) Binary Classification: In binary classification we have to run our models for at least 50 epochs, which is the standard epoch count we followed for binary classification with the YOLO algorithms. The mAP graph can be seen in Figure 19, and the F1 score in Figure 20. In Figure 21 and Figure 22 we show some binary-class road damage detections using the YOLO v5 and YOLO v7 algorithms.

Fig. 19. Epoch vs mAP (Binary) Bangladesh
Fig. 20. Epoch vs F1 score (Binary) Bangladesh
Fig. 21. YOLO v5 Binary-Class Detection Bangladesh
Fig. 22. YOLO v7 Binary-Class Detection Bangladesh
VI. FUTURE WORK

Here we have developed an object detection model using the YOLO algorithm that can detect road damage from images. Though we have tried our best to build a robust model, the model still lacks accuracy. We suggest that researchers use a powerful workstation: all of our models were trained using the free version of Google Colab, and we believe a powerful workstation built around a strong CPU, such as a Ryzen 7 or 9 series, along with Nvidia Titan GPUs, would save researchers a lot of time. Beyond having a powerful workstation, the first piece of advice we want to give is to use a more balanced dataset: a dataset with a balanced number of images from all classes will help the model achieve robustness. The Bangladeshi dataset also needs to be resized and compressed, which we believe would largely reduce the training time. We would then suggest ensembling multiple models and building a hybrid one. Here we used YOLO all along, but ensembling YOLO with some other model would increase the model's overall accuracy. Moreover, we would suggest applying multi-class classification to the Bangladeshi dataset too.
We have also started working on an application to connect people with the concerned authorities so that these road damage issues can be addressed and quick results obtained. Our app workflow is shown in Figure 23.

Fig. 23. Application Workflow
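As a sketch of the ensembling suggested above, detections from two models can be pooled and then filtered so that only the strongest non-overlapping boxes survive. All box coordinates, scores and thresholds below are illustrative assumptions, and the greedy merge is only one of several possible fusion strategies.

import numpy as np

def box_iou(a, b):
    # IoU of two [x1, y1, x2, y2] boxes.
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union else 0.0

def ensemble_detections(dets_a, dets_b, iou_thresh=0.55):
    # Pool detections ([x1, y1, x2, y2, score]) from two models, then keep the
    # highest-scoring boxes and drop overlapping duplicates (greedy NMS-style merge).
    pooled = sorted(np.vstack([dets_a, dets_b]).tolist(), key=lambda d: d[4], reverse=True)
    kept = []
    for det in pooled:
        if all(box_iou(det[:4], k[:4]) < iou_thresh for k in kept):
            kept.append(det)
    return kept

yolo_dets = np.array([[100, 50, 220, 140, 0.82]])
other_dets = np.array([[104, 55, 225, 150, 0.76], [400, 300, 480, 360, 0.64]])
print(ensemble_detections(yolo_dets, other_dets))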
VII. CONCLUSION

Safe roads play an important role for any country: many things, from the economic growth of a country to our valuable lives, depend on how safe and good our roads are. Keeping roads safe is not an easy task, and there can be many reasons for a damaged road, but with the advancement of technology it is becoming easier to detect that damage and repair it for the safety of our valuable lives and time. Computer vision, image recognition and machine learning are playing a vital role here. In this paper, a unified model was proposed for road damage detection. We used the state-of-the-art YOLOv5 and YOLOv7 object detectors, and this model obtained a satisfactory result. We trained our YOLOv5 model with two datasets, one Indian and one Czech. We also created our own Bangladeshi dataset, which was quite a tough task, and tested it with the latest YOLO v7 model, which is more accurate and faster than the other versions of YOLO. We chose the YOLO model because it is easy to build and can be trained and tested directly on whole images. Compared with CNN, RCNN and Faster RCNN, YOLO proved the most suitable object detection model for our purpose: it is easy to use and can be trained with a conventional GPU. A large number of features have been verified, and we selected the ones we use to improve the accuracy.
REFERENCES

[1] A. Angulo, J. A. Vega-Fernández, L. M. Aguilar-Lobo, S. Natraj, and G. Ochoa-Ruiz, "Road damage detection acquisition system based on deep neural networks for physical asset management," in Mexican International Conference on Artificial Intelligence, pp. 3–14, Springer, 2019.
[2] A. Tedeschi and F. Benedetto, "A real-time automatic pavement crack and pothole recognition system for mobile Android-based devices," Advanced Engineering Informatics, vol. 32, pp. 11–25, 2017.
[3] M. O. Sghaier and R. Lepage, "Road damage detection from VHR remote sensing images based on multiscale texture analysis and Dempster–Shafer theory," in 2015 IEEE International Geoscience and Remote Sensing Symposium (IGARSS), pp. 4224–4227, IEEE, 2015.
[4] H. Oliveira and P. L. Correia, "Automatic road crack detection and characterization," IEEE Transactions on Intelligent Transportation Systems, vol. 14, no. 1, pp. 155–168, 2012.
[5] R. Fan and M. Liu, "Road damage detection based on unsupervised disparity map segmentation," IEEE Transactions on Intelligent Transportation Systems, vol. 21, no. 11, pp. 4906–4911, 2019.
[6] L. Zhang, F. Yang, Y. D. Zhang, and Y. J. Zhu, "Road crack detection using deep convolutional neural network," in 2016 IEEE International Conference on Image Processing (ICIP), pp. 3708–3712, IEEE, 2016.
[7] A. Angulo, J. A. Vega-Fernández, L. M. Aguilar-Lobo, S. Natraj, and G. Ochoa-Ruiz, "Road damage detection acquisition system based on deep neural networks for physical asset management," in Mexican International Conference on Artificial Intelligence, pp. 3–14, Springer, 2019.
[8] T. Ishtiak, S. Ahmed, M. H. Anila, and T. Farah, "A convolutional neural network approach for road anomalies detection in Bangladesh with image thresholding," in 2019 Third World Conference on Smart Trends in Systems Security and Sustainability (WorldS4), pp. 376–382, IEEE, 2019.
[9] H. Maeda, Y. Sekimoto, T. Seto, T. Kashiyama, and H. Omata, "Road damage detection and classification using deep neural networks with smartphone images," Computer-Aided Civil and Infrastructure Engineering, vol. 33, no. 12, pp. 1127–1141, 2018.
[10] R. Roberts, G. Giancontieri, L. Inzerillo, and G. Di Mino, "Towards low-cost pavement condition health monitoring and analysis using deep learning," Applied Sciences, vol. 10, no. 1, p. 319, 2020.
[11] A. Alfarrarjeh, D. Trivedi, S. H. Kim, and C. Shahabi, "A deep learning approach for road damage detection from smartphone images," in 2018 IEEE International Conference on Big Data (Big Data), pp. 5201–5204, IEEE, 2018.
[12] X. Zhang, X. Xia, N. Li, M. Lin, J. Song, and N. Ding, "Exploring the tricks for road damage detection with a one-stage detector," in 2020 IEEE International Conference on Big Data (Big Data), pp. 5616–5621, IEEE, 2020.
[13] D. Jeong, "Road damage detection using YOLO with smartphone images," in 2020 IEEE International Conference on Big Data (Big Data), pp. 5559–5562, IEEE, 2020.
[14] D. Arya, H. Maeda, S. K. Ghosh, D. Toshniwal, H. Omata, T. Kashiyama, and Y. Sekimoto, "Global road damage detection: State-of-the-art solutions," arXiv preprint arXiv:2011.08740, 2020.
[15] R. Ishimwe, P. Iradukunda, and J. B. Kwizera, "Real-time road damage detection using deep convolutional neural networks and a smartphone: Project report," 2021.
[16] L. Menghini, F. Bella, G. Sansonetti, and V. Gagliardi, "Evaluation of road pavement conditions by deep neural networks (DNN): an experimental application," in Earth Resources and Environmental Remote Sensing/GIS Applications XII, vol. 11863, pp. 159–168, SPIE, 2021.
[17] R. Vishwakarma and R. Vennelakanti, "CNN model tuning for global road damage detection," in 2020 IEEE International Conference on Big Data (Big Data), pp. 5609–5615, IEEE, 2020.
