Review Report
Bachelor of Technology
in
COMPUTER SCIENCE AND ENGINEERING
By
DARSHAN R -ENG20CS0084
DEEKSHITH S -ENG20CS0086
CHAAKRIKA -ENG20CS0135
HARSHITH -ENG20AM3008
(2021-2022)
CHAPTER 1
INTRODUCTION
Nowadays, two-wheelers are the most preferred mode of transport. It is highly desirable for
bike riders and the pilots to use helmets. This paper uses image processing techniques by
without a helmet, the vehicle details with the person(s) on the vehicle and the number plate is
motorcyclists using images or videos taken by a camera. The number plate recognition
algorithm has several steps, such as vehicle classification, pre-processing, choosing the
algorithms, and storing the result in the database with the image as proof, along with the
recorded date and time. A database will be designed that stores the proof together with the
offense, so that every offender can be identified accurately, the offending vehicle can be
impounded, and violation fines can be imposed. The system uses machine learning to identify
the different types of helmet that it comes across.
1.1 SCOPE:
A helmet aims to reduce the impact of a force or collision to the head in an accident. It
reduces the chance of serious head and brain injuries by dissipating the force and energy of the
impact. Motorcyclists must take extra precautions to protect their bodies.
🠶 The law mandates that every motorcyclist must wear a helmet while riding a
motorbike.
🠶 Wearing a helmet increases a rider's chance of survival compared with not wearing one.
CHAPTER 2
PROBLEM DEFINITION:
Problem: Many bike riders ignore their safety and violate the RTO helmet rule by riding
without protective gear such as a helmet. Police officers have tried to manage this problem
manually, but this is inadequate in real situations.
🠶 Solution: To solve this problem, a more sophisticated computer vision model is used that
encompasses image processing, CNN, Faster R-CNN, OCR (Optical Character
Recognition), SSD (Single Shot MultiBox Detector), and YOLO (You Only Look Once), using
Python.
CHAPTER 3
LITERATURE REVIEW:
In this paper, classification and descriptors are used to detect vehicles, then to detect
persons on two-wheelers, and finally to determine whether they are wearing a helmet.
subtraction, the moving objects (vehicles) are separated from the background, which yields
an image containing only the vehicles. Vehicle classification: The
each generated image and passed on to a random forest classifier to categorize the vehicle as
Detection of helmet: Determining RoI: This step is performed so that only the region of
interest is chosen, which reduces the processing time and increases precision.
Extracting the features - A sub-window is formed in the RoI generated above, and the main
part of the image (the head in this case) is extracted and passed as input to the classifier to check
This paper deals mainly with helmet detection. To be used in a surveillance
system, it should also be able to detect the number plate of the vehicle to impose fines on the
“Helmet detection using machine learning and Automatic Number Plate recognition”
This paper extracts the objects from the image using YOLO object
1. Helmet detection - Annotated images are given to the YOLOv3 model for training and the
● This paper does not distinguish between a motorcycle and a non-motorcycle,
and this project cannot be applied to video input since
preprocessed and used in detecting the riders of motorcycles with and without helmets.
1. Dataset creation and annotation - Random data in the form of videos is collected from
Myanmar and preprocessed into video clips of 100 frames each. Object detection is done
with the YOLO9000 algorithm with pre-trained weights, and the recognized vehicle with person
2. Helmet use detection algorithm - For object detection, the single-stage approach of
RetinaNet is used to detect the helmets, with ResNet50 as the backbone, initialized with
pre-trained weights from ImageNet. The models were implemented using the Python Keras
library with TensorFlow as the backend.
3. Results - The helmet use detection results are reported on the test set, using the optimal
model selected on the validation set (where it obtained 72.8% weighted mAP).
● The limitation of this project is that in many instances two persons travel
on the motorcycle, and this model does not recognize whether the pillion rider is
wearing a helmet. It can detect only whether one person is wearing a helmet, and the
In this model, various previous methods related to automatic helmet detection have
been taken into consideration and a new model has been proposed. This is a technique
of automatic helmet detection, where the input is either the video which has been
two main parameters, i.e. the aspect ratio and size of the particular vehicle, and then the
vehicle is classified. 4. Helmet detection - This step includes extraction of the head part
from the classified image and providing it to the RoI, where the matching of RoI and
This model gives an idea of the number of people who violate the traffic rules. It is also
cost-effective, as we use open-source technology such as OpenCV for development
purposes. Further, this model can be used to detect people talking on the phone while
Network”
This model notes that since motorcycles are affordable, people use them for daily
transportation. Due to this increased use, the occurrence of accidents is high. The majority
of the accidents involve head injury, which is due to helmet violation by
motorcycle users. As many cities have surveillance systems for safety purposes, we
can use them to detect riders without helmets, which would be a cost-effective approach.
, etc. There are four different steps included in the process of this model:
1. Background modeling and object detection: This step applies adaptive background
subtraction so that the images are obtained properly and at the same quality regardless
of the conditions, whether it is daytime, night, or rainy. To separate out the factors
that are not needed, we use the Gaussian mixture model.
type of feed-forward neural network trained using backpropagation. This technique was
chosen for its ability to extract interdependent data from the
images. It involves several levels for detecting the object, where at each
level we get the data and at the final level the entire image is formed.
for the identification of the motorcycle from other objects. These boxes are evaluated
by providing them as input to the CNN model, which in reference to the various
for the top one fourth of the image, because that is the position where the head of the
motorcyclist would always be. Then we find the subtraction of the binary image of
● This model gives a well-defined way of dealing with helmet detection and
various ways of addressing the problem. Thus, this is a new approach using
machine learning, unlike the previous approach which used image
PROJECT DESCRIPTION:
The first step in creating a helmet detection classifier model is to train the model with
a lot of images. We will have to select a large number of images, which will help us
achieve higher accuracy. It is suggested to collect real-world images, as it is quite
difficult to find a dataset for this specific purpose. You can search Google for images of
people wearing helmets, but make sure not to download very high-resolution images, because
the larger the data, the more time it takes to train the model.
We can also create our model from scratch using a Convolutional Neural Network, but for
detection purposes we can also use bigger models like YOLO.
CHAPTER 5
REQUIREMENTS:
● PROJECT REQUIREMENTS:
1. YOLOV3,
2. TENSORFLOW MODULE,
3. KERAS MODULE
● SYSTEM REQUIREMENTS
1. RAM-MIN 8GB
2. WINDOWS/IOS/LINUX/UBUNTU
3. SSD-MIN 256GB
METHODOLOGY:
🠶 Deep Networks for Object Detection. The R-CNN method trains a CNN
end-to-end to classify the proposal regions into object
🠶 will give the knowledge that the box contains an object. PASCAL VOC 2007 is
a dataset which incorporates nearly 20 different classes of images. The CNN model,
as compared to R-CNN and Faster R-CNN, is proved efficient because the number
🠶 produces about 100 bounding boxes. Thus, this model enables detecting
multiple objects. Hence, the Easynet model is easy to create and implement,
and may even be extended for detecting moving objects.
🠶 YOLO (You Only Look Once) is an algorithm that uses neural networks to
provide real-time object detection and location of the object.
🠶 The system first resizes the input picture to 448 × 448, and second, runs a single
accomplish both image segmentation and image classification at the same time
in precisely one execution. An objective of
of the bounding box to organize the type of object. Additionally, the best part is
that it can detect numerous objects in the
image in a single pass, and it is significantly quicker while giving high accuracy.
SSD object detection algorithm summarises
two parts: extraction of the feature maps and application of convolution filters for
the detection of objects. SSD generally
uses VGG16 to extract feature maps. SSD is more reliable than YOLO.
EXPERIMENTATION:
3. First, it divides the image into a 13×13 grid of cells. The size of these 169 cells varies
depending on the size of the input.
4. For each bounding box, the network also predicts the confidence that the bounding
box actually encloses an object, and the probability of the enclosed object being a
particular class.
5. Most of these bounding boxes are eliminated because their confidence is low or
because they are enclosing the same object as another bounding box with a very high
confidence score. This technique is called non-maximum suppression.
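The filtering described in step 5 can be sketched in plain Python. This is a minimal illustration rather than the YOLO implementation itself: the function names, the `(x1, y1, x2, y2)` box format, and both threshold values are assumptions chosen for the example.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(boxes, scores, score_thresh=0.5, iou_thresh=0.45):
    """Return indices of the boxes that survive suppression.

    Both thresholds are illustrative assumptions, not values from the report.
    """
    # 1. Discard boxes whose confidence is too low.
    candidates = [i for i, s in enumerate(scores) if s >= score_thresh]
    # 2. Visit the remaining boxes from most to least confident.
    candidates.sort(key=lambda i: scores[i], reverse=True)
    kept = []
    for i in candidates:
        # 3. Keep a box only if it does not overlap a kept box too strongly.
        if all(iou(boxes[i], boxes[k]) <= iou_thresh for k in kept):
            kept.append(i)
    return kept


boxes = [(10, 10, 110, 110), (12, 12, 112, 112), (200, 200, 260, 260), (50, 50, 60, 60)]
scores = [0.9, 0.8, 0.7, 0.2]
survivors = non_max_suppression(boxes, scores)
```

Here the second box overlaps the first almost completely and is suppressed, the fourth is dropped for low confidence, and the first and third remain.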
Determining the RoI is an important step of the proposed system. The use of this region
enables the reduction of the area in which the search will be performed, which implies less
processing time and a greater precision of the results compared with the complete image. The RoI is
a region of the captured image in the vehicle segmentation stage. As the proposed system is
interested in the detection of motorcyclists without helmets, the head region of the motorcyclist
must be located completely inside the RoI.
The size of the RoI was tested across the image database. In all images of motorcyclists, the head
region is located within the selected RoI. Other sizes were tested but did not produce satisfactory
results. Figure 6 shows some RoIs that were identified using the image database.
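As a rough illustration of how restricting the search to an RoI shrinks the area to be processed, the sketch below keeps only the upper part of a segmented vehicle image. The image is modelled as a list of pixel rows, and the `top_fraction` value is a hypothetical parameter, since the RoI size used in the experiments is not stated here.

```python
def head_roi(vehicle_image, top_fraction=0.25):
    """Return the top rows of a segmented vehicle image (a list of pixel rows).

    top_fraction is an assumed value used only for illustration.
    """
    if not 0.0 < top_fraction <= 1.0:
        raise ValueError("top_fraction must be in (0, 1]")
    height = len(vehicle_image)
    # Always keep at least one row so the RoI is never empty for a non-empty image.
    rows = max(1, int(height * top_fraction))
    return vehicle_image[:rows]


# An 8-row, 4-column dummy image in which every pixel stores its row index.
image = [[r] * 4 for r in range(8)]
roi = head_roi(image)
```

With these assumed values, only 2 of the 8 rows are passed on to the later stages.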
A search for the circle with the largest CHT accumulator (circle with the most points) is performed
based on the obtained circles. Figure 7e shows the best circle in the RoI. A strategy that uses only
geometric information, such as the use of the CHT, will not return acceptable results, as the shape of
the helmet resembles the shape of the head. Thus, more information is necessary to distinguish
heads from helmets.
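The search for the circle with the largest accumulator value can be sketched as follows. This is a simplified pure-Python Circular Hough Transform, assuming the edge points have already been extracted; the function name, the candidate radii, and the number of angle steps are illustrative choices, not the settings used in the study.

```python
import math
from collections import Counter


def best_circle(edge_points, radii, angle_steps=72):
    """Return (cx, cy, r, votes) for the accumulator cell with the most votes."""
    accumulator = Counter()
    for r in radii:
        # Offsets from an edge point to each candidate centre at distance r.
        offsets = {
            (round(r * math.cos(2 * math.pi * k / angle_steps)),
             round(r * math.sin(2 * math.pi * k / angle_steps)))
            for k in range(angle_steps)
        }
        # Every edge point votes for all centres that could have produced it.
        for x, y in edge_points:
            for dx, dy in offsets:
                accumulator[(x - dx, y - dy, r)] += 1
    (cx, cy, r), votes = accumulator.most_common(1)[0]
    return cx, cy, r, votes


# Synthetic edge points lying on a circle centred at (20, 15) with radius 8.
edge_points = {
    (20 + round(8 * math.cos(2 * math.pi * k / 72)),
     15 + round(8 * math.sin(2 * math.pi * k / 72)))
    for k in range(72)
}
circle = best_circle(edge_points, radii=range(6, 11))
```

Because the vote uses only geometry, a bare head and a helmet of similar size would produce similar accumulator peaks, which is why the text adds further descriptors.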
A sub-window of the RoI is computed to reduce the detection area of the helmet. The use of a
sub-window enables the calculated descriptors to better detail the helmet region, as only that region
will be processed. A sub-window will correspond to the square that circumscribes the obtained
circumference. This sub-window will be employed in the extraction of features. After the
sub-window is computed, the HOG descriptor is calculated. Figure shows the steps for the
calculation of the sub-window.
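The sub-window computation can be illustrated with a short sketch that takes a circle (centre and radius in pixels) and returns the square that circumscribes it, clipped to the image. The function names and the half-open `(left, top, right, bottom)` convention are assumptions made for this example.

```python
def circumscribed_subwindow(cx, cy, r, width, height):
    """Return (left, top, right, bottom) of the square around a circle.

    right and bottom are exclusive, and the square is clipped to the image.
    """
    left = max(0, cx - r)
    top = max(0, cy - r)
    right = min(width, cx + r + 1)
    bottom = min(height, cy + r + 1)
    return left, top, right, bottom


def crop(image, window):
    """Cut the sub-window out of an image stored as a list of pixel rows."""
    left, top, right, bottom = window
    return [row[left:right] for row in image[top:bottom]]


image = [[0] * 64 for _ in range(48)]          # dummy 64x48 image
window = circumscribed_subwindow(20, 15, 8, 64, 48)
patch = crop(image, window)
```

Only `patch` would then be handed to the descriptor stage, so the features describe the helmet region rather than the whole RoI.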
A hybrid descriptor that combines the CHT and the HOG descriptors was employed for the
extraction of features. The hybrid approach incorporates different information from more than one
The MLP classifier was employed in this stage. The image classification task consists of
differentiating the segmented objects into two classes: with helmet and without helmet.
A feature vector is obtained for each generated sub-window. Figure 8 shows examples of the
sub-windows that are used in the image classification stage. The HOG descriptor was employed for
the extraction of features, which were arranged in nine blocks; each block was partitioned into nine
cells. Each cell generated a feature.
Thus, a vector was generated with 81 features. An extensive variation of blocks and cells was used.
Based on the obtained results, the selected combination was determined to be the best selection. The
MLP classifier was used with one hidden layer. Other numbers of hidden layers and a model
without this layer were tested.
The results obtained with the remaining configurations were not better than the results obtained with
one layer. Another parameter that is very sensitive in the MLP classifier is the number of neurons.
In this study, 50 neurons were employed. Although other values were also tested, this value returned
the best results.
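The shape of the classifier described above (81 input features, one hidden layer of 50 neurons, two output classes) can be sketched as a forward pass in plain Python. The activation functions, the weight initialisation, and the random input are assumptions made for illustration; the weights here are untrained, so the output probabilities carry no meaning.

```python
import math
import random


def init_layer(n_in, n_out, rng):
    """Create small random weights and zero biases (assumed initialisation)."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(inputs, weights, biases):
    """Weighted sum of the inputs plus bias for every neuron in the layer."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def softmax(values):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(values)
    exps = [math.exp(v - m) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def mlp_forward(features, hidden, output):
    """81 features -> 50 hidden neurons (tanh, assumed) -> 2 class probabilities."""
    h = [math.tanh(v) for v in dense(features, *hidden)]
    return softmax(dense(h, *output))


rng = random.Random(0)
hidden = init_layer(81, 50, rng)      # one hidden layer with 50 neurons
output = init_layer(50, 2, rng)       # classes: with helmet, without helmet
features = [rng.random() for _ in range(81)]   # stand-in for an 81-value HOG vector
probs = mlp_forward(features, hidden, output)
```

In practice the weights would be learned from the labelled sub-windows, for example with a library implementation of the MLP rather than this hand-written forward pass.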
In this section, the results are presented and discussed. In addition, a comparative analysis is
performed with other algorithms to describe and classify the images.
The results are divided into two groups: – results of the vehicle classifier, and – results of the
helmet detector. Information about the image databases, generated from the segmentation of
vehicles and the methodology employed for the classification of the results, are also presented in
this section.
All experiments were performed on a machine with an AMD Phenom II processor at 2.8 GHz with 8 GB
RAM. Regarding the computational time of the vehicle segmentation, we were able to
execute this task in real time using 15 frames/s (half of the original video).
The generated databases showed that this frame ratio is suitable for the proposed system. Database1
was obtained from 110 minutes of video and is composed of a total of 3,245 images, which are
divided as follows:
The images of database1 were employed in the vehicle classification stage, as these images were
captured from a location farther from the road compared with database2.
Therefore, the quality of the images after segmentation was not sufficient to be used in the
detection of helmet use. In some images, helmet use was not distinct. Figure shows examples of
images from database1. Database2 was obtained from 40 minutes of video. These videos were
captured at a location in which the number of motorcyclists without helmets was balanced relative
to the number of motorcyclists with helmets.
The images were captured at a shorter distance from the road compared with database1. Figure
shows examples of images from database2. Database2 is distributed as follows: