
Object Detection Using YOLO Algorithm

Introduction
The primary objective of Object Detection is to accurately locate and identify
all objects present within an image. This task hinges on training a system to
autonomously learn the process of object detection from a dataset. However,
achieving real-time object recognition poses a formidable challenge due to the
intricacies involved in searching for and recognizing objects swiftly. Despite
ongoing research endeavors, existing methods often suffer from poor efficiency,
lengthy training durations, impracticality for real-time applications, and
limited scalability across diverse object classes.

While identifying a single specific object is relatively straightforward,
distinguishing between multiple objects, even those of the same category,
presents a significant challenge, particularly for machines lacking
comprehensive knowledge of potential object variations. Object detection finds
wide-ranging applications in domains such as healthcare, traffic management,
and autonomous vehicles, where precise object identification is crucial for
optimal system performance.

Developing a real-time object detection system requires navigating through
various techniques, including image classification, where images are assigned
class labels, and object localization, which involves outlining bounding boxes
around detected objects. Object detection integrates both tasks, necessitating
simultaneous object identification via bounding boxes and the assignment of
class labels to each identified object within an image.

Literature Survey:

YOLOv4 detects objects with high accuracy compared to other algorithms such as
CNN, RCNN, etc. YOLOv4 is more efficient because it combines Convolutional
Neural Networks (CNNs) with a sliding-window-style search, achieving 65.7%
average precision (AP50) on the Microsoft COCO dataset. The first model that
produced a 30% improvement in object detection was RCNN. A very similar
approach to RCNN is Fast RCNN; selective search was used in Fast RCNN to
detect objects. After Fast RCNN, Faster RCNN was introduced for object
detection.
Though selective search was serviceable, it took a lot of time to detect
objects. In 2015, SSD (Single Shot MultiBox Detector) arrived to detect
multiple objects in a single shot, and it also increased the detection rate.
Anchors are used in SSD to define the default regions. From the name, we can
clearly see that it takes a single shot to detect multiple objects. The YOLO
family of architectures was constructed in the same vein as the SSD
architectures, but YOLO is more advantageous than any other method for object
detection because of its high accuracy.

Existing System:

There are various real-time object detection models with voice output that use
different algorithms such as CNN, RCNN, Faster RCNN, YOLOv3, etc. The problem
with these algorithms is that their accuracy is lower and their real-time
object detection speed is low.
YOLOv4 Architecture:
In real-time object detection using YOLOv4, the image passes through several
convolutional layers to form a feature map. The image is divided into grid
cells, and each grid cell in the YOLO algorithm generates two anchor boxes. In
every object detection model the following steps take place: data augmentation
is performed on the input images, where different orientations of the same
image are trained to improve accuracy. Then normalization is done to improve
the quality of the images. Regularization is applied to keep the output within
range. Loss functions are used to calculate the losses, and the loss is
reduced via backpropagation. Classification and bounding boxes are produced on
the images captured by the camera, and detection is performed. The work is
done by dividing the task into two categories: one is detection and the other
is classification. We use the Darknet framework for the implementation of
YOLOv4.
YOLOv4 has three important parts:
• Backbone
• Neck
• Head
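
As a minimal sketch of how these three parts compose (an assumed structure for
illustration, not the actual Darknet implementation; the backbone, neck, and
head are passed in as callables):

    class YOLOv4Sketch:
        # Schematic three-stage YOLOv4 pipeline: backbone -> neck -> head.
        def __init__(self, backbone, neck, head):
            self.backbone = backbone  # e.g. CSPDarknet53: extracts feature maps
            self.neck = neck          # e.g. SPP + PAN: fuses multi-scale features
            self.head = head          # dense head: predicts boxes and class scores

        def forward(self, image):
            features = self.backbone(image)  # progressively downsampled features
            fused = self.neck(features)      # mix coarse and fine information
            return self.head(fused)          # per-grid-cell box and class predictions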

CutMix, Mosaic data augmentation, class label smoothing, and DropBlock
regularization are used to increase the classifier training accuracy. The Mish
activation function is also used in addition for classification and training.
Instead of using a single image, Mosaic data augmentation uses four images at
the same time for better processing.
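
A simplified sketch of the idea, assuming images are NumPy arrays; a real
Mosaic implementation also picks a random split point and remaps the
bounding-box labels, both omitted here:

    import numpy as np

    def mosaic(images, out_size=608):
        # Tile four training images into one composite image.
        assert len(images) == 4
        half = out_size // 2
        canvas = np.zeros((out_size, out_size, 3), dtype=np.uint8)
        for idx, img in enumerate(images):
            # Naive nearest-neighbour resize of each image to one quadrant
            ys = np.linspace(0, img.shape[0] - 1, half).astype(int)
            xs = np.linspace(0, img.shape[1] - 1, half).astype(int)
            patch = img[ys][:, xs]
            row, col = divmod(idx, 2)
            canvas[row * half:(row + 1) * half,
                   col * half:(col + 1) * half] = patch
        return canvas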
CutMix data augmentation: different orientations and random patches are mixed
between the training images. Localization ability is increased on the less
discriminative parts of the object to be classified.
Class label smoothing: label smoothing is a regularization technique that
addresses overfitting and overconfidence during classification.
Mish activation function: it is a self-regularized function, defined in
mathematical terms as:
f(x) = x · tanh(softplus(x))
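
A direct NumPy sketch of this definition:

    import numpy as np

    def softplus(x):
        # Numerically stable softplus: log(1 + exp(x))
        return np.logaddexp(0.0, x)

    def mish(x):
        # Mish as defined above: f(x) = x * tanh(softplus(x))
        return x * np.tanh(softplus(x))

    # Example: mish(np.array([-2.0, 0.0, 2.0])) ≈ [-0.2525, 0.0, 1.9440]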
The detector performance is increased by using SPP, PAN, and SAM.
SPP: Spatial Pyramid Pooling. By pooling simultaneously with multiple kernel
sizes, spatial pyramid pooling acquires both the coarse and fine information
required for further processing.
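
A sketch of this block in PyTorch; the kernel sizes 5, 9, and 13 follow the
YOLOv4 paper, and the helper name is illustrative:

    import torch
    import torch.nn.functional as F

    def spp_block(x, kernel_sizes=(5, 9, 13)):
        # Max-pool the same feature map at several kernel sizes (stride 1,
        # "same" padding), then concatenate the results with the input along
        # the channel dimension, mixing coarse and fine context.
        pooled = [F.max_pool2d(x, kernel_size=k, stride=1, padding=k // 2)
                  for k in kernel_sizes]
        return torch.cat([x] + pooled, dim=1)

    # Example: a (1, 512, 19, 19) feature map becomes (1, 2048, 19, 19).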
PAN: Path Aggregation Network is a technique that preserves the maximum
information from layers close to the input by aggregating features from
different convolutional layers.
SAM: a modified Spatial Attention Module is used to highlight the most
important and miniature features. For further optimization, CSPNet divides the
feature map of the base layer into two segments and then merges them together
using a cross-stage hierarchy. In YOLOv4, CIoU loss is used to reduce the
errors in bounding boxes by also considering the midpoints of the actual and
predicted bounding boxes. The CmBN (Cross mini-Batch Normalization) technique
is used, and the result is that it decreases the cost of training.
The Bag of Freebies (BoF) for the backbone comprises class label smoothing,
DropBlock regularization, and CutMix. The Bag of Specials (BoS) for the
backbone comprises cross-stage partial connections and Mish activation. The
Bag of Freebies for the detector comprises CmBN, DropBlock regularization,
Self-Adversarial Training, grid-sensitivity elimination, a cosine annealing
scheduler, optimal hyperparameters, and random training shapes. The Bag of
Specials for the detector comprises Mish activation, the SPP block, the SAM
block, the PAN path-aggregation block, and DIoU-NMS. After the introduction of
the Bag of Freebies and Bag of Specials, classification and image detection
became much easier, and anyone can use them for training a model.
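
A sketch of the CIoU loss for two boxes in (cx, cy, w, h) format, following
the published formula (illustrative, not Darknet's implementation): 1 - IoU
plus a center-distance penalty and an aspect-ratio term. Positive widths and
heights are assumed.

    import math

    def ciou_loss(box_pred, box_true):
        px, py, pw, ph = box_pred
        tx, ty, tw, th = box_true

        # Corner coordinates of both boxes
        p_x1, p_y1, p_x2, p_y2 = px - pw / 2, py - ph / 2, px + pw / 2, py + ph / 2
        t_x1, t_y1, t_x2, t_y2 = tx - tw / 2, ty - th / 2, tx + tw / 2, ty + th / 2

        # IoU of the two boxes
        inter_w = max(0.0, min(p_x2, t_x2) - max(p_x1, t_x1))
        inter_h = max(0.0, min(p_y2, t_y2) - max(p_y1, t_y1))
        inter = inter_w * inter_h
        iou = inter / (pw * ph + tw * th - inter)

        # Squared center distance over squared enclosing-box diagonal
        c_w = max(p_x2, t_x2) - min(p_x1, t_x1)
        c_h = max(p_y2, t_y2) - min(p_y1, t_y1)
        d2 = (px - tx) ** 2 + (py - ty) ** 2
        c2 = c_w ** 2 + c_h ** 2

        # Aspect-ratio consistency term
        v = (4 / math.pi ** 2) * (math.atan(tw / th) - math.atan(pw / ph)) ** 2
        alpha = v / (1 - iou + v + 1e-9)

        return 1 - iou + d2 / c2 + alpha * v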

Related Work:
Presently, automated systems employ object detection techniques utilizing
older CNN algorithms. However, we employ the YOLO algorithm, a newer
approach, for quicker and more accurate object detection, enhancing detection
speed and response times. Over recent years, YOLO has rapidly evolved and
achieved significant success in computer vision research, owing to its efficient
algorithms and effective adaptation of object detection techniques. Object
detection has made remarkable strides in computer vision, with current
research elucidating the underlying workings of these techniques.
In object detection, the system initially identifies the location and scale of
objects within an image. The primary objective of the object detector is to
identify any number of objects belonging to a particular class, regardless of
their type, location, or size in the input image. Object detection typically
serves as the initial step in computer vision systems, enabling them to gather
additional
information such as recognizing specific instances (e.g., human faces), tracking
objects across image sequences (e.g., action tracking), and obtaining detailed
object information.

Object detection techniques find applications in various domains, including
human-computer interaction (e.g., Siri or Alexa), robotics, smartphones, data
tracking, and search engines (e.g., Google, Firefox, and targeted
advertisements). Each application of object detection features distinct
characteristics; some focus on facial recognition, while others involve updating
previously searched data automatically. Different systems may handle
single-object detection within a single view or detect single objects across
multiple
views. The output of these systems often depends on the views provided
during training.

The Proposed Framework:

Our proposal aims to achieve swift and accurate image detection by leveraging
the YOLO (You Only Look Once) Algorithm. YOLO stands out from previous
object detection algorithms by considering the entire image rather than
specific regions, thereby enhancing efficiency. Unlike region-based methods,
YOLO employs a single convolutional network to predict bounding boxes and
class probabilities for objects.

Image processing in YOLO involves dividing the image into an SxS grid and
generating m bounding boxes within each grid cell, each with an associated
class
probability and offset values. The algorithm's processing speed is notable,
capable of handling 45 frames per second. However, it struggles with small
objects, such as flocks of birds, due to spatial constraints.

Distinct from other detection algorithms, YOLO takes a holistic view of the
object, forming bounding boxes around entire objects in a single pass. Its
efficiency lies in processing 45 frames per second and representing each
prediction in vector form: Y = (pc, bx, by, bh, bw, c1, c2, c3), where pc
indicates the probability that an object is present, bx and by denote the
bounding box's center coordinates, bh and bw its height and width, and c1,
c2, c3 represent the object classes.
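
A small sketch of reading one such prediction vector (three classes are
assumed purely to match c1..c3 above; the function name is illustrative):

    import numpy as np

    def decode_prediction(y, threshold=0.5):
        # y = (pc, bx, by, bh, bw, c1, c2, c3) as described above
        pc = y[0]                            # probability an object is present
        if pc < threshold:
            return None                      # no confident detection here
        bx, by, bh, bw = y[1:5]              # box center (bx, by), height, width
        class_id = int(np.argmax(y[5:8]))    # most likely of the three classes
        return (bx, by, bh, bw), class_id

    # Example: decode_prediction(np.array([0.9, 0.5, 0.5, 0.2, 0.3, 0.1, 0.7, 0.2]))
    # returns ((0.5, 0.5, 0.2, 0.3), 1)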

For scenarios with multiple bounding boxes, YOLO employs non-max suppression
to select the most accurate bounding box while discarding others.
This suppression is based on the intersection over union (IoU) formula,
comparing the intersection and union areas of bounding boxes.
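
A minimal sketch of that formula for two boxes in corner format
(x1, y1, x2, y2):

    def iou(box_a, box_b):
        # Intersection over union: overlap area divided by combined area.
        ax1, ay1, ax2, ay2 = box_a
        bx1, by1, bx2, by2 = box_b
        inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
        inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
        inter = inter_w * inter_h
        area_a = (ax2 - ax1) * (ay2 - ay1)
        area_b = (bx2 - bx1) * (by2 - by1)
        return inter / (area_a + area_b - inter)

    # Example: iou((0, 0, 2, 2), (1, 1, 3, 3)) = 1 / 7 ≈ 0.143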

CNN (Convolutional Neural Network) has revolutionized object detection in
recent decades by effectively handling vast amounts of data. It gained
prominence in 2012 when AlexNet won the ImageNet Computer Vision contest
with 84% accuracy, utilizing CNN for object detection. CNN's role in computer
vision techniques is pivotal, especially demonstrated by its incorporation into
the winning project of the ImageNet competition.

CNN comprises artificial neural network layers housing artificial neurons, which
mimic biological neurons by computing weighted sums of inputs to produce
activation values. Each layer in CNN detects specific features, progressively
identifying complex patterns from edges to intricate objects like faces and
birds.
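
As a concrete sketch of that weighted-sum computation (ReLU is chosen here
only for illustration):

    import numpy as np

    def artificial_neuron(inputs, weights, bias):
        # Weighted sum of the inputs plus a bias, passed through an
        # activation function to produce the neuron's activation value.
        weighted_sum = np.dot(weights, inputs) + bias
        return max(0.0, weighted_sum)  # ReLU activation

    # Example: artificial_neuron(np.array([1.0, 2.0]), np.array([0.5, -0.25]), 0.1)
    # returns 0.1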
Implementation:

The YOLO (You Only Look Once) Algorithm operates by taking an image as
input, partitioning it into an SxS grid, with each grid cell containing m
bounding boxes. Within these boxes, both the class probability and offset value are
stored. Detection occurs for bounding boxes with class probabilities surpassing
a predefined threshold. Renowned for its exceptional speed, YOLO processes
45 frames per second, making it one of the fastest algorithms in computer
vision. However, it faces challenges in accurately detecting small objects, like
flocks of birds, due to spatial limitations.

Distinguished from other object detection methods, YOLO processes the entire
object in a single instance, emphasizing its remarkable speed of 45 frames per
second. It forms bounding boxes around objects, assigns class probabilities,
and demonstrates an understanding of object generalization.

For training the dataset, we employ both forward and backward propagation
models. During testing, we feed the image into the system, conducting forward
propagation until the desired output is obtained. In real-world applications,
grid sizes are typically large, often reaching dimensions like 20x20, depending
on the input provided.

Algorithm for Object Detection:

INPUT: Trained image dataset, testing image dataset
OUTPUT: Input image labelled with its class name, along with the average
precision value and a rectangular box around every object.
1: Pass the input image for detection
2: Apply the YOLO algorithm to the image
3: The YOLO algorithm processes the image
4: Based on the trained dataset, it assigns the class names
5: Verify whether the confidence value crosses the 0.5 threshold
6: If the confidence is greater than 0.5, the detection is kept and the mean
average precision value is provided; otherwise it is simply ignored.
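
A sketch of the thresholding in steps 5 and 6, assuming the network yields
(confidence, box, class_name) tuples; these names are illustrative, not the
Darknet API:

    def filter_detections(predictions, threshold=0.5):
        detections = []
        for confidence, box, class_name in predictions:
            if confidence > threshold:   # step 5: confidence crosses 0.5
                detections.append((box, class_name, confidence))
            # otherwise the prediction is simply ignored (step 6)
        return detections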
Results:

When an image is passed as input, it is divided into a grid structure
mirroring the training dataset's grid layout. Each grid cell yields an output
of 16 values (two anchor boxes of eight values each), consistent with the
prediction model. Within each anchor box's values, the initial value signifies
the probability of an object belonging to a specific class. The first eight
values pertain to the first anchor box's characteristics, including bounding
box coordinates and class information. Likewise, the following set of eight
values corresponds to the second anchor box, maintaining the same format.

Subsequently, non-maximum suppression is applied to each bounding box to
consolidate them into singular entities, mitigating redundancy.

The training process for YOLO unfolds as follows:

The input image typically has dimensions (608, 608, 3).

The image is fed into a CNN, resulting in an output tensor of dimensions (19,
19, 5, 85), which is then flattened across the last two dimensions to yield
(19, 19, 425).
In this configuration:
Each grid cell (19x19) generates 425 values.
The product of 85 and 5 yields 425, with 5 representing the number of anchor
boxes per grid cell.
Of the 85 values per anchor box, the first five are the confidence score and
the bounding box's center coordinates (x, y), width, and height, while the
remaining 80 denote the total classes for detection.
Finally, non-maximum suppression is applied to refine bounding boxes,
ensuring the retention of only the most relevant box per detected object while
eliminating overlap.
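
A sketch of these two steps, assuming a NumPy output tensor and boxes already
decoded to corner format; iou() is the helper sketched earlier, and the
function names are illustrative:

    import numpy as np

    def flatten_output(output):
        # (19, 19, 5, 85) network output -> the flattened (19, 19, 425) form
        assert output.shape == (19, 19, 5, 85)
        return output.reshape(19, 19, 425)

    def non_max_suppression(boxes, scores, iou_threshold=0.5):
        # Greedy NMS: keep the most confident box, drop boxes overlapping it
        # too strongly, and repeat with the remainder.
        order = np.argsort(scores)[::-1]      # indices, highest score first
        keep = []
        while len(order) > 0:
            best = order[0]
            keep.append(best)                 # retain the best remaining box
            order = np.array([i for i in order[1:]
                              if iou(boxes[best], boxes[i]) < iou_threshold])
        return keep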

Conclusion and Future Scope:

In light of the increasing popularity of object detection applications across
various domains, we have developed a console-based application. This
application accepts an image as input and outputs the same image with object
names detected, displayed atop bounding boxes drawn around the identified
objects. To train our custom dataset, we utilized Google Colab, employing
supervised learning techniques facilitated by LabelImg for data labeling.
Leveraging the YOLO algorithm, we ensure swift results and accuracy, utilizing
the latest version for optimal performance.
