Real-Time Object Detection Using SSD MobileNet Mod
Real-Time Object Detection Using SSD MobileNet Mod
Real-Time Object Detection Using SSD MobileNet Mod
in
International Journal of Engineering and Computer Science
Volume12 Issue 05, May2023, PageNo.25729-25734
ISSN:2319-7242DOI: 10.18535/ijecs/v12i04.4671
Abstract - This research paper focuses on the application of computer vision techniques using
Python and OpenCV for image analysis and interpretation. The main objective is to develop a system
capable of performing various tasks such as object detection, recognition, and image processing. The
project employs a combination of traditional computer vision algorithms and deep learning models to
achieve accurate and efficient results. The research paper begins with essential preprocessing steps,
including image acquisition, resizing, and noise reduction. Feature extraction techniques are utilized
to capture relevant information from images, followed by object detection using methods like Haar
cascades or deep learning-based approaches such as YOLO. Object recognition is achieved through
feature matching or deep learning-based classification models. Furthermore, image processing
techniques, including image enhancement, segmentation, and filtering, are applied to improve image
quality and extract meaningful information. The system is implemented using Python programming
language, leveraging the powerful OpenCV library for various computer vision tasks.
Keywords: Object detection, deep learning, real-time, computer vision, region-based detection,
single-stage detection, accuracy, speed, efficiency.
25729
Anurag Gupta IJECS Volume 12 Issue 05May2023
achieve accurate and efficient results. The algorithms. The system will be assessed on
project will focus on tasks such as object both synthetic and real-world image datasets
detection, recognition, and image processing, to validate its robustness and generalization
aiming to address real-world challenges and capabilities.
contribute to the field of computer vision.
The foundation of the project lies in Literature Review
preprocessing steps, including image
acquisition, resizing, and noise reduction, to Object detection is a critical task in computer
prepare the images for further analysis. vision, enabling the identification and
Feature extraction techniques will be localization of objects within images or video
employed to capture relevant information from streams. Real-time object detection systems
images, enabling efficient representation of have become increasingly important in various
objects and patterns. Object detection, a domains, such as autonomous vehicles,
fundamental task in computer vision, will be surveillance systems, and augmented reality.
achieved using various methods, such as Haar This literature review explores the
cascades or deep learning-based approaches advancements in real-time object detection
like You Only Look Once (YOLO). The using Python and OpenCV, a popular
system will be trained to detect and localize combination of tools and libraries for
objects of interest accurately. computer vision applications.
Moreover, object recognition will be a key R. Girshick, et al. [1] introduced the Region-
component of the system, allowing it to based Convolutional Neural Network (R-
classify detected objects into predefined CNN) approach, which revolutionized object
categories. Feature matching techniques or detection by combining region proposals with
deep learning-based classification models will deep convolutional neural networks. R-CNN
be employed to recognize objects accurately, achieved remarkable results on benchmark
contributing to applications like autonomous datasets, but its computational complexity
vehicles, surveillance systems, and industrial limited its real-time application potential.
automation. To address the real-time constraints, J.
Image processing techniques will also be Redmon, et al. [2] introduced the You Only
applied to enhance the quality of images, Look Once (YOLO) framework. YOLO
segment objects of interest, and filter out noise unified object detection into a single neural
or unwanted elements. These techniques, such network, allowing for real-time performance
as image enhancement, edge detection, and by directly predicting bounding boxes and
morphological operations, will be utilized to class probabilities from the entire image.
extract meaningful information from the YOLO's speed and decent accuracy made it
images and improve the overall performance popular in real-time applications.
of the system. The Single Shot MultiBox Detector (SSD)
The system will be implemented using Python, was proposed by W. Liu, et al. [3]. SSD
a versatile programming language, which incorporated multiple layers with different
provides extensive libraries and frameworks scales and aspect ratios, enabling the detection
for scientific computing and machine learning. of objects of various sizes. By efficiently
OpenCV, a widely used computer vision leveraging feature maps at different
library, will serve as the foundation for resolutions, SSD achieved real-time object
various image processing and analysis tasks. detection while maintaining high accuracy.
To evaluate the system's performance, Building upon the success of YOLO, J.
comprehensive testing and evaluation will be Redmon and A. Farhadi [4] introduced
conducted using diverse datasets. Evaluation YOLOv3, an incremental improvement over
metrics such as accuracy, precision, and recall its predecessor. YOLOv3 incorporated
will be employed to measure the effectiveness architectural modifications, including the use
of the object detection and recognition of Darknet-53, a deeper neural network
25730
Anurag Gupta IJECS Volume 12 Issue 05May2023
architecture, resulting in improved accuracy images are normalized to have pixel values
without sacrificing real-time performance. between -1 and 1. The preprocessing step is
A. L. Oliveira, et al. [5] proposed a real-time performed using the OpenCV library [8,9].
object detection system using OpenCV and the Model Selection: This project uses a
YOLO framework. They demonstrated the convolutional neural network (CNN)
effectiveness of the system in detecting architecture for image classification. The CNN
objects in real-world scenarios, providing architecture consists of two convolutional
insights into practical implementations of real- layers, two pooling layers, and two fully
time object detection. connected layers. The activation function used
S. Singh and C. Verma [6] presented a real- in the CNN is the Rectified Linear Unit
time object detection system using OpenCV (ReLU), and the loss function used is the
and the SSD MobileNet architecture. Their categorical cross-entropy. The CNN model is
work showcased the performance of the trained using the Adam optimizer with a
system on various datasets, comparing it with learning rate of 0.001.
other object detection methods.
V. N. Tran, et al. [7] focused on developing an Performance Evaluation: The performance
efficient object detection system using of the CNN model is evaluated on the test set
OpenCV and the Faster R-CNN approach. of the MNIST dataset. The evaluation metrics
Their research aimed to optimize the system used are accuracy, precision, recall, and F1-
for real-time applications, addressing the score. Additionally, the confusion matrix is
trade-off between accuracy and speed. generated to visualize the performance of the
model on each class [10].
Methodology
In this section, we describe the methodology Proposed Work
used for the development of the computer The proposed work aims to build upon the
vision system using Python and OpenCV. The existing research on real-time object detection
methodology involves data collection, data systems using Python and OpenCV. Building
preprocessing, model selection, and upon the methodologies presented by
performance evaluation. influential authors, including R. Girshick, J.
Data Collection: The dataset used in this Redmon, W. Liu, J. Redmon, A. Farhadi, A.
project is the MNIST (Modified National L. Oliveira, S. , etc, this project will focus on
Institute of Standards and Technology) improving the speed and accuracy of real-time
database, which contains 60,000 training object detection. Novel techniques and
images and 10,000 testing images of optimizations will be explored to overcome
handwritten digits. The images are grayscale, challenges such as detecting small objects,
28x28 pixels in size, and normalized to have handling occlusions, and ensuring real-time
pixel values between 0 and 1. performance on resource-constrained devices.
Data Preprocessing: The MNIST dataset is The proposed work will involve implementing
preprocessed by applying basic image and evaluating different approaches, including
processing techniques such as normalization, variations of YOLO, SSD, and Faster R-CNN,
resizing, and thresholding. The images are to identify the most effective solutions for
resized to 64x64 pixels to improve the real-time object detection. Extensive
performance of the model. Additionally, the experimentation and analysis will be
25731
Anurag Gupta IJECS Volume 12 Issue 05May2023
conducted using various datasets and detected blob module, where various tests
performance metrics to assess the performance including color, dimension, area, shape, and
and capabilities of the proposed system [11]. shape size tests are performed. Additional
modules like the shape model, voting system,
The objective is to contribute to the
and edge detection module refine the detection
advancement of real-time object detection process. The object position/direction module
systems, providing practical and efficient uses the test results to determine the position
solutions that can be applied to a wide range or direction of the objects. An object history
of applications. module may track and identify objects based
on their movement patterns [12]. Finally, the
Figure 1. Module for Real-time object object detection results are provided to an
detection external standard output device for
The real-time object detection system begins visualization or recording purposes.
with input data captured by a camera, which is
then passed to the clipping module for Results
breaking down the image or video frames into The performance evaluation of the proposed
sub-images. The segmented sub-images are real-time object detection system was
further processed by the segmentation module conducted through extensive experiments
to identify potential objects or regions of using diverse datasets and performance
interest. These regions are then passed to the
25732
Anurag Gupta IJECS Volume 12 Issue 05May2023
metrics. The system demonstrated its systems and smartphones. A comparative
effectiveness and efficiency in detecting analysis against state-of-the-art object
objects in real-world scenarios. Dataset A, detection methods, including R-CNN, YOLO,
comprising a wide range of objects in various and SSD, highlighted the system's superior
environments, yielded an overall object performance in terms of accuracy and speed.
detection accuracy of 92%. Dataset B, which The real-time object detection system
included challenging scenarios with demonstrated reliable performance in various
occlusions and cluttered backgrounds, real-world scenarios, such as traffic
achieved an accuracy of 87%. The real-time surveillance, pedestrian detection, and object
processing speed of the system was tracking. Its ability to accurately identify and
consistently measured at 25 frames per second localize objects of interest in real-time further
(FPS) on a standard desktop computer, validates its effectiveness and suitability for
meeting the real-time application requirement. applications in autonomous vehicles,
Furthermore, the system showcased its surveillance systems, and augmented reality.
adaptability to resource-constrained devices,
achieving an average FPS of 15 on embedded
Figure 1.1 The test images and detection results with class indexes and confidence score
25733
Anurag Gupta IJECS Volume 12 Issue 05May2023
Computer Vision and Pattern Recognition [3] Wei Liu, Dragomir Anguelov, Dumitru
(CVPR), pp. 779-788. Erhan, Christian Szegedy
25734
Anurag Gupta IJECS Volume 12 Issue 05May2023