SSRN Id4107251
is based on model precision, time complexity, and computational complexity. The key contributions of this study are given below.
1) A detailed comparison of well-known object detection techniques for vehicle detection is presented. Among the pre-trained networks considered, Resnet architectures achieved the highest precision.
2) A performance comparison of SSD and YOLOv2 at different train-test ratios shows that YOLOv2 is robust with limited training samples, as is the case in the dataset used in this paper. Moreover, the precision of SSD degrades drastically as compared to that of YOLOv2 when training samples are reduced.
The rest of this paper is organized as follows. Section 2 presents the literature review on object detection and vehicle detection. It also describes a comparison of object detectors and pre-trained CNNs. Section 3 presents the proposed methodology. Experiments and results are presented in Section 4, and lastly conclusions are drawn.

II. LITERATURE REVIEW

In this section, we review the conventional and state-of-the-art object detection techniques and their applications in remote sensing.
2.1 Traditional Techniques
Traditional object detection methods can be broadly categorized into template-based, knowledge-based, object-based image analysis (OBIA) and machine learning-based methods [1]. Template matching has been deployed for object detection in the past [2]. Knowledge-based methods encode prior information such as ground rules and the object's shape and geometry; geometric information of the target objects has been widely used for object detection [3]. These methods do not always have a well-defined procedure for encoding prior knowledge, and loose criteria can lead to an unacceptable number of false positives in complex scenarios [2]. OBIA methods group similar pixels into segments prior to classification and exploit scale, shape and neighborhood information [4]. Machine learning-based methods make use of handcrafted features such as Haar [5] and Local Binary Patterns [6] together with classifiers such as SVM [7] and Adaboost [8]. Despite the good performance attained by these conventional practices in numerous computer vision applications, the performance of traditional approaches is limited for complex scenes in satellite image analysis [9].
2.2 Deep Learning-based Techniques
Despite the availability of several computer vision techniques in the literature, accurate and fast detection of vehicles and potential targets remains a challenging problem in remote sensing applications. In object detection problems, manual extraction of features from objects has many limitations. Contemporary deep learning approaches, which detect objects and are characterized by automatic feature extraction, perform well. LeNet, designed by LeCun et al. [10], is among the pioneering CNN architectures. AlexNet, designed by Krizhevsky et al. [11], drastically reduced the error in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 2012 [12]. State-of-the-art CNNs have gained significant attention in many computer vision problems such as facial expression recognition [13], image classification [14], autonomous driving [15], medical analysis [16], computational forensics [17] and visual tracking [18]. Deep learning has also been employed in remote sensing and satellite image analysis for target detection [19]–[21] and aircraft detection [20], [22]. Deep neural networks provide a fast and accurate means to predict the location and size of objects in an image.
In recent years, region-based object detection techniques including RCNN, Fast RCNN, and Faster RCNN have gained much popularity in computer vision. These approaches divide the object detection framework into two stages. The first stage deals with the generation of region proposals which may include an object. The second stage performs classification of the objects proposed in the first stage and fine-tunes the bounding box coordinates. Faster RCNN is a popular region-based object detection technique in many applications. Single-stage methods simplify the object detection task by modelling detection as a regression problem. In comparison to region-based methods, regression-based methods are simpler and more efficient. YOLO is a common regression-based object detection method, which uses a single backbone CNN to directly predict bounding boxes and class probabilities from the image in one evaluation. It divides the image into grids and, for each grid cell, predicts bounding boxes with their class probabilities. Although YOLO has been shown to yield real-time object detection in many applications, fast object detectors with improved accuracy have remained a topic of interest to the research community [23].
In this research article, we analyze, by means of deep learning-based techniques, how to save power, which is one of the major problems all over the world. There is no existing system in which street lights glow with full intensity when vehicles approach, glow with less intensity once vehicles have passed through the area, and then turn off, as shown in Figures 1, 2 and 3 in sequence. Taking guidance from this study, we construct an effective street lighting system which saves power.
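The grid scheme that YOLO-style detectors use, as described in Section 2.2, can be sketched as follows. The function names and the 7x7 grid size are illustrative assumptions, not the paper's implementation:

```python
# Minimal sketch of YOLO-style grid assignment: the image is divided
# into an S x S grid, and the cell containing a box's centre is
# responsible for predicting that object.

def responsible_cell(box, img_w, img_h, s=7):
    """Return the (row, col) grid cell that owns a box.

    box is (x_min, y_min, x_max, y_max) in pixels.
    """
    cx = (box[0] + box[2]) / 2.0  # box centre, x
    cy = (box[1] + box[3]) / 2.0  # box centre, y
    col = min(int(cx / img_w * s), s - 1)  # clamp to the last cell
    row = min(int(cy / img_h * s), s - 1)
    return row, col

# Example: a car centred at (320, 240) in a 640x480 image
# falls in the middle cell of a 7x7 grid.
print(responsible_cell((300, 220, 340, 260), 640, 480))  # (3, 3)
```

Each cell then predicts a fixed number of bounding boxes with class probabilities, which is what makes the whole detection a single regression pass.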
This Work is licensed under Creative Commons Attribution 4.0 International License.
Figure 1: An overview of the proposed methodology for smart light system using deep learning based object detection
Figure 2: Presence of objects (cars) makes the street light glow with full intensity
Figure 3: Presence of cars causes the lights to glow with full intensity; where there is no car, the light glows with less intensity and then turns off
detection and classification are unified and a single neural network is applied to the image. YOLO looks at an image and predicts the presence and position of the objects of interest. Although YOLO has achieved detection performance comparable to the region-based methods with reduced detection time, it still struggles with the detection of small-sized objects that appear in groups and with varying aspect ratios of objects in the image [26]. To overcome this problem, improved YOLO detectors have been proposed recently. In this paper, YOLOv2 has been used.
2) SSD: The Single Shot Multibox Detector (SSD) [27] is a single deep neural architecture used for object detection. SSD divides the image space into a grid of boxes at varying scales and for multiple aspect ratios. The network computes confidence scores for each object label for the bounding boxes. These bounding boxes are further refined to fit the shape of the object. In addition, SSD is known for unifying feature maps produced at varying sizes.
3.2 Backbone Networks
As discussed earlier, deep learning-based object detectors require a backbone CNN for feature extraction. We use and compare the following well-known pre-trained CNN backbone networks.
1) Alexnet: Alexnet [11], an eight-layer deep CNN, initiated the revolution in computer vision by winning the ImageNet ILSVRC 2012 competition by a large margin.
2) VGG: In 2014, Oxford's Visual Geometry Group (VGG) proposed the VGG model [28], a 16-19 layer deep CNN. It uses smaller convolutional filters compared to Alexnet and achieved better performance on the ImageNet database.
3) Resnet: He et al. [29] proposed the deep residual networks (Resnet), a new kind of convolutional neural network architecture that is considerably deeper (up to 152 layers) than those used before. ResNet eases the training of the network by using residual or skip connections. In 2015, ResNet won numerous computer vision competitions, including ImageNet detection and COCO detection.

IV. EXPERIMENTS AND RESULTS

4.1 Platform Specifications
The research is implemented on a personal computer running the Windows 10 operating system with an Intel i3 CPU at 3.3 GHz and an NVIDIA GTX 1050TI GPU. MATLAB 2019A is the main software tool. The object detectors and evaluators are implemented using pre-existing MATLAB tools.
4.2 Dataset Specifications
As shown in Figure 4, the proposed vehicle detection system is evaluated on a dataset that includes vehicle imagery with various orientations and placements. Along with the appearance of the vehicles in the images, the resolution of the images varies.
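The multi-scale, multi-aspect-ratio default-box tiling that SSD uses (Section 3.1) can be sketched as below. The scale value, aspect ratios, and feature-map size are illustrative assumptions, not the configuration used in this paper:

```python
from itertools import product

def default_boxes(fmap_size, scale, aspect_ratios=(1.0, 2.0, 0.5)):
    """Generate SSD-style default boxes (cx, cy, w, h), normalised to
    [0, 1], for one square feature map. fmap_size is the map's
    width/height in cells; scale is the box size relative to the image."""
    boxes = []
    for i, j in product(range(fmap_size), repeat=2):
        cx = (j + 0.5) / fmap_size  # cell centre, x
        cy = (i + 0.5) / fmap_size  # cell centre, y
        for ar in aspect_ratios:
            # Stretch the square box of area scale^2 to aspect ratio ar.
            w = scale * ar ** 0.5
            h = scale / ar ** 0.5
            boxes.append((cx, cy, w, h))
    return boxes

# Coarser feature maps are paired with larger scales, so one network
# covers both small and large objects.
boxes = default_boxes(fmap_size=4, scale=0.3)
print(len(boxes))  # 4 * 4 * 3 = 48 boxes
```

At training time each ground-truth object is matched to the default boxes it overlaps best, and the network regresses offsets from those boxes rather than raw coordinates.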
4.3 Evaluation Metrics
To estimate and compare the accuracy of different object detectors, mean average precision (mAP) is used in this paper. In particular, mAP as defined in the Pascal VOC 2011 Challenge [30] is used. Therefore, in the first step, the Intersection over Union (IoU) between the ground truth and each bounding box detected with a confidence score greater than a threshold, α, is computed. Assume A represents the detected bounding box and B denotes the ground truth bounding box. The IoU is the ratio of the area of overlap between the ground truth and predicted bounding boxes to the area of their union, as shown in equation (1). If the estimated IoU is greater than a threshold, the detected object is considered a true positive (TP) detection; otherwise, it is a false positive (FP). In this paper, a threshold value of 0.5 was used to compute the results.
Denoting the actual number of vehicle instances in the image as N, precision and recall are defined in equations (2) and (3), respectively. As might be expected, selecting a smaller threshold α over the confidence score would improve recall but may result in reduced precision. This behavior can be observed in the precision-recall curves obtained by varying the threshold α, as shown in Figure 5. In general, mAP represents the area under the precision-recall curve obtained by varying the threshold α.
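The metrics described in Section 4.3 can be sketched directly from their textual definitions (IoU as overlap area over union area; precision as TP/(TP+FP); recall as TP/N). Equations (1)–(3) themselves are not reproduced here, so this is a sketch of the standard definitions the text describes, not a copy of the paper's formulas:

```python
def iou(a, b):
    """Intersection over Union of two boxes (x_min, y_min, x_max, y_max)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))  # overlap width
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))  # overlap height
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def precision_recall(tp, fp, n):
    """Precision = TP / (TP + FP); recall = TP / N,
    with N the number of ground-truth vehicle instances."""
    return tp / (tp + fp), tp / n

# A detection counts as a true positive only when IoU exceeds the 0.5
# threshold; this half-overlapping pair falls below it.
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))  # 50/150, about 0.33
```

Sweeping the confidence threshold α and recomputing these two quantities at each setting traces out the precision-recall curve shown in Figure 5.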
Figure 5: Performance of SSD and YOLOv2 architectures at varying train-test ratios
typically utilize the features at the end of the backbone architecture to detect objects. Going very deep can result in feature maps with very low resolution, due to which the object detection performance can deteriorate. Faster RCNN achieved the highest mAP of 0.80 using Efficientnet-B0 as the backbone feature extraction network and outperformed the rest of the detectors.
While evaluating different detectors and feature extractors, the computation time for object detection is also an important factor to be considered. Figure 6 shows the average detection time of each deep learning-based object detector and each backbone feature extractor for vehicle detection, computed on a single GPU for the given dataset.
Figure 6: Average detection time for SSD and YOLO architectures using various backbones
Results show that both YOLOv2 and YOLOv3 take similar time to detect vehicles, on average. Furthermore, these object detectors outperform region proposal-based object detectors by at least a factor of five. On the other end, RCNN turns out to be the slowest, with an average detection time of 47 seconds, due to its complex region proposal algorithm. Table 2 further shows a comparison of YOLOv3 and SSD in terms of accuracy and computational complexity for vehicle detection using Efficient Net-B0 as backbone. Although SSD provides faster detection, it does not perform as well in mAP.
Table 2: mAP values for the SSD and YOLOv2 object detectors using various feature extraction networks
Sr. No. Backbone/Architecture SSD YOLO
1 Alexnet 73.2 66.4
2 VGG-16 74.1 67.8
3 Resnet 18 76.4 70.1
4 Resnet-50 77.3 69.8
5 Inception v2 80.2 72.3
6 Efficient Net-B0 80.4 74.2
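As noted in Section 4.3, the mAP values reported in Table 2 correspond to the area under the precision-recall curve obtained by varying the threshold α. A minimal sketch of that integration is given below; the simple rectangle rule over recall increments is an illustrative simplification of the Pascal VOC interpolation, not the exact [30] procedure:

```python
def average_precision(precisions, recalls):
    """Approximate the area under a precision-recall curve using the
    rectangle rule over recall increments. Points must be sorted by
    increasing recall."""
    ap, prev_r = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += p * (r - prev_r)  # precision weighted by recall gained
        prev_r = r
    return ap

# A perfect detector keeps precision 1.0 at every recall level, so the
# area under its precision-recall curve is 1.0.
print(average_precision([1.0, 1.0], [0.5, 1.0]))  # 1.0
```

Averaging this quantity over all object classes gives mAP; with a single vehicle class, as here, mAP equals the per-class average precision.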
Journal of Photogrammetry and Remote Sensing, 117, 11–28.
[2] Weber J & Lefèvre S. (2012). Spatial and spectral morphological template matching. Image and Vision Computing, 30(12), 934–945.
[3] Huertas A & Nevatia R. (1988). Detecting buildings in aerial images. Computer Vision, Graphics, and Image Processing, 41(2), 131–152.
[4] Blaschke T. (2010). Object based image analysis for remote sensing. ISPRS Journal of Photogrammetry and Remote Sensing, 65(1), 2–16.
[5] Lienhart R & Maydt J. (2002). An extended set of haar-like features for rapid object detection. In: Proceedings International Conference on Image Processing, 1, I–I.
[6] Cheng G, Han J, Guo L & Liu T. (2015). Learning coarse-to-fine sparselets for efficient object detection and scene classification. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 1173–1181.
[7] Inglada J. (2007). Automatic recognition of man-made objects in high resolution optical remote sensing images by SVM classification of geometric image features. ISPRS Journal of Photogrammetry and Remote Sensing, 62(3), 236–248.
[8] Grabner H, Nguyen TT, Gruber B & Bischof H. (2008). On-line boosting-based car detection from aerial images. ISPRS Journal of Photogrammetry and Remote Sensing, 63(3), 382–396.
[9] Zou Z, Shi Z, Guo Y & Ye J. (2019). Object detection in 20 years: A survey. arXiv preprint arXiv:1905.05055.
[10] LeCun Y, Bottou L, Bengio Y & Haffner P. (1998). Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11), 2278–2324.
[11] Krizhevsky A, Sutskever I & Hinton GE. (2012). ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, pp. 1097–1105.
[12] Russakovsky O, Deng J, Su H, Krause J, Satheesh S, Ma S, Huang Z, Karpathy A, Khosla A, Bernstein M, Berg AC & Fei-Fei L. (2015). ImageNet large scale visual recognition challenge. International Journal of Computer Vision (IJCV), 115(3), 211–252.
[13] Jeong D, Kim BG & Dong SY. (2020). Deep joint spatiotemporal network (DJSTN) for efficient facial expression recognition. Sensors, 20(7), 1936.
[14] Morgan DA. (2015). Deep convolutional neural networks for ATR from SAR imagery. In: Algorithms for Synthetic Aperture Radar Imagery, 9475, pp. 94750F.
[15] Khan MJ, Khan HS, Yousaf A, Khurshid K & Abbas A. (2018). Modern trends in hyperspectral image analysis: A review. IEEE Access, 6, 14118–14129.
[16] Ahmad HM, Khan MJ, Yousaf A, Ghuffar S & Khurshid K. (2020). Deep learning: A breakthrough in medical imaging. Current Medical Imaging, 16(8), 946–956.
[17] Khan MJ, Khurshid K & Shafait F. (2019). A spatio-spectral hybrid convolutional architecture for hyperspectral document authentication. In: International Conference on Document Analysis and Recognition (ICDAR), pp. 1097–1102.
[18] Yuan Y, Chu J, Leng L, Miao J & Kim BJ. (2020). A scale-adaptive object-tracking algorithm with occlusion detection. EURASIP Journal on Image and Video Processing, 2020(1), 1–15.
[19] Ding P, Zhang Y, Deng WJ, Jia P & Kuijper A. (2018). A light and faster regional convolutional neural network for object detection in optical remote sensing images. ISPRS Journal of Photogrammetry and Remote Sensing, 141, 208–218.
[20] Zhang F, Du B, Zhang L & Xu M. (2016). Weakly supervised learning based on coupled convolutional neural networks for aircraft detection. IEEE Transactions on Geoscience and Remote Sensing, 54(9), 5553–5563.
[21] Long Y, Gong Y, Xiao Z & Liu Q. (2017). Accurate object localization in remote sensing images based on convolutional neural networks. IEEE Transactions on Geoscience and Remote Sensing, 55(5), 2486–2498.
[22] Khan MJ, Yousaf A, Javed N, Nadeem S & Khurshid K. (2017). Automatic target detection in satellite images using deep learning. Journal of Space Technology, 7(1), 44–49.
[23] Tan M, Pang R & Le QV. (2020). EfficientDet: Scalable and efficient object detection. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 10781–10790.
[24] Jiang Q, Cao L, Cheng M, Wang M & Li J. (2015). Deep neural networks-based vehicle detection in satellite images. In: International Symposium on Bioelectronics and Bioinformatics (ISBB), pp. 184–187.
[25] Redmon J, Divvala S, Girshick R & Farhadi A. (2016). You only look once: Unified, real-time object detection. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788.
[26] Garg D, Goel P, Pandya S, Ganatra A & Kotecha K. (2019). A deep learning approach for face detection using YOLO. In: IEEE Punecon, pp. 1–4.
[27] Liu W, Anguelov D, Erhan D, Szegedy C, Reed S, Fu CY & Berg AC. (2016). SSD: Single shot multibox detector. In: European Conference on Computer Vision, pp. 21–37.
[28] Simonyan K & Zisserman A. (2014). Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556.
[29] He K, Zhang X, Ren S & Sun J. (2016). Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778.
[30] Everingham M, Van Gool L, Williams CKI, Winn J & Zisserman A. (2010). The PASCAL visual object classes (VOC) challenge. International Journal of Computer Vision, 88(2), 303–338.
[31] Jin X & Han J. (2010). K-medoids clustering.