Fire Hotspots Detection System On CCTV Videos Using You Only Look Once (YOLO) Method and Tiny YOLO Model For High Buildings Evacuation
Abstract—Fire is one of the disasters in high buildings that often leads to many material losses and casualties. In general, the material and non-material losses of fire incidents can be minimized by responding quickly. To minimize the extent of the fire area, we need technology that detects fire hotspots before the fire spreads widely. Early fire detection systems originally used sensors, but many sensors cannot withstand fire. Therefore, another method is needed that can monitor an area of the building from a distance. In this study, CCTV cameras were used to determine whether a fire hotspot was present. As an additional technology, we use artificial intelligence to analyze the CCTV footage. We propose the You Only Look Once (YOLO) method to detect fire hotspots in CCTV videos. In this study, the YOLO method recognizes fire hotspots with an average accuracy of 90%.

Keywords— fire hotspots, You Only Look Once (YOLO), CCTV videos, evacuation

I. INTRODUCTION

Fire is one of the disasters in high buildings that often leads to many casualties. In general, the material and non-material losses of fire incidents can be minimized by responding quickly. Fire incidents in a building are usually caused by several factors, such as electrical short circuits, careless placement of objects that contain fire, and discarding cigarettes carelessly. When these factors occur, there is a possibility that a fire will grow and spread. Therefore, to minimize the extent of the fire area, we need technology that detects fire hotspots before the fire spreads widely. A fire hotspot detection system is important because it helps firefighters determine the location that must be sprayed, which can shorten the evacuation time.

In recent years, several methods and technologies for early fire detection have been applied. One of them is a fire detection system that detects smoke and fire in a room using sensors. Such a system is limited by its detection area and does not provide information on how large the fire is. Another disadvantage is that when the fire flares up, the sensors installed in the building are burned and damaged [1]. Therefore, a safer detection system is needed that can detect fires from a long distance and monitor a wider area, so that losses from fires can be minimized.

Monitoring an area in high buildings can be done through CCTV videos. However, surveillance via CCTV is not efficient because supervisors must monitor the CCTV all day. An additional system is therefore needed to monitor the area covered by the CCTV. Currently, artificial intelligence can be used to help supervisors analyze CCTV video, including detecting fire hotspots. Therefore, in this research, we propose image-based detection using the You Only Look Once (YOLO) method to detect fire hotspots in CCTV videos.

The YOLO method is one of the artificial intelligence methods that detects objects without a separate classification stage. In one evaluation, the YOLO method uses a neural network to recognize objects by framing the object to be detected [2].

Several studies on fire detection have been carried out. Xu et al. [3] detected smoke using synthetic smoke images. The first step of that research was to build a synthesis pipeline and simulate smoke under various conditions. The second step was to divide the dataset into real smoke and no smoke. In non-smoke testing, strong interference with smoke recognition caused false alarms.

Another study was conducted by Appana et al. [4] on smoke detection using smoke flow patterns for alarm systems. In that study, there are three important parameters in designing smoke detection systems, i.e., diffusion, color, and blur. The first stage carried out was color analysis, followed by feature extraction using Gabor filtering and spatial-temporal energy analysis to obtain feature vectors. The last stage classified the smoke types with a Support Vector Machine (SVM).

The next study was conducted by Hendri [5] on the detection of forest fires using the Convolutional Neural Network (CNN) method. The first stage of this method is to classify the objects to be detected. The testing results show that the CNN method detects fire objects with an accuracy of about 54%. However, this method relies on classification, which has several disadvantages. One is that classification can recognize an object but cannot determine the exact location of the object in the image. Therefore, in this study, we propose the YOLO method, which does not rely on a separate classification stage.

II. LITERATURE REVIEW

The YOLO method is a very different approach from previous algorithms. Previous algorithms, such as the Convolutional Neural Network (CNN), use a classifier or localizer to carry out detection by applying the model to the image at several locations and scales and assigning scores to the image as the basis for detection [6, 7].
To see the performance of the model, we use the loss function, which can be seen in (3):

$$
\begin{aligned}
Loss ={}& \lambda_{coord} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} \left[ (r_i - \hat{r}_i)^2 + (s_i - \hat{s}_i)^2 \right] \\
&+ \lambda_{coord} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} \left[ \left(\sqrt{t_i} - \sqrt{\hat{t}_i}\right)^2 + \left(\sqrt{v_i} - \sqrt{\hat{v}_i}\right)^2 \right] \\
&+ \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} \left( CV_i - \widehat{CV}_i \right)^2
 + \lambda_{noobj} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{noobj} \left( CV_i - \widehat{CV}_i \right)^2 \\
&+ \sum_{i=0}^{S^2} \mathbb{1}_{i}^{obj} \sum_{c \in classes} \left( p_i(c) - \hat{p}_i(c) \right)^2
\end{aligned}
\qquad (3)
$$

where S is the size of the grid and B is the number of bounding boxes per grid cell. Variables r and s are the center of each prediction, and variables t and v are the bounding box dimensions. The weight λ_coord is used to increase the contribution of bounding boxes that contain an object, and λ_noobj is used to decrease the contribution of boxes without an object. CV represents the confidence value, and p(c) represents the prediction for class c.

The loss function is used to improve the center and the bounding box of each prediction. It reflects the performance of the model: a lower loss value indicates higher performance [10].

The fast version of YOLO is designed to push the boundaries of fast object detection [2]. A fast version of YOLO, such as the Tiny YOLO model, uses a neural network with 9 convolutional layers [10], as shown in Fig. 3. In Fig. 3 we can see that the neural network only uses standard types of layers: convolutional layers with 3x3 kernels and max-pooling layers with 2x2 kernels.
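As an illustration of how the terms in (3) combine, the following is a minimal NumPy sketch of a sum-squared loss of this form. It is not the authors' implementation; the array layout, the variable names mirroring (3), and the weight values λ_coord = 5 and λ_noobj = 0.5 (the defaults suggested in [2]) are assumptions.

```python
import numpy as np

def yolo_like_loss(pred, target, obj_mask, lambda_coord=5.0, lambda_noobj=0.5):
    """Sum-squared loss in the style of (3).

    pred, target: dicts of arrays with shape (S*S, B) for the box terms
      'r', 's'  - box centers
      't', 'v'  - box dimensions (assumed non-negative)
      'cv'      - confidence values
    and shape (S*S, C) for 'p', the per-cell class predictions.
    obj_mask: (S*S, B) array, 1 where a box is responsible for an object.
    """
    noobj_mask = 1.0 - obj_mask

    # Localization: centers and square-rooted dimensions, object boxes only.
    center = (pred['r'] - target['r'])**2 + (pred['s'] - target['s'])**2
    dims = (np.sqrt(pred['t']) - np.sqrt(target['t']))**2 \
         + (np.sqrt(pred['v']) - np.sqrt(target['v']))**2
    loc = lambda_coord * np.sum(obj_mask * (center + dims))

    # Confidence: object boxes plus down-weighted empty boxes.
    conf_err = (pred['cv'] - target['cv'])**2
    conf = np.sum(obj_mask * conf_err) + lambda_noobj * np.sum(noobj_mask * conf_err)

    # Classification: only cells that contain an object.
    cell_has_obj = (obj_mask.max(axis=1, keepdims=True) > 0).astype(float)
    cls = np.sum(cell_has_obj * (pred['p'] - target['p'])**2)

    return loc + conf + cls
```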
The input image is divided into a grid of cells; for each grid cell, 125 channels contain the data for the bounding boxes and class predictions.
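Fig. 3 is not reproduced in this text. As a rough sketch of the 9-convolutional-layer network described above, the following PyTorch module follows the publicly available Tiny YOLO (VOC) configuration; the specific filter counts come from that reference configuration and are not stated in this paper.

```python
import torch.nn as nn

def conv(c_in, c_out):
    # 3x3 convolution block used throughout Tiny YOLO
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1, bias=False),
        nn.BatchNorm2d(c_out),
        nn.LeakyReLU(0.1),
    )

# 9 convolutional layers; 2x2 max-pooling between the early blocks.
tiny_yolo_voc = nn.Sequential(
    conv(3, 16),    nn.MaxPool2d(2, 2),
    conv(16, 32),   nn.MaxPool2d(2, 2),
    conv(32, 64),   nn.MaxPool2d(2, 2),
    conv(64, 128),  nn.MaxPool2d(2, 2),
    conv(128, 256), nn.MaxPool2d(2, 2),
    conv(256, 512),
    nn.ZeroPad2d((0, 1, 0, 1)), nn.MaxPool2d(2, 1),  # stride-1 pool keeps the grid size
    conv(512, 1024),
    conv(1024, 1024),
    nn.Conv2d(1024, 125, 1),  # 125 = 5 boxes x (4 box coords + 1 confidence + 20 classes)
)
```

With the usual 416x416 input, this produces a 13x13 grid with 125 channels per cell, which matches the description of the output above.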
The Tiny YOLO model contains the network code and the pre-trained weights of the network, which can be used for transfer learning. This process is done to train the model to recognize the fire object based on the training data. We then use the loss function to test the training model. The model can recognize an object well if the loss value is less than 1; a loss value below 1 indicates that the model is good for object detection. The last step is to make predictions with the trained model on the test image data.

A. The Training Dataset

The dataset of the YOLO method is divided into two parts, i.e., the training dataset and the testing dataset, and the amount of training data can be a small dataset [12]. In this research, we used a dataset of 60 images: 58 images with a fire object and 2 images without a fire object. The dataset consists of uncompressed images with a resolution of 352x262 pixels. In this study, the division into training data and testing data was performed using K-fold validation with K=3 [13-15]. Based on this method, the proportion of the dataset used is 40 images of training data and 20 images of testing data. Because K=3, we use 3 iterations with different testing data for each iteration. The testing data used in the 1st iteration is the first 20 images from the dataset, the 2nd iteration uses the next 20 images, and the 3rd iteration uses the last 20 images. The training data in the 1st iteration can be seen in Fig. 6.

B. Labeling Image

After creating the training data, the next step is to create a label for each training image. At this stage, the labeling process is done by drawing a bounding box and assigning the class name to the objects in each image. This process is called annotation. The result of annotation is data containing the location of the bounding box and the label, stored in .xml form. The annotation can be seen in Fig. 5.
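The paper stores the annotations as .xml files but does not show their layout. Below is a sketch of a Pascal VOC-style annotation of the kind labeling tools commonly produce, together with code to read the box back; the file name and box coordinates are hypothetical, and only the 352x262 image size is taken from the paper.

```python
import xml.etree.ElementTree as ET

# A VOC-style annotation: one "fire" bounding box for one training image.
# The file name and coordinates are made-up illustrative values.
annotation_xml = """
<annotation>
  <filename>fire_001.jpg</filename>
  <size><width>352</width><height>262</height><depth>3</depth></size>
  <object>
    <name>fire</name>
    <bndbox>
      <xmin>120</xmin><ymin>80</ymin><xmax>210</xmax><ymax>190</ymax>
    </bndbox>
  </object>
</annotation>
"""

root = ET.fromstring(annotation_xml)
for obj in root.iter("object"):
    label = obj.findtext("name")
    box = [int(obj.find("bndbox").findtext(tag))
           for tag in ("xmin", "ymin", "xmax", "ymax")]
    print(label, box)   # -> fire [120, 80, 210, 190]
```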
C. The Training Model
In conducting the training model, we use a modified version of the YOLO method, i.e., the Tiny YOLO model. This model provides two components, i.e., the network code and the pre-trained weights of the network, which can be used for transfer learning.
The next step is to create a text file that contains the class name (label). In our research, the only class name used is fire, and it is stored in the form of a .txt file. The Tiny YOLO model is based on the Darknet reference network and is much faster than the full YOLO model [10].
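As a small sketch of this step, the label file can be produced as follows; the file name labels.txt is an assumption, since the paper only states that the class name is stored in a .txt file.

```python
# Write the single class name used in this study to a label file.
with open("labels.txt", "w") as f:
    f.write("fire\n")
```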
In this study, the training model is aimed at making the model learn the fire object that it is intended to detect. This learning process is called transfer learning. The labeled images are entered into the Tiny YOLO VOC model for recognition. The transfer learning process requires entries such as the learning rate, batch size, and epoch.
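To make the role of these entries concrete, here is a minimal fine-tuning loop sketch. It is not the authors' training pipeline: the data are random placeholders, the loss is a stand-in for (3), the network is the Tiny YOLO sketch given earlier, and the values (learning rate 0.00001, batch size 8, 300 epochs) are the settings reported in Section IV.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Placeholder data: 40 training images of 3x416x416 with dummy targets.
# In the actual pipeline these would come from the annotated fire images.
images  = torch.rand(40, 3, 416, 416)
targets = torch.rand(40, 125, 13, 13)
loader  = DataLoader(TensorDataset(images, targets), batch_size=8, shuffle=True)

model = tiny_yolo_voc                      # the 9-layer sketch above (assumed stand-in)
criterion = torch.nn.MSELoss()             # stand-in for the sum-squared loss in (3)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-5)  # learning rate 0.00001

for epoch in range(300):                   # epoch: passes over the whole training set
    for batch_images, batch_targets in loader:   # batch size: 8 images per update
        optimizer.zero_grad()
        loss = criterion(model(batch_images), batch_targets)
        loss.backward()
        optimizer.step()
```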
Fig. 5. Labeling an image into .xml form
Fig. 6. The training dataset in the 1st iteration (24 of the 40 training images with the fire object)
The batch size is a term used in transfer learning and refers to the number of training samples used in one iteration. The epoch is a parameter that determines how many times the learning algorithm works through the entire training dataset [16]. The learning rate is the size of the change to the model during each step of this search process; it controls how quickly a neural network model learns object detection [16].

E. The Evaluation of Model Performance

In this study, to see whether the model can detect fire objects well, i.e., to evaluate the model performance, we also use indicators such as precision, recall, and accuracy. Their formulas can be seen in (4), (5), and (6):
$$ Precision = \frac{TP}{TP + FP} \times 100\% \qquad (4) $$
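Equations (5) and (6) did not survive in the source text; the standard definitions of recall and accuracy, which reproduce the averages reported in Table I, are assumed here:

$$ Recall = \frac{TP}{TP + FN} \times 100\% \qquad (5) $$

$$ Accuracy = \frac{TP + TN}{TP + FP + FN + TN} \times 100\% \qquad (6) $$

For example, in the 1st iteration TP = 17, FP = 0, FN = 1, and TN = 2, giving a precision of 100%, a recall of 17/18 ≈ 94.44%, and an accuracy of 19/20 = 95%, consistent with Table I.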
Fig. 7. The images of the testing dataset in the 1st iteration (15 of 20 images of the testing dataset)
IV. RESULTS AND DISCUSSIONS

In this section, the results of the predictions made with the training model are explained. The input consists of 40 training images and 20 test images. The model used is the Tiny YOLO model, which is a modification of the YOLO model. We use a learning rate of 0.00001, a batch size of 8, and 300 epochs.

The results of the training model can be seen in Fig. 8. We can see that the average loss value is 0.3131451712598113 < 1. This indicates that the model has good performance. The next step is to detect the fire object in the test images using the newly trained model. The results of this detection can be seen in Fig. 9.
Fig. 9. The results of fire object detection on the testing dataset in the 1st iteration (15 of 20 images)
In Fig. 9, it can be seen that in the 1st iteration the model can detect 17 test images with a fire object and 2 images without a fire object. The next stage is to run the 2nd and 3rd iterations. The results of the 3 iterations in this research are shown in Table I.

TABLE I. THE EVALUATION RESULTS OF MODEL PERFORMANCE

                  Iteration 1   Iteration 2   Iteration 3   Average
True Positive     17            16            17
False Positive    0             1             1
False Negative    1             2             1
True Negative     2             1             1
Precision         100%          94.117%       94.44%        96.185%
Recall            94.44%        88.89%        94.44%        92.59%
Accuracy          95%           85%           90%           90%

Based on Table I, we obtain an average precision of 96.185%, an average recall of 92.59%, and an average accuracy of 90%.

V. CONCLUSIONS AND FUTURE WORKS

In this study, we propose image-based detection using the YOLO method and the Tiny YOLO model to detect fire hotspots in CCTV videos. This detection system is important because it can help firefighters in an optimal evacuation process. The results of testing with the YOLO method show an average accuracy of 90% and an average loss of 0.3131451712598113 < 1. This indicates that the model has good performance in detecting fire objects. For future experimental research, we will add training samples and use a GPU to obtain higher performance.

ACKNOWLEDGMENT

This research was funded by the National Competitive Research Grant 2019 in the Penelitian Terapan (PT) scheme, Direktorat Riset dan Pengabdian Masyarakat, Direktorat Jenderal Penguatan Riset dan Pengembangan, Kementerian Riset, Teknologi, dan Pendidikan Tinggi (DRPM KEMENRISTEKDIKTI) Indonesia, to Gunadarma University, No. 15/AKM/PNT/2019.

REFERENCES

[1] S. J. Chen, D. C. Hovde, K. A. Peterson, and A. W. Marshall, "Fire detection using smoke and gas sensors," Fire Safety Journal, vol. 42, no. 8, pp. 507–515, November 2007.
[2] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You only look once: Unified, real-time object detection," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788, December 2016.
[3] G. Xu, Y. Zhang, Q. Zhang, G. Lin, and J. Wang, "Deep domain adaptation based video smoke detection using synthetic smoke images," Fire Safety Journal, vol. 93, pp. 53–59, October 2017.
[4] D. K. Appana, R. Islam, S. A. Khan, and J. M. Kim, "A video-based smoke detection using smoke flow pattern and spatial-temporal energy analyses for alarm systems," Information Sciences, vol. 418–419, pp. 91–101, December 2017.
[5] M. Hendri, Perancangan sistem deteksi asap dan api menggunakan pemrosesan citra [The design of a smoke and fire detection system using image processing], Thesis, Industrial Technology Faculty, Indonesian Islamic University, Yogyakarta, 2018.
[6] B. B. Traore, B. K. Foguem, and F. Tangara, "Deep convolution neural network for image recognition," Ecological Informatics, vol. 48, pp. 257–268, 2018.
[7] R. Takahashi, T. Matsubara, and K. Uehara, "A novel weight-shared multi-stage CNN for scale robustness," IEEE Transactions on Circuits and Systems for Video Technology, vol. 29, no. 4, pp. 1090–1101, April 2019.
[8] X. Zhao, H. Jia, and Y. Ni, "A novel three-dimensional object detection with the modified You Only Look Once method," International Journal of Advanced Robotic Systems, pp. 1–13, April 2018.
[9] J. Lu, C. Ma, L. Li, X. Xing, Y. Zhang, Z. Wang, and J. Xu, "A vehicle detection method for aerial image based on YOLO," Journal of Computer and Communications, vol. 6, pp. 98–107, November 2018.
[10] R. Huang, J. Pedoeem, and C. Chen, "YOLO-LITE: A real-time object detection algorithm optimized for non-GPU computers," IEEE International Conference on Big Data, November 2018.
[11] S. Shinde, A. Kothari, and V. Gupta, "YOLO based human action recognition and localization," Procedia Computer Science, vol. 133, pp. 831–838, 2018.
[12] G. Li, Z. Song, and Q. Fu, "A new method of image detection for small datasets under the framework of YOLO network," IEEE 3rd Advanced Information Technology, Electronic and Automation Control Conference (IAEAC), October 2018.
[13] R. C. Sharma, K. Hara, and H. Hirayama, "A machine learning and cross-validation approach for the discrimination of vegetation physiognomic types using satellite-based multispectral and multitemporal data," Scientifica, vol. 2017, pp. 1–8, June 2017.
[14] Y. Jung and J. Hu, "A K-fold averaging cross-validation procedure," Journal of Nonparametric Statistics, vol. 27, no. 2, pp. 1–13, February 2015.
[15] R. Kohavi, "A study of cross-validation and bootstrap for accuracy estimation and model selection," International Joint Conference on Artificial Intelligence, vol. 14, no. 12, pp. 1137–1143, 1995.
[16] J. Brownlee, Deep Learning With Python, Machine Learning Mastery, 2017.