Fire Hotspots Detection System On CCTV Videos Using You Only Look Once (YOLO) Method and Tiny YOLO Model For High Buildings Evacuation
Abstract—Fire is one of the disasters in high buildings that often leads to many material losses and casualties. In general, the material and non-material losses of fire incidents can be minimized by responding quickly. To minimize the extent of the fire area, we need technology that detects fire hotspots before the fire spreads widely. Early fire detection systems originally used sensors, but many sensors cannot withstand fire. Therefore, another method is needed that can monitor an area of the building from a distance. In this study, CCTV cameras were used to determine whether a fire hotspot was present. As an additional technology, we use artificial intelligence to analyze the CCTV footage. We propose the You Only Look Once (YOLO) method to detect fire hotspots in CCTV videos. In this study, the YOLO method recognizes fire hotspots with an average accuracy of 90%.

Keywords— fire hotspots, You Only Look Once (YOLO), CCTV videos, evacuation

I. INTRODUCTION

Fire is one of the disasters in high buildings that often leads to many casualties. In general, the material and non-material losses of fire incidents can be minimized by responding quickly. Fire incidents in a building are usually caused by several factors, such as electrical short circuits, careless placement of objects that contain fire, and discarding cigarettes carelessly. When these factors occur, there is a possibility that a fire will grow and spread. Therefore, to minimize the extent of the fire area, we need technology that detects fire hotspots before the fire spreads widely. A fire hotspot detection system is important because it helps firefighters determine the location that must be sprayed, which can shorten the evacuation time.

In recent years, several methods and technologies for early fire detection have been applied. One of them is a fire detection system that detects smoke and fire in a room using sensors. Such a system is limited by its detection area and does not provide information on how large the fire is. Another disadvantage is that when the fire flares up, the sensors installed in the building are burned and damaged [1]. Therefore, a safer detection system is needed that can detect fires from a long distance and monitor a wider area, so that losses from fires can be minimized.

Monitoring an area in high buildings can be done through CCTV videos. However, surveillance via CCTV is not efficient because supervisors must monitor the CCTV all day. An additional system is therefore needed to monitor the area covered by the CCTV. Currently, artificial intelligence can be used to help supervisors analyze CCTV video, including detecting fire hotspots. Therefore, in this research, we propose image-based detection using the You Only Look Once (YOLO) method to detect fire hotspots in CCTV videos.

The YOLO method is one of the artificial intelligence methods that detects objects without a separate classification stage. In one evaluation, the YOLO method uses a neural network to recognize objects by framing the object to be detected [2].

Several studies on fire detection have been carried out. Xu et al. [3] detected smoke using synthetic smoke images. The first step of that research was to build a synthesis pipeline and simulate smoke under various conditions. The second step was to divide the dataset into real smoke and no smoke. In non-smoke testing, strong interference with smoke recognition caused false alarms.

Another study was conducted by Appana et al. [4] on smoke detection using smoke flow patterns for alarm systems. In that study, there are three important parameters in designing smoke detection systems, i.e., diffusion, color, and blur. The first stage carried out was color analysis, followed by feature extraction using Gabor filtering and spatial-temporal energy analysis to obtain feature vectors. The last stage classified the smoke types with a Support Vector Machine (SVM).

The next study was conducted by Hendri [5] on the detection of forest fires using the Convolutional Neural Network (CNN) method. The first stage of this method is to classify the objects to be detected. The testing results show that the CNN method detects fire objects with an accuracy of about 54%. However, this method relies on classification, which has several disadvantages. One is that classification can recognize an object but cannot determine the exact location of the object in the image. Therefore, in this study, we propose the YOLO method, which does not rely on a separate classification stage.

II. LITERATURE REVIEW

The YOLO method is a very different approach from previous algorithms. Previous algorithms, such as the Convolutional Neural Network (CNN), use a classifier or localizer to carry out detection by applying the model to the image at several locations and scales and assigning scores to the image as the basis for detection [6, 7].
To see the performance of the model, we use the loss function, which can be seen in (3):

$$
\begin{aligned}
Loss ={}& \lambda_{coord} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} \left[ (r_i - \hat{r}_i)^2 + (s_i - \hat{s}_i)^2 \right] \\
&+ \lambda_{coord} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} \left[ \left(\sqrt{t_i} - \sqrt{\hat{t}_i}\right)^2 + \left(\sqrt{v_i} - \sqrt{\hat{v}_i}\right)^2 \right] \\
&+ \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} \left( CV_i - \widehat{CV}_i \right)^2
 + \lambda_{noobj} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{noobj} \left( CV_i - \widehat{CV}_i \right)^2 \\
&+ \sum_{i=0}^{S^2} \mathbb{1}_{i}^{obj} \sum_{c \in classes} \left( p_i(c) - \hat{p}_i(c) \right)^2
\end{aligned}
\qquad (3)
$$

where S is the size of the grid and B is the number of bounding boxes per grid cell. Variables r and s are the center of each prediction, and variables t and v are the bounding box dimensions. The weight λ_coord is used to increase the contribution of bounding boxes that contain an object, and λ_noobj is used to decrease the contribution of boxes without an object. CV represents the confidence value, and p(c) represents the prediction for class c.

The loss function is used to improve the center and the bounding box of each prediction. It reflects the performance of the model: a lower loss value indicates higher performance [10].

The fast version of YOLO is designed to push the boundaries of fast object detection [2]. A fast version of YOLO, such as the Tiny YOLO model, uses a neural network with 9 convolutional layers [10], as shown in Fig. 3. In Fig. 3 we can see that the neural network only uses standard types of layers: convolutional layers with 3x3 kernels and max-pooling layers with 2x2 kernels.
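As an illustration of how the terms in (3) combine, the following is a minimal NumPy sketch of a sum-squared loss of this form. It is not the authors' implementation; the array layout, the variable names mirroring (3), and the weight values λ_coord = 5 and λ_noobj = 0.5 (the defaults suggested in [2]) are assumptions.

```python
import numpy as np

def yolo_like_loss(pred, target, obj_mask, lambda_coord=5.0, lambda_noobj=0.5):
    """Sum-squared loss in the style of (3).

    pred, target: dicts of arrays with shape (S*S, B) for the box terms
      'r', 's'  - box centers
      't', 'v'  - box dimensions (assumed non-negative)
      'cv'      - confidence values
    and shape (S*S, C) for 'p', the per-cell class predictions.
    obj_mask: (S*S, B) array, 1 where a box is responsible for an object.
    """
    noobj_mask = 1.0 - obj_mask

    # Localization: centers and square-rooted dimensions, object boxes only.
    center = (pred['r'] - target['r'])**2 + (pred['s'] - target['s'])**2
    dims = (np.sqrt(pred['t']) - np.sqrt(target['t']))**2 \
         + (np.sqrt(pred['v']) - np.sqrt(target['v']))**2
    loc = lambda_coord * np.sum(obj_mask * (center + dims))

    # Confidence: object boxes plus down-weighted empty boxes.
    conf_err = (pred['cv'] - target['cv'])**2
    conf = np.sum(obj_mask * conf_err) + lambda_noobj * np.sum(noobj_mask * conf_err)

    # Classification: only cells that contain an object.
    cell_has_obj = (obj_mask.max(axis=1, keepdims=True) > 0).astype(float)
    cls = np.sum(cell_has_obj * (pred['p'] - target['p'])**2)

    return loc + conf + cls
```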
The input image is divided into a grid of cells; for each grid cell, 125 channels contain the data for the bounding boxes and class predictions.
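Fig. 3 is not reproduced in this text. As a rough sketch of the 9-convolutional-layer network described above, the following PyTorch module follows the publicly available Tiny YOLO (VOC) configuration; the specific filter counts come from that reference configuration and are not stated in this paper.

```python
import torch.nn as nn

def conv(c_in, c_out):
    # 3x3 convolution block used throughout Tiny YOLO
    return nn.Sequential(
        nn.Conv2d(c_in, c_out, 3, padding=1, bias=False),
        nn.BatchNorm2d(c_out),
        nn.LeakyReLU(0.1),
    )

# 9 convolutional layers; 2x2 max-pooling between the early blocks.
tiny_yolo_voc = nn.Sequential(
    conv(3, 16),    nn.MaxPool2d(2, 2),
    conv(16, 32),   nn.MaxPool2d(2, 2),
    conv(32, 64),   nn.MaxPool2d(2, 2),
    conv(64, 128),  nn.MaxPool2d(2, 2),
    conv(128, 256), nn.MaxPool2d(2, 2),
    conv(256, 512),
    nn.ZeroPad2d((0, 1, 0, 1)), nn.MaxPool2d(2, 1),  # stride-1 pool keeps the grid size
    conv(512, 1024),
    conv(1024, 1024),
    nn.Conv2d(1024, 125, 1),  # 125 = 5 boxes x (4 box coords + 1 confidence + 20 classes)
)
```

With the usual 416x416 input, this produces a 13x13 grid with 125 channels per cell, which matches the description of the output above.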
The Tiny YOLO model contains the network code and the pre-trained weights of the network, which can be used for transfer learning. This process is done to train the model to recognize the fire object based on the training data. We then use the loss function to test the training model. The model can recognize an object well if the loss value is less than 1; a loss value below 1 indicates that the model is good for object detection. The last step is to make predictions with the trained model on the test image data.

A. The Training Dataset

The dataset of the YOLO method is divided into two parts, i.e., the training dataset and the testing dataset, and the amount of training data can be a small dataset [12]. In this research, we used a dataset of 60 images: 58 images with a fire object and 2 images without a fire object. The dataset consists of uncompressed images with a resolution of 352x262 pixels. In this study, the division into training data and testing data was performed using K-fold validation with K=3 [13-15]. Based on this method, the proportion of the dataset used is 40 images of training data and 20 images of testing data. Because K=3, we use 3 iterations with different testing data for each iteration. The testing data used in the 1st iteration is the first 20 images from the dataset, the 2nd iteration uses the next 20 images, and the 3rd iteration uses the last 20 images. The training data in the 1st iteration can be seen in Fig. 6.

B. Labeling Image

After creating the training data, the next step is to create a label for each training image. At this stage, the labeling process is done by drawing a bounding box and assigning the class name to the objects in each image. This process is called annotation. The result of annotation is data containing the location of the bounding box and the label, stored in .xml form. The annotation can be seen in Fig. 5.
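The paper stores the annotations as .xml files but does not show their layout. Below is a sketch of a Pascal VOC-style annotation of the kind labeling tools commonly produce, together with code to read the box back; the file name and box coordinates are hypothetical, and only the 352x262 image size is taken from the paper.

```python
import xml.etree.ElementTree as ET

# A VOC-style annotation: one "fire" bounding box for one training image.
# The file name and coordinates are made-up illustrative values.
annotation_xml = """
<annotation>
  <filename>fire_001.jpg</filename>
  <size><width>352</width><height>262</height><depth>3</depth></size>
  <object>
    <name>fire</name>
    <bndbox>
      <xmin>120</xmin><ymin>80</ymin><xmax>210</xmax><ymax>190</ymax>
    </bndbox>
  </object>
</annotation>
"""

root = ET.fromstring(annotation_xml)
for obj in root.iter("object"):
    label = obj.findtext("name")
    box = [int(obj.find("bndbox").findtext(tag))
           for tag in ("xmin", "ymin", "xmax", "ymax")]
    print(label, box)   # -> fire [120, 80, 210, 190]
```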
C. The Training Model
In conducting the training model, we use a modified version of the YOLO method, i.e., the Tiny YOLO model. This model provides two components, i.e., the network code and the pre-trained weights of the network, which can be used for transfer learning.
The next step is to create a text file that contains the class name (label). In our research, the only class name used is fire, and it is stored in the form of a .txt file. The Tiny YOLO model is based on the Darknet reference network and is much faster than the full YOLO model [10].
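As a small sketch of this step, the label file can be produced as follows; the file name labels.txt is an assumption, since the paper only states that the class name is stored in a .txt file.

```python
# Write the single class name used in this study to a label file.
with open("labels.txt", "w") as f:
    f.write("fire\n")
```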
In this study, the training model is aimed at making the model learn the fire object that it is intended to detect. This learning process is called transfer learning. The labeled images are entered into the Tiny YOLO VOC model for recognition. The transfer learning process requires entries such as the learning rate, batch size, and epoch.
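To make the role of these entries concrete, here is a minimal fine-tuning loop sketch. It is not the authors' training pipeline: the data are random placeholders, the loss is a stand-in for (3), the network is the Tiny YOLO sketch given earlier, and the values (learning rate 0.00001, batch size 8, 300 epochs) are the settings reported in Section IV.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Placeholder data: 40 training images of 3x416x416 with dummy targets.
# In the actual pipeline these would come from the annotated fire images.
images  = torch.rand(40, 3, 416, 416)
targets = torch.rand(40, 125, 13, 13)
loader  = DataLoader(TensorDataset(images, targets), batch_size=8, shuffle=True)

model = tiny_yolo_voc                      # the 9-layer sketch above (assumed stand-in)
criterion = torch.nn.MSELoss()             # stand-in for the sum-squared loss in (3)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-5)  # learning rate 0.00001

for epoch in range(300):                   # epoch: passes over the whole training set
    for batch_images, batch_targets in loader:   # batch size: 8 images per update
        optimizer.zero_grad()
        loss = criterion(model(batch_images), batch_targets)
        loss.backward()
        optimizer.step()
```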
Fig. 5. Labeling an image into .xml form
Fig. 6. The training dataset in the 1st iteration (24 of the 40 training images with the fire object)
The batch size is a term used in transfer learning and refers to the number of training samples used in one iteration. The epoch is a parameter that determines how many times the learning algorithm works through the entire training dataset [16]. The learning rate is the size of the change to the model during each step of this search process; it controls how quickly a neural network model learns object detection [16].

E. The Evaluation of Model Performance

In this study, to see whether the model can detect fire objects well, i.e., to evaluate the model performance, we also use indicators such as precision, recall, and accuracy. Their formulas can be seen in (4), (5), and (6):
$$ Precision = \frac{TP}{TP + FP} \times 100\% \qquad (4) $$
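Equations (5) and (6) did not survive in the source text; the standard definitions of recall and accuracy, which reproduce the averages reported in Table I, are assumed here:

$$ Recall = \frac{TP}{TP + FN} \times 100\% \qquad (5) $$

$$ Accuracy = \frac{TP + TN}{TP + FP + FN + TN} \times 100\% \qquad (6) $$

For example, in the 1st iteration TP = 17, FP = 0, FN = 1, and TN = 2, giving a precision of 100%, a recall of 17/18 ≈ 94.44%, and an accuracy of 19/20 = 95%, consistent with Table I.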
Fig. 7. The images of the testing dataset in the 1st iteration (15 of 20 images of the testing dataset)
IV. RESULTS AND DISCUSSIONS

In this section, the results of the predictions made with the training model are explained. The input consists of 40 training images and 20 test images. The model used is the Tiny YOLO model, which is a modification of the YOLO model. We use a learning rate of 0.00001, a batch size of 8, and 300 epochs.

The results of the training model can be seen in Fig. 8. We can see that the average loss value is 0.3131451712598113 < 1. This indicates that the model has good performance. The next step is to detect the fire object in the test images using the newly trained model. The results of this detection can be seen in Fig. 9.
Fig. 9. The results of fire object detection on the testing dataset in the 1st iteration (15 of 20 images)
In Fig. 9, it can be seen that in the 1st iteration the model can detect 17 test images with a fire object and 2 images without a fire object. The next stage is to run the 2nd and 3rd iterations. The results of the 3 iterations in this research are shown in Table I.

TABLE I. THE EVALUATION RESULTS OF MODEL PERFORMANCE

                  Iteration 1   Iteration 2   Iteration 3   Average
True Positive     17            16            17
False Positive    0             1             1
False Negative    1             2             1
True Negative     2             1             1
Precision         100%          94.117%       94.44%        96.185%
Recall            94.44%        88.89%        94.44%        92.59%
Accuracy          95%           85%           90%           90%

Based on Table I, we obtain an average precision of 96.185%, an average recall of 92.59%, and an average accuracy of 90%.

V. CONCLUSIONS AND FUTURE WORKS

In this study, we propose image-based detection using the YOLO method and the Tiny YOLO model to detect fire hotspots in CCTV videos. This detection system is important because it can help firefighters in an optimal evacuation process. The results of testing with the YOLO method show an average accuracy of 90% and an average loss of 0.3131451712598113 < 1. This indicates that the model has good performance in detecting fire objects. For future experimental research, we will add training samples and use a GPU to obtain higher performance.

ACKNOWLEDGMENT

This research was funded by the National Competitive Research Grant 2019 in the Penelitian Terapan (PT) scheme, Direktorat Riset dan Pengabdian Masyarakat, Direktorat Jenderal Penguatan Riset dan Pengembangan, Kementerian Riset, Teknologi, dan Pendidikan Tinggi (DRPM KEMENRISTEKDIKTI) Indonesia, to Gunadarma University, No. 15/AKM/PNT/2019.

REFERENCES

[1] S. J. Chen, D. C. Hovde, K. A. Peterson, and A. W. Marshall, "Fire detection using smoke and gas sensors," Fire Safety Journal, vol. 42, no. 8, pp. 507–515, November 2007.
[2] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, "You only look once: Unified, real-time object detection," in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 779–788, December 2016.
[3] G. Xu, Y. Zhang, Q. Zhang, G. Lin, and J. Wang, "Deep domain adaptation based video smoke detection using synthetic smoke images," Fire Safety Journal, vol. 93, pp. 53–59, October 2017.
[4] D. K. Appana, R. Islam, S. A. Khan, and J. M. Kim, "A video-based smoke detection using smoke flow pattern and spatial-temporal energy analyses for alarm systems," Information Sciences, vol. 418–419, pp. 91–101, December 2017.
[5] M. Hendri, Perancangan sistem deteksi asap dan api menggunakan pemrosesan citra [The design of a smoke and fire detection system using image processing], Thesis, Industrial Technology Faculty, Indonesian Islamic University, Yogyakarta, 2018.
[6] B. B. Traore, B. K. Foguem, and F. Tangara, "Deep convolution neural network for image recognition," Ecological Informatics, vol. 48, pp. 257–268, 2018.
[7] R. Takahashi, T. Matsubara, and K. Uehara, "A novel weight-shared multi-stage CNN for scale robustness," IEEE Transactions on Circuits and Systems for Video Technology, vol. 29, no. 4, pp. 1090–1101, April 2019.
[8] X. Zhao, H. Jia, and Y. Ni, "A novel three-dimensional object detection with the modified You Only Look Once method," International Journal of Advanced Robotic Systems, pp. 1–13, April 2018.
[9] J. Lu, C. Ma, L. Li, X. Xing, Y. Zhang, Z. Wang, and J. Xu, "A vehicle detection method for aerial image based on YOLO," Journal of Computer and Communications, vol. 6, pp. 98–107, November 2018.
[10] R. Huang, J. Pedoeem, and C. Chen, "YOLO-LITE: A real-time object detection algorithm optimized for non-GPU computers," IEEE International Conference on Big Data, November 2018.
[11] S. Shinde, A. Kothari, and V. Gupta, "YOLO based human action recognition and localization," Procedia Computer Science, vol. 133, pp. 831–838, 2018.
[12] G. Li, Z. Song, and Q. Fu, "A new method of image detection for small datasets under the framework of YOLO network," IEEE 3rd Advanced Information Technology, Electronic and Automation Control Conference (IAEAC), October 2018.
[13] R. C. Sharma, K. Hara, and H. Hirayama, "A machine learning and cross-validation approach for the discrimination of vegetation physiognomic types using satellite-based multispectral and multitemporal data," Scientifica, vol. 2017, pp. 1–8, June 2017.
[14] Y. Jung and J. Hu, "A K-fold averaging cross-validation procedure," Journal of Nonparametric Statistics, vol. 27, no. 2, pp. 1–13, February 2015.
[15] R. Kohavi, "A study of cross-validation and bootstrap for accuracy estimation and model selection," International Joint Conference on Artificial Intelligence, vol. 14, no. 12, pp. 1137–1143, 1995.
[16] J. Brownlee, Deep Learning With Python, Machine Learning Mastery, 2017.