Fire Detection and Segmentation using YOLOv5 and U-NET

Wided Souidene Mseddi (1,2), Rafik Ghali (2), Marwa Jmal (3), Rabah Attia (2)

(1) L2TI, Institut Galilée, Université Sorbonne Paris Nord, Villetaneuse, France
(2) SERCOM Laboratory, Ecole Polytechnique de Tunisie, University of Carthage, Tunis, Tunisia
(3) Telnet Innovation Labs, Telnet Holding, Ariana, Tunisia

wided.mseddi@univ-paris13.fr, rafik.ghali@ept.rnu.tn, marwa.jmal@groupe-telnet.net, rabah.attia@ept.u-carthage.tn
Abstract: The environmental crisis the world faces today is a real challenge for humanity. One notable hazard for humans and nature is the increasing number of forest fires. Thanks to the rapid development of sensors and computer vision algorithms, new approaches for fire detection have been proposed. However, these approaches face several limitations that remain to be resolved, namely the presence of fire-like objects, a high false alarm rate, the detection of small fire objects, and high inference time. An important step for vision-based fire analysis is the segmentation of fire pixels. Hence, we propose in this paper a novel architecture, combining YOLOv5 and U-Net, for fire detection and segmentation. Using a dataset of wildland fires mixed with images of fire-like objects, the experimental results showed that the novel architecture is reliable for forest fire detection without false alarms.

Keywords: Forest Fires, Fire detection, Fire segmentation, Deep learning, YOLOv5, U-Net

I. INTRODUCTION

Forest Fires (FF) are among the most dangerous and challenging natural disasters today, threatening humanity and the environment. Uncontrolled FF can cause huge damage, with disastrous effects on human property and vegetation. Fires affect more than 350 million hectares every year worldwide [21].

To avoid such disasters, systems for detecting and monitoring forest fires at an early stage are crucial.

The earliest fire detection systems relied on sensors such as gas detectors, smoke detectors, flame detectors and temperature detectors, but these techniques are not efficient for forest or wildland fire detection: they cover small areas and do not respond in real time. To overcome these limitations, vision sensors (embedded or fixed) are the most useful for detecting fires with high accuracy, a large coverage area, and fewer errors.

Over the years, researchers have proposed many techniques to detect and segment fires with high accuracy using image processing and computer vision methods. First, fire color features have been widely used to distinguish fire. These techniques transform the image into another color space, such as YCbCr [1] or YUV [2], and then classify its pixels as fire or non-fire by comparing pixel values to thresholds. Nonetheless, these methods are limited by the difficulty of identifying fire characteristics in the image.

Machine-learning techniques are also employed to increase the reliability of fire detection systems. Numerous models are used, such as SVM [9] and neural networks [8].

Recently, thanks to their strong performance in detecting and identifying objects, Deep Learning (DL) approaches have been investigated to detect and localize forest fires. DL techniques help extract the relevant features that best represent the fire. Indeed, these models have been used successfully in several fields, such as image classification, self-driving cars, speech recognition, pedestrian detection, face recognition and cancer detection [10-12]. In all these applications, DL proved efficient at detecting and segmenting different classes of objects [10-12].

For the detection task, the recently introduced YOLOv5 [13] offers an excellent tradeoff between accuracy and inference time. For the segmentation task, U-Net [14] has given excellent results on medical images.

In this paper, we present a novel DL-based fire detection system using pre-trained YOLOv5 and U-Net models chained sequentially. We first feed the original fire images with their annotations to the YOLOv5 model; then we crop the fire class using the bounding boxes obtained from the detection. Finally, we feed these cropped images to a U-Net model trained on original images with their annotations, and we obtain the segmented images with their bounding lines in the same image.

More specifically, this paper makes two main contributions. Firstly, we propose a novel architecture capable of detecting and segmenting fires in an operationally and time-efficient manner, aiming to overcome the limitations of state-of-the-art techniques. Secondly, our model has demonstrated high performance on both large and small fire objects and the ability to distinguish between fire and fire-like objects.

The remainder of the paper is organized as follows: Section 2 briefly describes related work on DL techniques for fire detection. Section 3 describes the proposed deep learning architectures. In Section 4, the implementation and the experimental results are presented. Finally, Section 5 concludes the paper.

II. RELATED WORKS

Several fire detection methods have been proposed and presented in numerous reviews [5, 15]. In this section, we highlight recent advances in Deep Learning techniques.
ISBN: 978-9-0827-9706-0 741 EUSIPCO 2021
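As a rough illustration of the detect, crop, segment chain outlined at the end of the introduction, the sketch below shows the hand-off between the two networks: cropping an image to the bounding boxes returned by a detector. It is only a sketch under stated assumptions. The image is represented as a nested list of pixel rows, boxes are assumed to be (x_min, y_min, x_max, y_max) in pixel coordinates, and `crop_detections` with its clamping behaviour is a hypothetical helper named here for illustration; none of this is taken from the paper or from the YOLOv5 code base.

```python
from typing import List, Sequence, Tuple

# Assumed box format: (x_min, y_min, x_max, y_max) in pixel coordinates.
Box = Tuple[int, int, int, int]


def crop_detections(image: Sequence[Sequence[object]],
                    boxes: Sequence[Box]) -> List[List[List[object]]]:
    """Return one cropped sub-image per bounding box (hypothetical helper)."""
    height = len(image)
    width = len(image[0]) if height else 0
    crops = []
    for x_min, y_min, x_max, y_max in boxes:
        # Clamp the box to the image so partially outside boxes still crop.
        x0, x1 = max(0, x_min), min(width, x_max)
        y0, y1 = max(0, y_min), min(height, y_max)
        if x1 <= x0 or y1 <= y0:
            continue  # box lies entirely outside the image: nothing to crop
        crops.append([list(row[x0:x1]) for row in image[y0:y1]])
    return crops


# Toy 4x5 "image" whose pixel value encodes its position (row * 10 + column).
toy_image = [[r * 10 + c for c in range(5)] for r in range(4)]
toy_crops = crop_detections(toy_image, [(1, 1, 3, 3), (3, 2, 10, 10), (5, 5, 6, 6)])
```

In a full system, each crop would then be resized and passed to the segmentation network; that step is omitted here because it depends on the trained models.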

In [16], GoogleNet, VGG13 and AlexNet models are employed to detect wildfire using unmanned aerial vehicles (UAV). A comparative analysis showed that the modified VGG13 and GoogleNet gave the best performance [16].

In [17], the authors proposed a CNN model called “Fire_Net” to localize and identify fire in aerial images. Using 3561 images, good performance is achieved. Zhang et al. [18] also used a deep CNN model for fire detection. They used a patch fire classifier to detect whether an image contains fire; if it does, a CNN is applied to localize the fire. A detection accuracy of 90% and a false alarm rate of 2.3% are achieved using training data of 1153 images.

Recently, numerous region-based CNNs have been employed to detect and localize fire in images and videos. In [19], the authors used Faster R-CNN to detect and localize fire in real time. Using a dataset of 12620 images (forest fires, candle fires, gas range fires), good performance is obtained in terms of detection accuracy and precision, but this solution is not adequate for real-time applications due to its high response time.

In the same way, Shen et al. [20] exploited another region-based CNN model, Yolo (You Only Look Once), to detect fire from video. Good results are obtained in terms of accuracy, precision and response time, which is more than 3 times faster than that of Faster R-CNN. In [6], Yolo v3 is explored to detect fire using aerial images. A recognition rate of 83% and a speed of 3.2 frames per second were achieved, which showed the reliability of this model for detecting forest fires from UAVs. The authors of [4] also employed various region-based CNN models (Yolo, SSD and Faster R-CNN) to identify forest fires. The SSD model, which employs multi-resolution feature maps to localize objects at various scales, proved efficient at detecting forest fires in real time in terms of speed and accuracy.

III. PROPOSED ARCHITECTURE

The proposed fire detection architecture combines two Deep Learning models in order to perform the detection and localization of wildfires. Our overall framework takes an image as input and outputs the localized fire flames. The first step of our model is YOLOv5 [13] and the second is U-Net [14]. In the proposed architecture, these two networks are integrated. First, the network is fed with RGB color images of forest fires, which are processed by YOLOv5 to get the bounding boxes around the fire areas. Once we have these, a Crop Layer is applied to the image obtained from the YOLOv5 results so that we keep only the parts of the image delimited by the bounding boxes. Then, these cropped images are fed into the U-Net to confirm the presence of flame and detect the precise location of the fire. The result is a binary mask representing the fire pixels in the image. Finally, we overlay the obtained bounding lines on the original images. It is important to mention that we trained U-Net offline and then used the trained weights to segment the cropped images. The model is presented in Fig. 1. The novelty of our proposed method is that it detects not only the bounding box in which the fire is located but also its shape.

Fig. 1. Proposed architecture

A. YOLOv5

YOLOv5 (You Only Look Once) is a single-stage object detector that has three important parts:
• Model Backbone: This part is used to extract important features from the given input image. Cross Stage Partial Networks (CSPN) are used as a backbone to extract rich, informative features from the input fire image.
• Model Neck: It mainly consists of generating feature pyramids. Feature pyramids help the model generalize well across object scales, that is, detect the same object at various scales and sizes. This is useful for performing well on unseen data.
• Model Head: This is the final detection part. YOLO applies anchor boxes to the features and produces the final output, namely bounding boxes with a class score.

B. U-NET

U-Net is a deep convolutional network that has been used successfully in medical image segmentation. Unlike traditional DL models, which are data-hungry, U-Net can still be trained with a small amount of data. U-Net is a two-stage deep learning model: its architecture consists of an encoder followed by a decoder. It contains nine blocks, four in each stage and one shared block. Each block consists of two 2D convolution layers, which use a 3×3 kernel and the rectified linear unit (ReLU) activation function, followed by a 2D max-pooling layer. The number of feature channels is doubled at each
downsampling phase. Finally, a 1×1 convolution layer constructs the binary mask.

This model takes as input a set of fire images and their corresponding binary masks. During training, with the binary mask as the desired output, the model learns to classify each pixel of the image into one of the object labels. For our task, we define two classes: fire and non-fire.

IV. IMPLEMENTATION AND RESULTS

In this section, we detail the implementation settings adopted to train and test the proposed method: the data preparation step (dataset collection and data augmentation), the deployment of the overall architecture, the Test Time Augmentation (TTA) module, and finally the collection of results and the performance analysis.

A. Data Preparation

For fire detection, there is no benchmark dataset, which makes comparative studies between DL approaches in the field difficult. We created our own dataset, which combines the Corsican Fire dataset [3] with images of fire-like objects. The Corsican Fire database is a dataset of forest fire images collected by different research teams around the world. It includes wildfire image sequences acquired in various areas and under varying conditions, such as climatic conditions, type of burning vegetation, distance to the fire and brightness of the fire.

To diversify our dataset and improve the model's ability to distinguish between fire and fire-like objects, we added to the Corsican Fire database numerous images that include fire-like colored objects, such as lights, sunrise, sunset, and firefighters' clothing, in various resolutions. The newly created dataset contains about 1300 images. They include fire, non-fire, and fire-like images with different resolutions and different sizes.

Fig. 2 depicts a sample of the Corsican Fire dataset and of the fire-like object images, which contain objects sharing some fire characteristics, such as sunset, sunrise and lights.

Fig. 2. Examples from the dataset: (a) Corsican Fire images (b) Fire-like object images

B. Data Augmentation

It is important to use data augmentation techniques to improve the performance of our model and avoid overfitting. Data augmentation mainly consists of applying transformations to the image, such as geometrical transformations (rotation, scaling, padding, cropping, image translation and flip translation), photometric transformations (brightness, contrast, and shear), image occlusion techniques (Mixup) and Mosaic data augmentation, which combine numerous transformations for a single image [7].

For our problem, we chose the data augmentation techniques based on the characteristics of flame. For instance, we did not consider techniques using color space adjustments, in order to keep the fire color information. We also excluded rotation, because a fire rotated by 90 degrees, for example, does not occur in real life. In addition, it is reasonable to flip a fire image horizontally but not vertically, since in the real world we would rarely see images of fire flipped upside-down. In conclusion, we used image translation, image scaling, mosaic, mixup, and horizontal flip as augmentation techniques.

C. Training

We trained the two models using the PyTorch framework on a machine with an NVIDIA Tesla P100 16 GB GPU. Moreover, we divided our dataset into two subsets, as presented in Table I.

TABLE I. DATASET SUBSETS

                 Number of positive images   Number of negative images   Number of annotated bounding boxes
Training set     883                         107                         1367
Validation set   100                         15                          237

1. Detection Training

To train YOLOv5, the input data are PNG images and TXT files containing details of the annotated objects. Training was conducted using the Binary Cross-Entropy with Logits loss function from PyTorch to compute the loss on class probability and object score, with a learning rate of 0.01, a batch size of 8, 300 epochs, and an image size of 416×416 or 1024×1024. Note that the training time varies with the model, since we trained both the small and the large YOLO.

2. Segmentation Training

To train U-Net, the input data are PNG images alongside their corresponding binary masks. We used a loss (LS) that combines the Dice Coefficient (DC) loss and Binary Cross-Entropy (BCE) as follows:

$$LS = 1 \cdot e^{\ldots} \cdot BCE + DC \qquad (1)$$

Images resized to 256×256 are used over the various training epochs. We also used a learning rate of 10^-4, Adam as the optimizer and a batch size of 4.

D. Test Time Augmentation
Test Time Augmentation is yet another data augmentation method. While data augmentation is done before or during the training of the model, TTA is done at inference time. It is a simple but effective way to avoid overfitting and optimize results. The idea is to show different versions of the same image to the same model, take the different outputs, extract the detected bounding boxes and then combine the results. This is a very fast way to improve the model performance (confidence of the output) without spending valuable time on data augmentation.

E. Evaluation metrics

The evaluation metrics used in this work are divided into detection metrics and segmentation metrics.

1. Fire detection metrics

• Recall is the percentage of all relevant results that are correctly classified.

$$Recall = \frac{TP}{TP + FN} \qquad (2)$$

where TP: True Positive, FN: False Negative.

• MAP (Mean Average Precision) is the mean of AP. The AP value of each class is calculated as follows:

$$AP = \sum_{m} (r_m - r_{m-1})\, p_m \qquad (3)$$

where r_m and p_m are the recall and the precision at the m-th threshold.

2. Fire segmentation metrics

• Dice Coefficient (DC): The Dice Coefficient is a statistical indicator that measures the similarity of two images (the predicted and ground truth images).

$$DC = \frac{2\,TP}{2\,TP + FP + FN} \qquad (4)$$

where TP: True Positive, FP: False Positive and FN: False Negative.

• Accuracy is the fraction of correct predictions over the total number of predictions made by the network.

$$Accuracy = \frac{TP + TN}{TP + TN + FN + FP} \qquad (5)$$

where TP: True Positive, TN: True Negative, FP: False Positive and FN: False Negative.

F. Experimental Results

In this section, we discuss the experimental results to demonstrate the performance of the proposed method. First, we discuss the detection results. Then we present the segmentation results. Our test set consists of 185 images containing 221 bounding boxes.

1. Detection Results

In Table II, we present the performance of both YOLOv5s (small version) and YOLOv5x (extra-large version) with and without TTA.

TABLE II. FIRE DETECTION RESULTS

Models    TTA   Recall   MAP
YoloV5s   ON    0.842    0.732
YoloV5s   OFF   0.805    0.686
YoloV5x   ON    0.869    0.718
YoloV5x   OFF   0.769    0.654

We can see that the two versions of YOLOv5 achieved strong fire detection results when using TTA and data augmentation techniques: for both models, enabling TTA improves recall and MAP. Owing to the high resolution of the input images, YOLOv5x with TTA achieved the best recall (0.869), while YOLOv5s with TTA achieved the highest MAP (0.732).

In Fig. 3, we can see that YOLOv5 has accurately detected and localized forest fires, in particular small fires. The model also overcomes the confusion with fire-like objects such as sunrise and sunset.

Fig. 3: YOLOv5 results

2. Segmentation Results

Table III presents the best scores that we obtained for our model.

TABLE III. EXPERIMENTAL RESULTS OF U-NET

Model   Dice coefficient   Accuracy
U-net   92%                99.6%

We can see that the U-Net model achieved strong performance (Dice coefficient of 92% and accuracy of 99.6%) in segmenting forest fires. YOLO v3 applied to our database provides an accuracy of 96.8%.

The strength of U-Net is its ability not only to confirm the presence of forest fire but also to detect the precise shape of the flame. In Fig. 4, we can see that the network accurately and precisely detects the fire and its shape. By combining the two architectures, we achieved a
robust and precise forest fire detector that addresses forest fire detection and recognition problems.

Fig. 4: U-Net results: (a) cropped images (b) predicted images

As an example, we can see in Fig. 5 that our proposed model performs very well in detecting fire pixels and segmenting fire surfaces, especially small areas (middle figure), and that it has successfully overcome the confusion with fire-like objects (right figure). These results outperform the state-of-the-art fire detection techniques.

Fig. 5: Results: (a) input images (b) model outputs

V. CONCLUSION

In this paper, we introduced a novel method for forest fire detection and segmentation based on YOLOv5 and U-Net. We evaluated our method using the Corsican Fire dataset and various images of fire-like objects. Experimental results showed that this method is able to detect forest fires, in particular small fires (with small flames), under different acquisition conditions. For future work, we aim to introduce a CNN-based smoke/fire detection method in order to identify and localize both fire and smoke without false alarms.

ACKNOWLEDGEMENT

This project is carried out under the MOBIDOC scheme, funded by the EU through the EMORI program and managed by the ANPR.

REFERENCES

[1] T. Celik and H. Demirel, “Fire detection in video sequences using a generic color model,” Fire Safety Journal, vol. 44, no. 2, pp. 147–158, 2009.
[2] G. Marbach, M. Loepfe, and T. Brupbacher, “An image processing technique for fire detection in video images,” Fire Safety Journal, vol. 41, no. 4, pp. 285–289, 2006.
[3] T. Toulouse, L. Rossi, A. Campana, T. Celik, and M. Akhloufi, “Computer vision for wildfire research: An evolving image dataset for processing and analysis,” Fire Safety Journal, vol. 92, pp. 188–194, 2017.
[4] S. Wu and L. Zhang, “Using popular object detection methods for real time forest fire detection,” in 11th International Symposium on Computational Intelligence and Design, vol. 1, pp. 280–284, IEEE, 2018.
[5] R. Ghali, M. Jmal, W. Souidene Mseddi, and R. Attia, “Recent advances in fire detection and monitoring systems: A review,” in the International Conference on the Sciences of Electronics, Technologies of Information and Telecommunications. Springer, 2018.
[6] Z. Jiao, Y. Zhang, J. Xin, et al., “A deep learning based forest fire detection approach using UAV and YOLOv3,” in 1st International Conference on Industrial Artificial Intelligence (IAI), pp. 1–5, IEEE, 2019.
[7] C. Shorten and T. M. Khoshgoftaar, “A survey on image data augmentation for deep learning,” Journal of Big Data, vol. 6, no. 1, pp. 1–48, 2019.
[8] Y. Chunyu, F. Jun, W. Jinjun, et al., “Video fire smoke detection using motion and color features,” Fire Technology, vol. 46, no. 3, pp. 651–663, 2010.
[9] M. A. I. Mahmoud and H. Ren, “Forest fire detection and identification using image processing and SVM,” Journal of Information Processing Systems, vol. 15, no. 1, pp. 159–168, 2019.
[10] Z.-Q. Zhao, P. Zheng, S.-t. Xu, et al., “Object detection with deep learning: A review,” IEEE Transactions on Neural Networks and Learning Systems, vol. 30, no. 11, pp. 3212–3232, 2019.
[11] L. Liu, W. Ouyang, X. Wang, et al., “Deep learning for generic object detection: A survey,” International Journal of Computer Vision, vol. 128, no. 2, pp. 261–318, 2020.
[12] S. Minaee, Y. Boykov, F. Porikli, A. Plaza, N. Kehtarnavaz, and D. Terzopoulos, “Image segmentation using deep learning: A survey,” arXiv preprint arXiv:2001.05566, 2020.
[13] G. Jocher, A. Stoken, J. Borovec, N. Code, C. STAN, L. C. Laughing, T. YxNONG, A. Hogan, L. mammana, A. Wang, A. Chaurasia, L. D. Marc, W. Haoyang, D. Durgesh, F. Ingham, F. Guil-hen, A. Colmagro, H. Ye, J. Solawetz, J. Poznanski, J. Fang, J. Kim, K. Doan, and L. Yu, “ultralytics/yolov5: v4.0, PyTorch Hub integration,” Jan. 2021.
[14] O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” Springer, 2015.
[15] A. Gaur, A. Singh, A. Kumar, A. Kumar, and K. Kapoor, “Video flame and smoke based fire detection algorithms: A literature review,” Fire Technology, 2020.
[16] W. Lee, S. Kim, Y.-T. Lee, et al., “Deep neural networks for wildfire detection with unmanned aerial vehicles,” in IEEE International Conference on Consumer Electronics (ICCE), pp. 252–253, IEEE, 2017.
[17] Y. Zhao, M. Jiale, L. Xiaohui, and Z. Jie, “Saliency detection and deep learning-based wildfire identification in UAV imagery,” Sensors, vol. 18, no. 3, p. 712, 2018.
[18] Z. Qingjie, X. Jiaolong, X. Liang, and G. Haifeng, “Deep convolutional neural networks for forest fire detection,” in International Forum on Management, Education and Information Technology Application. Atlantis Press, 2016.
[19] Q.-x. Zhang, G.-h. Lin, Y.-m. Zhang, G. Xu, and J.-j. Wang, “Wildland forest fire smoke detection based on faster R-CNN using synthetic smoke images,” Procedia Engineering, vol. 211, pp. 441–446, 2018.
[20] D. Shen, X. Chen, M. Nguyen, and W. Q. Yan, “Flame detection using deep learning,” in 4th International Conference on Control, Automation and Robotics (ICCAR). IEEE, 2018.
[21] D. Stav, “Fighting fire with science,” Nature, vol. 576, no. 7786, pp. 328–329, 2019.