Article
An Improved Wildfire Smoke Detection Based on YOLOv8 and
UAV Images
Saydirasulov Norkobil Saydirasulovich 1 , Mukhriddin Mukhiddinov 2 , Oybek Djuraev 2 ,
Akmalbek Abdusalomov 1,3, * and Young-Im Cho 1, *
Abstract: Forest fires rank among the costliest and deadliest natural disasters globally. Identifying the
smoke generated by forest fires is pivotal in facilitating the prompt suppression of developing fires.
Nevertheless, existing techniques for detecting forest fire smoke encounter persistent issues, in-
cluding a slow identification rate, suboptimal accuracy in detection, and challenges in distinguishing
smoke originating from small sources. This study presents an enhanced YOLOv8 model customized
to the context of unmanned aerial vehicle (UAV) images to address the challenges above and attain
heightened precision in detection accuracy. Firstly, the research incorporates Wise-IoU (WIoU) v3
as a regression loss for bounding boxes, supplemented by a reasonable gradient allocation strategy
that prioritizes samples of common quality. This strategic approach enhances the model’s capacity
for precise localization. Secondly, the conventional convolutional process within the intermediate
neck layer is substituted with the Ghost Shuffle Convolution mechanism. This strategic substitution
reduces model parameters and expedites the convergence rate. Thirdly, recognizing the challenge
of inadequately capturing salient features of forest fire smoke within intricate wooded settings, this
study introduces the BiFormer attention mechanism. This mechanism strategically directs the model’s
attention towards the feature intricacies of forest fire smoke, simultaneously suppressing the influence
of irrelevant, non-target background information. The obtained experimental findings highlight the
enhanced YOLOv8 model's effectiveness in smoke detection, achieving an average precision (AP) of
79.4%, signifying a notable 3.3% enhancement over the baseline. The model's performance extends to
average precision small (APS) and average precision large (APL), registering robust values of 71.3%
and 92.6%, respectively.
consume a more extensive acreage, accounting for approximately 53% of the mean property
burned during the period spanning 2018 to 2022 [1].
Forest fires pose a serious hazard to both human lives and property, exerting a
markedly harmful impact on the natural ecological balance of forest ecosystems. Further-
more, their occurrence remains unpredictable and engenders tough challenges regarding
rescue operations [2,3]. As a result, the prevention of forest fires has consistently held a sig-
nificant position in strategically establishing public infrastructure across diverse nations. In
forest fire outbreaks, the appearance of smoke typically precedes the actual ignition, with
detectable pre-smoke indicators [4–6]. Timely and precise detection of wildfire-induced
smoke holds immense significance, not solely for early forest fire alert systems and fighting
measures but also for reducing the loss of human lives and property.
Traditional methods for monitoring forest fires involve manual observation through
ground-based surveys and observation towers. Manual observation is sensitive to external
factors such as logistical limitations, communication challenges, and weather, leading to
inefficiencies. As a means of monitoring, observation towers possess limitations, including
restricted coverage, areas with no surveillance coverage, and subsequent high maintenance
expenses [7]. Despite its broad coverage, satellite-based monitoring [8] of forest fires faces
limitations such as inadequate spatial resolution of satellite imagery, dependence on orbital
cycles, susceptibility to weather and cloud cover interference, and low satellite numbers.
Furthermore, achieving real-time forest fire monitoring using satellite systems is challenging.
Aerial monitoring has emerged as a productive method for forest fire surveillance [9],
primarily using aircraft or unmanned aerial vehicles (UAV) and drones for surveillance.
Nevertheless, this approach encounters substantial operational expenses due to the expan-
sive expanse of forested landscape under consideration. Conventional methods of early
forest fire detection predominantly rely on smoke and temperature-sensitive sensors, often
in a combined configuration [10–12]. These sensors are engineered to detect airborne smoke
particulates and swift escalations in ambient temperature, thereby facilitating fire detection.
Notably, activating an alert is contingent upon achieving predetermined thresholds in either
smoke concentration or ambient temperature. Despite their utility, these hardware-based
sensors exhibit spatial and temporal constraints, compounded by challenges in mainte-
nance after deployment. Consequently, it becomes evident that sensor-based solutions fall short of
meeting the demands of real-time monitoring, preemptive detection, and mitigation of forest fires
within vast and complicated ecosystems, such as forests.
With the advancement of computer technology, there has been a shift towards more
advanced approaches for detecting fire smoke, moving away from manual feature extrac-
tion methods. This newer paradigm predominantly revolves around surveillance systems
positioned at observation points, capturing forest fire imagery or videos. Subsequently,
manual extraction of features from these data sets is conducted, followed by the formu-
lation of distinctive identifiers. This process is demonstrated in the work of Hidenori
et al. [13], who used textural features of smoke to train a support vector machine model
for identifying wildfire smoke. The efficacy of this approach is dependent on a sufficient
number of training cases and the precision of feature extraction, both of which influence
the recognition performance of the support vector machine. However, it is noteworthy
that this technique engenders substantial data storage requirements and exhibits sluggish
computational processing speeds. Filonenko et al. [14] conducted smoke recognition by
leveraging color and visual attributes inherent in smoke regions within surveillance videos.
Exploiting the steadiness of the camera’s perspective, these researchers extracted smoke
regions by computation of pixel edge roughness, subsequently employing background
subtraction for identification. Nevertheless, this technique’s susceptibility to noise impairs
its capability to achieve precision and rapid smoke detection. Tao and colleagues [15]
worked on automating smoke detection using a Hidden Markov Model. They focused on
capturing the changing characteristics of smoke areas in videos. They divided the color
changes in consecutive frames into distinct blocks and used Markov models to classify each
of these blocks. Despite these endeavors, this strategy remains challenged by the
intricacies of its operational setting. Traditional methods that use image or video analysis
to detect forest fire smoke have achieved good results but also have some limitations. The
underlying feature extraction process necessitates adept domain knowledge for feature
selection, introducing the possibility of suboptimal design. Moreover, characteristics such
as background, fog, cloud, and lighting can lead to reduced detection and recognition
accuracy. Furthermore, these methods may not work as well in complex and changing
forest circumstances.
With the rapid progress of deep learning techniques, researchers are increasingly
using them for detecting forest fire smoke. Deep learning allows automatic detection and
feature extraction through more complicated algorithms, leading to faster learning, better
accuracy, and improved performance in dense forest conditions. For example, Zhang and
colleagues [16] expanded their dataset by creating synthetic instances of forest fire smoke
and used the Faster R-CNN framework for detection. This approach avoids the need for
manual feature extraction but requires more computational resources. Another study by
Qiang and team [17] used a dual-stream fusion method to detect wildfire smoke using a
motion detection algorithm and deep learning. They achieved an accuracy of 90.6% by
extracting temporal and spatial features from smoke images. However, there’s still a chal-
lenge in capturing feature information effectively from long sequences at the beginning. In
the study by Filonenko et al. [18], various established convolutional classification networks,
including VGG-19 [19], AlexNet [20], ResNet [21], VGG-16, and Xception, were utilized to
classify wildfire smoke images. They employed Yuan’s dataset [22] of four smoke images
for both training and validation. Their assessment of these model networks’ performance in
recognizing smoke on this dataset highlighted Xception as the most effective detector. In an-
other work, Li et al. [23] introduced an innovative technique called the Adaptive Depthwise
Convolution module. This module dynamically adjusts the weights of convolutional layers
to enhance the capture of features related to forest fire smoke. Their methodology yielded
an accuracy of 87.26% at a frame rate of 43 FPS. Pan et al. [24] explored the deployment
of ShuffleNet, coupled with Weakly Supervised Fine Segmentation and Faster R-CNN
frameworks, for predicting the presence of fire smoke. However, due to the intricate nature
of fire smoke and the high memory requirements for model training, the complexity of the
task necessitated exceedingly robust hardware resources.
The extensive adaptability, rapidity, and precision of UAVs have led to their widespread
integration in forest fire detection endeavors. UAVs can use their capacity to operate at
low altitudes to capture high-resolution images of forested regions, enabling early fire
identification. Moreover, UAVs demonstrate proficiency in navigating difficult and inac-
cessible terrains [25]. They can carry diverse cameras and sensors capable of detecting
diverse spectral ranges, encompassing infrared radiation, which facilitates the discern-
ment of latent heat sources beyond human visual perception. Furthermore, UAVs can
be equipped with real-time communication systems, enabling quick responsiveness by
firefighters and furnishing pertinent information about the fire’s parameters, positioning,
and trajectory [26,27]. The collective attributes of UAVs render their deployment in forest
fire detection increasingly pivotal, poised to assume an even more consequential role in the
future of wildfire management.
Prior investigations into forest fire smoke detection have demonstrated the efficacy of
various detection models, yielding favorable outcomes. Nevertheless, the complex back-
ground of forest environments and the difficulties linked to smoke feature extraction lead
to numerous early detection challenges. Principally, forest imagery frequently encompasses
both smoke and analogous background elements, such as clouds, water surfaces, and mist,
which confound differentiation. The interplay of natural lighting fluctuations further com-
pounds these issues, inducing image attribute alterations that impede downstream feature
extraction and recognition processes. Moreover, precisely identifying nascent smoke in-
stances remains formidable, given their dynamic characteristics and diminutive, indistinct
shapes. Our framework employs an enhanced YOLOv8 model [28] for forest fire smoke
detection. We initiated the model with pre-trained weights as foundational parameters
for the underlying backbone network, subsequently adjusting network architecture pa-
rameters to optimize the conventional YOLOv8 model’s efficacy. Integrating this refined
network architecture into a dataset relevant to forest fire smoke enabled precise recognition
of perilous emissions such as smoke, including hazardous compounds.
The significant contributions of this study are as follows:
• We incorporate the Wise-IoU (WIoUv3) [29] method into the bounding box regression
loss. This involves using a dynamic, non-monotonic approach to create a strategy for
allocating gradient gains with improved rationality. WIoU v3 effectively adjusts gradi-
ent gains for samples of both high and low quality, resulting in enhanced precision in
localization and an improved overall capacity for generalization in the model.
• We incorporate a dynamic sparse attention design named BiFormer [30] into the
backbone network. This addition is known for its computational efficiency. By incor-
porating this mechanism, the model is better able to emphasize essential information
within the feature map, ultimately improving its ability to detect objects.
• We employ GSConv [31] as a substitute for the conventional convolution within the
neck layer, thereby establishing rapid pyramid pooling modules. This implementation
expedites model convergence, facilitating the more expeditious amalgamation of
smoke features with a reduced computational load when processing smoke images.
• In contrast to various prominent YOLO series models and an additional set of six con-
ventional detection models, our approach showcases its evident superiority through
comprehensive experimental outcomes.
The subsequent sections of this paper are structured as follows: Section 2 offers
a presentation of the relevant literature. Section 3 outlines our dataset and the specific
enhancements to YOLOv8. Section 4 provides a comprehensive account of the experimental
findings and conducts a detailed performance analysis. Limitations and future work are
discussed in Section 5. Ultimately, Section 6 serves to draw conclusions.
2. Related Works
Various approaches exist for smoke and fire detection, broadly categorized as follows:
(a) vision-based methods and (b) sensor-based methods. This article specifically delves
into vision-based strategies, crucial for outdoor settings where sensor deployment might
be infeasible. Vision-based methods can be further divided into two distinct groups. The
initial category entails feature extraction coupled with machine learning techniques, while
the second category focuses on the utilization of deep neural networks.
fire detection. Furthermore, Ghosh et al. [35] concurrently leverage color and motion
attributes to detect smoke and fire. In this endeavor, fuzzy rules are employed to enhance
classification performance. Conversely, Sankarasubramanian et al. [36] employ an edge
detection algorithm to identify fire. Chen et al. [37] employ dynamic fire properties for fire
area identification; however, instances involving objects resembling fire within the image
might degrade the method’s performance. Lastly, Xie et al. [38] employ static and dynamic
features in tandem for fire detection.
The important advantage inherent in these approaches lies in their minimal data
requirements. Additionally, their incorporation of movement considerations serves to
mitigate the misclassification of objects such as the sun as fire sources. Nonetheless,
a drawback of these methods arises from their reliance on feature extraction methods
anchored in attributes such as color. Consequently, these algorithms exhibit substantial
error rates; for instance, an item such as a moving orange box might erroneously trigger
a fire detection. Another noteworthy issue within this realm pertains to the necessity of
fine-tuning pertinent thresholds, a labor-intensive process that often results in elevated
false alarms. Moreover, the methods introduced in this domain grapple with the need for
adept experience to appropriately design and configure suitable features.
The principal challenge associated with AI-driven methodologies resides in the demand for
extensive training datasets and the time-intensive nature of the training process, compounded by
limited oversight over the smoke and fire detection procedures. This concern is notably exacerbated
by the lack of wide, standardized datasets exhibiting the requisite diversity. In the context of this
study, a wide collection of datasets is curated to address these challenges and facilitate robust learning.
3. Materials and Methods
3.1. Overview of Wildfire Smoke Detection
This section delineates the utilization of a deep learning model employed for the purpose of
detecting wildfire smoke. Additionally, the dataset utilized for training purposes is explained. Prior
to the commencement of the task, the requisite procedures, including navigation, model and algorithm
selection, and system execution, must be successfully undertaken. As depicted in Figure 1, the camera
onboard UAVs captures images or videos, which are then subjected to a sequence of operations
encompassing preprocessing, feature extraction, smoke detection, and fire detection, ultimately
culminating in the generation of predictive outcomes.
Figure 1. Overview of the proposed wildfire smoke detection system based on UAV images.
This research utilized UAV images and deep learning models to enhance the accuracy of early
detection of forest fire smoke, even in varied weather conditions such as sunny, hazy, and cloudy
atmospheres. We introduce an optimized YOLOv8 model along with a UAV image-based system for
forest fire smoke detection. Usually, UAVs carry cameras that send images to a control station. At
this station, an AI system is used to detect if there is smoke or fire. In this study, a method was
developed that utilizes a deep neural network to accurately obtain precise localization of smoke
regions, executed by a robust processor for rapid real-time image processing.
Upon obtaining the image and conducting essential preprocessing optimizations, the task
necessitates the separation of pixels outlining the subject of interest from the surrounding image
context. The extraction of features related to smoke and fire involved images captured under specific
daytime and lighting circumstances. Aspects encompassing edges, corner points, motion, color
attributes, luminance levels, and intensities were considered integral components of the feature
extraction process. To conduct a comprehensive study of the segmented image and identify pivotal
points of significance, the image underwent feature extraction procedures, thereby requiring the
execution of relevant operations. The resultant processed image was subsequently inputted into a
trained model to determine noticeable patterns that either affirm or reject the presence of smoke.
The exact methodology of the proposed method is illustrated in Figure 2. In the subsequent phase,
if the AI model produces a positive result, the system generates an alert using either the UAV
platform or the control station. This alert prompts firefighting personnel to take the necessary actions.
Figure 2. Overview of the proposed forest fire smoke detection system based on UAV images.
3.2. Original YOLOv8
The YOLO model has achieved considerable acclaim within the domain of computer vision.
Building upon this foundation, scholars have undertaken enhancements and incorporated novel
modules into the methodology, giving rise to a multitude of classical models. Introduced by
Ultralytics on 10 January 2023, YOLOv8 marks a significant advancement in this evolution. In
contrast to earlier models such as YOLOv5 and YOLOv7, YOLOv8 is a cutting-edge and innovative
model known for its improved detection accuracy and faster processing. The YOLOv8 network
architecture comprises three main elements: the backbone, neck, and head [28].
The modified CSPDarknet53 [46] serves as the backbone network in YOLOv8, which results in
five distinct scale features (denoted as B1–B5) through five consecutive downsampling stages. In the
original backbone's architecture, the Cross Stage Partial (CSP) module has been replaced with the
C2f module. This new module, C2f, introduces a gradient shunt connection to enhance the flow of
information within the feature extraction network while still maintaining a lightweight design. The
CBS (Convolution, Batch Normalization, SiLU) module is a composite element initially utilized in
the YOLOv5 architecture for deep learning-based object detection tasks. This module combines three
key components. Convolution: convolutional layers perform feature extraction from the input data,
applying convolutional operations to capture essential patterns and features. Batch Normalization:
batch normalization normalizes the activations of the network at each layer, helping to stabilize and
accelerate training by reducing internal covariate shift. SiLU: the SiLU (Sigmoid Linear Unit), also
known as the Swish activation function, introduces non-linearity into the network and is known for
its smooth gradient behavior, which aids effective training. By incorporating these components, the
CBS module enhances the expressive power of the neural network and contributes to its ability to
learn complex representations from the input data, enabling more accurate and efficient object
detection in a variety of applications. In the later stages of the backbone network, the spatial pyramid
pooling fast (SPPF) module is utilized to adaptively generate output of a consistent size by pooling
input feature maps. In comparison to the spatial pyramid pooling (SPP) structure [47], SPPF optimizes
computational efficiency and reduces latency through a sequence of three consecutive maximum
pooling layers.
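To make the composition of these blocks concrete, the following is a minimal PyTorch sketch of a CBS module and an SPPF block as described above; the channel sizes and the pooling kernel size of 5 are illustrative assumptions rather than the exact Ultralytics configuration.

```python
import torch
import torch.nn as nn

class CBS(nn.Module):
    """Convolution + Batch Normalization + SiLU, as described above."""
    def __init__(self, c_in, c_out, k=3, s=1):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, k, s, padding=k // 2, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

class SPPF(nn.Module):
    """Spatial pyramid pooling fast: three consecutive max-pooling layers whose
    outputs are concatenated with the input features before a final projection."""
    def __init__(self, c_in, c_out, k=5):
        super().__init__()
        c_hidden = c_in // 2
        self.cv1 = CBS(c_in, c_hidden, k=1)
        self.cv2 = CBS(c_hidden * 4, c_out, k=1)
        self.pool = nn.MaxPool2d(kernel_size=k, stride=1, padding=k // 2)

    def forward(self, x):
        x = self.cv1(x)
        y1 = self.pool(x)
        y2 = self.pool(y1)
        y3 = self.pool(y2)
        return self.cv2(torch.cat([x, y1, y2, y3], dim=1))
```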
Incorporating ideas from PANet [48], YOLOv8 introduces a PAN-FPN architecture
into its neck component. Unlike the neck designs found in the YOLOv5 and YOLOv7
models, YOLOv8 brings about a modification by eliminating the convolution operation post
up-sampling within the PAN structure. This alteration preserves the model’s initial perfor-
mance while achieving a more streamlined configuration. Distinct feature scales within the
PAN structure and FPN structure of the YOLOv8 model are denoted as P4–P5 and N4–N5,
respectively. Conventional FPN employs a top-down methodology to convey profound se-
mantic details. However, while FPN enriches the merging of semantic information between
B4–P4 and B3–P3, it may result in the loss of object localization information. To tackle this
concern, PAN–FPN integrates PAN with FPN. By infusing PAN, the acquisition of location
information is bolstered through the merging of P4–N4 and P5–N5, thereby facilitating an
enhancement in the top-down pathway. This strategy orchestrates a comprehensive network
structure that unifies both top-down and bottom-up components. Through feature fusion,
it amalgamates surface-level positional insights and profound semantic details, thereby
enriching the breadth and depth of features.
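The sketch below illustrates the top-down and bottom-up fusion just described for three backbone scales, using plain upsampling, downsampling, and concatenation; it is a conceptual simplification rather than the exact YOLOv8 neck, and the 1 × 1 fusion convolutions are assumptions.

```python
import torch
import torch.nn as nn

class TinyPanFpn(nn.Module):
    """Conceptual PAN-FPN fusion over three backbone scales (B3, B4, B5)."""
    def __init__(self, c3, c4, c5):
        super().__init__()
        self.up = nn.Upsample(scale_factor=2, mode="nearest")
        self.p4 = nn.Conv2d(c4 + c5, c4, 1)       # top-down: B4 fused with upsampled B5
        self.p3 = nn.Conv2d(c3 + c4, c3, 1)       # top-down: B3 fused with upsampled P4
        self.down3 = nn.Conv2d(c3, c3, 3, stride=2, padding=1)
        self.n4 = nn.Conv2d(c3 + c4, c4, 1)       # bottom-up: P4 fused with downsampled N3
        self.down4 = nn.Conv2d(c4, c4, 3, stride=2, padding=1)
        self.n5 = nn.Conv2d(c4 + c5, c5, 1)       # bottom-up: B5 fused with downsampled N4

    def forward(self, b3, b4, b5):
        p4 = self.p4(torch.cat([b4, self.up(b5)], 1))     # deep semantics flow downward
        p3 = self.p3(torch.cat([b3, self.up(p4)], 1))     # finest top-down level
        n4 = self.n4(torch.cat([p4, self.down3(p3)], 1))  # localization cues flow upward
        n5 = self.n5(torch.cat([b5, self.down4(n4)], 1))
        return p3, n4, n5
```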
YOLOv8 employs a decoupled head architecture. This architecture features discrete
branches for both object classification and the prediction of bounding box regression.
Tailored loss functions are then applied to each task. Specifically, the task of bounding
box regression prediction utilizes the CIoU [49] and distribution focal loss (DFL) [50].
Meanwhile, the classification task is supported by the binary cross-entropy loss (BCE
loss). This deliberate design choice contributes to the enhancement of detection precision
and accelerates the model’s convergence. YOLOv8 is distinct as an anchor-free detection
model, simplifying the differentiation between positive and negative samples. Additionally,
it incorporates the Task-Aligned Assigner [51] for dynamic sample allocation, thereby
elevating both detection accuracy and the model’s robustness.
$R_{WIoU} = \exp\left(\dfrac{(bc_x - bc_x^{gt})^2 + (bc_y - bc_y^{gt})^2}{c_w^2 + c_h^2}\right)$ (2)
The concept of outlier β is introduced by WIoUv3 to evaluate the quality of the anchor
box, generating a non-monotonic focus factor r from this β, and then incorporating r into the
established WIoUv1 formulation. A reduced β weight signifies superior anchor box quality,
leading to a proportional reduction in the assigned r value, subsequently diminishing the
impact of high-quality anchor instances in the overall loss function. Conversely, a larger
β value signifies lower anchor box quality, leading to a reduced gradient gain allocation,
which serves to mitigate adverse gradients stemming from low-quality anchor boxes. By
dynamically allocating gradient gains, WIoUv3 optimizes the weighting of anchor boxes
with varying qualities in the loss function, directing the model’s focus towards samples
of average quality. This approach enhances the general implementation of the model
through rational adjustments. Equations (5)–(7) present the formulations for WIoUv3. The
parameters δ and α in Equation (6) are hyperparameters that can be tuned to align with
specific model characteristics.
$r = \dfrac{\beta}{\delta \alpha^{\beta - \delta}}$ (6)

$\beta = \dfrac{L_{IoU}^{*}}{\overline{L}_{IoU}} \in [0, +\infty)$ (7)
Through a comprehensive comparison of various mainstream loss functions, we
ultimately introduce WIoUv3 as the chosen object bounding box regression loss. This
decision is predicated on several factors. Firstly, WIoUv3 merges the merits of EIoU and
SIoU, aligning with the design philosophy of exemplary loss functions. Utilizing a dynamic
non-monotonic approach, WIoU v3 evaluates anchor box quality, with a specific focus on
average-quality instances. This enhancement subsequently strengthens the model’s ability
to precisely locate objects. In scenarios involving object detection through UAV images, the
challenges posed by small objects are prominent. The adaptive adjustment of loss weights
for small objects within WIoUv3 inherently contributes to the improved effectiveness of
the model’s detection.
product between every query and its connected key. Subsequently, the result is normalized and
combined with matrix V through a weighted sum operation. To prevent the result's gradient from
vanishing, a term $\sqrt{d_K}$ is introduced, where $d_K$ represents the dimensionality of matrix K.
The procedure for this attention process is outlined in Equation (8):

$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\dfrac{QK^{T}}{\sqrt{d_K}}\right)V$ (8)
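As a concrete instance of Equation (8), the minimal PyTorch sketch below computes scaled dot-product attention for a batch of token sequences; the tensor shapes in the usage example are illustrative.

```python
import math
import torch

def scaled_dot_product_attention(q, k, v):
    """Equation (8): softmax(Q K^T / sqrt(d_K)) V."""
    d_k = k.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)  # similarity of every query with every key
    weights = torch.softmax(scores, dim=-1)            # normalize over the key dimension
    return weights @ v                                  # weighted sum of the values

# Example usage with random features (batch of 2, 16 tokens, 64 channels):
q = torch.randn(2, 16, 64)
k = torch.randn(2, 16, 64)
v = torch.randn(2, 16, 64)
out = scaled_dot_product_attention(q, k, v)  # shape (2, 16, 64)
```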
However, the typical attention mechanism comes with challenges such as high computational
demands and substantial memory usage. When it comes to detection models used on UAV platforms,
there are limitations in terms of available resources. Introducing a regular attention module directly
into the model could take up a significant portion of these resources, leading to a decrease in the
model's speed for making predictions. To address these resource-related concerns, researchers have
suggested a solution that involves using sparse queries focusing only on key-value pairs. Various
related research has emerged from this approach, encompassing concepts such as expansive attention,
deformable attention, and local attention. Nevertheless, these methods generally rely on manually
designed, content-independent sparsity and fixed patterns. To address these limitations, Lei Zhu and
his team [30] introduced a creative solution, a dynamic sparse attention mechanism named Bi-Level
Routing Attention, illustrated in Figure 3b.
Figure 3. (a) Architecture of the BiFormer block; (b) Architecture of the Bi-Level Routing Attention block.
As depicted in Figure 3b, the initial input feature map $X \in \mathbb{R}^{H \times W \times C}$ is first
partitioned into $S \times S$ subregions, with each region containing $HW/S^2$ feature vectors. We modify
the shape of X to yield $X^r \in \mathbb{R}^{S^2 \times (HW/S^2) \times C}$. Subsequently, the feature
vectors undergo a linear transformation to yield three matrices, namely Q, K, and V. The mathematical
formulas for these calculations are provided in Equations (9)–(11).

$Q = X^r W^Q$ (9)

$K = X^r W^K$ (10)

$V = X^r W^V$ (11)

Next, the relationship of attention between different regions is established by constructing a
directed graph and determining the connected regions for each given region. The specific process
involves the following steps: For each region, the Q and K components are subjected to region
averaging, producing the region-level counterparts $Q^r$ and $K^r \in \mathbb{R}^{S^2 \times C}$. Next,
the dot product of $Q^r$ and $K^r$ is computed to generate the adjacency matrix
$A^r \in \mathbb{R}^{S^2 \times S^2}$. This matrix gauges the correlation among different regions, and its
formulation is presented in Equation (12).

$A^r = Q^r (K^r)^T$ (12)
Thereafter, the matrix $A^r$ undergoes pruning, where the least relevant token in $A^r$ is removed,
operating at a coarse level. This results in the retention of the top k most relevant regions within
$A^r$, leading to the derivation of the routing index matrix denoted as $I^r \in \mathbb{N}^{S^2 \times k}$.
The mathematical formulation for this process is depicted in Equation (13).

$I^r = \mathrm{topkIndex}(A^r)$ (13)
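A compact sketch of this region-level routing is given below; it assumes square feature maps divisible by S and stops at the routing index matrix of Equation (13), omitting the subsequent token-level attention gather, so it illustrates Equations (9)–(13) rather than reproducing the full BiFormer implementation.

```python
import torch

def region_routing(x, wq, wk, wv, S=7, topk=4):
    """Region partition, projection, and top-k routing (Equations (9)-(13))."""
    B, H, W, C = x.shape
    # Partition into S x S regions, each holding (H*W)/S^2 feature vectors.
    xr = x.view(B, S, H // S, S, W // S, C)
    xr = xr.permute(0, 1, 3, 2, 4, 5).reshape(B, S * S, -1, C)
    q, k, v = xr @ wq, xr @ wk, xr @ wv           # Equations (9)-(11)
    # Region-level queries/keys via per-region averaging.
    qr, kr = q.mean(dim=2), k.mean(dim=2)
    ar = qr @ kr.transpose(-1, -2)                # adjacency matrix, Equation (12)
    ir = torch.topk(ar, k=topk, dim=-1).indices   # routing index matrix, Equation (13)
    return q, k, v, ir                            # token-level attention would follow

# Example usage with random features (shapes are illustrative):
x = torch.randn(1, 56, 56, 64)
w = [torch.randn(64, 64) for _ in range(3)]
q, k, v, ir = region_routing(x, *w)
```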
The architecture of the BiFormer block is derived from the Bi-Level Routing Attention
concept, illustrated in Figure 3a. Within this block, DWConv represents deep separable
convolution, an operation that diminishes the model’s parameter count and computational
load. LN signifies the application of layer normalization, a technique that expedites training
and enhances the model’s ability to generalize. A multilayer perceptron is represented by
the acronym MLP, and it serves to further fine-tune and modify attention weights in order
to enhance the model’s emphasis on specific features. In Figure 3b, the addition symbol
signifies the linkage of two feature vectors.
Incorporating the BiFormer block into the backbone network constitutes a key aspect
of this research. This addition infuses the model with a dynamic attention mechanism
that heightens its emphasis on vital object-related details, thereby augmenting the overall
efficacy of object detection. To utilize the potential of this efficient attention mechanism,
the BiFormer block is strategically positioned between B3 and B4, effectively supplanting
the previously existing C2f block.
Figure 4. Architecture of the GSConv model.
Here, Conv2d symbolizes the two-dimensional convolution applied to the input image $X_{input}$,
bn denotes the normalization operation, σ signifies the activation function, L denotes the concatenation
of the two convolution types, and δ signifies the ultimate shuffling, with the intent of deriving the last
output $X_{out}$ through this shuffling process.
However, an all-encompassing integration of GSConv throughout all stages of the model would
lead to a substantial escalation in the model's layer computation, subsequently extending the inference
duration required for rapid smoke target detection. As a result, it is advisable to restrict the use of
GSConv to a single stage. Within the network architecture of YOLOv8, particularly in the backbone
layer, where a significant amount of convolution is essential for extracting sufficient smoke-related
features, preserving the substantial inter-channel correlation inherent to standard convolution is crucial.
Through the replacement of standard convolution with GSConv, an endeavor focused on
diminishing computational intricacies and parameter count, a more pronounced acceleration can be
achieved in real-time execution. The incoming smoke image undergoes consecutive GSConv
convolutions, and each shuffling operation adeptly amalgamates smoke feature maps from distinct
channels with a diminished parameter count, thus approximating the outcome of standard convolution.
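A minimal PyTorch sketch of the GSConv idea is shown below, assuming the common formulation in which a standard convolution produces half of the output channels, a cheap depthwise convolution produces the other half, and a channel shuffle mixes the two branches; kernel sizes and the channel split are illustrative assumptions.

```python
import torch
import torch.nn as nn

class GSConv(nn.Module):
    """Sketch of Ghost Shuffle Convolution: a dense convolution produces half of the
    output channels, a depthwise convolution produces the other half, and a channel
    shuffle blends information across the two branches."""
    def __init__(self, c_in, c_out, k=3, s=1):
        super().__init__()
        c_half = c_out // 2
        self.dense = nn.Sequential(
            nn.Conv2d(c_in, c_half, k, s, padding=k // 2, bias=False),
            nn.BatchNorm2d(c_half),
            nn.SiLU(),
        )
        self.cheap = nn.Sequential(  # depthwise convolution on the dense branch
            nn.Conv2d(c_half, c_half, 5, 1, padding=2, groups=c_half, bias=False),
            nn.BatchNorm2d(c_half),
            nn.SiLU(),
        )

    def forward(self, x):
        y1 = self.dense(x)
        y2 = self.cheap(y1)
        y = torch.cat([y1, y2], dim=1)     # concatenate the two convolution types
        b, c, h, w = y.shape
        y = y.view(b, 2, c // 2, h, w)     # channel shuffle across the two branches
        y = y.transpose(1, 2).reshape(b, c, h, w)
        return y
```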
Figure 5. Illustrative samples from the forest fire smoke dataset include: (a) instances of small smoke
with concentrated attention at the center and reduced attention at the edges; (b) varying sizes of large
and medium smoke occurrences; (c) non-smoke pictures taken under diverse weather situations such
as cloudy and sunny; and (d) instances with low smoke density, posing challenges in discerning
attributes such as edges, textures, and color. This collection offers a representation of smoke scenarios
encountered in natural environments.
Figure 5a displays images containing small-sized smoke instances, where the concen-
tration is high at the center and low at the edges, presenting challenges in determining
the smoke’s area. Conversely, Figure 5b shows medium and large wildfire smoke images.
Figure 5c provides non-smoke images taken under diverse weather conditions, such as
cloudy and sunny. Additionally, Figure 5d illustrates an image with a low smoke con-
centration where properties such as the edges of the smoke, texture, and color are not
prominently discernible. Generally, the variation in smoke appearance and quantity in
natural environments poses a challenge for conventional smoke detection systems. Con-
sequently, the development of a wildfire smoke detection method capable of effectively
identifying diverse smoke forms originating from natural sources is crucial.
The effective performance of a deep learning model hinges on the availability of a
substantial quantity of well-labeled training data. However, achieving reliable outcomes
for wildfire smoke detection using such datasets can prove challenging due to issues
such as overfitting, class imbalance, or insufficient data. Overfitting, characterized by a
model’s failure to accurately capture visual patterns, is a potential concern. To address
this, image data augmentation, involving the manipulation and reuse of existing images
to enhance model accuracy, was employed as a remedy. Insights garnered from pertinent
literature [55,56] underscore the significance of geometric modifications, encompassing
flips and rotations, as valuable techniques for enhancing image data. By employing
strategies such as rotation and horizontal flips [57,58], the forest fire smoke detection
dataset was augmented experimentally, leading to an increase in the number of images.
The performance of CNN models is notably responsive to the quantity and quality of image
datasets utilized for training purposes.
Several modifications were introduced to each initial fire image to enhance the model’s
capacity for generalization across the spectrum of preceding training images, enabling
it to assimilate insights from a more extensive array of scenarios. These adaptations
encompassed actions such as horizontal flipping and counterclockwise rotations of 60 and
120 degrees. Moreover, the training dataset was enriched by integrating images capturing
non-smoke scenarios that share similarities with the environment, such as mountainous
terrains, cloud formations, fog, and other comparable scenes. This initiative was undertaken
to mitigate the occurrence of false positives.
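A small sketch of the augmentation pipeline described above is given below, using torchvision; the function name and file path are hypothetical, and only the transformations named in the text (horizontal flip, 60° and 120° counterclockwise rotations) are applied.

```python
import torchvision.transforms.functional as TF
from PIL import Image

def augment(image: Image.Image) -> list[Image.Image]:
    """Return the augmented variants generated from one training image."""
    return [
        TF.hflip(image),                              # horizontal flip
        TF.rotate(image, angle=60, expand=True),      # 60 degrees counterclockwise
        TF.rotate(image, angle=120, expand=True),     # 120 degrees counterclockwise
    ]

# Example usage (file name is a placeholder):
# img = Image.open("smoke_0001.jpg")
# augmented = augment(img)
```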
To achieve our research goals, a dataset comprising 6000 images was utilized for
the purpose of detecting forest fire smoke. This dataset was partitioned into a training
subset containing 4800 images and a separate test subset comprising 1200 images. Only the
training subset underwent data augmentation procedures, aiming to augment its volume.
As outlined in Table 2, this approach led to a cumulative count of 30,000 images at our disposal for
the task of identifying forest fire smoke.
4. Experimental Results
This section provides an elaborate description of the hyperparameter settings, the
utilized test dataset, the experimental configuration, and the validation process employed
to measure the effectiveness of the improved YOLOv8 model in identifying wildfire smoke
in UAV photos. To ensure the reliability of the proposed methodology, all experiments were
conducted under consistent hardware conditions. The experimentation was carried out on
a self-assembled computer system with specific specifications, including Nvidia GeForce
1080 Ti graphics processing units, 32 GB of RAM, and a 9-core CPU running at 4.90 GHz [59],
as specified in Table 3. The input images for the enhanced YOLOv8 model were drawn from
a forest fire smoke dataset, each resized to dimensions of 640 × 640 pixels. The comprehen-
sive evaluation encompasses a diverse range of facets, covering the experimental setup and
design, YOLOv8 performance analysis, method impact assessment, model comparisons,
ablation study, and visualization results. The table displaying the parameters utilized during
the training of the model for detecting forest smoke has been included as Table 4 in the
manuscript. This provides a clear overview of the training settings and configuration for
this specific task.
$\mathrm{Precision}_{C_{ij}} = \dfrac{TP_{C_{ij}}}{TP_{C_{ij}} + FP_{C_{ij}}}$, (19)

$\mathrm{Recall}_{C_{ij}} = \dfrac{TP_{C_{ij}}}{TP_{C_{ij}} + FN_{C_{ij}}}$, (20)

The quantity of accurately identified smoke regions is denoted as $TP_{C_{ij}}$ (true positives),
while instances of false positives stemming from the misclassification of non-smoke regions as smoke
are indicated as $FP_{C_{ij}}$ (false positives). False negatives manifest when authentic smoke regions
are erroneously classified as non-smoke regions, and they are denoted as $FN_{C_{ij}}$ (false negatives).
The computation of the average precision ($AP_{C_{ij}}$) was conducted using Equation (21) by
considering these aforementioned values.

$AP_{C_{ij}} = \dfrac{1}{m}\displaystyle\sum_{j=1}^{m} \mathrm{Precision}_{C_{ij}}$, (21)
The detection rate can be quantified as frames per second (FPS), representing the
average rate of detection in terms of images processed per second. This calculation is based
on the following formula:
$FPS = \dfrac{1}{t}$ (22)
Here, t denotes the average processing time for each image. This formula allows us to com-
pute the frames per second metric, which is a crucial measure of the model’s real-time
performance in processing images.
Additionally, we assessed the model’s complexity by quantifying the number of
floating-point operations per second (FLOPS), which serves as a metric for gauging the
computational workload of the model.
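The evaluation metrics of Equations (19)–(22) reduce to simple ratios once the per-class counts are available; the sketch below assumes those counts come from matching predicted and ground-truth smoke boxes at a fixed IoU threshold.

```python
def precision(tp: int, fp: int) -> float:
    # Equation (19): fraction of predicted smoke regions that are correct.
    return tp / (tp + fp) if (tp + fp) else 0.0

def recall(tp: int, fn: int) -> float:
    # Equation (20): fraction of ground-truth smoke regions that are found.
    return tp / (tp + fn) if (tp + fn) else 0.0

def average_precision(precisions: list[float]) -> float:
    # Equation (21): mean of the per-threshold precision values.
    return sum(precisions) / len(precisions) if precisions else 0.0

def fps(avg_seconds_per_image: float) -> float:
    # Equation (22): images processed per second.
    return 1.0 / avg_seconds_per_image
```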
achieves noteworthy results, demonstrating an average precision of 78.5% for small objects
and an impressive 92.6% AP for large objects. Typically, single-stage object detectors
tend to exhibit higher precision results compared to multiple-stage object detectors. As
depicted in Table 7, versions of the YOLO object detector [28,70] achieve the second and
third best AP results, registering scores of 76.1% and 75.2%, respectively. In contrast, single-
stage detectors such as M2Det [72] and FSAF [71] demonstrate comparatively lower AP
performance, with 60.2% and 60.5% in the results, respectively.
Table 6. Comparison results between the proposed method and multiple-stage object detectors.
Table 7. Comparison results between the proposed method and single-stage object detectors.
Figure 6. Example of qualitative evaluation of the forest fire smoke detection model: (a) large-size
smoke; (b) small-size smoke.
Numerous methodologies outlined in the existing literature have encountered challenges in
effectively detecting smoke from minor wildfire incidents in images. To address this, we curated a
collection of photographs capturing forest fire smoke on varying scales, aiming to augment the dataset
and enhance the precision of smoke detection. In Figure 6b, smoke images characterized by smaller
dimensions are showcased. In order to identify diminutive moving entities while retaining intricate
attributes, we adopted a strategy influenced by previous work [9]. This approach involves
amalgamating a feature map originating from a preceding layer with a high-scale feature map. The
extensive feature map holds the capacity to discern smoke pixels exhibiting diverse dimensions, as it
combines positional information from lower strata with intricate characteristics derived from upper layers.
Figure 6 visually illustrates the efficacy of the proposed methodology for forest fire smoke
identification, employing the enhanced YOLOv8 model, in a diverse array of forest backgrounds. The
robustness of the proposed technique underwent verification through assessments involving both
substantial and minute smoke images. Timely detection of smoke is pivotal for forest fire prevention
and containment efforts. Even a minor hint of smoke can activate a catastrophic forest fire if left
unchecked, endangering human lives, natural resources, and ecosystems. Moreover, the proposed
approach demonstrates remarkable precision in detecting minute smoke patches within images.
The outcomes of our study demonstrate the effective capacity of the proposed method to
significantly reduce instances of false detections. This efficacy translates to expedited suppression
and prompt response durations across a spectrum of forest fire smoke scenarios, irrespective of their
orientation, morphology, or scale. Traditional visual smoke and fire detection systems tend to
misclassify slight amounts of smoke sharing analogous color and intensity attributes with the
surrounding environment as actual smoke.
4.4. Ablation Study
In order to conduct ablation analyses aimed at evaluating the efficacy of different bounding box
regression loss modules, we substituted the WIoU loss module with the Generalized-IoU (GIoU),
Distance-IoU (DIoU), and Complete-IoU (CIoU) loss modules. The GIoU loss was introduced as a
remedy for the deficiencies observed in the original IoU loss. In comparison to the IoU loss, the GIoU
loss exhibits enhanced dynamic behavior, enabling it to capture the spatial arrangement between two
bounding boxes even when the IoU is equal to zero. However, the GIoU loss is not without its
limitations. For example, in scenarios where a containment relationship exists between two bounding
boxes, the GIoU loss regresses to the IoU loss, failing to discern the relative positioning of the boxes.
Furthermore, in cases where a significant vertical directional disparity occurs between the two boxes,
the GIoU loss demonstrates instability, potentially impeding convergence during the optimization
process. The DIoU loss, introduced as an extension of
the IoU loss, incorporates a supplementary penalty term related to the distance between the
centers of two bounding boxes. This inclusion facilitates faster model convergence during
optimization. While the DIoU loss does alleviate the gradual convergence issue associated
with the GIoU loss to some degree, it retains limitations in accurately characterizing
the overlap information between the two bounding boxes. Furthermore, even with the
DIoU loss, when the center points of the two boxes coincide entirely, both the GIoU and
DIoU losses revert to the IoU loss. The CIoU loss, an enhanced version of the DIoU loss,
integrates the aspect ratio characteristics of two bounding boxes. This augmentation enables
a more accurate representation of the spatial distance and alignment between the boxes,
consequently advancing the effectiveness and efficiency of regression. Nevertheless, it’s
worth noting that the aspect ratios employed in the CIoU loss are relative measurements,
introducing a certain level of inherent uncertainty.
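For reference, a minimal sketch of the IoU and GIoU quantities discussed above is given below; DIoU and CIoU extend the same computation with center-distance and aspect-ratio penalty terms.

```python
def iou_and_giou(box_a, box_b):
    """Return (IoU, GIoU) for two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    iou = inter / (union + 1e-9)
    # Smallest enclosing box; GIoU penalizes the empty area it contains.
    cw = max(box_a[2], box_b[2]) - min(box_a[0], box_b[0])
    ch = max(box_a[3], box_b[3]) - min(box_a[1], box_b[1])
    enclose = cw * ch
    giou = iou - (enclose - union) / (enclose + 1e-9)
    return iou, giou
```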
In order to ascertain the effectiveness of the improved algorithm, the present research
integrated the WIoUv3 as the bounding box regression loss within the YOLOv8 model
and conducted a comprehensive analysis using the custom smoke dataset. The outcomes,
quantified through metrics such as AP, AP50, AP75, APS, APM, and APL, are presented in
Table 8 for evaluation purposes.
Table 8. Comparison results of the ablation study for bounding box regression.
Ablation studies have demonstrated that despite the robustness of the YOLOv8 object
detection model, its performance can be suboptimal in certain scenarios. These findings
suggest that the integration of GSConv and BiFormer into the network architecture of
YOLOv8 could lead to substantial improvements in model accuracy.
6. Conclusions
The challenge of achieving robust performance in wildfire smoke detection algorithms
arises from the lack of suitable training images, leading to complications such as overfitting
and data imbalance. In this study, we present an improved YOLOv8 model customized for
wildfire smoke detection under complicated forest conditions. As shown in Table 9, these
improvements, which include features such as GSConv and BiFormer, lead to remarkable
results with an AP of 79.4%, an AP50 of 87.1%, and an AP75 of 82.4%. Consequently, the
improvements contribute to an improved AP, AP50, and AP75, representing increases
of 3.3%, 2.8%, and 5%, respectively. In the ablation analysis focused on bounding box
regression, the consistently superior performance of WIoUv3 is evident with an AP50 of
85.1%, outperforming GIoU and DIoU with AP50 values of 84.6% and 84.5%, respectively.
The experimental results highlight that the optimized YOLOv8 model outperforms both
the state-of-the-art models and the multilevel models for object detection on the specific
smoke image dataset, achieving an APS of 71.3% and an APL of 92.6%, respectively, while the baseline
YOLOv8 achieves the second-best performance on AP75 and APL with 77.4% and 89.3%, respectively.
Conventional wildland fire smoke detection sensors are reaching their limits in terms of the limited
area they can cover and their ability to detect multiple fires simultaneously. The refined YOLOv8
approach alleviates these limitations and enables wildfire smoke detection with geographic and
material attributes.
Enhancing the diversity of wildfire smoke pictures is critical for advances in wildfire
smoke detection in natural environments. Thus, our prospective study will concentrate on
collecting a variety of images of smoke from wildfires and using techniques to improve
these images. We will also look for ways to speed up the detection process without losing
accuracy by making the model smaller. In addition, the development of robust algorithms
for use in real time under different environmental conditions is needed. Furthermore, the
integration of multimodal data sources, such as satellite imagery and weather data, can
improve the accuracy and reliability of recognition systems. Emphasizing these aspects
would not only improve early detection of wildfires but also contribute to effective disaster
mitigation and management strategies, thereby protecting ecosystems and human lives.
Author Contributions: Conceptualization, S.N.S. and M.M.; methodology, S.N.S., A.A., M.M., O.D.
and Y.-I.C.; software, S.N.S., A.A., M.M. and O.D.; validation, S.N.S., A.A., M.M. and O.D.; formal
analysis, A.A., M.M. and O.D.; investigation, S.N.S., A.A., M.M. and O.D.; resources, S.N.S., A.A. and
M.M.; data curation, S.N.S., A.A., M.M. and O.D.; writing—original draft preparation, S.N.S., A.A.,
M.M. and O.D.; writing—review and editing, M.M., O.D. and Y.-I.C.; visualization, A.A. and M.M.;
supervision, O.D. and Y.-I.C.; project administration, Y.-I.C.; funding acquisition, Y.-I.C. All authors
have read and agreed to the published version of the manuscript.
Funding: This study was funded by the Korea Agency for Technology and Standards in 2022 (project
numbers K_G012002073401 and K_G012002236201) and by the Gachon University research fund
of 202208820001.
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: Not applicable.
Conflicts of Interest: The authors declare no conflict of interest. The funders had no role in the design
of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or
in the decision to publish the results.
References
1. Hoover, K.; Hanson, L.A. Wildfire Statistics; Congressional Research Service (CRS) in Focus; Congressional Research Service (CRS):
Washington, DC, USA, 2023.
2. Xu, X.; Li, F.; Lin, Z.; Song, X. Holocene fire history in China: Responses to climate change and human activities. Sci. Total Environ.
2020, 753, 142019. [CrossRef]
3. Abdusalomov, A.B.; Islam, B.M.S.; Nasimov, R.; Mukhiddinov, M.; Whangbo, T.K. An improved forest fire detection method
based on the detectron2 model and a deep learning approach. Sensors 2023, 23, 1512. [CrossRef]
4. Hu, Y.; Zhan, J.; Zhou, G.; Chen, A.; Cai, W.; Guo, K.; Hu, Y.; Li, L. Fast forest fire smoke detection using MVMNet. Knowl.-Based
Syst. 2022, 241, 108219. [CrossRef]
5. Mukhiddinov, M.; Abdusalomov, A.B.; Cho, J. Automatic Fire Detection and Notification System Based on Improved YOLOv4 for
the Blind and Visually Impaired. Sensors 2022, 22, 3307. [CrossRef] [PubMed]
6. Avazov, K.; Mukhiddinov, M.; Makhmudov, F.; Cho, Y.I. Fire detection method in smart city environments using a deep-learning-
based approach. Electronics 2021, 11, 73. [CrossRef]
7. Zhang, F.; Zhao, P.; Xu, S.; Wu, Y.; Yang, X.; Zhang, Y. Integrating multiple factors to optimize watchtower deployment for wildfire
detection. Sci. Total Environ. 2020, 737, 139561. [CrossRef]
8. Yao, J.; Raffuse, S.M.; Brauer, M.; Williamson, G.J.; Bowman, D.M.; Johnston, F.H.; Henderson, S.B. Predicting the minimum
height of forest fire smoke within the atmosphere using machine learning and data from the CALIPSO satellite. Remote Sens.
Environ. 2018, 206, 98–106. [CrossRef]
9. Mukhiddinov, M.; Abdusalomov, A.B.; Cho, J. A Wildfire Smoke Detection System Using Unmanned Aerial Vehicle Images Based
on the Optimized YOLOv5. Sensors 2022, 22, 9384. [CrossRef] [PubMed]
10. Fernández-Berni, J.; Carmona-Galán, R.; Martínez-Carmona, J.F.; Rodríguez-Vázquez, Á. Early forest fire detection by vision-
enabled wireless sensor networks. Int. J. Wildland Fire 2012, 21, 938. [CrossRef]
11. Ullah, F.; Ullah, S.; Naeem, M.R.; Mostarda, L.; Rho, S.; Cheng, X. Cyber-threat detection system using a hybrid approach of
transfer learning and multi-model image representation. Sensors 2022, 22, 5883. [CrossRef] [PubMed]
12. Abdusalomov, A.B.; Mukhiddinov, M.; Kutlimuratov, A.; Whangbo, T.K. Improved Real-Time Fire Warning System Based on
Advanced Technologies for Visually Impaired People. Sensors 2022, 22, 7305. [CrossRef]
13. Maruta, H.; Nakamura, A.; Kurokawa, F. A new approach for smoke detection with texture analysis and support vector machine.
In Proceedings of the International Symposium on Industrial Electronics, Bari, Italy, 4–7 July 2010; pp. 1550–1555.
14. Filonenko, A.; Hernández, D.C.; Jo, K.H. Fast smoke detection for video surveillance using CUDA. IEEE Trans. Ind. Inform. 2017,
14, 725–733. [CrossRef]
15. Tao, H.; Lu, X. Smoke vehicle detection based on multi-feature fusion and hidden Markov model. J. Real-Time Image Process. 2019,
32, 1072–1078.
16. Zhang, Q.X.; Lin, G.H.; Zhang, Y.M.; Xu, G.; Wang, J.J. Wildland Forest Fire Smoke Detection Based on Faster R-CNN using
Synthetic Smoke Images. Procedia Eng. 2018, 211, 441–446. [CrossRef]
17. Qiang, X.; Zhou, G.; Chen, A.; Zhang, X.; Zhang, W. Forest fire smoke detection under complex backgrounds using TRPCA and
TSVB. Int. J. Wildland Fire 2021, 30, 329–350. [CrossRef]
18. Filonenko, A.; Kurnianggoro, L.; Jo, K.H. Comparative study of modern convolutional neural networks for smoke detection on
image data. In Proceedings of the 2017 10th International Conference on Human System Interactions (HSI), Ulsan, Republic of
Korea, 17–19 July 2017; pp. 64–68.
19. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556.
20. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. In Proceedings of the
Advances in Neural Information Processing Systems, Lake Tahoe, NV, USA, 3–6 December 2012; pp. 1097–1105.
21. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
22. Yuan, F.; Shi, J.; Xia, X.; Fang, Y.; Fang, Z.; Mei, T. High-order local ternary patterns with locality preserving projection for smoke
detection and image classification. Inf. Sci. 2016, 372, 225–240. [CrossRef]
23. Li, J.; Zhou, G.; Chen, A.; Wang, Y.; Jiang, J.; Hu, Y.; Lu, C. Adaptive linear feature-reuse network for rapid forest fire smoke
detection model. Ecol. Inform. 2022, 68, 101584. [CrossRef]
24. Pan, J.; Ou, X.; Xu, L. A Collaborative Region Detection and Grading Framework for Forest Fire Smoke using weakly Supervised
Fine Segmentation and Lightweight Faster-RCNN. Forests 2021, 12, 768. [CrossRef]
25. Li, T.; Zhao, E.; Zhang, J.; Hu, C. Detection of wildfire smoke images based on a densely dilated convolutional network. Electronics
2019, 8, 1131. [CrossRef]
26. Kanand, T.; Kemper, G.; König, R.; Kemper, H. Wildfire detection and disaster monitoring system using UAS and sensor fusion
technologies. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2020, 43, 1671–1675. [CrossRef]
27. Rahman, E.U.; Khan, M.A.; Algarni, F.; Zhang, Y.; Irfan Uddin, M.; Ullah, I.; Ahmad, H.I. Computer vision-based wildfire smoke
detection using UAVs. Math. Probl. Eng. 2021, 2021, 9977939. [CrossRef]
28. Jocher, G. YOLOv8. Ultralytics: Github. 2023. Available online: https://github.com/ultralytics/ultralytics (accessed on 10 May
2023).
29. Tong, Z.; Chen, Y.; Xu, Z.; Yu, R. Wise-IoU: Bounding Box Regression Loss with Dynamic Focusing Mechanism. arXiv 2023,
arXiv:2301.10051.
30. Zhu, L.; Wang, X.; Ke, Z.; Zhang, W.; Lau, R.W. BiFormer: Vision Transformer with Bi-Level Routing Attention. In Proceedings of the
IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 10323–10333.
31. Li, H.; Li, J.; Wei, H.; Liu, Z.; Zhan, Z.; Ren, Q. Slim-neck by GSConv: A better design paradigm of detector architectures for
autonomous vehicles. arXiv 2022, arXiv:2206.02424.
32. Töreyin, B.U.; Dedeoğlu, Y.; Güdükbay, U.; Cetin, A.E. Computer vision based method for real-time fire and flame detection.
Pattern Recognit. Lett. 2006, 27, 49–58. [CrossRef]
33. Chen, T.H.; Wu, P.H.; Chiou, Y.C. An early fire-detection method based on image processing. In Proceedings of the 2004
International Conference on Image Processing (ICIP'04), Singapore, 24–27 October 2004; pp. 1707–1710.
34. Dang-Ngoc, H.; Nguyen-Trung, H. Aerial forest fire surveillance-evaluation of forest fire detection model using aerial videos. In
Proceedings of the 2019 International Conference on Advanced Technologies for Communications (ATC), Hanoi, Vietnam, 17–19
October 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 142–148.
35. Ghosh, R.; Kumar, A. A hybrid deep learning model by combining convolutional neural network and recurrent neural network
to detect forest fire. Multimed. Tools Appl. 2022, 81, 38643–38660. [CrossRef]
36. Sankarasubramanian, P.; Ganesh, E.N. Artificial Intelligence-Based Detection System for Hazardous Liquid Metal Fire. In
Proceedings of the 2021 8th International Conference on Computing for Sustainable Global Development (INDIACom), New
Delhi, India, 17–19 March 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 1–6.
37. Chen, Y.; Xu, W.; Zuo, J.; Yang, K. The fire recognition algorithm using dynamic feature fusion and IV-SVM classifier. Clust.
Comput. 2019, 22, 7665–7675. [CrossRef]
38. Xie, Y.; Zhu, J.; Cao, Y.; Zhang, Y.; Feng, D.; Zhang, Y.; Chen, M. Efficient video fire detection exploiting motion-flicker-based
dynamic features and deep static features. IEEE Access 2020, 8, 81904–81917. [CrossRef]
39. Abdusalomov, A.; Baratov, N.; Kutlimuratov, A.; Whangbo, T.K. An improvement of the fire detection and classification method
using YOLOv3 for surveillance systems. Sensors 2021, 21, 6519. [CrossRef]
40. Khan, S.; Khan, A. Ffirenet: Deep learning based forest fire classification and detection in smart cities. Symmetry 2022, 14, 2155.
[CrossRef]
41. Jeon, M.; Choi, H.S.; Lee, J.; Kang, M. Multi-scale prediction for fire detection using convolutional neural network. Fire Technol.
2021, 57, 2533–2551. [CrossRef]
42. Norkobil Saydirasulovich, S.; Abdusalomov, A.; Jamil, M.K.; Nasimov, R.; Kozhamzharova, D.; Cho, Y.I. A YOLOv6-based
improved fire detection approach for smart city environments. Sensors 2023, 23, 3161. [CrossRef]
43. Khan, A.; Khan, S.; Hassan, B.; Zheng, Z. CNN-based smoker classification and detection in smart city application. Sensors 2022,
22, 892. [CrossRef] [PubMed]
44. Talaat, F.M.; ZainEldin, H. An improved fire detection approach based on YOLO-v8 for smart cities. Neural Comput. Appl. 2023, 35,
20939–20954. [CrossRef]
45. Liu, G.; Yuan, H.; Huang, L. A fire alarm judgment method using multiple smoke alarms based on Bayesian estimation. Fire Saf. J.
2023, 136, 103733. [CrossRef]
46. Redmon, J.; Farhadi, A. YOLOv3: An Incremental Improvement. arXiv 2018, arXiv:1804.02767.
47. He, K.; Zhang, X.; Ren, S.; Sun, J. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition. IEEE Trans.
Pattern Anal. Mach. Intell. 2015, 37, 1904–1916. [CrossRef]
48. Liu, S.; Qi, L.; Qin, H.; Shi, J.; Jia, J. Path Aggregation Network for Instance Segmentation. In Proceedings of the IEEE Conference
on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 8759–8768.
49. Zheng, Z.; Wang, P.; Liu, W.; Li, J.; Ye, R.; Ren, D. Distance-IoU loss: Faster and better learning for bounding box regression. In
Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; pp. 12993–13000.
50. Li, X.; Wang, W.; Wu, L.; Chen, S.; Hu, X.; Li, J.; Tang, J.; Yang, J. Generalized Focal Loss: Learning Qualified and Distributed
Bounding Boxes for Dense Object Detection. arXiv 2020, arXiv:2006.04388.
51. Feng, C.; Zhong, Y.; Gao, Y.; Scott, M.R.; Huang, W. TOOD: Task-Aligned One-Stage Object Detection. In Proceedings of the 2021
IEEE International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; pp. 3490–3499.
52. High Performance Wireless Research and Education Network. Education Network University of California San Diego. HPWREN
Dataset. 2023. Available online: http://hpwren.ucsd.edu/HPWREN-FIgLib/ (accessed on 12 June 2023).
53. Kim, S.-Y.; Muminov, A. Forest Fire Smoke Detection Based on Deep Learning Approaches and Unmanned Aerial Vehicle Images.
Sensors 2023, 23, 5702. [CrossRef]
54. Jeong, C.; Jang, S.-E.; Na, S.; Kim, J. Korean Tourist Spot Multi-Modal Dataset for Deep Learning Applications. Data 2019, 4, 139.
[CrossRef]
55. Mukhiddinov, M.; Muminov, A.; Cho, J. Improved Classification Approach for Fruits and Vegetables Freshness Based on Deep
Learning. Sensors 2022, 22, 8192. [CrossRef] [PubMed]
56. Tang, Y.; Li, B.; Liu, M.; Chen, B.; Wang, Y.; Ouyang, W. Autopedestrian: An automatic data augmentation and loss function
search scheme for pedestrian detection. IEEE Trans. Image Proc. 2021, 30, 8483–8496. [CrossRef] [PubMed]
57. Seydi, S.T.; Saeidi, V.; Kalantar, B.; Ueda, N.; Halin, A.A. Fire-Net: A deep learning framework for active forest fire detection. J.
Sens. 2022, 2022, 8044390. [CrossRef]
58. Lin, J.; Lin, H.; Wang, F. STPM_SAHI: A Small-Target Forest fire detection model based on Swin Transformer and Slicing Aided
Hyper inference. Forests 2022, 13, 1603. [CrossRef]
59. Mukhiddinov, M.; Djuraev, O.; Akhmedov, F.; Mukhamadiyev, A.; Cho, J. Masked Face Emotion Recognition Based on Facial
Landmarks and Deep Learning Approaches for Visually Impaired People. Sensors 2023, 23, 1080. [CrossRef]
60. Mukhiddinov, M.; Jeong, R.G.; Cho, J. Saliency cuts: Salient region extraction based on local adaptive thresholding for image
information recognition of the visually impaired. Int. Arab J. Inf. Technol. 2020, 17, 713–720. [CrossRef]
61. Peng, C.; Xiao, T.; Li, Z.; Jiang, Y.; Zhang, X.; Jia, K.; Yu, G.; Sun, J. MegDet: A large mini-batch object detector. In Proceedings of
the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 6181–6189.
62. Girshick, R. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13
December 2015; pp. 1440–1448.
63. He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the IEEE International Conference on Computer
Vision, Venice, Italy, 22–29 October 2017; pp. 2961–2969.
64. Pang, J.; Chen, K.; Shi, J.; Feng, H.; Ouyang, W.; Lin, D. Libra R-CNN: Towards balanced learning for object detection. In Proceedings
of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 821–830.
65. Tychsen-Smith, L.; Petersson, L. DeNet: Scalable real-time object detection with directed sparse sampling. In Proceedings of the
IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 428–436.
66. Cai, Z.; Vasconcelos, N. Cascade R-CNN: Delving into high quality object detection. In Proceedings of the IEEE Conference on
Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 6154–6162.
67. Zhu, Y.; Zhao, C.; Wang, J.; Zhao, X.; Wu, Y.; Lu, H. CoupleNet: Coupling global structure with local parts for object detection. In
Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 4126–4134.
68. Bochkovskiy, A.; Wang, C.Y.; Liao, H.Y.M. YOLOv4: Optimal speed and accuracy of object detection. arXiv 2020, arXiv:2004.10934.
69. Jocher, G. YOLOv5. Ultralytics: Github. 2022. Available online: https://github.com/ultralytics/yolov5 (accessed on 12 June
2023).
70. Wang, C.Y.; Bochkovskiy, A.; Liao, H.Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object
detectors. arXiv 2022, arXiv:2207.02696.
71. Zhu, C.; He, Y.; Savvides, M. Feature selective anchor-free module for single-shot object detection. In Proceedings of the
IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 840–849.
72. Zhao, Q.; Sheng, T.; Wang, Y.; Tang, Z.; Chen, Y.; Cai, L.; Ling, H. M2Det: A single-shot object detector based on a multi-level
feature pyramid network. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1
February 2019; Volume 33, pp. 9259–9266.
73. Tan, M.; Pang, R.; Le, Q.V. EfficientDet: Scalable and efficient object detection. In Proceedings of the IEEE/CVF Conference on
Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 10781–10790.
74. Zhang, S.; Wen, L.; Bian, X.; Lei, Z.; Li, S.Z. Single-shot refinement neural network for object detection. In Proceedings of the IEEE
Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–23 June 2018; pp. 4203–4212.
75. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.-Y.; Berg, A.C. SSD: Single shot multibox detector. In Proceedings of
the European Conference on Computer Vision, Amsterdam, The Netherlands, 8–16 October 2016; pp. 21–37.
76. Ghiasi, G.; Lin, T.Y.; Le, Q.V. NAS-FPN: Learning scalable feature pyramid architecture for object detection. In Proceedings of the
IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 7036–7045.
77. Khan, S.; Muhammad, K.; Hussain, T.; Del Ser, J.; Cuzzolin, F.; Bhattacharyya, S.; Akhtar, Z.; De Albuquerque, V.H.C. Deepsmoke:
Deep learning model for smoke detection and segmentation in outdoor environments. Expert Syst. Appl. 2021, 182, 115125.
[CrossRef]
78. Deng, L.; Yang, M.; Li, T.; He, Y.; Wang, C. RFBNet: Deep multimodal networks with residual fusion blocks for RGB-D semantic
segmentation. arXiv 2019, arXiv:1907.00135.
79. Lin, T.Y.; Goyal, P.; Girshick, R.; He, K.; Dollár, P. Focal loss for dense object detection. In Proceedings of the IEEE International
Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2980–2988.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.