Location via proxy:   [ UP ]  
[Report a bug]   [Manage cookies]                

Hierarchical Framework for Optimizing Wildfire Surveillance and Suppression using Human-Autonomous Teaming

Mahdi Al-Husseini Stanford Intelligent Systems Laboratory, Stanford, CA, 94305 111MS Student, Aeronautics & Astronautics Dept., Stanford University, Stanford, CA 94305, Student Member. Kyle H. Wray Stanford Intelligent Systems Laboratory, Stanford, CA, 94305 222Visiting Scholar, Aeronautics & Astronautics Dept., Stanford University, Stanford, CA 94305, Member. and Mykel J. Kochenderfer 333Associate Professor, Aeronautics & Astronautics Dept., Stanford University, Stanford, CA 94305, Associate Fellow. Stanford Intelligent Systems Laboratory, Stanford, CA, 94305
Abstract

The integration of manned and unmanned aircraft can help improve wildfire response. Wildfire containment failures occur when resources available to first responders, who execute the initial stages of wildfire management referred to as the initial attack, are ineffective or insufficient. Initial attack surveillance and suppression models have linked action spaces and objectives, making their optimization computationally challenging. The initial attack may be formulated as a multi-agent partially observable Markov decision process (MPOMDP). We divide the initial attack MPOMDP into surveillance and suppression processes with their respective planners operating on different, but constant, time scales. A hierarchical framework iterates between surveillance and suppression planners while also providing collision avoidance. This framework is exemplified by a set of multi-rotor unmanned aircraft surveying an initial attack fire while a manned helicopter suppresses the fire with a water bucket. Wildfire-specific solver extensions are formulated to reduce the otherwise vast action spaces. The hierarchical framework outperforms firefighting techniques and a myopic baseline by up to 242% for moderate wildfires and 60% for rapid wildfires when simulated in abstracted and actual case studies. We also validate the early dispatching of additional suppression assets using regression models to ensure wildfire containment to thresholds established by wildfire agencies.

Nomenclature

x𝑥xitalic_x  = wildfire cell
F(x)𝐹𝑥F(x)italic_F ( italic_x )  = fuel remaining in x𝑥xitalic_x
β(x)𝛽𝑥\beta(x)italic_β ( italic_x )  = fuel reduced in x𝑥xitalic_x due to suppressive activities at T𝑇Titalic_T
(x)𝑥\mathcal{B}(x)caligraphic_B ( italic_x )  = whether x𝑥xitalic_x is on fire in belief map \mathcal{B}caligraphic_B
𝒰(x)𝒰𝑥\mathcal{U}(x)caligraphic_U ( italic_x )  = uncertainty regarding state of x𝑥xitalic_x in belief map \mathcal{B}caligraphic_B
R(x)𝑅𝑥R(x)italic_R ( italic_x )  = resources on x𝑥xitalic_x at time t𝑡titalic_t = 0
𝒟(x)𝒟𝑥\mathcal{D}(x)caligraphic_D ( italic_x )  = instantaneous destruction by wildfire on x𝑥xitalic_x in belief map \mathcal{B}caligraphic_B with respect to R(x)𝑅𝑥R(x)italic_R ( italic_x )
𝒲(x)𝒲𝑥\mathcal{W}(x)caligraphic_W ( italic_x )  = whether x𝑥xitalic_x is on fire in the actual wildfire map 𝒲𝒲\mathcal{W}caligraphic_W
t𝑡titalic_t, dtsubscript𝑑𝑡d_{t}italic_d start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT  = time-step such that t𝑡titalic_t = 0 is the incidence of the wildfire, duration of t𝑡titalic_t
k𝑘kitalic_k  = function of distance between wildfire and suppression source and manned aircraft cruising speeds
T𝑇Titalic_T, dTsubscript𝑑𝑇d_{T}italic_d start_POSTSUBSCRIPT italic_T end_POSTSUBSCRIPT  = time-step such that T𝑇Titalic_T = 1 denotes the first act of wildfire suppression, duration of T𝑇Titalic_T equal to kdt𝑘subscript𝑑𝑡k*d_{t}italic_k ∗ italic_d start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT
α𝛼\alphaitalic_α  = expended unit of fuel in cell x𝑥xitalic_x in t𝑡titalic_t
δ(x)𝛿𝑥\delta(x)italic_δ ( italic_x )  = probability that x𝑥xitalic_x is suppressed in T𝑇Titalic_T
P(x)𝑃𝑥P(x)italic_P ( italic_x )  = probability that x𝑥xitalic_x ignites in t𝑡titalic_t
p(x,x)𝑝𝑥superscript𝑥p(x,x^{\prime})italic_p ( italic_x , italic_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT )  = probability that xsuperscript𝑥x^{\prime}italic_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ignites x𝑥xitalic_x at t+1𝑡1t+1italic_t + 1
FTsubscript𝐹𝑇F_{T}italic_F start_POSTSUBSCRIPT italic_T end_POSTSUBSCRIPT, PTsubscript𝑃𝑇P_{T}italic_P start_POSTSUBSCRIPT italic_T end_POSTSUBSCRIPT  = set of x𝑥xitalic_x fully suppressed at T𝑇Titalic_T, set of x𝑥xitalic_x partially suppressed at T𝑇Titalic_T
γFsubscript𝛾𝐹\gamma_{F}italic_γ start_POSTSUBSCRIPT italic_F end_POSTSUBSCRIPT, γPsubscript𝛾𝑃\gamma_{P}italic_γ start_POSTSUBSCRIPT italic_P end_POSTSUBSCRIPT  = fuel reduced for a fully suppressed cell, fuel reduced for a partially suppressed cell
Vesubscript𝑉𝑒V_{e}italic_V start_POSTSUBSCRIPT italic_e end_POSTSUBSCRIPT  = set of x𝑥xitalic_x surveyed given action e𝑒eitalic_e
Rosubscript𝑅𝑜R_{o}italic_R start_POSTSUBSCRIPT italic_o end_POSTSUBSCRIPT  = surveillance reward for belief map updates
Pusubscript𝑃𝑢P_{u}italic_P start_POSTSUBSCRIPT italic_u end_POSTSUBSCRIPT  = penalty for unmanned aircraft proximity to one another
Dusubscript𝐷𝑢D_{u}italic_D start_POSTSUBSCRIPT italic_u end_POSTSUBSCRIPT  = distance between unmanned aircraft below which Pusubscript𝑃𝑢P_{u}italic_P start_POSTSUBSCRIPT italic_u end_POSTSUBSCRIPT is incurred
Pmsubscript𝑃𝑚P_{m}italic_P start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT  = penalty for unmanned aircraft proximity to manned aircraft
Dmsubscript𝐷𝑚D_{m}italic_D start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT  = distance between unmanned aircraft and manned aircraft below which Pmsubscript𝑃𝑚P_{m}italic_P start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT is incurred
Pisubscript𝑃𝑖P_{i}italic_P start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT  = penalty for distance between unmanned aircraft and IAO𝐼subscript𝐴𝑂IA_{O}italic_I italic_A start_POSTSUBSCRIPT italic_O end_POSTSUBSCRIPT
(Xu1,Yu1,Zu1)subscript𝑋subscript𝑢1subscript𝑌subscript𝑢1subscript𝑍subscript𝑢1(X_{u_{1}},Y_{u_{1}},Z_{u_{1}})( italic_X start_POSTSUBSCRIPT italic_u start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT , italic_Y start_POSTSUBSCRIPT italic_u start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT , italic_Z start_POSTSUBSCRIPT italic_u start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT )  = location in x𝑥xitalic_x, y𝑦yitalic_y, z𝑧zitalic_z of first unmanned aircraft U1subscript𝑈1U_{1}italic_U start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT
(Xu2,Yu2,Zu2)subscript𝑋subscript𝑢2subscript𝑌subscript𝑢2subscript𝑍subscript𝑢2(X_{u_{2}},Y_{u_{2}},Z_{u_{2}})( italic_X start_POSTSUBSCRIPT italic_u start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_POSTSUBSCRIPT , italic_Y start_POSTSUBSCRIPT italic_u start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_POSTSUBSCRIPT , italic_Z start_POSTSUBSCRIPT italic_u start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_POSTSUBSCRIPT )  = location in x𝑥xitalic_x, y𝑦yitalic_y, z𝑧zitalic_z of second unmanned aircraft U2subscript𝑈2U_{2}italic_U start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT
𝒮𝒮\mathcal{S}caligraphic_S  = surveillance state composed of (Xu1,Yu1,Zu1,Xu2,Yu2,Zu2)subscript𝑋subscript𝑢1subscript𝑌subscript𝑢1subscript𝑍subscript𝑢1subscript𝑋subscript𝑢2subscript𝑌subscript𝑢2subscript𝑍subscript𝑢2(X_{u_{1}},Y_{u_{1}},Z_{u_{1}},X_{u_{2}},Y_{u_{2}},Z_{u_{2}})( italic_X start_POSTSUBSCRIPT italic_u start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT , italic_Y start_POSTSUBSCRIPT italic_u start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT , italic_Z start_POSTSUBSCRIPT italic_u start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT , italic_X start_POSTSUBSCRIPT italic_u start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_POSTSUBSCRIPT , italic_Y start_POSTSUBSCRIPT italic_u start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_POSTSUBSCRIPT , italic_Z start_POSTSUBSCRIPT italic_u start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_POSTSUBSCRIPT )
e/E𝑒𝐸e/Eitalic_e / italic_E  = surveillance action / set of surveillance actions
\mathcal{H}caligraphic_H  = suppression state composed of (Xm,Ym,DRsubscript𝑋𝑚subscript𝑌𝑚𝐷𝑅X_{m},Y_{m},DRitalic_X start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT , italic_Y start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT , italic_D italic_R)
(Xm,Ym)subscript𝑋𝑚subscript𝑌𝑚(X_{m},Y_{m})( italic_X start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT , italic_Y start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT )  = location in x𝑥xitalic_x, y𝑦yitalic_y of water drop
DR𝐷𝑅DRitalic_D italic_R  = drop type
2|PT|superscript2subscript𝑃𝑇2^{|P_{T}|}2 start_POSTSUPERSCRIPT | italic_P start_POSTSUBSCRIPT italic_T end_POSTSUBSCRIPT | end_POSTSUPERSCRIPT  = all outcomes of the partially suppressed set (218superscript2182^{18}2 start_POSTSUPERSCRIPT 18 end_POSTSUPERSCRIPT)
a/A𝑎𝐴a/Aitalic_a / italic_A  = suppression action / set of suppression actions
RMsubscript𝑅𝑀R_{M}italic_R start_POSTSUBSCRIPT italic_M end_POSTSUBSCRIPT  = suppression reward for localized resource destruction minimization model
PMsubscript𝑃𝑀P_{M}italic_P start_POSTSUBSCRIPT italic_M end_POSTSUBSCRIPT  = suppression penalty for global resource destruction minimization model
AOA𝐴𝑂𝐴AOAitalic_A italic_O italic_A  = manned aircraft axis of advance
IAO𝐼subscript𝐴𝑂IA_{O}italic_I italic_A start_POSTSUBSCRIPT italic_O end_POSTSUBSCRIPT  = initial attack fire origin
\mathcal{R}caligraphic_R  = historical wildfire ring array

1 Introduction

\lettrine

Each year since 2000, an average of 70,600 wildfires have burned through a cumulative seven million acres, resulting in tens of billions of dollars in damages and thousands of lives lost. Wildfires in 2020 alone were responsible for more than 17,000 destroyed structures and 3,500 fatalities [1]. The initial attack occurs when a first set of dispatched assets responds to an incipient wildfire [2]. Initial attack effectiveness significantly influences the outcome of a wildfire’s eventual containment, placing notable burden on the part of first responders [3]. The California Department of Forestry and Fire Protection (CALFIRE) and others define initial attack response success as maintaining 95% of wildfires under 10 acres [4]. The integration of manned and unmanned aircraft as part of an initial attack is introduced to improve wildfire visibility and minimize initial attack fire damage, while further serving to ensure fire containment to 10 acres through the early requisition of additional suppression assets.
Although there exists ample literature on the optimized maneuver of distributed unmanned aircraft to conduct tactical wildfire management, there is less focus on hierarchical coordination of surveillance and suppression operations, and even less research integrating both manned and unmanned aircraft. To our knowledge, this paper is the first to do all three: optimize initial attack surveillance and suppression activities using a hierarchical framework of asynchronous planners to support a human-autonomous aircraft team.

Refer to caption
Figure 1: Manned and unmanned aircraft coordinating to survey and suppress an initial attack fire.

Human-autonomous teaming involves manned and unmanned agents collaborating to optimize mission execution [5]. Teaming in aviation applications frees flight crews to focus on complex mission-essential tasks, such as wildfire suppression, while allowing critical but routine secondary tasks, such as wildfire surveillance, to be automated [6]. Despite the opportunity presented by unmanned aircraft, manned aircraft are expected to play a critical role in wildfire response into the foreseeable future. CALFIRE, the premier firefighting aviation program in the world with more firefighting aircraft than any other, continues to rely almost exclusively on manned aircraft for surveillance and suppression [7]. Integrating low and medium-altitude unmanned aircraft into existing manned fleets provides immediately beneficial and realistic policies that may enhance wildfire control efforts [8]. However, special attention must be paid to industry-standard aviation tactics, techniques, and procedures, and airspace models, to ensure collision avoidance [9].

Multi-agent problems with asynchronous actions and high-dimensional state spaces can be made tractable by using a hierarchical approach to planning [10]. A valuable characteristic of heterogeneous multi-agent models is that action abstraction may be held constant by agent type. In our wildfire scenario, manned aircraft are provided optimal suppression locations but not guided to them. The process by which manned aircraft arrive at those suppression locations, through underlying primitive actions, is based on aerial firefighting standard operating procedures. Unmanned aircraft are explicitly guided to optimal surveillance locations through primitive actions that also satisfy collision avoidance requirements. A single manned aircraft action may occur over the course of multiple unmanned aircraft actions, resulting in an asynchronous action space. All aircraft share a single belief state, and possess each other’s location. We decompose the initial attack multi-agent partially observable Markov decision process (MPOMDP) into separate surveillance and suppression partially observable Markov decision processes (POMDPs) operating on different time scales with distinct but related reward functions. The POMDPs are simplified into Markov decision processes (MDPs) using shared belief and uncertainty maps. The MDPs are subsequently coupled in a manner exploiting the known structure of an initial attack and respecting collision avoidance requirements. This framework may be adapted to optimize other aerospace teaming applications related to attack, search and rescue, and medical evacuation operations.

This paper’s contributions include: (1) an adaptable, hierarchical framework for the manned and unmanned multi-agent, multi-objective, partially-observable initial attack problem featuring a high-dimensional state and action space, (2) several Monte Carlo tree search (MCTS) extensions informed by wildfire domain knowledge to include three suppression action space restriction mechanisms (ASRs), surveillance and suppression reward models, and an internal wildfire propagation model, (3) simulations demonstrating the efficacy of (1) and (2) given various wind conditions, elevation profiles, and resource topologies, and compared against firefighting techniques and a myopic baseline, and (4) a method for the early-dispatching of additional wildfire suppression assets to meet the 10 acre containment standard established by CALFIRE and other wildfire agencies. This framework, solver extensions, and methods validate the use of MDPs to optimize initial attack operations through the provision of collision avoidance, increased flexibility in suppression, and real-time wildfire insights.

2 Related Work

This section reviews related research that applies distributed aircraft coordination frameworks, human-autonomous teaming models, hierarchical structures for multi-agent systems, or some combination of the three, to the wildfire surveillance and/or suppression problem. Further considered is the use of reinforcement learning or probabilistic search techniques to make informed decisions. We especially highlight the technical contributions of Seraj et al., which exist at the confluence of distributed aircraft coordination and hierarchical structures for multi-agent systems, for the joint wildfire surveillance and suppression problem [11]. Similarities and differences with our paper are discussed.

Distributed Aircraft Coordination Frameworks

The academic literature provides several examples of distributed unmanned aircraft coordination frameworks that support either wildfire surveillance or suppression (but rarely both). Julian and Kochenderfer demonstrate how deep reinforcement learning may be used to coordinate multiple autonomous fixed-wing aircraft to accurately track wildfire front expansion [12]. Pham et al. similarly introduce a distributed framework for controlling a set of quadcopters to chart a wildfire’s progression while avoiding in-flight collisions [13]. Ghamry and Zhang divide the wildfire surveillance problem into three stages: search, confirmation, and observation. In the search stage, the unmanned aircraft team uses a leader-follower approach and moves in geometric formation to the wildfire. On arrival, the unmanned aircraft distribute in accordance with a generated elliptical fire front perimeter [14]. Griffith et al. compare the use of MCTS and mathematical optimization (MO) as applied to the allocation of wildfire suppression teams. MO models the MDP as a mixed-integer linear optimization problem, then applies a commercial solver to determine feasible solutions [15].

Human-Autonomous Teaming Models

At its simplest, the relationship between man and machine in a system may be categorized as human-in-the-loop, human-on-the-loop, or human-out-of-the-loop. Human-in-the-loop systems require human approval prior to action by the unmanned agent. Human-on-the-loop systems involve the human receiving updates from the unmanned agent, and the human in turn providing guidance to the unmanned agent. Human-out-of-the-loop systems have an unmanned agent that acts independently, albeit with initial or occasional guidance from the human [16]. There are surprisingly few research efforts that integrate man and machine in support wildfire management. The academic literature has largely prioritized fully autonomous teams conducting surveillance and suppression operations. There are exceptions. Bjurling et al. consider human-in-the-loop operator control over a swarm of unmanned wildfire surveillance aircraft [17]. Human-autonomous teaming may be defined as manned and unmanned agents working interdependently to accomplish a common goal [5]. Symbiotic autonomy is a form of teaming that accomplishes complex tasks by distributing sub-tasks and sharing information across multiple agents and agent groups [18]. In this framework, human and autonomous agents may act asynchronously to execute individual sub-tasks that enhance or inform each other’s efforts [19], and may be supported by a “smart environment” which provides shared understanding across the team and thereby improve performance. Seraj and Gombolay introduce a distributed control framework for a team of unmanned aircraft conducting human-centered surveillance of wildfires and transmitting high-fidelity fire front observations to on-ground firefighters [20].

Hierarchical Structures for Multi-Agent Systems

A hierarchical representation of tasks can enable decision making across different levels of temporal abstraction for complex, multi-agent systems. Macro-actions are temporally extended actions that can be incorporated into multi-agent decision problems to overcome high-dimension state spaces. The use of macro-actions can result in cooperative agents acting asynchronously. Menda et al. introduce an algorithm that modifies policy gradient estimators for macro-actions, permitting policy optimization in models where agents act asynchronously. They successfully apply this algorithm to the cooperative multi-agent wildfire suppression problem [21]. Hierarchical representation is also a framework on which to scale reinforcement learning to large domains [22]. The category of hierarchical reinforcement learning (HRL) algorithms is large and diverse. With the advent of deep reinforcement learning, HRL has branched into options [23] and sub-goal [24] methods. Similarly, there exist several state-of-the-art multi-agent deep reinforcement learning (MARL) algorithms to include MADDPG [25], COMA [26], and QMIX [27]. An in-depth discussion of HRL and MARL algorithms is beyond of the scope of this paper. We do however highlight the work of Xu et al. in developing the HierArchical Value dEcompositioN (HAVEN) framework for solving decentralized partially-observable Markov decision process (Dec-POMDP) problems, one of the few algorithms straddling both HRL and MARL [28].

Seraj et al. apply a hierarchical framework of decision-making modules to a perception-action composite team of heterogeneous aircraft both surveying and suppressing a wildfire [11]. They model the environment as a multi-agent partially-observable Semi-Markov decision process (MAPOSMDP). A high-level module assigns specialized surveillance tasks to a set of perception UAVs, while a low-level module coordinates a control and planning framework through which all UAVs execute their assigned tasks. A novel reinforcement learning algorithm is proposed that employs a variant of the State-Action-Reward-State-Action (SARSA) algorithm tailored to multi-agent problems. This hierarchical design lends itself to “prolific cooperation between perception and action agents”. We instead introduce a hierarchical framework that encourages cooperation between perception agents (unmanned surveillance aircraft), which in turn unilaterally support actions agents (manned suppression aircraft). Thus, the perception agents are a complement to the action agents. Action agents are modeled as non-cooperative interacting entities [29], and although provided suppression guidance, are otherwise uncontrolled. Ultimately, we seek to minimize the extent to which unmanned aircraft disrupt manned aircraft operations. We divide the initial attack MPOMDP formulation into surveillance and suppression decision processes operating on different time scales, and share information between their planners to ensure collision avoidance and fused information collection. Each sub-problem is resolved using MCTS with several domain-specific extensions. We emphasize the integration of manned aircraft conducting suppression operations, and demonstrate how certain tactics, techniques, and procedures employed by those aircraft can be supported, or at least avoided, by an unmanned aircraft network.

3 Problem Statement

We begin by formulating the initial attack problem as a multi-agent partially observable Markov decision process (MPOMDP). The MPOMDP is a generalization of the MDP in which multiple agents, each unable to fully observe the underlying world state, collaborate and communicate freely towards one or more shared objectives [30]. The initial attack MPOMDP is represented by =α,𝒮,𝒜,P,O,Ω,γ,R𝛼𝒮𝒜𝑃𝑂Ω𝛾𝑅\mathcal{M}=\langle\alpha,\mathcal{S,A},P,O,\Omega,\gamma,R\ranglecaligraphic_M = ⟨ italic_α , caligraphic_S , caligraphic_A , italic_P , italic_O , roman_Ω , italic_γ , italic_R ⟩, where α𝛼\alphaitalic_α is the number of agents, 𝒮𝒮\mathcal{S}caligraphic_S is the state space, 𝒜𝒜\mathcal{A}caligraphic_A is the action space, P𝑃Pitalic_P is the transition model, O𝑂Oitalic_O is the observation function, ΩΩ\Omegaroman_Ω is the observation space, γ𝛾\gammaitalic_γ is the discount factor, and R𝑅Ritalic_R is the reward function [31], [29]. Tailoring \mathcal{M}caligraphic_M to the initial attack, take the number of agents to equal the number of participating unmanned and manned aircraft, or α𝛼\alphaitalic_α = αusubscript𝛼𝑢\alpha_{u}italic_α start_POSTSUBSCRIPT italic_u end_POSTSUBSCRIPT + αmsubscript𝛼𝑚\alpha_{m}italic_α start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT. Each possible state stsuperscript𝑠𝑡s^{t}italic_s start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT \in 𝒮𝒮\mathcal{S}caligraphic_S at time t𝑡titalic_t includes 1.) unmanned aircraft i𝑖iitalic_i and manned aircraft j𝑗jitalic_j positions and 2.) the wildfire state, such that stsuperscript𝑠𝑡s^{t}italic_s start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT = [ptisuperscriptsubscriptabsent𝑖𝑡{}_{i}^{t}start_FLOATSUBSCRIPT italic_i end_FLOATSUBSCRIPT start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT, pjtsubscriptsuperscriptabsent𝑡𝑗{}^{t}_{j}start_FLOATSUPERSCRIPT italic_t end_FLOATSUPERSCRIPT start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT, Wt]. Action space 𝒜𝒜\mathcal{A}caligraphic_A is the set of all possible surveillance actions for unmanned aircraft and all possible suppression actions for manned aircraft, giving 𝒜𝒜\mathcal{A}caligraphic_A = (𝒜isubscript𝒜𝑖\mathcal{A}_{i}caligraphic_A start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ×\times××\times× 𝒜αusubscript𝒜subscript𝛼𝑢\mathcal{A}_{\alpha_{u}}caligraphic_A start_POSTSUBSCRIPT italic_α start_POSTSUBSCRIPT italic_u end_POSTSUBSCRIPT end_POSTSUBSCRIPT) ×\times× (𝒜jsubscript𝒜𝑗\mathcal{A}_{j}caligraphic_A start_POSTSUBSCRIPT italic_j end_POSTSUBSCRIPT ×\times××\times× 𝒜αmsubscript𝒜subscript𝛼𝑚\mathcal{A}_{\alpha_{m}}caligraphic_A start_POSTSUBSCRIPT italic_α start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT end_POSTSUBSCRIPT). The transition model P𝑃Pitalic_P assigns the probability of transitioning from existing state st1superscript𝑠𝑡1s^{t-1}italic_s start_POSTSUPERSCRIPT italic_t - 1 end_POSTSUPERSCRIPT by action a𝑎aitalic_a to state stsuperscript𝑠𝑡s^{t}italic_s start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT, with s𝒮P(st1,a,st)=1subscript𝑠𝒮𝑃superscript𝑠𝑡1𝑎superscript𝑠𝑡1\sum_{s\in\mathcal{S}}P(s^{t-1},a,s^{t})=1∑ start_POSTSUBSCRIPT italic_s ∈ caligraphic_S end_POSTSUBSCRIPT italic_P ( italic_s start_POSTSUPERSCRIPT italic_t - 1 end_POSTSUPERSCRIPT , italic_a , italic_s start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT ) = 1. The discount factor γ𝛾\gammaitalic_γ is a hyperparameter that balances short and long-term rewards. A larger γ𝛾\gammaitalic_γ prioritizes long-term rewards, whereas a smaller γ𝛾\gammaitalic_γ prioritizes short-term rewards. Observation function O𝑂Oitalic_O gives a probability distribution over all possible observations ω𝜔\omegaitalic_ω after taking action a𝑎aitalic_a resulting in state s𝑠sitalic_s. Observation space ΩΩ\Omegaroman_Ω includes the set of all possible partial wildfire observations by unmanned aircraft i𝑖iitalic_i at time t𝑡titalic_t, where ωitsuperscriptsubscript𝜔𝑖𝑡\omega_{i}^{t}italic_ω start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT = [Wtisuperscriptsubscriptabsent𝑖𝑡{}_{i}^{t}start_FLOATSUBSCRIPT italic_i end_FLOATSUBSCRIPT start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT], ωtsuperscript𝜔𝑡\omega^{t}italic_ω start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT = i=1αusubscriptsuperscriptsubscript𝛼𝑢𝑖1\cup^{\alpha_{u}}_{i=1}∪ start_POSTSUPERSCRIPT italic_α start_POSTSUBSCRIPT italic_u end_POSTSUBSCRIPT end_POSTSUPERSCRIPT start_POSTSUBSCRIPT italic_i = 1 end_POSTSUBSCRIPT ωitsuperscriptsubscript𝜔𝑖𝑡\omega_{i}^{t}italic_ω start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT, and ωtsuperscript𝜔𝑡\omega^{t}italic_ω start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT \in ΩΩ\Omegaroman_Ω. Each unmanned aircraft takes its own surveillance action for which it receives an individual observation. Observations are fused across all unmanned aircraft to attain shared observation ωtsuperscript𝜔𝑡\omega^{t}italic_ω start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT. The shared belief is a probability distribution over 𝒮𝒮\mathcal{S}caligraphic_S. We update the belief btsuperscript𝑏𝑡b^{t}italic_b start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT that the current state is stsuperscript𝑠𝑡s^{t}italic_s start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT, for each stsuperscript𝑠𝑡s^{t}italic_s start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT, using:

bt(st)=βO(ωt,st,at1)st1𝒮bt1(st1)P(st,at,st1)superscript𝑏𝑡superscript𝑠𝑡𝛽𝑂superscript𝜔𝑡superscript𝑠𝑡superscript𝑎𝑡1subscriptsuperscript𝑠𝑡1𝒮superscript𝑏𝑡1superscript𝑠𝑡1𝑃superscript𝑠𝑡superscript𝑎𝑡superscript𝑠𝑡1b^{t}(s^{t})=\beta O(\omega^{t},s^{t},a^{t-1})\sum_{s^{t-1}\in\mathcal{S}}b^{t% -1}(s^{t-1})P(s^{t},a^{t},s^{t-1})italic_b start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT ( italic_s start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT ) = italic_β italic_O ( italic_ω start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT , italic_s start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT , italic_a start_POSTSUPERSCRIPT italic_t - 1 end_POSTSUPERSCRIPT ) ∑ start_POSTSUBSCRIPT italic_s start_POSTSUPERSCRIPT italic_t - 1 end_POSTSUPERSCRIPT ∈ caligraphic_S end_POSTSUBSCRIPT italic_b start_POSTSUPERSCRIPT italic_t - 1 end_POSTSUPERSCRIPT ( italic_s start_POSTSUPERSCRIPT italic_t - 1 end_POSTSUPERSCRIPT ) italic_P ( italic_s start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT , italic_a start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT , italic_s start_POSTSUPERSCRIPT italic_t - 1 end_POSTSUPERSCRIPT ) (1)

where bt1superscript𝑏𝑡1b^{t-1}italic_b start_POSTSUPERSCRIPT italic_t - 1 end_POSTSUPERSCRIPT is the initial belief, ωtsuperscript𝜔𝑡\omega^{t}italic_ω start_POSTSUPERSCRIPT italic_t end_POSTSUPERSCRIPT is the new shared observation, and β𝛽\betaitalic_β is a normalizing constant [32]. Reward function R𝑅Ritalic_R returns the reward received when taking action a𝑎aitalic_a from state s𝑠sitalic_s. Rewards are jointly considered across α𝛼\alphaitalic_α aircraft.

The heterogeneity of the agent population suggests that the larger MPOMDP may be decomposed into smaller decision processes tailored to particular sub-tasks. We partition the initial attack agents into unmanned and manned agent groups, with each group possessing its own action space and a unique level of temporal abstraction. The initial attack itself delineates agent group responsibilities and defines agent group relationships. This permits the overarching MPOMDP to be divided into separable surveillance and suppression POMDPs with differing but linked actions spaces and reward models. The number of wildfire states is intractable at 210,000superscript2100002^{10,000}2 start_POSTSUPERSCRIPT 10 , 000 end_POSTSUPERSCRIPT, and a proxy in the form of a single modifiable belief map is introduced. The wildfire belief map size matches that of the actual wildfire state, and is updated in accordance with unmanned aircraft observations. Surveillance and suppression POMDPs are further simplified into MDPs by assuming the developed wildfire belief map is the actual wildfire state, which we find incurs an acceptable level of error. The resulting surveillance and suppression MDPs are building blocks in a hierarchical framework, and their associated planners repeat at a frequency specific to the wildfire environment. MDPs are P-Complete and finite horizon POMDP approximations are PSPACE complete, and the complexity of our approach is similar. This section introduces the stochastic wildfire propagation model, manned and unmanned aircraft dynamics, and state space formulations.

3.1 Wildfire Propagation

Refer to caption
Figure 2: The initial attack fire propagates based on fuel, winds, and terrain. Included in the initial attack environment are aerial agents surveying and suppressing the wildfire, and the resources that are subject to destruction.

The wildfire propagation problem has been the target of significant research activity, and the resulting models feature varying levels of complexity. We use a stochastic propagation model inspired by that of Bertsimas et al. [33], formulated to simulate the efficacy of the proposed hierarchical surveillance and suppression framework. The propagation model is modified to include the effects of aerial suppression, wind direction and strength, and terrain elevation [34]. Wind and terrain are critical environmental factors affecting not only wildfire spread but unmanned and manned aircraft activity [35]. In keeping with CALFIRE and other wildfire agencies, initial attack fire containment is defined as wildfire control within two hours or spread not greater than 10101010 acres. The wildfire model therefore consists of a roughly 10101010 acre square discretized into a 100×\times×100 grid of 2×\times×2 meter cells. Each cell x𝑥xitalic_x features a Boolean value representing whether x𝑥xitalic_x is actually on fire, 𝒲(x)𝒲𝑥\mathcal{W}(x)caligraphic_W ( italic_x ), a Boolean value representing whether x𝑥xitalic_x is believed to be on fire, (x)𝑥\mathcal{B}(x)caligraphic_B ( italic_x ), and an integer value representing the amount of fuel remaining in x𝑥xitalic_x, F(x)𝐹𝑥F(x)italic_F ( italic_x ). Fuel is defined as the amount of combustible material in a cell that contributes to fire behavior and effects [36].

The duration of surveillance time-step t𝑡titalic_t is dtsubscript𝑑𝑡d_{t}italic_d start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT, and is held constant at one minute. At t𝑡titalic_t, each burning cell can either 1.) continue burning and expend a unit of fuel α𝛼\alphaitalic_α, 2.) stop burning by virtue of having expended all available fuel such that Ft(x)subscript𝐹𝑡𝑥F_{t}(x)italic_F start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( italic_x ) = 0, or 3.) be partially or fully suppressed with probability δ(x)𝛿𝑥\delta(x)italic_δ ( italic_x ). Similarly, each non-burning cell can ignite with probability P𝑃Pitalic_P(x𝑥xitalic_x), which is a function of the number of neighboring burning cells, remaining fuel, wind strength and direction, and terrain elevation. More specifically, the probability that neighboring cell xsuperscript𝑥x^{\prime}italic_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ignites cell x𝑥xitalic_x at t+1𝑡1t+1italic_t + 1 is p(x,x)𝑝𝑥superscript𝑥p(x,x^{\prime})italic_p ( italic_x , italic_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ).The duration of suppression time-step T𝑇Titalic_T is dTsubscript𝑑𝑇d_{T}italic_d start_POSTSUBSCRIPT italic_T end_POSTSUBSCRIPT, equal to kdt𝑘subscript𝑑𝑡kd_{t}italic_k italic_d start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT, where k𝑘kitalic_k is a function of both the cruising speed of the suppression asset with and without load and the distance from the initial attack fire to a water replenishing source. There are five suppression actions, each with a different distributions of partially and fully suppressed wildfire grid cells. Notably, suppression affects not only the probability of ignition, but reduces fuel remaining as well [37]. The set of cells fully suppressed at T𝑇Titalic_T with zero probability of ignition is FTsubscript𝐹𝑇F_{T}italic_F start_POSTSUBSCRIPT italic_T end_POSTSUBSCRIPT, where γFsubscript𝛾𝐹\gamma_{F}italic_γ start_POSTSUBSCRIPT italic_F end_POSTSUBSCRIPT is fuel reduced when fully suppressed. The set of cells partially suppressed at T𝑇Titalic_T with probability of ignition compounded by Ppartialsubscript𝑃𝑝𝑎𝑟𝑡𝑖𝑎𝑙P_{partial}italic_P start_POSTSUBSCRIPT italic_p italic_a italic_r italic_t italic_i italic_a italic_l end_POSTSUBSCRIPT is PTsubscript𝑃𝑇P_{T}italic_P start_POSTSUBSCRIPT italic_T end_POSTSUBSCRIPT, where γPsubscript𝛾𝑃\gamma_{P}italic_γ start_POSTSUBSCRIPT italic_P end_POSTSUBSCRIPT is fuel reduced for a partially suppressed cell. The individual success probabilities of multiple suppressive actions on a single cell x𝑥xitalic_x are independent. The resulting wildfire propagation equations are therefore

Ft+1(x)={max(0, Ft(x)  α  β(x))if t(x)max(0, Ft(x)  β(x))otherwisesubscript𝐹𝑡1𝑥casesmax(0, Ft(x)  α  β(x))if t(x)max(0, Ft(x)  β(x))otherwiseF_{t+1}(x)=\begin{cases}\text{max(0, $F_{t}(x)$ $-$ $\alpha$ $-$ $\beta(x)$)}&% \text{if $\mathcal{B}_{t}(x)$}\\ \text{max(0, $F_{t}(x)$ $-$ $\beta(x)$)}&\text{otherwise}\end{cases}italic_F start_POSTSUBSCRIPT italic_t + 1 end_POSTSUBSCRIPT ( italic_x ) = { start_ROW start_CELL max(0, italic_F start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( italic_x ) - italic_α - italic_β ( italic_x ) ) end_CELL start_CELL if caligraphic_B start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( italic_x ) end_CELL end_ROW start_ROW start_CELL max(0, italic_F start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( italic_x ) - italic_β ( italic_x ) ) end_CELL start_CELL otherwise end_CELL end_ROW (2)
β(x)={γFif tmodk = 0 and x  FTγPif tmodk = 0 and x  PT0otherwise𝛽𝑥casessubscript𝛾𝐹if tmodk = 0 and x  FTsubscript𝛾𝑃if tmodk = 0 and x  PT0otherwise\beta(x)=\begin{cases}\text{$\gamma_{F}$}&\text{if $t\bmod k$ = 0 and x $\in$ % $F_{T}$}\\ \text{$\gamma_{P}$}&\text{if $t\bmod k$ = 0 and x $\in$ $P_{T}$}\\ 0&\text{otherwise}\end{cases}italic_β ( italic_x ) = { start_ROW start_CELL italic_γ start_POSTSUBSCRIPT italic_F end_POSTSUBSCRIPT end_CELL start_CELL if italic_t roman_mod italic_k = 0 and x ∈ italic_F start_POSTSUBSCRIPT italic_T end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL italic_γ start_POSTSUBSCRIPT italic_P end_POSTSUBSCRIPT end_CELL start_CELL if italic_t roman_mod italic_k = 0 and x ∈ italic_P start_POSTSUBSCRIPT italic_T end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL 0 end_CELL start_CELL otherwise end_CELL end_ROW (3)
P(x)={0Ft(x) = 0δ(x)if t(x) and Ft(x) > 0δ(x) (1  x (1  p(x,x) t(x)))if ¬t(x) and Ft(x) > 0𝑃𝑥cases0Ft(x) = 0𝛿𝑥if t(x) and Ft(x) > 0δ(x) (1  x (1  p(x,x) t(x)))if ¬t(x) and Ft(x) > 0P(x)=\begin{cases}0&\text{$F_{t}(x)$ $=$ 0}\\ \text{$\delta(x)$}&\text{if $\mathcal{B}_{t}(x)$ and $F_{t}(x)$ $>$ 0}\\ \text{$\delta(x)$ (1 $-$ $\prod_{x^{\prime}}$ (1 $-$ $p(x,x^{\prime})$ $% \mathcal{B}_{t}(x^{\prime})$))}&\text{if $\neg\mathcal{B}_{t}(x)$ and $F_{t}(x% )$ $>$ 0}\end{cases}italic_P ( italic_x ) = { start_ROW start_CELL 0 end_CELL start_CELL italic_F start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( italic_x ) = 0 end_CELL end_ROW start_ROW start_CELL italic_δ ( italic_x ) end_CELL start_CELL if caligraphic_B start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( italic_x ) and italic_F start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( italic_x ) > 0 end_CELL end_ROW start_ROW start_CELL italic_δ ( italic_x ) (1 - ∏ start_POSTSUBSCRIPT italic_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT end_POSTSUBSCRIPT (1 - italic_p ( italic_x , italic_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ) caligraphic_B start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( italic_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ) )) end_CELL start_CELL if ¬ caligraphic_B start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( italic_x ) and italic_F start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( italic_x ) > 0 end_CELL end_ROW (4)
δ(x)={0if tmodk = 0 and x  FTPpartialif tmodk = 0 and x  PT1otherwise𝛿𝑥cases0if tmodk = 0 and x  FTsubscript𝑃𝑝𝑎𝑟𝑡𝑖𝑎𝑙if tmodk = 0 and x  PT1otherwise\delta(x)=\begin{cases}0&\text{if $t\bmod k$ = 0 and x $\in$ $F_{T}$}\\ P_{partial}&\text{if $t\bmod k$ = 0 and x $\in$ $P_{T}$}\\ 1&\text{otherwise}\end{cases}italic_δ ( italic_x ) = { start_ROW start_CELL 0 end_CELL start_CELL if italic_t roman_mod italic_k = 0 and x ∈ italic_F start_POSTSUBSCRIPT italic_T end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL italic_P start_POSTSUBSCRIPT italic_p italic_a italic_r italic_t italic_i italic_a italic_l end_POSTSUBSCRIPT end_CELL start_CELL if italic_t roman_mod italic_k = 0 and x ∈ italic_P start_POSTSUBSCRIPT italic_T end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL 1 end_CELL start_CELL otherwise end_CELL end_ROW (5)

Wind and elevation kernels may be convolved with a grid of p(x,x)𝑝𝑥superscript𝑥p(x,x^{\prime})italic_p ( italic_x , italic_x start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ) to help propagate the wildfire in the direction of the wind or towards up-sloping terrain. A modifiable resource grid is included in the initial attack environment to associate value to areas of importance, such as housing communities in the midst of a forest. The initial attack model begins with an assignment of fuel quantity and terrain elevation to each grid cell. As shown in Fig. 3, fire is seeded at t𝑡titalic_t = 0 mintimes0minute0\text{\,}\mathrm{min}start_ARG 0 end_ARG start_ARG times end_ARG start_ARG roman_min end_ARG at a handful of cells in the middle of the wildfire grid and allowed to propagate at each consecutive t𝑡titalic_t. The simulation terminates when the initial attack becomes an escaped fire and additional suppression assets are introduced as a matter of policy; this occurs when t𝑡titalic_t = 120 mintimes120minute120\text{\,}\mathrm{min}start_ARG 120 end_ARG start_ARG times end_ARG start_ARG roman_min end_ARG or when the wildfire expands beyond the wildfire grid. The unmanned aircraft arrives five minutes after the initial attack fire begins, and the manned aircraft arrives 10 minutes after that. These estimates may be adjusted in the model depending on the proximity of assets to the wildfire.

Refer to caption
Figure 3: The initial attack timeline depicts the arrival time for unmanned and manned aircraft, the escaped fire transformation, and the calculation and execution of surveillance and suppression actions.

3.2 Unmanned Aircraft Dynamics

Two multi-rotor unmanned aircraft with identical characteristics are introduced to survey the initial attack fire. The unmanned aircraft operate in a 10×\times×10×\times×7 airspace gridworld consisting of 20 ×\times×20×\times×20 meter grid cells, and arrive on station at t𝑡titalic_t = 5 mintimes5minute5\text{\,}\mathrm{min}start_ARG 5 end_ARG start_ARG times end_ARG start_ARG roman_min end_ARG. The kinematics of each multi-rotor unmanned aircraft is abstracted in time, and simple gridworld commands, essentially primitive actions, translate the drone accordingly. More specifically, each drone may be translated up, down, left, right, ascend, descend, or hover in place. Translation may occur once per time-step t𝑡titalic_t, and observations are collected en route to the next grid cell. Each drone is limited to the confines of the airspace gridworld, and the action space is pruned at the edges accordingly. Although each unmanned aircraft acts independently, the surveillance MPOMDP is simplified into a POMDP by having the controller take joint actions; by doing so, the surveillance action space increases from 7 to 49. A multi-rotor design was selected over a fixed-wing design to enable hovering, and because multi-rotor aircraft typically possess the kind of high-quality camera control needed to survey a wildfire.
Each unmanned aircraft must balance coverage with capture. The more elevated the unmanned aircraft, the wider its field of view, but the less resolution it has on the wildfire at a given location. Strategically, this results in a lower altitude selected when the fire is condensed, and a higher altitude when the fire is dispersed. The problem becomes more complicated with the addition of a second drone and collision avoidance penalties. As will be demonstrated, the inclusion of multiple unmanned aircraft that can actuate on two or more axes results in unique emergent behaviors, to include dispersion, loitering, stacking, and circling. Each unmanned aircraft is penalized by Pusubscript𝑃𝑢P_{u}italic_P start_POSTSUBSCRIPT italic_u end_POSTSUBSCRIPT if within Dusubscript𝐷𝑢D_{u}italic_D start_POSTSUBSCRIPT italic_u end_POSTSUBSCRIPT meters of another unmanned aircraft.

3.3 Manned Aircraft Dynamics

Refer to caption
Figure 4: Five suppression action drop-types and their associated aircraft axis of advance. A line drop and point drop have fundamentally different on-ground suppression characteristics.

A rotary-wing manned aircraft, with characteristics similar to a S-70 Firehawk helicopter, provides initial attack suppression capabilities through a short-line 660 gallon water bucket. The manned aircraft must fetch water from the nearest adequate water replenishing source prior to each suppression action. The distance between the water replenishing source and the initial attack fire can vary considerably, but is here assumed to be 10 kmtimes10kilometer10\text{\,}\mathrm{km}start_ARG 10 end_ARG start_ARG times end_ARG start_ARG roman_km end_ARG. Suppression time-step T𝑇Titalic_T duration dTsubscript𝑑𝑇d_{T}italic_d start_POSTSUBSCRIPT italic_T end_POSTSUBSCRIPT is tied to the distance between the water replenishing source and initial attack fire, and cruise speed of the manned aircraft with and without load. For example, a S-70 with an unloaded cruise speed of 140 KIAS and a loaded cruise speed of 80 KIAS, with 10 kmtimes10kilometer10\text{\,}\mathrm{km}start_ARG 10 end_ARG start_ARG times end_ARG start_ARG roman_km end_ARG between the initial attack fire and water replenishing source, can reasonably expect to perform a drop once every five minutes (thus, dTsubscript𝑑𝑇d_{T}italic_d start_POSTSUBSCRIPT italic_T end_POSTSUBSCRIPT and k𝑘kitalic_k = 5, where k𝑘kitalic_k is the number of iterations that occur within T𝑇Titalic_T). Given an on station arrival at t𝑡titalic_t = 15 mintimes15minute15\text{\,}\mathrm{min}start_ARG 15 end_ARG start_ARG times end_ARG start_ARG roman_min end_ARG, the manned aircraft can therefore expect to perform 21 drops during the initial attack. These 21 water drops, in addition to efforts of ground-based suppression assets, determine whether an initial attack fire is contained or instead evolves into an escaped wildfire. Ground-based suppression activities are not considered in this paper.

The manned aircraft in an initial attack operates in accordance with best practices, in response to environmental factors, and often, per the guidance of a ground controller in the form of a fire truck, helitack crew, battalion chief, or hand crew [38]. Manned aircraft operations entail not only the placement of the water drop, but the type of drop, and aircraft axis of advance in light of winds, smoke, and other aircraft operating in the immediate vicinity [39]. This paper does not attempt to fully capture the decision making of a pilot in command of a firefighting aircraft, but rather, seeks to recognize the extent of that decision making, and provide supplemental guidance while ensuring supplemental assets (i.e. unmanned aircraft) do not restrict the manned aircraft’s range of movement. Water may be dropped on any cell in the 100×\times×100 wildfire grid. As shown in Fig. 4, there are five standard drops, each categorized as either a point drop or line drop; this results in a suppression action space of 50,000. The point drop occurs when a suppressing helicopter slows to a near-hover prior to dumping the contents of its water bucket. The line drop occurs when a suppressing helicopter maintains forward airspeed prior to dumping the contents of its water bucket. Point drops are condensed in nature and typically used to suppress a condensed fire area, whereas line drops distribute the spread of water or retardant and are ideal for the placement of wet-lines. Drop selection, bucket line length, and bucket volume, determines the on-ground suppression profile [40]. This paper’s model structure may be easily adapted to other aircraft and buckets of varying volumes and line lengths.
Simplifying assumptions include paralleling drop type and axis of advance, early identification of drop placement and type, and generalization of manned aircraft altitude and direction during the drop. The 200×\times×200 mmeter\mathrm{m}roman_m wildfire grid is small relative to the operational area of a manned aircraft which may travel more than 10 kmtimes10kilometer10\text{\,}\mathrm{km}start_ARG 10 end_ARG start_ARG times end_ARG start_ARG roman_km end_ARG to reach a water replenishing source. It becomes reasonable to assume the manned aircraft’s axis of advance is linked to its drop selection; a north-south line drop involves the manned aircraft traveling either north to south, or south to north, through the wildfire grid, such that the axis of advance includes the point of application. The axis of advance of a point drop is less straightforward, given the manned aircraft slows to a near-hover; however, it is assumed that the axis of advance parallels the vector extending from the water replenishing source to the point of application of the initial attack fire, such that additional maneuvering is not required and time is saved. Fig. 4 also depicts the axis of advance associated with each drop type. Each iteration of the surveillance model is held at one minute, providing the unmanned aircraft two minutes to determine suppression location and axis of advance. The temporal resolution may be increased to allow determination of manned aircraft maneuvering at a time nearer to suppression. Finally, for collision avoidance purposes, the unmanned systems are penalized by Pmsubscript𝑃𝑚P_{m}italic_P start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT if they come within Dmsubscript𝐷𝑚D_{m}italic_D start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT meters of the two-dimensional manned aircraft axis of advance at any altitude. This ensures that unmanned aircraft are penalized for proximity to the manned aircraft regardless of bucket-line length, drop altitude, or axis of advance directionality. Both Pmsubscript𝑃𝑚P_{m}italic_P start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT and Dmsubscript𝐷𝑚D_{m}italic_D start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT are larger than Pusubscript𝑃𝑢P_{u}italic_P start_POSTSUBSCRIPT italic_u end_POSTSUBSCRIPT and Dusubscript𝐷𝑢D_{u}italic_D start_POSTSUBSCRIPT italic_u end_POSTSUBSCRIPT respectively, representing the critical importance of ensuring the unmanned aircraft maintain separation from the manned aircraft, and the somewhat lesser importance of ensuring the unmanned aircraft maintain separation from one another.

Given the partial observability of the initial attack fire, a shared belief map B𝐵Bitalic_B, matching the dimensions of the 100×\times×100 grid wildfire, is introduced and updated at regular intervals. The surveillance planner, executed at each time-step t𝑡titalic_t, updates each wildfire cell in B𝐵Bitalic_B that is observed by either of the two unmanned aircraft, to reflect the actual wildfire state. The suppression planner, executed at each time-step T𝑇Titalic_T, may further update the belief and uncertainty maps at locations where suppression is assured. Fig. 5 depicts fuel and belief maps over the course of an initial attack fire. The red and green circular markers represent the unmanned aircraft locations in the two-dimensional representation, and the unmanned aircraft trajectory in the last five time-steps in the three-dimensional representation. The green square represents a high-value resource area. The ability of the unmanned aircraft to accurately capture the wildfire state is degraded as the wildfire expands.

3.4 State Space Formulations

Refer to caption
Figure 5: Fuel, belief, and actual wildfire maps in two and three-dimensions as the initial attack fire propagates.

The surveillance POMDP is simplified into an MDP by assuming a shared belief map \mathcal{B}caligraphic_B to be the actual wildfire state rather than maintain a probability distribution across 210,000superscript2100002^{10,000}2 start_POSTSUPERSCRIPT 10 , 000 end_POSTSUPERSCRIPT possible wildfire states; this does incur nontrivial error, but results in a notable reduction in computational complexity by detaching each instantiation of the evolving wildfire gridworld from the state space. Instead, the surveillance state space, which is explicitly defined, encompasses state variables (Xu1subscript𝑋subscript𝑢1X_{u_{1}}italic_X start_POSTSUBSCRIPT italic_u start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT, Yu1subscript𝑌subscript𝑢1Y_{u_{1}}italic_Y start_POSTSUBSCRIPT italic_u start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT, Zu1subscript𝑍subscript𝑢1Z_{u_{1}}italic_Z start_POSTSUBSCRIPT italic_u start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT end_POSTSUBSCRIPT, Xu2subscript𝑋subscript𝑢2X_{u_{2}}italic_X start_POSTSUBSCRIPT italic_u start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_POSTSUBSCRIPT, Yu2subscript𝑌subscript𝑢2Y_{u_{2}}italic_Y start_POSTSUBSCRIPT italic_u start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_POSTSUBSCRIPT, Zu2subscript𝑍subscript𝑢2Z_{u_{2}}italic_Z start_POSTSUBSCRIPT italic_u start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_POSTSUBSCRIPT). This results in a large, but manageable 490,000 states, equivalent to the Cartesian product of sets U1subscript𝑈1U_{1}italic_U start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT and U2subscript𝑈2U_{2}italic_U start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT, each constituting all possible locations for an unmanned aircraft in the airspace.
The suppression problem also assumes the shared belief map \mathcal{B}caligraphic_B to be the actual wildfire state. Rather than an explicit formulation, the suppression state space is developed by calling a generative model on any number of state-action pairs (s0subscript𝑠0s_{0}italic_s start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT, a𝑎aitalic_a). The state variables for the suppression MDP are (Xmsubscript𝑋𝑚X_{m}italic_X start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT, Ymsubscript𝑌𝑚Y_{m}italic_Y start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT, DR𝐷𝑅DRitalic_D italic_R, 2|PT|superscript2subscript𝑃𝑇2^{|P_{T}|}2 start_POSTSUPERSCRIPT | italic_P start_POSTSUBSCRIPT italic_T end_POSTSUBSCRIPT | end_POSTSUPERSCRIPT). This generates a state space with an upper bound of approximately 13 billion states, representing every outcome of each possible suppression action.

4 Solution

Refer to caption
Figure 6: A hierarchical framework for the initial attack using teaming involving linked surveillance and suppression planners.

A hierarchical planner derived from wildfire and teaming domain knowledge, shown in Fig. 6 and outlined in Algorithm 1, is used to split the larger surveillance-suppression MPOMDP into separate surveillance and suppression POMDPs operating on different time scales. A shared belief map is assumed for both models, further simplifying each POMDP into an MDP. The shared belief map is updated using the observed wildfire data, and used to generate a wildfire uncertainty map and in the internal wildfire propagation model. The wildfire uncertainty map informs the surveillance planner reward function. The surveillance planner recommends a surveillance action which updates the observed wildfire data and thus belief map. The internal wildfire propagation model informs the suppression planner rewards. The suppression planner recommends a suppression action, which affects the actual wildfire state. We choose to apply MCTS, a simulation-based search algorithm, to both surveillance and suppression planners. The Upper Confidence Bound for Trees (UCT) algorithm selects promising actions in the search trees [41]. Several MCTS extensions are introduced to reduce the large surveillance and suppression action spaces. As shown in Algorithm 1, the execution of a planner-recommended action does not always immediately follow the planning process itself; this delay allows heterogeneous planners to adjust the recommendation of future actions based on the expected actions of another (e.g. in support of collision avoidance).

Algorithm 1 Hierarchical Planner
1:Initial wildfire belief 0subscript0\mathcal{B}_{0}caligraphic_B start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT, surveillance state 𝒮0subscript𝒮0\mathcal{S}_{0}caligraphic_S start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT, uncertainty map 𝒰0subscript𝒰0\mathcal{U}_{0}caligraphic_U start_POSTSUBSCRIPT 0 end_POSTSUBSCRIPT
2:Updated ksubscript𝑘\mathcal{B}_{k}caligraphic_B start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT, 𝒮ksubscript𝒮𝑘\mathcal{S}_{k}caligraphic_S start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT, 𝒰ksubscript𝒰𝑘\mathcal{U}_{k}caligraphic_U start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT after k𝑘kitalic_k iterations, suppression action a𝑎aitalic_a
3:for i=0,1,,k2𝑖01𝑘2i=0,1,...,k-2italic_i = 0 , 1 , … , italic_k - 2 do
4:   e𝑒eitalic_e \leftarrow Surveillance Planner (𝒰i,𝒮isubscript𝒰𝑖subscript𝒮𝑖\mathcal{U}_{i},\mathcal{S}_{i}caligraphic_U start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , caligraphic_S start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT)
5:   i+1subscript𝑖1\mathcal{B}_{i+1}caligraphic_B start_POSTSUBSCRIPT italic_i + 1 end_POSTSUBSCRIPT, 𝒰i+1subscript𝒰𝑖1\mathcal{U}_{i+1}caligraphic_U start_POSTSUBSCRIPT italic_i + 1 end_POSTSUBSCRIPT, 𝒮i+1subscript𝒮𝑖1\mathcal{S}_{i+1}caligraphic_S start_POSTSUBSCRIPT italic_i + 1 end_POSTSUBSCRIPT \leftarrow survey(i,𝒰i,esubscript𝑖subscript𝒰𝑖𝑒\mathcal{B}_{i},\mathcal{U}_{i},ecaligraphic_B start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , caligraphic_U start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT , italic_e)
6:a𝑎aitalic_a \leftarrow Suppression Planner (k1subscript𝑘1\mathcal{B}_{k-1}caligraphic_B start_POSTSUBSCRIPT italic_k - 1 end_POSTSUBSCRIPT)
7:e𝑒eitalic_e \leftarrow Surveillance Planner (𝒰k1subscript𝒰𝑘1\mathcal{U}_{k-1}caligraphic_U start_POSTSUBSCRIPT italic_k - 1 end_POSTSUBSCRIPT, 𝒮k1subscript𝒮𝑘1\mathcal{S}_{k-1}caligraphic_S start_POSTSUBSCRIPT italic_k - 1 end_POSTSUBSCRIPT, a𝑎aitalic_a)
8:ksubscript𝑘\mathcal{B}_{k}caligraphic_B start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT, 𝒰ksubscript𝒰𝑘\mathcal{U}_{k}caligraphic_U start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT, 𝒮ksubscript𝒮𝑘\mathcal{S}_{k}caligraphic_S start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT \leftarrow survey(k1,𝒰k1,esubscript𝑘1subscript𝒰𝑘1𝑒\mathcal{B}_{k-1},\mathcal{U}_{k-1},ecaligraphic_B start_POSTSUBSCRIPT italic_k - 1 end_POSTSUBSCRIPT , caligraphic_U start_POSTSUBSCRIPT italic_k - 1 end_POSTSUBSCRIPT , italic_e)
9:ksubscript𝑘\mathcal{B}_{k}caligraphic_B start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT \leftarrow suppress(k,asubscript𝑘𝑎\mathcal{B}_{k},acaligraphic_B start_POSTSUBSCRIPT italic_k end_POSTSUBSCRIPT , italic_a)

4.1 MCTS Solver

The introduced hierarchical planner is solver-agnostic. That said, the large state and action spaces coupled with a non-stationary environment in the propagating wildfire encourages use of an online, stochastic planning algorithm like MCTS. MCTS is also an anytime algorithm, which means it can return a valid solution if interrupted before runtime completion [42]. The set of possible actions are represented as edges, and resulting states as nodes, in the search tree. Each MCTS iteration follows four general steps: selection, expansion, simulation, and propagation. MCTS selects nodes and traverses the search tree using a predefined heuristic informed by the problem domain. This heuristic results in an asymmetric search where the most promising actions are prioritized. We choose to apply the standard Upper Confidence bound for Trees (UCT) algorithm to balance exploration versus exploitation in the tree search policy and determine the value of each node n𝑛nitalic_n, as follows

UCB(n)=u¯(n)+clog(visits(n))visits(n)\text{UCB}(n)=\bar{u}(n)+c\sqrt{\frac{\text{log(visits}(n))}{\text{visits}(n^{% \prime})}}UCB ( italic_n ) = over¯ start_ARG italic_u end_ARG ( italic_n ) + italic_c square-root start_ARG divide start_ARG log(visits ( italic_n ) ) end_ARG start_ARG visits ( italic_n start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT ) end_ARG end_ARG (6)

such that u¯(n)¯𝑢𝑛\bar{u}(n)over¯ start_ARG italic_u end_ARG ( italic_n ) is the value of the state at n𝑛nitalic_n, c𝑐citalic_c is an exploration constant that adjusts the balance between exploration of new nodes and exploitation of previously visited nodes, visits(n𝑛nitalic_n) is the visitation count for n𝑛nitalic_n, and visits(nsuperscript𝑛n^{\prime}italic_n start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT) is the visitation count for the parent node of n𝑛nitalic_n, nsuperscript𝑛n^{\prime}italic_n start_POSTSUPERSCRIPT ′ end_POSTSUPERSCRIPT. The search tree is then expanded by adding a child node to the selected node. State value estimation occurs by simulating a play-out from a node to the end of a predefined planning horizon. A random policy is here applied to estimate action value during search tree roll-outs. The simulation results are then backpropagated up the search tree, concluding a search iteration. It can be difficult to accurately assess MCTS computational complexity given the many sub-tasks involved, but in the general case, MCTS runtime and memory complexity scale linearly with the number of search iterations [43], [44]. Fig. 7 shows how MCTS may be applied to search trees for unmanned aircraft surveying an initial attack wildfire. Selection is depicted by the bold circles, and involves traversing the tree to a select depth using an algorithm like Upper Confidence bounds applied to Trees. The selected node is then expanded by adding a child node, shown in grey. A simulation is preformed from the child node to a preordained depth or condition, shown in red. The simulation results are associated with the child node, and backpropagated up the search tree. This repeats until a given condition is true, computation time-limit is met, or a ceiling on search iterations is reached. Given that the state space is evolving in time, we use an internal wildfire model to propagate the wildfire belief map between progressive search tree depths. States that exceed the boundaries of the model, shown crossed out in red, are pruned from the search tree.

Refer to caption
Figure 7: Monte Carlo search trees depicted for two unmanned aircraft surveying an initial attack wildfire. Each iteration of Monte Carlo tree search has four steps: selection, expansion, simulation, and back-propagation.

4.2 Surveillance Planning

Algorithm 2 [Simplified] Surveillance Planner (Uncertainty Reward Model)
1:Uncertainty map 𝒰tsubscript𝒰𝑡\mathcal{U}_{t}caligraphic_U start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT, surveillance state 𝒮tsubscript𝒮𝑡\mathcal{S}_{t}caligraphic_S start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT, suppression action a𝑎aitalic_a (optional)
2:Surveillance action e𝑒eitalic_e
3:Initialize Surveillance action score map ~~\tilde{\mathcal{E}}over~ start_ARG caligraphic_E end_ARG \leftarrow \emptyset
4:for  action eisubscript𝑒𝑖e_{i}italic_e start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT \in E𝐸Eitalic_E do
5:   St+1subscript𝑆𝑡1S_{t+1}italic_S start_POSTSUBSCRIPT italic_t + 1 end_POSTSUBSCRIPT \leftarrow apply eisubscript𝑒𝑖e_{i}italic_e start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT to Stsubscript𝑆𝑡S_{t}italic_S start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT
6:   𝒱eisubscript𝒱subscript𝑒𝑖\mathcal{V}_{e_{i}}caligraphic_V start_POSTSUBSCRIPT italic_e start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT \leftarrow ranging(St+1subscript𝑆𝑡1S_{t+1}italic_S start_POSTSUBSCRIPT italic_t + 1 end_POSTSUBSCRIPT)
7:   Roeisubscript𝑅subscript𝑜subscript𝑒𝑖R_{o_{e_{i}}}italic_R start_POSTSUBSCRIPT italic_o start_POSTSUBSCRIPT italic_e start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT end_POSTSUBSCRIPT \leftarrow Σx𝒱ei𝒰t(x)subscriptΣ𝑥subscript𝒱subscript𝑒𝑖subscript𝒰𝑡𝑥\Sigma_{x\in\mathcal{V}_{e_{i}}}\mathcal{U}_{t}(x)roman_Σ start_POSTSUBSCRIPT italic_x ∈ caligraphic_V start_POSTSUBSCRIPT italic_e start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT end_POSTSUBSCRIPT caligraphic_U start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( italic_x )
8:   RUeisubscript𝑅subscript𝑈subscript𝑒𝑖R_{U_{e_{i}}}italic_R start_POSTSUBSCRIPT italic_U start_POSTSUBSCRIPT italic_e start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT end_POSTSUBSCRIPT \leftarrow τ1subscript𝜏1\tau_{1}italic_τ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT Roeisubscript𝑅subscript𝑜subscript𝑒𝑖R_{o_{e_{i}}}italic_R start_POSTSUBSCRIPT italic_o start_POSTSUBSCRIPT italic_e start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT end_POSTSUBSCRIPT + τ2subscript𝜏2\tau_{2}italic_τ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT Pu(St+1)subscript𝑃𝑢subscript𝑆𝑡1P_{u}(S_{t+1})italic_P start_POSTSUBSCRIPT italic_u end_POSTSUBSCRIPT ( italic_S start_POSTSUBSCRIPT italic_t + 1 end_POSTSUBSCRIPT ) + τ3subscript𝜏3\tau_{3}italic_τ start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT Pm(St+1,a)subscript𝑃𝑚subscript𝑆𝑡1𝑎P_{m}(S_{t+1},a)italic_P start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT ( italic_S start_POSTSUBSCRIPT italic_t + 1 end_POSTSUBSCRIPT , italic_a ) + τ4subscript𝜏4\tau_{4}italic_τ start_POSTSUBSCRIPT 4 end_POSTSUBSCRIPT Pi(St+1)subscript𝑃𝑖subscript𝑆𝑡1P_{i}(S_{t+1})italic_P start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ( italic_S start_POSTSUBSCRIPT italic_t + 1 end_POSTSUBSCRIPT )
9:   ~[ei]~delimited-[]subscript𝑒𝑖\tilde{\mathcal{E}}[e_{i}]over~ start_ARG caligraphic_E end_ARG [ italic_e start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ] \leftarrow RUeisubscript𝑅subscript𝑈subscript𝑒𝑖R_{U_{e_{i}}}italic_R start_POSTSUBSCRIPT italic_U start_POSTSUBSCRIPT italic_e start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT end_POSTSUBSCRIPT
10:Recommended surveillance action e𝑒eitalic_e \leftarrow argmaxeisubscript𝑒𝑖{}_{e_{i}}start_FLOATSUBSCRIPT italic_e start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_FLOATSUBSCRIPT ~[ei]~delimited-[]subscript𝑒𝑖\tilde{\mathcal{E}}[e_{i}]over~ start_ARG caligraphic_E end_ARG [ italic_e start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ]

There exists 49 possible surveillance actions at time t𝑡titalic_t for a given belief map, equivalent to the Cartesian product of the set of seven actions for two unmanned aircraft. At depth of two, there becomes 2401 possible actions, and at depth three, there are 1.18×\times×105superscript10510^{5}10 start_POSTSUPERSCRIPT 5 end_POSTSUPERSCRIPT. Given 210,000 wildfire states, developing an explicit policy using dynamic programming is infeasible. Instead of an explicit policy formulation, a sampling-based approach with a generative model is used. An additional complication is reward uncertainty, given a stochastic element in the selection-process for observations following a selected surveillance action. As each unmanned aircraft increases in altitude, the resolution of the wildfire grid below decreases, and a sampling process is used to balance observation coverage and capture.

Algorithm 2 outlines a process by which a surveillance action with depth one is recommended; as will be later discussed, a probabilistic-search formulation of this process is applied to enable search depths of two and three. Uncertainty map 𝒰tsubscript𝒰𝑡\mathcal{U}_{t}caligraphic_U start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT, surveillance state 𝒮tsubscript𝒮𝑡\mathcal{S}_{t}caligraphic_S start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT, and optionally, suppression action a𝑎aitalic_a are submitted to the surveillance planner. A surveillance score map ~~\tilde{\mathcal{E}}over~ start_ARG caligraphic_E end_ARG is initialized. Each surveillance action e𝑒eitalic_e in set E𝐸Eitalic_E is applied to the current surveillance state 𝒮tsubscript𝒮𝑡\mathcal{S}_{t}caligraphic_S start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT to attain the next surveillance state 𝒮t+1subscript𝒮𝑡1\mathcal{S}_{t+1}caligraphic_S start_POSTSUBSCRIPT italic_t + 1 end_POSTSUBSCRIPT. The ranging function provides an array of wildfire grid cells 𝒱esubscript𝒱𝑒\mathcal{V}_{e}caligraphic_V start_POSTSUBSCRIPT italic_e end_POSTSUBSCRIPT which are observed from the updated surveillance state position, as a function of both location and altitude. As mentioned, there is a stochastic element to the ranging function, meaning that observations attained from a particular surveillance position, and thus ensuing rewards, will differ slightly with each simulation. The uncertainty in observation for each wildfire cell x𝑥xitalic_x in set 𝒱esubscript𝒱𝑒\mathcal{V}_{e}caligraphic_V start_POSTSUBSCRIPT italic_e end_POSTSUBSCRIPT, U(x)𝑈𝑥U(x)italic_U ( italic_x ), is then summed to attain Roesubscript𝑅subscript𝑜𝑒R_{o_{e}}italic_R start_POSTSUBSCRIPT italic_o start_POSTSUBSCRIPT italic_e end_POSTSUBSCRIPT end_POSTSUBSCRIPT, one of four components of the total surveillance reward RUsubscript𝑅𝑈R_{U}italic_R start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT for a particular action e𝑒eitalic_e. This uncertainty model thus provides a reward corresponding to the extent to which observations reduce overall belief map uncertainty, regardless of the actual state of the wildfire or belief map. As shown in Fig. 8, the uncertainty reward model maintains a 100×\times×100 uncertainty map in which each cell not observed in a particular time-step increments its uncertainty by an amount proportional to its proximity to burning cells (in the belief map), and each cell observed resets its uncertainty to zero. Cells on the outskirts of a wildfire grid with negligible likelihood of igniting change their uncertainty minimally, while cells at the head of the fire change their uncertainty rapidly. Observations of cells that have not been recently observed reward more than cells that were more recently observed, regardless of whether the cell changes state post-observation.

Refer to caption
Figure 8: Wildfire uncertainty map during initial attack fire propagation. Dark red areas suggest nearby fire activity but few recent observations. Grid cells distant from the wildfire, such as at the map periphery, increment their percent uncertainty minimally.

Reward RUsubscript𝑅𝑈R_{U}italic_R start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT is further comprised of penalties Pusubscript𝑃𝑢P_{u}italic_P start_POSTSUBSCRIPT italic_u end_POSTSUBSCRIPT, Pmsubscript𝑃𝑚P_{m}italic_P start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT, and Pisubscript𝑃𝑖P_{i}italic_P start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT. Unmanned aircraft U1subscript𝑈1U_{1}italic_U start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT and U2subscript𝑈2U_{2}italic_U start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT are penalized for their proximity to one another (Pusubscript𝑃𝑢P_{u}italic_P start_POSTSUBSCRIPT italic_u end_POSTSUBSCRIPT) and to the manned aircraft’s axis of advance (Pmsubscript𝑃𝑚P_{m}italic_P start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT). The axis of advance and aircraft direction along it may be inferred from environmental data such as winds, aviation and wildfire best practices, and the suppression action a𝑎aitalic_a, or alternatively, they may be explicitly passed to the surveillance planner. A small penalty (Pisubscript𝑃𝑖P_{i}italic_P start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT), proportional to the distance from each unmanned aircraft to the initial attack fire origin IAO𝐼subscript𝐴𝑂IA_{O}italic_I italic_A start_POSTSUBSCRIPT italic_O end_POSTSUBSCRIPT, is added to encourage unmanned aircraft to approach the initial attack fire. Penalty Pisubscript𝑃𝑖P_{i}italic_P start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT is quickly overtaken by other rewards and penalties. The resulting total reward equation for the unmanned aircraft follow, where τ1subscript𝜏1\tau_{1}italic_τ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT, τ2subscript𝜏2\tau_{2}italic_τ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT, τ3subscript𝜏3\tau_{3}italic_τ start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT, and τ4subscript𝜏4\tau_{4}italic_τ start_POSTSUBSCRIPT 4 end_POSTSUBSCRIPT are tunable parameters:

Ro=τ1x𝒱e𝒰t(x)subscript𝑅𝑜subscript𝜏1subscript𝑥subscript𝒱𝑒subscript𝒰𝑡𝑥R_{o}=\tau_{1}\sum_{x\in\mathcal{V}_{e}}\mathcal{U}_{t}(x)italic_R start_POSTSUBSCRIPT italic_o end_POSTSUBSCRIPT = italic_τ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT ∑ start_POSTSUBSCRIPT italic_x ∈ caligraphic_V start_POSTSUBSCRIPT italic_e end_POSTSUBSCRIPT end_POSTSUBSCRIPT caligraphic_U start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT ( italic_x ) (7)
Pu={τ2if U1  U2  Du0otherwisesubscript𝑃𝑢casessubscript𝜏2if U1  U2  Du0otherwiseP_{u}=\begin{cases}\tau_{2}&\text{if $\|U_{1}$ $-$ $U_{2}\|$ $\leq$ $D_{u}$}\\ 0&\text{otherwise}\\ \end{cases}italic_P start_POSTSUBSCRIPT italic_u end_POSTSUBSCRIPT = { start_ROW start_CELL italic_τ start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT end_CELL start_CELL if ∥ italic_U start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT - italic_U start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT ∥ ≤ italic_D start_POSTSUBSCRIPT italic_u end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL 0 end_CELL start_CELL otherwise end_CELL end_ROW (8)
Pm={τ3if U1  AOA  Dmτ3else if U2  AOA  Dm0otherwisesubscript𝑃𝑚casessubscript𝜏3if U1  AOA  Dmsubscript𝜏3else if U2  AOA  Dm0otherwiseP_{m}=\begin{cases}\tau_{3}&\text{if $\|U_{1}$ $-$ $AOA\|$ $\leq$ $D_{m}$}\\ \tau_{3}&\text{else if $\|U_{2}$ $-$ $AOA\|$ $\leq$ $D_{m}$}\\ 0&\text{otherwise}\\ \end{cases}italic_P start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT = { start_ROW start_CELL italic_τ start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT end_CELL start_CELL if ∥ italic_U start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT - italic_A italic_O italic_A ∥ ≤ italic_D start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL italic_τ start_POSTSUBSCRIPT 3 end_POSTSUBSCRIPT end_CELL start_CELL else if ∥ italic_U start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT - italic_A italic_O italic_A ∥ ≤ italic_D start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT end_CELL end_ROW start_ROW start_CELL 0 end_CELL start_CELL otherwise end_CELL end_ROW (9)
Pi=τ4(U1IAO+U2IAO)subscript𝑃𝑖subscript𝜏4delimited-∥∥subscript𝑈1𝐼subscript𝐴𝑂delimited-∥∥subscript𝑈2𝐼subscript𝐴𝑂P_{i}=\tau_{4}(\lVert U_{1}-IA_{O}\rVert+\lVert U_{2}-IA_{O}\rVert)italic_P start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT = italic_τ start_POSTSUBSCRIPT 4 end_POSTSUBSCRIPT ( ∥ italic_U start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT - italic_I italic_A start_POSTSUBSCRIPT italic_O end_POSTSUBSCRIPT ∥ + ∥ italic_U start_POSTSUBSCRIPT 2 end_POSTSUBSCRIPT - italic_I italic_A start_POSTSUBSCRIPT italic_O end_POSTSUBSCRIPT ∥ ) (10)
RU=RoPuPmPisubscript𝑅𝑈subscript𝑅𝑜subscript𝑃𝑢subscript𝑃𝑚subscript𝑃𝑖R_{U}=R_{o}-P_{u}-P_{m}-P_{i}italic_R start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT = italic_R start_POSTSUBSCRIPT italic_o end_POSTSUBSCRIPT - italic_P start_POSTSUBSCRIPT italic_u end_POSTSUBSCRIPT - italic_P start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT - italic_P start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT (11)

The simplified approach in algorithm 2 becomes computationally intractable when considering depths greater than one given the large action space and reward uncertainty. The algorithm 2 reward structure is maintained, and MCTS with UCT is applied to a generative MDP model to search the action space efficiently. The wildfire belief state tsubscript𝑡\mathcal{B}_{t}caligraphic_B start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT is passed to the MDP model to be adjusted and modified, along with uncertainty map 𝒰tsubscript𝒰𝑡\mathcal{U}_{t}caligraphic_U start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT, with progressive extension of the search-tree’s depth. Each MCTS surveillance calculation is capped at 30 stimes30second30\text{\,}\mathrm{s}start_ARG 30 end_ARG start_ARG times end_ARG start_ARG roman_s end_ARG of calculation time, to allow each drone to proceed to its new location and attain observations prior to the next iteration. Any action which would take either drone beyond gridworld boundaries, or any combination of actions which would take both drones into the same grid cell, is pruned. The computational complexity of a single MCTS surveillance iteration increases with wildfire growth. For example, given a 30 stimes30second30\text{\,}\mathrm{s}start_ARG 30 end_ARG start_ARG times end_ARG start_ARG roman_s end_ARG restriction, and depth of three, MCTS surveillance conducts roughly 800 iterations at t𝑡titalic_t = 0 mintimes0minute0\text{\,}\mathrm{min}start_ARG 0 end_ARG start_ARG times end_ARG start_ARG roman_min end_ARG, and only 60 iterations at t𝑡titalic_t = 120 mintimes120minute120\text{\,}\mathrm{min}start_ARG 120 end_ARG start_ARG times end_ARG start_ARG roman_min end_ARG. Reducing the MCTS depth as the wildfire progresses ensures a roughly equivalent number of iterations at depth of one, throughout the lifespan of the wildfire. A depth factor of three is reduced to two, and then one, as the wildfire propagates, to maintain a minimum of two runs for each possible action at depth of one.

4.3 Suppression Planning

Algorithm 3 [Simplified] Suppression Planner (Localized Destruction Minimization Reward Model)
1:Wildfire belief tsubscript𝑡\mathcal{B}_{t}caligraphic_B start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT, hyperparameter set {Q,RO}𝑄𝑅𝑂\{Q,RO\}{ italic_Q , italic_R italic_O }
2:Suppression action a𝑎aitalic_a
3:t+ROsubscript𝑡𝑅𝑂\mathcal{B}_{t+RO}caligraphic_B start_POSTSUBSCRIPT italic_t + italic_R italic_O end_POSTSUBSCRIPT \leftarrow propagate(tsubscript𝑡\mathcal{B}_{t}caligraphic_B start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT, RO𝑅𝑂ROitalic_R italic_O)
4:A𝐴Aitalic_A \leftarrow ASR(tsubscript𝑡\mathcal{B}_{t}caligraphic_B start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT, Q𝑄Qitalic_Q)
5:Initialize Suppression action score map 𝒜~~𝒜\tilde{\mathcal{A}}over~ start_ARG caligraphic_A end_ARG \leftarrow \emptyset
6:for action aiAsubscript𝑎𝑖𝐴a_{i}\in Aitalic_a start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ∈ italic_A do
7:   where aisubscript𝑎𝑖a_{i}italic_a start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT is (Xm,Ym,DR)subscript𝑋𝑚subscript𝑌𝑚𝐷𝑅(X_{m},Y_{m},DR)( italic_X start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT , italic_Y start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT , italic_D italic_R )
8:   stsubscript𝑠𝑡\mathcal{B}_{st}caligraphic_B start_POSTSUBSCRIPT italic_s italic_t end_POSTSUBSCRIPT \leftarrow suppress(Xmsubscript𝑋𝑚X_{m}italic_X start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT, Ymsubscript𝑌𝑚Y_{m}italic_Y start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT, DR𝐷𝑅DRitalic_D italic_R, tsubscript𝑡\mathcal{B}_{t}caligraphic_B start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT)
9:   st+ROsubscript𝑠𝑡𝑅𝑂\mathcal{B}_{st+RO}caligraphic_B start_POSTSUBSCRIPT italic_s italic_t + italic_R italic_O end_POSTSUBSCRIPT \leftarrow propagate(stsubscript𝑠𝑡\mathcal{B}_{st}caligraphic_B start_POSTSUBSCRIPT italic_s italic_t end_POSTSUBSCRIPT, RO𝑅𝑂ROitalic_R italic_O)
10:   ssubscript𝑠\mathcal{L}_{s}caligraphic_L start_POSTSUBSCRIPT italic_s end_POSTSUBSCRIPT \leftarrow localize(st+ROsubscript𝑠𝑡𝑅𝑂\mathcal{B}_{st+RO}caligraphic_B start_POSTSUBSCRIPT italic_s italic_t + italic_R italic_O end_POSTSUBSCRIPT, Xmsubscript𝑋𝑚X_{m}italic_X start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT, Ymsubscript𝑌𝑚Y_{m}italic_Y start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT)
11:   \mathcal{L}caligraphic_L \leftarrow localize(t+ROsubscript𝑡𝑅𝑂\mathcal{B}_{t+RO}caligraphic_B start_POSTSUBSCRIPT italic_t + italic_R italic_O end_POSTSUBSCRIPT, Xmsubscript𝑋𝑚X_{m}italic_X start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT, Ymsubscript𝑌𝑚Y_{m}italic_Y start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT)
12:   RMisubscript𝑅subscript𝑀𝑖R_{M_{i}}italic_R start_POSTSUBSCRIPT italic_M start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT = Σx𝒟(x)Σxs𝒟(x)subscriptΣ𝑥𝒟𝑥subscriptΣ𝑥subscript𝑠𝒟𝑥\Sigma_{x\in\mathcal{L}}\mathcal{D}(x)-\Sigma_{x\in\mathcal{L}_{s}}\mathcal{D}% (x)roman_Σ start_POSTSUBSCRIPT italic_x ∈ caligraphic_L end_POSTSUBSCRIPT caligraphic_D ( italic_x ) - roman_Σ start_POSTSUBSCRIPT italic_x ∈ caligraphic_L start_POSTSUBSCRIPT italic_s end_POSTSUBSCRIPT end_POSTSUBSCRIPT caligraphic_D ( italic_x )
13:   𝒜~[ai]~𝒜delimited-[]subscript𝑎𝑖\tilde{\mathcal{A}}[a_{i}]over~ start_ARG caligraphic_A end_ARG [ italic_a start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ] \leftarrow RMisubscript𝑅subscript𝑀𝑖R_{M_{i}}italic_R start_POSTSUBSCRIPT italic_M start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT
14:Recommended suppression action a𝑎aitalic_a \leftarrow argmaxaisubscript𝑎𝑖{}_{a_{i}}start_FLOATSUBSCRIPT italic_a start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_FLOATSUBSCRIPT 𝒜~[ai]~𝒜delimited-[]subscript𝑎𝑖\tilde{\mathcal{A}}[a_{i}]over~ start_ARG caligraphic_A end_ARG [ italic_a start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ]

The suppression model features 50,000 possible actions, encompassing five suppression actions centered on each of 10,000 wildfire grid cells, for a given belief map. The large action space again prevents an exact policy formulation, and a generative method is instead applied. Reward uncertainty remains a factor, as water buckets fully suppress some wildfire cells, but only partially suppress others; the partially suppressed cells are selected via random sampling. An understanding of wildfire dynamics allows for various forms of ASR that minimize the overall action space by 99%percent\%% or more with minimal consequence. A generated and internally-held wildfire propagation mode with limited information, shown in Fig. 6, enables the suppression planner to conduct customized roll-outs to optimize action selection. Two reward models are here introduced, localized and global resource destruction minimization. Global resource destruction minimization calculates the instantaneous destruction of the full 100×\times×100 grid after roll-out, whereas localized destruction minimization calculates the instantaneous destruction of a smaller grid centered on the location of the action taken after roll-out. In both cases, the suppression aircraft is rewarded RMsubscript𝑅𝑀R_{M}italic_R start_POSTSUBSCRIPT italic_M end_POSTSUBSCRIPT, or penalized PMsubscript𝑃𝑀P_{M}italic_P start_POSTSUBSCRIPT italic_M end_POSTSUBSCRIPT, proportional to the instantaneous destruction caused by the wildfire following one or more roll-out periods in which the internal fire propagation model with limited information propagates post-suppression. Instantaneous destruction is a function of number of cells burning and the value of resources contained within those cells. Rewards for minimized instantaneous resource destruction following a roll-out period ensures the long-term impact of water lines can be measured against the short-term impact of more immediate suppressive activities.

Localized resource destruction minimization examines only the area immediately surrounding the suppression activity for reward considerations. Suppressive activities applied to different portions of the wildfire cannot be directly compared due to the differing windows under consideration, and must instead be compared against a non-suppressed but equally propagated “reference grid”, as illustrated in Fig. 9. Reward is then maximized when the difference between the appropriately localized portions of the reference grid and suppressed grid are greater. Algorithm 3 outlines the process by which a suppression action a𝑎aitalic_a is selected using localized rewards. Wildfire belief map tsubscript𝑡\mathcal{B}_{t}caligraphic_B start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT is propagated to roll-out depth RO𝑅𝑂ROitalic_R italic_O to attain propagated reference grid t+ROsubscript𝑡𝑅𝑂\mathcal{B}_{t+RO}caligraphic_B start_POSTSUBSCRIPT italic_t + italic_R italic_O end_POSTSUBSCRIPT. An internal wildfire propagation model possessing wind and elevation, but only limited fuel, data is used. While the quality of the internal propagation model affects reward optimization, quality results may be attained for even a low-fidelity model. Belief map tsubscript𝑡\mathcal{B}_{t}caligraphic_B start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT is also submitted to one of three ASR functions with quantile selection Q𝑄Qitalic_Q to attain a smaller action space set A𝐴Aitalic_A. Each suppression action aisubscript𝑎𝑖a_{i}italic_a start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT in set A𝐴Aitalic_A is then applied to tsubscript𝑡\mathcal{B}_{t}caligraphic_B start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT, the results of which are propagated to depth RO𝑅𝑂ROitalic_R italic_O to attain st+ROsubscript𝑠𝑡𝑅𝑂\mathcal{B}_{st+RO}caligraphic_B start_POSTSUBSCRIPT italic_s italic_t + italic_R italic_O end_POSTSUBSCRIPT. A grid with size proportional to RO𝑅𝑂ROitalic_R italic_O is centered on action location (Xmsubscript𝑋𝑚X_{m}italic_X start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT, Ymsubscript𝑌𝑚Y_{m}italic_Y start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT), and corresponding subarrays \mathcal{L}caligraphic_L and ssubscript𝑠\mathcal{L}_{s}caligraphic_L start_POSTSUBSCRIPT italic_s end_POSTSUBSCRIPT are indexed from t+ROsubscript𝑡𝑅𝑂\mathcal{B}_{t+RO}caligraphic_B start_POSTSUBSCRIPT italic_t + italic_R italic_O end_POSTSUBSCRIPT and st+ROsubscript𝑠𝑡𝑅𝑂\mathcal{B}_{st+RO}caligraphic_B start_POSTSUBSCRIPT italic_s italic_t + italic_R italic_O end_POSTSUBSCRIPT respectively. The summed instantaneous destruction in both subarrays are calculated and subtracted from one another to attain reward RMisubscript𝑅subscript𝑀𝑖R_{M_{i}}italic_R start_POSTSUBSCRIPT italic_M start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT.

Refer to caption
Figure 9: The localized suppression destruction minimization reward model. A propagated reference grid and post-suppression propagation grid are clipped to attain local subarrays which are then compared to determine action consequence.
Algorithm 4 [Simplified] Suppression Planner (Global Destruction Minimization Reward Model)
1:Wildfire belief tsubscript𝑡\mathcal{B}_{t}caligraphic_B start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT, hyperparameter set {Q,RO}𝑄𝑅𝑂\{Q,RO\}{ italic_Q , italic_R italic_O }
2:Suppression action a𝑎aitalic_a
3:A𝐴Aitalic_A \leftarrow ASR(tsubscript𝑡\mathcal{B}_{t}caligraphic_B start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT, Q𝑄Qitalic_Q)
4:Initialize Suppression action score map 𝒜~~𝒜\tilde{\mathcal{A}}over~ start_ARG caligraphic_A end_ARG \leftarrow \emptyset
5:for action aiAsubscript𝑎𝑖𝐴a_{i}\in Aitalic_a start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ∈ italic_A do
6:   where aisubscript𝑎𝑖a_{i}italic_a start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT is {Xm,Ym,DR}subscript𝑋𝑚subscript𝑌𝑚𝐷𝑅\{X_{m},Y_{m},DR\}{ italic_X start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT , italic_Y start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT , italic_D italic_R }
7:   stsubscript𝑠𝑡\mathcal{B}_{st}caligraphic_B start_POSTSUBSCRIPT italic_s italic_t end_POSTSUBSCRIPT \leftarrow suppress(Xmsubscript𝑋𝑚X_{m}italic_X start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT, Ymsubscript𝑌𝑚Y_{m}italic_Y start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT, DR𝐷𝑅DRitalic_D italic_R, tsubscript𝑡\mathcal{B}_{t}caligraphic_B start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT)
8:   st+ROsubscript𝑠𝑡𝑅𝑂\mathcal{B}_{st+RO}caligraphic_B start_POSTSUBSCRIPT italic_s italic_t + italic_R italic_O end_POSTSUBSCRIPT \leftarrow propagate(stsubscript𝑠𝑡\mathcal{B}_{st}caligraphic_B start_POSTSUBSCRIPT italic_s italic_t end_POSTSUBSCRIPT, RO𝑅𝑂ROitalic_R italic_O)
9:   PMisubscript𝑃subscript𝑀𝑖P_{M_{i}}italic_P start_POSTSUBSCRIPT italic_M start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT \leftarrow Σxst+RO𝒟(x)subscriptΣ𝑥subscript𝑠𝑡𝑅𝑂𝒟𝑥\Sigma_{x\in\mathcal{B}_{st+RO}}\mathcal{D}(x)roman_Σ start_POSTSUBSCRIPT italic_x ∈ caligraphic_B start_POSTSUBSCRIPT italic_s italic_t + italic_R italic_O end_POSTSUBSCRIPT end_POSTSUBSCRIPT caligraphic_D ( italic_x )
10:   𝒜~[ai]~𝒜delimited-[]subscript𝑎𝑖\tilde{\mathcal{A}}[a_{i}]over~ start_ARG caligraphic_A end_ARG [ italic_a start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ] \leftarrow PMisubscript𝑃subscript𝑀𝑖P_{M_{i}}italic_P start_POSTSUBSCRIPT italic_M start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT
11:Recommended suppression action a𝑎aitalic_a \leftarrow argminaisubscript𝑎𝑖{}_{a_{i}}start_FLOATSUBSCRIPT italic_a start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_FLOATSUBSCRIPT 𝒜~[ai]~𝒜delimited-[]subscript𝑎𝑖\tilde{\mathcal{A}}[a_{i}]over~ start_ARG caligraphic_A end_ARG [ italic_a start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT ]

Global resource destruction minimization, shown in Fig. 10, allows for the direct comparison of any action taken across the global grid following the appropriate propagation sequence. However, the scope of destruction often causes the impact of the selected suppression activity to get lost amid the extent of the stochastic fire spread. Algorithm 4 outlines the process by which a suppression action a𝑎aitalic_a is selected using global rewards. Unlike the localized approach, the global reward model does not initialize a single non-suppressed, propagated reference grid at the onset. Instead, action roll-out rewards are compared directly. The belief map tsubscript𝑡\mathcal{B}_{t}caligraphic_B start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT is again submitted to one of three ASR functions with quantile selection Q𝑄Qitalic_Q to attain a smaller action space set A𝐴Aitalic_A. Each suppression action aisubscript𝑎𝑖a_{i}italic_a start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT in set A𝐴Aitalic_A is then applied to tsubscript𝑡\mathcal{B}_{t}caligraphic_B start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT, the results of which are propagated to depth RO𝑅𝑂ROitalic_R italic_O to attain st+ROsubscript𝑠𝑡𝑅𝑂\mathcal{B}_{st+RO}caligraphic_B start_POSTSUBSCRIPT italic_s italic_t + italic_R italic_O end_POSTSUBSCRIPT. The summed instantaneous destruction of st+ROsubscript𝑠𝑡𝑅𝑂\mathcal{B}_{st+RO}caligraphic_B start_POSTSUBSCRIPT italic_s italic_t + italic_R italic_O end_POSTSUBSCRIPT, now global rather than localized, is used to attain penalty PMisubscript𝑃subscript𝑀𝑖P_{M_{i}}italic_P start_POSTSUBSCRIPT italic_M start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT end_POSTSUBSCRIPT.

Refer to caption
Figure 10: The global suppression destruction minimization reward model. Unlike in the localized model, there is no propagated reference grid. The wildfire belief map is suppressed, then propagated, then compared across suppression actions.

Three ASR methods are presented. ASR method 1 limits suppression actions to cells burning in the belief map; it can be assumed that ideal suppression locations involve at least one burning cell. ASR method 1 reduces the overall action space by 90-95%. ASR method 2 restricts suppression to a pre-determined percentage of cells, here held at 5, 10, or 15%, that are furthest away from the fire origin. ASR method 2 recognizes that suppressing the wildfire exterior, and wildfire head, results in the greatest likelihood of minimizing overall wildfire propagation. ASR method 2 reduces the overall action space by 99% or more. ASR method 2 may be modified to increase the percentage of considered wildfire cells, the set of which is then randomly selected from. A stochastic wildfire does not spread evenly, and consideration of cells to include those not furthest away may better suppress multiple fire regions. ASR method 3 further restricts ASR method 2 to two 60 degree arcs, one centered on the area of highest resource value, and the other on the wildfire head. Cells qualified by ASR method two that are also within the established arcs, qualify for suppression per ASR method 3. These arcs may shift as environmental factors change throughout the simulation.

Refer to caption
Figure 11: Three action space restriction schemes: belief only, wildfire exterior, and wildfire sectors based on high-value areas and the wildfire head.

MCTS with UCT is again applied to a generative MDP model with the algorithms 3 or 4 used to determine action rewards. Each MCTS suppression calculation is capped at 120 stimes120second120\text{\,}\mathrm{s}start_ARG 120 end_ARG start_ARG times end_ARG start_ARG roman_s end_ARG of calculation time, to allow the suppression calculation updated belief information while also permitting time to execute the suppression action prior to the next iteration.

4.4 Early Dispatch

The development of an intertwined surveillance-suppression model enables the early allocation of additional suppression assets in instances where wildfire growth exceeds even the optimized capabilities of a single aircraft. Challenges include when to consider requesting additional assets, and what thresholds should determine that additional assets are required. The vast majority of initial attack fires do not become escaped fires, and premature dispatching of aircraft is therefore costly and wasteful. At the same time, initial attack fires destined to become escaped fires have limited windows of opportunity in which additional suppression can result in containment; after a certain point, the wildfire spread is too significant to be reasonably contained except by a full-blown suppression operation. A regression-backed approach is introduced in Algorithm 5, wherein the wildfire ring radius at t𝑡titalic_t = 120 mintimes120minute120\text{\,}\mathrm{min}start_ARG 120 end_ARG start_ARG times end_ARG start_ARG roman_min end_ARG is predicted at each time-step T𝑇Titalic_T following suppression execution. If the predicted wildfire ring radius exceeds a given time and ring threshold, then a second S-70 is dispatched to the scene to conduct suppression activities.

Algorithm 5 Early Dispatch Procedures
1:Wildfire belief tsubscript𝑡\mathcal{B}_{t}caligraphic_B start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT, historical wildfire ring array \mathcal{R}caligraphic_R
2:Early dispatch recommendation
3:if t𝑡titalic_t mod dTsubscript𝑑𝑇d_{T}italic_d start_POSTSUBSCRIPT italic_T end_POSTSUBSCRIPT = 0 then
4:   tsubscript𝑡\mathcal{R}_{t}caligraphic_R start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT \leftarrow radius(tsubscript𝑡\mathcal{B}_{t}caligraphic_B start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT)
5:   \mathcal{R}caligraphic_R[t𝑡titalic_t] \leftarrow tsubscript𝑡\mathcal{R}_{t}caligraphic_R start_POSTSUBSCRIPT italic_t end_POSTSUBSCRIPT
6:   t=120subscript𝑡120\mathcal{R}_{t=120}caligraphic_R start_POSTSUBSCRIPT italic_t = 120 end_POSTSUBSCRIPT \leftarrow regression(\mathcal{R}caligraphic_R)
7:   if (t𝑡titalic_t > time threshold) and (t=120subscript𝑡120\mathcal{R}_{t=120}caligraphic_R start_POSTSUBSCRIPT italic_t = 120 end_POSTSUBSCRIPT > ring threshold) then
8:      Recommend early dispatch
9:   else
10:      Recommend against early dispatch    

5 Experimentation

5.1 Environmental Models

Abstracted Case Studies
Three abstracted case studies are presented to simulate the introduced hierarchical framework, surveillance and suppression reward models, and MCTS extensions in various environmental conditions. Case 1 involves flat terrain, variable winds, and a single high-value resource area. Case 2 again features flat terrain, has two high-value resource areas, but also experiences a significant but randomized wind-shift at t𝑡titalic_t = 60 mintimes60minute60\text{\,}\mathrm{min}start_ARG 60 end_ARG start_ARG times end_ARG start_ARG roman_min end_ARG. Case 3 has hilly to mountainous elevation, variable winds, and three high-value resources areas. Each progressive case features increased opportunity for destroyed resources.

Refer to caption
Figure 12: Three abstracted environmental models featuring different terrain elevation maps, environmental wind profiles, and number and size of high-value areas.

Actual Wildfire Case Study
The state of Hawaii is particularly susceptible to wildfires due to an abundance of fire-prone grasses and shrubs and an increasingly warm and dry climate [45], [46]. Roughly 0.5% of Hawaii’s total landmass burns annually, a proportion greater than or roughly equal to any other American state [47]. We examine an initial attack fire in the Makaha Valley on the island of Oahu on 19 October 2007, which would go on to burn approximately 500 acres [48]. The 2007 Makaha Valley wildfire required three days, more than sixty firefighters, and two manned helicopters conducting water drops to contain. Although wildfire perimeter data was not available, the initial wildfire location, historical wind data, elevation map, and fuelbed characteristics were aggregated to simulate and recreate the wildfire. Fig. 13 shows three photographs of the 19 October 2007 Makaha Valley fire [48], along with a satellite photograph of the 10 acre grid where the wildfire began [49], the associated LANDFIRE fuelbed map [36], and historical wind directionality trends in the Makaha Valley [50]. Surface friction typical of mountainous environments such as the Makaha Valley can result in reduced wind speeds.

Refer to caption
Figure 13: Top three: photographs of the 19 October 2007 Makaha Valley fire. From bottom-left to bottom-right: a satellite photograph of the 10 acre grid where the wildfire began, the associated LANDFIRE fuelbed map, and historical wind directionality trends in the Makaha Valley.

5.2 Surveillance

Baseline
A myopic baseline is introduced in which unmanned aircraft are jointly rewarded Rosubscript𝑅𝑜R_{o}italic_R start_POSTSUBSCRIPT italic_o end_POSTSUBSCRIPT at time-step t𝑡titalic_t for the number of cells observed that had a change in state compared to the belief map. Previous literature typically restricts this reward to only newly identified burning cells, as opposed to newly identified extinguished cells, to encourage surveillance along the wildfire head. However, without an accurate identification of the wildfire tail, suppression assets may be guided to previously extinguished cells. Reward Rosubscript𝑅𝑜R_{o}italic_R start_POSTSUBSCRIPT italic_o end_POSTSUBSCRIPT may be weighted to emphasize observations of higher resource cells. Penalties Pusubscript𝑃𝑢P_{u}italic_P start_POSTSUBSCRIPT italic_u end_POSTSUBSCRIPT, Pmsubscript𝑃𝑚P_{m}italic_P start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT, and Pisubscript𝑃𝑖P_{i}italic_P start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT remain in effect.

Ro=τ1|{x|B(x)𝒲(x)}|subscript𝑅𝑜subscript𝜏1conditional-set𝑥𝐵𝑥𝒲𝑥R_{o}=\tau_{1}|\{x|B(x)\neq\mathcal{W}(x)\}|italic_R start_POSTSUBSCRIPT italic_o end_POSTSUBSCRIPT = italic_τ start_POSTSUBSCRIPT 1 end_POSTSUBSCRIPT | { italic_x | italic_B ( italic_x ) ≠ caligraphic_W ( italic_x ) } | (12)
RU=RoPuPmPisubscript𝑅𝑈subscript𝑅𝑜subscript𝑃𝑢subscript𝑃𝑚subscript𝑃𝑖R_{U}=R_{o}-P_{u}-P_{m}-P_{i}italic_R start_POSTSUBSCRIPT italic_U end_POSTSUBSCRIPT = italic_R start_POSTSUBSCRIPT italic_o end_POSTSUBSCRIPT - italic_P start_POSTSUBSCRIPT italic_u end_POSTSUBSCRIPT - italic_P start_POSTSUBSCRIPT italic_m end_POSTSUBSCRIPT - italic_P start_POSTSUBSCRIPT italic_i end_POSTSUBSCRIPT (13)

Hyperparameters
A three-dimensional grid-search is conducted over a range of discount factors, depths, and exploration constants to identify a high-performing set of three hyperparameters for surveillance MCTS. Table one shows all surveillance hyperparameters and considered values. The model performs well when setting discount factor to 0.95, depth of search to three, and exploration constant to 100. The total iteration limit is 1000 and computation time limit is capped at 30 stimes30second30\text{\,}\mathrm{s}start_ARG 30 end_ARG start_ARG times end_ARG start_ARG roman_s end_ARG. The computational time limit must provide sufficient time for the unmanned aircraft to act on the guidance received prior to reevaluation.

Simulation
Abstracted case study 1 was simulated 20 times with select hyperparameters to compare the myopic baseline against the uncertainty reward model. Suppressive activities were not undertaken to help isolate the effects of both policies without external interference. Two surveillance accuracy metrics were considered: accuracy of the overall belief map relative to the actual wildfire state, and accuracy of the burning cells in the belief map relative to the actual wildfire state.

Table 2: Surveillance planner hyperparameters and considered values. Selected values are in bold
Hyperparameter Value(s)
computation time limit 30 stimes30second30\text{\,}\mathrm{s}start_ARG 30 end_ARG start_ARG times end_ARG start_ARG roman_s end_ARG
iteration limit 1000
discount factor {0.80, 0.85, 0.90, 0.95, 0.99}
depth {1, 2, 3, 4}
exploration constant {10, 50, 100, 200, 1000}

5.3 Suppression

Baselines
Firefighting technique and immediate suppression baselines are introduced for comparison against our approach. Firefighting technique reflects traditional firefighting policy and employs a conditions-based, multi-step approach to suppression. If resources are unevenly distributed, wet-lines are placed around high-value resource areas in an order reflecting their proximity to the wildfire. Thereafter, each drop is aimed at the head of the fire, defined as the distance furthest from the fire origin. The drop selection is made based on increased proximity of the drop to the fire; for example, a north-south drop when placing a wet-line east or west of the fire. Immediate suppression is a myopic policy that rewards actions which maximize the number of wildfire cells extinguished by suppressive activity. MCTS with UCT is applied to optimize immediate suppression policy results.

Hyperparameters
Three-dimensional grid-searches are conducted to identify a high-performing set of six hyperparameters for suppression MCTS. We first simulate over discount factor, search depth, and exploration constant, then over ASR selection, quantile, and roll-out depth. Table two shows all suppression hyperparameters and considered values. The model performs well when setting search depth to two, exploration constant to 100, ASR selection to two, quantile to 90, and roll-out depth to ten. We note that for a search depth of three outperforms a search depth of two for incipient stages of the wildfire. However, performance for search depth of three falters precipitously as the action space grows with the wildfire. Similarly, during the later stages of the wildfire’s growth, a search depth of one outperforms a search depth of two. This suggests an expanding action space may be supported by progressively restricting the search depth or branching factor over the lifetime of the model, through use of double progressive widening or similar [51]. The total iteration limit is 1000 and computation time limit is capped to 120 stimes120second120\text{\,}\mathrm{s}start_ARG 120 end_ARG start_ARG times end_ARG start_ARG roman_s end_ARG. The computational time limit must provide sufficient time for the manned aircraft to act on the guidance received prior to reevaluation.

Simulation
Abstracted case study 1 was simulated 20 times with select hyperparameters to compare the immediate suppression baseline and firefighting technique against our localized and global reward models. Surveillance activities were not undertaken to help isolate the effects of all policies without external interference; perfect information was assumed by equating the belief map to the actual wildfire state. Three suppression metrics were considered: total resources destroyed by the wildfire, wildfire flame size (number of burning grid cells), and average wildfire ring radius. Individual wildfires were then categorized as either fully suppressed, contained, or escaped. A fully suppressed wildfire indicates that suppressive activities were able to extinguish the entirety of the wildfire. A contained wildfire indicates plateauing growth, defined as a wildfire ring size at t𝑡titalic_t = 120 mintimes120minute120\text{\,}\mathrm{min}start_ARG 120 end_ARG start_ARG times end_ARG start_ARG roman_min end_ARG within 10% of the average of the last three (rapid, ultrarapid spread) or four (moderate spread) wildfire ring size states.

Table 3: Suppression planner hyperparameters and considered values. Selected values in bold
Hyperparameter Value(s)
computation time limit 120 stimes120second120\text{\,}\mathrm{s}start_ARG 120 end_ARG start_ARG times end_ARG start_ARG roman_s end_ARG
iteration limit 1000
discount factor {0.80, 0.85, 0.90, 0.95, 0.99}
depth {1, 2, 3}
exploration constant {10, 50, 100, 200, 1000}
ASR selection {{\{{1, 2, 3}}\}}
quantile {{\{{80, 90}}\}}
roll-out depth {{\{{5, 10, 15}}\}}

5.4 Joint Surveillance and Suppression

Simulation
Abstracted case studies 1, 2, and 3 were simulated 20 times with hyperparameters shown in Tables 1 and 2 to compare the immediate suppression baseline and firefighting technique against our localized and global reward models, using imperfection wildfire information attained using the uncertainty surveillance model. Three suppression metrics were again considered: total resources destroyed by the wildfire, wildfire flame size (number of burning grid cells), and average wildfire ring radius.

5.5 Early Dispatch Procedures

Early dispatch windows are shown as gray boundaries in Fig. 14. Should the predicted wildfire ring size at t𝑡titalic_t = 120 min coincide with this window, a secondary aircraft is dispatched to aid in suppressive activity. The predictive accuracy for a Case 1 model using linear regression is quite good, and stabilizes to within 10% of the final outcome around 20 minutes when there is no suppressive activity, and within 40 minutes when there is a single aircraft conducting suppression. In situations involving wind shifts, irregular elevation profiles, or uneven fuel maps, a machine-learning model may be integrated to better predict the extent of the wildfire propagation. Fig. 14 also provides insight into the consequence of aircraft suppression across differing wildfire spreads. Unsurprisingly, the less severe the wildfire spread, the greater the impact of added suppressive activity. Perhaps more surprising is the significant impact the addition of a second suppressive aircraft has on all examined wildfires regardless of spread. This suggests a sort of suppression resource “tipping point”, after which point wildfire containment is likely - assuming optimal use of resources.

Refer to caption
Figure 14: Linear regression is used to predict wildfire ring size at t𝑡titalic_t = 120 min for various Case 1 wildfire spread rates with no, single aircraft, and dual aircraft suppression applied.

6 Results

6.1 Surveillance

Emergent Behaviors
Emergence occurs when unique and complex behaviors emerge through the interaction of two or more otherwise simplistic entities. The introduced surveillance planner and associated MCTS solver result in unmanned aircraft exhibiting several emergent behaviors to include dispersion, loitering, circling, and stacking, all shown in Fig. 16. Dispersion occurs when both unmanned aircraft depart from an area of interest to clear room for the manned aircraft, only to return to their original positions immediately following the manned aircraft’s departure. Loitering involves one unmanned aircraft avoiding the wildfire entirely while the other maintains full freedom of maneuver around the wildfire. Circling occurs when unmanned aircraft follow one another in a circular pattern. Stacking involves two unmanned aircraft in close lateral proximity with significant altitude between them, such that one unmanned aircraft provides a high-level view of the wildfire while the other provides a condensed low-level view of the wildfire. These emergent behaviors regularly combine with one another (such as circling and stacking), and are typically more prevalent during certain stages in a wildfire’s propagation (loitering typically occurs during the initial stages of a wildfire whereas circling and stacking occur during the latter stages).
Simulation Data
As demonstrated in Fig. 15, the uncertainly model increasingly outperforms the belief baseline as the wildfire spread becomes increasingly severe. The belief baseline performs well during early stages of wildfire response (20-40 minutes from inception) and when wildfire spread is slow. When the wildfire is small enough to be local to either unmanned aircraft, querying the immediate vicinity is all that is required, especially with a search depth of up to three. As the wildfire expands beyond the reach of the search tree, weighting the recency of past queries on a by-grid cell basis becomes beneficial. The uncertainty model does this by maintaining surveillance recency data across the entire wildfire in a regularly updated uncertainty map. Given that the uncertainty model rivals or exceeds the performance of the belief baseline in each stage of wildfire propagation and across all spread rates, the uncertainty model will be used exclusively for later joint surveillance and suppression simulations.

Refer to caption
Figure 15: Surveillance accuracy of burning cells in % with 95% CI ranges for slow, moderate, rapid, and ultrarapid wildfire spread.
Refer to caption
Figure 16: Unmanned aircraft exhibiting various forms of emergent behavior to include dispersion, loitering, stacking, and circling.

6.2 Suppression

Policy Behavior
The three suppression policy categories under consideration each uniquely prioritize different aspects of the wildfire, as evidenced by their behavior in simulation. Comparison of firefighting technique against optimized destruction minimization policies, shown in Fig. 17, indicates several commonalities however. The destruction minimization policies also place strategic wet-lines, although not with the expediency of the firefighting technique; it prefers to immediately reduce the extent of the initial attack fire rather than preemptively place wet-lines alongside high-value areas. This extinguishes the fire at its infancy, when the greatest effect can be had through suppression. Otherwise, the destruction minimization policies also prioritize the head and exterior of the fire. Destruction minimization policies typically select wet-lines that simultaneously suppress at least some portion of the wildfire, rather than treating wet-lines and suppression as largely exclusive. The immediate suppression baseline is the most myopic of the three categories, as it discards any consideration of future outcomes in favor of extinguishing the greatest amount wildfire in the present.

Refer to caption
Figure 17: Limitations of an overly proactive approach in firefighting technique compared to the strategically-proactive approach in our destruction minimization models.

Simulation Data
As shown in Fig. 18 and 19 and in Table 3, the localized and global resource destruction minimization policies outperform the immediate baseline across all wildfire spreads, and are likely to outperform firefighting techniques in moderate and rapid wildfire spreads. Both forms of resource destruction minimization appear to perform about evenly. Resource destruction minimization suppression results vary increasingly as the wildfire spread becomes increasingly severe; this may be attributed to the unevenness between wildfires that are fully suppressed and those that escape.

Refer to caption
Figure 18: A 20-run average comparison of suppression policy selection on resources destroyed over time with 95% CI ranges for moderate, rapid, and ultrarapid wildfire spreads in Case 1 given a perfect surveillance information assumption.
Refer to caption
Figure 19: A 20-run comparison of suppression policy selection on resources destroyed at t𝑡titalic_t=120 min for moderate, rapid, and ultrarapid wildfire spreads in Case 1 with a perfect surveillance information assumption.
Table 4: Suppression Performance (Final Destruction), Case 1, 20 Run Average, 95% CI
Suppression Method Moderate Fire Spread Rapid Fire Spread Ultrarapid Fire Spread
Perfect Information Assumption
Baselines
Firefighting Technique 600.40 ±plus-or-minus\pm± 146.52 1100.18 ±plus-or-minus\pm± 121.02 2120.49 ±plus-or-minus\pm± 320.49
Immediate Suppression 1223.14 ±plus-or-minus\pm± 334.92 2687.47 ±plus-or-minus\pm± 483.72 4107.61 ±plus-or-minus\pm± 393.40
Our Methods
Localized Destruction Minimization 122.66 ±plus-or-minus\pm± 55.57 760.37 ±plus-or-minus\pm± 275.75 1617.31 ±plus-or-minus\pm± 532.71
Global Destruction Minimization 272.49 ±plus-or-minus\pm± 241.16 669.38 ±plus-or-minus\pm± 274.02 1910.78 ±plus-or-minus\pm± 628.51
  • Bold indicates methods that outperform both baselines by a statistically significant margin (α𝛼\alphaitalic_α = 0.05)

6.3 Joint Surveillance and Suppression

Simulation Data
Localized and global resource destruction minimization policies outperformed the immediate baseline for moderate wildfire spreads and are likely to outperform the immediate baseline for rapid and ultrarapid wildfire spreads. The localized and global resource destruction minimization policies are also likely to outperform firefighting techniques in moderate wildfire spreads, and are on par for rapid and ultrarapid wildfire spreads. Both forms of resource destruction minimization continue to perform about evenly. Notably, imperfect as opposed to perfect surveillance information has a significantly more detrimental effect on the optimized policies as opposed to firefighting technique. Accurate wildfire data is required to fight the wildfire in a manner that results in full suppression, whereas placing wet-lines along boundaries and high-value areas is more forgiving. These results hold for Cases 2, 3, and 4, despite the increase in environmental complexity. It may be inferred that improving surveillance accuracy is among the most effective ways to optimize suppression results.

Fig. 20 analyzes the end-state of the propagated wildfires post-suppression by bucketing them into one of three categories: fully suppressed, contained, and escaped. Escaped fires represent between 1 and 17% of all wildfires in the United States, but result in 97% of the overall burned landmass [52], [53]. Escaped fire containment requires a significant expansion in suppression capability by firefighting agencies through the employment of multiple ground and air assets. As shown, the percentage of escaped wildfires in moderate, rapid, and ultrarapid wildfire spreads all greatly exceed 1 to 17% when firefighting techniques are applied. Firefighting technique resulted in containment in 5 to 15% of wildfire simulations across all spreads, outperforming the immediate suppression baseline and no suppression whatsoever. This indicates our focus on initial attack fires with the potential to become escaped fires. For reference, the joint surveillance and suppression framework with an optimized resource destruction minimization model applied results in a 100% full suppression rate when the wildfire spread corresponds to an 83% full suppression rate using firefighting techniques. Fig. 22 also demonstrates the effectiveness of resource destruction minimization models, with and without early dispatch, relative to the immediate suppression baseline and firefighting techniques. The only instances of fully suppressed wildfires, across all three wildfire spreads, were when resource destruction minimization models were applied. The moderate wildfire spread chart does not feature an early dispatch variation of the destruction minimization model due to the slow propagation sequence not triggering the selected early dispatch window.

Refer to caption
Figure 20: A 20-run comparison of suppression policy selection to include early dispatching on wildfire status category at t𝑡titalic_t=120 min for moderate, rapid, and ultrarapid wildfire spreads in Case 1 with imperfect surveillance information.

7 Discussion

7.1 System Capability

The introduced hierarchical framework is designed to integrate unmanned surveillance aircraft into existing initial attack operations with minimal disruption to participating manned aircraft. It is therefore important to consider the assumptions that enable this capability to function as intended. Those assumptions are related to network architecture, autonomy modes in the case of network degradation, free and stable communication, and enabling hardware.

A MPOMDP, as opposed to a Dec-POMDP, has no impediment to communication [30] - and yet no network is infallible. In our framework, communication exists between unmanned aircraft, between manned aircraft, and between unmanned and manned aircraft. Although there are several feasible approaches to aircraft network architecture, we propose a dual-layer system. The low-level layer is a decentralized mesh network with self-healing properties where aircraft communicate their location in real-time to other networked aircraft. This robust layer aids in collision avoidance. The high-level layer is hierarchical and mirrors the introduced framework. One unmanned aircraft is assigned to be the unmanned network lead. This assignment may rotate such that the unmanned aircraft most central to the mesh, in terms of relative position or strength of network connection, becomes the lead. The unmanned network lead is responsible for: 1.) receiving and fusing observation data across unmanned aircraft, 2.) updating the wildfire belief map, 3.) resolving the surveillance planner using the updated wildfire belief map, 4.) disseminating surveillance guidance to unmanned aircraft, and 5.) disseminating the updated wildfire belief map to all aircraft. A manned network lead may be assigned if there are multiple participating manned aircraft. Alternatively, manned aircraft may independently resolve their own suppression planners. This suits when one manned aircraft is actively suppressing the fire while the other is replenishing water off-station.

Human-autonomous frameworks typically traverse an autonomy hierarchy in response to changes in the network and operating environment. This has significant repercussions for system scalability and robustness. While the ideal case assumes a shared wildfire belief map and centralized decision-making within homogeneous agent groups, this need not be true in our framework. Individual decision processes can be resolved independently if the high-level network layer is degraded. This highlights one advantage of dividing the larger MPOMDP into smaller MDPs. Given lapses in communication with the respective network lead, all aircraft can proceed, if sub-optimally, to the best of their independent ability and per their individual MDPs and belief maps. If the network lead is damaged, another aircraft assumes the network lead role. If a non-lead aircraft is damaged, the framework continues to operate unabated, albeit with fewer observations or suppression actions. Separately, it is critical that the low-level network layer remains intact. An unmanned aircraft that disconnects from the decentralized mesh network, and who is therefore no longer receiving location information, must depart the wildfire vicinity immediately or risk collision. Given the inherent risks involved with manned aircraft integration, we would expect additional fail safe measures to mitigate collision avoidance concerns during communication lapses.

The initial attack is limited to ten acres, or a roughly 200 by 200 meter grid. This is small enough to ensure reliable communication between aircraft using any one of a number of wireless protocols to include WiFi or Bluetooth 5.0. The only exception occurs when the suppression aircraft departs the wildfire to replenish its water bucket. The nearest water replenishing source may be upwards of 10 kmtimes10km10\text{\,}\mathrm{k}\mathrm{m}start_ARG 10 end_ARG start_ARG times end_ARG start_ARG roman_km end_ARG away, in which case there is limited communication between manned aircraft off-station and the unmanned network lead on-station. This results in the manned aircraft possessing a stale wildfire belief map. Suppression planner performance using the stale wildfire belief map is determined by comparing the frequency of suppression to how quickly the wildfire expands. In slowly and moderately propagating fires, optimizing suppression on a belief map a few minutes old is still likely to provide useful results. Alternatively, the suppression planner can resolve on the condition that the manned aircraft receives an updated wildfire belief map from the unmanned network lead while en-route back to the fire. This ensures a more accurate wildfire belief at the time of suppression planner execution, but reduces the amount of time available for computation. Satellite-based communication or the presence of a nearby ground station can mitigate network reliability concerns.

The initial attack wildfire is constrained in time, and is either suppressed, contained, or becomes an escaped fire at the 120 minute mark. The vast majority of manned suppression aircraft can remain airborne for at least two hours. For example, the S-70 has 150 minutes of flight time when filled with fuel. More limiting is the battery life for unmanned aircraft, especially those with multi-rotors. There exist a handful of higher-end electric and hybrid surveillance quad-copters which have between 55-120 minutes of flight time, depending on the payload. This flight time is expected to incrementally improve with advancements in lithium batteries and composite materials. The presence of recharge stations, or the dispatch a second fleet of unmanned aircraft, can help maintain continuous surveillance operations over the course of an expanding incipient wildfire.

7.2 Limitations and Future Work

The hierarchical framework introduced, while promising, is subject to certain limitations. These can generally be categorized as model-specific, solver-specific, or domain-specific. From those limitations we identify opportunities for further analysis.

Model-Specific

MDPs are excellent tools for capturing the dynamics underpinning and uncertainties affecting complex systems, though they are not without limitations. MDPs are subject to the “curse of dimensionality”, meaning they grow exponentially as the number of states and actions increase [10]. Through a combination of wildfire-specific constraints, probabilistic search algorithms, and the decomposition of the MPOMDP into hierarchically arranged sub-problems, we overcame the high-dimensional state and action spaces associated with the initial attack and attained significant and meaningful results. As the initial attack grows beyond the confines of its regulated boundaries and becomes an escaped fire, we would expect the state and suppression action spaces to swell. Additionally, if more aircraft were introduced, surveillance and suppression action spaces would exponentially enlarge. In both cases, the introduced constructs would become less effective. To maintain a similar resolution for surveillance and suppression operations for an escaped fire with an increase in the number of participating aircraft, we suggest an added hierarchical layer, now within homogeneous groupings in addition to just between them. This transforms our hierarchy from an iterating series of planners into an iterating series of sub-hierarchies. We leave a more extensive review of nested MDP hierarchy design and associated computational complexity for the escaped wildfire problem to future research. A second model-specific limitation is surveillance planner frequency and the merging of collision avoidance penalties and wildfire surveillance rewards into one objective function. While the duration between surveillance decisions can be easily modified, there needs to be sufficient time between decisions to 1.) apply the solver and get results, and 2.) actually execute surveillance operations and attain observations. This time scale may differ from that required for robust collision avoidance, which is like to operate at a much higher frequency. In application, the frequency of location sharing between aircraft would be greater than that of observation sharing. We therefore suggest a low-level “detect and avoid” system for each unmanned aircraft. The surveillance planner introduced then effectively keeps unmanned aircraft away from the expected manned aircraft axis of advance, while the low-level detect and avoid system informed by location sharing data adjusts for unexpected changes in manned aircraft trajectory.

Solver-Specific

MCTS is an online planning algorithm and therefore an effective solver choice for the initial attack problem. MCTS can account for non-stationary behavior in the initial attack wildfire. This includes instantaneous changes in wildfire propagation direction and intensity, as demonstrated with the randomized wind-shift simulated in abstracted case study two. Despite its upside, MCTS has certain limitations that must be addressed. MCTS suffers from sample inefficiency in large search spaces, is prone to high-variance, and faces domain-specific challenges related to its internal model. We address state aggregation via nested hierarchical design in the model-specific limitation section above, which can minimize an otherwise expansive search tree, and enable MCTS to obtain accurate statistics and make informed decisions. MCTS roll-outs can vary significantly due to the random nature of simulations. This may result in inconsistent action value estimations and induce problematic noise. Parallelization, variance reduction, and progressive widening techniques can mitigate these concerns. MCTS uses an internal model to conduct simulations, and solver performance is therefore a consequence of how well that internal model reflects reality. This is not an issue when MCTS is used for deterministic games, but becomes concerning when applied to environments rife with uncertainty. The disparity between reality and the internal model is exacerbated with depth. As such, a typical disadvantage of model-based methods is that compounding errors make long-horizon roll-outs unreliable. The depth considered during surveillance planning is shallow enough to avoid these compounding errors, while still remaining useful due to the frequency of reevaluation.

Domain-Specific

While we have sought to develop a high-fidelity initial attack model, we have not so far addressed fire intensity and flame height, smoke plumes, mixed fleet operations, and the ember attack. Unmanned aircraft attain higher resolution observation data by descending, but doing so may put them at risk of catching fire. Descending below an established safety distance from the ground may result in a penalty applied to the violating aircraft’s objective function. A more tailored approach involves modifying the safety distance as a function of flame intensity or height for a given cell column. We have assumed a binary mapping of the wildfire state, but can reasonably include an assessment of both flame intensity given the appropriate sensors, or flame height using three-dimensional wildfire propagation and observation models. Pham et al. apply a safety distance to unmanned aircraft tracking a wildfire, and also introduce a flame intensity model as an artificial potential field [13]. Smoke plumes obscure surveillance aircraft vision-based observations, induce significant errors in sensor measurements [54], and modify manned suppression aircrafts’ axis of advance. Smoke sensors or low-cost cameras outfitted to unmanned aircraft may be used to develop three-dimensional keep-out geofences [55] for smoke plumes. We address agent heterogeneity by separating available assets into homogeneous manned and unmanned aircraft groups. The suppression patterns and partial suppression probabilities in this paper are tailored to a 660 gallon water bucket on a short-line hauled by a S-70 Firehawk helicopter. These patterns and probabilities may be adjusted to any manned aircraft and water suppression platform. We would expect fundamentally different results for a CH-47 Chinook hauling a 2,600 gallon water bucket on a long-line. As the initial attack wildfire escapes, a variety of new manned aircraft arrive on station to include fixed-wing tankers, helicopters carrying various sizes of water buckets, and smoke-jumper aircraft. The set of joint actions across the diversity of manned aircraft may give rise to interesting emergent behaviors, but must also comply with Fire Traffic Area altitudes, orbits, and routing structures. The set of considered neighboring cells during wildfire propagation may be expanded to incorporate the spread of embers across sizable distances in the case of an ember attack. An ember attack occurs when wildfires in strong wind conditions carry embers beyond the fire front. We focus on the general form of the initial attack and therefore consider only adjacent neighbors during propagation. MDP models exist that have incorporated airborne ember spread for larger wildfires [12].

8 Conclusion

The coordination of wildfire surveillance and suppression activities using manned and unmanned aircraft in tandem is an effective means of reducing wildfire propagation severity and minimizing wildfire destruction. A hierarchical framework involving iterating surveillance and suppression planners is introduced to divide an otherwise intractable MPOMDP into optimizable sub-problems acting on asynchronous but otherwise consistent time scales. Surveillance, suppression, and joint surveillance and suppression models with unique MCTS-extensions applied are compared in simulation across abstracted and actual case studies. We demonstrate how a hierarchical approach to Markov decision processes may be used to ensure collision avoidance between unmanned and manned aircraft operating in close proximity, and how teaming aerospace operations may extend into wildfire management. We further find that our hierarchical framework with a resource destruction minimization reward model applied significantly outperforms firefighting techniques and a myopic baseline in preventing initial attack fires from developing into escaped fires.

Funding Sources

No sponsorship or financial support was provided in the development of this manuscript.

Acknowledgments

We thank CALFIRE and the C/3-25 Aviation Regiment at Wheeler Army Airfield, Hawaii for their collective insight into initial attack aerial firefighting operations.

References

  • Hoover and Hanson [2023] Hoover, K., and Hanson, L. A., “Wildfire Statistics,” https://crsreports.congress.gov/product/pdf/IF/IF10244, 2023.
  • Hirsch et al. [1998] Hirsch, K. G., Corey, P. N., and Martell, D. L., “Using expert judgment to model initial attack fire crew effectiveness,” Forest Science, Vol. 44, No. 4, 1998, pp. 539–549. 10.1093/forestscience/44.4.539.
  • Rahn [2010] Rahn, M., Initial Attack Effectiveness: Wildfire Staffing Study, Montezuma Publishing, 2010.
  • California Department of Forestry and Fire Protection [2022] California Department of Forestry and Fire Protection, “2022 Strategic Fire Plan Amador El Dorado Unit,” https://osfm.fire.ca.gov/media/irefprho/2022-amador-el-dorado-alpine-sacramento-unit-fire-plan.pdf, 2022.
  • O’Neill et al. [2022] O’Neill, T., McNeese, N., Barron, A., and Schelble, B., “Human–autonomy teaming: A review and analysis of the empirical literature,” Human factors, Vol. 64, No. 5, 2022, pp. 904–938. 10.1177/0018720820960865.
  • Gaydos and Curry [2014] Gaydos, S. J., and Curry, I. P., “Manned-Unmanned Teaming: Expanding the Envelope of UAS Operational Employment,” Aviation, Space, and Environmental Medicine, Vol. 85, No. 12, 2014, pp. 1231–1232. 10.3357/ASEM.4164.2014.
  • California Department of Forestry and Fire Protection [2019] California Department of Forestry and Fire Protection, “Firefighting Aircraft Recognition Guide,” https://www.fire.ca.gov/media/4950/aviation-guide-2019-access.pd, 2019.
  • Ambrosia et al. [2011] Ambrosia, V. G., Wegener, S., Zajkowski, T., Sullivan, D., Buechel, S., Enomoto, F., Lobitz, B., Johan, S., Brass, J., and Hinkley, E., “The Ikhana unmanned airborne system (UAS) western states fire imaging missions: from concept to reality (2006–2010),” Geocarto International, Vol. 26, No. 2, 2011, pp. 85–101. doi.org/10.1080/10106049.2010.539302.
  • Albaker and Rahim [2009] Albaker, B., and Rahim, N., “A survey of collision avoidance approaches for unmanned aerial vehicles,” 2009 international conference for technical postgraduates (TECHPOS), IEEE, 2009, pp. 1–7. 10.1109/TECHPOS.2009.5412074.
  • Ghavamzadeh et al. [2006] Ghavamzadeh, M., Mahadevan, S., and Makar, R., “Hierarchical Multi-agent Reinforcement Learning,” Autonomous Agents and Multi-Agent Systems, Vol. 13, 2006, pp. 197–229. doi.org/10.1007/s10458-006-7035-4.
  • Seraj et al. [2021] Seraj, E., Chen, L., and Gombolay, M. C., “A hierarchical coordination framework for joint perception-action tasks in composite robot teams,” IEEE Transactions on Robotics, Vol. 38, No. 1, 2021, pp. 139–158. 10.1109/TRO.2021.3096069.
  • Julian and Kochenderfer [2019] Julian, K. D., and Kochenderfer, M. J., “Distributed Wildfire Surveillance with Autonomous Aircraft using Deep Reinforcement Learning,” Journal of Guidance, Control, and Dynamics, Vol. 42, No. 8, 2019, pp. 1768–1778. 10.2514/1.G004106.
  • Pham et al. [2018] Pham, H. X., La, H. M., Feil-Seifer, D., and Deans, M. C., “A distributed control framework of multiple unmanned aerial vehicles for dynamic wildfire tracking,” IEEE Transactions on Systems, Man, and Cybernetics: Systems, Vol. 50, No. 4, 2018, pp. 1537–1548. 10.1109/TSMC.2018.2815988.
  • Ghamry and Zhang [2016] Ghamry, K. A., and Zhang, Y., “Cooperative control of multiple UAVs for forest fire monitoring and detection,” 2016 12th IEEE/ASME International Conference on Mechatronic and Embedded Systems and Applications (MESA), IEEE, 2016, pp. 1–6. 10.1109/MESA.2016.7587184.
  • Griffith et al. [2017] Griffith, J. D., Kochenderfer, M. J., Moss, R. J., Mišic, V. V., Gupta, V., and Bertsimas, D., “Automated Dynamic Resource Allocation for Wildfire Suppression,” Lincoln Laboratory Journal, Vol. 22, No. 2, 2017, pp. 38–59.
  • Hobbs and Li [2024] Hobbs, K. L., and Li, B., “Safety, Trust, and Ethics Considerations for Human-AI Teaming in Aerospace Control,” AIAA SCITECH 2024 Forum, 2024, p. 2583. doi.org/10.2514/6.2024-2583.
  • Bjurling et al. [2020] Bjurling, O., Granlund, R., Alfredson, J., Arvola, M., and Ziemke, T., “Drone swarms in forest firefighting: A local development case study of multi-level human-swarm interaction,” Proceedings of the 11th Nordic Conference on Human-Computer Interaction: Shaping Experiences, Shaping Society, 2020, pp. 1–7. doi.org/10.1145/3419249.3421239.
  • Basich et al. [2023] Basich, C., Svegliato, J., Wray, K. H., Witwicki, S., Biswas, J., and Zilberstein, S., “Competence-aware systems,” Artificial Intelligence, Vol. 316, 2023, p. 103844. 10.1016/j.artint.2022.103844.
  • Veloso et al. [2015] Veloso, M., Biswas, J., Coltin, B., and Rosenthal, S., “Cobots: Robust symbiotic autonomous mobile service robots,” Twenty-fourth international joint conference on artificial intelligence, Citeseer, 2015. 10.5555/2832747.2832901.
  • Seraj and Gombolay [2020] Seraj, E., and Gombolay, M., “Coordinated control of uavs for human-centered active sensing of wildfires,” 2020 American control conference (ACC), IEEE, 2020, pp. 1845–1852. 10.23919/ACC45564.2020.9147613.
  • Menda et al. [2018] Menda, K., Chen, Y.-C., Grana, J., Bono, J. W., Tracey, B. D., Kochenderfer, M. J., and Wolpert, D., “Deep reinforcement learning for event-driven multi-agent decision processes,” IEEE Transactions on Intelligent Transportation Systems, Vol. 20, No. 4, 2018, pp. 1259–1268. 10.1109/TITS.2018.2848264.
  • Barto and Mahadevan [2003] Barto, A. G., and Mahadevan, S., “Recent advances in hierarchical reinforcement learning,” Discrete event dynamic systems, Vol. 13, No. 1-2, 2003, pp. 41–77. doi.org/10.1023/A:1022140919877.
  • Bacon et al. [2017] Bacon, P.-L., Harb, J., and Precup, D., “The option-critic architecture,” Proceedings of the AAAI conference on artificial intelligence, Vol. 31, 2017. 10.1609/aaai.v31i1.10916.
  • Vezhnevets et al. [2017] Vezhnevets, A. S., Osindero, S., Schaul, T., Heess, N., Jaderberg, M., Silver, D., and Kavukcuoglu, K., “Feudal networks for hierarchical reinforcement learning,” International Conference on Machine Learning, PMLR, 2017, pp. 3540–3549. 10.48550/arXiv.1703.01161.
  • Lowe et al. [2017] Lowe, R., Wu, Y. I., Tamar, A., Harb, J., Pieter Abbeel, O., and Mordatch, I., “Multi-agent actor-critic for mixed cooperative-competitive environments,” Advances in neural information processing systems, Vol. 30, 2017. 10.48550/arXiv.1706.02275.
  • Foerster et al. [2018] Foerster, J., Farquhar, G., Afouras, T., Nardelli, N., and Whiteson, S., “Counterfactual multi-agent policy gradients,” Proceedings of the AAAI conference on artificial intelligence, Vol. 32, 2018. 10.1609/aaai.v32i1.11794.
  • Rashid et al. [2020] Rashid, T., Samvelyan, M., De Witt, C. S., Farquhar, G., Foerster, J., and Whiteson, S., “Monotonic value function factorisation for deep multi-agent reinforcement learning,” The Journal of Machine Learning Research, Vol. 21, No. 1, 2020, pp. 7234–7284. 10.48550/arXiv.2003.08839.
  • Xu et al. [2023] Xu, Z., Bai, Y., Zhang, B., Li, D., and Fan, G., “Haven: Hierarchical cooperative multi-agent reinforcement learning with dual coordination mechanism,” Proceedings of the AAAI Conference on Artificial Intelligence, Vol. 37, 2023, pp. 11735–11743. 10.48550/arXiv.2110.07246.
  • Pierre et al. [2023] Pierre, J.-E., Sun, X., and Fierro, R., “Multi-agent partial observable safe reinforcement learning for counter uncrewed aerial systems,” IEEE Access, 2023. 10.1109/ACCESS.2023.3298601.
  • Kochenderfer [2015] Kochenderfer, M. J., Decision making under uncertainty: theory and application, MIT press, 2015.
  • Gmytrasiewicz and Doshi [2005] Gmytrasiewicz, P. J., and Doshi, P., “A framework for sequential planning in multi-agent settings,” Journal of Artificial Intelligence Research, Vol. 24, 2005, pp. 49–79. 10.1613/jair.1579.
  • Kaelbling et al. [1998] Kaelbling, L. P., Littman, M. L., and Cassandra, A. R., “Planning and acting in partially observable stochastic domains,” Artificial intelligence, Vol. 101, No. 1-2, 1998, pp. 99–134.
  • Bertsimas et al. [2017] Bertsimas, D., Griffith, J. D., Gupta, V., Kochenderfer, M. J., and Mišić, V. V., “A Comparison of Monte Carlo Tree Search and Rolling Horizon Optimization for Large-Scale Dynamic Resource Allocation Problems,” European Journal of Operational Research, Vol. 263, No. 2, 2017, pp. 664–678. 10.1016/j.ejor.2017.05.032.
  • Papadopoulos and Pavlidou [2011] Papadopoulos, G. D., and Pavlidou, F.-N., “A comparative review on wildfire simulators,” IEEE systems Journal, Vol. 5, No. 2, 2011, pp. 233–243. 10.1109/JSYST.2011.2125230.
  • Saikin et al. [2020] Saikin, D. A., Baca, T., Gurtner, M., and Saska, M., “Wildfire fighting by unmanned aerial system exploiting its time-varying mass,” IEEE Robotics and Automation Letters, Vol. 5, No. 2, 2020, pp. 2674–2681. 10.1109/LRA.2020.2972827.
  • Landfire [2020] Landfire, “Hawaii LANDFIRE - LF 2020 Fuel Characteristic Classification System Fuelbeds,” https://www.landfire.gov/viewer/, 2020. Accessed: 2023-19-08.
  • Hansen [2012] Hansen, R., “Corrigendum to: Estimating the Amount of Water Required to Extinguish Wildfires under Different Conditions and in Various Fuel Types,” International Journal of Wildland Fire, Vol. 21, No. 6, 2012, pp. 778–778. 10.1071/WF11022.
  • California Department of Forestry and Fire Protection [2023] California Department of Forestry and Fire Protection, “CALFIRE Sonoma-Lake-Napa Unit 2023 Strategic Fire Plan,” https://osfm.fire.ca.gov/media/e33a3ior/2023-sonoma-lake-napa-unit-fire-plan.pdf, 2023.
  • Kal’avskỳ et al. [2019] Kal’avskỳ, P., Petríček, P., Kelemen, M., Rozenberg, R., Jevčák, J., Tomaško, R., and Mikula, B., “The efficiency of aerial firefighting in varying flying conditions,” International Conference on Military Technologies (ICMT), IEEE, 2019, pp. 1–5. 10.1109/MILTECHS.2019.8870050.
  • Ault et al. [2012] Ault, R., Mooney, C., and Thomasson, J., “Drop Patterns for the Bambi-Max Bucket and the Bell 212 Helitanker,” 2012.
  • Kocsis and Szepesvári [2006] Kocsis, L., and Szepesvári, C., “Bandit based monte-carlo planning,” European conference on machine learning, Springer, 2006, pp. 282–293. 10.1007/11871842_29.
  • Browne et al. [2012] Browne, C. B., Powley, E., Whitehouse, D., Lucas, S. M., Cowling, P. I., Rohlfshagen, P., Tavener, S., Perez, D., Samothrakis, S., and Colton, S., “A survey of monte carlo tree search methods,” IEEE Transactions on Computational Intelligence and AI in games, Vol. 4, No. 1, 2012, pp. 1–43. 10.1109/TCIAIG.2012.2186810.
  • Li et al. [2023] Li, W., Liu, Y., Ma, Y., Xu, K., Qiu, J., and Gan, Z., “A self-learning Monte Carlo tree search algorithm for robot path planning,” Frontiers in Neurorobotics, Vol. 17, 2023. 10.3389/fnbot.2023.1039644.
  • Powley et al. [2017] Powley, E., Cowling, P., and Whitehouse, D., “Memory bounded monte carlo tree search,” Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment, Vol. 13, 2017, pp. 94–100. 10.1609/aiide.v13i1.12932.
  • Ellsworth et al. [2014] Ellsworth, L. M., Litton, C. M., Dale, A. P., and Miura, T., “Invasive grasses change landscape structure and fire behaviour in Hawaii,” Applied Vegetation Science, Vol. 17, No. 4, 2014, pp. 680–689. 10.1111/avsc.12110.
  • Zhu [2019] Zhu, T. R., “Modeling fuels and wildfire behavior in Hawaiian ecosystems,” Ph.D. thesis, University of Hawai’i at Manoa, 2019.
  • Tetra Tech [2018] Tetra Tech, “State of Hawai’i 2018 Hazard Mitigation Plan,” https://dod.hawaii.gov/hiema/files/2020/06/2018-State-HI-HMP-Update-100218.pdf, 2018.
  • Kim [2007] Kim, L., “Makaha Valley Residents Return Home as Fire Nears Containment,” Hawaii News Now, 2007.
  • Google Maps [2023] Google Maps, “Makaha Valley 1:1200,” https://goo.gl/maps/4KqzyzjqPYCz3Soy5, 2023. Accessed: 2023-19-08.
  • Weather Spark [2023] Weather Spark, “Wind Direction in the Fall in Makaha Valley,” https://weatherspark.com/s/96/2/Average-Fall-Weather-in-M%C4%81kaha-Valley-Hawaii-United-States#Figures-WindDirection, 2023. Accessed: 2023-19-08.
  • Sunberg and Kochenderfer [2018] Sunberg, Z., and Kochenderfer, M., “Online algorithms for POMDPs with continuous state, action, and observation spaces,” Proceedings of the International Conference on Automated Planning and Scheduling, Vol. 28, 2018, pp. 259–263. 10.1609/icaps.v28i1.13882.
  • Reimer et al. [2019] Reimer, J., Thompson, D. K., and Povak, N., “Measuring initial attack suppression effectiveness through burn probability,” Fire, Vol. 2, No. 4, 2019, p. 60. 10.3390/fire2040060.
  • Calkin et al. [2005] Calkin, D. E., Gebert, K. M., Jones, J. G., and Neilson, R. P., “Forest Service large fire area burned and suppression expenditure trends, 1970–2002,” Journal of Forestry, Vol. 103, No. 4, 2005, pp. 179–183. 10.1093/jof/103.4.179.
  • Yuan et al. [2015] Yuan, C., Zhang, Y., and Liu, Z., “A survey on technologies for automatic forest fire monitoring, detection, and fighting using unmanned aerial vehicles and remote sensing techniques,” Canadian journal of forest research, Vol. 45, No. 7, 2015, pp. 783–792. 10.1139/cjfr-2014-0347.
  • Kim and Atkins [2022] Kim, J., and Atkins, E., “Airspace geofencing and flight planning for low-altitude, urban, small unmanned aircraft systems,” Applied Sciences, Vol. 12, No. 2, 2022, p. 576. 10.3390/app12020576.

Additional Materials

Refer to caption
Figure 21: A 20-run average comparison of suppression policy selection on resources destroyed over time with 95% CI ranges for moderate, rapid, and ultrarapid wildfire spreads in all four cases with imperfect surveillance information.
Refer to caption
Figure 22: A 20-run comparison of suppression policy selection on resources destroyed t𝑡titalic_t=120 min for moderate, rapid, and ultrarapid wildfire spreads in all four cases with imperfect surveillance information.
Table 5: Joint Surveillance-Suppression Performance (Final Destruction), 20 Run Average, 95% CI
Case Suppression Method Moderate Rapid Ultrarapid
Uncertainty Surveillance Model Fire Spread Fire Spread Fire Spread
Baselines
Firefighting Technique 663.16±plus-or-minus\pm± 153.86 1416.30 ±plus-or-minus\pm± 270.56 2504.41 ±plus-or-minus\pm± 259.58
1111 Immediate Suppression 1247.83 ±plus-or-minus\pm± 351.86 2523.71 ±plus-or-minus\pm± 442.56 3909.05 ±plus-or-minus\pm± 373.87
Our Methods
Localized Destruction Minimization 444.48 ±plus-or-minus\pm± 151.70 1455.15 ±plus-or-minus\pm± 244.14 2727.90 ±plus-or-minus\pm± 480.10
Global Destruction Minimization 348.12 ±plus-or-minus\pm± 133.91 1602.76 ±plus-or-minus\pm± 390.03 2581.25 ±plus-or-minus\pm± 305.61
Baselines
Firefighting Technique 1050.80 ±plus-or-minus\pm± 134.23 1586.44 ±plus-or-minus\pm± 130.94 2515.49 ±plus-or-minus\pm± 227.91
2222 Immediate Suppression 1552.63 ±plus-or-minus\pm± 298.54 2653.19 ±plus-or-minus\pm± 364.15 3354.86 ±plus-or-minus\pm± 364.23
Our Methods
Localized Destruction Minimization 417.43 ±plus-or-minus\pm± 209.84 1210.73 ±plus-or-minus\pm± 489.17 2494.31 ±plus-or-minus\pm± 610.34
Global Destruction Minimization 445.21 ±plus-or-minus\pm± 233.31 1549.57 ±plus-or-minus\pm± 535.60 2581.25 ±plus-or-minus\pm± 305.61
Baselines
Firefighting Technique 1792.59 ±plus-or-minus\pm± 382.81 2396.35 ±plus-or-minus\pm± 465.55 3728.30 ±plus-or-minus\pm± 441.38
3333 Immediate Suppression 1532.93 ±plus-or-minus\pm± 592.88 4173.03 ±plus-or-minus\pm± 677.81 6483.51 ±plus-or-minus\pm± 665.34
Our Methods
Localized Destruction Minimization 447.74 ±plus-or-minus\pm± 185.17 1493.11 ±plus-or-minus\pm± 532.93 4778.15 ±plus-or-minus\pm± 892.61
Global Destruction Minimization 934.72 ±plus-or-minus\pm± 406.79 1753.15 ±plus-or-minus\pm± 551.94 4145.07 ±plus-or-minus\pm± 651.05
Baselines
Firefighting Technique 1061.27 ±plus-or-minus\pm± 90.40 2020.31 ±plus-or-minus\pm± 131.77 3160.39 ±plus-or-minus\pm± 165.56
4444 Immediate Suppression 690.67 ±plus-or-minus\pm± 160.69 2010.83 ±plus-or-minus\pm± 262.97 3690.32 ±plus-or-minus\pm± 372.43
Our Methods
Localized Destruction Minimization 288.61 ±plus-or-minus\pm± 113.47 1329.51 ±plus-or-minus\pm± 226.86 3290.26 ±plus-or-minus\pm± 407.79
Global Destruction Minimization 434.65 ±plus-or-minus\pm± 143.71 1477.92 ±plus-or-minus\pm± 321.53 2800.06 ±plus-or-minus\pm± 332.47
  • Bold indicates methods that outperform both baselines by a statistically significant margin (α𝛼\alphaitalic_α = 0.05)